R: a better lexer to CudaText

jcfaria.uesc · Post by **jcfaria.uesc** » 07.05.2024 19:36

Good morning.

The R lexer for CudaText could be much better than what is currently available. I'm trying to improve it, but I need some initial help from more experienced CudaText developers.

The basic problem is that a large number of functions and other objects in R have symbols ('.' and '_') in the middle of their names. For example:

- all.equal.character
- anyNA.numeric_version
- as.data.frame.numeric_version
- aspell_package_vignettes

It also needs to be case-sensitive, as the two objects below are different things:

- LETTERS
- letters

Objects can have letters and numbers:

- state.x77
- CO2
- co2

Anyway, R is a very verbose language!

I know regular expressions, but I'm not aware of the flavor that is used in the construction of current lexers. I think it's Python, but it doesn't match up well with the Python tests from REGEX101 (https://regex101.com/) that I use to test and document the regular expressions I make.

I am attaching a file (Tinn-R_recognized_words.R) from the Tinn-R editor (https://sourceforge.net/projects/tinn-r/) that contains a sample of how I intend to build the new R lexer for CudaText, if possible.

Please, can someone help me until I understand the process of creating a lexer in Cuda?

I've read all the documentation, I'm really missing the practice...

main Alexey · Post by **main Alexey** » 07.05.2024 19:49

>The basic problem is that a large number of functions and other objects in R have symbols ('.' and '_') in the middle of the character string.

So regex for id can be this:
[a-z_][\w\.]*

if id must end with a wordchar, then this:
[a-z_]([\w\.]*[a-z_])?\b
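
To see how the two patterns above behave on the names from the first post, here is a minimal sketch in Python. It assumes Python's `re` module as a stand-in for the EControl engine (the two flavors are not identical), and it compiles the patterns with `re.IGNORECASE` to mimic the case-ignoring default. The names `ID_LOOSE`, `ID_STRICT`, and `whole_match` are made up for this illustration.

```python
import re

# The two identifier patterns from the post, compiled case-insensitively.
# Assumption: Python's `re` only approximates the EControl regex flavor.
ID_LOOSE = re.compile(r"[a-z_][\w\.]*", re.IGNORECASE)
ID_STRICT = re.compile(r"[a-z_]([\w\.]*[a-z_])?\b", re.IGNORECASE)

# Sample R names taken from the first post of this thread.
SAMPLES = [
    "all.equal.character",
    "anyNA.numeric_version",
    "as.data.frame.numeric_version",
    "aspell_package_vignettes",
    "state.x77",
    "CO2",
]


def whole_match(pattern, text):
    """Return True when the pattern consumes the entire identifier."""
    return pattern.fullmatch(text) is not None


# Names matched in full by each pattern.
loose_ok = [s for s in SAMPLES if whole_match(ID_LOOSE, s)]
strict_ok = [s for s in SAMPLES if whole_match(ID_STRICT, s)]

print("loose: ", loose_ok)
print("strict:", strict_ok)
```

In this sketch the first pattern covers every sample, while the second one does not match `state.x77` or `CO2` in full, because it requires the last character to be a letter or an underscore rather than any word character.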

>also needs to be case dependent

Regex in the lexer parser is case-insensitive by default, but matching of IDs against a keyword list is case-sensitive. When you fill the keyword lists (in the Lexer Properties dialog of SynWrite), IDs will be case-sensitive unless you toggle the checkbox.
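
The split between a case-ignoring regex and a case-sensitive keyword list can be sketched in Python as follows. This is only an illustration of the idea, not the lexer's actual implementation; `KEYWORDS` and `keyword_tokens` are hypothetical names, and Python's `re` is assumed as a stand-in for the EControl engine.

```python
import re

# Case-insensitive identifier pattern, like the lexer parser's default behavior.
ID_PATTERN = re.compile(r"[a-z_][\w\.]*", re.IGNORECASE)

# Hypothetical keyword list; membership is tested with exact case.
KEYWORDS = {"LETTERS", "letters"}


def keyword_tokens(line):
    """Return identifiers in `line` that appear in KEYWORDS with the same case."""
    return [tok for tok in ID_PATTERN.findall(line) if tok in KEYWORDS]


# "Letters" is found by the regex but is not in the keyword list.
found = keyword_tokens("LETTERS letters Letters")
print(found)
```

The regex finds all three tokens, but only the two spellings that appear in the list are treated as keywords.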

>I'm not aware of the flavor that is used in the construction of current lexers

EControl flavor!
https://wiki.freepascal.org/CudaText#Re ... xpressions

jcfaria.uesc · Post by **jcfaria.uesc** » 08.05.2024 02:05

Alexey,

Thank you for your attention, but I still haven't managed to get it to work as expected!

I am attaching my proposed "R.lcf" file.

I believe that without the help of people more experienced in CudaText lexers I will not be able to progress to where I want to go.

If the basics I've done so far work well (for this you need to see the test file "Tinn-R_recognized_words.R" and all the identifiers in CudaText) I'll be able to continue the work.

main Alexey · Post by **main Alexey** » 08.05.2024 05:19

got your file.
I removed all trailing comments like {Note: KeywFunc} because the lexer doesn't support them.
Now the lexer works better: most of the std-funcs and all keywords (from the Tinn-R sample file) are highlighted OK.

tell me what else do you want to improve?
maybe you want to add 2 sets of keywords 'Plotting' + 'Datasets'?
I may change the lexer.

main Alexey · Post by **main Alexey** » 08.05.2024 05:23

Also tell me how this line must be highlighted:

he's isn't we've they'd ('all not string')  # Shorttned forms

not like 3 strings? why not?

jcfaria.uesc · Post by **jcfaria.uesc** » 08.05.2024 13:34

Thank you Alexey,

Yes, I want to add these two elements (Plotting and Datasets):

Plotting: which are almost all graphical parameters in the R package "graphics"

Datasets: are all objects in the R package "datasets"

R uses the following forms as strings:

"this is a string"

'This is also a string'

So, when the R lexer is used as part of a container (2 or more lexers acting together, e.g. LaTeX + R, Markdown + R, HTML + R) to write in the NOWEB paradigm, these contracted forms must receive special treatment. Do you understand?

Otherwise, everything between two single quotes (') will be considered a string, when in fact these are not strings but contracted forms of the English language.

For example:

he's isn't we've they'd ('all not string') # Shorttned forms

would be highlighted as strings (and in fact they are not):

's isn'

've they'

'all not string'

Did you understand?

main Alexey · Post by **main Alexey** » 08.05.2024 15:00

No, I did not understand your text about quotes/strings. I guess that the lexer must not see a 'string' when the quote char follows a word char, e.g.:

they're here's

I corrected regex for strings, you can see the trick which I used.
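
One way to express the rule "a quote that follows a word char does not open a string" is a negative lookbehind. The sketch below is in Python and is only an assumption about how such a rule could look; it is not the regex actually used in the attached lexer, and whether the EControl flavor accepts this exact syntax is not confirmed here. `NAIVE_STRING` and `GUARDED_STRING` are hypothetical names.

```python
import re

# The test line discussed in this thread.
LINE = "he's isn't we've they'd ('all not string')  # Shorttned forms"

# Naive rule: anything between two single quotes on one line is a string.
NAIVE_STRING = re.compile(r"'[^'\n]*'")

# Guarded rule (assumption): the opening quote must not follow a word char,
# so apostrophes inside contractions like "he's" cannot start a string.
GUARDED_STRING = re.compile(r"(?<!\w)'[^'\n]*'")

naive = NAIVE_STRING.findall(LINE)
guarded = GUARDED_STRING.findall(LINE)

print("naive:  ", naive)
print("guarded:", guarded)
```

On this line the naive rule produces the three false strings listed earlier in the thread, while the guarded rule leaves only the parenthesized `'all not string'`, whose opening quote follows `(` rather than a word char.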

I added 2 word lists for 'datasets' and 'graphics'.
pls test.
lexer attached.

jcfaria.uesc · Post by **jcfaria.uesc** » 09.05.2024 03:37

Good night

CudaText Lexers have a very particular way of being created!

I spent today trying to understand what you did, but when I try to change it, it doesn't work well. So, I think I still haven't managed to understand the logic of how lexers are made.

Well, that being the case, I will need to count on your patience and good will, until I can move forward on my own, further.

We're very close to getting to where I want to be with the R Lexer. Right now, I need you to make the following changes to what you've already done:

    item
      DisplayName = 'Id funcs - graphics' ---> 'Id - plotting' 
      StyleName = 'Std func' ---> 'Plotting'
      BlockType = btTagDetect
      ConditionList = <

    item
      DisplayName = 'Id funcs - datasets' ---> 'Id - datasets'
      StyleName = 'Std func' ---> 'Datasets'
      BlockType = btTagDetect
      ConditionList = <

In the attached figures (Plotting.png and Datasets.png), the words marked by the red box must have their own style that can be configured by the user, different from the "Std func" style.

Best,

jcfaria.uesc · Post by **jcfaria.uesc** » 09.05.2024 03:46

Take a look at the Tinn-R syntax...

main Alexey · Post by **main Alexey** » 09.05.2024 05:50

No problem, it is easy. Here are new lexer styles for the 3 word lists. The theme styles are:
Id1
Id2
Id4

attached.
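
The idea of giving each word list its own style can be pictured with a small Python sketch. Everything here is illustrative: the word lists are tiny hypothetical excerpts, the style names are the ones requested earlier in the thread, and the mapping to the theme styles Id1, Id2, and Id4 is deliberately left out because the thread does not state which list uses which.

```python
# Hypothetical excerpts of the three word lists; the real lists live in the lexer file.
WORD_LISTS = {
    "Std func": {"all.equal.character", "anyNA.numeric_version"},
    "Plotting": {"cex", "lty", "pch"},
    "Datasets": {"CO2", "co2", "state.x77"},
}


def style_for(token):
    """Return the style name of the first list containing `token`, else None."""
    for style, words in WORD_LISTS.items():
        if token in words:  # exact, case-sensitive lookup
            return style
    return None


for tok in ["pch", "CO2", "Co2", "all.equal.character"]:
    print(tok, "->", style_for(tok))
```

Because the lookup is case-sensitive, `CO2` and `co2` are both found while a spelling such as `Co2` gets no style.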

Comments in R are shown with 2 styles in Tinn-R. Do you want the same? Which comments are special? I may apply the theme style 'comments doc'.
