R: a better lexer to CudaText

jcfaria.uesc
Posts: 27
Joined: 07.05.2024 18:58

Post by jcfaria.uesc »

Good morning.

The R lexer for CudaText could be much better than what is currently available. I'm trying to improve it, but I need some initial help from more experienced CudaText developers.

The basic problem is that many functions and other objects in R have symbols ('.' and '_') in the middle of their names. For example:

- all.equal.character
- anyNA.numeric_version
- as.data.frame.numeric_version
- aspell_package_vignettes

The lexer also needs to be case-sensitive, as the two objects below are different things:

- LETTERS
- letters

Object names can contain both letters and numbers:

- state.x77
- CO2
- co2

Anyway, R is a very verbose language!

I know regular expressions, but I don't know which flavor is used to build the current lexers. I thought it was Python, but the lexer's behavior doesn't match the Python tests on regex101 (https://regex101.com/), which I use to test and document the regular expressions I write.

I am attaching a file (Tinn-R_recognized_words.R) from the Tinn-R editor (https://sourceforge.net/projects/tinn-r/) that contains a sample of how I intend to build the new R lexer for CudaText, if possible.

Could someone please help me until I understand the process of creating a lexer in CudaText?

I've read all the documentation, I'm really missing the practice...
Attachments
Tinn-R_recognized_words.txt
Rename it from *.txt to *.R and open it in the Tinn-R editor
(31.38 KiB)
main Alexey
Posts: 2236
Joined: 25.08.2021 18:15

Post by main Alexey »

>The basic problem is that a large number of functions and other objects in R have symbols ('.' and '_') in the middle of the character string.
So the regex for an id can be this:
[a-z_][\w\.]*

If an id must end with a word character, then this:
[a-z_]([\w\.]*[a-z_])?\b
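
To see how the two patterns above treat the names from the first post, here is a minimal sketch in Python. Python's `re` module only stands in for the EControl engine that the lexer really uses, so details such as what `\w` covers are assumptions, and the helper `full_matches` is made up for this illustration.

```python
import re

# Python's re stands in for the EControl engine here (an assumption).
# IGNORECASE imitates a case-insensitive match on the character classes.
ID_ANY = re.compile(r"[a-z_][\w.]*", re.IGNORECASE)
ID_WORD_END = re.compile(r"[a-z_]([\w.]*[a-z_])?\b", re.IGNORECASE)

# Sample names taken from the first post of this thread.
NAMES = [
    "all.equal.character",
    "anyNA.numeric_version",
    "aspell_package_vignettes",
    "state.x77",
    "CO2",
]


def full_matches(pattern, names):
    """Return the names that the pattern matches from start to end."""
    return [name for name in names if pattern.fullmatch(name)]


if __name__ == "__main__":
    print(full_matches(ID_ANY, NAMES))
    print(full_matches(ID_WORD_END, NAMES))
```

In this Python approximation the first pattern accepts all five names, while the second one rejects state.x77 and CO2, because its final character class has no digits. If names ending in a digit should also count, putting `\w` in place of the final `[a-z_]` would be one option to try.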
>also needs to be case dependent
Regexes in the lexer parser are case-insensitive by default, but matching of IDs against a keyword list is case-sensitive. When you fill the keyword lists (in the Lexer Properties dialog of SynWrite), IDs will be case-sensitive unless you toggle the checkbox.
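
The split between a case-insensitive token regex and a case-sensitive keyword lookup can be sketched like this in Python. It is only an illustration of the idea, not how the lexer engine is implemented: the function `classify`, the `ignore_case` flag (standing in for the checkbox) and the one-word keyword list are all hypothetical.

```python
import re

# Token regex, case-insensitive like the lexer parser default (assumption:
# Python's re is used here only as a stand-in for the EControl engine).
ID = re.compile(r"[a-z_][\w.]*", re.IGNORECASE)


def classify(text, keywords, ignore_case=False):
    """Tag each id in text as 'keyword' or 'id'.

    ignore_case is a hypothetical stand-in for the checkbox in the
    Lexer Properties dialog.
    """
    keys = {k.lower() for k in keywords} if ignore_case else set(keywords)
    result = []
    for token in ID.findall(text):
        probe = token.lower() if ignore_case else token
        result.append((token, "keyword" if probe in keys else "id"))
    return result


if __name__ == "__main__":
    # Hypothetical keyword list that contains only the upper-case name.
    print(classify("LETTERS letters", {"LETTERS"}))
    print(classify("LETTERS letters", {"LETTERS"}, ignore_case=True))
```

With the default case-sensitive lookup only LETTERS is tagged as a keyword. With `ignore_case=True` both spellings are tagged, which is the behavior R needs to avoid.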
>I'm not aware of the flavor that is used in the construction of current lexers
EControl flavor!
https://wiki.freepascal.org/CudaText#Re ... xpressions

Post by jcfaria.uesc »

Alexey,

Thank you for your attention, but I still haven't managed to get it to work as expected!

I am attaching my proposed "R.lcf" file.

I believe that without the help of people more experienced in CudaText lexers I will not be able to progress to where I want to go.

If the basics I've done so far work well (for this you need to see the test file "Tinn-R_recognized_words.R" and all the identifiers in CudaText) I'll be able to continue the work.
Attachments
R.lcf.txt
Rename file to R.lcf
(66.15 KiB)

Post by main Alexey »

Got your file.
I removed all trailing comments like {Note: KeywFunc} because the lexer doesn't support them.
Now the lexer works better: most of the std-funcs and all keywords (from the Tinn-R sample file) are highlighted OK.

Tell me what else you want to improve.
Maybe you want to add 2 sets of keywords, 'Plotting' + 'Datasets'?
I can change the lexer.

Post by main Alexey »

Also, tell me how this line should be highlighted:

he's isn't we've they'd ('all not string')  # Shorttned forms

Not as 3 strings? Why not?

Post by jcfaria.uesc »

Thank you Alexey,

Yes, I want to add these two elements (Plotting and Datasets):
  • Plotting: which are almost all graphical parameters in the R package "graphics"
  • Datasets: are all objects in the R package "datasets"

R accepts the following forms as strings:
  • "this is a string"
  • 'This is also a string'
So, when the R lexer is used inside a container (2 or more lexers acting together) with others (LaTeX + R, Markdown + R, HTML + R) to write in the NOWEB paradigm, these contracted forms must receive special treatment.

Otherwise, everything between two ' characters will be considered a string, when in fact these are contracted forms of the English language.

For example, in the line:
he's isn't we've they'd ('all not string') # Shorttned forms
the following would be highlighted as strings (when in fact they are not):
  • 's isn'
  • 've they'
  • 'all not string'
Did you understand?

Post by main Alexey »

No, I did not understand your text about quotes/strings. I guess that the lexer must not see a 'string' when the quote char follows a word char, e.g.:
they're here's
I corrected the regex for strings; you can see the trick I used in the attached lexer.
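
The rule "no string when the quote follows a word char" can be sketched with a negative lookbehind. This is a Python approximation under stated assumptions: the actual regex in the attached lexer is not shown in this thread, and whether the EControl flavor accepts this exact syntax is not confirmed here. The names `NAIVE_STRING` and `GUARDED_STRING` are made up for the example.

```python
import re

# Naive rule: anything between two single quotes on one line is a string.
NAIVE_STRING = re.compile(r"'[^'\n]*'")

# Guarded rule (assumed form of the trick): the opening quote must not
# directly follow a word character, so contractions are skipped.
GUARDED_STRING = re.compile(r"(?<!\w)'[^'\n]*'")

# Sample line quoted earlier in this thread.
LINE = "he's isn't we've they'd ('all not string')  # Shorttned forms"

if __name__ == "__main__":
    print(NAIVE_STRING.findall(LINE))
    print(GUARDED_STRING.findall(LINE))
```

On the sample line the naive pattern finds the three false strings listed earlier in the thread, while the guarded pattern keeps only the quoted text inside the parentheses. Whether that remaining match should be styled as a string depends on the surrounding context (code or NOWEB text), which this sketch does not try to decide.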

I added 2 word lists for 'datasets' and 'graphics'.
Please test.
The lexer is attached.
Attachments
lexer.R.zip
(13.74 KiB)

Post by jcfaria.uesc »

Good evening.

CudaText lexers have a very particular way of being created!

I spent today trying to understand what you did, but when I try to change it, it doesn't work well. So I think I still haven't understood the logic of how lexers are made.

That being the case, I will need to count on your patience and goodwill until I can move forward on my own.

We're very close to getting to where I want to be with the R Lexer. Right now, I need you to make the following changes to what you've already done:

    item
      DisplayName = 'Id funcs - graphics' ---> 'Id - plotting' 
      StyleName = 'Std func' ---> 'Plotting'
      BlockType = btTagDetect
      ConditionList = <

    item
      DisplayName = 'Id funcs - datasets' ---> 'Id - datasets'
      StyleName = 'Std func' ---> 'Datasets'
      BlockType = btTagDetect
      ConditionList = <
In the attached figures (Plotting.png and Datasets.png), the words marked by the red box must have their own style that can be configured by the user, different from the "Std func" style.

Best,
Attachments
Datasets.png
Plotting.png

Post by jcfaria.uesc »

Take a look at the Tinn-R syntax...
Attachments
R highlighter_02.png
R highlighter_01.png
Tinn-R sample.png

Post by main Alexey »

No problem, it is easy. Here are new lexer styles for the 3 word lists. The theme styles are:
Id1
Id2
Id4

attached.

Comments in R are shown with 2 styles in Tinn-R. Do you want the same? Which comments are special? I can apply the theme style 'comments doc'.
Attachments
lexer.R.zip
(13.75 KiB)