Go to char number?

hexaae
Posts: 69
Joined: 25.02.2019 20:47

Go to char number?

Post by hexaae »

How can I go to a given char number, counted from the beginning of the text (not the usual line number)?
main Alexey
Posts: 2245
Joined: 25.08.2021 18:15

Re: Go to char number?

Post by main Alexey »

Docs:
https://wiki.freepascal.org/CudaText#.2 ... 22_dialogs
d100 (decimal with leading "d"): Jump to absolute decimal offset.
xFF00 (hex number with leading "x"): Jump to absolute hex offset.
hexaae
Posts: 69
Joined: 25.02.2019 20:47

Re: Go to char number?

Post by hexaae »

It's not that simple: this is a complex doc that mixes 3-byte chars (UTF-8), 1-byte chars (ASCII), etc.:
https://1drv.ms/t/s!ApMUGr0cuN39g8hxieZ ... Q?e=jXbozZ (file from my onedrive)
That's why I explicitly need "the char number from the beginning", not the byte position... 🤔
main Alexey
Posts: 2245
Joined: 25.08.2021 18:15

Re: Go to char number?

Post by main Alexey »

It seems the position is calculated in UTF-16 chars (code units), which is almost the same as counting UTF-8 chars. A 3-byte char will usually be counted as 1, but an emoji, which is a UTF-16 surrogate pair, will be counted as 2.
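To make the counting rule above concrete, here is a minimal Python sketch. It is not CudaText code; the helper name `sizes` is made up for illustration. It only shows how one character can have different sizes depending on whether you count characters, UTF-16 code units, or UTF-8 bytes.

```python
def sizes(s: str) -> tuple:
    """Return (characters, UTF-16 code units, UTF-8 bytes) for a string."""
    chars = len(s)                              # Python counts code points
    utf16_units = len(s.encode("utf-16-le")) // 2  # 2 bytes per code unit
    utf8_bytes = len(s.encode("utf-8"))
    return (chars, utf16_units, utf8_bytes)


# ASCII "u", fullwidth "a" (U+FF41), and the thinking-face emoji (U+1F914)
for ch in ("u", "\uff41", "\U0001F914"):
    print(f"U+{ord(ch):04X}", sizes(ch))
# U+0075 (1, 1, 1)
# U+FF41 (1, 1, 3)
# U+1F914 (1, 2, 4)
```

So a 3-byte UTF-8 char such as U+FF41 still counts as 1 in UTF-16 terms, and only chars above U+FFFF count as 2.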
hexaae
Posts: 69
Joined: 25.02.2019 20:47

Re: Go to char number?

Post by hexaae »

Yes, the file uses different byte lengths for different chars, which is why a position given in hex or decimal won't help... An actual char counter from the beginning of the file would be very helpful...

Pic: https://i.ibb.co/hmwkT2P/image.png
main Alexey
Posts: 2245
Joined: 25.08.2021 18:15

Re: Go to char number?

Post by main Alexey »

Is the char on the screenshot encoded as 2 bytes in UTF-16? If so, the Go To dialog counts it as size=1. Only UTF-16 surrogate pairs (emoji) are counted as size=2.

The editor encoding can be anything, e.g. UTF-8 or even cp1250. It doesn't affect the Go To offset.

So in most cases the Go To "d" prefix works OK.
hexaae
Posts: 69
Joined: 25.02.2019 20:47

Re: Go to char number?

Post by hexaae »

That strange "a" char is the one you can find at the bottom of MS-Gothic 11 with CharMap (U+FF41), and in the hex editor (pic above) it looks like it is composed of 3 bytes...
But in the same doc there are also single-byte chars, like the "u" ($75) in the pic (bottom, red square)...
That's why it is hard for me to understand which offset to use in such mixed docs 🤔, where not all chars have the same encoding and byte length (but I'm no expert on encodings)... The bin recompiler for that strange doc (a localization for the game La-Mulana) warns me of an error at "char 1458" (which I found was the % after the a, in the pic), so I tried to go to d1458 in CudaText, but it lands nowhere near that char... So I wondered whether there is a function or plugin that counts chars independently of their encoded byte length...
main Alexey
Posts: 2245
Joined: 25.08.2021 18:15

Re: Go to char number?

Post by main Alexey »

>That's why for me is hard to understand what offset to use in such mixed doc

Use offset +1 for each such char, both for an ASCII char and for a Unicode char (3 UTF-8 bytes).
Try writing a small text file with these chars, mixed together, then do Go To with offsets 1, 2, 3, 4, 5, 6... and see how the caret moves.

If you get wrong jumps with your big file, it may mean you have many emoji chars and emoji-like chars (which take 4 bytes in UTF-16). I cannot solve that, and I don't plan to rework this.
Maybe plugin writers can do something.
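As a rough illustration of the "+1 per char, +2 per surrogate pair" rule described above, the following Python sketch converts a character index into an offset counted in UTF-16 code units. The function names are hypothetical and this is not CudaText's own code. Whether the Go To dialog counts from 0 or 1, and how it counts line breaks, is not stated in this thread, so treat those details as assumptions to check with a small test file.

```python
def utf16_units(ch: str) -> int:
    """A char above U+FFFF needs a surrogate pair (2 units), others need 1."""
    return 2 if ord(ch) > 0xFFFF else 1


def char_index_to_utf16_offset(text: str, char_index: int) -> int:
    """Count the UTF-16 code units that precede the char at char_index.

    Both the index and the result are zero-based (an assumption of this sketch).
    """
    if not 0 <= char_index <= len(text):
        raise ValueError("char_index is outside the text")
    return sum(utf16_units(ch) for ch in text[:char_index])


sample = "u\U0001F914\uff41%"      # ASCII, emoji, fullwidth "a", percent sign
idx = sample.index("%")            # character index of "%"
print(idx, char_index_to_utf16_offset(sample, idx))   # 3 4
```

The two numbers only differ when chars above U+FFFF come before the target, which matches the remark that wrong jumps point to emoji-like chars in the file.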
hexaae
Posts: 69
Joined: 25.02.2019 20:47

Re: Go to char number?

Post by hexaae »

Do you think it would be technically possible, with the current CudaText, to code a plugin like this one that just counts the selected chars (independently of their byte size in the file) and adds a "go to char num" command?
https://www.charactercountonline.com/
I will try kindly asking some plugin coder, just in case...
main Alexey
Posts: 2245
Joined: 25.08.2021 18:15

Re: Go to char number?

Post by main Alexey »

I think yes, it is possible, provided the text converts to UTF-8 without problems (i.e. it is not broken UTF-8 text from some random binary).
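The core logic such a plugin would need can be sketched in plain Python. This is only a sketch under assumptions: all function names are hypothetical, the CudaText plugin wiring (reading the editor text and moving the caret) is deliberately left out, and indexes are zero-based with "\n" as the line separator. The strict decode reflects the condition above that the text must be valid UTF-8.

```python
def decode_strict(data: bytes) -> str:
    """Decode file bytes as UTF-8; raises UnicodeDecodeError on broken input."""
    return data.decode("utf-8")


def count_chars(selection: str) -> int:
    """Number of characters, independent of their encoded byte length."""
    return len(selection)


def char_index_to_line_col(text: str, char_index: int) -> tuple:
    """Map a zero-based character index to a zero-based (line, column)."""
    if not 0 <= char_index <= len(text):
        raise ValueError("char_index is outside the text")
    prefix = text[:char_index]
    line = prefix.count("\n")
    column = char_index - (prefix.rfind("\n") + 1)
    return (line, column)


raw = "u\uff41%\nsecond line".encode("utf-8")
text = decode_strict(raw)
print(count_chars(text[:3]))             # 3
print(char_index_to_line_col(text, 2))   # (0, 2)
```

Note that the column here is counted in characters. If the editor counts columns in UTF-16 units, lines containing emoji would need the same surrogate-pair adjustment, and a tool reporting "char 1458" may count from 1 or treat line breaks differently, so both need checking against the real file.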