Go to char number?
Posted: 29.11.2023 18:57
by hexaae
How can I go to a char number counted from the beginning of the text (not the usual line number)?
Re: Go to char number?
Posted: 29.11.2023 18:59
by main Alexey
Docs:
https://wiki.freepascal.org/CudaText#.2 ... 22_dialogs
d100 (decimal with leading "d"): Jump to absolute decimal offset.
xFF00 (hex number with leading "x"): Jump to absolute hex offset.
Re: Go to char number?
Posted: 29.11.2023 19:24
by hexaae
Not that simple: it is a complex doc with mixed 3-byte (UTF-8), 1-byte (ASCII), etc. chars:
https://1drv.ms/t/s!ApMUGr0cuN39g8hxieZ ... Q?e=jXbozZ (file from my onedrive)
That's why I explicitly need "the char number from the beginning", not the byte position...
Re: Go to char number?
Posted: 30.11.2023 05:04
by main Alexey
It seems the position is calculated in UTF-16 chars, which is almost the same as counting UTF-8 chars. A 3-byte char will usually be counted as 1, but an emoji (a UTF-16 surrogate pair) will be counted as 2.
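A minimal sketch in plain Python (not CudaText API) of the three ways a char can be "sized". The function name `lengths` and the sample chars are only illustrative; U+FF41 is the fullwidth "a" mentioned later in the thread.

```python
def lengths(text: str) -> dict:
    """Return the size of text in code points, UTF-16 code units and UTF-8 bytes."""
    return {
        "code_points": len(text),                           # what a "char counter" would report
        "utf16_units": len(text.encode("utf-16-le")) // 2,  # 2 bytes per UTF-16 code unit
        "utf8_bytes": len(text.encode("utf-8")),            # what a hex editor shows for a UTF-8 file
    }

# ASCII "u", fullwidth "a" (U+FF41), and an emoji outside the BMP (U+1F600)
for sample in ["u", "\uff41", "\U0001F600"]:
    print(repr(sample), lengths(sample))
```

Only the emoji differs between code points (1) and UTF-16 units (2); the fullwidth "a" is 3 bytes in UTF-8 but still a single UTF-16 unit.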
Re: Go to char number?
Posted: 30.11.2023 09:35
by hexaae
Yes, it uses mixed byte lengths for different chars, which is why a position given in hex or dec won't help... An actual char counter from the beginning of the file would be very helpful...
Pic:
https://i.ibb.co/hmwkT2P/image.png
Re: Go to char number?
Posted: 30.11.2023 10:34
by main Alexey
Is the char on the screenshot coded as 2 bytes in UTF-16? If so, the Go To dialog counts it as size=1. Only UTF-16 surrogate pairs (emoji) are counted as size=2.
The editor encoding can be anything, e.g. UTF-8 or even cp1250. It doesn't affect the Go To offset.
So in most cases the Go To "d" prefix works OK.
Re: Go to char number?
Posted: 30.11.2023 14:20
by hexaae
That strange "a" char is the one you can find at the bottom of MS-Gothic 11 with CharMap (U+FF41), and in the hex editor (pic above) it looks like it is composed of 3 bytes...
But in the same doc there are also single-byte chars like the "u" ($75) in the pic (bottom, red square)...
That's why it is hard for me to understand what offset to use in such mixed docs, where not all chars have the same encoding and byte length (but I'm no expert in encodings)... The bin recompiler for that strange doc (a localization for the game La-Mulana) warns me of an error at "char 1458" (which I found was the % after the a, in the pic), so I tried to go to d1458 in CudaText, but it is nowhere near that char... So I wondered if there was a function/plugin to count chars independently of their encoded byte length...
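For illustration, a small standalone Python sketch (not a CudaText plugin) that maps a char index to a line and column, so the position can be reached with the usual line-number Go To. The function name `locate_char` is hypothetical. Whether the recompiler's "char 1458" is 0-based or 1-based, and how it counts line breaks, is not known from this thread, so that part is an assumption to check against the real file.

```python
def locate_char(text: str, index: int) -> tuple[int, int]:
    """Return the 1-based (line, column) of the code point at 0-based `index`.

    Assumption: every code point counts as one char, whatever its byte length,
    and each "\n" counts as one char as well.
    """
    if not 0 <= index < len(text):
        raise IndexError("char index out of range")
    line = text.count("\n", 0, index) + 1   # line breaks before the target char
    last_newline = text.rfind("\n", 0, index)  # -1 when the target is on the first line
    column = index - last_newline
    return line, column

# "u", fullwidth "a" (U+FF41), "%", then a second line
sample = "u\uff41%\nnext"
print(locate_char(sample, 2))  # the "%" char: (1, 3)
```

When reading the real file, `open(path, encoding="utf-8", newline="")` keeps CRLF line endings untranslated, in case the recompiler counts "\r" as a char too.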
Re: Go to char number?
Posted: 30.11.2023 15:03
by main Alexey
>That's why for me is hard to understand what offset to use in such mixed doc
Use offset +1 for each such char, both for an ASCII char and for a Unicode char (3 UTF-8 bytes).
Try writing a small text file with these chars mixed together, then do Go To with offsets 1, 2, 3, 4, 5, 6... and see how the caret moves.
If you get wrong jumps with your big file, it may mean you have many emoji chars and emoji-like chars (which take 4 bytes in UTF-16). I cannot solve that, and I don't plan to rework this.
Maybe plugin writers can do something.
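A rough Python sketch of the conversion described above, assuming the "d" offset counts UTF-16 code units from the start of the text (as stated in this thread; how line endings are counted is not covered here). The names `utf16_offset` and `astral_positions` are hypothetical helpers, not CudaText functions.

```python
def utf16_offset(text: str, char_index: int) -> int:
    """Number of UTF-16 code units that precede the code point at `char_index`."""
    return len(text[:char_index].encode("utf-16-le")) // 2

def astral_positions(text: str) -> list:
    """Char indexes of code points above U+FFFF (stored as surrogate pairs in UTF-16)."""
    return [i for i, ch in enumerate(text) if ord(ch) > 0xFFFF]

sample = "a\U0001F600b\uff41"
print(astral_positions(sample))  # [1]: only the emoji needs a surrogate pair
print(utf16_offset(sample, 2))   # 3: "b" is char 2 but sits after 3 UTF-16 units
```

If `astral_positions` returns an empty list for a file, the char index and the UTF-16 offset are the same number for every position in it.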
Re: Go to char number?
Posted: 01.12.2023 11:20
by hexaae
Do you think it would be technically possible with the current CudaText to code a plugin like this, to just count the selected chars (independently of their byte size in the file) and offer a "go to char num"?
https://www.charactercountonline.com/
I will try to kindly ask some plugin coder, just in case...
Re: Go to char number?
Posted: 01.12.2023 11:24
by main Alexey
I think yes, it's possible, if the text converts to UTF-8 without problems (i.e. it is not broken UTF-8 text from some random binary).
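One way to check that precondition outside the editor is a strict decode in plain Python. This is only a sketch; `try_decode_utf8` is a hypothetical name.

```python
from typing import Optional, Tuple

def try_decode_utf8(data: bytes) -> Tuple[Optional[str], Optional[int]]:
    """Return (text, None) if `data` is valid UTF-8, else (None, offset of the first bad byte)."""
    try:
        return data.decode("utf-8", errors="strict"), None
    except UnicodeDecodeError as err:
        return None, err.start

print(try_decode_utf8("u\uff41%".encode("utf-8")))  # decodes fine
print(try_decode_utf8(b"ab\xff"))                   # (None, 2): byte 0xFF is never valid in UTF-8
```

A char counter is only meaningful when the first case applies; for a file that fails the strict decode, byte offsets are the only reliable positions.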