Go to char number?
How can I go to a char number counted from the beginning of the text (not the usual line number)?
-
- Posts: 2245
- Joined: 25.08.2021 18:15
Re: Go to char number?
Docs:
https://wiki.freepascal.org/CudaText#.2 ... 22_dialogs
d100 (decimal with leading "d"): Jump to absolute decimal offset.
xFF00 (hex number with leading "x"): Jump to absolute hex offset.
Re: Go to char number?
It's not that simple: this is a complex doc that mixes 3-byte (UTF-8) chars, 1-byte (ASCII) chars, etc.:
https://1drv.ms/t/s!ApMUGr0cuN39g8hxieZ ... Q?e=jXbozZ (file from my OneDrive)
That's why I explicitly need "the char number from the beginning", not the byte position...
Re: Go to char number?
It seems the position is calculated in UTF-16 code units, which is almost the same as counting chars. A char that takes 3 bytes in UTF-8 is usually counted as 1, but an emoji (a UTF-16 surrogate pair) is counted as 2.
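To make the difference concrete, here is a minimal Python sketch that compares the three ways of measuring one char: Unicode code points, UTF-16 code units, and UTF-8 bytes. The helper name `char_sizes` is made up for this illustration, and the sketch only shows the encodings themselves, not how CudaText computes offsets internally.

```python
def char_sizes(s: str) -> tuple:
    """Return (code points, UTF-16 code units, UTF-8 bytes) for a string."""
    code_points = len(s)                           # Python str is indexed by code point
    utf16_units = len(s.encode("utf-16-le")) // 2  # every UTF-16 code unit is 2 bytes
    utf8_bytes = len(s.encode("utf-8"))
    return code_points, utf16_units, utf8_bytes


# ASCII "u", fullwidth "a" (U+FF41), and an emoji outside the BMP (U+1F600)
for sample in ("u", "\uff41", "\U0001F600"):
    print(f"U+{ord(sample):04X}", char_sizes(sample))
```

Only the emoji needs two UTF-16 code units, while the fullwidth "a" needs three UTF-8 bytes but still a single UTF-16 unit.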
Re: Go to char number?
Yes, the doc uses different byte lengths for different chars, which is why a hex or decimal position won't help... An actual char counter from the beginning of the file would be very helpful...
Pic: https://i.ibb.co/hmwkT2P/image.png
Re: Go to char number?
Is the char on the screenshot coded as 2 bytes in UTF-16? If so, the GoTo dialog counts it as size=1. Only UTF-16 surrogate pairs (emoji) are counted as size=2.
The editor encoding can be anything, e.g. UTF-8 or even cp1250. It doesn't affect the GoTo offset.
So in most cases the GoTo "d" prefix works OK.
Re: Go to char number?
That strange "a" char is the one you can find at the bottom of MS-Gothic 11 with CharMap (U+FF41), and in the hex editor (pic above) it looks like it is composed of 3 bytes...
But the same doc also has single-byte chars, like the "u" ($75) in the pic (bottom, red square)...
That's why it is hard for me to understand what offset to use in such mixed docs, where not all chars have the same encoding and byte length (but I'm no expert with encodings)... The bin recompiler for that strange doc (a localization for the game La-Mulana) warns me of an error at "char 1458" (which I found was the % after the a in the pic), so I tried to go to d1458 in CudaText, but it lands nowhere near that char... So I wondered if there was a function/plugin to count chars independently of their encoded byte length...
Re: Go to char number?
>That's why for me is hard to understand what offset to use in such mixed doc
Use offset +1 for each such char: both for an ASCII char and for a Unicode char that takes 3 UTF-8 bytes.
Try writing a small text file with these chars mixed together, then do Go To with offsets 1, 2, 3, 4, 5, 6... and see how the caret moves.
If you get wrong jumps in your big file, it may mean it has many emoji chars and emoji-like chars (which take 4 bytes in UTF-16). I cannot solve that, and I don't plan to rework this.
Maybe plugin writers can do something.
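A rough Python sketch of the "+1 per char, +2 per surrogate pair" rule follows. It converts a char number (counted in code points) into the number of UTF-16 code units before it. The function name `codepoint_to_utf16_offset` is hypothetical, char numbers are assumed to be 0-based, and whether the GoTo "d" offset is 0-based or how it treats line endings is not confirmed in this thread.

```python
def codepoint_to_utf16_offset(text: str, char_index: int) -> int:
    """Count the UTF-16 code units that precede the char_index-th code point.

    char_index is 0-based (an assumption of this sketch).
    """
    if not 0 <= char_index <= len(text):
        raise ValueError("char_index is outside the text")
    # Each BMP char encodes to 2 bytes (1 unit), each surrogate pair to 4 bytes (2 units).
    return len(text[:char_index].encode("utf-16-le")) // 2


sample = "a\U0001F600\uff41%"  # ASCII, emoji, fullwidth "a", percent sign
print(codepoint_to_utf16_offset(sample, 3))  # units before the "%" sign
```

For text without emoji the two numbers are equal, which matches the advice above that the "d" prefix works in most cases.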
Re: Go to char number?
Do you think it would be technically possible with the current CudaText to code a plugin that just counts the selected chars (independently of their byte size in the file), like the site below, and offers a "go to char num" command?
https://www.charactercountonline.com/
I will try to kindly ask a plugin coder, just in case...
Re: Go to char number?
I think yes, it's possible, provided the text converts to UTF-8 without problems (i.e. it is not broken UTF-8 text from some random binary).
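As a starting point for such a plugin, here is a hedged Python sketch of the core logic only: decode the file strictly as UTF-8, then turn a char number into a line and column. The name `locate_char` is hypothetical, char numbers and results are assumed to be 0-based, line breaks are counted as chars, and the step that actually moves the caret through the editor's API is left out.

```python
def locate_char(data: bytes, char_number: int) -> tuple:
    """Return (line, column) of the char_number-th code point, all 0-based.

    Raises UnicodeDecodeError when data is not valid UTF-8.
    """
    text = data.decode("utf-8")  # strict decoding: broken UTF-8 is rejected
    if not 0 <= char_number < len(text):
        raise ValueError("char_number is outside the text")
    before = text[:char_number]
    line = before.count("\n")
    # rfind returns -1 when there is no earlier line break, so the line starts at 0
    column = char_number - (before.rfind("\n") + 1)
    return line, column


data = "ab\n\uff41\U0001F600%".encode("utf-8")
print(locate_char(data, 5))  # position of the "%" sign
```

The column here is counted in code points. An editor that counts columns in UTF-16 code units would need the column adjusted by one extra unit for each emoji earlier on the same line.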