Go to char number?

hexaae · Post by **hexaae** » 29.11.2023 18:57

How can I go to a char number counted from the beginning of the text (not the usual line number)?

main Alexey · Post by **main Alexey** » 29.11.2023 18:59

Docs:
https://wiki.freepascal.org/CudaText#.2 ... 22_dialogs

d100 (decimal with leading "d"): Jump to absolute decimal offset.
xFF00 (hex number with leading "x"): Jump to absolute hex offset.

hexaae · Post by **hexaae** » 29.11.2023 19:24

It's not that simple: it is a complex doc that mixes 3-byte (UTF-8) chars, 1-byte (ASCII) chars, etc.:
https://1drv.ms/t/s!ApMUGr0cuN39g8hxieZ ... Q?e=jXbozZ (file from my OneDrive)
That's why I explicitly asked for "the char number from the beginning", not the byte position...

main Alexey · Post by **main Alexey** » 30.11.2023 05:04

It seems that the position is calculated in UTF-16 code units, which is almost the same as counting UTF-8 chars. A 3-byte UTF-8 char is usually counted as 1, but an emoji, which is a UTF-16 surrogate pair, is counted as 2.
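To make the difference between the three units concrete, here is a minimal Python sketch. The function name `count_units` and the sample string are made up for illustration; it only shows how the same text measures differently in code points, UTF-16 code units, and UTF-8 bytes, and it says nothing about how CudaText counts internally.

```python
def count_units(text: str) -> dict:
    """Return the length of `text` measured in three different units."""
    return {
        # Unicode code points: what a plain "char counter" usually reports.
        "code_points": len(text),
        # UTF-16 code units: chars above U+FFFF (e.g. emoji) take two units.
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        # UTF-8 bytes: what a hex editor shows for a UTF-8 encoded file.
        "utf8_bytes": len(text.encode("utf-8")),
    }


# ASCII 'u', fullwidth 'a' (U+FF41), '%', and one emoji (U+1F600).
sample = "u\uff41%\U0001F600"
print(count_units(sample))
```

For this sample the three counts differ: the fullwidth "a" is 1 code point, 1 UTF-16 unit and 3 UTF-8 bytes, while the emoji is 1 code point, 2 UTF-16 units and 4 UTF-8 bytes.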

hexaae · Post by **hexaae** » 30.11.2023 09:35

Yes, the file uses different byte lengths for different chars, which is why a position given in hex or decimal won't help... An actual char counter from the beginning of the file would be very helpful...

Pic: https://i.ibb.co/hmwkT2P/image.png

main Alexey · Post by **main Alexey** » 30.11.2023 10:34

Is the char in the screenshot encoded as 2 bytes in UTF-16? If so, the Go To dialog counts it as size=1. Only UTF-16 surrogate pairs (emoji) are counted as size=2.

The editor encoding can be anything, e.g. UTF-8 or even cp1250. It doesn't affect the Go To offset.

So in most cases the Go To "d" prefix works OK.

hexaae · Post by **hexaae** » 30.11.2023 14:20

That strange "a" char is the one you can find at the bottom of MS Gothic 11 in CharMap (U+FF41), and in the hex editor (pic above) it looks like it is composed of 3 bytes...
But in the same doc there are also single-byte chars, like the "u" ($75) in the pic (bottom, red square)...
That's why it is hard for me to understand what offset to use in such mixed docs, where not all chars have the same encoding and byte length (but I'm no expert on encodings)...

The bin recompiler for that strange doc (a localization for the game La-Mulana) warns me of an error at "char 1458" (which I found was the % after the a, in the pic), so I tried to go to d1458 in CudaText, but it lands nowhere near that char... So I wondered if there is a function or plugin that counts chars independently of their encoded byte length...
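If the tool's "char 1458" is a count of Unicode code points, it can be translated into the UTF-16 offset that the "d" prefix appears to expect (as described earlier in the thread). The sketch below is only an illustration: `char_index_to_utf16_offset` is a hypothetical helper, the index is assumed to be 0-based, and how the Go To dialog treats line breaks is not stated in this thread, so the result may still need adjusting.

```python
def char_index_to_utf16_offset(text: str, char_index: int) -> int:
    """Count the UTF-16 code units that precede the char at `char_index`.

    `char_index` is a 0-based index in Unicode code points (an assumption;
    the reporting tool may count from 1 or in another unit).
    """
    if not 0 <= char_index <= len(text):
        raise ValueError("char_index is outside the text")
    # Chars above U+FFFF are stored as a surrogate pair: two UTF-16 units.
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in text[:char_index])


text = "a\U0001F600b\uff41%"
# '%' is the 5th code point (index 4); one emoji precedes it.
print(char_index_to_utf16_offset(text, 4))
```

For text without any chars above U+FFFF the two numbers are identical, so a large mismatch suggests that the reported number is in a different unit (for example bytes) or that line breaks are counted differently.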

main Alexey · Post by **main Alexey** » 30.11.2023 15:03

>That's why for me is hard to understand what offset to use in such mixed doc

Use offset +1 for each such char, both for an ASCII char and for a Unicode char (3 UTF-8 bytes).
Try writing a small text file with these chars, mixed together, then do Go To with offsets 1, 2, 3, 4, 5, 6... and see how the caret moves.

If you get wrong jumps with your big file, it may mean you have many emoji chars and emoji-like chars (which take 4 bytes in UTF-16). I cannot solve that, and I don't plan to rework this.
Maybe plugin writers can do something.
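For the small-file experiment, it can help to print the expected offsets next to each char before trying them in the Go To dialog. This is a rough sketch with a hypothetical helper, `offset_table`; the sample text is invented, and the UTF-8 column assumes the file is saved as UTF-8 without a BOM.

```python
def offset_table(text: str):
    """Yield (char, code_point_index, utf16_offset, utf8_byte_offset)."""
    utf16_offset = 0
    utf8_offset = 0
    for index, ch in enumerate(text):
        yield ch, index, utf16_offset, utf8_offset
        # Advance each counter by the size of this char in that unit.
        utf16_offset += 2 if ord(ch) > 0xFFFF else 1
        utf8_offset += len(ch.encode("utf-8"))


# ASCII, fullwidth 'a' (U+FF41), '%', an emoji, then ASCII again.
for row in offset_table("u\uff41%\U0001F600x"):
    print(row)
```

The three columns stay equal until the first non-ASCII char; after the fullwidth "a" the byte offset runs ahead, and after the emoji the UTF-16 offset runs ahead of the code point index by one.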

hexaae · Post by **hexaae** » 01.12.2023 11:20

Do you think a plugin like this, which just counts the selected chars (independently of their byte size in the file) and offers a "go to char num" command, would be technically possible with the current CudaText?
https://www.charactercountonline.com/
I will try to kindly ask a plugin coder, just in case...

main Alexey · Post by **main Alexey** » 01.12.2023 11:24

I think yes, it's possible, provided the text converts to UTF-8 without problems (i.e. it is not broken UTF-8 text from some random binary).
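The core logic of such a plugin could look roughly like the sketch below. It is plain Python with no CudaText calls: both function names are hypothetical, the char index is assumed to be 0-based and counted in code points, line breaks are assumed to be "\n" and to count as one char, and the column is returned in UTF-16 units on the assumption that the editor positions the caret that way. Wiring it to the editor (reading the text, moving the caret) would go through the CudaText Python API and is not shown here.

```python
def decode_utf8_or_none(data: bytes):
    """Return the decoded text, or None if `data` is not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def char_index_to_line_col(text: str, char_index: int):
    """Map a 0-based code point index to a 0-based (line, column) pair.

    The column is measured in UTF-16 code units, so a char above U+FFFF
    earlier on the same line shifts it by 2 (an assumption about the editor).
    """
    if not 0 <= char_index <= len(text):
        raise ValueError("char_index is outside the text")
    before = text[:char_index]
    line = before.count("\n")
    line_start = before.rfind("\n") + 1  # 0 when there is no earlier "\n"
    column = sum(2 if ord(ch) > 0xFFFF else 1 for ch in before[line_start:])
    return line, column


raw = "ab\n\U0001F600c%".encode("utf-8")
text = decode_utf8_or_none(raw)
if text is not None:
    print(len(text))                        # char count, independent of byte size
    print(char_index_to_line_col(text, 5))  # position of the '%'
```

Counting the selected chars is then just `len()` of the selected text, and the strict decode covers the "converts to UTF-8 without problems" condition: broken input yields `None` instead of a misleading count.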

UVviewsoft forums
