Encoding detection: cp1254 is forced, not like in Notepad++

ertank · Post by **ertank** » 19.06.2021 19:53

Hello,

I am using CudaText 1.134.0.2 (win64).

When I open a UTF-8 file, its encoding is wrongly detected as cp1254. If I reload the file as UTF-8, the encoding is still shown as cp1254.

I do not know whether this is expected behavior in CudaText, so I am just reporting it here.

I expected the encoding to at least read UTF-8 after I click reload as UTF-8.

BTW, Notepad++ identifies the file encoding as UTF-8 as soon as the file is first opened.

Attached is a short video that shows what I described above.

Thanks & Regards,
Ertan

uvviewsoft · Post by **uvviewsoft** » 19.06.2021 20:20

Windows-1254 is a code page used under Microsoft Windows to write Turkish.
So it seems you have the option "def_encoding_utf8": false.
1) Please change the option to true. Is it better then?
2) Please send a simple example file to me at support(at)uvviewsoft.com, so I can see why the UTF-8 menu item doesn't work.

ertank · Post by **ertank** » 19.06.2021 20:36

1) Changing the mentioned option to true does not make the file load as UTF-8. It is still loaded as cp1254.
2) I sent the file to the mentioned e-mail address.

Thank you.

uvviewsoft · Post by **uvviewsoft** » 19.06.2021 21:28

Confirmed. I see that CudaText detects invalid UTF-8 byte sequences, so it blocks UTF-8 for that file.
The Kate editor (on Linux) loads the file as cp1252, and on trying to reload it as UTF-8 it shows an error.
Picture added.
So is the file broken?
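
For readers following along, the kind of check described above can be sketched in Python. This is only an illustration under assumptions, not CudaText's actual code: the function name `detect_encoding` and the choice of cp1254 as the fallback are hypothetical.

```python
def detect_encoding(data: bytes, fallback: str = "cp1254") -> str:
    """Return "utf-8" only if the whole buffer is valid UTF-8.

    Hypothetical helper: a single invalid sequence rejects UTF-8
    and the configured legacy code page is used instead.
    """
    try:
        data.decode("utf-8", errors="strict")
        return "utf-8"
    except UnicodeDecodeError:
        return fallback


# The same Turkish word stored in two different encodings.
valid_utf8 = "ÇİFT".encode("utf-8")
legacy = "ÇİFT".encode("cp1254")

print(detect_encoding(valid_utf8))
print(detect_encoding(legacy))
```

With a strict check like this, one stray legacy byte pair anywhere in the file is enough to rule out UTF-8 for the whole file.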

uvviewsoft · Post by **uvviewsoft** » 19.06.2021 21:35

And Pluma (on Linux) shows an error that it cannot load the file as UTF-8.

ertank · Post by **ertank** » 21.06.2021 16:02

I just saw the replies in the thread. Sorry for being late.

I have no problem opening the firebird.trc file in Notepad++ (see attached picture). Indeed, there are some non-printable characters in that file around lines 6365 and 6378. Probably badly saved data in the database, or something like that, ended up in the trace file.

The thing is, I can still work with it in Notepad++ as UTF-8.

Moreover, I have another, bigger SQL file (about 284 MB, received today) which Notepad++ identifies as UCS-2 LE BOM. CudaText still forces cp1254, displays the file like a hex dump, and I cannot switch to UTF-8. I can send that file compressed to the e-mail address mentioned earlier if needed.
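
As a side note on the "UCS-2 LE BOM" label: a file like that starts with a byte order mark, which can be checked before any code page guessing. The sketch below is a minimal Python illustration; `sniff_bom` is a hypothetical helper and says nothing about how Notepad++ or CudaText actually implement detection.

```python
import codecs

# Longer BOMs first: the UTF-32 LE mark begins with the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(head: bytes):
    """Return an encoding name if `head` starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None


# Only the first few bytes are needed, so this is cheap even for a large file.
sample = codecs.BOM_UTF16_LE + "SELECT".encode("utf-16-le")
print(sniff_bom(sample[:4]))
```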

Thanks & Regards,
Ertan

ertank · Post by **ertank** » 21.06.2021 16:17

In the attached picture you can see the non-printable characters in that firebird.trc file.

I also attached a screenshot of how SublimeText (UNREGISTERED) displays the file. SublimeText opens the file as Western (Windows 1252) by default. If I switch to Turkish (Windows 1254), there is no broken data anymore and the text is displayed correctly.

BTW, I really like how Notepad++ displays these non-printable characters (reversed and hex coded), and I have no problem using such files without breaking them. It would be great if such support were included in CudaText. I admit that I have no idea how difficult that would be to achieve.

ertank · Post by **ertank** » 21.06.2021 16:28

Here is the correctly displayed data after switching the encoding to Turkish (Windows 1254) in SublimeText.

It turns out that CudaText identified the encoding correctly after all.

uvviewsoft · Post by **uvviewsoft** » 21.06.2021 19:01

I think that Notepad++ does it wrong: it shows 2 Turkish chars in 'ÇÝFT' as hex codes, but it still treats the file as UTF-8 even though those bytes are not valid UTF-8. CudaText detects that fact, Notepad++ does not. (The Linux editors Pluma and Kate also detect it.)
Notepad++ shows xC7xDD (I think they are not valid UTF-8 chars).
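
To make the xC7xDD point concrete, here is a small Python check of the bit patterns. It is only an illustration: the helper names are made up for this post, and the trailing "FT" bytes are assumed from the word quoted above.

```python
def is_two_byte_lead(b: int) -> bool:
    # UTF-8 lead byte of a 2-byte sequence has the form 110xxxxx.
    return (b & 0b1110_0000) == 0b1100_0000


def is_continuation(b: int) -> bool:
    # UTF-8 continuation bytes have the form 10xxxxxx.
    return (b & 0b1100_0000) == 0b1000_0000


raw = bytes([0xC7, 0xDD, 0x46, 0x54])  # xC7 xDD followed by ASCII "FT"

# 0xC7 announces a 2-byte sequence, but 0xDD is not a continuation byte,
# so the pair cannot be valid UTF-8.
print(is_two_byte_lead(raw[0]), is_continuation(raw[1]))

# The same bytes are perfectly ordinary text in the single-byte code pages.
print(raw.decode("cp1254"))  # Turkish code page
print(raw.decode("cp1252"))  # Western code page
```

The last two lines also show why editors disagree on display: byte 0xDD is 'İ' in cp1254 but 'Ý' in cp1252.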

You can 'heal' the file: run the command 'change encoding, no reload: utf8 bom', and then save the file.
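
The same conversion can also be done outside the editor. Below is a rough Python sketch, assuming the file really is cp1254; the function name and file paths are hypothetical, and this is not a description of what the CudaText command does internally.

```python
from pathlib import Path


def reencode_to_utf8_bom(src: str, dst: str, source_encoding: str = "cp1254") -> None:
    """Read `src` in a legacy code page and write `dst` as UTF-8 with BOM.

    Bytes are read and written raw so line endings stay untouched.
    Decoding is strict: bytes that are undefined in the source code page
    raise UnicodeDecodeError instead of being silently replaced.
    """
    text = Path(src).read_bytes().decode(source_encoding)
    # "utf-8-sig" prepends the UTF-8 byte order mark on encode.
    Path(dst).write_bytes(text.encode("utf-8-sig"))
```

Writing to a separate output file keeps the original intact in case the assumed source encoding turns out to be wrong.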

ertank · Post by **ertank** » 21.06.2021 19:12

CudaText uses cp1254, which is actually correct.

I was the one who was wrong in thinking that the file was UTF-8 encoded.