Encoding detection: cp1254 is forced, not like in Notepad++

ertank
Posts: 61
Joined: 06.06.2021 21:56

Post by ertank »

Hello,

I am using CudaText 1.134.0.2 (win64).

When I open a UTF-8 file, its encoding is wrongly detected as cp1254. If I reload the file as UTF-8, the encoding still reads cp1254.

I do not know whether this is intended CudaText behavior, so I am just reporting it here.

I expected the encoding to at least read UTF-8 after I click reload as UTF-8.

BTW, Notepad++ identifies the file's encoding as UTF-8 as soon as it is first opened.

Attached is a short video that shows what I described above.

Thanks & Regards,
Ertan
Attachments
Media1.7z
(990.99 KiB) Downloaded 82 times
uvviewsoft
Posts: 392
Joined: 01.12.2020 13:46

Post by uvviewsoft »

Windows-1254 is the code page that Microsoft Windows uses for Turkish, so it seems you have the option "def_encoding_utf8": false.
1) Please change the option to true. Does that help?
2) Please send a simple example file to me at support(at)uvviewsoft.com, so I can see why the UTF-8 menu item doesn't work.

Post by ertank »

1) Changing the mentioned option to true does not make the file load as UTF-8. It is still loaded as cp1254.
2) I sent the file to the mentioned e-mail address.

Thank you.

Post by uvviewsoft »

Confirmed. CudaText detects invalid UTF-8 byte sequences in that file, so it refuses UTF-8 for it.
The Kate editor (on Linux) loads the file as cp1252, and shows an error when I try to reload it as UTF-8 (picture added).
So is the file broken?
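
The check described above can be sketched in a few lines of Python. This is only an illustration of strict UTF-8 validation with a legacy fallback, not CudaText's actual detection code; the function name `guess_encoding` and its `fallback` parameter are made up for this example.

```python
def guess_encoding(data: bytes, fallback: str = "cp1254") -> str:
    """Return "utf-8" only if every byte sequence in data is valid UTF-8.

    Hypothetical helper: the fallback code page is an assumption that
    stands in for the editor's configured default encoding.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # At least one invalid sequence: refuse UTF-8 for this data.
        return fallback
    return "utf-8"


# Text that really is UTF-8 is accepted.
print(guess_encoding("Türkçe".encode("utf-8")))  # utf-8
# A single stray legacy byte (0xDD) is enough to reject UTF-8.
print(guess_encoding(b"trace \xdd data"))        # cp1254
```

With a check like this, one bad byte anywhere in a large file is enough to switch the whole file to the fallback code page.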
Attachments
err.png

Post by uvviewsoft »

And Pluma (on Linux) shows an error that it cannot load the file as UTF-8.
Attachments
Screenshot from 2021-06-20 00-34-03.png

Post by ertank »

I just saw the replies in the thread. Sorry for the late response.

I have no problem opening the firebird.trc file in Notepad++ (see attached picture). Indeed, there are some non-printable characters in that file around lines 6365 and 6378. They are probably badly saved data in the database, or something similar, that ended up in the trace file.

The thing is, I can still work with it in Notepad++ as UTF-8.

Moreover, I have another, bigger SQL file (about 284 MB, received today) that Notepad++ identifies as UCS-2 LE BOM. CudaText still forces cp1254, displays the file like a hex dump, and does not let me switch to UTF-8. I can send that file compressed to the e-mail address mentioned earlier if needed.
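
For reference, a file that starts with a byte order mark can be recognized from its first bytes alone, before any content-based guessing. Below is a minimal Python sketch of such a check. The helper name `sniff_bom` is hypothetical, the sample data is invented, and nothing here describes what CudaText or Notepad++ does internally. The "UCS-2 LE BOM" label corresponds to the FF FE mark, which Python's `utf-16` codec understands.

```python
import codecs

# Longer marks first: the UTF-32 LE BOM (FF FE 00 00) begins with the
# UTF-16 LE BOM (FF FE), so the order of the checks matters.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(head: bytes):
    """Return a Python codec name if head starts with a known BOM, else None."""
    for bom, codec in _BOMS:
        if head.startswith(bom):
            return codec
    return None


# A tiny stand-in for the start of a "UCS-2 LE BOM" file.
sample = codecs.BOM_UTF16_LE + "SELECT 1;".encode("utf-16-le")
codec = sniff_bom(sample[:4])
print(codec)                 # utf-16
print(sample.decode(codec))  # SELECT 1;
```

Only the first four bytes are needed, so the check costs the same for a 284 MB file as for a tiny one.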

Thanks & Regards,
Ertan
Attachments
CudaText opened firebird.trc file identified it as cp1254
Notepad++ opened big SQL file as UCS-2 LE BOM
Notepad++ opened firebird.trc file (with non printable characters) identified it as UTF-8

Post by ertank »

In the attached pictures you can see the non-printable characters in that firebird.trc file.

I also attached how SublimeText (UNREGISTERED) displays the file. SublimeText opens the file as Western (Windows 1252) by default. If I switch to Turkish (Windows 1254), there is no broken data anymore and the text is displayed correctly.

BTW, I really like how Notepad++ displays these non-printable characters (in reverse video, as hex codes), and I have no problem using such files without breaking them. It would be great if CudaText included such support. I admit that I have no idea how difficult that is to achieve.
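
One way a tool could show and keep such bytes without damaging the file is to carry them through decoding unchanged. The following is a minimal Python sketch that uses Python's standard `backslashreplace` and `surrogateescape` error handlers as a stand-in. It is not how Notepad++ or CudaText is implemented, and the sample bytes are invented.

```python
# Invented sample: valid UTF-8 text with two stray legacy bytes in it.
raw = "id=1 ".encode("utf-8") + b"\xc7\xdd" + " ok".encode("utf-8")

# For display: render each undecodable byte as a visible \xNN escape.
shown = raw.decode("utf-8", errors="backslashreplace")
print(shown)  # id=1 \xc7\xdd ok

# For editing: carry undecodable bytes through as lone surrogates and
# restore them on save, so the data round-trips byte for byte.
text = raw.decode("utf-8", errors="surrogateescape")
saved = text.encode("utf-8", errors="surrogateescape")
print(saved == raw)  # True
```

The round trip only holds if both the decode and the encode use `surrogateescape`; a plain `encode("utf-8")` would raise on the escaped bytes.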
Attachments
Unregistered version of SublimeText displaying same file
image_2021-06-21_191810.png (7.08 KiB) Viewed 2016 times
Line 6337
image_2021-06-21_191327.png (6.45 KiB) Viewed 2016 times
Line 6350
image_2021-06-21_191256.png (5.26 KiB) Viewed 2016 times

Post by ertank »

Here is the correctly displayed data after switching the encoding to Turkish (Windows 1254) in SublimeText.

It turns out that CudaText identified encoding correctly after all.
Attachments
Correctly displayed Turkish characters
image_2021-06-21_192755.png (11.5 KiB) Viewed 2015 times

Post by uvviewsoft »

I think Notepad++ gets it wrong: it shows the two Turkish characters in 'ÇÝFT' as hex codes, but they are not valid UTF-8. CudaText detects that fact; Notepad++ does not. (The Linux editors Pluma and Kate also detect it.)
Notepad++ shows xC7xDD (I think these bytes are not valid UTF-8).

You can 'heal' the file: run the command 'change encoding, no reload: utf8 bom', and then save the file.
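
What this 'heal' step amounts to can be sketched in Python. The bytes C7 DD cannot be UTF-8, because C7 opens a two-byte sequence and DD is not a continuation byte, whereas in cp1254 they are simply the letters Ç and İ. Re-encoding the text that cp1254 decodes to gives a valid UTF-8 file with a BOM. The function name and file names below are hypothetical, and this is a sketch of the idea rather than CudaText's implementation.

```python
import os
import tempfile


def heal_to_utf8_bom(src: str, dst: str, legacy: str = "cp1254") -> None:
    """Decode src with the legacy code page and write it as UTF-8 with BOM.

    Hypothetical helper; it assumes the whole file really is in `legacy`.
    """
    with open(src, "rb") as f:
        text = f.read().decode(legacy)
    with open(dst, "wb") as f:
        f.write(text.encode("utf-8-sig"))  # "utf-8-sig" prepends the BOM


word = b"\xc7\xddFT"
try:
    word.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not UTF-8:", exc.reason)
print(word.decode("cp1252"))  # ÇÝFT (Western)
print(word.decode("cp1254"))  # ÇİFT (Turkish)

# Demo on a throwaway file (the file names are made up for this example).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "sample.trc")
    dst = os.path.join(tmp, "sample_utf8.trc")
    with open(src, "wb") as f:
        f.write(word)
    heal_to_utf8_bom(src, dst)
    with open(dst, "rb") as f:
        healed = f.read()
print(healed.decode("utf-8-sig"))  # ÇİFT
```

The same four bytes read as 'ÇÝFT' under cp1252 and as 'ÇİFT' under cp1254, which is why the choice of legacy code page matters before converting.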

Post by ertank »

CudaText uses cp1254, which is actually correct.

I was the one who was wrong in thinking that the file was UTF-8 encoded.