opening cp1252 encoded text files occasionally replaces characters

cydonia · Post by **cydonia** » 26.07.2024 03:42

Subject: opening cp1252 encoded text files occasionally replaces characters
In CudaText version 1.214.6 I generally use cp1252 encoding. Occasionally, when I reopen a file previously saved in CudaText as cp1252, certain characters that are valid in cp1252, such as — and ‰ (the latter is replaced by â€°) and a few others, get replaced by text starting with â€. In most cases, though, cp1252 encoded text files containing these characters appear correctly when opened.

main Alexey · Post by **main Alexey** » 26.07.2024 06:20

Can you attach a small example file (the smaller the better)?
I need to reopen it in cp1252 and see the problem.

main Alexey · Post by **main Alexey** » 26.07.2024 06:35

I cannot see the problem with the text "such as — and ‰". It is saved and opened in cp1252 OK. But I am on Linux (the code must work the same on Win/Linux).

cydonia · Post by **cydonia** » 28.07.2024 06:07

I've tried to reproduce the issue by saving and reopening a small sample file with a single change, and also by entering a "∕" Unicode character (which triggers the UTF-8 conversion) and then converting back to cp1252. As this only happens occasionally and I'm unsure of the circumstances that cause it, I may need to suspend this post until I can reproduce it again and record the steps used.

cydonia · Post by **cydonia** » 28.07.2024 13:45

I have been able to recreate the issue:

Because CudaText auto-converts cp1252 to UTF-8 when saving or autosaving a file that contains Unicode characters, I don't always notice that the file is encoded as UTF-8 after I remove the Unicode characters, and I forget to convert it back to cp1252.
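The situation described above depends on whether every character in the text can be represented in cp1252. A minimal Python sketch of such a check follows; the function name `fits_cp1252` is hypothetical, and this only illustrates the idea using Python's own codec, not how CudaText decides when to switch encodings.

```python
def fits_cp1252(text: str) -> bool:
    """Return True if every character in text has a cp1252 byte value."""
    try:
        text.encode("cp1252")
    except UnicodeEncodeError:
        # At least one character has no cp1252 equivalent.
        return False
    return True


# The em dash and per mille sign exist in cp1252.
dash_and_permille = fits_cp1252("\u2014 \u2030")

# U+2215 (division slash) does not exist in cp1252.
division_slash = fits_cp1252("\u2215")

print(dash_and_permille, division_slash)
```

Once the last character outside cp1252 is removed, the check passes again, but a file already saved as UTF-8 stays UTF-8 on disk until it is converted back.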

If I take a text file test1.txt, originally ANSI but saved in CudaText as UTF-8, that contains the following text (the characters are all valid in cp1252 encoding):
— test line 1
— test line 2
— test line 3
‰ test line 4

When I use a batch file containing the following command line to open it with encoding cp1252:
cudatext.exe" -r -e=cp1252 D:\test1.txt@1@1
the file opens with the following text:

â€” test line 1
â€” test line 2
â€” test line 3
â€° test line 4

main Alexey · Post by **main Alexey** » 28.07.2024 14:03

It doesn't look like a bug. If you look at the file with a binary viewer, you will see that the leading Unicode chars are saved as 3 bytes each,
e.g. E2 80 94 for '—'.
https://www.compart.com/en/unicode/U+2014

So loading the file as cp1252 converts these 3 bytes to 3 separate chars.
To avoid it, load the file as UTF-8, convert the encoding to cp1252 (click the encoding in the status bar), and save the file.
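The byte-level explanation above can be checked outside the editor. The following is a small Python sketch, using Python's built-in codecs as a stand-in for the editor's decoder (an assumption; CudaText's own code is not shown in this thread). It shows the three UTF-8 bytes of the em dash, what they turn into when read as cp1252, and the load-as-UTF-8-then-save-as-cp1252 fix.

```python
# Two of the sample lines from test1.txt.
text = "— test line 1\n‰ test line 4\n"

# What is on disk when the file was saved as UTF-8.
utf8_bytes = text.encode("utf-8")
first_char_hex = utf8_bytes[:3].hex(" ")  # the bytes of the em dash
print(first_char_hex)

# Opening those bytes as cp1252: every byte becomes its own character,
# so each 3-byte sequence shows up as 3 characters starting with "â€".
misread = utf8_bytes.decode("cp1252")
print(misread)

# The fix: decode as UTF-8 first, then re-encode as cp1252 and save.
cp1252_bytes = utf8_bytes.decode("utf-8").encode("cp1252")

# Now reading as cp1252 gives back the original text.
reopened = cp1252_bytes.decode("cp1252")
print(reopened == text)
```

After the conversion each of these characters occupies a single byte in the file, so opening it with `-e=cp1252` no longer produces the extra characters.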

UVviewsoft forums
