opening cp1252 encoded text files occasionally replaces characters

cydonia
Posts: 22
Joined: 26.07.2024 01:09

Post by cydonia »

Subject: opening cp1252 encoded text files occasionally replaces characters
In CudaText version 1.214.6 I generally use cp1252 encoding. However, occasionally when I reopen a file that I previously saved in CudaText as cp1252, certain characters that are valid in cp1252, such as — and ‰ (the latter is replaced by â€°) and a few others, get replaced by text starting with â€. In most cases, though, cp1252-encoded files containing these characters open correctly.
main Alexey
Posts: 2245
Joined: 25.08.2021 18:15

Post by main Alexey »

Can you attach a small example file (the smaller the better)?
I need to reopen it in cp1252 to see the problem.
main Alexey
Posts: 2245
Joined: 25.08.2021 18:15

Post by main Alexey »

I cannot reproduce the problem with the text "such as — and ‰": it is saved and reopened in cp1252 correctly. But I am on Linux (the code should work the same on Windows and Linux).
cydonia
Posts: 22
Joined: 26.07.2024 01:09

Post by cydonia »

I've tried to reproduce the issue by saving and reopening a small sample file with a single change, and also by entering a "∕" Unicode character (which triggers conversion to UTF-8) and then converting back to cp1252. But since this happens only occasionally and I'm unsure what circumstances cause it, I may need to suspend this post until I can reproduce it again and record the steps.
cydonia
Posts: 22
Joined: 26.07.2024 01:09

Post by cydonia »

I have been able to recreate the issue:

Because CudaText automatically converts a cp1252 file to UTF-8 when it is saved (manually or by autosave) while it contains Unicode characters, I don't always notice that the file is still encoded as UTF-8 after I remove those characters, and I forget to convert it back to cp1252.
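A minimal Python sketch for catching this outside the editor, assuming you only want a quick check: `fits_cp1252` tells whether the text can be stored in cp1252 at all (a character such as "∕" cannot), and `is_probably_utf8` flags a file whose non-ASCII bytes form valid UTF-8, i.e. one that was left in UTF-8. All three function names are hypothetical helpers, not CudaText features.

```python
from pathlib import Path


def fits_cp1252(text: str) -> bool:
    """Return True if every character in text has a cp1252 code."""
    try:
        text.encode("cp1252")
    except UnicodeEncodeError:
        return False
    return True


def is_probably_utf8(data: bytes) -> bool:
    """Heuristic: non-ASCII bytes that also form valid UTF-8 suggest a UTF-8 file.

    Pure ASCII is identical in both encodings, so it is reported as False.
    """
    if data.isascii():
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def describe_file(path: str) -> str:
    """Hypothetical helper: report whether a file on disk looks like UTF-8."""
    data = Path(path).read_bytes()
    if is_probably_utf8(data):
        return "looks like UTF-8: convert to cp1252 before reopening it as cp1252"
    return "not valid UTF-8 with non-ASCII bytes: consistent with cp1252"
```

For example, `fits_cp1252("— ‰")` is True while `fits_cp1252("∕")` is False. The UTF-8 test is only a heuristic, but cp1252 text with accented or typographic characters rarely happens to be valid UTF-8 (a lone 0x97 byte for "—" is not).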

Suppose I have a text file test1.txt, originally ANSI but saved by CudaText as UTF-8, containing the following text (all of the characters are valid in cp1252):
— test line 1
— test line 2
— test line 3
‰ test line 4

When I open it as cp1252 using a batch file containing the following command line:
cudatext.exe" -r -e=cp1252 D:\test1.txt@1@1
the file opens with the following text:

â€” test line 1
â€” test line 2
â€” test line 3
â€° test line 4
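The effect can be reproduced without the editor. A minimal Python sketch, assuming the four test lines listed earlier were saved as UTF-8: decoding those same bytes as cp1252 maps every byte to one character, so each 3-byte character turns into three.

```python
# The four test lines from the post, as CudaText saved them: UTF-8 bytes.
lines = ["— test line 1", "— test line 2", "— test line 3", "‰ test line 4"]
data = "\n".join(lines).encode("utf-8")

# Reading those bytes as cp1252 maps every byte to one character,
# so each 3-byte UTF-8 sequence becomes 3 characters.
garbled = data.decode("cp1252")

for line in garbled.splitlines():
    print(line)
# â€” test line 1
# â€” test line 2
# â€” test line 3
# â€° test line 4
```

Since cp1252 is a single-byte encoding, the decoded string always has exactly as many characters as the file has bytes.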
main Alexey
Posts: 2245
Joined: 25.08.2021 18:15

Post by main Alexey »

It doesn't look like a bug. If you view the file in a binary (hex) viewer, you will see that the leading Unicode characters are saved as 3 bytes each,
e.g. E2 80 94 for '—':
https://www.compart.com/en/unicode/U+2014
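To see the bytes a hex viewer would show, here is a small Python sketch comparing each character's UTF-8 bytes with its single-byte cp1252 code (the helper name `show_bytes` is only for illustration):

```python
def show_bytes(ch: str) -> str:
    """Format a character's UTF-8 and cp1252 encodings as hex bytes."""
    utf8 = ch.encode("utf-8").hex(" ").upper()
    cp1252 = ch.encode("cp1252").hex(" ").upper()
    return f"U+{ord(ch):04X} {ch}  utf-8: {utf8}  cp1252: {cp1252}"


for ch in "—‰":
    print(show_bytes(ch))
# U+2014 —  utf-8: E2 80 94  cp1252: 97
# U+2030 ‰  utf-8: E2 80 B0  cp1252: 89
```

`bytes.hex` with a separator argument needs Python 3.8 or later.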

So loading the file as cp1252 turns these 3 bytes into 3 separate characters.
To avoid this, load the file as UTF-8, convert the encoding to cp1252 (click the encoding in the statusbar), and save the file.
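If you prefer to fix such files outside the editor, the same steps (read as UTF-8, convert, write as cp1252) can be sketched in Python. This assumes the file really is UTF-8 and contains only characters that exist in cp1252; `convert_utf8_to_cp1252` and `undo_mojibake` are hypothetical helper names, not CudaText features. The second function reverses the wrong decode for text that has already been read the wrong way.

```python
from pathlib import Path


def convert_utf8_to_cp1252(path: str) -> None:
    """Hypothetical helper: rewrite a UTF-8 file in place as cp1252.

    Works on raw bytes so existing line endings (CRLF or LF) are kept.
    "utf-8-sig" also strips a leading UTF-8 BOM if there is one.
    """
    p = Path(path)
    text = p.read_bytes().decode("utf-8-sig")
    # Raises UnicodeEncodeError if a character (e.g. "∕") has no cp1252 code.
    p.write_bytes(text.encode("cp1252"))


def undo_mojibake(garbled: str) -> str:
    """Reverse a wrong cp1252 reading of UTF-8 bytes, e.g. 'â€”' -> '—'."""
    return garbled.encode("cp1252").decode("utf-8")
```

Reading bytes and decoding explicitly (rather than `Path.read_text`) avoids newline translation, so a Windows file keeps its CRLF line endings after conversion.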