Subject: opening cp1252-encoded text files occasionally replaces characters
In CudaText version 1.214.6 I generally use cp1252 encoding. However, occasionally when I reopen a file previously saved in CudaText as cp1252, certain characters that are valid in cp1252, such as — (replaced by â€”), ‰ (replaced by â€°) and a few others, are replaced by text starting with â€. In most cases, though, cp1252-encoded files containing these characters open correctly.
-
- Posts: 2245
- Joined: 25.08.2021 18:15
Re: opening cp1252-encoded text files occasionally replaces characters
Can you attach a small example file (the smaller the better)?
I need to reopen it in cp1252 to see the problem.
Re: opening cp1252-encoded text files occasionally replaces characters
I cannot reproduce the problem with the text "such as — and ‰": it is saved and reopened in cp1252 correctly. But I am on Linux (the code should behave the same on Windows and Linux).
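To illustrate why a pure cp1252 round trip is safe: both characters have a single-byte code in cp1252 (— is 0x97, ‰ is 0x89), so encoding and decoding with the same code page gives back the original text. A minimal Python sketch, not CudaText code; `cp1252_roundtrip` is a hypothetical helper:

```python
def cp1252_roundtrip(text: str) -> bytes:
    """Encode text as cp1252 and confirm it decodes back unchanged."""
    data = text.encode("cp1252")  # one byte per character in this code page
    if data.decode("cp1252") != text:
        raise ValueError("cp1252 round trip changed the text")
    return data


if __name__ == "__main__":
    print(cp1252_roundtrip("such as \u2014 and \u2030"))  # b'such as \x97 and \x89'
```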
Re: opening cp1252-encoded text files occasionally replaces characters
I've tried to reproduce the issue by saving and reopening a small sample file with a single change, and also by entering a "∕" Unicode character (which triggers conversion to UTF-8) and then converting back to cp1252. Since this only happens occasionally and I'm unsure what causes it, I may need to put this post on hold until I can reproduce it again and record the steps.
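The "∕" above appears to be U+2215 DIVISION SLASH, which has no byte in cp1252, so a cp1252 file cannot store it; presumably that is what makes the editor switch the file to UTF-8. A minimal Python sketch of that check (`fits_cp1252` is a hypothetical helper, not part of CudaText):

```python
def fits_cp1252(text: str) -> bool:
    """Return True if every character in text has a byte in cp1252."""
    try:
        text.encode("cp1252")
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    print(fits_cp1252("\u2014 test line 1"))  # True: em dash is byte 0x97
    print(fits_cp1252("a \u2215 b"))          # False: U+2215 is not in cp1252
```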
Re: opening cp1252-encoded text files occasionally replaces characters
I have been able to recreate the issue:
Because CudaText automatically converts cp1252 to UTF-8 when saving (or autosaving) a file that contains Unicode characters, I don't always notice that the file is still encoded as UTF-8 after I remove those characters, and I forget to convert it back to cp1252.
Take a text file test1.txt, originally ANSI (cp1252), that has since been saved by CudaText as UTF-8 and contains the following text (all of these characters are valid in cp1252):
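One way to catch a file that silently stayed UTF-8 is a byte-level check outside the editor. The sketch below is a heuristic under stated assumptions, not a CudaText feature: it relies on the fact that cp1252 text with bytes such as 0x97 (—) on their own is almost never valid UTF-8, so non-ASCII data that decodes cleanly as UTF-8 was probably saved as UTF-8. `looks_like_utf8` is a hypothetical name.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: True if data contains non-ASCII bytes and is valid UTF-8."""
    if data.isascii():
        return False  # pure ASCII is byte-identical in cp1252 and UTF-8
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False  # e.g. a lone 0x97 byte is invalid UTF-8
    return True
```

Usage would be something like `looks_like_utf8(Path("test1.txt").read_bytes())` with `pathlib.Path`. It can give false positives for some cp1252 byte pairs that happen to form valid UTF-8 (for example "Ã©"), so treat the result as a hint only.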
— test line 1
— test line 2
— test line 3
‰ test line 4
When I open it from a batch file containing the following command line, which forces cp1252 encoding:
cudatext.exe" -r -e=cp1252 D:\test1.txt@1@1
The file opens with the following text:
â€” test line 1
â€” test line 2
â€” test line 3
â€° test line 4
Re: opening cp1252-encoded text files occasionally replaces characters
It doesn't look like a bug. If you open the file in a binary (hex) viewer, you will see that the leading Unicode characters are saved as 3 bytes each:
e.g. E2 80 94 for '—'.
https://www.compart.com/en/unicode/U+2014
So loading the file as cp1252 turns these 3 bytes into 3 separate characters (â, € and ”).
To avoid this, open the file as UTF-8, convert the encoding to cp1252 (click the encoding field in the status bar), and save the file.
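The byte-level mechanism can be reproduced outside the editor. A minimal Python sketch (not CudaText's own code; `misread_as_cp1252` is a hypothetical helper) that stores text as UTF-8 and then decodes those bytes with the cp1252 code page:

```python
def misread_as_cp1252(text: str) -> str:
    """Simulate opening a UTF-8 file with the cp1252 code page."""
    raw = text.encode("utf-8")   # what is actually stored in the file
    return raw.decode("cp1252")  # one cp1252 character per byte


if __name__ == "__main__":
    print("\u2014".encode("utf-8").hex(" "))        # e2 80 94
    print(misread_as_cp1252("\u2014 test line 1"))  # â€” test line 1
    print(misread_as_cp1252("\u2030 test line 4"))  # â€° test line 4
```

Note that Python's cp1252 codec raises an error on the five bytes cp1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D); how CudaText displays those bytes is not covered by this sketch.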
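The same three steps (load as UTF-8, convert to cp1252, save) can be scripted for many files at once. A minimal Python sketch under these assumptions: the files are UTF-8 with or without a BOM, every character exists in cp1252, and overwriting in place is acceptable (keep a backup). Both function names are hypothetical.

```python
from pathlib import Path


def utf8_to_cp1252_bytes(raw: bytes) -> bytes:
    """Decode UTF-8 bytes and re-encode them as cp1252."""
    text = raw.decode("utf-8-sig")  # "utf-8-sig" also drops a UTF-8 BOM if present
    return text.encode("cp1252")    # raises UnicodeEncodeError if a char is not in cp1252


def resave_as_cp1252(path: str) -> None:
    """Overwrite a UTF-8 text file with its cp1252 equivalent."""
    p = Path(path)
    # Work on raw bytes so CRLF/LF line endings are preserved exactly.
    p.write_bytes(utf8_to_cp1252_bytes(p.read_bytes()))
```

Reading and writing bytes rather than using `read_text`/`write_text` avoids newline translation, so a file with Windows line endings keeps them.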