CJK characters word wrap problem
Posted: 08.09.2022 09:35
by lookoutside
I remember posting this problem hours ago, but I can't find it now. Very strange. So I am rewriting it here, a little shorter.
The example figure shows that CudaText wraps lines leaving too much blank space when it encounters mixed English/CJK text.
My question is whether this problem will be resolved. I think it's hard, so I can accept any outcome.
Thank you.
Posted: 08.09.2022 09:40
by main Alexey
1. What is the language rule: can CJK characters be split at any position, or do we need to split only at space/tab positions?
2. What if you add the CJK "comma" (I see it above the red underline in your picture) and the CJK "small circle" to the option "nonword_chars"?
Posted: 08.09.2022 10:01
by main Alexey
I added these 2 CJK chars to the option value using Options Editor Lite. user.json:
"nonword_chars": "-+*=/\\()[]{}<>\"'.,:;~?!@#$%^&|`\u2026\uff0c\u3002\u4e00",
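To illustrate why adding the CJK comma and full stop to "nonword_chars" helps, here is a simplified Python model of word wrap that may break only after whitespace or after a character from that option. It is a hypothetical sketch, not how CudaText's editor component actually wraps: `wrap_at_nonword` is a made-up name, and width is counted in characters even though CJK glyphs usually take two display cells.

```python
# Hypothetical model of word wrap driven by a "nonword_chars"-style option.
# Not CudaText's real implementation; names are illustrative only.

# Same value as the user.json line above (Python and JSON escapes agree here).
NONWORD_CHARS = "-+*=/\\()[]{}<>\"'.,:;~?!@#$%^&|`\u2026\uff0c\u3002\u4e00"


def wrap_at_nonword(text: str, width: int, nonword: str = NONWORD_CHARS) -> list[str]:
    """Split text into lines of at most `width` characters (width >= 1).

    A line may end only after whitespace or after a char in `nonword`;
    if no such opportunity exists, fall back to a hard break at `width`.
    """
    lines = []
    start = 0
    while len(text) - start > width:
        cut = -1
        # Scan backwards for the last allowed break inside the limit.
        for i in range(start + width - 1, start - 1, -1):
            if text[i].isspace() or text[i] in nonword:
                cut = i + 1  # break right after this character
                break
        if cut == -1:
            cut = start + width  # no opportunity: hard break
        lines.append(text[start:cut])
        start = cut
    lines.append(text[start:])
    return lines


print(wrap_at_nonword("你好\uff0c世界\u3002再见", 5))
print(wrap_at_nonword("你好\uff0c世界\u3002再见", 5, nonword=""))
```

Passing `nonword=""` shows the fallback: with no break opportunity the model hard-breaks at the width limit, here leaving "。" at the start of the second line.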
Posted: 08.09.2022 11:33
by lookoutside
main Alexey wrote: 1. What is the language rule: can CJK characters be split at any position, or do we need to split only at space/tab positions?
Answer:
1. CJK characters can be split at any position, with the following exceptions.
2. Must not be split before: , 。 ;!?: ” ’,.;!?:
(These have similar-looking English alternatives, but with different code points.) In other words, these characters must not appear at the start of a line.
3. Must not be split after
I will post the standard here when I find it.
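The rules above can be sketched as a greedy wrapper for pure CJK text that fills each line and then moves the break point left until it no longer lands before forbidden punctuation. This is a hypothetical Python sketch, not CudaText's code: `can_break_between` and `wrap_cjk` are made-up names, width counts characters rather than display cells, the "must not be split before" set assumes the list quoted later in the thread means the fullwidth forms U+FF0C, U+3002, U+FF1B, U+FF01, U+FF1F, U+FF1A, U+201D, U+2019 plus ASCII `,.;!?:`, and the "must not be split after" set is left empty because that list was never posted.

```python
# Hypothetical sketch of the CJK line-breaking rules from this thread.
# Not CudaText's implementation; names are illustrative only.

# Assumption: fullwidth/CJK punctuation plus ASCII look-alikes.
NO_BREAK_BEFORE = set(
    "\uff0c\u3002\uff1b\uff01\uff1f\uff1a\u201d\u2019"  # ，。；！？：”’
    ",.;!?:"
)
# Placeholder: the "must not be split after" list was not given.
NO_BREAK_AFTER = set()


def can_break_between(prev: str, nxt: str) -> bool:
    """Return True if a line break is allowed between chars prev and nxt."""
    if nxt in NO_BREAK_BEFORE:
        return False  # this char must not start a line
    if prev in NO_BREAK_AFTER:
        return False  # this char must not end a line
    return True  # otherwise CJK text may break anywhere


def wrap_cjk(text: str, width: int) -> list[str]:
    """Greedy wrap to at most `width` chars per line (width >= 1),
    moving each break left until can_break_between() allows it."""
    lines = []
    start = 0
    while len(text) - start > width:
        cut = start + width
        # Back off, but keep at least one char per line (forced break).
        while cut > start + 1 and not can_break_between(text[cut - 1], text[cut]):
            cut -= 1
        lines.append(text[start:cut])
        start = cut
    lines.append(text[start:])
    return lines


print(wrap_cjk("你好世界\uff0c再见", 4))
```

Here the naive break after four characters would put "，" first on the next line, so the break moves one character left and "界，" stays together.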
main Alexey wrote: 1. What is the language rule: can CJK characters be split at any position, or do we need to split only at space/tab positions?
2. What if you add the CJK "comma" (I see it above the red underline in your picture) and the CJK "small circle" to the option "nonword_chars"?
It's much better than before.
I think when the first rule is applied, the result will be perfect.
Posted: 08.09.2022 11:44
by main Alexey
So CJK text can be wrapped at almost any position, which means the app needs special handling for CJK word wrap. I don't want to make the code that complex; maybe later.
The editors Geany and Sublime Text also don't handle this CJK case well; I tested them with text copy/pasted from the CudaText webpage.
Posted: 08.09.2022 13:04
by lookoutside
main Alexey wrote: So CJK text can be wrapped at almost any position, which means the app needs special handling for CJK word wrap. I don't want to make the code that complex; maybe later.
The editors Geany and Sublime Text also don't handle this CJK case well; I tested them with text copy/pasted from the CudaText webpage.
Yes, few text editors handle CJK word wrap very well. That's why I said I can accept any result.
With splitting at CJK punctuation, it's not so ugly now. In extreme cases, I can temporarily switch to VS Code. Thanks a lot.
Posted: 08.09.2022 13:37
by main Alexey
I have now tried to improve it: I added special code for these 3 Unicode ranges:
- CJK Unified Ideographs: 4E00-9FFF (common)
- CJK Unified Ideographs Extension A: 3400-4DBF (rare)
- CJK Compatibility Ideographs: F900-FAFF (duplicates, unifiable variants, corporate characters)
Test the Windows demo from http://uvviewsoft.com/c/ and tell me: is it better?
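The range test above can be modeled as follows. Only the three code point ranges come from the post; the names `is_cjk_ideograph` and `break_opportunities` are hypothetical, and the rule "break after whitespace, or on either side of an ideograph" is an assumption for illustration, not a description of the demo's actual code.

```python
# Hypothetical sketch: classify chars by the three CJK ranges listed above
# and derive break opportunities from that classification.

CJK_RANGES = (
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs (common)
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A (rare)
    (0xF900, 0xFAFF),  # CJK Compatibility Ideographs
)


def is_cjk_ideograph(ch: str) -> bool:
    """Return True if the single character `ch` lies in one of CJK_RANGES."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def break_opportunities(text: str) -> list[int]:
    """Indices i where a line may break before text[i]: after whitespace,
    or between two chars when either one is a CJK ideograph (assumed rule)."""
    result = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if prev.isspace() or is_cjk_ideograph(prev) or is_cjk_ideograph(cur):
            result.append(i)
    return result


print(break_opportunities("ab 中文"))
```

Neither U+FF0C (fullwidth comma) nor U+3002 (ideographic full stop) falls in these ranges, so a check like this one alone still allows a break between "好" and "，"; that is the case the "must not be split before" list covers.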
Posted: 08.09.2022 14:23
by main Alexey
>must not be split before , 。 ;!?: ” ’,.;!?:
I adjusted my demo and re-uploaded it; please test.
Posted: 09.09.2022 03:15
by lookoutside
main Alexey wrote: I have now tried to improve it: I added special code for these 3 Unicode ranges:
- CJK Unified Ideographs: 4E00-9FFF (common)
- CJK Unified Ideographs Extension A: 3400-4DBF (rare)
- CJK Compatibility Ideographs: F900-FAFF (duplicates, unifiable variants, corporate characters)
Test the Windows demo from http://uvviewsoft.com/c/ and tell me: is it better?
I can't believe it! You did it! It works! You work so hard, so quickly, and so productively! Thank you very very very much!
I will tell this to a long-time CudaText user. He is an influential technical writer in China. Recently he saw my posts about CudaText on zhihu.com and left a comment. He said he has been using CudaText as a Total Commander plugin for many years, although not very frequently. He pointed out that CudaText could not handle CJK word wrap well and hoped you could improve it, too. Now I will ask him to write a promotional article, since you have so generously made things easier for us.