cjk characters word wrap problem

Solved bugs are moved into this topic...
lookoutside
Posts: 51
Joined: 17.08.2022 13:42

Post by lookoutside »

I remember I posted this problem hours ago, but I can't find it now. Very strange. So I am rewriting it here, a little shorter.

The example figure shows that CudaText wraps the line with too much blank space when it encounters mixed English/CJK characters.

My question is whether this problem will be resolved. I think it's hard, so I can accept any result.

Thank you.
cudatext-word-wrap-problem.png
main Alexey
Posts: 2265
Joined: 25.08.2021 18:15

Post by main Alexey »

1. What is the language rule: can CJK characters be split at any place? Or do we need to split only at space/tab positions?
2. What if you add the CJK "comma" (I see it over the red underline in your pic) and the CJK "small circle" to the option "nonword_chars"?

Post by main Alexey »

I added these 2 CJK chars to the option value using Options Editor Lite. user.json:

Code: Select all

    "nonword_chars": "-+*=/\\()[]{}<>\"'.,:;~?!@#$%^&|`\u2026\uff0c\u3002\u4e00",
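To see which characters the last three escapes in that value stand for, they can be looked up by name. This is only a small Python sketch using the standard `unicodedata` module; the variable names are made up for illustration and nothing here is part of CudaText.

```python
import unicodedata

# The last three escapes from the "nonword_chars" value above.
added = "\uff0c\u3002\u4e00"

# Official Unicode character names, in the same order.
names = [unicodedata.name(ch) for ch in added]

for ch, name in zip(added, names):
    print(f"U+{ord(ch):04X} {name}")
```

The first two are the CJK comma and the "small circle" full stop; the third, U+4E00, is an ideograph rather than a punctuation mark.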

Post by lookoutside »

main Alexey wrote:1. What is the language rule: can CJK characters be split at any place? Or do we need to split only at space/tab positions?
Answer:

1. CJK characters can be split at any place, with the exceptions below.
2. A line must not be split before:

Code: Select all

, 。 ;!?: ” ’,.;!?:
These have similar counterparts in English, but with different code values. In other words, these characters must not be at the start of a line.
3. A line must not be split after:

Code: Select all

“  ‘ " '
I will post the standard here when I find it.
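The two exception lists above can be written as a small predicate. This is only a sketch in Python, not CudaText's actual code. The names `NO_BREAK_BEFORE`, `NO_BREAK_AFTER`, and `break_allowed` are hypothetical, and including the fullwidth forms in the first set is an assumption based on the remark that these characters have English look-alikes with different code values.

```python
# Characters that must not start a line (rule 2).
# Assumption: fullwidth CJK forms plus their ASCII look-alikes.
NO_BREAK_BEFORE = set(
    "\uff0c\u3002\uff1b\uff01\uff1f\uff1a"  # fullwidth , ; ! ? : and ideographic full stop
    "\u201d\u2019"                          # closing double and single quotes
    ",.;!?:"                                # ASCII counterparts
)

# Characters that must not end a line (rule 3).
NO_BREAK_AFTER = set("\u201c\u2018\"'")     # opening quotes and ASCII quotes


def break_allowed(prev_ch: str, next_ch: str) -> bool:
    """Return True if a line break may be placed between two CJK-context characters."""
    if next_ch in NO_BREAK_BEFORE:
        return False
    if prev_ch in NO_BREAK_AFTER:
        return False
    return True
```

For example, `break_allowed("\u4f60", "\u597d")` allows a break between two ideographs, while `break_allowed("\u597d", "\uff0c")` refuses to push a fullwidth comma to the start of the next line.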
main Alexey wrote:1. What is the language rule: can CJK characters be split at any place? Or do we need to split only at space/tab positions?
2. What if you add the CJK "comma" (I see it over the red underline in your pic) and the CJK "small circle" to the option "nonword_chars"?
It's much better than before.

I think when the first rule is applied, the result will be perfect.

Post by main Alexey »

So CJK text can be wrapped at almost any position, which means the app needs special handling for CJK word wrap. I have no wish to make the code so complex. Maybe later.
Editors Geany and Sublime Text also don't handle this CJK issue well. Tested on the copy/pasted CudaText webpage.

Post by lookoutside »

main Alexey wrote:So CJK text can be wrapped at almost any position, which means the app needs special handling for CJK word wrap. I have no wish to make the code so complex. Maybe later.
Editors Geany and Sublime Text also don't handle this CJK issue well. Tested on the copy/pasted CudaText webpage.
Yes, few text editors handle CJK word wrap very well. That's why I said I can accept any result.

With splitting at CJK punctuation, it's not so ugly now. In extreme cases, I can turn to VS Code temporarily. Thanks a lot.

Post by main Alexey »

I tried now to improve it - added special code for these 3 unicode ranges

CJK Unified Ideographs: 4E00-9FFF (common)
CJK Unified Ideographs Extension A: 3400-4DBF (rare)
CJK Compatibility Ideographs: F900-FAFF (duplicates, unifiable variants, corporate characters)

test the windows demo from http://uvviewsoft.com/c/ , better?
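A range check over the three blocks listed above could look roughly like this. It is a minimal Python sketch, not the code that was added to CudaText; `CJK_RANGES` and `is_cjk_ideograph` are hypothetical names chosen for the example.

```python
# The three Unicode blocks named in the post, as inclusive code point ranges.
CJK_RANGES = (
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0xF900, 0xFAFF),  # CJK Compatibility Ideographs
)


def is_cjk_ideograph(ch: str) -> bool:
    """Return True if the single character ch falls in one of the three blocks."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)
```

Note that CJK punctuation such as the fullwidth comma (U+FF0C) lies outside these three blocks, so it would still need the separate punctuation handling discussed earlier in the thread.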

Post by main Alexey »

>must not be split before , 。 ;!?: ” ’,.;!?:

Adjusted my demo, re-uploaded, pls test.

Post by lookoutside »

main Alexey wrote:I tried now to improve it - added special code for these 3 unicode ranges

CJK Unified Ideographs: 4E00-9FFF (common)
CJK Unified Ideographs Extension A: 3400-4DBF (rare)
CJK Compatibility Ideographs: F900-FAFF (duplicates, unifiable variants, corporate characters)

test the windows demo from http://uvviewsoft.com/c/ , better?
I can't believe it! You did it! It works! You work so hard, quickly, and productively! Thank you very very very much!

I will tell a very old CudaText user about it. He is an influential technical writer in China. These days, he saw my posts about CudaText on zhihu.com and posted a comment on them. He said he has been using CudaText as a Total Commander plugin for many years, although not very frequently. He pointed out that CudaText cannot handle CJK word wrap well and hoped you could improve it, too. Now I will ask him to write a promotional article, since you have so generously made things convenient for us.