Support RTL languages in the canvas renderer #1899
It handles individual words well (e.g., "اینجا" instead of "اجنیا"), but it still puts the words in reverse order. The correct sentence is: "اینجا را بخوانید" (not "بخوانید را اینجا"). (I'm a native Persian speaker.) |
@babakks I included space as well, I think it looks right now. |
I guess it will all fall apart when a line wraps though |
@babakks did you check the latest? |
That's perfect. Good job! |
@Tyriar I put the character joiner on my TODO list. It would certainly benefit from an overhaul once we are done with the buffer transitions. |
Closing this off; having it work sometimes and really mess things up other times would probably be just as bad. If anything, there were some lessons learned 😃 |
@Tyriar this looks like a definite improvement. What exactly did it mess up? If it doesn't cause problems for LTR, it would be better than the current situation. |
@HKalbasi words ended up in reverse order and there were likely other problems with wrapped words. I found I didn't understand the problem fully so I wasn't comfortable making a half fix. |
I have an approximate knowledge of the Unicode bidi algorithm, since I'm a native RTL speaker and this problem is so common. Let me try to explain it: There are four kinds of Unicode characters in bidi:
The original categories are a bit more verbose. For rendering a text:
An example:
From what I'm seeing here, the output will be correct per the Unicode spec if we join all RTL characters together with neutrals and weak LTR characters. As for line wrapping, I don't see the problem, or how RTL differs from LTR there. The screenshot above looks good to me. |
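A minimal sketch of the joining idea discussed above, assuming a handler of the shape xterm.js's `registerCharacterJoiner` accepts (a function from a line's text to `[start, end)` ranges). The name `rtlRanges` and the character ranges are hypothetical: they roughly cover the Hebrew and Arabic blocks rather than the real Unicode bidi classes, and only single spaces are treated as joinable neutrals, so weak LTR characters and other neutrals are not handled.

```typescript
// Rough approximation of "strong RTL" characters: Hebrew/Arabic and related
// blocks plus the Arabic presentation forms. This is NOT the full bidi class data.
const RTL = "[\\u0590-\\u08FF\\uFB1D-\\uFDFF\\uFE70-\\uFEFC]";

// Returns [start, end) ranges covering runs of RTL words separated by single spaces.
function rtlRanges(text: string): [number, number][] {
  // A fresh RegExp per call avoids sharing lastIndex state between calls.
  const run = new RegExp(`${RTL}+(?: ${RTL}+)*`, "g");
  const ranges: [number, number][] = [];
  let match: RegExpExecArray | null;
  while ((match = run.exec(text)) !== null) {
    ranges.push([match.index, match.index + match[0].length]);
  }
  return ranges;
}
```

For example, `rtlRanges("ls \u0631\u0627 ok")` yields a single range `[3, 5]` covering the two-letter word "را". In xterm.js such a function could be passed as `term.registerCharacterJoiner(rtlRanges)`, leaving the actual reordering inside each joined range to the browser's text rendering.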
@HKalbasi thanks. The spec for the algorithm is quite long though, so I imagine reading it and working through the subtleties will take quite a while. There is also a terminal-specific spec which recommends a different approach and is also very long: https://terminal-wg.pages.freedesktop.org/bidi/. After reading all that and understanding the problem fully, I'm guessing it's probably possible via character joiners, but I'm not certain, which is what concerns me. I feel like it'll take a pretty significant amount of effort to do this right. The ambiguity also bugs me, as I can't truly test this since I don't know any of these languages. |
I think anything here is better than nothing: reversed words are better than reversed letters, and the screenshots here are even better. If we join RTL characters with weaks and neutrals, it would make RTL fine, even if it doesn't work right in all cases. I would also like to know what the limit/cost of joining is. Is it possible to join everything and let the browser solve bidi? |
Joining the whole line would cause some pretty bad performance issues. All joined characters get their own texture, meaning essentially all repeated characters/words would be cache misses and need to be re-rendered. That would thrash the backing texture and cause it to be cleared frequently. It seems like some combination of joined characters (for RTL words) and the ability to reorder the words while keeping the word textures separate would be needed for it to function properly. Right now the character joiner system isn't capable of this. |
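To make the cache argument above concrete, here is a toy model, not the renderer's actual texture atlas: it assumes each distinct joined string needs its own cache entry and simply counts first-time renders. The function name `countCacheMisses` and the sample lines are hypothetical, and eviction or clearing of the backing texture is not modeled.

```typescript
// Toy model: every distinct joined unit must be rendered once (a cache miss);
// repeats of the same unit are hits. No eviction is modeled.
function countCacheMisses(joinedUnits: string[]): number {
  const cache = new Set<string>();
  let misses = 0;
  for (const unit of joinedUnits) {
    if (!cache.has(unit)) {
      cache.add(unit);
      misses++;
    }
  }
  return misses;
}

const sampleLines = ["foo bar", "bar foo", "foo foo", "bar bar"];

// Joining whole lines: each line is one unit, and no two lines are identical.
const perLine = countCacheMisses(sampleLines);

// Joining per word: the same two words repeat across all lines.
const words: string[] = [];
for (const line of sampleLines) {
  for (const word of line.split(" ")) {
    words.push(word);
  }
}
const perWord = countCacheMisses(words);
```

In this sample, whole-line joining renders four distinct units while per-word joining renders two, and the gap widens as more lines reuse the same words. That is the motivation for keeping word textures separate and reordering them instead of joining everything.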
Part of #701
Before:
After:
This does slow things down (~50% slower), but I think @jerch is working on making the character joiner code faster. If the registerCharacterJoiner API works with codes instead of a big string, it shouldn't make much difference.
DOM/WebGL renderer support will work when they support the character joiner API.