Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not split runs on font scaling if the run is RTL #7149

Closed
wants to merge 1 commit into from
Closed

Do not split runs on font scaling if the run is RTL #7149

wants to merge 1 commit into from

Conversation

schorrm
Copy link
Contributor

@schorrm schorrm commented Aug 2, 2020

Summary of the Pull Request

Right now if we have a glyph that needs scaling amidst some that don't, we split the GlyphRun and only apply scaling as necessary. This is generally fine, but it shouldn't be done in RTL since splitting the GlyphRun causes directionality issues.

References

#538

PR Checklist

  • Closes #xxx
  • CLA signed. If not, go over here and sign the CLA
  • Tests added/passed
  • Documentation updated. If checked, please file a pull request on our docs repo and link it here: #xxx
  • Schema updated.
  • I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx

Detailed Description of the Pull Request / Additional comments

In the process of working on this, I think I may have come up with the correct algorithmic idea for the Grand Fix for almost all RTL behavior: on the font fallback routine, if the run would be split by the index from the MapCharacters phase, try again and see if the fallback font can do the entire run. I have just been unable to create an object from the fallback font factory to run through the normal method with MapChars, so if you could point me in the right direction with DWrite that'd be great

Validation Steps Performed

Cat-ing. Generally speaking this means that RTL words will be intact though in reverse order when cat-ed, instead of just garbled arbitrarily.

@j4james
Copy link
Collaborator

j4james commented Aug 3, 2020

Doing something like this will break any application that is designed to work with RTL text. Applications need to know that if they output a character at a particular coordinate, then that is where the character will be displayed. There are escape sequences an application can use to change the directionality of text, but if the terminal just starts moving characters around of its own accord, it will be impossible to work with.

@schorrm
Copy link
Contributor Author

schorrm commented Aug 3, 2020

@j4james are you critiquing my PR or the original behavior? My PR is about doing less reordering, not more.

@miniksa
Copy link
Member

miniksa commented Aug 3, 2020

Wait, I'm not sure about this. What is actually causing it to display incorrectly? Is this just the bidiLevel flag isn't copied to the new run when one is split?

I'm concerned about just opting-out of this codepath for RTL. That doesn't feel like the right answer here if it's still useful for LTR text.

Can you give a sample file or some sample text to this including screenshots?

@j4james
Copy link
Collaborator

j4james commented Aug 3, 2020

@j4james are you critiquing my PR or the original behavior? My PR is about doing less reordering, not more.

It's possibly I'm just looking at it wrong, but my initial impression was that your PR was making things worse than they were before. If I output the character א followed the character ב, then I expect to see א in column 1 and ב in column 2.

This is my test case:

printf "א"; sleep 1; printf "ב"; printf "\n"

And this is what that looks like in the preview build:

image

However, with your PR merged, I'm now seeing those characters reordered as they are output, so א shifts to column 2, and ב ends up in column 1.

image

That said, I don't think the current behaviour is correct either. More complicated examples seem to do some weird reordering too. But this PR doesn't help with that, at least as far as I can tell.

@schorrm
Copy link
Contributor Author

schorrm commented Aug 4, 2020

@j4james that's not a bug, that's the feature!
Look at the sequence אב for a second. Hebrew is an RTL language, so the alef, א, should be to the right of bet, ב.
So your second example is actually exactly what the fix is meant to be.

@schorrm
Copy link
Contributor Author

schorrm commented Aug 4, 2020

@miniksa so @j4james has a nice example actually, and here's another example:
I took the Hebrew alphabet in four letter chunks (the Arabic alphabet with the whole connectivity thing is too hard to read here)
The left is the bugfixed version. You can see that it maintains the integrity of the four letter chunks, however, it makes that space in Cascadia Code, not in fallback, so it does split the GlyphRun. The rightmost is current behavior on Liberation Sans, a pretty crummy font which I only have for Hebrew support. You can see that that it behaves perfectly there without the fallback. The middle is present (i.e. not this fix) behavior under font fallback, and the order is just completely bogus.
image
For those who aren't familiar with the Hebrew alphabet, we could say that they effectively are (using letter number):
Left: 4321 8765 12-11-10-9 ....
Right: 22-21 20-19-18-17 16-15-14-13 ...
Middle: 1432 5768 9-12-11-10 13-14-15-16 17-18-19-20 21-22

@j4james
Copy link
Collaborator

j4james commented Aug 4, 2020

For reference, here's what it looks like in conhost, which is technically the correct order, i.e. 1234 5678 9-10-11-12 ...

image

So while the middle version is clearly not right, it's probably the least broken of the three. At least some of the chunks are in the right order.

For a Hebrew speaker, the left version may appear to be more correct, because it's displaying the chunks in their correct language order, but that's not something the terminal should be trying to do - that's the application's responsibility.

@schorrm
Copy link
Contributor Author

schorrm commented Aug 4, 2020

No, ConHost is the wrong order since unlike ConHost WT aims (not at high priority, but aims) to support RTL. You can see in the linked issue.
The right version is the correct one for any application which supports RTL.

@j4james
Copy link
Collaborator

j4james commented Aug 4, 2020

If you think this is the correct solution for RTL, I invite you to try editing a line of Hebrew text in vi, or even just the command line. For example, start by enter אגדהוזח and then try to insert ב between the א and the ג. Let me know how many keystrokes it takes before you give up and just delete the whole line to start from scratch.

I'm not a Hebrew speaker (or any other bidirectional language for that matter), so this isn't going to effect me either way. But I've written enough bidirectional software to know that this approach will complete break any real bidi applications. And while it may sometimes give the appearance of "fixing" applications that aren't bidi-aware, it won't do that very well, and in the process will break a whole lot of other things.

@schorrm
Copy link
Contributor Author

schorrm commented Aug 4, 2020

You're absolutely correct that editing something in Vim is extremely nutty. However, a few things to keep in mind:

  1. It's already totally broken. It's a nightmare under all circumstances. And I tried your challenge out, and it's much much better when we're in the native font zone and not the fallback zone. Not ideal, but still better than typing everything backwards (and then needing to reverse your text files?)
  2. The approach of 'just print all the characters in reverse order' could sometimes let you hack something through in Hebrew, but that's fundamentally impossible with Arabic where it needs to be drawn as the words actually are.
  3. That'd also necessitate a major change since we'd need to say that we now split every individual RTL to its own GlyphRun.
  4. I think you're missing a lot here. As a developer whose primary language is English but works on Hebrew and Arabic NLP projects, I am well aware that I cannot expect a decent CLI editor for RTL text. Much more likely is that I'm peeking in on a file and then I can make out what it says. Or lets say I'm using some corpus dump and there are Hebrew file names. Let's take this little example.
    For reference, the name of the folder is תיקייה-עם-שם-בעברית-אבגד.
    Current Behavior with Font Fallback:
    image
    Current Behavior, Hebrew Supported Font:
    image

So while you argue that it doesn't do it well enough, I actually have three different profiles for Ubuntu in my WSL. Ubuntu, UbuntuHeb, and UbuntuAR. Because the RTL support when we're not glitching on fallback is actually to my taste very good, and it's really useful.
5. Lastly, if we want to address Bidi editing, the way to do that isn't to scramble the ordering of the text stream, it's to have the cursor jump around to the correct place in the stream. That's how every decent Bidi editor does it.

@schorrm
Copy link
Contributor Author

schorrm commented Aug 4, 2020

Under the PR, this specific use case with ls by the way, is not perfect like the example on the bottom, but better.

@schorrm
Copy link
Contributor Author

schorrm commented Aug 4, 2020

I'm concerned about just opting-out of this codepath for RTL. That doesn't feel like the right answer here if it's still useful for LTR text.

@miniksa I would formulate the case as follows: this codepath is generally beneficial since a single massive oversize character doesn't make everything else in the GlyphRun tiny. However, this is isn't worth it when splitting a GlyphRun has 'side effects', like it does with Bidi here. The scaling cases aren't worth the price of garbling the order, and any very unusually sized character like emoji is probably going to be a different BidiLevel anyway. It's really about the side effects that it has with the layout more than anything else.
(I say generally beneficial since in cases where you have a lot of LTR font fallback, like if you're trying to copy Cherokee into the terminal, when every glyph has a different scale the ransom-note aspect of the sizing maybe isn't ideal.)

@miniksa
Copy link
Member

miniksa commented Aug 4, 2020

OK two thoughts here:

  1. The correction algorithm probably shouldn't be "ransomnote"ing any fallback font run. Perhaps what it should be doing is identifying a scale factor that is appropriate for the entire font/run instead of splitting the runs further.
  2. I feel like a 1 run, 7 RTL characters and 7 runs, 1 RTL character each should both display properly (with all other properties being the same like scale factor). There must be something missing in the rendering portion that could be taught to account for RTL.

I think fixing either of those would also result in the same visual correction as your currently proposed fix while also getting deeper into the root of other related issues at the same time.

@miniksa
Copy link
Member

miniksa commented Aug 4, 2020

OK two thoughts here:

  1. The correction algorithm probably shouldn't be "ransomnote"ing any fallback font run. Perhaps what it should be doing is identifying a scale factor that is appropriate for the entire font/run instead of splitting the runs further.
  2. I feel like a 1 run, 7 RTL characters and 7 runs, 1 RTL character each should both display properly (with all other properties being the same like scale factor). There must be something missing in the rendering portion that could be taught to account for RTL.

I think fixing either of those would also result in the same visual correction as your currently proposed fix while also getting deeper into the root of other related issues at the same time.

A further thought on 2... if I were to have 7 RTL characters in a row in a stock IDWriteTextLayout from the DWriteFactory and I assigned a different color to each one... I would expect internally that it had to break it into 7 runs with a different brush per run to draw that. So it must know how to compensate the origins for adjacent RTL runs and we probably have just not handled that yet in CustomTextLayout or CustomTextRenderer

@j4james
Copy link
Collaborator

j4james commented Aug 4, 2020

For another example of software that will be broken by this PR, below are screenshots of VIM in Hebrew mode (i.e. vim -H), first in conhost and then in Windows Terminal with this PR applied. In both case I've just typed in the Hebrew alphabet, in four character chunks. Note how the Windows Terminal version has incorrectly reordered each chunk. Hebrew users will not thank you for this.

The current Windows Terminal implementation isn't right either, but if you want to fix that, you need to match the conhost behaviour.

image

@DHowett
Copy link
Member

DHowett commented Aug 4, 2020

So, this touches a bit on what it means to have host or guest-controlled BiDi, and it's going to make problems ever more apparent. There's a large body of work related to how to handle RTL in terminal emulators, but very little consensus on whose job it is to do so.

It looks like vim's Hebrew mode (which I didn't know existed!) prefers to handle RTL itself... but detecting what an application's intent is when it puts RTL text in the buffer is going to be nigh-unto impossible. ☹️

@j4james
Copy link
Collaborator

j4james commented Aug 4, 2020

detecting what an application's intent is when it puts RTL text in the buffer is going to be nigh-unto impossible. ☹️

This is exactly what I've been trying to say. This is not a problem that the terminal can solve. And I don't believe the VTE proposal is any better. The best thing a terminal can do is stay out of the way, and let the applications handle the issue themselves (like VIM's Hebrew mode as one example).

That said, there are cosmetic UI enhancements that a terminal could add to make things less painful when working with bidi-unaware applications. For example, a toggleable "mirror mode". which simply flips the entire screen. Or something like a bidi IME, that allows you to enter RTL text and have it pasted into the terminal in reverse order. I'd recommend looking at early Hebrew terminals and terminal emulators to see how they handled this sort of thing.

@schorrm
Copy link
Contributor Author

schorrm commented Aug 4, 2020

No, staying out of the way is a cop-out. I care a million times more about having ls work for me than I do about having vim.

@schorrm
Copy link
Contributor Author

schorrm commented Aug 4, 2020

You've only given one example of something that doesn't work, and that's vim -H. And I don't dispute that, but it doesn't work now either. And 100% we should break it so that everything else works. Because everything else in ConHost Hebrew is absolutely 100% unusable without mentally reading everything precisely backwards.

@schorrm
Copy link
Contributor Author

schorrm commented Aug 4, 2020

I wanna be clear. The ConHost behavior is unbearable. cat doesn't work. ls doesn't work. I actually use Hebrew. You can see from the original issue and the way people have commented on thread #538 (and all the closed dupe arabic threads) that nobody really wants the ConHost behavior in place.
So yeah, vim -H doesn't work and I don't care. But as for "Hebrew users" -- I am one. I have special profiles for fonts that support glyphs because I <3 the current RTL behavior when it doesn't run into font issues. Seriously. Because really, honest to god, 90% at least of my actual, real world, my student job doing Hebrew & Arabic NLP thingies do work.
I'm sorry that I've gotten so worked up. But you keep saying stuff like "Hebrew users will not thank you for this" and don't stop and consider why I've made three PRs already focused on RTL rendering.

@schorrm
Copy link
Contributor Author

schorrm commented Aug 4, 2020

@miniksa that's a really good point. I guess we could essentially put in a for loop to reorder the thingy in _DrawGlyphRuns?

if run is ltr: draw normally
else:
    from last consecutive rtl run to the present run draw the run (note the reversed order)
    advance to past the last consecutive rtl run

@j4james
Copy link
Collaborator

j4james commented Aug 5, 2020

I understand the need to make ls and cat work better with RTL languages - I'm very much in favour of fixing that. I'm just saying that it would be preferable if we could solve that problem in a way that doesn't break every other RTL application in existence, even if you don't personally use any of them.

Anyway, I've made my point. I'm not directly effected by this so I'll stay out of the discussion from now on. You guys can decide however you want to address the issue.

@DHowett
Copy link
Member

DHowett commented Aug 6, 2020

I wonder if it might be most appropriate to simply offer a config flag? experimental.rendering.rtlLanguageIsGuestManaged or something would be our escape hatch for vim -H or -A or -F...

I hate that there's no way for us to detect that the guest wants to handle it itself right now, and we have to support both cases relatively "seamlessly".

@j4james
Copy link
Collaborator

j4james commented Aug 6, 2020

I wonder if it might be most appropriate to simply offer a config flag?

FWIW, most terminals that try to do something clever with RTL will have an option or escape sequence to turn that behaviour off.

@schorrm
Copy link
Contributor Author

schorrm commented Aug 6, 2020

I mentioned this on the other PR, I agree that that's appropriate, and the behavior should be implemented by just marking every run as ltr if the flag is set. That'd cause the equivalent behavior.

@miniksa
Copy link
Member

miniksa commented Aug 6, 2020

I like taking @DHowett's flag and applying it to the @schorrm behavior of putting all the bidiLevel flags on the runs back to 0 if it's turned on.

DHowett pushed a commit that referenced this pull request Aug 7, 2020
Consecutive RTL GlyphRuns are drawn from the last to the first.

References
#538, #7149, all those issues asking for RTL closed as dupes.

As @miniksa suggested in a comment on #7149 -- handle the thingy on the
render side.

If we have GlyphRuns abcdEFGh, where EFG are RTL, we draw them now in
order abcdGFEh.

This has ransom-noting, because I didn't touch the font scaling at all.
This should fix the majority of RTL issues, except it *doesn't* fix
issues with colors, because those get split in the TextBuffer phase in
the renderer I think, so they show up separately by the GlyphRun phase.
@schorrm
Copy link
Contributor Author

schorrm commented Aug 9, 2020

With the generalized fix that came in #7190, I think this PR is partially obsolete. I think it's a good idea to take this idea of the uniform scaling (especially?) in the case of font fallback, but there's no reason that it should be unique to RTL anymore. So I think it makes sense to close this PR (after completing some discussion items here) and open a ransomnoting specific PR, as well as one for the flag described.

  1. @DHowett can you give me an outline of the code for making a new JSON flag? Also, once we're doing that flag, I think we might as well allow a flag to force all RTL (set all bidiLevel to 1). Rare use cases, but if we're already going through that part of code flow, might as well. I would suggest the names experimental.rendering.forceLTR and experimental.rendering.forceRTL.
  2. I don't have any real insight into determining optimal font scale splitting in general. However, one observation is that with fallback, when it rains, it pours. I think it's reasonable to assume in most fallback cases that it won't just be one glyph that gets resized, and rather the whole run is going to get ransomnoted. And perhaps, like the ultimate RTL fix in Batch RTL runs to ensure proper draw order #7190, this might make sense to analyze across multiple runs.

DHowett pushed a commit that referenced this pull request Aug 11, 2020
Consecutive RTL GlyphRuns are drawn from the last to the first.

References
#538, #7149, all those issues asking for RTL closed as dupes.

As @miniksa suggested in a comment on #7149 -- handle the thingy on the
render side.

If we have GlyphRuns abcdEFGh, where EFG are RTL, we draw them now in
order abcdGFEh.

This has ransom-noting, because I didn't touch the font scaling at all.
This should fix the majority of RTL issues, except it *doesn't* fix
issues with colors, because those get split in the TextBuffer phase in
the renderer I think, so they show up separately by the GlyphRun phase.

(cherry picked from commit 60b44c8)
@schorrm
Copy link
Contributor Author

schorrm commented Aug 12, 2020

Closing this, and I'll try and touch on the action items separately.

@schorrm schorrm closed this Aug 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants