Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix unicode display: CJK, powerline, tmux #1536

Merged
merged 9 commits into from
May 19, 2017

Conversation

tibotiber
Copy link
Contributor

@tibotiber tibotiber commented Feb 17, 2017

Hi! This is a follow up PR from #1376, taking 2e58e56 into consideration. I have isolated the issue with braille characters for vtop by decorating the corresponding nodes with a braille-node class additionally to the existing unicode-node class. It is ready to merge to the best of my knowledge.
Cc @rauchg

WIP summary:

tibotiber referenced this pull request Feb 17, 2017
@albinekb
Copy link
Contributor

Wow, this is awesome! Will try it tonight 👍

@tibotiber
Copy link
Contributor Author

Screenshots of the result here: vtop and unicode + tmux
vtop
unicode

@rauchg
Copy link
Member

rauchg commented Feb 18, 2017

Only issue is: shouldn't we respect the grid size for emojis? Both Terminal.app and iTerm seem to overlap them:

image

image

I wonder if by taking space wider than a character we would be breaking some UIs ?

@rauchg
Copy link
Member

rauchg commented Feb 18, 2017

Also, since other Terminals limit the width of Emojis, I worry that we'd be displaying them not as nicely in Hyper, because they'd essentially have an extra space (think Yarn)

@tibotiber
Copy link
Contributor Author

@rauchg: great point about emojis, one important breaking UI is the cursor position. However we need to address at least powerline unicode characters (e.g. #1336 (comment)), which is handled properly by iTerm as well. Emojis being spread over codepoints, it would be easier to isolate the powerline glyphs and give them more space. We can choose to apply flexible or double spacing, I would suggest double to keep the cursor position in check. From what I can see, this is the way iTerm handles it too.

After that, I think I should remove css decoration for braille, since it will fall in the normal unicode styling (?).

Sidenote: This is getting interesting, powerline glyphs are stored in the Unicode Private Use Area, along with medieval scripts, Klingon, and various languages of the Middle-earth... Should we display that with fix width or double width like powerline glyphs? 😂

@rauchg
Copy link
Member

rauchg commented Feb 18, 2017

@tibotiber that's a very good question. I think ultimately it would depend on each glyph in the private use area whether they want double or single width.

Let's do this: for now, let's give all private use area double (which would encompass Powerline glyphs), and then if a request comes up, we can think about adding a config flag for specifying ranges that you want to be double width

@tibotiber
Copy link
Contributor Author

OK, I'll push changes today (it's Saturday here already). I'll remove the braille part then.

@tibotiber
Copy link
Contributor Author

tibotiber commented Feb 18, 2017

Update: The issue with powerline glyphs is not a width issue. I couldn't get this to work in a way that supports basic powerline glyphs as well as user defined ones, so I checked iTerm a bit more and realised they do not treat powerline glyphs (new or old) as wide chars. What happens is wide unicode simply overflows on the next char. See screenshots for iTerm and Hyper:
hyper
iterm
If you look at the cat above, both terminals treat the glyphs exactly in the same way. Now, the difference is in the powerline prompt line where there is a background color. See the branch sign being clipped in Hyper? Because Hyper encapsulates each unicode char in its own <span />, the wide unicode do not visibly overflow on the char to its right, since that char's background color is on top of whatever overflows. This is not an issue in iTerm where unicode is probably not encapsulated like this. Why do we encapsulate unicode in individual containers exactly?

I removed my previous changes and only left the low hanging fruit: fixing of CJK wide chars. Still trying to find a way around the powerline issue. Summary of issues i'm trying to address and the ones fixed:

  • CJK chars display
  • powerline wide unicode glyphs display
  • multiple rendering of CJK in some cases (@BirkhoffLee report)
  • tmux empty prompt width, see screenshot below:
    tmux

@tibotiber tibotiber changed the title Fix unicode display with braille compatibility Fix unicode display: CJK, powerline, tmux Feb 18, 2017
@rauchg
Copy link
Member

rauchg commented Feb 20, 2017

Why do we encapsulate unicode in individual containers exactly?

As explained above, without the <span> with a fixed width, glyphs that are not included in a monospace font (such as braille which uses the "Apple Braille" font would show up in a different width (and this width varies per-character) and break rendering)

What happens is wide unicode [on iTerm] simply overflows on the next char

That's not true for Emojis it seems? Or emojis are not "wide unicode" (why not?)

image

Another thing I'd like to add to your list: ensure chinese characters render correctly

@rauchg
Copy link
Member

rauchg commented Feb 20, 2017

More food for thought: how should be rendered? (widest known unicode character that I could Google for 😅)

@rauchg
Copy link
Member

rauchg commented Feb 20, 2017

Personally, I like what Terminal.app does (similar to emoji handling). Take up one column, show overflow.

image

iTerm takes up one column, but only shows partial overflow:

image

@tibotiber
Copy link
Contributor Author

OK, that's what I had gathered about the encapsulation, just wanted to make sure there were no other reasons.

That's not true for Emojis it seems? Or emojis are not "wide unicode" (why not?)

I don't understand. Emojis are wide unicode, and they do overflow on the next char in iTerm, don't they? This is what your screenshot shows with the heart overflowing from its span onto the next (behind the a). Maybe it's a vocabulary issue, I have the feeling we're saying the same thing.

Good job finding ﷽ 😂, I'll target full overflow, like in Terminal.app.

Another thing I'd like to add to your list: ensure chinese characters render correctly

Which part exactly? The wc-node behaviour is done. Are you referring to the 2 chars that displayed as 3 mentioned in #1376 (comment) / #1376 (comment)? Or anything else?

@rauchg
Copy link
Member

rauchg commented Feb 20, 2017

Just wanted to be 100% sure! Seems like we're on the same page @tibotiber.
Unicode nirvana (for output) is within reach!

Next up once we merge this PR is to fix backspace (the behavior of the ANSI escape that is) for surrogate pairs and caret position

@rauchg
Copy link
Member

rauchg commented Feb 20, 2017

Yep I was referring to the report by @BirkhoffLee

@rauchg
Copy link
Member

rauchg commented Feb 20, 2017

One super tricky edge case that'll be difficult to solve (with my current knowledge of CSS):

We want it to overflow, but if it's at the edge, we would want the parent container to trigger scrolling somehow.

No terminal that i know of seems to handle this edge case currently

image

@rauchg
Copy link
Member

rauchg commented Feb 20, 2017

Also not a priority for now considering our broken backspace, but thought I'd mention it

@tibotiber
Copy link
Contributor Author

Took a while to get back to this, sorry. I pushed new changes to try and fix the display of unicode in powerline prompts:

  • removed existing unicode patching that was creating new containers for unicode chars. It was making it impossible for unicode to overflow as desired when there is a background color (e.g. powerline).
  • instead I override hterm's insertString to get back the node in which the string has been added and I decompose its textContent when it contains unicode to insert child nodes wrapping unicode in individual spans. Doing so we still manage to get monospaced unicode but it can now overflow as long as the background color doesn't change.

This is partially working and I can't figure out why only "partially". It basically works for general unicode display such as cat files or vtop, it also work with file editors when re-rendering rows etc, but the node's content is flattened as if nothing happened in powerline prompts, except for the last node in a row.

I think it has something to do with the way hterm handles data decoding in hterm.VT.prototype.parseUnknown_ but I can't manage to pinpoint exactly what. I could be wrong though and was hoping someone has an idea. I was originally not familiar with hterm's internals so I might miss something.

I left some debug logs that help to understand what the new insertString does.

@rauchg
Copy link
Member

rauchg commented May 19, 2017

Reverting

rauchg added a commit that referenced this pull request May 19, 2017
rauchg added a commit that referenced this pull request May 19, 2017
@albinekb
Copy link
Contributor

I'll have a look for a fix

@tibotiber
Copy link
Contributor Author

Hey, sorry about this, didn't try to run dist. I'll have a look as well.

@tibotiber
Copy link
Contributor Author

tibotiber commented May 20, 2017

I'm not able to reproduce (macOS 10.12.4 as well). I'm only getting a warning for an unknown OSC code.

screen shot 2017-05-20 at 12 37 56 pm

Might be coming from different prompts, what is yours like?

@albinekb
Copy link
Contributor

@tibotiber
Copy link
Contributor Author

Mmmmh, I'm on sindresorhus pure as well actually. I tried a few other prompts (antigen themes) and I'm getting no error. Not understanding this.

@albinekb
Copy link
Contributor

🤔 I'll diag & give you something to reproduce with tomorrow, need to get some 💤

@mengzhuo
Copy link

Why don't we just let the webkit do this work instead of calculate the width of the char?

@albinekb
Copy link
Contributor

albinekb commented May 23, 2017

@mengzhuo I'm sad to say it's a bit more complicated than that...

Sorry for not coming back yet, I'll get this sorted tomorrow 👼

@mengzhuo
Copy link

mengzhuo commented May 24, 2017

@albinekb

I use font "Cousine for Powerline", this screenshot is after I remove all unicode-node style.

2017-05-24 10-58-55

You can compare with gnome-terminal at the background.

I think it's a "web-based" app, we should due with it more "web-way" by using more css instead of messing with DOM.

@tibotiber
Copy link
Contributor Author

@mengzhuo, the issue with this approach is that you break the rendering for unicode that is not monospaced (see this comment). You can observe a bad example by running vtop after removing unicode styling. More generally, terminals expect monospace chars and handle exceptions such as CJK with "wide-chars" which are doubled-spaced.

@albinekb
Copy link
Contributor

albinekb commented May 26, 2017

Sorry for the delay @tibotiber

the problem is that not all nodes has className, for example text-nodes (see https://developer.mozilla.org/en-US/docs/Web/API/Node/nodeType)

I tried two things:

  • adding a check in hterm.Screen.prototype.wrapUnicode to only handle element nodes (type 3), no errors but emojis get scrambled.
  • adding a check for node.className, same problem

kapture 2017-05-26 at 8 08 13

even worse in fish:
kapture 2017-05-26 at 8 09 24

here's what's in my console when entering one emoji: (fish)
image
this is maybe my shell being f*d up 😄


After testing some more, it seems to be an issue with hterm.Screen.prototype.insertString ?

because if i print emojis (cat a file with emojis in it for example), they work as expected

@tibotiber
Copy link
Contributor Author

tibotiber commented May 26, 2017

@albinekb thanks for posting more details. I looked into it and here are some thoughts:

  1. we seem to have an issue with cursor position. Basically the cursor moves with double spacing on single spaced unicode. Since hterm.Terminal.prototype.eraseToRight uses the cursor position to calculate how many chars should be deleted, it creates an issue with unicode deletion. I'm thinking this might be the source of the unknown char that appears when we delete emojis. Note: this issue exists in master so it is not new to this PR and could be tackled in another PR. (edit: issue reported at A space before the cursor #1506)
  2. nice find on "the problem is that not all nodes has className", I completely missed it. I fixed this locally and will open a new PR soon since this one is now frozen. (edit: here it is -> Fix unicode display, third attempt :\ #1873)

tibotiber added a commit to tibotiber/hyper that referenced this pull request May 26, 2017
@tibotiber
Copy link
Contributor Author

Quite honestly, hterm is horrible to patch because the code is extremely imperative and convoluted. Do you know of any other terminal emulator that would be more "react-ish"? That would be really cool! @rauchg?

@LeonBlade
Copy link

LeonBlade commented May 28, 2017

I stumbled upon this PR and wanted to ask here before creating an issue first, so sorry in advance for posting this here.

By any chance is this rendering issue related to the problem produced in this screenshot:

Terminal and Hyper

On the left is Terminal and on the right is Hyper. I'm speaking of course about the box side which seems to warp as well as contain some overlap between the characters.

@tibotiber
Copy link
Contributor Author

tibotiber commented May 28, 2017

@LeonBlade this will need to be checked once the follow up PR #1873 lands, but it should fix your issue.

before:
before

after:
after

@uetchy
Copy link

uetchy commented Jun 2, 2017

I've tested it with Japanese and Korean characters on macOS El Capitan and there are some visual errors appeared.
There is an iTerm2 screen next to Hyper to clarify differences.
It looks like the width of Japanese characters is too narrow and Korean characters are weirdly broken.

screenshot 2017-06-02 22 01 32

@ELLIOTTCABLE
Copy link

ELLIOTTCABLE commented Jul 17, 2017

Hi! I don't necessarily have a horse in this race, besides the ones that the contributors above have (Pretty Unicode Interfaces [for english text] And Maybe Sometimes Emoji) — but I want to point out that this heavily affects users from other countries and cultures? This seems like a question that's ideal for a ‘reach out and ask users’ kind of moment. Maybe a request-for-comments notice in the README for six months, or something like that? (Also, a great way for beginners to contribute: “What does your language need from a terminal emulator?)

If nothing else, I suspect we need to get a ~Unicode expert~ to come in and answer some questions about character ranges to treat as ‘wide’ (our implementation gives them double width, automatically consuming an extra column) versus ones to ignore, treat as single-wide, and leave for userspace handling (“What? This emoji looks funny? I guess I'd better spit out an extra space after it.” aka The Yarn Approach).

@ELLIOTTCABLE
Copy link

ELLIOTTCABLE commented Jul 17, 2017

An aside — copied from a friend, as examples of why the emoji in particular are A Hard Problem, and “display all private-use-area characters as two columns” isn't enough:

  • 👨‍❤️‍💋‍👨 1f468-200d-2764-fe0f-200d-1f48b-200d-1f468 KISS: MAN, MAN
    8 codepoints, 11 UCS-2 units, one Wide character.

  • 🤼🏻 1f93c-1f3fb WRESTLERS + EMOJI MODIFIER FITZPATRICK TYPE-1-2
    This used to be 1 glyph in Emoji 3.0, but 4.0 deprecated group skin tones.

@sachin21
Copy link

sachin21 commented Aug 1, 2017

@tibotiber Great!! You saved my life. Thank you. I'll looking forward to be released 👍

@lucasqueiroz
Copy link

Seems like 1.4.x fixed this issue.
Vtop with 1.4.2 (PragmataPro Font):
hyper_001

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants