Emoji width wrong and varying with U+FE0F and U+FE0E ignored. #8970

Closed
christianparpart opened this issue Jan 30, 2021 · 11 comments
Labels
Area-Output Related to output processing (inserting text into buffer, retrieving buffer text, etc.) Issue-Bug It either shouldn't be doing this or needs an investigation. Priority-2 A description (P2) Product-Conhost For issues in the Console codebase Product-Terminal The new Windows Terminal. Resolution-Duplicate There's another issue on the tracker that's pretty much the same thing.

@christianparpart

Try this in Windows Terminal (I am using version 1.4 btw):

echo -ne "M\U0001F600M\nM\U0001F600\uFE0FM\nM\U0001F600\uFE0EM\n"

This is how it looks:
image

  1. U+FE0E modifier is ignored.
  2. U+FE0F modifier generates a glyph with a different width than without the modifier.

The default emoji presentation mode should be respected (here emoji, but there are others whose default is text).
The width for emoji presentation should be 2 (as per the Unicode emoji spec), whereas text presentation (including the U+FE0E override) should be rendered as text with a cell width of 1.
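
The `echo -ne` command above relies on the shell understanding `\U` escapes. For shells that do not, the same three test lines can be produced with a short Python script. This is only a sketch for reproduction, not part of the original report; the names `BASE`, `VS15`, `VS16`, and `LINES` are illustrative.

```python
# Reproduce the three test lines from the report:
# bare emoji, emoji + U+FE0F, emoji + U+FE0E, each wrapped in "M" markers.
BASE = "\U0001F600"  # GRINNING FACE
VS15 = "\uFE0E"      # VARIATION SELECTOR-15: requests text presentation
VS16 = "\uFE0F"      # VARIATION SELECTOR-16: requests emoji presentation

LINES = [
    f"M{BASE}M",
    f"M{BASE}{VS16}M",
    f"M{BASE}{VS15}M",
]

if __name__ == "__main__":
    for line in LINES:
        print(line)
```

If the trailing `M` lines up in the same column on all three lines, the terminal is giving all three sequences the same width.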

@ghost ghost added Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting Needs-Tag-Fix Doesn't match tag requirements labels Jan 30, 2021
@j4james
Collaborator

j4james commented Jan 31, 2021

emoji text presentation (including U+FE0E override) should be text and cell width 1.

I agree that the current behaviour is wrong, but I don't think what you're claiming here is correct. I think in all three cases the emoji should be 2 cells wide. You shouldn't be able to change the width of a character in a terminal with a variation selector.

I've just checked a bunch of terminals (XTerm, Gnome Terminal, Rxvt, st, Konsole, Alacritty, Mlterm, Mintty), and none of them alter the emoji width. And Gnome Terminal was the only one that seemed to treat the color and text representations differently, although the others always used the text representation rather than a color glyph.
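
The behaviour described here, where a variation selector never changes the column count, can be sketched roughly as below. This is an assumption-laden simplification, not the code of any terminal listed above: the helper name `columns_per_code_point` is hypothetical, and it approximates width from Python's `unicodedata` tables (wide or fullwidth is 2, nonspacing/enclosing/format characters are 0, everything else is 1).

```python
import unicodedata


def columns_per_code_point(text):
    """Sum a width for each code point independently.

    Variation selectors are nonspacing marks (category Mn), so they add
    nothing and cannot influence the width of the character before them.
    """
    total = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue  # treated as zero-width in this sketch
        wide = unicodedata.east_asian_width(ch) in ("W", "F")
        total += 2 if wide else 1
    return total


if __name__ == "__main__":
    for s in ("M\U0001F600M", "M\U0001F600\uFE0FM", "M\U0001F600\uFE0EM"):
        print(ascii(s), columns_per_code_point(s))
```

Under this rule all three test strings from the report occupy the same number of columns.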

@j4james
Collaborator

j4james commented Jan 31, 2021

For reference, here's a screenshot showing the output from the various terminals I tested:
image

@christianparpart
Author

christianparpart commented Jan 31, 2021

Ah, hey @j4james ;-)
I'm not saying every terminal does that at its best ;-)

It was almost a year ago that I was about to literally eat the Unicode specs in order to implement emoji and especially multi-codepoint grapheme support. I don't remember by heart where I read about that one problematic claim, and I may well be wrong about the width of emoji text presentation. I still think I am right though (I am trying to prove myself... wrong? :-D)

Let's try that:

echo -ne "ABC\n\u00a9\n\u00a9\ufe0f\n\U0001F600\ufe0e\nABC\n"

image
(screenshot from Kitty)

This will show the copyright sign. I chose it because its default presentation style is indeed text and its width is one (East Asian Width = N, i.e. neutral, which terminals treat as narrow).
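
The East Asian Width values for the two characters used in these tests can be checked with Python's standard `unicodedata` module. This is a small sketch; the helper `describe` is a made-up name. Note that `unicodedata` does not expose the Emoji_Presentation property, so the default presentation style itself cannot be queried this way.

```python
import unicodedata


def describe(ch):
    """Return (code point label, character name, East Asian Width) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.east_asian_width(ch))


if __name__ == "__main__":
    for ch in ("\u00a9", "\U0001F600"):
        print(*describe(ch))
```

U+00A9 is reported as `N` (neutral) and U+1F600 as `W` (wide), which is where the default widths of 1 and 2 come from.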

If you try that in Kitty, it actually behaves the way I think is right (except that I think there is a bug in Kitty where "\U0001F600\uFE0F" shows only a white bubble, at least on my screen; /cc @kovidgoyal).

image

I am trying to find my way around Unicode Technical Standard #51 (UTS #51) again, though.

When I did the research early last year, I also looked at how browsers render emoji (with a monospace font surrounding them), and that seems to match my understanding too. Whether that is correct remains to be verified, but I would say it is exactly what users expect to see. It should not be rendered differently in the terminal just because every emoji (regardless of VS15 or default presentation mode) supposedly has to be 2 cells wide.

I am aware that emoji and Unicode (especially multi-codepoint grapheme clusters) are a sensitive topic in terminal land, but from the user's standpoint I think the above should be right. What do you think?

@kovidgoyal

@christianparpart you want to see kovidgoyal/kitty#3211 for the bug you found in kitty.

@kovidgoyal

As for the actual issue, as can be seen from kitty's behavior, I agree with @christianparpart

As per the Unicode standard, variation selectors change the presentation of the preceding character, if it has multiple presentations defined. The width of a character depends on its presentation. Ergo, variation selectors must change width. This is one of the many reasons one cannot use wcwidth() to calculate the width of text in a terminal; one must use wcswidth().
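
The argument can be illustrated with a string-level width function that lets a following variation selector override the width of its base, which a function that only ever sees one code point at a time cannot do. This is a hedged sketch of the idea, not kitty's implementation: `base_width`, `columns_with_selectors`, and `ELIGIBLE` are hypothetical names, and the set of eligible bases here is hand-picked to mirror the thread's examples. A real implementation would take that set from Unicode's `emoji-variation-sequences.txt` data file.

```python
import unicodedata

VS15 = "\uFE0E"  # text presentation selector
VS16 = "\uFE0F"  # emoji presentation selector

# Illustration only: bases assumed to accept a presentation selector.
ELIGIBLE = {"\u00a9", "\U0001F600"}


def base_width(ch):
    """Width of a single code point, ignoring anything that follows it."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def columns_with_selectors(text, eligible=ELIGIBLE):
    """String-level width: VS15 forces 1 column, VS16 forces 2, on eligible bases."""
    total = 0
    prev = None
    for ch in text:
        if ch in (VS15, VS16) and prev in eligible:
            # Replace the width already counted for the base character.
            total -= base_width(prev)
            total += 1 if ch == VS15 else 2
        else:
            total += base_width(ch)
        prev = ch
    return total


if __name__ == "__main__":
    for s in ("M\U0001F600M", "M\U0001F600\uFE0FM", "M\U0001F600\uFE0EM", "\u00a9", "\u00a9\uFE0F"):
        print(ascii(s), columns_with_selectors(s))
```

With these assumptions the VS15 variant of the first test line comes out one column narrower than the other two, and the copyright sign grows from 1 to 2 columns when followed by VS16. That is exactly the width change the other side of this thread objects to.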

@j4james
Collaborator

j4james commented Jan 31, 2021

This subject has been discussed at length many times before, and I don't think anyone has ever proposed a sensible way to make it work. If you think you do have a solution, though, I suggest you continue the discussion in issue 9 of terminal-wg. But if you can't convince the terminal-wg community that you have a workable solution (and I didn't see any indication that you had), then I don't see why Windows Terminal would want to diverge from the standard that almost everyone else is currently following.

@kovidgoyal

kovidgoyal commented Jan 31, 2021 via email

@christianparpart
Author

christianparpart commented Jan 31, 2021

Just a very slight FYI: I have a draft lying around in which I attempted to formalize:

  • grapheme cluster handling (continuous text writes, including cursor positioning, ...)
  • character width (including VS15/VS16 overrides, emoji default presentation)
  • feature detection, mode switch, future extensibility.

I am not publishing it yet, and it will surely take some more months (I first have to finish another proposal), especially since I know how things can sometimes be discussed to death in TWG, ending in an exodus. I will surely publish it once I feel I've addressed what I think is needed for "Terminal Unicode Core" support, and I'll let you know then.

@zadjii-msft
Member

Alright so there's an easy bug here, and a hard discussion.

The easy bug is the spacing with U+FE0E/U+FE0F. That can be fixed more easily, to match the behavior of other terminals.

The hard discussion - should the variation selector change the width - I'm gonna leave for another place. This thread isn't really the place to build that consensus. If consensus is found, I'm happy to match the consensus of the terminal emulator space.

@zadjii-msft zadjii-msft added Area-Output Related to output processing (inserting text into buffer, retrieving buffer text, etc.) Issue-Bug It either shouldn't be doing this or needs an investigation. Priority-2 A description (P2) Product-Conhost For issues in the Console codebase Product-Terminal The new Windows Terminal. labels Feb 1, 2021
@ghost ghost removed the Needs-Tag-Fix Doesn't match tag requirements label Feb 1, 2021
@zadjii-msft zadjii-msft added this to the Terminal v2.0 milestone Feb 1, 2021
@DHowett
Member

DHowett commented Feb 4, 2021

The spacing issue is another form of /dup #1472. We don't handle 0-width combining codepoints well.

@ghost

ghost commented Feb 4, 2021

Hi! We've identified this issue as a duplicate of another one that already exists on this Issue Tracker. This specific instance is being closed in favor of tracking the concern over on the referenced thread. Thanks for your report!

@ghost ghost closed this as completed Feb 4, 2021
@ghost ghost added Resolution-Duplicate There's another issue on the tracker that's pretty much the same thing. and removed Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting labels Feb 4, 2021
This issue was closed.
5 participants