Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multiple unicode version support #1714

Closed
wants to merge 13 commits into from
Closed

Conversation

jerch
Copy link
Member

@jerch jerch commented Sep 28, 2018

First version of multiple unicode version support. Shall fix #1709, partly copied over from #1707 (thx to @dnfield).

The class UnicodeProvider currently holds registered versions at the global object. An instance of that class carries the active version info for a single terminal instance and returns the correct wcwidth impl. The statically provided versions currently reside under unicode in the codebase and get added in UnicodeProvider.ts. More versions can be added at runtime (maybe in the future by addons), there is a callback slot onRegister to get notified about newly registered versions.

Still very hacky, would be good to get some conceptual feedback first before doing tests and such.

Edit: Currently adds ~20kB to the package size uncompressed (now at ~385kB).

/cc @Tyriar, @mofux, @bgw, @dnfield

@jerch jerch added the work-in-progress Do not merge label Sep 28, 2018
@jerch jerch requested a review from a team October 8, 2018 15:59
Copy link
Contributor

@mofux mofux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tested it, works really well - finally I can use emojis in my prompt again! I've left a small note about the registeredVersions function name, apart from that it LGTM.

src/UnicodeProvider.ts Outdated Show resolved Hide resolved
@jerch
Copy link
Member Author

jerch commented Oct 11, 2018

@mofux: Thx! Im abit concerned about the package size, might try to get the new data in a more packed representation (The old one isnt even half as big, also zips further down).

@jerch
Copy link
Member Author

jerch commented Oct 11, 2018

Reminder to myself: A provider instance needs to be disposable from the callback register (maybe simply remove it from the arguments).

@mofux
Copy link
Contributor

mofux commented Oct 12, 2018

I wouldn't bee too concerned about the extra 32KB 😅

@jerch
Copy link
Member Author

jerch commented Oct 26, 2018

Yeah, also zipped the difference is 7.8 vs 2.5 kB, imho nothing to worry about.

@jerch jerch removed the work-in-progress Do not merge label Oct 26, 2018
@jerch
Copy link
Member Author

jerch commented Oct 26, 2018

Made the provider disposable and added some tests.

Ready for next review round 😸

demo/client.ts Outdated Show resolved Hide resolved
src/InputHandler.ts Outdated Show resolved Hide resolved
src/Linkifier.ts Outdated Show resolved Hide resolved
src/UnicodeProvider.ts Outdated Show resolved Hide resolved
src/UnicodeProvider.ts Outdated Show resolved Hide resolved
* Gets the newly registered version and
* the `UnicodeProvider` instance as arguments.
*/
public addRegisterListener(callback: (version: number, provider: UnicodeProvider) => void): void {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are all these listener functions needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are in preparation for adding unicode versions at runtime by addons or such. The listener will inform about new versions, so any terminal instance can switch if needed.

src/UnicodeProvider.ts Outdated Show resolved Hide resolved
src/UnicodeProvider.ts Outdated Show resolved Hide resolved
src/UnicodeProvider.ts Outdated Show resolved Hide resolved
src/UnicodeProvider.ts Outdated Show resolved Hide resolved
@jerch jerch closed this Nov 8, 2018
@jerch jerch reopened this Nov 8, 2018
@jerch
Copy link
Member Author

jerch commented Nov 8, 2018

Up for next review.

@jerch jerch self-assigned this Nov 15, 2018
@jerch jerch added this to the 3.9.0 milestone Nov 15, 2018
@jerch jerch mentioned this pull request Nov 18, 2018
Copy link
Member

@Tyriar Tyriar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests failing after merge? Also I think this is blocked on #1789 now

@jerch
Copy link
Member Author

jerch commented Nov 19, 2018

Ah thats weird, have to check if there is some high char hidden in the test, that might change width with a different unicode version.
And yes, I think we should wait with this after #1789.

@jerch
Copy link
Member Author

jerch commented Nov 20, 2018

Reminder: Initialize wcwidth lookup table on first version request in activeVersion setter. (see #1789)

@jerch
Copy link
Member Author

jerch commented Nov 23, 2018

Currently blocked by #1789.

@jerch jerch closed this Nov 24, 2018
@jerch jerch reopened this Nov 24, 2018
@jerch
Copy link
Member Author

jerch commented Nov 24, 2018

@dnfield May I ask you to have a look at the new code, in particular here:

const WIDE_EASTASIAN_BMP = [

As you can see I extracted the BMP plane codepoints from the others to create the lookup table from it and shortcut the bisect for higher abit too (no magic here, I have simply split your lists).

The problem: A closer look at your code revealed that you are not using the WIDE_EASTASIAN_BMP definitions from your extracted list, instead you used the same values from the v6 definition. Did you do this on purpose? Since the defintions for v6 and v11 are very different I am unsure which one to go with.

@dnfield
Copy link

dnfield commented Nov 24, 2018

I didn't have any good test cases for the wide East Asian, and so I just left it as is, which I suppose is incorrect.

@jerch
Copy link
Member Author

jerch commented Nov 24, 2018

@dnfield
Aww ok yeah same here, guess we need to get some test against a system with v11 up before we can decide this.
Thx for the fast response. ❤️

@jerch
Copy link
Member Author

jerch commented Nov 24, 2018

The new iOS release and Mojave claim to be Unicode v11, Ubuntu still ships ICU v60 in 18.10 which is still v10. Ubuntu 19.04 has ICU v63 linked which is v11. Not sure if the latter is already useable, guess the tests have to be done on Mojave.

So I did a quick test in Mojave with the first codepoint that differs in v6 and v11: U+231A (watch)
their clib wcwidth returns 1 while the char is taking 2 cells in Terminal.app. Wth? Seems they use some other method to get the width. Does anyone know how get the correct width under macOS?
Btw Ubuntu 18.04 with unicode v10 returns 2. What a mess...

@Tyriar
Copy link
Member

Tyriar commented Nov 24, 2018

🤷‍♂️

@jerch
Copy link
Member Author

jerch commented Nov 29, 2018

Hmm it seems macOS' wcwidth is not updated/maintained. Only chance I see to get some numbers is to draw a char and get its bounding box. But then the numbers rely on the font glyphs and their relation/support for a specific unicode version. @gnachman Maybe you know a way how to get wcwidth numbers more reliable on macOS?

Guess I gonna start with v10 on ubuntu to check which wide definitions can be pulled in.

@Tyriar Tyriar modified the milestones: 3.9.0, 3.10.0 Nov 29, 2018
@jerch
Copy link
Member Author

jerch commented Dec 16, 2018

Made some progress and queried wcwidths from different systems I have access to with these results:

  • Ubuntu 14 LTS: claims to be Unicode v6.3, almost in line with current values used in xterm.js
  • Ubuntu 16 LTS: claims to be Unicode v7.0, only a few changes to v6.3 (good to go with 6.3 here)
  • Ubuntu 18 LTS: claims to be Unicode v10.0, many changes compared to v7.0, needs its own map imho
  • Mojave: claims to be Unicode v11.0, Terminal.app wcwidth values are not usable to build a map (very off to the values from Ubuntu v10, partly not updated, still contains v6.3 values)

I will try to get better results for v11 by extracting the values from the upcoming Ubuntu LTS (claims to be v11) and comparing it with the results from here https://github.com/jquast/wcwidth (they have several extractors to get the values).
Also Windows is not covered yet, I have no clue which Unicode version Win10 relies on and whether this depends on the installed patch set. Also this would have to be in line with the char handling of the new conpty (@Tyriar maybe ask the dev team, which version they use or how they determine the applied wcwidth value?)

If this all fails we could still provide system dependent maps extracted from the default terminal, but I see this only as a last resort.

From the differences of the extracted maps I conclude that we will have to bundle at least 3 versions (6.3 + 10 + 11, 11 contains much more changes than any other major release before) which raises the question whether we should go with an addon solution from the beginning.

@Tyriar
Copy link
Member

Tyriar commented Jan 16, 2019

I had an idea yesterday which would probably be a good fix for this. Once we have addons, the unicode definitions can be "bundled" addons, so they will remain in xtermjs/xterm.js and ship in the xterm module, but loading them will be controlled via a new Terminal option. For example:

// Loads the default unicode addon version using dynamic imports such
// that all unicode definitions are not loaded when a terminal is created
const t1 = new Terminal();

// Loads only the v11 unicode addon via dynamic import
const t2 = new Terminal({
  unicodeVersion: 11
});

@Tyriar Tyriar mentioned this pull request Jan 16, 2019
@jerch
Copy link
Member Author

jerch commented Jan 16, 2019

@Tyriar Stumbled myself over the question what to do with at least 3 versions to support. For local applications (like electron based) 50kB more or less dont really hurt, for remote browser based scenarios this is nasty, since a user only needs one particular version.

My idea was to offer 2 ways to integrate the data:

  • For local apps make the definition addons able to be prebundled (like your idea), this could also be used by browser solutions for an initial definition to start with or to include all if they want to go with a fat bundle.
  • Optionally make other definition addons lazy loadable from some ressource if a user requests it. This would only need a small fetch callback that loads the new definition into the provider manager, prolly with a lazy unicode version loader addon.

@Tyriar Tyriar mentioned this pull request Mar 4, 2019
4 tasks
@Tyriar
Copy link
Member

Tyriar commented May 10, 2019

Closing as the eventual addon solution will look pretty different to this, see #1709 (comment)

@Tyriar Tyriar closed this May 10, 2019
@Tyriar Tyriar added the reference A closed issue/pr that is expected to be useful later as a reference label May 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
reference A closed issue/pr that is expected to be useful later as a reference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unicode handling in xterm.js
4 participants