-
Notifications
You must be signed in to change notification settings - Fork 8.3k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Implement grapheme clusters (#16916)
First, this adds `GraphemeTableGen` which * parses `ucd.nounihan.grouped.xml` * computes the cluster break property for each codepoint * computes the East Asian Width property for each codepoint * compresses everything into a 4-stage trie * computes a LUT of cluster break rules between 2 codepoints * and serializes everything to C++ tables and helper functions Next, this adds `GraphemeTestTableGen` which * parses `GraphemeBreakTest.txt` * splits each test into graphemes and break opportunities * and serializes everything to a C++ table for use as unit tests `CodepointWidthDetector.cpp` was rewritten from scratch to * use an iterator struct (`GraphemeState`) to maintain state * accumulate codepoints until a break opportunity arises * accumulate the total width of a grapheme * support 3 different measurement modes: Grapheme clusters, `wcswidth`-style, and a mode identical to the old conhost With this in place the following changes were made: * `ROW::WriteHelper::_replaceTextUnicode` now uses the new grapheme cluster text iterators * The same function was modified to join new text with existing contents of the current cell if they join to form a cluster * Otherwise, a ton of places were modified to funnel the selection of the measurement mode over from WT's settings to ConPTY This is part of #1472 ## Validation Steps Performed * So many tests ✅ * https://github.com/apparebit/demicode works fantastic ✅ * UTF8-torture-test.txt works fantastic ✅
- Loading branch information
Showing
54 changed files
with
3,794 additions
and
727 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.