
Store only secondary weight in diacritic table and remove jamo tailoring bit #1977

Closed
wants to merge 11 commits into from


hsivonen (Member) commented Jun 2, 2022

This both simplifies the case that the table is designed for and makes it possible to have non-self-contained CE32 diacritic tailorings.

…without ignoring default ignorables

Default ignorables are not ignored, because doing so would violate the fundamental assumption
of the normalizers that every input character produces non-empty output.

The expectation is that real NFKC_CaseFold will be implemented by first filtering out default ignorables
and then plugging the NFKD_CaseFold data into the upcoming `ComposingNormalizer` code that will turn
NFD into NFC and NFKD into NFKC.
Saves 7332 bytes in data size.
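The ordering described in this commit message (filter out default ignorables first, then hand the remaining characters to an NFKD_CaseFold-style mapping, then compose) can be sketched as below. This is a minimal illustration under stated assumptions, not ICU4X code: `is_default_ignorable_sample` covers only a handful of Default_Ignorable_Code_Point characters, `filter_then_decompose` is a hypothetical name, the `decompose` closure stands in for the real NFKD_CaseFold data, and the composition step (NFKD to NFKC) is omitted.

```rust
/// A few code points that have the Default_Ignorable_Code_Point property.
/// Illustrative subset only; a real implementation needs the full Unicode set.
pub fn is_default_ignorable_sample(c: char) -> bool {
    matches!(c, '\u{00AD}' | '\u{200B}'..='\u{200F}' | '\u{2060}' | '\u{FEFF}')
}

/// Hypothetical pipeline: drop default ignorables first, then apply a per-character
/// decomposition mapping (a stand-in for NFKD_CaseFold data). The mapping must honor
/// the normalizer invariant that every input character yields non-empty output.
/// A composition pass would follow; it is not shown here.
pub fn filter_then_decompose<F>(input: &str, decompose: F) -> String
where
    F: Fn(char) -> Vec<char>,
{
    let mut out = String::with_capacity(input.len());
    for c in input.chars().filter(|&c| !is_default_ignorable_sample(c)) {
        let mapped = decompose(c);
        // Invariant assumed by the normalizer: no character maps to nothing.
        assert!(!mapped.is_empty(), "decomposition of {:?} was empty", c);
        out.extend(mapped);
    }
    out
}
```

Filtering before decomposition keeps the per-character contract intact: a stand-in mapping such as `|c: char| c.to_lowercase().collect()` never yields an empty result, whereas mapping default ignorables to nothing inside the normalizer itself would break the invariant.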
…and allow dynamic further shortening in tailorings

This makes the action of turning a value read from the table into a `CollationElement` super-simple (and branchless).
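To illustrate why storing only the secondary weight can make the table-to-`CollationElement` step branchless, here is a hypothetical sketch. The 64-bit layout (primary in the high 32 bits, secondary in bits 16..32, tertiary in the low 16 bits) and the common tertiary weight 0x0500 are assumptions loosely modeled on ICU4C's CE format; the type and function names are illustrative and are not ICU4X's actual API.

```rust
/// Hypothetical 64-bit collation element, loosely modeled on ICU4C's layout:
/// primary weight in bits 32..64, secondary in bits 16..32, tertiary in bits 0..16.
/// Not ICU4X's actual type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct CollationElement(u64);

/// Assumed common tertiary weight for a diacritic (ICU4C's common 16-bit weight is 0x0500).
const COMMON_TERTIARY: u64 = 0x0500;

impl CollationElement {
    pub fn primary(self) -> u32 {
        (self.0 >> 32) as u32
    }
    pub fn secondary(self) -> u16 {
        (self.0 >> 16) as u16
    }
    pub fn tertiary(self) -> u16 {
        self.0 as u16
    }
}

/// Builds a CE from a secondary weight read from a (hypothetical) diacritic table.
/// The primary weight is zero because diacritics are primary-ignorable; the tertiary
/// weight is the assumed common value. Only a shift and an OR: no branches.
pub fn ce_from_diacritic_secondary(secondary: u16) -> CollationElement {
    CollationElement((u64::from(secondary) << 16) | COMMON_TERTIARY)
}
```

Because a diacritic's primary weight is zero and its tertiary weight is assumed to be the common value, the stored `u16` is the only part that varies, so the conversion reduces to a shift and an OR.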
hsivonen added the labels C-collator ("Component: Collation, normalization") and S-medium ("Size: Less than a week (larger bug fix or enhancement)") Jun 2, 2022
hsivonen self-assigned this Jun 2, 2022
hsivonen requested a review from echeran June 2, 2022 10:51
hsivonen (Member Author) commented Jun 2, 2022

(I marked this as a draft only because the PR also contains the changesets for #1967. Once that lands, the Files changed view here becomes more useful.)

hsivonen changed the title from "Store only secondary weight in diacritic table" to "Store only secondary weight in diacritic table and remove jamo tailoring bit" Jun 2, 2022
hsivonen (Member Author) commented Jun 2, 2022

Jamo tailoring only applies to search collations. Sorting those out is #1941 post-1.0.

hsivonen added this to the ICU4X 1.0 (Features) milestone Jun 2, 2022
hsivonen closed this Jun 2, 2022
hsivonen deleted the diacritics branch June 2, 2022 11:38
hsivonen (Member Author) commented Jun 2, 2022

Sorry about messing up with the gh tool and pushing a branch to the wrong place. Migrated the PR to #1978.
