Regenerate collation test data #2090
I have checked that both the full versions of the conformance tests and the disabled-in-repo zh, lt, fi, and sv tests pass with this data update if enabled.
@echeran Note that this PR also effectively updates the property testdata to Unicode 15.
Good observation. The only property that is being updated is On the one hand, the change does seem isolated, but on the other hand, we would have a version inconsistency among the properties in ICU4X v1.0 (because updating properties to Unicode v15.0 is slated for after the ICU4X v1.0 release). I have no other reservations about this data update PR except the aforementioned version consistency / release timing. Any thoughts? @hsivonen -- is this a PR that can wait until after the imminent ICU4X v1.0 release (even if you're away and need someone else to click the merge button)?
When I made this PR, I thought that ICU4X 1.0 was supposed to ship with Unicode 15 support. To the extent #2058 is blocked on the ICU4C patch and to the extent we want the collator and normalizer as non-experimental in 1.0, if the ICU4C patch lands on ICU4C trunk only, the full data export from ICU4C CI will be from ICU4C trunk, i.e. Unicode 15. This PR is about test data only either way. If we want Unicode 14-consistent non-test data to be available for the normalizer and the collator, the ICU4C patch needs to be backported to an ICU4C branch from before ICU4C trunk was upgraded to Unicode 15. In that sense, I think leaving this test data patch unmerged doesn't address which non-test data is available as a zip file for non-test use. And on the flip side, to the extent the test data is really just test data, it seems OK to update it in a less synchronized way; the main thing is that the deployment zip file is either all-Unicode 14 or all-Unicode 15.
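To illustrate the "all-Unicode 14 or all-Unicode 15" requirement above, here is a minimal sketch of a consistency check. It is not ICU4X or ICU4C code: the function name `check_unicode_consistency` and the `(file name, Unicode major version)` pairs are hypothetical, and how the version of each data file would actually be determined is left out.

```rust
use std::collections::BTreeSet;

/// Hypothetical check: each entry pairs a data file name with the Unicode
/// major version its data was generated from. Returns `Ok(version)` when
/// every entry agrees, otherwise `Err` with the distinct versions found
/// (sorted ascending). An empty bundle yields `Err` with an empty list.
fn check_unicode_consistency(files: &[(&str, u8)]) -> Result<u8, Vec<u8>> {
    // Collect the distinct versions; BTreeSet keeps them sorted.
    let versions: BTreeSet<u8> = files.iter().map(|&(_, version)| version).collect();
    if versions.len() == 1 {
        // Exactly one distinct version: the bundle is consistent.
        Ok(*versions.iter().next().unwrap())
    } else {
        Err(versions.into_iter().collect())
    }
}
```

With this sketch, a bundle whose collation and normalization entries are both at 14 but whose property entry is at 15 would be reported as `Err` with both versions listed, which is the mixed state the comment above wants to avoid in the deployment zip.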
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot
Somehow I had managed to be off by three months in my understanding of when Unicode 15 is expected to be released and how it relates to ICU4X 1.0. I have force-pushed a much smaller changeset that syncs the collation and normalization relevant files from an actual (draft PR) CI run from the ICU4C 71 branch. The non-comment change relates to my tweaks to the root generation using the
Great, that SGTM, and it resolves my only issue about property data version consistency. Thanks, everything LGTM.
(Edited.) This is an update from ICU4C 71 branch CI.