
Update Unicode data to 15.0 #5864

Merged: 1 commit, merged Aug 13, 2023


@crlf0710 (Member) commented Jul 29, 2023

This replaces the `unicode_categories` crate, which hasn't been updated in 7 years, with the new `unicode-properties` crate from the unicode-rs organization.

It also bumps the versions of the other `unicode-*` crate dependencies in the lockfiles.

cc rust-lang/rust#101840

@calebcartwright (Member)

Thank you for the PR! Definitely agree this is a change we need to push through, and I'm glad to see that none of our testing (including idempotence against various repos - https://github.com/rust-lang/rustfmt/actions/runs/5843722535/job/15846117620) is impacted by this dep change.

However, this PR reminds me of a longstanding feeling I've had that we don't have enough breadth and rigor in our testing around string content & encodings, and that's something we should look to improve in the future.

@crlf0710 & @Manishearth, do you by any chance know offhand of any Rust repos that have a good amount of strings/comments/etc. with a variety of characters and/or encodings that we could consider incorporating into our test suite? (I suspect there are some good candidates under https://github.com/unicode-rs, but figured I'd ask to see if you had any specific repos you'd suggest.)

@calebcartwright calebcartwright added release-notes Needs an associated changelog entry and removed pr-not-reviewed labels Aug 13, 2023
@calebcartwright calebcartwright merged commit 9f58224 into rust-lang:master Aug 13, 2023
27 checks passed
@Manishearth (Member)

ICU4X should have a bunch too. I don't have specific thoughts; a lot of the unicode-rs ones use escape codes.

@calebcartwright (Member)

Thanks Manish!

@crlf0710 crlf0710 deleted the unicode15 branch August 14, 2023 03:13
@ytmimi ytmimi removed the release-notes Needs an associated changelog entry label Oct 23, 2023