-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Invariant Mode Case Mapping #55520
Support Invariant Mode Case Mapping #55520
Conversation
Tagging subscribers to this area: @tarekgh, @safern Issue DetailsFixes #43774 This change is to add full support for case mapping in the Invariant Globalization Mode. Casing will be supported withe string and span operations (Compare, IndexOf, LastIndexOf, IsPrefix, IsSuffix, SortKey generation, calculating the hashcode, ToUpper and ToLower). To support the casing, we must carry some casing data which will increase the System.Private.Corelib by about 13K. I tried to optimize the new introduced casing data as much as I can and in same time having the casing operation still fast, so I generated the data in the form of 8-4-4 tables. Here is more info about the size increase: Before
After
|
@marek-safar @eerhardt @jkotas @safern could you please have a look? I appreciate if I can get your feedback soon if possible, to catch the deadline. |
src/libraries/System.Private.CoreLib/src/System/Globalization/InvariantModeCasing.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/InvariantModeCasing.cs
Outdated
Show resolved
Hide resolved
Our existing ICU call sites special-case the Turkish I. We should consider removing that special-case or adding it here as well so that the invariant-mode casing tables match the non-invariant-mode casing tables. Edit: looking at the bottom of the file, there appears to be some special-casing for the Latin Long S as well? |
I'm glad to see this work come together! 😃 Heads up that IMO this is a substantive breaking change and deserves a callout in the .NET 6 breaking change docs. Also deserves an email blast to the internal breaking change notifications list. |
I already did that when generating the data in the tool. |
Interesting as I didn't think about this way :-) Invariant mode is restrictive and I am not sure if there is anyone possibly depend on the old behavior. But I think wouldn't hurt if I file a breaking change issue. I'll do that. Thanks for pointing at that. |
src/libraries/System.Private.CoreLib/src/System/Globalization/InvariantModeCasing.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/InvariantModeCasing.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/InvariantModeCasing.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Globalization/tests/Invariant/InvariantMode.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/SurrogateCasing.cs
Outdated
Show resolved
Hide resolved
@GrabYourPitchforks your suggestion to use the Unicode category data to generate the casing tables there give us much better size result. Now the whole increase is around 6K compared to 13K before applying your suggestion. Thanks a lot. |
Now everything is ready. I appreciate if you can review this soon so I can try to get it before we snap. Thanks! |
src/coreclr/System.Private.CoreLib/Tools/GenUnicodeProp/CategoryCasingInfo.cs
Show resolved
Hide resolved
src/coreclr/System.Private.CoreLib/Tools/GenUnicodeProp/CharUnicodeInfoData.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/CharUnicodeInfo.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/CharUnicodeInfo.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/Ordinal.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/Ordinal.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/CharUnicodeInfo.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/SurrogateCasing.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/CompareInfo.cs
Show resolved
Hide resolved
src/coreclr/System.Private.CoreLib/Tools/GenUnicodeProp/CategoryCasingInfo.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/CharUnicodeInfo.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/CharUnicodeInfo.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/InvariantModeCasing.cs
Show resolved
Hide resolved
…ryCasingInfo.cs Co-authored-by: Santiago Fernandez Madero <safern@microsoft.com>
…CharUnicodeInfo.cs Co-authored-by: Santiago Fernandez Madero <safern@microsoft.com>
I have updated the breaking change issue dotnet/docs#24849 to include the case mapping changes we are doing here. |
Any other comments or we are good to go? I hope I can get this merged by tomorrow. thanks for all feedback so far :-) |
Plan on finalizing the review for this tomorrow. Thanks for the patience! :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tentative approval, subject to:
-
Take a look at the sorting behavior w.r.t. when one string contains a surrogate pair and the other string doesn't, and double-check that the as-implemented behavior is the behavior we want.
-
See comments in SurrogateCasing.cs and determine whether that method should check its inputs or whether the caller should be more defensive against not passing in bad data.
-
Everything else is perf-related and is at your discretion. The logic otherwise LGTM!
Thanks so much for tacking this Tarek! :)
src/libraries/System.Private.CoreLib/src/System/Globalization/CharUnicodeInfo.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/CharUnicodeInfo.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/CharUnicodeInfo.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/CharUnicodeInfo.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/CharUnicodeInfo.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/InvariantModeCasing.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/InvariantModeCasing.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/SurrogateCasing.cs
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/OrdinalCasing.Icu.cs
Outdated
Show resolved
Hide resolved
The reset of the CI runs already succeeded but looks it was not reflected in GitHub. I am merging it. |
Fixes #43774
This change is to add full support for case mapping in the Invariant Globalization Mode. Casing will be supported with string and span operations (Compare, IndexOf, LastIndexOf, IsPrefix, IsSuffix, SortKey generation, calculating the hashcode, ToUpper and ToLower).
To support the casing, we must carry some casing data which will increase the System.Private.Corelib by about 6K.
Here is more info about the size increase:
Before
After