Add numerical ordering option for string comparison operations #109861
/// Indicates that the string comparison must sort sequences of digits (Unicode general category "Nd") based on their numeric value.
/// For example, "2" comes before "10". Non-digit characters such as decimal points, minus or plus signs, etc.
/// are not considered as part of the sequence and will terminate it.
/// </summary>
It would be good to add remarks to this one giving more information, for example that this option can be used in comparisons but not for search (IndexOf/StartsWith/EndsWith). It would also be good to hint at the behavior difference when ICU is used versus NLS. And lastly, mention that this option cannot be combined with the ordinal operations.
The comment was getting long, so I included just the part about indexing. Mentioning search might be confusing because people might consider IndexOf (which isn't supported) as search and GetHashCode (which is supported) as not. I think it's easier to just say that NumericOrdering works in all cases except for indexing.
I prefer to keep the combination behavior with Ordinal and OrdinalIgnoreCase on those members since this is really a property of theirs instead of numeric ordering. This is also consistent with the other options as well which don't mention Ordinal even though they can't combine with them either.
I think the ICU and NLS differences should probably go in docs rather than in doc comments since NLS usage is not going to be high. There are already docs about NLS/ICU differences that we can append to (https://learn.microsoft.com/en-us/dotnet/core/extensions/globalization-icu#behavioral-differences).
Well, if you prefer that, it will be better to edit the docs of the indexing APIs and add the remark there. I am trying to make it easy for the API users to understand when this new enum value is not allowed. I guess users can be puzzled if they get exceptions and do not understand what is wrong.
You don't have to block the PR on that, but it will be good to open a doc issue/PR to add the info as needed.
@@ -9186,6 +9186,7 @@ public enum CompareOptions
IgnoreSymbols = 4,
IgnoreKanaType = 8,
IgnoreWidth = 16,
NumericOrdering = 32,
OrdinalIgnoreCase = 268435456,
StringSort = 536870912,
Ordinal = 1073741824,
nit: should we change this to use hex numbers for the sake of readability?
This is how the GenAPI tool generates it and the guidance I've heard is to make as few diffs from that tool as possible.
Yes, but still can make diffs :-)
The ref assembly isn’t really designed for readability; it’s designed to be correct and automatically generated.
Deviations and manual diffs just cause downstream pain later and hinder the ability to rerun the tool, as people have to fight against it.
The better option would be to submit a bug, or better a patch, so that the output the tool produces for flags-enabled enums is “better”, such as using hex or logical shifts to represent the bits instead (1 << 0, 1 << 1, 1 << 2, etc.).
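For illustration, the shift-based style suggested above might look like the following sketch. `SampleCompareOptions` is a hypothetical enum written for this example; it is neither the GenAPI output nor the real `CompareOptions` declaration, and it only mirrors the numeric values for comparison.

```csharp
using System;

// Hypothetical enum for illustration only; it mirrors the CompareOptions
// values but is NOT the generated reference-assembly source.
[Flags]
public enum SampleCompareOptions
{
    None = 0,
    IgnoreCase = 1 << 0,          // 1
    IgnoreNonSpace = 1 << 1,      // 2
    IgnoreSymbols = 1 << 2,       // 4
    IgnoreKanaType = 1 << 3,      // 8
    IgnoreWidth = 1 << 4,         // 16
    NumericOrdering = 1 << 5,     // 32
    OrdinalIgnoreCase = 1 << 28,  // 268435456
    StringSort = 1 << 29,         // 536870912
    Ordinal = 1 << 30,            // 1073741824
}
```

Written this way, each member's bit position is visible at a glance, which is the readability gain the nit was after.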
I've created an issue: dotnet/sdk#44999. GenAPI seems to be in a kind of limbo state where there is a new Roslyn based version we want to switch to (tracked by dotnet/sdk#31088) and we don't want to maintain the current CCI-based one (https://github.com/dotnet/arcade/blob/main/src/Microsoft.DotNet.GenAPI/README.md).
I left a few minor comments, LGTM otherwise. Thanks @PranavSenthilnathan!
I'll let @matouskozak and @ilonatommy comment on the hybrid and WASM stuff.
/azp run runtime-ioslike,runtime-ioslikesimulator,runtime-maccatalyst
Azure Pipelines successfully started running 3 pipeline(s).
Thank you for considering the Apple mobile globalization :). Could you please create an issue for the missing support on iOS/..., linking the original issue and the findings that you mentioned in the PR description. I'll look into it later.
I think we might have to temporarily disable the newly added tests for Apple mobile, otherwise we will start getting PlatformNotSupportedException for the numeric ordering tests. I started the Apple mobile CI test runs to see if that's the case.
One more thing: in the PR description, there is an example for NLS: "However, these numbers are compared as expected with unequal numbers, namely 1 < 02 < 2 < 03."
Does that mean that numbers with leading zeros are always smaller than those without? Also, is 002 < 02 then?
/azp run runtime-ioslike,runtime-ioslikesimulator,runtime-maccatalyst
Azure Pipelines successfully started running 3 pipeline(s).
Created #109999
Updated the PR to skip them. I introduced a new PlatformDetection property, IsNumericComparisonSupported, instead of reusing IsHybridGlobalizationOnApplePlatform, so it's easier to find and remove once you get the tests working.
Yes, if the numbers are actually equal, then in NLS the one with more leading zeros is considered less (it's basically a tiebreaker). This behavior is better for deterministic sorting of lists, but the downside is that hash tables won't consider these equal. I prefer ICU's behavior here (JS/wasm does the same) but I don't think there's much we can do about it.
/azp run runtime-ioslike,runtime-ioslikesimulator,runtime-maccatalyst
Azure Pipelines successfully started running 3 pipeline(s).
Thank you for the test fixes. I checked the apple mobile CI and everything failing looks to be unrelated and tracked at #103472. Looking good from Apple mobile side.
@ilonatommy do you want to check the WASM changes?
Adds numerical ordering for comparison operations (e.g. Compare, Equals, GetHashCode, GetSortKey). This enables comparing numbers by their numerical value instead of lexicographical order, so that 2 < 10. We don't support index operations (e.g. StartsWith, EndsWith, IsPrefix, IsSuffix) since the underlying globalization libraries, NLS and ICU, don't support them.
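Because this new option relies on underlying globalization libraries, there could be differences in behavior for different platforms and libraries:
ICU: numbers that differ only by leading zeros are equal ("01" == "1"), and digits from different scripts are equal ("1" == "١", where ١ is the Arabic-Indic Digit One).
NLS: numbers that differ only by leading zeros are not equal ("01" != "1"). However, these numbers are compared as expected with unequal numbers, namely 1 < 02 < 2 < 03. Digits from different scripts are not equal ("1" != "١").
Implements #13979.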
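A minimal usage sketch of the option described above, assuming a runtime that ships `CompareOptions.NumericOrdering` and a platform where the option is supported (ICU or NLS). The class and method names (`NumericOrderingSketch`, `SortNumerically`, `CompareNumerically`) are hypothetical helpers for this example and are not part of the PR.

```csharp
using System;
using System.Globalization;

// Hypothetical helper class for illustration; not part of the PR.
public static class NumericOrderingSketch
{
    // Returns a sorted copy in which digit sequences are ordered by numeric value.
    public static string[] SortNumerically(string[] items)
    {
        StringComparer comparer = StringComparer.Create(
            CultureInfo.InvariantCulture, CompareOptions.NumericOrdering);
        string[] copy = (string[])items.Clone();
        Array.Sort(copy, comparer);
        return copy;
    }

    // Negative if a sorts before b, zero if they compare equal, positive otherwise.
    public static int CompareNumerically(string a, string b) =>
        CultureInfo.InvariantCulture.CompareInfo.Compare(
            a, b, CompareOptions.NumericOrdering);
}
```

With this, `CompareNumerically("2", "10")` should be negative, whereas a plain lexicographical comparison puts "10" first. Whether `CompareNumerically("01", "1")` returns zero depends on the underlying library, per the ICU and NLS differences listed above, so code that relies on equality of such strings (for example as hash-table keys) should not assume one behavior.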