Add numerical ordering option for string comparison operations #109861

Merged Nov 25, 2024 (9 commits)

Member

@PranavSenthilnathan PranavSenthilnathan commented Nov 15, 2024

Adds a numerical ordering option for comparison operations (e.g. Compare, Equals, GetHashCode, GetSortKey). With this option, runs of digits in strings are compared by their numerical value instead of lexicographically, so that "2" < "10". We don't support Index operations (e.g. StartsWith, EndsWith, IsPrefix, IsSuffix) since the underlying globalization libraries, NLS and ICU, don't support them.
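
To illustrate the difference between lexicographic and numerical ordering, here is a minimal Python sketch. It is only a model of the idea and not the .NET implementation; the helper name `numeric_key` and the choice to sort digit runs before text runs are assumptions made for the example.

```python
import re

def numeric_key(s):
    # Hypothetical helper: split into digit and non-digit runs and compare
    # digit runs by integer value. The leading tag (0 or 1) keeps ints and
    # strs from ever being compared with each other.
    runs = re.findall(r"\d+|\D+", s)
    return [(0, int(r)) if r.isdecimal() else (1, r) for r in runs]

names = ["file10", "file2", "file1"]
print(sorted(names))                   # ['file1', 'file10', 'file2']
print(sorted(names, key=numeric_key))  # ['file1', 'file2', 'file10']
```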

Because this new option relies on underlying globalization libraries, there could be differences in behavior for different platforms and libraries:

  • Full ICU (default in most cases):
    • Numbers with leading zeros are treated as equal to equivalent numbers with no leading zeros ("01" == "1").
    • Equivalent numbers in different languages/scripts are treated the same ("1" == "١", where ١ is the Arabic-Indic Digit One).
  • NLS (legacy Windows):
    • Numbers with leading zeros are treated as unequal to equivalent numbers with no leading zeros ("01" != "1"). However, these numbers are compared as expected with unequal numbers, namely 1 < 02 < 2 < 03.
    • Equivalent numbers in different languages/scripts are treated as unequal ("1" != "١")
  • Hybrid ICU:
    • WASM: The same as ICU, but the option set is limited. This PR does not change the valid option set besides allowing adding the NumericOrdering option to any previously valid option. See this for the allowed options.
    • iOS, tvOS, Mac Catalyst: Not supported because I don't have an Apple device to test with. However, Apple does support numeric comparisons for strings.
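
The ICU and NLS behaviors listed above can be modeled with two sort keys. This is a hedged Python sketch of the described semantics only; it does not call ICU or NLS, and the names `icu_like_key` and `nls_like_key` are hypothetical. The description only says that NLS treats "1" and "١" as unequal, so the relative order this model gives them is an arbitrary assumption.

```python
import re
import unicodedata

def _runs(s):
    # Split a string into alternating digit and non-digit runs.
    return re.findall(r"\d+|\D+", s)

def _leading_zeros(run):
    # Count leading zero digits in any script (e.g. "0" or Arabic-Indic "٠").
    n = 0
    for ch in run:
        if unicodedata.decimal(ch) != 0:
            break
        n += 1
    return n

def icu_like_key(s):
    # Digit runs compare by value only; leading zeros and script are ignored.
    return [(0, int(r)) if r.isdecimal() else (1, r) for r in _runs(s)]

def nls_like_key(s):
    # Value first, then more leading zeros sorts lower, then the raw digits,
    # so runs that differ in zeros or script never compare equal.
    return [
        (0, int(r), -_leading_zeros(r), r) if r.isdecimal() else (1, r)
        for r in _runs(s)
    ]

print(icu_like_key("01") == icu_like_key("1"))       # True
print(icu_like_key("1") == icu_like_key("\u0661"))   # True ("١")
print(nls_like_key("01") == nls_like_key("1"))       # False
print(sorted(["03", "2", "02", "1"], key=nls_like_key))  # ['1', '02', '2', '03']
```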

Closes #13979.

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@PranavSenthilnathan PranavSenthilnathan changed the title from "[WIP] Numerical ordering for string compare/equals/hashcode" to "Add numerical ordering option for string comparison operations" Nov 18, 2024
@PranavSenthilnathan PranavSenthilnathan marked this pull request as ready for review November 18, 2024 22:14
Member

@matouskozak matouskozak left a comment

Thank you for considering the Apple mobile globalization :). Could you please create an issue for the missing support on iOS/..., linking the original issue and the findings that you mentioned in the PR description. I'll look into it later.

I think we might have to temporarily disable the newly added tests for Apple mobile; otherwise we will start getting PlatformNotSupportedException for the numeric ordering tests. I started the Apple mobile CI test runs to see if that's the case.

One more thing: in the PR description, there is an example for NLS:

> However, these numbers are compared as expected with unequal numbers, namely 1 < 02 < 2 < 03.

Does that mean that numbers with leading zeros are always smaller than those without? And is 002 < 02, then?

@PranavSenthilnathan
Member Author

/azp run runtime-ioslike,runtime-ioslikesimulator,runtime-maccatalyst

Azure Pipelines successfully started running 3 pipeline(s).

@PranavSenthilnathan
Member Author

> Thank you for considering the Apple mobile globalization :). Could you please create an issue for the missing support on iOS/..., linking the original issue and the findings that you mentioned in the PR description. I'll look into it later.

Created #109999

> I think we might have to temporarily disable the newly added tests for Apple mobile; otherwise we will start getting PlatformNotSupportedException for the numeric ordering tests. I started the Apple mobile CI test runs to see if that's the case.

Updated the PR to skip them. I introduced a new PlatformDetection property, IsNumericComparisonSupported, instead of reusing IsHybridGlobalizationOnApplePlatform, so it's easier to find and remove once you get the tests working.

> One more thing: in the PR description, there is an example for NLS:
>
> However, these numbers are compared as expected with unequal numbers, namely 1 < 02 < 2 < 03.
>
> Does that mean that numbers with leading zeros are always smaller than those without? And is 002 < 02, then?

Yes, if the numbers are actually equal, then in NLS the one with more leading zeros is considered less (it's basically a tiebreaker). This behavior is better for deterministic sorting of lists, but the downside is that hash tables won't consider these equal. I prefer ICU's behavior here (JS/wasm does the same) but I don't think there's much we can do about it.
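
The trade-off described here (deterministic sorting versus equality in hash tables) can be sketched in Python. This is a simplified model for plain ASCII digit strings, not the NLS or ICU implementation; `value_only_key` and `tiebreak_key` are hypothetical names.

```python
def value_only_key(s):
    # ICU-like model: "2", "02" and "002" all collapse to the same key.
    return int(s)

def tiebreak_key(s):
    # NLS-like model: equal values are ordered by leading zeros, more zeros first.
    zeros = len(s) - len(s.lstrip("0"))
    return (int(s), -zeros)

a = ["2", "02", "002", "1"]
b = ["002", "1", "2", "02"]

# The tiebreaker gives one canonical order regardless of input order.
print(sorted(a, key=tiebreak_key) == sorted(b, key=tiebreak_key))  # True
# Without it, equal keys keep their input order (Python's sort is stable).
print(sorted(a, key=value_only_key) == sorted(b, key=value_only_key))  # False

# Used as hash keys, the two models group the strings differently.
print(len({value_only_key(s) for s in a}))  # 2
print(len({tiebreak_key(s) for s in a}))    # 4
```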

@PranavSenthilnathan
Member Author

/azp run runtime-ioslike,runtime-ioslikesimulator,runtime-maccatalyst

Azure Pipelines successfully started running 3 pipeline(s).

Member

@matouskozak matouskozak left a comment

Thank you for the test fixes. I checked the Apple mobile CI, and everything that is failing looks unrelated and is tracked at #103472. Looking good from the Apple mobile side.

@ilonatommy do you want to check the WASM changes?

@ilonatommy
Member

ilonatommy commented Nov 25, 2024

> @ilonatommy do you want to check the WASM changes?

Sorry, I was away for a while. I read the WASM part and it looks good. Let me just run extra platforms with _HybridGlobalization suffix.

Edit: it did not get triggered, I found it under runtime-wasm

@ilonatommy
Member

/azp run runtime-extra-platforms

Azure Pipelines successfully started running 1 pipeline(s).

@ilonatommy
Member

/azp run runtime-wasm-libtests

No pipelines are associated with this pull request.

@ilonatommy
Member

/azp run runtime-wasm

Azure Pipelines successfully started running 1 pipeline(s).

Member

@ilonatommy ilonatommy left a comment

The HybridGlobalization failures are not related to this PR.

@PranavSenthilnathan PranavSenthilnathan merged commit 45bd118 into dotnet:main Nov 25, 2024
205 of 237 checks passed
mikelle-rogers pushed a commit to mikelle-rogers/runtime that referenced this pull request Dec 10, 2024
…t#109861)

Add numerical ordering option for string comparison operations
@github-actions github-actions bot locked and limited conversation to collaborators Dec 26, 2024
Development

Successfully merging this pull request may close these issues.

String comparer for sorting numeric strings logically
5 participants