Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit PNSE by using Invariant HashCode in HybridGlobalization #96354

Merged
merged 24 commits into from
Jan 19, 2024

Conversation

ilonatommy
Copy link
Member

@ilonatommy ilonatommy commented Dec 29, 2023

Fixes paritally #95921.

We used to throw PNSE for all HashCode functions. The consequence was throwing PNSE for quite a few APIs, some of them crucial and far-connected with Globalization. With this PR the following APIs do not throw PNSE always (see the update in the doc for details):

- System.Collections.Hashtable.Add 
- System.Collections.Hashtable.GetHash 
- System.Collections.CaseInsensitiveHashCodeProvider.GetHashCode 
- System.Collections.Specialized.NameValueCollection.Add 
- System.Collections.Specialized.NameValueCollection.Remove 
- System.Collections.Specialized.NameValueCollection.Get 
- System.Collections.Specialized.NameValueCollection.Set 
- System.Collections.Specialized.NameObjectCollectionBase.BaseAdd 
- System.Collections.Specialized.NameObjectCollectionBase.BaseGet 
- System.Collections.Specialized.NameObjectCollectionBase.BaseRemove 
- System.Collections.Specialized.NameObjectCollectionBase.BaseSet 
- System.Collections.Specialized.OrderedDictionary.Add 
- System.Collections.Generic.HashSet 
- System.Collections.Generic.Dictionary

JS does not expose native, locale-sensitive hashing that we could use, so we decided to utilize Invariant version of functions instead with some limitations. Invariant hashing algorithm differs from ICU hashing with not ignoring empty unicode chars (e.g. zero width joiner), the strings get cleaned from them first.

  • Invariant locale - always supported with this PR by using invariant hashing method. This approach differs from the ICU approach where invariant culture was getting data from ICU4C.
  • Non-invariant locales with CompareOptions.None - use invariant hashing algo.
  • Non-invariant locales with CompareOptions.IgnoreCase - use JS-native toLower() and then invariant hashing algo.
  • Non-invariant locales with any other CompareOptions e.g. CompareOptions.IgnoreCase | CompareOptions.IgnoreSymbols - keep throwing PNSE, as it was without this PR.

It will be documented in the HG doc, so far I did not find an example where the behavior of APIs that threw PNSE so far differs between ICU and HG because of this change.

Out of scope (for follow-up PRs):

  • performance tests for methods using HashCode and performance improvements
  • SortKey - investigate similar change

@ilonatommy ilonatommy self-assigned this Dec 29, 2023
@ghost
Copy link

ghost commented Dec 29, 2023

Tagging subscribers to this area: @dotnet/area-system-globalization
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #95921.

Author: ilonatommy
Assignees: ilonatommy
Labels:

arch-wasm, area-System.Globalization

Milestone: -

@dotnet dotnet deleted a comment from azure-pipelines bot Dec 29, 2023
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ilonatommy ilonatommy requested a review from matouskozak January 2, 2024 15:12
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ilonatommy ilonatommy marked this pull request as ready for review January 2, 2024 15:13
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

…ted for non-invariant cultures only with `IgnoreCase` or `None` options.
Copy link
Member

@pavelsavara pavelsavara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for patience with my feedback.

@ilonatommy ilonatommy changed the title Limit PNSE by using Invariant SortKey in HybridGlobalization Limit PNSE by using Invariant HashCode and SortKey in HybridGlobalization Jan 18, 2024
@ilonatommy ilonatommy changed the title Limit PNSE by using Invariant HashCode and SortKey in HybridGlobalization Limit PNSE by using Invariant HashCode in HybridGlobalization Jan 18, 2024
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ilonatommy
Copy link
Member Author

/azp run runtime-wasm-libtests

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ilonatommy
Copy link
Member Author

AOT lanes won't finish, they are generally broken. Failures are not related.

@ilonatommy ilonatommy merged commit 993b23f into dotnet:main Jan 19, 2024
181 of 193 checks passed
ilonatommy added a commit to ilonatommy/runtime that referenced this pull request Jan 22, 2024
tmds pushed a commit to tmds/runtime that referenced this pull request Jan 23, 2024
…otnet#96354)

* Unblock all tests.

* SortKey is Invariant now.

* Missing SortKey changes + fix HashCode.

* Typo

* Revert unbocking tests connected with CompareOptions PNSE.

* Hashing uses Invariant mode -these tests should be skipped.

* Add active issue.

* feedback

* Feedback

* Better documentation.

* Add new tests + sanitize string before invariant comparison.

* Comment + more cases.

* Clean CI.

* Missing change for clean CI commit.

* `SortKey` not supported for non-invariant cultures, `HashCode` supported for non-invariant cultures only with `IgnoreCase` or `None` options.

* Feedback.

* Fix build, add docs.

* Feedback @matouskozak @pavelsavara

* Added tests + fixed algo.

* Block failing tests for a follow-up PR.

* Add more details to PNSE.

* Feedback - correct comment
ilonatommy added a commit that referenced this pull request Jan 25, 2024
* Moving the tests, cleaning up.

* Removal.

* Rename: fix build.

* Fix order, apply feedback

* Missing changes.

* Fixed by #96354

* This test uses SortKey, should not be enabled.

* Nit

* These methods use `CompareOptions = IgnoreCase, IgnoreKanaType, IgnoreWidth`, should stay blocked.
@github-actions github-actions bot locked and limited conversation to collaborators Mar 12, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants