Limit PNSE by using default comparer in HybridGlobalization #96541

Closed
wants to merge 5 commits into from

ilonatommy
Member

@ilonatommy ilonatommy commented Jan 5, 2024

Improved version of #96354. Fixes #96400.
Changes:

  • Sets the default comparer to StringComparer.Ordinal for string-keyed collections that have access to GlobalizationMode (that is, those in System.Private.CoreLib):
    System.Collections.Generic.Dictionary, System.Collections.Generic.HashSet, System.Collections.Hashtable.
    Dictionary and HashSet are generic collections, so their key type is known at compile time. The collection constructor can decide whether the collection requires the default comparer (that is, whether its keys are strings).
    Hashtable is non-generic, so the key types are not known at compile time. Because of this, the comparer initialization logic was moved to Add(). If the first key is a string, we assume all the others will be strings too. Hashtable does not prevent users from adding keys of different types; keeping them the same type is only good practice. In HybridGlobalization, keys of different types are not supported: adding a non-string key to a collection whose comparer is a StringComparer throws ArgumentException(Arg_MustBeString).
  • For collections that do not belong to the System.Private.CoreLib assembly (that is:
    System.Collections.Specialized.OrderedDictionary, System.Collections.Specialized.NameValueCollection, System.Collections.Specialized.NameObjectCollectionBase) we cannot detect whether they are running in Hybrid Globalization mode. These collections can still be used if StringComparer.Ordinal or StringComparer.OrdinalIgnoreCase is passed to their constructors. The tests were changed to use the constructor with a comparer parameter for HG.
  • Changed the PNSE message to point to using the constructor with a StringComparer.Ordinal or StringComparer.OrdinalIgnoreCase parameter.
  • The classes System.Data.DataColumnCollection, Microsoft.VisualBasic.Collection, System.Net.Mail.MailAddress, and System.Collections.CaseInsensitiveHashCodeProvider (deprecated) have no constructor with a comparer parameter and are not in System.Private.CoreLib, so none of the above applies to them. They will stay unsupported.
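
As a minimal sketch of the workaround described for collections outside System.Private.CoreLib, the snippet below passes an ordinal comparer explicitly to the constructors. The class and method names (OrdinalComparerSketch, CreateOrderedDictionary, CreateNameValueCollection) are hypothetical and exist only for illustration; the constructors taking an IEqualityComparer are the existing public ones, and nothing here is taken from the PR's diff.

```csharp
using System;
using System.Collections.Specialized;

// Hypothetical helper class, for illustration only.
public static class OrdinalComparerSketch
{
    // OrderedDictionary with an explicit ordinal, case-insensitive comparer,
    // so no culture-aware hashing is requested.
    public static OrderedDictionary CreateOrderedDictionary()
    {
        var dictionary = new OrderedDictionary(StringComparer.OrdinalIgnoreCase);
        dictionary.Add("Key", 1);
        return dictionary;
    }

    // NameValueCollection with an explicit ordinal (case-sensitive) comparer.
    public static NameValueCollection CreateNameValueCollection()
    {
        var collection = new NameValueCollection(StringComparer.Ordinal);
        collection.Add("name", "value");
        return collection;
    }
}
```

With OrdinalIgnoreCase, a lookup for "KEY" finds the entry added as "Key"; with Ordinal, lookups are case-sensitive.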

@ghost

ghost commented Jan 5, 2024

Tagging subscribers to this area: @dotnet/area-system-globalization
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: ilonatommy
Assignees: ilonatommy
Labels:

arch-wasm, area-System.Globalization

Milestone: -

@@ -415,6 +416,17 @@ private uint InitHash(object key, int hashsize, out uint seed, out uint incr)
//
public virtual void Add(object key, object? value)
{
#if TARGET_BROWSER
if (_keycomparer == null)
Member

We need to move this logic to the constructor, because other members can modify the collection too, such as public virtual object? this[object key].

Member

Also please link the issue which explains motivations.

Member Author

How can we move it to the constructor when we don't know what type will be added to the Hashtable? If we add an int, it will throw because we set the comparer to StringComparer.

Member

Could somebody add the string "1" and then the int 2? That would fail anyway, right?

Member Author

No, the problem is that for Hashtable it's perfectly allowed, even if not recommended.

Member Author

Even if we assume that users don't mix types, the constructor does not know what types will be put into the non-generic collection.
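
To make the two points in this thread concrete, here is a small sketch of default Hashtable behavior on a standard .NET runtime (no Hybrid Globalization assumed): keys of different types can be mixed, and the indexer setter inserts entries without going through Add(). The class and method names are hypothetical.

```csharp
using System;
using System.Collections;

// Hypothetical helper class, for illustration only.
public static class HashtableMixedKeysSketch
{
    public static int CountMixedKeys()
    {
        var table = new Hashtable();       // default constructor, no comparer supplied
        table.Add("1", "string key");      // first key is a string
        table.Add(2, "int key");           // a key of another type is allowed
        table[3.0] = "double key";         // the indexer setter also inserts, bypassing Add()
        return table.Count;                // three entries were stored
    }
}
```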

public void IListedKeysPropertyCanUseCustomEqualityComparer()
{
var orderedDictionary = new OrderedDictionary(StringComparer.InvariantCultureIgnoreCase);
var orderedDictionary = PlatformDetection.IsHybridGlobalizationOnBrowser ?
Member

please link the issue which explains this

#if TARGET_BROWSER
if (GlobalizationMode.Hybrid)
{
_comparer = comparer ?? (IEqualityComparer<T>)(IEqualityComparer<string>)StringComparer.Ordinal;
Member

Also please link the issue which explains motivations.

#if TARGET_BROWSER
if (GlobalizationMode.Hybrid)
{
_comparer = comparer ?? (IEqualityComparer<TKey>)(IEqualityComparer<string>)StringComparer.Ordinal;
Member

Also please link the issue which explains motivations.

yield return new object[] { 10, null };
if (PlatformDetection.IsNotHybridGlobalizationOnBrowser)
{
yield return new object[] { 0, null };
Member

Why doesn't this work?

Member Author

@ilonatommy ilonatommy Jan 5, 2024

Because the comparer is null, the default one is used. In System.Collections.Specialized we have no means of checking whether the run is HybridGlobalization, so the responsibility lies with the user to use the correct constructor. Here the test uses the incorrect one.

@@ -72,7 +73,18 @@ public Dictionary(int capacity, IEqualityComparer<TKey>? comparer)
if (typeof(TKey) == typeof(string) &&
NonRandomizedStringEqualityComparer.GetStringComparer(_comparer!) is IEqualityComparer<string> stringComparer)
{
#if TARGET_BROWSER
Member

It sounds pretty broken to be changing behavior of generic collections when hybrid globalization is enabled. Can this be fixed in the string comparers instead, so that anybody using the string comparers gets the right behavior?

Member

I like it

Member Author

@ilonatommy ilonatommy Jan 5, 2024

I like the idea but so far the implementation of it that I have in my head is also messy. So we could go to

public abstract class StringComparer : IComparer, IEqualityComparer, IComparer<string?>, IEqualityComparer<string?>

and for each comparer option (InvariantCulture, InvariantCultureIgnoreCase, etc.) add a preprocessor directive, #if TARGET_BROWSER, that uses Ordinal or OrdinalIgnoreCase instead. However, this is also problematic:

  • All usages of these StringComparer types would change, even in the context of collections that are currently not affected by PNSE.
  • A user who calls StringComparer.CurrentCulture expects the current culture, not Ordinal (this is a public member). This can be documented, but it does not look good.

Or maybe I missed the point and you had a different idea, @jkotas?

Member

My understanding is that we are playing tradeoffs here. Hybrid is partial between ICU and InvariantCulture.

StringComparer.CurrentCulture expects a locale, but we can't do it because we don't have a hybrid hashcode, right?
So we could keep PNSE or we could do InvariantCulture, that's our tradeoff.

Or is Ordinal better? Why?

Member Author

@ilonatommy ilonatommy Jan 5, 2024

Because it does not use ICU, so we don't need a native SortKey implementation to keep it working as expected. The idea was: only Ordinal and OrdinalIgnoreCase can work in Hybrid Globalization, so always use them for collections that wanted other options and, thanks to that, keep these collections supported. If we "keep PNSE or we could do InvariantCulture" (InvariantCulture throws the same way as any other culture, because it uses the CompareInfo implementation), then we would just stick to the current behavior. We don't want that: we would like to make the collections listed in the description of this PR available to users.

Member Author

@ilonatommy ilonatommy Jan 5, 2024

I believe you're mixing InvariantCulture and InvariantGlobalization. "Hybrid is partial between ICU and InvariantCulture." -> "Hybrid is partial between ICU and InvariantGlobalization."

When we use InvariantCulture we still use the same paths in the code as for ICU.

Member

There are hundreds of different types of custom collections out there in the ecosystem, and many of them are affected the same way as the few collections that you are changing in this PR. It is not ok to break all these custom collections out there with hybrid globalization. It would be a major violation of our code portability and compatibility promise.

Member Author

@jkotas do you mean the other solution that I proposed is also invalid? If so, I am really out of ideas.

Member

Should we go back to the drawing board with the hybrid globalization then? Given the feature gaps in the hybrid globalization, it may make more sense to think about it as invariant globalization with a few extra culture aware APIs working.

Member Author

Due to limitations of the Web API, the Hybrid Globalization feature in the browser in fact turned out to be less promising than, e.g., the iOS implementation, where we were able to really maintain an ICU-like experience with Invariant-like size. However, the unsupported scope is not that large, and we list all the limitations in a doc (https://github.com/dotnet/runtime/blob/928ff3015d1936ca9985a0123754e13cbf47b237/docs/design/features/globalization-hybrid-mode.md). I do not believe a redesign would help with the issue we have with SortKey. We know we cannot support it natively (and loading it from ICU does not pay off at this point; we would lose all the benefits of removing collations). For now, we are brainstorming how to solve it some other way.

@ilonatommy ilonatommy closed this Jan 16, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Feb 16, 2024
Successfully merging this pull request may close these issues.

[browser] HybridGlobalization allows different hashes for strings that return true for Equals
3 participants