Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential Equality and Hashing Issues in RedactorProvider with DataClassificationSet Keys in FrozenDictionary #4932

Closed
damianhorna opened this issue Feb 9, 2024 · 0 comments · Fixed by #4933
Labels
bug This issue describes a behavior which is not expected - a bug. work in progress 🚧

Comments

@damianhorna
Copy link
Contributor

damianhorna commented Feb 9, 2024

Description

The DataClassificationSet class in the .NET Extensions library implements custom Equals and GetHashCode methods to support value-based equality. However, the current GetHashCode implementation may not ensure consistent hash codes for logically identical instances.

See: https://github.com/dotnet/extensions/blob/main/src/Libraries/Microsoft.Extensions.Compliance.Abstractions/Classification/DataClassificationSet.cs#L86

This inconsistency has implications for classes like RedactorProvider, which relies on FrozenDictionary<DataClassificationSet, Redactor> for mapping classifications to redactors. If two identical DataClassificationSet instances (by content) produce different hash codes, it undermines the reliability of key-based lookups in RedactorProvider, potentially leading to incorrect behavior such as failing to retrieve the correct Redactor for a given classification set.

Refer to RedactorProvider implementation for Redactor lookup:
https://github.com/dotnet/extensions/blob/main/src/Libraries/Microsoft.Extensions.Compliance.Redaction/RedactorProvider.cs#L28

Reproduction Steps

  1. Define multiple DataClassificationSet instances with identical contents but as separate instances.
  2. Initialize a RedactorProvider instance with mappings that include one of the DataClassificationSet instances as a key.
  3. Attempt to retrieve a Redactor using RedactorProvider.GetRedactor with a logically identical DataClassificationSet instance (not the same instance used in initialization). HashSet<DataClassification> will have its hash code calculated based on the HashSet instance's runtime identity, not the contents of the HashSet. Thus, two DataClassificationSet instances containing identical items may return different hash codes.
  4. Observe potential inconsistencies in the retrieval outcome due to differing hash codes or failed equality checks.
// Identical DataClassificationSet instances (by content, different instances)
var classificationSet1 = new DataClassificationSet(new DataClassification("TaxonomyName", "Value"));
var classificationSet2 = new DataClassificationSet(new DataClassification("TaxonomyName", "Value"));

// NOTE: classificationSet1 and classificationSet2 will have different hash codes

// Retrieve a redactor based on the classification set
var redactor = redactorProvider.GetRedactor(classificationSet1);
var redactor = redactorProvider.GetRedactor(classificationSet2;

// Above call to redactorProvider will have inconsistent results

Expected behavior

The RedactorProvider.GetRedactor method should consistently return the correct Redactor for any DataClassificationSet instance that is logically identical to a key in the underlying FrozenDictionary, ensuring that DataClassificationSet's value-based equality is effectively utilized in key-based lookups.

Actual behavior

There may be instances where RedactorProvider.GetRedactor fails to return the correct Redactor for logically identical DataClassificationSet instances. This failure can occur due to the current GetHashCode implementation in DataClassificationSet not guaranteeing consistent hash codes for identical set contents, leading to potential lookup failures in FrozenDictionary.

Regression?

No response

Known Workarounds

No response

Configuration

No response

Other information

Suggested fix:
Revise the GetHashCode and possibly Equals implementations in DataClassificationSet to ensure they account for the contents of the set in a consistent and order-independent manner. This might involve aggregating the hash codes of contained DataClassification instances in a way that is unaffected by their order. Additionally, ensure that RedactorProvider and any similar components relying on DataClassificationSet instances as keys in dictionaries or sets properly handle these instances to maintain lookup reliability and correctness.

@damianhorna damianhorna added bug This issue describes a behavior which is not expected - a bug. untriaged labels Feb 9, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Mar 16, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug This issue describes a behavior which is not expected - a bug. work in progress 🚧
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant