Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up KeyAnalyzer for substring based frozen collections #89689

Closed
wants to merge 7 commits into from

Commits on Jul 29, 2023

  1. Move the IsLeft/IsRight decision out of the loop

    The slicing is really only changed once per Count, so move the
    IsLeft-dependent logic into `Slicer` method and eliminate the `GetSpan` delegate.
    
    Changed to also pass the already-computed `set` of unique substrings to the `CreateAnalysisResults` method, so we don't recompute the slices twice. In order than either the set or the original `uniqueStrings` can be passed, swapped that argument for the `Analyze` method to take the `uniqueStrings` as a `string[]` (which it already is at all call-sites).
    IDisposable committed Jul 29, 2023
    Configuration menu
    Copy the full SHA
    677d59d View commit details
    Browse the repository at this point in the history
  2. Revert one optimization

    Subtle bug in that the entire string is being placed in the set, not the span.
    IDisposable committed Jul 29, 2023
    Configuration menu
    Copy the full SHA
    c8c8801 View commit details
    Browse the repository at this point in the history
  3. Only count the input strings once.

    Since we are working with the same set of input strings in each strategy there's no reason to take the length every time we make an attempt (per count, both left and right justified).
    Hoist the calculation of the acceptable number of collisions out to the top, do it once, and pass that number into the `HasSufficientUniquenessFactor` method for it to (locally) use up.
    IDisposable committed Jul 29, 2023
    Configuration menu
    Copy the full SHA
    a168767 View commit details
    Browse the repository at this point in the history
  4. Back to string[] for HasSufficientUniquenessFactor

    Benchmarks ever so slightly better.
    IDisposable committed Jul 29, 2023
    Configuration menu
    Copy the full SHA
    860ebb3 View commit details
    Browse the repository at this point in the history
  5. Back to ReadOnlySpan

    Looks like the overhead of IEnumerable<string> is not worth the savings for the benchmark test data. Perhaps it would matter less if we were freezing more strings, but not likely
    IDisposable committed Jul 29, 2023
    Configuration menu
    Copy the full SHA
    c1dd5d9 View commit details
    Browse the repository at this point in the history

Commits on Jul 30, 2023

  1. Use trait-based classes to get everything inlined

    This will cost one extra `HashSet<string>` and one extra `SubstringComparer` to be allocated, but might make the code run faster.
    
    Use the GSW strategy for virtual flattening
    IDisposable committed Jul 30, 2023
    Configuration menu
    Copy the full SHA
    7060c23 View commit details
    Browse the repository at this point in the history

Commits on Aug 1, 2023

  1. Code review cleanup

    Replace ISubstringComparer with ISubstringEqualityComparer as that's more indicative of the interface it's exposing, and rename everything derived from that.
    Fixed the implementation of FullStringEqualityComparer to not actually use the slice ReadOnlySpan(s) in the Equals and GetHashCode specializations.
    Added comments for the SubstringEquality classes
    Renamed the files for the specialization of the comparers.
    Fixed the unit tests under debug builds.
    Extended the test suite data for the KeyAnalyzer tests to exercise using the data in FrozenFromKnownValuesTests.
    IDisposable committed Aug 1, 2023
    Configuration menu
    Copy the full SHA
    23d028f View commit details
    Browse the repository at this point in the history