Speed up KeyAnalyzer for substring based frozen collections #89689

Closed
wants to merge 7 commits

Conversation

@IDisposable (Contributor) commented Jul 30, 2023

Building on PR #88516, I tried making everything explicitly inlineable and sealed to get the JIT to eliminate any conditional jumps in the hot loop, by teasing out the various comparer options into trait-like classes that mix in to produce sealed classes where all the logic is conditional-free (down to the internals of the String.AsSpan() methods themselves). This required making some potentially hard-to-read changes, but the essence is that we now construct a comparer and hash set for the from-the-left attempt, and then lazily construct another comparer and matching hash set only if we need one for the from-the-right attempt.
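
As a rough illustration of the shape (hypothetical names and ordinal-only comparison, not this PR's exact code), a left-justified comparer might look like this, with everything sealed and marked for inlining so the hot loop has no virtual dispatch or branching on slicing direction:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Sketch only: a sealed comparer whose slicing policy is fixed at the type level, so
// Equals/GetHashCode can be inlined with no IsLeft-style branching in the hot loop.
internal sealed class LeftJustifiedSubstringComparer : IEqualityComparer<string>
{
    public int Index; // offset from the start of the string
    public int Count; // number of characters to compare

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private ReadOnlySpan<char> Slice(string s) => s.AsSpan(Index, Count);

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public bool Equals(string? x, string? y) => Slice(x!).SequenceEqual(Slice(y!));

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public int GetHashCode(string s) => string.GetHashCode(Slice(s), StringComparison.Ordinal);
}
```

A HashSet<string> constructed with such a comparer de-duplicates the input strings by their candidate substring, which is what the uniqueness test described below relies on.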

Since in the vast majority of cases we're going to spend the bulk of the execution time iterating through the uniqueStrings we're presented with, the overhead of having another comparer/collection is comparatively low.

To ensure we don't bloat memory, I moved the point where we clear the set to right before we return false from HasSufficientUniquenessFactor, since we know we're going to get called again for a different index/count. This means that both HashSet<string>s spend their lifetimes empty except while an attempt is being tested (we could clear them on success as well, but the old code didn't bother).
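
A minimal sketch of that shape, assuming a set built with one of the substring comparers and a collision budget computed by the caller (hypothetical, not this PR's exact code):

```csharp
using System;
using System.Collections.Generic;

internal static class UniquenessCheck
{
    // Sketch: 'set' uses a substring comparer, so adding the full string de-duplicates
    // by the candidate slice. The collision budget is passed by value and used up
    // locally, and the set is cleared right before the early false return so it stays
    // empty between attempts.
    public static bool HasSufficientUniquenessFactor(
        HashSet<string> set, ReadOnlySpan<string> uniqueStrings, int acceptableNonUniqueCount)
    {
        foreach (string s in uniqueStrings)
        {
            if (!set.Add(s) && --acceptableNonUniqueCount < 0)
            {
                set.Clear(); // keep the set empty while waiting for the next index/count attempt
                return false;
            }
        }

        return true;
    }
}
```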

The resulting SubstringComparer classes could be useful elsewhere, but are currently internal to the System.Collections.Immutable assembly. I suspect we could/should pass them out in the AnalysisResults and use the chosen substring extractor directly, since it already encapsulates the offset, length, and ignore/honor-case choice, but that would be another pass. If we do that, I would like to add some tests for those classes first.

I also added some more tests to the KeyAnalyzer tests to verify the newly exposed behavior.

The other changes imported from PR #88516 here are:

  • Eliminate the inner TryUseSubstring because we can just early-return the calculated results as we build them
  • Hoist the calculation of acceptableNonUniqueCount out to the top level since it never changes; we pass it into the HasSufficientUniquenessFactor method for it to "use up" internally (passed by value, so it's unchanged at the call site)
  • Eliminated the delegate ReadOnlySpan<char> GetSpan and its use, which helps reduce dynamic-dispatch overhead in the CreateAnalysisResults method
  • Eliminated the IsLeft field of the SubstringComparer since we can tell by the Index being negative that we're doing right-justified slicing (and documented that on the class); see the sketch after this list
  • Changed the logic managing the Index and Count on the comparer for right-justified substrings.
  • Added [MethodImpl(MethodImplOptions.AggressiveInlining)] to the Equals and GetHashCode overrides.
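
For illustration, here's a hypothetical sketch (again, not the exact code in this PR) of how the right-justified comparer can use a negative Index in place of an IsLeft flag, with the same aggressive inlining as the left-justified sketch above:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Sketch only: a negative Index is interpreted as an offset back from the end of the
// string, so no separate IsLeft flag is needed to distinguish the two slicing modes.
internal sealed class RightJustifiedSubstringComparer : IEqualityComparer<string>
{
    public int Index; // negative: offset from the end of the string
    public int Count; // number of characters to compare

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private ReadOnlySpan<char> Slice(string s) => s.AsSpan(s.Length + Index, Count);

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public bool Equals(string? x, string? y) => Slice(x!).SequenceEqual(Slice(y!));

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public int GetHashCode(string s) => string.GetHashCode(Slice(s), StringComparison.Ordinal);
}
```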

The slicing really only changes once per Count, so move the IsLeft-dependent logic into the `Slicer` method and eliminate the `GetSpan` delegate.

Changed to also pass the already-computed `set` of unique substrings to the `CreateAnalysisResults` method so we don't compute the slices twice. So that either the set or the original `uniqueStrings` can be passed, changed the `Analyze` method to take `uniqueStrings` as a `string[]` (which it already is at all call sites).
Subtle bug in that the entire string is being placed in the set, not the span.
Since we are working with the same set of input strings in each strategy, there's no reason to take the length every time we make an attempt (per count, both left- and right-justified).
Hoist the calculation of the acceptable number of collisions out to the top, do it once, and pass that number into the `HasSufficientUniquenessFactor` method for it to (locally) use up.
Benchmarks ever so slightly better.
Looks like the overhead of IEnumerable<string> is not worth the savings for the benchmark test data. Perhaps it would matter less if we were freezing more strings, but that's not likely.
@ghost added the community-contribution label (Indicates that the PR has been added by a community member) Jul 30, 2023
@ghost commented Jul 30, 2023

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

Issue Details

null

Author: IDisposable
Assignees: -
Labels:

area-System.Collections, community-contribution

Milestone: -

@IDisposable (Contributor, Author) commented Jul 30, 2023

Performance tests.

BenchmarkDotNet=v0.13.2.2052-nightly, OS=Windows 11 (10.0.22631.2115)
Intel Core i7-10875H CPU 2.30GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK=8.0.100-preview.7.23322.33
  [Host]     : .NET 8.0.0 (8.0.23.32106), X64 RyuJIT AVX2
  Job-SVXVNR : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-CESATQ : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-YVAYLV : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:EnableUnsafeBinaryFormatterSerialization=true  IterationTime=250.0000 ms  
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1  
| Method | Job | Toolchain | Count | Mean | Error | StdDev | Median | Min | Max | Ratio | MannWhitney(2ms) | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ToFrozenDictionary | Job-SVXVNR | Trait Based | 10 | 1.233 μs | 0.0240 μs | 0.0247 μs | 1.241 μs | 1.183 μs | 1.279 μs | 0.99 | Same | 0.02 | 0.2038 | - | - | 1.68 KB | 1.00 |
| ToFrozenDictionary | Job-CESATQ | Baseline | 10 | 1.250 μs | 0.0246 μs | 0.0242 μs | 1.257 μs | 1.210 μs | 1.281 μs | 1.00 | Base | 0.00 | 0.2056 | - | - | 1.68 KB | 1.00 |
| ToFrozenDictionary | Job-YVAYLV | Faster Freeze | 10 | 1.225 μs | 0.0244 μs | 0.0280 μs | 1.215 μs | 1.190 μs | 1.260 μs | 0.98 | Same | 0.03 | 0.2031 | - | - | 1.67 KB | 1.00 |
| ToFrozenDictionary | Job-SVXVNR | Trait Based | 100 | 7.263 μs | 0.1453 μs | 0.1554 μs | 7.286 μs | 7.035 μs | 7.590 μs | 0.99 | Same | 0.03 | 1.4245 | 0.0297 | - | 11.83 KB | 1.00 |
| ToFrozenDictionary | Job-CESATQ | Baseline | 100 | 7.307 μs | 0.1418 μs | 0.1393 μs | 7.366 μs | 7.068 μs | 7.471 μs | 1.00 | Base | 0.00 | 1.4462 | 0.0284 | - | 11.83 KB | 1.00 |
| ToFrozenDictionary | Job-YVAYLV | Faster Freeze | 100 | 7.461 μs | 0.1212 μs | 0.1134 μs | 7.504 μs | 7.253 μs | 7.637 μs | 1.02 | Same | 0.02 | 1.4284 | 0.0292 | - | 11.82 KB | 1.00 |
| ToFrozenDictionary | Job-SVXVNR | Trait Based | 1000 | 68.360 μs | 1.0448 μs | 0.9773 μs | 68.455 μs | 66.576 μs | 69.596 μs | 0.98 | Same | 0.02 | 11.1702 | 2.6596 | - | 92.64 KB | 1.00 |
| ToFrozenDictionary | Job-CESATQ | Baseline | 1000 | 69.919 μs | 0.8861 μs | 0.7399 μs | 70.078 μs | 68.216 μs | 70.834 μs | 1.00 | Base | 0.00 | 11.1607 | 2.5112 | - | 92.64 KB | 1.00 |
| ToFrozenDictionary | Job-YVAYLV | Faster Freeze | 1000 | 67.473 μs | 1.1315 μs | 1.1620 μs | 67.719 μs | 65.659 μs | 69.484 μs | 0.97 | Same | 0.02 | 11.0759 | 2.6371 | - | 92.63 KB | 1.00 |
| ToFrozenDictionary | Job-SVXVNR | Trait Based | 10000 | 1,104.651 μs | 4.3610 μs | 3.6416 μs | 1,104.166 μs | 1,098.515 μs | 1,112.397 μs | 1.00 | Same | 0.01 | 148.4375 | 148.4375 | 148.4375 | 904.43 KB | 1.00 |
| ToFrozenDictionary | Job-CESATQ | Baseline | 10000 | 1,108.193 μs | 10.2913 μs | 8.0347 μs | 1,109.215 μs | 1,087.775 μs | 1,123.516 μs | 1.00 | Base | 0.00 | 145.8333 | 145.8333 | 145.8333 | 904.43 KB | 1.00 |
| ToFrozenDictionary | Job-YVAYLV | Faster Freeze | 10000 | 1,093.550 μs | 12.1788 μs | 10.1698 μs | 1,094.617 μs | 1,064.160 μs | 1,107.952 μs | 0.99 | Same | 0.01 | 145.8333 | 145.8333 | 145.8333 | 904.42 KB | 1.00 |

This will cost one extra `HashSet<string>` and one extra `SubstringComparer` to be allocated, but might make the code run faster.
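
To illustrate the trade-off, here's a hypothetical sketch (reusing the comparer types sketched earlier, not this PR's exact code) where the right-justified pair is only allocated on first use, so purely left-sliceable inputs never pay for it:

```csharp
using System.Collections.Generic;

// Sketch: the left-justified comparer/set are created eagerly; the right-justified pair
// is only allocated if a from-the-right attempt is actually needed.
internal sealed class SubstringAnalysisState
{
    public readonly LeftJustifiedSubstringComparer LeftComparer = new();
    public readonly HashSet<string> LeftSet;

    private RightJustifiedSubstringComparer? _rightComparer;
    private HashSet<string>? _rightSet;

    public SubstringAnalysisState() => LeftSet = new HashSet<string>(LeftComparer);

    public RightJustifiedSubstringComparer RightComparer => _rightComparer ??= new();
    public HashSet<string> RightSet => _rightSet ??= new HashSet<string>(RightComparer);
}
```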

Use the GSW strategy for virtual flattening
@IDisposable (Contributor, Author)

This was a fantastic learning experience... This gained, at most, 3% performance without adversely impacting the allocation patterns, and it helped me learn how to apply the GSW pattern in other places where we want to eliminate virtual and delegate overheads.

@IDisposable marked this pull request as ready for review July 30, 2023 12:09
@danmoseley (Member)

What is the GSW pattern in this context? Maybe I missed it.

@IDisposable (Contributor, Author)

> What is the GSW pattern in this context?

Using a GenericSpecializedWrapper to "pass in" the explicit implementation details as a struct type parameter, which allows the compiler to devirtualize all the interface methods. I saw it in the various frozen set implementations.
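
For context, a minimal illustration of the idea (names are mine, not the runtime's): the implementation detail is supplied as a struct type parameter, so calls through it resolve against a concrete value type and the JIT can devirtualize and inline them instead of dispatching through the interface.

```csharp
using System;

internal interface ISlicer
{
    ReadOnlySpan<char> Slice(string s, int index, int count);
}

// Each "specialized wrapper" is a struct, so the generic code below gets a distinct,
// devirtualized instantiation per slicer rather than an interface dispatch per call.
internal readonly struct LeftSlicer : ISlicer
{
    public ReadOnlySpan<char> Slice(string s, int index, int count) => s.AsSpan(index, count);
}

internal readonly struct RightSlicer : ISlicer
{
    public ReadOnlySpan<char> Slice(string s, int index, int count) => s.AsSpan(s.Length + index, count);
}

internal static class SubstringEquality<TSlicer> where TSlicer : struct, ISlicer
{
    public static bool AreEqual(string x, string y, int index, int count) =>
        default(TSlicer).Slice(x, index, count)
            .SequenceEqual(default(TSlicer).Slice(y, index, count));
}
```

Usage: `SubstringEquality<LeftSlicer>.AreEqual("frozen", "frost", 0, 3)` compares "fro" to "fro" and returns true, with the slicing call inlined rather than dispatched through the interface.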

Replace ISubstringComparer with ISubstringEqualityComparer, as that's more indicative of the interface being exposed, and rename everything derived from it.
Fixed the implementation of FullStringEqualityComparer so its Equals and GetHashCode specializations don't actually use the sliced ReadOnlySpans.
Added comments for the SubstringEquality classes.
Renamed the files for the comparer specializations.
Fixed the unit tests under debug builds.
Extended the test suite data for the KeyAnalyzer tests to exercise the data in FrozenFromKnownValuesTests.
@IDisposable (Contributor, Author)

I've done another pass to clean up the documentation and make it a little clearer.

@IDisposable (Contributor, Author)

Superseded by PR #89863

@IDisposable closed this Aug 2, 2023
@ghost locked as resolved and limited conversation to collaborators Sep 1, 2023