-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add an additional length check to FrozenDictionary and FrozenSet #92546
Add an additional length check to FrozenDictionary and FrozenSet #92546
Conversation
On construction of the collection, we compute an unsigned long which is effectively 64 boolean flags, each representing the presence of a key string of a particular length (mod 64). When reading from the collection, we can exit early if the key being tested does not map to a bit which has been switched on by the original compuation. I believe this has similarities to how Bloom Filters work. This adds a relatively small cost on creation of the collection as small cost to each read operation. However it can speed up reads with certain data patterns especially when the difference between the maximum and minimum key length is large but there aren't many different lengths.
Tagging subscribers to this area: @dotnet/area-system-collections Issue DetailsOn construction of the collection, we compute an unsigned long which is effectively 64 boolean flags, each representing the presence of a key string of a particular length (mod 64) in the set of keys. Tested via the following benchmark using System.Collections.Frozen;
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
BenchmarkSwitcher
.FromAssembly(typeof(Program).Assembly)
.Run(new[] { "-i" });
public class FrozenSetBenchmarks
{
private FrozenSet<string> _hashSet = new HashSet<string>
{
"cddafs", "cd9660", "iso", "isofs", "iso9660", "fuseiso", "fuseiso9660", "udf", "umview-mod-umfuseiso9660", "aafs", "adfs", "affs", "anoninode", "anon-inode FS", "apfs", "balloon-kvm-fs", "bdevfs", "befs", "bfs", "bootfs", "bpf_fs",
"btrfs", "btrfs_test", "coh", "daxfs", "drvfs", "efivarfs", "efs", "exfat", "exofs", "ext", "ext2", "ext2_old", "ext3", "ext2/ext3", "ext4", "ext4dev", "f2fs", "fat", "fuseext2", "fusefat", "hfs", "hfs+", "hfsplus", "hfsx", "hostfs",
"hpfs", "inodefs", "inotifyfs", "jbd", "jbd2", "jffs", "jffs2", "jfs", "lofs", "logfs", "lxfs", "minix (30 char.)", "minix v2 (30 char.)", "minix v2", "minix", "minix_old", "minix2", "minix2v2", "minix2 v2", "minix3", "mlfs", "msdos",
"nilfs", "nsfs", "ntfs", "ntfs-3g", "ocfs2", "omfs", "overlay", "overlayfs", "pstorefs", "qnx4", "qnx6", "reiserfs", "rpc_pipefs", "sffs", "smackfs", "squashfs", "swap", "sysv", "sysv2", "sysv4", "tracefs", "ubifs", "ufs", "ufscigam",
"ufs2", "umsdos", "umview-mod-umfuseext2", "v9fs", "vagrant", "vboxfs", "vxfs", "vxfs_olt", "vzfs", "wslfs", "xenix", "xfs", "xia", "xiafs", "xmount", "zfs", "zfs-fuse", "zsmallocfs", "9p", "acfs", "afp", "afpfs", "afs", "aufs", "autofs",
"autofs4", "beaglefs", "ceph", "cifs", "coda", "coherent", "curlftpfs", "davfs2", "dlm", "eCryptfs", "fhgfs", "flickrfs", "ftp", "fuse", "fuseblk", "fusedav", "fusesmb", "gfsgfs2", "gfs/gfs2", "gfs2", "glusterfs-client",
"gmailfs", "gpfs", "ibrix", "k-afs", "kafs", "kbfuse", "ltspfs", "lustre", "ncp", "ncpfs", "nfs", "nfs4", "nfsd", "novell", "obexfs", "panfs", "prl_fs", "s3ql", "samba", "smb", "smb2", "smbfs", "snfs", "sshfs", "vmhgfs", "webdav", "wikipediafs",
"xenfs", "anon_inode", "anon_inodefs", "aptfs", "avfs", "bdev", "binfmt_misc", "cgroup", "cgroupfs", "cgroup2fs", "configfs", "cpuset", "cramfs", "cramfs-wend", "cryptkeeper", "ctfs", "debugfs", "dev", "devfs", "devpts", "devtmpfs", "encfs", "fd",
"fdesc", "fuse.gvfsd-fuse", "fusectl", "futexfs", "hugetlbfs", "libpam-encfs", "ibpam-mount", "mntfs", "mqueue", "mtpfs", "mythtvfs", "objfs", "openprom", "openpromfs", "pipefs", "plptools", "proc", "pstore", "pytagsfs", "ramfs", "rofs", "romfs",
"rootfs", "securityfs", "selinux", "selinuxfs", "sharefs", "sockfs", "sysfs", "tmpfs", "udev", "usbdev", "usbdevfs", "gphotofs", "sdcardfs", "usbfs", "usbdevice", "vfat",
}.ToFrozenSet();
// No entry has length 17 so the optimization occurs (and hash code is not calculated)
[Benchmark]
public void Caught_By_LengthFilter() => _hashSet.Contains("some_drive_name_2");
// As some entries have length 14 then the length filter does not help
[Benchmark]
public void Missed_By_LengthFilter() => _hashSet.Contains("somedrivename2");
} Results before (control)
Results after
|
@stephentoub @adamsitnik I would greatly appreciate feedback on this PR. Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea, but we need to make sure that such misses will be common in real life scenarios (with the micro benchmarks from dotnet/performance repo they are not)
src/libraries/System.Collections.Immutable/src/System/Collections/Frozen/String/KeyAnalyzer.cs
Outdated
Show resolved
Hide resolved
8216613
to
39a44df
Compare
Side note: I've changed from Edit: |
I think the JIT already does it for unsigned integers. |
Thank you for your review and for pointing me to the resource on how to properly run the benchmarks. Additional TestTo obtain an interesting discussion I added a new microbenchmark designed to showcase the existing length bounds as well as the proposed length filter in this PR. This can be seen in my results as the Count = 20 case. The implementation is a bimodal distribution of lengths and a testing ("notfound") array uniformly distributed over a larger bounds.
Benchmark Results: TryGetValue_*_FrozenDictionary
DefaultFrozenDictionary
LengthBucketsFrozenDictionary
SingleCharFrozenDictionary
SubstringFrozenDictionary
DiscussionTwo tests have obviously triggered the new check. The
[This next section has been edited I looked at the wrong data in the original comment 6 hours ago]
Note however that while the existing branch changed depending on the calculation of the hash code, the proposed filter is constant (approx 17ns in both of the above benchmarks) as it skips the calculation entirely. Thus I think we can safely conclude that there is a very small penalty for adding this filter (far less than a nanosecond per method call) but can result in a big gain (skipping computation of a hash code) when it is of use. In terms of how often this will be used in real-world cases, unfortunately I have no concrete statistics. Is there a way we can obtain any data here besides intuition? How was the cost/benefit for the min/max filter made? |
This seems like a good cost/benefit to me. (But I'm not area owner) Should any of your tests go into dotnet/performance? We don't aim to be exhaustive there, so it would only be if there was a clear gap in coverage. |
https://github.com/dotnet/performance/pull/3432/files#r1367705797 Added as a suggestion to @adamsitnik 's PR which importantly adds coverage for the |
One possible realistic scenario where this PR might come in useful is in aspnetcore's DictionaryJumpTable.cs (used in AOT) One can imagine some routes like the following
In the above example the frozen dictionary would contain I got the following results by constructing an example data showcasing the above
|
Cc @BrennanConroy for that. |
It is probably an opinionated change which isn't right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @andrewjsaid
Thank you for applying my suggestion! I've re run all the frozen collection benchmarks we have with --launchCount 3
argument (each benchmark run in 3 dedicated processes, helps to lower side effects of memory randomization and other similar factors).
Perf_LengthBucketsFrozenDictionary
The TryGetValue_True_FrozenDictionary
and TryGetValue_False_FrozenDictionary
show no difference (ratio 1.00 or 0.99), which confirms that we can trust the results: there were no changes made to this particular code path and performance is the same.
But the creation of frozen dictionaries has regressed up to 7% for small inputs. I know that my suggestion is the reason for that.
BenchmarkDotNet v0.13.7-nightly.20230717.35, Windows 11 (10.0.22621.2715/22H2/2022Update/SunValley2)
AMD Ryzen Threadripper PRO 3945WX 12-Cores, 1 CPU, 24 logical and 12 physical cores
.NET SDK 9.0.100-alpha.1.23531.2
[Host] : .NET 8.0.0 (8.0.23.47906), X64 RyuJIT AVX2
PR : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
main : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
PowerPlanMode=00000000-0000-0000-0000-000000000000 Arguments=/p:EnableUnsafeBinaryFormatterSerialization=true IterationTime=250.0000 ms
LaunchCount=3 MaxIterationCount=20 MinIterationCount=15
WarmupCount=1
Method | Job | Count | ItemsPerBucket | Mean | Ratio | Allocated |
---|---|---|---|---|---|---|
ToFrozenDictionary | PR | 10 | 1 | 234.39 ns | 1.06 | 488 B |
ToFrozenDictionary | main | 10 | 1 | 220.88 ns | 1.00 | 488 B |
TryGetValue_True_FrozenDictionary | PR | 10 | 1 | 22.31 ns | 1.00 | - |
TryGetValue_True_FrozenDictionary | main | 10 | 1 | 22.20 ns | 1.00 | - |
TryGetValue_False_FrozenDictionary | PR | 10 | 1 | 12.49 ns | 1.00 | - |
TryGetValue_False_FrozenDictionary | main | 10 | 1 | 12.49 ns | 1.00 | - |
ToFrozenDictionary | PR | 10 | 5 | 231.01 ns | 1.02 | 328 B |
ToFrozenDictionary | main | 10 | 5 | 225.83 ns | 1.00 | 328 B |
TryGetValue_True_FrozenDictionary | PR | 10 | 5 | 80.77 ns | 1.02 | - |
TryGetValue_True_FrozenDictionary | main | 10 | 5 | 78.98 ns | 1.00 | - |
TryGetValue_False_FrozenDictionary | PR | 10 | 5 | 12.47 ns | 1.00 | - |
TryGetValue_False_FrozenDictionary | main | 10 | 5 | 12.47 ns | 1.00 | - |
ToFrozenDictionary | PR | 100 | 1 | 1,110.68 ns | 1.06 | 3728 B |
ToFrozenDictionary | main | 100 | 1 | 1,045.52 ns | 1.00 | 3728 B |
TryGetValue_True_FrozenDictionary | PR | 100 | 1 | 206.33 ns | 1.00 | - |
TryGetValue_True_FrozenDictionary | main | 100 | 1 | 206.37 ns | 1.00 | - |
TryGetValue_False_FrozenDictionary | PR | 100 | 1 | 102.75 ns | 1.00 | - |
TryGetValue_False_FrozenDictionary | main | 100 | 1 | 102.44 ns | 1.00 | - |
ToFrozenDictionary | PR | 100 | 5 | 1,131.40 ns | 1.07 | 2128 B |
ToFrozenDictionary | main | 100 | 5 | 1,055.68 ns | 1.00 | 2128 B |
TryGetValue_True_FrozenDictionary | PR | 100 | 5 | 902.55 ns | 1.00 | - |
TryGetValue_True_FrozenDictionary | main | 100 | 5 | 903.75 ns | 1.00 | - |
TryGetValue_False_FrozenDictionary | PR | 100 | 5 | 102.68 ns | 1.00 | - |
TryGetValue_False_FrozenDictionary | main | 100 | 5 | 102.40 ns | 1.00 | - |
ToFrozenDictionary | PR | 1000 | 1 | 9,514.12 ns | 1.04 | 36128 B |
ToFrozenDictionary | main | 1000 | 1 | 9,178.91 ns | 1.00 | 36128 B |
TryGetValue_True_FrozenDictionary | PR | 1000 | 1 | 2,362.08 ns | 1.00 | - |
TryGetValue_True_FrozenDictionary | main | 1000 | 1 | 2,361.07 ns | 1.00 | - |
TryGetValue_False_FrozenDictionary | PR | 1000 | 1 | 1,253.38 ns | 1.00 | - |
TryGetValue_False_FrozenDictionary | main | 1000 | 1 | 1,251.53 ns | 1.00 | - |
ToFrozenDictionary | PR | 1000 | 5 | 9,501.18 ns | 1.02 | 20128 B |
ToFrozenDictionary | main | 1000 | 5 | 9,360.78 ns | 1.00 | 20128 B |
TryGetValue_True_FrozenDictionary | PR | 1000 | 5 | 8,324.55 ns | 1.00 | - |
TryGetValue_True_FrozenDictionary | main | 1000 | 5 | 8,301.55 ns | 1.00 | - |
TryGetValue_False_FrozenDictionary | PR | 1000 | 5 | 1,172.01 ns | 1.00 | - |
TryGetValue_False_FrozenDictionary | main | 1000 | 5 | 1,168.92 ns | 1.00 | - |
ToFrozenDictionary | PR | 10000 | 1 | 239,837.56 ns | 0.99 | 360129 B |
ToFrozenDictionary | main | 10000 | 1 | 241,305.99 ns | 1.00 | 360129 B |
TryGetValue_True_FrozenDictionary | PR | 10000 | 1 | 63,182.66 ns | 0.99 | - |
TryGetValue_True_FrozenDictionary | main | 10000 | 1 | 63,709.90 ns | 1.00 | - |
TryGetValue_False_FrozenDictionary | PR | 10000 | 1 | 60,050.12 ns | 0.99 | - |
TryGetValue_False_FrozenDictionary | main | 10000 | 1 | 60,422.24 ns | 1.00 | - |
ToFrozenDictionary | PR | 10000 | 5 | 180,362.34 ns | 1.02 | 200129 B |
ToFrozenDictionary | main | 10000 | 5 | 176,522.39 ns | 1.00 | 200129 B |
TryGetValue_True_FrozenDictionary | PR | 10000 | 5 | 122,381.94 ns | 1.00 | - |
TryGetValue_True_FrozenDictionary | main | 10000 | 5 | 122,064.66 ns | 1.00 | - |
TryGetValue_False_FrozenDictionary | PR | 10000 | 5 | 49,376.62 ns | 1.00 | - |
TryGetValue_False_FrozenDictionary | main | 10000 | 5 | 49,377.70 ns | 1.00 | - |
Perf_SingleCharFrozenDictionary
The creation time has not changed, but TryGetValue
has regressed by 3 to 5%. I know that it would have been different if the inputs for TryGetValue_False
were strings with more than a single character.
BenchmarkDotNet v0.13.7-nightly.20230717.35, Windows 11 (10.0.22621.2715/22H2/2022Update/SunValley2)
AMD Ryzen Threadripper PRO 3945WX 12-Cores, 1 CPU, 24 logical and 12 physical cores
.NET SDK 9.0.100-alpha.1.23531.2
[Host] : .NET 8.0.0 (8.0.23.47906), X64 RyuJIT AVX2
PR : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
main : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
PowerPlanMode=00000000-0000-0000-0000-000000000000 Arguments=/p:EnableUnsafeBinaryFormatterSerialization=true IterationTime=250.0000 ms
LaunchCount=3 MaxIterationCount=20 MinIterationCount=15
WarmupCount=1
Method | Job | Count | Mean | Ratio | Allocated |
---|---|---|---|---|---|
ToFrozenDictionary | PR | 10 | 688.71 ns | 1.00 | 1440 B |
ToFrozenDictionary | main | 10 | 687.80 ns | 1.00 | 1432 B |
TryGetValue_True_FrozenDictionary | PR | 10 | 32.89 ns | 1.04 | - |
TryGetValue_True_FrozenDictionary | main | 10 | 31.76 ns | 1.00 | - |
TryGetValue_False_FrozenDictionary | PR | 10 | 26.43 ns | 1.04 | - |
TryGetValue_False_FrozenDictionary | main | 10 | 25.36 ns | 1.00 | - |
ToFrozenDictionary | PR | 100 | 4,190.57 ns | 0.96 | 9864 B |
ToFrozenDictionary | main | 100 | 4,343.67 ns | 1.00 | 9856 B |
TryGetValue_True_FrozenDictionary | PR | 100 | 321.98 ns | 1.03 | - |
TryGetValue_True_FrozenDictionary | main | 100 | 313.56 ns | 1.00 | - |
TryGetValue_False_FrozenDictionary | PR | 100 | 258.90 ns | 1.04 | - |
TryGetValue_False_FrozenDictionary | main | 100 | 248.27 ns | 1.00 | - |
ToFrozenDictionary | PR | 1000 | 43,094.51 ns | 1.05 | 94872 B |
ToFrozenDictionary | main | 1000 | 41,252.63 ns | 1.00 | 94864 B |
TryGetValue_True_FrozenDictionary | PR | 1000 | 3,176.23 ns | 1.04 | - |
TryGetValue_True_FrozenDictionary | main | 1000 | 3,049.37 ns | 1.00 | - |
TryGetValue_False_FrozenDictionary | PR | 1000 | 2,491.99 ns | 1.05 | - |
TryGetValue_False_FrozenDictionary | main | 1000 | 2,368.69 ns | 1.00 | - |
ToFrozenDictionary | PR | 10000 | 721,583.83 ns | 1.00 | 892490 B |
ToFrozenDictionary | main | 10000 | 718,446.34 ns | 1.00 | 892482 B |
TryGetValue_True_FrozenDictionary | PR | 10000 | 31,959.12 ns | 1.04 | - |
TryGetValue_True_FrozenDictionary | main | 10000 | 30,790.82 ns | 1.00 | - |
TryGetValue_False_FrozenDictionary | PR | 10000 | 25,065.95 ns | 1.05 | - |
TryGetValue_False_FrozenDictionary | main | 10000 | 23,867.58 ns | 1.00 | - |
Perf_SubstringFrozenDictionary
The creation time has not changed or is within the range of error, but TryGetValue
has regressed by 2 to 3%.
BenchmarkDotNet v0.13.7-nightly.20230717.35, Windows 11 (10.0.22621.2715/22H2/2022Update/SunValley2)
AMD Ryzen Threadripper PRO 3945WX 12-Cores, 1 CPU, 24 logical and 12 physical cores
.NET SDK 9.0.100-alpha.1.23531.2
[Host] : .NET 8.0.0 (8.0.23.47906), X64 RyuJIT AVX2
PR : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
main : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
PowerPlanMode=00000000-0000-0000-0000-000000000000 Arguments=/p:EnableUnsafeBinaryFormatterSerialization=true IterationTime=250.0000 ms
LaunchCount=3 MaxIterationCount=20 MinIterationCount=15
WarmupCount=1
Method | Job | Count | Mean | Ratio | Allocated |
---|---|---|---|---|---|
ToFrozenDictionary | PR | 10 | 1,336.32 ns | 1.00 | 1728 B |
ToFrozenDictionary | main | 10 | 1,331.19 ns | 1.00 | 1720 B |
TryGetValue_True_FrozenDictionary | PR | 10 | 56.75 ns | 1.03 | - |
TryGetValue_True_FrozenDictionary | main | 10 | 55.21 ns | 1.00 | - |
TryGetValue_False_FrozenDictionary | PR | 10 | 50.66 ns | 1.03 | - |
TryGetValue_False_FrozenDictionary | main | 10 | 49.07 ns | 1.00 | - |
ToFrozenDictionary | PR | 100 | 9,128.97 ns | 1.01 | 12120 B |
ToFrozenDictionary | main | 100 | 9,057.09 ns | 1.00 | 12112 B |
TryGetValue_True_FrozenDictionary | PR | 100 | 578.00 ns | 1.02 | - |
TryGetValue_True_FrozenDictionary | main | 100 | 566.76 ns | 1.00 | - |
TryGetValue_False_FrozenDictionary | PR | 100 | 509.78 ns | 1.03 | - |
TryGetValue_False_FrozenDictionary | main | 100 | 493.32 ns | 1.00 | - |
ToFrozenDictionary | PR | 1000 | 66,301.37 ns | 1.01 | 94872 B |
ToFrozenDictionary | main | 1000 | 65,870.51 ns | 1.00 | 94864 B |
TryGetValue_True_FrozenDictionary | PR | 1000 | 6,177.06 ns | 1.03 | - |
TryGetValue_True_FrozenDictionary | main | 1000 | 6,016.53 ns | 1.00 | - |
TryGetValue_False_FrozenDictionary | PR | 1000 | 5,543.15 ns | 1.03 | - |
TryGetValue_False_FrozenDictionary | main | 1000 | 5,375.22 ns | 1.00 | - |
ToFrozenDictionary | PR | 10000 | 1,265,175.44 ns | 1.00 | 926146 B |
ToFrozenDictionary | main | 10000 | 1,266,817.49 ns | 1.00 | 926138 B |
TryGetValue_True_FrozenDictionary | PR | 10000 | 86,572.00 ns | 1.03 | - |
TryGetValue_True_FrozenDictionary | main | 10000 | 84,168.82 ns | 1.00 | - |
TryGetValue_False_FrozenDictionary | PR | 10000 | 65,729.32 ns | 1.02 | - |
TryGetValue_False_FrozenDictionary | main | 10000 | 64,286.50 ns | 1.00 | - |
Perf_DefaultFrozenDictionary
The creation time has not changed or is within the range of error. When it comes to TryGetValue
it depends: it has regressed by 1 to 7% for some cases, while improved 21% for other.
BenchmarkDotNet v0.13.7-nightly.20230717.35, Windows 11 (10.0.22621.2715/22H2/2022Update/SunValley2)
AMD Ryzen Threadripper PRO 3945WX 12-Cores, 1 CPU, 24 logical and 12 physical cores
.NET SDK 9.0.100-alpha.1.23531.2
[Host] : .NET 8.0.0 (8.0.23.47906), X64 RyuJIT AVX2
PR : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
main : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
PowerPlanMode=00000000-0000-0000-0000-000000000000 Arguments=/p:EnableUnsafeBinaryFormatterSerialization=true IterationTime=250.0000 ms
LaunchCount=3 MaxIterationCount=20 MinIterationCount=15
WarmupCount=1
Method | Job | Count | Mean | Ratio | Allocated |
---|---|---|---|---|---|
ToFrozenDictionary | PR | 10 | 718.87 ns | 1.01 | 1440 B |
ToFrozenDictionary | main | 10 | 710.13 ns | 1.00 | 1432 B |
TryGetValue_True_FrozenDictionary | PR | 10 | 33.10 ns | 1.04 | - |
TryGetValue_True_FrozenDictionary | main | 10 | 31.79 ns | 1.00 | - |
TryGetValue_False_FrozenDictionary | PR | 10 | 29.78 ns | 1.03 | - |
TryGetValue_False_FrozenDictionary | main | 10 | 29.06 ns | 1.00 | - |
ToFrozenDictionary | PR | 100 | 10,669.42 ns | 0.99 | 18568 B |
ToFrozenDictionary | main | 100 | 10,823.51 ns | 1.00 | 18560 B |
TryGetValue_True_FrozenDictionary | PR | 100 | 1,034.05 ns | 1.01 | - |
TryGetValue_True_FrozenDictionary | main | 100 | 1,022.50 ns | 1.00 | - |
TryGetValue_False_FrozenDictionary | PR | 100 | 705.22 ns | 0.79 | - |
TryGetValue_False_FrozenDictionary | main | 100 | 894.03 ns | 1.00 | - |
ToFrozenDictionary | PR | 1000 | 58,759.00 ns | 0.99 | 98616 B |
ToFrozenDictionary | main | 1000 | 59,278.50 ns | 1.00 | 98608 B |
TryGetValue_True_FrozenDictionary | PR | 1000 | 10,935.30 ns | 1.00 | - |
TryGetValue_True_FrozenDictionary | main | 1000 | 10,902.89 ns | 1.00 | - |
TryGetValue_False_FrozenDictionary | PR | 1000 | 10,322.62 ns | 1.01 | - |
TryGetValue_False_FrozenDictionary | main | 1000 | 10,228.25 ns | 1.00 | - |
ToFrozenDictionary | PR | 10000 | 984,770.98 ns | 1.00 | 926141 B |
ToFrozenDictionary | main | 10000 | 980,378.44 ns | 1.00 | 926133 B |
TryGetValue_True_FrozenDictionary | PR | 10000 | 238,425.79 ns | 1.01 | 1 B |
TryGetValue_True_FrozenDictionary | main | 10000 | 235,672.89 ns | 1.00 | 1 B |
TryGetValue_False_FrozenDictionary | PR | 10000 | 203,839.76 ns | 1.07 | 1 B |
TryGetValue_False_FrozenDictionary | main | 10000 | 191,346.29 ns | 1.00 | 1 B |
Conclusion
The main idea behind Frozen
collection optimizations is finding the right strategy.
We already have a dedicated strategy optimized for length checks (LengthBucketsFrozenDictionary
).
The SubstringFrozenDictionary
strategy performs a LOT of work to focus only on part of the strings when doing the hashcode computation.
I believe that these two strategies will be used most of the time for ASP.NET paths and similar examples. I don't believe that we should change them.
SingleCharFrozenDictionary
will be used in very specific examples and it should not regress.
However, the fallback strategy (DefaultFrozenDictionary
) cleary shows that nice gains are possible.
Would it be possible to change the code in a way that the optimization is used only in the default strategy?
Thank you for your contribution!
(cc @stephentoub to verify if he agrees with my conclusions)
Thank you for your feedback. I take your point; with the minimal cost of hashing a single char or a few chars, there's not much benefit to be had from filtering early when a few instructions later we will be testing equality which early on compares string lengths anyway. I've implemented the change to only apply when no hashable substring was found by KeyAnalyzer. Benchmark results below showing what you would expect. Luckily the concrete classes already had been optimized for customized codegen based on overridden methods Equals and GetHashCode. Perf_DefaultFrozenDictionaryImprovement has remained (~20% in the relevant case).
Perf_LengthBucketsFrozenDictionaryStill no change (just noise)
Perf_SingleCharFrozenDictionaryOptimization no longer applies. No filter but also no regression!
Perf_SubstringFrozenDictionarySame as Perf_SingleCharFrozenDictionary
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, big thanks for your contribution @andrewjsaid !
On construction of the collection, we compute an unsigned long which is effectively 64 boolean flags, each representing the presence of a key string of a particular length (mod 64) in the set of keys.
When reading from the collection, we can exit early if the key being tested does not map to a bit which has been switched on by the original computation. I believe this has similarities to how Bloom Filters work.
This adds a relatively small cost on creation of the collection as well as a possibly negligible cost to each read operation. However it can speed up reads with certain data patterns especially when the difference between the maximum and minimum key length is large but there aren't many different lengths.
Tested via the following benchmark
Results before (control)
Results after