Unoptimal hash code combine in TypeHashingAlgorithms.ComputeMethodHashCode #103070
Comments
Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas |
At which level are you replacing with |
The problem is not the speed of the hashing itself. That's done only once per method. The problem is the collisions caused by matching hashes, and the simple XOR is extremely bad at avoiding them. The profile above is with |
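To illustrate why a plain XOR combine collides so readily, here is a minimal Python sketch (not the runtime's C# code). It assumes the method hash is formed by XORing an owning-type hash with a name hash, which is how the comment above describes it. The `fnv1a32` helper, the `mixed_combine` function, and the sample names are hypothetical stand-ins chosen only for the demonstration.

```python
MASK32 = 0xFFFFFFFF


def fnv1a32(text: str) -> int:
    """Deterministic 32-bit FNV-1a hash, used here as a stand-in string hash."""
    h = 0x811C9DC5
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * 0x01000193) & MASK32
    return h


def xor_combine(type_hash: int, name_hash: int) -> int:
    """Plain XOR combine: symmetric, and equal inputs cancel to zero."""
    return (type_hash ^ name_hash) & MASK32


def rotl32(value: int, shift: int) -> int:
    """Rotate a 32-bit value left by `shift` bits."""
    return ((value << shift) | (value >> (32 - shift))) & MASK32


def mixed_combine(type_hash: int, name_hash: int) -> int:
    """Hypothetical order-sensitive combine: rotate, add, then XOR."""
    return ((rotl32(type_hash, 5) + type_hash) ^ name_hash) & MASK32


names = ["Alpha", "Beta", "Gamma", "Delta"]
# Every (type, method) pairing of the four names: 16 distinct methods.
pairs = [(t, m) for t in names for m in names]

xor_hashes = {xor_combine(fnv1a32(t), fnv1a32(m)) for t, m in pairs}
mixed_hashes = {mixed_combine(fnv1a32(t), fnv1a32(m)) for t, m in pairs}

print("methods:", len(pairs))
print("distinct XOR hashes:", len(xor_hashes))
print("distinct mixed hashes:", len(mixed_hashes))
```

With XOR, swapping the two inputs gives the same result and any pair with equal inputs hashes to zero, so 16 methods can produce at most 7 distinct values here. An order-sensitive combine does not have either weakness.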
NativeAOT file format depends on the stable hashcodes for these things. It should be fine to improve this combining function, but it needs to be stable. |
Huh, that explains why it didn't work. I guess I should file a documentation issue since the current documentation doesn't mention this. |
It is, but in the class remarks. https://learn.microsoft.com/en-us/dotnet/api/system.hashcode?view=net-8.0#remarks |
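The stability requirement can be sketched in a few lines of Python. This is an illustration under assumptions, not the NativeAOT implementation: `stable_combine` and `make_seeded_combine` are hypothetical helpers, and the two fixed seeds stand in for the per-process random seed that the System.HashCode remarks describe.

```python
MASK32 = 0xFFFFFFFF


def stable_combine(h1: int, h2: int) -> int:
    """Deterministic combine: the same inputs give the same output in every process."""
    return (h1 * 31 + h2) & MASK32


def make_seeded_combine(seed: int):
    """Return a combine function that folds a per-process seed into the result."""
    def combine(h1: int, h2: int) -> int:
        return stable_combine(stable_combine(seed, h1), h2)
    return combine


# Hypothetical seeds for two separate processes: the compiler and the compiled app.
build_time_combine = make_seeded_combine(0x1234)
run_time_combine = make_seeded_combine(0xABCD)

type_hash, name_hash = 0xCAFE, 0xBEEF

# A hash written into the output file at build time...
persisted = build_time_combine(type_hash, name_hash)
# ...no longer matches what the running program recomputes with its own seed.
recomputed = run_time_combine(type_hash, name_hash)
print("seeded hashes match:", persisted == recomputed)

# A deterministic combine yields the same value on both sides.
print("stable hashes match:",
      stable_combine(type_hash, name_hash) == stable_combine(type_hash, name_hash))
```

Any replacement for the XOR combine therefore has to be a fixed function of its inputs, which rules out a seeded combiner for hashes that are persisted in the compiled output.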
@vcsjones Thanks! I just searched for "HashCode.Combine" directly. I went a bit more down the rabbit hole analyzing this and I am not necessarily sure that changing the hash function is a good enough fix. In this particular instance we are running into a huge instance of I don't have the numbers off-hand, but in the pessimistic case where the hashtable is nearly full, and you search for an element that is NOT in the hashtable (yet), you can end up searching linearly up to 60% (resize threshold) of the table before finding out that the element is not there.
Overall, this data structure with these parameters doesn't scale up well. We can likely tweak some parameters of the
Both of these algorithms may be difficult to implement under the required concurrency constraints. That said, my benchmarks show the bottlenecks overwhelmingly happening in parts of the execution which are single-threaded. I am not necessarily good at algorithms, so any additional input is welcome. |
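The pessimistic case described above, a failed lookup that walks a long run of occupied slots, can be reproduced with a small simulation. This sketch assumes an open-addressed table with linear probing at roughly 60% load. The capacity, key count, and the way the clustered hashes are generated are hypothetical choices for illustration, not measurements from the compiler.

```python
import random

CAPACITY = 1024
KEY_COUNT = 600  # just under a 60% load factor


def build_table(home_slots):
    """Insert one key per home slot, resolving collisions by linear probing."""
    table = [False] * CAPACITY
    for home in home_slots:
        slot = home % CAPACITY
        while table[slot]:
            slot = (slot + 1) % CAPACITY
        table[slot] = True
    return table


def probes_for_miss(table, home):
    """Count slots inspected before a lookup for an absent key reaches an empty slot."""
    probes = 1
    slot = home % CAPACITY
    while table[slot]:
        slot = (slot + 1) % CAPACITY
        probes += 1
    return probes


def average_miss_cost(table):
    """Average failed-lookup cost over every possible home slot."""
    return sum(probes_for_miss(table, s) for s in range(CAPACITY)) / CAPACITY


rng = random.Random(0)
# Well-distributed hashes: home slots spread across the whole table.
spread_table = build_table(rng.randrange(CAPACITY) for _ in range(KEY_COUNT))
# Poorly distributed hashes: every key lands in the first 64 slots,
# so the keys form one contiguous run of 600 occupied slots.
clustered_table = build_table(i % 64 for i in range(KEY_COUNT))

print("average miss cost, spread hashes:   ", average_miss_cost(spread_table))
print("average miss cost, clustered hashes:", average_miss_cost(clustered_table))
print("worst miss, clustered hashes:       ", probes_for_miss(clustered_table, 0))
```

In the clustered case a failed lookup that starts at slot 0 inspects all 600 occupied slots plus the empty one that ends the search, which matches the "search linearly up to 60% of the table" behaviour. With well-spread hashes the same load factor stays cheap, which is consistent with the reply that follows.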
This can only happen when the hashcodes are poorly distributed or when you are really unlucky. I think fixing the hashcodes should take care of this problem. |
They are definitely poorly distributed. There's no question about that. It may not be the only issue though and I don't have enough data to confirm/refute that yet. I started tracking the collision numbers on Add (on a fairly large app build):
For scale, the regular |
I may have found one of the root causes. WinForms generates a large number of |
Nit: This threshold is used to switch to more expensive randomized string hash code and only for string keys. It is done for security reasons (protection against DoS attacks). |
A few other notable collisions (same hash, but also same type and name):
First column is the number of such |
For runtime/src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/DependencyAnalysis/NodeFactory.cs Lines 922 to 923 in fb1fe7f
There's a TODO in the code:
runtime/src/coreclr/tools/Common/TypeSystem/Common/TypeHashingAlgorithms.cs
Lines 229 to 234 in b438888
Notably, this comes up as a sore spot in my profiles:
Changing to HashCode.Combine yields more than 20% fewer hash collisions in this method alone:
The compilation speed improvement is quite noticeable.
Unfortunately, there seems to be some dependency on the hash combining algorithm somewhere in the code because this produces an executable that fails to run at runtime: