-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Swiss tables design for Dict
#44513
Swiss tables design for Dict
#44513
Conversation
Co-authored-by: Simeon Schaub <simeondavidschaub99@gmail.com>
Co-authored-by: Simeon Schaub <simeondavidschaub99@gmail.com>
@nanosoldier |
@KristofferC I'm sorry I forget to update |
Would be great to hear how this compares in implementation and performance to https://juliacollections.github.io/DataStructures.jl/latest/swiss_dict/ |
Something went wrong when running your job:
Unfortunately, the logs could not be uploaded. |
Does this PR generate similar code to the LLVM-calls from the JuliaCollections dict? If so, that's really cool! |
Thank you for the comment. I've added |
@oscardssmith Unfortunately the PR is not that cool. :-) The idea was to keep it as simple as possible and in pure Julia. It just takes advantage of iterating over |
base/dict.jl
Outdated
h.count += 1 | ||
h.age += 1 | ||
if index < h.idxfloor | ||
h.idxfloor = index | ||
end | ||
|
||
sz = length(h.keys) | ||
sz = length(h.pairs) | ||
# Rehash now if necessary | ||
if h.ndel >= ((3*sz)>>2) || h.count*3 > sz*2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this heuristic be updated? As I understand, most of the purpose of a swiss table is to allow higher capacity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the question. Generally yes, but here the primary motivation is different. The PR limits the number of isequal
calls, and thus limits the number of allocations for abstract types and pressure on GC. These coeficients should be updated to limit memory consumption. This fine-tuning needs much more benchmarking on various sizes. There will always be some tradeoff between speed and used memory. I propose to move such a discussion into a separate PR. Meanwhile, I'll try to prepare some microbenchmarks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I've tested so far, we would need to use SIMD (as in DataStructures.jl) to increase the capacity without a significant performance drop (given by unpredictable branching). The good news is, that metadata in slots
is prepared for that. Nevertheless, I'm not sure, if such a low-level llvm code should go to base because it will be harder to read and check ... and it will be platform specific.
I think this is basically ready to merge. Can you add a benchmark for iterating over a |
One other optimization that we should either include here or in a followup PR is that for |
I've added a benchmark for iterating over values. There is only one extra Line 182 in 99e24d7
Btw, if we increase the density in future, the iteration will become faster (and closer to |
I guess the breakage of this PR is the same as #44332, but lets check: @nanosoldier |
I like the idea of storing a vector of pairs, but it occurs to me this can have a large cost in alignment padding, for example in a |
This is a design choice and I'm NOT the right one in this conversation to decide. I'll benchmark such combinations. Now, I see the following options:
|
Assuming benchmarks for option 2 look good, that's probably what I would want. We definitely wouldn't wait to do |
Your package evaluation job has completed - possible new issues were detected. A full report can be found here. |
Personally, I like keeping fairly orthogonal things in different PRs so to me, only focusing on the Swiss here and having it be merged relatively quickly is advantageous over bundling it together with the |
I agree, so I've focused the PR only on the Swiss design and updated the comparison. Further, I've changed Lines 370 to 371 in 8fd9617
|
Great and thanks for putting in the extra effort of keeping things backward compatible! IMO this is mergeable but maybe @JeffBezanson wants to look it over one last time. |
I've gone throw the code for the last time and reverted a single line ( |
This PR introduces simplified Swiss tables design for
Dict
described atThis extends my previous PR #44332 by using Swiss tables design described athttps://abseil.io/about/design/swisstables#swiss-tables-design-notes
The performance gain starts to be really interesting, especially for abstract types. I created a separate PR because the Swiss Tables can be implemented independently on #44332 but probably with less performance gian. Changes related only to the Swiss tables are in the commit Swiss tables based hashing.
The main idea is to store the 7 highest bits of the hash in the slots. They can be utilized to test if the key may be equal. This limits
isequal
calls almost to a single one per operation (using a high-quality hashing function).CPU: Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz
All times are in seconds.
Master (total time 52.320 s)
PR (total time 35.070 s)
PR together with #44332 (total time 33.552 s)
SwissDict (total time 43.458 s) from DataStructures.jl
Testing code