Improve performance of Dict{K,V} (~5%) by storing elements in pairs::Vector{Pair{K,V}} #44332
Conversation
This looks great! I think #38145 is probably the direction we want to move in the longer term, but free performance improvements are always great! |
What's the actual benchmark that you ran? Did you measure iterating over |
This is probably a good idea, since we can now store |
The remaining test failure is about precompilation only.
Master

typeof(dict) = Dict{Int64, Any}
SET 1.331 s, GET 1.040 s, GET! 1.153 s, ITER. KEYS 0.080 s, ITER. VALS 0.525 s
typeof(dict) = Dict{Any, Int64}
SET 2.706 s, GET 1.306 s, GET! 5.691 s, ITER. KEYS 0.536 s, ITER. VALS 0.080 s
typeof(dict) = Dict{Any, Any}
SET 2.557 s, GET 1.922 s, GET! 7.591 s, ITER. KEYS 0.533 s, ITER. VALS 0.558 s
Total time 27.608 s

PR

typeof(dict) = Dict{Int64, Any}
SET 0.976 s, GET 0.927 s, GET! 1.897 s, ITER. KEYS 0.086 s, ITER. VALS 0.533 s
typeof(dict) = Dict{Any, Int64}
SET 1.837 s, GET 2.029 s, GET! 5.541 s, ITER. KEYS 0.551 s, ITER. VALS 0.089 s
typeof(dict) = Dict{Any, Any}
SET 2.941 s, GET 1.988 s, GET! 6.506 s, ITER. KEYS 0.591 s, ITER. VALS 0.577 s
Total time 27.070 s |
Very interesting; we should look into that. |
The problem seems to be caused by extra allocations because of lines 386 to 387 in 8306858. |
Seems like someone accidentally wrote k=>v later in that function, which had the wrong type |
Thank you, fixed. I naively assumed this could be optimized out automatically. |
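As an aside, a minimal sketch of the allocation issue being discussed (the store! helper and its signature are hypothetical, not code from this PR): an untyped k => v builds Pair{typeof(k), typeof(v)}, which then has to be converted on assignment into a Vector{Pair{K,V}}.

# Hypothetical illustration, not dict.jl code: constructing the fully
# typed pair directly avoids an intermediate Pair of the wrong type
# (and the extra allocation when K or V is abstract).
function store!(pairs::Vector{Pair{K,V}}, i::Int, k, v) where {K,V}
    pairs[i] = Pair{K,V}(k, v)   # instead of `pairs[i] = k => v`
    return pairs
end

store!(Vector{Pair{Any,Int}}(undef, 1), 1, :a, 1)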
Can you post the updated benchmarks with concrete as well as abstract types? |
All results should be updated now. |
This is a great PR! I have a few comments and I think it would be good to have another review from someone more familiar with dict internals than me, but overall I think this is a very nice improvement.
  function getindex(h::Dict{K,V}, key) where V where K
      index = ht_keyindex(h, key)
-     @inbounds return (index < 0) ? throw(KeyError(key)) : h.vals[index]::V
+     @inbounds return (index < 0) ? throw(KeyError(key)) : h.pairs[index].second::V
I wonder why all these type annotations here were added in the first place. They probably don't hurt, but I also don't see why they'd be needed.
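To make the layout change in the diff above concrete, here is a minimal sketch assuming only the fields visible in the snippets (the real Base.Dict carries additional bookkeeping such as slots; SplitStorage and PairStorage are illustrative names, not types from the PR):

# Master: parallel vectors -- a successful lookup touches two separate
# heap arrays (one for the key comparison, one for the value).
struct SplitStorage{K,V}
    keys::Vector{K}
    vals::Vector{V}
end

# PR: a single vector of pairs -- the key and value of a slot are
# adjacent in memory, so a hit costs one random memory access.
struct PairStorage{K,V}
    pairs::Vector{Pair{K,V}}
end

getvalue(s::SplitStorage, i) = s.vals[i]
getvalue(s::PairStorage, i) = s.pairs[i].second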
Co-authored-by: Simeon Schaub <simeondavidschaub99@gmail.com>
Are any of the benchmarks here useful? https://github.com/JuliaCollections/DataStructures.jl/blob/master/benchmark/bench_heap.jl |
I'm closing this in favor of #44513. |
Reopening to run CI. I'll update benchmarks once I have some time. |
I've updated the evaluation. The speed-up is still about 5% for concrete types, but none for abstract types. Furthermore, Julia currently doesn't support packing of |
One thing that might be worth trying is storing 8 keys followed by 8 values; this would fix the alignment issues at least. |
I've updated the PR against master, since I found it beneficial for small collections:

julia> @btime Set(x) setup=(x=rand()); # PR
  62.848 ns (3 allocations: 336 bytes)

julia> @btime Set(x) setup=(x=rand()); # master
  87.330 ns (4 allocations: 400 bytes)

julia> @btime Dict(x => x) setup=(x=rand()); # PR
  64.738 ns (3 allocations: 480 bytes)

julia> @btime Dict(x => x) setup=(x=rand()); # master
  90.961 ns (4 allocations: 544 bytes)

julia> @btime Base.ImmutableDict(x => x) setup=(x=rand()); # only as a ground truth
  10.300 ns (2 allocations: 64 bytes)
|
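A plausible reading of these numbers, assuming the layout change is the only difference: merging the separate keys and vals vectors into one pairs vector saves one array allocation per container, which would match the drop from 4 to 3 allocations (and the 64-byte difference) in each pair of measurements.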
@nanosoldier |
Your package evaluation job has completed - possible new issues were detected. A full report can be found here. |
(you can leave out the |
Your package evaluation job has completed - possible new issues were detected. A full report can be found here. |
@nanosoldier |
Your package evaluation job has completed - possible new issues were detected. A full report can be found here. |
@oscardssmith Nice idea, but I'm not sure how to implement that in pure Julia. The closest way I can imagine is using:

julia> v = Pair{NTuple{8,Int64}, NTuple{8,Int8}}[]
Pair{NTuple{8, Int64}, NTuple{8, Int8}}[]

julia> resize!(v, 4)
4-element Vector{Pair{NTuple{8, Int64}, NTuple{8, Int8}}}:
 (0, 0, 0, 0, 0, 0, 0, 0) => (0, 0, 0, 0, 0, 0, 0, 0)
 (0, 0, 0, 0, 0, 0, 0, 0) => (0, 0, 0, 0, 0, 0, 0, 0)
 (0, 0, 0, 0, 0, 0, 0, 0) => (0, 0, 0, 0, 0, 0, 0, 0)
 (0, 0, 0, 0, 0, 0, 0, 0) => (0, 0, 0, 0, 0, 0, 0, 0)

julia> isbitstype(Pair{NTuple{8,Int64}, NTuple{8,Int8}})
true |
Good point. @JeffBezanson this is another great example of why we should have a simple buffer type. |
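For illustration, a hedged sketch of reads against such a blocked layout (the KVBlock alias, blocked_getindex, and the fixed block size of 8 are assumptions, not code from this thread):

# Hypothetical: element i lives in block (i - 1) ÷ 8 + 1 at offset
# (i - 1) % 8 + 1 within the isbits Pair-of-NTuples element type
# shown above.
const KVBlock{K,V} = Pair{NTuple{8,K},NTuple{8,V}}

function blocked_getindex(v::Vector{KVBlock{K,V}}, i::Int) where {K,V}
    b, o = divrem(i - 1, 8)
    blk = @inbounds v[b + 1]
    return blk.first[o + 1] => blk.second[o + 1]   # key => value
end

Writes are the awkward part: NTuples are immutable, so updating one slot means rebuilding an 8-element tuple and storing the whole block back, which is where a mutable buffer type would help.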
This PR has been approved but was never merged; it has various conflicts now. Also apparently the performance benefit in the current version is minimal to non-existent. Thus I think it is OK to close this. Feel free to re-open should I be mistaken, or just submit a new PR with a pointer to this one (I believe this will increase its chance of being "seen" by reviewers). |
Updated on March 28, 2022: when testing on multiple sizes, the difference is less significant, or zero.

I have noticed that Dict performance can be improved by storing keys and values together in a single vector of pairs. It can provide up to about a 5% performance improvement for large dictionaries, because it limits the number of random memory accesses. This PR is a kind of proof of concept; there is no change to the algorithm. Do you think it's worth it?

Although the PR is considered to be non-breaking, it may break code that relies on the internal representation of Dict.

Master:

PR:

I've measured the total elapsed time and total allocated memory over large dictionaries with various sizes.

Master:

PR:

Testing code
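The actual testing script lives in the collapsed section above; below is a rough reconstruction of a harness that would produce the SET/GET/GET!/ITER timings quoted earlier (the bench name, sizes, key distribution, and output format are guesses):

# Hedged sketch, not the original script.
function bench(::Type{K}, ::Type{V}; n::Int = 10^7) where {K,V}
    ks = rand(Int, n)                # random keys; duplicates are fine
    dict = Dict{K,V}()
    @show typeof(dict)
    tset = @elapsed for k in ks; dict[k] = k; end        # SET
    tget = @elapsed for k in ks; dict[k]; end            # GET
    tgb  = @elapsed for k in ks; get!(dict, k, k); end   # GET!
    tik  = @elapsed for k in keys(dict); end             # ITER. KEYS
    tiv  = @elapsed for v in values(dict); end           # ITER. VALS
    println("SET $tset s, GET $tget s, GET! $tgb s, ",
            "ITER. KEYS $tik s, ITER. VALS $tiv s")
end

bench(Int, Any); bench(Any, Int); bench(Any, Any)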