Swiss tables design for `Dict` #44513

petvana · 2022-03-08T10:32:23Z

This PR introduces a simplified Swiss tables design for Dict, described at
~~This extends my previous PR #44332 by using Swiss tables design described at~~
https://abseil.io/about/design/swisstables#swiss-tables-design-notes

The performance gain starts to be really interesting, especially for abstract types. I created a separate PR because the Swiss tables design can be implemented independently of #44332, but probably with a smaller performance gain. Changes related only to the Swiss tables are in the commit "Swiss tables based hashing".

The main idea is to store the 7 highest bits of the hash in the slots. They can be used to test whether a key may be equal before comparing the keys themselves. With a high-quality hash function, this limits isequal calls to almost a single one per operation.
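As a minimal sketch of this idea (not the PR's actual code), the 7-bit tag can be extracted as shown here. The names `tag7` and `maybe_equal` are hypothetical, and the choice to set the top bit so that `0x00` can mean "empty slot" is an assumption based on the convention adopted later in this thread.

```julia
# Hypothetical helper, not the function from the PR.
# Take the 7 highest bits of the hash and set the top bit of the byte,
# so a stored tag is never 0x00 (assumed here to mark an empty slot).
tag7(h::UInt) = ((h >> (8 * sizeof(UInt) - 7)) % UInt8) | 0x80

# A full isequal comparison is only worth doing when the stored byte matches.
maybe_equal(slot::UInt8, h::UInt) = slot == tag7(h)

h = hash("swiss")
tag = tag7(h)
```

Comparing one byte is much cheaper than calling `isequal` on keys of an abstract type, which is where the benchmark differences above are largest.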

CPU: Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz
All times are in seconds.

Master (total time 52.320 s)

	Type	SET	GET	GET!empty	GET!full	ITERATE
1	Dict{Int64, Int64}	0.702	0.445	0.718	0.507	0.048
2	Dict{Any, Int64}	5.155	1.242	4.303	1.52	0.048
3	Dict{Int64, Any}	1.412	1.345	2.546	1.765	1.342
4	Dict{Any, Any}	5.801	3.472	6.117	3.755	0.934
5	Dict{String, Int64}	3.321	1.373	2.915	1.485	0.049

PR (total time 35.070 s)

	Type	SET	GET	GET!empty	GET!full	ITERATE
1	Dict{Int64, Int64}	0.637	0.381	0.638	0.486	0.049
2	Dict{Any, Int64}	2.33	0.532	2.85	0.684	0.057
3	Dict{Int64, Any}	1.571	1.156	2.388	1.258	0.885
4	Dict{Any, Any}	4.239	1.213	5.671	1.245	0.888
5	Dict{String, Int64}	1.991	0.805	1.953	1.104	0.056

PR together with #44332 (total time 33.552 s)

	Type	SET	GET	GET!empty	GET!full	ITERATE
1	Dict{Int64, Int64}	0.597	0.356	0.59	0.444	0.051
2	Dict{Any, Int64}	2.788	0.506	2.278	0.592	0.053
3	Dict{Int64, Any}	0.773	1.113	2.376	1.091	1.66
4	Dict{Any, Any}	3.461	1.127	4.876	1.119	2.119
5	Dict{String, Int64}	1.815	0.711	1.908	1.084	0.063

SwissDict (total time 43.458 s) from DataStructures.jl

	Type	SET	GET	GET!empty	GET!full	ITERATE
1	SwissDict{Int64, Int64}	0.776	0.359	0.813	0.459	0.018
2	SwissDict{Any, Int64}	3.955	1.946	3.867	1.355	0.018
3	SwissDict{Int64, Any}	1.257	1.282	2.105	1.899	1.328
4	SwissDict{Any, Any}	5.717	1.636	6.731	1.66	0.913
5	SwissDict{String, Int64}	1.715	1.061	1.513	1.055	0.018

Testing code

module TestDict

using Printf
using Random
using DataStructures
using DataFrames

const n = 10_000_000

function test_set(dict, x)
    xn = length(x)
    #sizehint!(dict, xn)
    for i in 1:xn
        dict[x[i]] = i
    end
end

test_get(dict, x) = sum(dict[x[i]] for i = 1:length(x))
test_get!(dict, x) = sum(get!(dict, x[i], i) for i = 1:length(x))
test_iterate(dict) = sum(v for v = values(dict))

for D in [SwissDict, Dict]
    df = DataFrame()
    for (A,B) in [(Int, Int), (Any, Int), (Int, Any), (Any, Any), (String, Int)]
        Random.seed!(42)
        if A == String
            keys = [randstring() for i = 1:n]
        else
            keys = rand(A == Any ? Int : A, n)
        end
        keys = unique(keys)
        correct_sum = sum([1:length(keys)...])

        dict = D{A, B}()
        test_set(dict, keys)
        test_get(dict, keys)
        dict = D{A, B}()
        time_set = @elapsed test_set(dict, keys)
        time_get = @elapsed getsum = test_get(dict, keys)
        @assert getsum == correct_sum

        dict = D{A, B}()
        test_get!(dict, keys)
        test_get(dict, keys)

        dict = D{A, B}()
        time_get!_empty = @elapsed getsum = test_get!(dict, keys)
        @assert getsum == correct_sum
        time_get!_full = @elapsed getsum = test_get!(dict, keys)
        @assert getsum == correct_sum

        test_iterate(dict)
        time_iterate = @elapsed getsum = test_iterate(dict)
        @assert getsum == correct_sum

        new_data = ( 
            Type = typeof(dict), 
            SET = time_set, 
            GET = time_get, 
            GET!empty = time_get!_empty,
            GET!full = time_get!_full,
            ITERATE = time_iterate,
        )
        println(new_data)
        push!(df, new_data)
    end
    total = sum(sum(x) for x in eachcol(df[:,2:end]))
    df[:,2:end] = round.(df[:,2:end]; digits = 3)
    show(stdout, MIME("text/plain"), df)
    println("\n")
    show(stdout, MIME("text/html"), df; eltypes = false, summary = false)
    println("\n")
    @printf "Total time %.3f s\n\n\n" total
end

end

Co-authored-by: Simeon Schaub <simeondavidschaub99@gmail.com>

KristofferC · 2022-03-08T11:06:28Z

@nanosoldier runbenchmarks(ALL, vs=":master")

petvana · 2022-03-08T12:40:51Z

@KristofferC I'm sorry, I forgot to update set.jl, so this nanosoldier run can be terminated.

vtjnash · 2022-03-08T17:06:23Z

Would be great to hear how this compares in implementation and performance to https://juliacollections.github.io/DataStructures.jl/latest/swiss_dict/
https://nextjournal.com/eulerkochy/gsoc-20-in-datastructures.jl
@eulerkochy

nanosoldier · 2022-03-08T18:49:45Z

Something went wrong when running your job:

NanosoldierError: error when preparing/pushing to report repo: failed process: Process(setenv(`git push`; dir="/nanosoldier/workdir/NanosoldierReports"), ProcessExited(1)) [1]

Unfortunately, the logs could not be uploaded.

vtjnash · 2022-03-08T18:52:04Z

https://github.com/JuliaCI/NanosoldierReports/blob/master/benchmark/by_hash/27b53cf_vs_dc45d77/report.md

oscardssmith · 2022-03-08T20:52:58Z

Does this PR generate similar code to the LLVM-calls from the JuliaCollections dict? If so, that's really cool!

petvana · 2022-03-08T21:53:10Z

> Would be great to hear how this compares in implementation and performance to https://juliacollections.github.io/DataStructures.jl/latest/swiss_dict/ https://nextjournal.com/eulerkochy/gsoc-20-in-datastructures.jl @eulerkochy

Thank you for the comment. I've added SwissDict to the comparison. I'm quite surprised that the PR seems to be faster (at least for abstract types). However, the two implementations are very different, because the PR uses only one idea from the Swiss tables design: storing part of the hash in the metadata (here, slots).

petvana · 2022-03-08T22:16:14Z

> Does this PR generate similar code to the LLVM-calls from the JuliaCollections dict? If so, that's really cool!

@oscardssmith Unfortunately the PR is not that cool. :-) The idea was to keep it as simple as possible and in pure Julia. It just takes advantage of the fact that iterating over a Vector{UInt8} is fast. Therefore, if you store part of the hash (7 bits) in slots, you can tell with probability 127/128 whether the keys are equal without comparing them. As a result, the linear probing is fast and the number of isequal calls is limited to almost a single call per operation (~1.05 calls for an optimal hash function).
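The probing described in this comment can be illustrated with a toy open-addressing table. This is only a sketch under stated assumptions: `ToyTable`, `tag7`, `startindex`, `toy_insert!` and `toy_lookup` are hypothetical names, the table size is assumed to be a power of two, inserted keys are assumed to be distinct, and `0x00` is assumed to mark an empty slot. It is not the code in base/dict.jl.

```julia
# Toy table for illustration only; Base.Dict is organized differently.
struct ToyTable{K}
    slots::Vector{UInt8}   # 0x00 = empty, otherwise 0x80 | 7-bit tag
    keys::Vector{K}
end
ToyTable{K}(sz::Int) where {K} = ToyTable{K}(zeros(UInt8, sz), Vector{K}(undef, sz))

tag7(h::UInt) = ((h >> (8 * sizeof(UInt) - 7)) % UInt8) | 0x80
startindex(h::UInt, sz::Int) = Int(h & (sz - 1)) + 1   # sz must be a power of two

# Insert a key that is assumed not to be present; the table must not be full.
function toy_insert!(t::ToyTable, key)
    h = hash(key)
    sz = length(t.slots)
    i = startindex(h, sz)
    while t.slots[i] != 0x00
        i = (i & (sz - 1)) + 1        # linear probing with wrap-around
    end
    t.slots[i] = tag7(h)
    t.keys[i] = key
    return t
end

# Returns (found, number of isequal calls made).
function toy_lookup(t::ToyTable, key)
    h = hash(key)
    sz = length(t.slots)
    tag = tag7(h)
    i = startindex(h, sz)
    calls = 0
    while t.slots[i] != 0x00
        if t.slots[i] == tag          # cheap byte comparison first
            calls += 1
            isequal(t.keys[i], key) && return (true, calls)
        end
        i = (i & (sz - 1)) + 1
    end
    return (false, calls)
end

t = ToyTable{Int}(1024)
for k in 1:500
    toy_insert!(t, k)
end
total_calls = sum(toy_lookup(t, k)[2] for k in 1:500)
```

Dividing `total_calls` by the number of lookups gives the average number of `isequal` calls per successful lookup for this toy table; the exact value depends on the hash function and the load factor.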

oscardssmith · 2022-03-11T15:22:37Z

base/dict.jl

    h.count += 1
    h.age += 1
    if index < h.idxfloor
        h.idxfloor = index
    end

-    sz = length(h.keys)
+    sz = length(h.pairs)
    # Rehash now if necessary
    if h.ndel >= ((3*sz)>>2) || h.count*3 > sz*2


Should this heuristic be updated? As I understand, most of the purpose of a swiss table is to allow higher capacity.

Thank you for the question. Generally yes, but here the primary motivation is different. The PR limits the number of isequal calls, and thus limits the number of allocations for abstract types and the pressure on the GC. These coefficients should be updated to limit memory consumption, but this fine-tuning needs much more benchmarking on various sizes. There will always be some tradeoff between speed and memory use. I propose to move that discussion into a separate PR. Meanwhile, I'll try to prepare some microbenchmarks.

From what I've tested so far, we would need to use SIMD (as in DataStructures.jl) to increase the capacity without a significant performance drop (caused by unpredictable branching). The good news is that the metadata in slots is prepared for that. Nevertheless, I'm not sure whether such low-level LLVM code should go into Base, because it will be harder to read and check, and it will be platform specific.
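To make the coefficients under discussion concrete, the condition from the quoted diff can be written as a standalone predicate. The function name `needs_rehash` is hypothetical; the expression itself is copied from the diff line above.

```julia
# Standalone restatement of the rehash condition quoted in the diff:
# rehash when deleted entries reach 3/4 of the table size,
# or when live entries exceed 2/3 of the table size.
needs_rehash(count::Int, ndel::Int, sz::Int) =
    ndel >= ((3 * sz) >> 2) || count * 3 > sz * 2

r1 = needs_rehash(10, 0, 16)   # 30 > 32 is false, no rehash
r2 = needs_rehash(11, 0, 16)   # 33 > 32 is true, rehash
r3 = needs_rehash(1, 12, 16)   # 12 >= 12 is true, rehash
```

Raising the 2/3 bound would let the table hold more entries before growing, at the cost of longer probe sequences, which is the tradeoff described above.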

oscardssmith · 2022-03-11T19:00:00Z

I think this is basically ready to merge. Can you add a benchmark for iterating over a Dict to the suite? That looks like the main type of test missing.

oscardssmith · 2022-03-11T20:14:28Z

One other optimization that we should include either here or in a follow-up PR is that for get we should do a linear probe over slots for dictionaries with 32 or fewer elements. The linear lookup should be easy for LLVM to vectorize, and should be simpler and faster. (For reference, with a perfect hash function the linear scan will have a false positive 23% of the time for a 32-element dict.)
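A rough way to reproduce a figure of this size is to assume that the 7-bit tags are independent and uniform over 128 values, and to ask how often at least one of the stored tags equals the tag of an absent key. The function name `false_positive` and this model are assumptions; the comment does not state the exact model behind its number.

```julia
# Probability that at least one of n stored tags matches the tag of an absent key,
# assuming independent tags that are uniform over `tags` possible values.
false_positive(n; tags = 128) = 1 - (1 - 1 / tags)^n

p32 = false_positive(32)
```

Under this simple model the value for 32 elements comes out slightly above 0.22, which is in the same range as the percentage quoted above.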

petvana · 2022-03-11T21:15:58Z

> I think this is basically ready to merge. Can you add a benchmark for iterating over a Dict to the suite? That looks like the main type of test missing.

I've added a benchmark for iterating over values. There is only one extra & operation when iterating. It comes from:

julia/base/dict.jl

Line 182 in 99e24d7

@propagate_inbounds isslotfilled(h::Dict, i::Int) = (h.slots[i] & 0x80) == 0

Btw, if we increase the density in the future, the iteration will become faster (and closer to SwissDict).
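A minimal sketch of what the extra `&` does during iteration, using the convention of the quoted line (a slot is filled when its highest bit is 0). The names `slotfilled` and `sum_filled` are hypothetical, and the plain vectors stand in for the fields of a Dict.

```julia
# Same test as the quoted isslotfilled, but on a bare vector of slot bytes.
slotfilled(slots::Vector{UInt8}, i::Int) = (slots[i] & 0x80) == 0

# Sum the values stored in filled slots, skipping everything else.
function sum_filled(slots::Vector{UInt8}, vals::Vector{Int})
    s = 0
    for i in eachindex(slots, vals)
        slotfilled(slots, i) && (s += vals[i])
    end
    return s
end

slots = UInt8[0x80, 0x15, 0x80, 0x7f]   # slots 2 and 4 are filled
vals = [10, 20, 30, 40]
total = sum_filled(slots, vals)          # 20 + 40
```

With a denser table, fewer iterations of this loop land on unfilled slots, which is why a higher load factor would speed up iteration.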

fredrikekre · 2022-03-11T22:05:25Z

I guess the breakage of this PR is the same as #44332, but let's check:

@nanosoldier runtests(ALL, vs = ":master")

JeffBezanson · 2022-03-11T22:14:26Z

I like the idea of storing a vector of pairs, but it occurs to me this can have a large cost in alignment padding, for example in a Dict{Int64, Int8}.
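The padding cost can be checked directly with `sizeof`. The variable names are illustrative only, and the comparison assumes that the alternative layout stores keys and values in two separate vectors.

```julia
# Bytes per entry when each entry is stored as a Pair in a single vector.
pair_bytes = sizeof(Pair{Int64, Int8})

# Bytes per entry when keys and values live in two separate vectors.
split_bytes = sizeof(Int64) + sizeof(Int8)
```

The Pair is padded to a multiple of the Int64 alignment, so `pair_bytes` is 16 while `split_bytes` is 9.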

petvana · 2022-03-11T22:38:13Z

> I like the idea of storing a vector of pairs, but it occurs to me this can have a large cost in alignment padding, for example in a Dict{Int64, Int8}.

This is a design choice, and I'm NOT the right person in this conversation to decide. I'll benchmark such combinations. For now, I see the following options:

1. Merge as is (both the vector of pairs and the Swiss table design). Slightly breaking (for example, a single change in CSV.jl).
2. Split the PR and merge only the Swiss table design. Almost no breakage.
3. Split the PR, merge only the Swiss table design now, and merge the vector of pairs in Julia 2.0. The most breaking change would be postponed to a major release.

oscardssmith · 2022-03-11T23:11:07Z

Assuming benchmarks for option 2 look good, that's probably what I would want. We definitely wouldn't wait for Julia 2.0 to do Vector{Pair}. It's not breaking, so if it's better we'll merge it nowish, and if it's worse, we won't merge it for 2.0. I also discovered that Google has an Apache-licensed implementation of a Swiss hash table here.

nanosoldier · 2022-03-12T05:48:37Z

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

KristofferC · 2022-03-12T08:11:12Z

Personally, I like keeping fairly orthogonal things in different PRs so to me, only focusing on the Swiss here and having it be merged relatively quickly is advantageous over bundling it together with the Pair discussion that has to also take into account the padding and "breakage". So, in my opinion, make this PR only have the Swiss and rebase the other one on top of this.

petvana · 2022-03-14T10:53:55Z

> Personally, I like keeping fairly orthogonal things in different PRs so to me, only focusing on the Swiss here and having it be merged relatively quickly is advantageous over bundling it together with the Pair discussion that has to also take into account the padding and "breakage". So, in my opinion, make this PR only have the Swiss and rebase the other one on top of this.

I agree, so I've focused the PR only on the Swiss design and updated the comparison. Further, I've changed 0x00 to be the empty slot. Finally, I've tried to break as little code as possible by preserving ht_keyindex2!. Now, we can deprecate it easily if you want. Tested on the AbstractAlgebra and CSV packages.

julia/base/dict.jl

Lines 370 to 371 in 8fd9617

    
# Only for better backward compatibility. It can be removed in the future.
ht_keyindex2!(h::Dict, key) = ht_keyindex2_shorthash!(h, key)[1]

KristofferC · 2022-03-14T12:25:10Z

Great and thanks for putting in the extra effort of keeping things backward compatible!

IMO this is mergeable but maybe @JeffBezanson wants to look it over one last time.

petvana · 2022-03-14T22:15:44Z

I've gone through the code one last time and reverted a single line (fill to zeros). Thus, it is ready to be merged from my point of view.

petvana and others added 18 commits February 24, 2022 16:19

Improve performance of 'Dict'

ae2aad5

Fix Set

0acaa9a

Fix most of the tests

8306858

Use explicite types for pairs

9a1a660

Mark broken test of precompilation

82e7a9d

Merge branch 'master' into pv-dict

263dc86

Reenable broken test

96d80a5

Use simplified OldDict for testing precompilation

f37cc2d

Fix precompile test

25d9bec

Fix white space

8d39aef

Merge branch 'master' into pv-dict

51a6c15

Apply suggestions from code review

08d7d88

Co-authored-by: Simeon Schaub <simeondavidschaub99@gmail.com>

Apply suggestions from code review

afd7975

Co-authored-by: Simeon Schaub <simeondavidschaub99@gmail.com>

Fix performance of Set constructor if both {T,V} are bitstypes

e69826f

Apply suggestions from code review

6600b03

Add inbounds

d789afc

Swiss tables based hashing

ae9483c

Fix whitespaces

cafda6e

Fix Set for Swiss tables

4e03425

JeffBezanson added the collections (Data structures holding multiple items, e.g. sets) and performance (Must go faster) labels Mar 9, 2022

Update comments + rename _shorthash7 function

2cdb1aa

petvana marked this pull request as ready for review March 10, 2022 22:11

oscardssmith reviewed Mar 11, 2022

Apply suggestions from code review

99e24d7

oscardssmith approved these changes Mar 11, 2022

petvana mentioned this pull request Mar 11, 2022

Improve performance of Dict{K,V} (~5%) by storing elements in pairs::Vector{Pair{K,V}} #44332

Closed

JeffBezanson reviewed Mar 11, 2022

JeffBezanson reviewed Mar 11, 2022

Remove unreachable code

67694ba

petvana added 4 commits March 12, 2022 19:38

Focus PR purely on Swiss table design

e0d59ac

Make empty slot 0x00

aca0272

Improve backward compatibility

8fd9617

Merge branch 'master' into pv-dict-swiss

90d5f75

JeffBezanson approved these changes Mar 14, 2022

vtjnash approved these changes Mar 14, 2022

Minor change (revert fill() to zeros())

380c160

KristofferC merged commit 85eaf4e into JuliaLang:master Mar 18, 2022
