radix sort fallback to Base on small input #51

quinnj · 2021-10-17T03:38:39Z

I don't think we need ts as an argument here, right? I'm not aware of anywhere that passes their own ts, for example. I think we just move ts = similar(vs) down below the length check.

I use it in #33, but indeed I've no idea why it's supported here given that it's not used. Anyway better leave this for another PR?

Ah OK you mention this because of the addition of the length check. Indeed better move this inside the function then.

I'm not aware of very many users of SortingAlgorithms, but I imagine one might desire to pass in ts when sorting multiple arrays to avoid allocations and save time. For example:

using SortingAlgorithms, BenchmarkTools using Base.Sort, Base.Order using Base.Sort: defalg function sort_ts!(A::AbstractArray; dims::Integer, alg::Algorithm=defalg(A), lt=isless, by=identity, rev::Union{Bool,Nothing}=nothing, order::Ordering=Forward) ordr = ord(lt, by, rev, order) nd = ndims(A) k = dims 1 <= k <= nd || throw(ArgumentError("dimension out of range")) remdims = ntuple(i -> i == k ? 1 : size(A, i), nd) ts = similar(A[ntuple(i -> i == k ? Colon() : 1, nd)...]) for idx in CartesianIndices(remdims) Av = view(A, ntuple(i -> i == k ? Colon() : idx[i], nd)...) sort_ts!(Av, alg, ordr, ts) end A end function sort_ts!(v::AbstractVector, alg::Algorithm, order::Ordering, ts) inds = axes(v,1) sort!(v,first(inds),last(inds),alg,order,ts) end a = @benchmark sort!(x; dims=1) setup=(x=rand(Int, 5_000, 5_000)) display(a) b = @benchmark sort!(x; alg=RadixSort, dims=1) setup=(x=rand(Int, 5_000, 5_000)) display(b) c = @benchmark sort_ts!(x; alg=RadixSort, dims=1) setup=(x=rand(Int, 5_000, 5_000)) display(c)

With results

BenchmarkTools.Trial: 4 samples with 1 evaluation. Range (min … max): 1.159 s … 1.168 s ┊ GC (min … max): 0.00% … 0.00% Time (median): 1.164 s ┊ GC (median): 0.00% Time (mean ± σ): 1.164 s ± 3.638 ms ┊ GC (mean ± σ): 0.00% ± 0.00% █ ██ █ █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁ 1.16 s Histogram: frequency by time 1.17 s < Memory estimate: 468.75 KiB, allocs estimate: 10000. BenchmarkTools.Trial: 4 samples with 1 evaluation. Range (min … max): 1.210 s … 1.429 s ┊ GC (min … max): 15.85% … 23.46% Time (median): 1.332 s ┊ GC (median): 20.54% Time (mean ± σ): 1.326 s ± 110.688 ms ┊ GC (mean ± σ): 22.18% ± 6.99% █ █ █ █ █▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁█ ▁ 1.21 s Histogram: frequency by time 1.43 s < Memory estimate: 1.11 GiB, allocs estimate: 90000. BenchmarkTools.Trial: 4 samples with 1 evaluation. Range (min … max): 1.081 s … 1.286 s ┊ GC (min … max): 14.12% … 26.52% Time (median): 1.281 s ┊ GC (median): 25.15% Time (mean ± σ): 1.232 s ± 100.796 ms ┊ GC (mean ± σ): 24.24% ± 7.18% ▁ █▁ █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁██ ▁ 1.08 s Histogram: frequency by time 1.29 s < Memory estimate: 945.74 MiB, allocs estimate: 80004.

The problem is that it's not clear whether this is part of the public API or not. In doubt, maybe we should keep it to avoid breaking things.

But I don't think we should silently resize ts disregarding its size. The cleanest solution would probably to use nothing as the default. Otherwise, we could allow 0-length ts as a special case, but that's a bit less clean (and that's one more allocation).

I believe the now-current edb805c version is as strict as before the PR: throws a bounds error if ts or vs fail to contain all of lo:hi and otherwise messes with the contents of ts but not its size.

I also think the current API is identical to the old: one may pass in a correctly sized ts or leave that argument off.

Theoretically, the only changes this PR makes are:

when hi-lo is small radix sort is not run, and no allocations are made.

when ts needs to be allocated (almost always) an OffsetVector spanning lo:hi is allocated instead of a full similar(vs)

explicit low cost bounds checks at the beginning of the sorting algorithm that should cover all subsequent bounds errors (though no additional @inbounds are added)

Perhaps 2 and 3 belong in a different PR, but I think both are reasonable (assuming OffsetVectors are implemented as efficiently as Vectors)

I think the use of an OffsetArray is free:

julia> @benchmark sort!(x; alg=RadixSort) setup=(x=rand(20_000)) BenchmarkTools.Trial: 7600 samples with 1 evaluation. Range (min … max): 421.061 μs … 5.636 ms ┊ GC (min … max): 0.00% … 89.58% Time (median): 512.915 μs ┊ GC (median): 0.00% Time (mean ± σ): 551.441 μs ± 340.750 μs ┊ GC (mean ± σ): 4.52% ± 6.63% ▁▂▂▂▁▃▄▆██▇▅▃▂▂▂▁▁▁▁ ▁ ▂ ▅▆▆███████████████████████▇██▇▇▆▇▆▆▇▇▆▅▆▅▅▆▅▅▅▆▇█▇▇▄▃▃▄▄▂▄▂▄▅ █ 421 μs Histogram: log(frequency) by time 871 μs < Memory estimate: 349.91 KiB, allocs estimate: 16. julia> @benchmark sort!(x; alg=RadixSort) setup=(x=OffsetArray(rand(20_000), 1:20_000)) BenchmarkTools.Trial: 7511 samples with 1 evaluation. Range (min … max): 416.961 μs … 6.288 ms ┊ GC (min … max): 0.00% … 90.21% Time (median): 511.857 μs ┊ GC (median): 0.00% Time (mean ± σ): 553.784 μs ± 364.445 μs ┊ GC (mean ± σ): 4.80% ± 6.65% ▁▁▂▁▁ ▄▇██▇▆▅▄▃▂▂▂▂▁▁▁▁ ▂ ▃▂▆██████▇█████████████████████▇███▇▆▆▆▇▇▆▅▆▅▆▆▅▅▆▅▅▄▆▆██▇▇▆▅ █ 417 μs Histogram: log(frequency) by time 816 μs < Memory estimate: 349.91 KiB, allocs estimate: 16. julia> @benchmark sort!(x) setup=(x=OffsetArray(rand(20_000), 1:20_000)) BenchmarkTools.Trial: 4266 samples with 1 evaluation. Range (min … max): 991.905 μs … 1.702 ms ┊ GC (min … max): 0.00% … 0.00% Time (median): 1.053 ms ┊ GC (median): 0.00% Time (mean ± σ): 1.071 ms ± 65.684 μs ┊ GC (mean ± σ): 0.00% ± 0.00% ▁▃▄▆▇██▇▆▃▂▂▃▂▂▂▁▁▁ ▁ ▅▇███████████████████████▇▇▇▇▆▆▇▆▆▆▆▇▇▇▆▅▆▆▆▆▅▆▄▄▄▄▅▅▃▃▃▃▃▃▃ █ 992 μs Histogram: log(frequency) by time 1.38 ms < Memory estimate: 0 bytes, allocs estimate: 0. julia> @benchmark sort!(x) setup=(x=rand(20_000)) BenchmarkTools.Trial: 4143 samples with 1 evaluation. Range (min … max): 1.005 ms … 2.248 ms ┊ GC (min … max): 0.00% … 0.00% Time (median): 1.078 ms ┊ GC (median): 0.00% Time (mean ± σ): 1.103 ms ± 85.456 μs ┊ GC (mean ± σ): 0.00% ± 0.00% ▁▄▅▆██▇▅▂▂▃▂▂▁▁▁ ▁▁▁ ▁ ▃▆▇███████████████████▇▇▆▇██████▇▆▇▆▇▆▅▆▃▆▅▅▃▅▅▃▄▃▅▅▅▄▅▃▃▆ █ 1.01 ms Histogram: log(frequency) by time 1.49 ms < Memory estimate: 0 bytes, allocs estimate: 0.

Using an OffsetArray is clever. Though that will only make a significant difference if lo:hi covers a relatively small part of vs. AFAICT this will only happen when there are lots of NaNs (via fpsort!), right?

That is also the only case I'm aware of. Do you still think it's worth it? It adds a dependency on OffsetArrays.

Maybe not if that's the only situation where it happens.

-Original file line number
+Diff line change
@@ Expand Up @@
     const RADIX_SIZE = 11
     const RADIX_MASK = 0x7FF
-    function sort!(vs::AbstractVector, lo::Int, hi::Int, ::RadixSortAlg, o::Ordering, ts=similar(vs))
+    function sort!(vs::AbstractVector, lo::Int, hi::Int, ::RadixSortAlg, o::Ordering, ts=similar(vs, 0))
+        # Fallback on small lists
+        if hi - lo < 2^RADIX_SIZE
+            return sort!(vs, lo, hi, Base.Sort.defalg(vs), o)
+        end
+        if length(ts) < length(vs); resize!(ts, length(vs)); end
         # Input checking
         if lo >= hi;  return vs;  end
@@ Expand Down @@

-Original file line number
+Diff line change
@@ Expand Up / @@ -3,7 +3,7 @@ using Test @@
     using StatsBase
     using Random
-    a = rand(1:10000, 1000)
+    a = rand(1:10000, 2200)
     for alg in [TimSort, HeapSort, RadixSort]
         b = sort(a, alg=alg)
@@ Expand Down Expand Up / @@ -54,7 +54,7 @@ end @@
     Random.seed!(0xdeadbeef)
-    for n in [0:10..., 100, 101, 1000, 1001]
+    for n in [0:10..., 100, 101, 1000, 1001, 2200, 2201]
         r = 1:10
         v = rand(1:10,n)
         h = fit(Histogram, v, r)
@@ Expand Down @@

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

radix sort fallback to Base on small input #51

Diff view

Diff view

There are no files selected for viewing

quinnj Oct 17, 2021

nalimilan Oct 17, 2021

nalimilan Oct 17, 2021

LilithHafner Oct 17, 2021

nalimilan Oct 19, 2021

LilithHafner Oct 23, 2021

LilithHafner Oct 23, 2021

nalimilan Oct 23, 2021

LilithHafner Oct 23, 2021

nalimilan Oct 25, 2021

radix sort fallback to Base on small input #51

radix sort fallback to Base on small input #51

Diff view

Diff view

There are no files selected for viewing

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment