Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

radix sort fallback to Base on small input #51

9 changes: 8 additions & 1 deletion src/SortingAlgorithms.jl
Original file line number Diff line number Diff line change
Expand Up @@ -61,7 +61,14 @@ uint_mapping(o::Lt, x ) = error("uint_mapping does not work with general L
const RADIX_SIZE = 11
const RADIX_MASK = 0x7FF

function sort!(vs::AbstractVector, lo::Int, hi::Int, ::RadixSortAlg, o::Ordering, ts=similar(vs))
function sort!(vs::AbstractVector, lo::Int, hi::Int, ::RadixSortAlg, o::Ordering, ts=similar(vs, 0))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need ts as an argument here, right? I'm not aware of anywhere that passes their own ts, for example. I think we just move ts = similar(vs) down below the length check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use it in #33, but indeed I've no idea why it's supported here given that it's not used. Anyway better leave this for another PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK you mention this because of the addition of the length check. Indeed better move this inside the function then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not aware of very many users of SortingAlgorithms, but I imagine one might desire to pass in ts when sorting multiple arrays to avoid allocations and save time. For example:

using SortingAlgorithms, BenchmarkTools
using Base.Sort, Base.Order
using Base.Sort: defalg
function sort_ts!(A::AbstractArray;
               dims::Integer,
               alg::Algorithm=defalg(A),
               lt=isless,
               by=identity,
               rev::Union{Bool,Nothing}=nothing,
               order::Ordering=Forward)
    ordr = ord(lt, by, rev, order)
    nd = ndims(A)
    k = dims

    1 <= k <= nd || throw(ArgumentError("dimension out of range"))

    remdims = ntuple(i -> i == k ? 1 : size(A, i), nd)
    ts = similar(A[ntuple(i -> i == k ? Colon() : 1, nd)...])
    for idx in CartesianIndices(remdims)
        Av = view(A, ntuple(i -> i == k ? Colon() : idx[i], nd)...)
        sort_ts!(Av, alg, ordr, ts)
    end
    A
end

function sort_ts!(v::AbstractVector, alg::Algorithm, order::Ordering, ts)
    inds = axes(v,1)
    sort!(v,first(inds),last(inds),alg,order,ts)
end

a = @benchmark sort!(x; dims=1) setup=(x=rand(Int, 5_000, 5_000))
display(a)
b = @benchmark sort!(x; alg=RadixSort, dims=1) setup=(x=rand(Int, 5_000, 5_000))
display(b)
c = @benchmark sort_ts!(x; alg=RadixSort, dims=1) setup=(x=rand(Int, 5_000, 5_000))
display(c)

With results

BenchmarkTools.Trial: 4 samples with 1 evaluation.
 Range (min … max):  1.159 s …  1.168 s  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.164 s             ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.164 s ± 3.638 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                             ██                       █  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  1.16 s        Histogram: frequency by time        1.17 s <

 Memory estimate: 468.75 KiB, allocs estimate: 10000.
BenchmarkTools.Trial: 4 samples with 1 evaluation.
 Range (min … max):  1.210 s …    1.429 s  ┊ GC (min … max): 15.85% … 23.46%
 Time  (median):     1.332 s               ┊ GC (median):    20.54%
 Time  (mean ± σ):   1.326 s ± 110.688 ms  ┊ GC (mean ± σ):  22.18% ±  6.99%

  █          █                                         █   █  
  █▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁█ ▁
  1.21 s         Histogram: frequency by time         1.43 s <

 Memory estimate: 1.11 GiB, allocs estimate: 90000.
BenchmarkTools.Trial: 4 samples with 1 evaluation.
 Range (min … max):  1.081 s …    1.286 s  ┊ GC (min … max): 14.12% … 26.52%
 Time  (median):     1.281 s               ┊ GC (median):    25.15%
 Time  (mean ± σ):   1.232 s ± 100.796 ms  ┊ GC (mean ± σ):  24.24% ±  7.18%

  ▁                                                       █▁  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁██ ▁
  1.08 s         Histogram: frequency by time         1.29 s <

 Memory estimate: 945.74 MiB, allocs estimate: 80004.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that it's not clear whether this is part of the public API or not. In doubt, maybe we should keep it to avoid breaking things.

But I don't think we should silently resize ts disregarding its size. The cleanest solution would probably to use nothing as the default. Otherwise, we could allow 0-length ts as a special case, but that's a bit less clean (and that's one more allocation).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the now-current edb805c version is as strict as before the PR: throws a bounds error if ts or vs fail to contain all of lo:hi and otherwise messes with the contents of ts but not its size.

I also think the current API is identical to the old: one may pass in a correctly sized ts or leave that argument off.

Theoretically, the only changes this PR makes are:

  1. when hi-lo is small radix sort is not run, and no allocations are made.
  2. when ts needs to be allocated (almost always) an OffsetVector spanning lo:hi is allocated instead of a full similar(vs)
  3. explicit low cost bounds checks at the beginning of the sorting algorithm that should cover all subsequent bounds errors (though no additional @inbounds are added)

Perhaps 2 and 3 belong in a different PR, but I think both are reasonable (assuming OffsetVectors are implemented as efficiently as Vectors)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the use of an OffsetArray is free:

julia> @benchmark sort!(x; alg=RadixSort) setup=(x=rand(20_000))
BenchmarkTools.Trial: 7600 samples with 1 evaluation.
 Range (min  max):  421.061 μs    5.636 ms  ┊ GC (min  max): 0.00%  89.58%
 Time  (median):     512.915 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   551.441 μs ± 340.750 μs  ┊ GC (mean ± σ):  4.52% ±  6.63%

     ▁▂▂▂▁▃▄▆██▇▅▃▂▂▂▁▁▁▁                         ▁             ▂
  ▅▆▆███████████████████████▇██▇▇▆▇▆▆▇▇▆▅▆▅▅▆▅▅▅▆▇█▇▇▄▃▃▄▄▂▄▂▄▅ █
  421 μs        Histogram: log(frequency) by time        871 μs <

 Memory estimate: 349.91 KiB, allocs estimate: 16.

julia> @benchmark sort!(x; alg=RadixSort) setup=(x=OffsetArray(rand(20_000), 1:20_000))
BenchmarkTools.Trial: 7511 samples with 1 evaluation.
 Range (min  max):  416.961 μs    6.288 ms  ┊ GC (min  max): 0.00%  90.21%
 Time  (median):     511.857 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   553.784 μs ± 364.445 μs  ┊ GC (mean ± σ):  4.80% ±  6.65%

     ▁▁▂▁▁   ▄▇██▇▆▅▄▃▂▂▂▂▁▁▁▁                                  ▂
  ▃▂▆██████▇█████████████████████▇███▇▆▆▆▇▇▆▅▆▅▆▆▅▅▆▅▅▄▆▆██▇▇▆▅ █
  417 μs        Histogram: log(frequency) by time        816 μs <

 Memory estimate: 349.91 KiB, allocs estimate: 16.

julia> @benchmark sort!(x) setup=(x=OffsetArray(rand(20_000), 1:20_000))
BenchmarkTools.Trial: 4266 samples with 1 evaluation.
 Range (min  max):  991.905 μs   1.702 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):       1.053 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.071 ms ± 65.684 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

     ▁▃▄▆▇██▇▆▃▂▂▃▂▂▂▁▁▁                                       ▁
  ▅▇███████████████████████▇▇▇▇▆▆▇▆▆▆▆▇▇▇▆▅▆▆▆▆▅▆▄▄▄▄▅▅▃▃▃▃▃▃▃ █
  992 μs        Histogram: log(frequency) by time      1.38 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark sort!(x) setup=(x=rand(20_000))
BenchmarkTools.Trial: 4143 samples with 1 evaluation.
 Range (min  max):  1.005 ms   2.248 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     1.078 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.103 ms ± 85.456 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

     ▁▄▅▆██▇▅▂▂▃▂▂▁▁▁          ▁▁▁                           ▁
  ▃▆▇███████████████████▇▇▆▇██████▇▆▇▆▇▆▅▆▃▆▅▅▃▅▅▃▄▃▅▅▅▄▅▃▃▆ █
  1.01 ms      Histogram: log(frequency) by time     1.49 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using an OffsetArray is clever. Though that will only make a significant difference if lo:hi covers a relatively small part of vs. AFAICT this will only happen when there are lots of NaNs (via fpsort!), right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is also the only case I'm aware of. Do you still think it's worth it? It adds a dependency on OffsetArrays.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not if that's the only situation where it happens.

# Fallback on small lists
LilithHafner marked this conversation as resolved.
Show resolved Hide resolved
if hi - lo < 2^RADIX_SIZE
return sort!(vs, lo, hi, Base.Sort.defalg(vs), o)
end

if length(ts) < length(vs); resize!(ts, length(vs)); end

# Input checking
if lo >= hi; return vs; end

LilithHafner marked this conversation as resolved.
Show resolved Hide resolved
Expand Down
4 changes: 2 additions & 2 deletions test/runtests.jl
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ using Test
using StatsBase
using Random

a = rand(1:10000, 1000)
a = rand(1:10000, 2200)
LilithHafner marked this conversation as resolved.
Show resolved Hide resolved

for alg in [TimSort, HeapSort, RadixSort]
b = sort(a, alg=alg)
Expand Down Expand Up @@ -54,7 +54,7 @@ end

Random.seed!(0xdeadbeef)

for n in [0:10..., 100, 101, 1000, 1001]
for n in [0:10..., 100, 101, 1000, 1001, 2200, 2201]
r = 1:10
v = rand(1:10,n)
h = fit(Histogram, v, r)
Expand Down