radix sort fallback to Base on small input #51

LilithHafner · 2021-09-30T17:03:14Z

Fixes #50

And defer allocation of the temporary array until after the fallback decision point because we will typically fall back to QuickSort which does not need a temporary array.

I chose 2^RADIX_SIZE because we allocate an array of that size. I validated the choice empirically with these tests:

for n in [500, 1000, 2000, 4000]
    a = @belapsed sort(x) setup=(x=rand(Int, $n)) evals=1 seconds=1
    b = @belapsed sort(x; alg=RadixSort) setup=(x=rand(Int, $n)) evals=1 seconds=1
    println(n, ": ", a/b)
end
500: 0.4460956119444834
1000: 0.9001948035520023
2000: 1.337070228643473
4000: 1.7441533572359844
for n in [500, 1000, 2000, 4000]
    a = @belapsed sort(x) setup=(x=rand(Float, $n)) evals=1 seconds=1
    b = @belapsed sort(x; alg=RadixSort) setup=(x=rand(Float, $n)) evals=1 seconds=1
    println(n, ": ", a/b)
end
500: 0.38127995053331276
1000: 0.6718377570395507
2000: 1.112666746785242
4000: 1.4809084311791743

codecov-commenter · 2021-09-30T17:07:08Z

Codecov Report

Merging #51 (c507f37) into master (d0a9007) will decrease coverage by 0.18%.
The diff coverage is 90.00%.

@@            Coverage Diff             @@
##           master      #51      +/-   ##
==========================================
- Coverage   97.84%   97.66%   -0.19%     
==========================================
  Files           1        1              
  Lines         325      342      +17     
==========================================
+ Hits          318      334      +16     
- Misses          7        8       +1

Impacted Files	Coverage Δ
src/SortingAlgorithms.jl	`97.66% <90.00%> (-0.19%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d0a9007...c507f37. Read the comment docs.

nalimilan

Thanks!

test/runtests.jl

src/SortingAlgorithms.jl

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

hi - lo < 2^RADIX_SIZE implies hi < lo unless we have an overflow issue, but on arrays that big we already fail b.c. bins is a UInt32 array.

quinnj · 2021-10-17T03:38:39Z

src/SortingAlgorithms.jl

-function sort!(vs::AbstractVector, lo::Int, hi::Int, ::RadixSortAlg, o::Ordering, ts=similar(vs))
-    # Input checking
-    if lo >= hi;  return vs;  end
+function sort!(vs::AbstractVector, lo::Int, hi::Int, ::RadixSortAlg, o::Ordering, ts=similar(vs, 0))


I don't think we need ts as an argument here, right? I'm not aware of anywhere that passes their own ts, for example. I think we just move ts = similar(vs) down below the length check.

I use it in #33, but indeed I've no idea why it's supported here given that it's not used. Anyway better leave this for another PR?

Ah OK you mention this because of the addition of the length check. Indeed better move this inside the function then.

I'm not aware of very many users of SortingAlgorithms, but I imagine one might desire to pass in ts when sorting multiple arrays to avoid allocations and save time. For example:

using SortingAlgorithms, BenchmarkTools using Base.Sort, Base.Order using Base.Sort: defalg function sort_ts!(A::AbstractArray; dims::Integer, alg::Algorithm=defalg(A), lt=isless, by=identity, rev::Union{Bool,Nothing}=nothing, order::Ordering=Forward) ordr = ord(lt, by, rev, order) nd = ndims(A) k = dims 1 <= k <= nd || throw(ArgumentError("dimension out of range")) remdims = ntuple(i -> i == k ? 1 : size(A, i), nd) ts = similar(A[ntuple(i -> i == k ? Colon() : 1, nd)...]) for idx in CartesianIndices(remdims) Av = view(A, ntuple(i -> i == k ? Colon() : idx[i], nd)...) sort_ts!(Av, alg, ordr, ts) end A end function sort_ts!(v::AbstractVector, alg::Algorithm, order::Ordering, ts) inds = axes(v,1) sort!(v,first(inds),last(inds),alg,order,ts) end a = @benchmark sort!(x; dims=1) setup=(x=rand(Int, 5_000, 5_000)) display(a) b = @benchmark sort!(x; alg=RadixSort, dims=1) setup=(x=rand(Int, 5_000, 5_000)) display(b) c = @benchmark sort_ts!(x; alg=RadixSort, dims=1) setup=(x=rand(Int, 5_000, 5_000)) display(c)

With results

BenchmarkTools.Trial: 4 samples with 1 evaluation. Range (min … max): 1.159 s … 1.168 s ┊ GC (min … max): 0.00% … 0.00% Time (median): 1.164 s ┊ GC (median): 0.00% Time (mean ± σ): 1.164 s ± 3.638 ms ┊ GC (mean ± σ): 0.00% ± 0.00% █ ██ █ █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁ 1.16 s Histogram: frequency by time 1.17 s < Memory estimate: 468.75 KiB, allocs estimate: 10000. BenchmarkTools.Trial: 4 samples with 1 evaluation. Range (min … max): 1.210 s … 1.429 s ┊ GC (min … max): 15.85% … 23.46% Time (median): 1.332 s ┊ GC (median): 20.54% Time (mean ± σ): 1.326 s ± 110.688 ms ┊ GC (mean ± σ): 22.18% ± 6.99% █ █ █ █ █▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁█ ▁ 1.21 s Histogram: frequency by time 1.43 s < Memory estimate: 1.11 GiB, allocs estimate: 90000. BenchmarkTools.Trial: 4 samples with 1 evaluation. Range (min … max): 1.081 s … 1.286 s ┊ GC (min … max): 14.12% … 26.52% Time (median): 1.281 s ┊ GC (median): 25.15% Time (mean ± σ): 1.232 s ± 100.796 ms ┊ GC (mean ± σ): 24.24% ± 7.18% ▁ █▁ █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁██ ▁ 1.08 s Histogram: frequency by time 1.29 s < Memory estimate: 945.74 MiB, allocs estimate: 80004.

The problem is that it's not clear whether this is part of the public API or not. In doubt, maybe we should keep it to avoid breaking things.

But I don't think we should silently resize ts disregarding its size. The cleanest solution would probably to use nothing as the default. Otherwise, we could allow 0-length ts as a special case, but that's a bit less clean (and that's one more allocation).

I believe the now-current edb805c version is as strict as before the PR: throws a bounds error if ts or vs fail to contain all of lo:hi and otherwise messes with the contents of ts but not its size.

I also think the current API is identical to the old: one may pass in a correctly sized ts or leave that argument off.

Theoretically, the only changes this PR makes are:

when hi-lo is small radix sort is not run, and no allocations are made.

when ts needs to be allocated (almost always) an OffsetVector spanning lo:hi is allocated instead of a full similar(vs)

explicit low cost bounds checks at the beginning of the sorting algorithm that should cover all subsequent bounds errors (though no additional @inbounds are added)

Perhaps 2 and 3 belong in a different PR, but I think both are reasonable (assuming OffsetVectors are implemented as efficiently as Vectors)

I think the use of an OffsetArray is free:

julia> @benchmark sort!(x; alg=RadixSort) setup=(x=rand(20_000)) BenchmarkTools.Trial: 7600 samples with 1 evaluation. Range (min … max): 421.061 μs … 5.636 ms ┊ GC (min … max): 0.00% … 89.58% Time (median): 512.915 μs ┊ GC (median): 0.00% Time (mean ± σ): 551.441 μs ± 340.750 μs ┊ GC (mean ± σ): 4.52% ± 6.63% ▁▂▂▂▁▃▄▆██▇▅▃▂▂▂▁▁▁▁ ▁ ▂ ▅▆▆███████████████████████▇██▇▇▆▇▆▆▇▇▆▅▆▅▅▆▅▅▅▆▇█▇▇▄▃▃▄▄▂▄▂▄▅ █ 421 μs Histogram: log(frequency) by time 871 μs < Memory estimate: 349.91 KiB, allocs estimate: 16. julia> @benchmark sort!(x; alg=RadixSort) setup=(x=OffsetArray(rand(20_000), 1:20_000)) BenchmarkTools.Trial: 7511 samples with 1 evaluation. Range (min … max): 416.961 μs … 6.288 ms ┊ GC (min … max): 0.00% … 90.21% Time (median): 511.857 μs ┊ GC (median): 0.00% Time (mean ± σ): 553.784 μs ± 364.445 μs ┊ GC (mean ± σ): 4.80% ± 6.65% ▁▁▂▁▁ ▄▇██▇▆▅▄▃▂▂▂▂▁▁▁▁ ▂ ▃▂▆██████▇█████████████████████▇███▇▆▆▆▇▇▆▅▆▅▆▆▅▅▆▅▅▄▆▆██▇▇▆▅ █ 417 μs Histogram: log(frequency) by time 816 μs < Memory estimate: 349.91 KiB, allocs estimate: 16. julia> @benchmark sort!(x) setup=(x=OffsetArray(rand(20_000), 1:20_000)) BenchmarkTools.Trial: 4266 samples with 1 evaluation. Range (min … max): 991.905 μs … 1.702 ms ┊ GC (min … max): 0.00% … 0.00% Time (median): 1.053 ms ┊ GC (median): 0.00% Time (mean ± σ): 1.071 ms ± 65.684 μs ┊ GC (mean ± σ): 0.00% ± 0.00% ▁▃▄▆▇██▇▆▃▂▂▃▂▂▂▁▁▁ ▁ ▅▇███████████████████████▇▇▇▇▆▆▇▆▆▆▆▇▇▇▆▅▆▆▆▆▅▆▄▄▄▄▅▅▃▃▃▃▃▃▃ █ 992 μs Histogram: log(frequency) by time 1.38 ms < Memory estimate: 0 bytes, allocs estimate: 0. julia> @benchmark sort!(x) setup=(x=rand(20_000)) BenchmarkTools.Trial: 4143 samples with 1 evaluation. Range (min … max): 1.005 ms … 2.248 ms ┊ GC (min … max): 0.00% … 0.00% Time (median): 1.078 ms ┊ GC (median): 0.00% Time (mean ± σ): 1.103 ms ± 85.456 μs ┊ GC (mean ± σ): 0.00% ± 0.00% ▁▄▅▆██▇▅▂▂▃▂▂▁▁▁ ▁▁▁ ▁ ▃▆▇███████████████████▇▇▆▇██████▇▆▇▆▇▆▅▆▃▆▅▅▃▅▅▃▄▃▅▅▅▄▅▃▃▆ █ 1.01 ms Histogram: log(frequency) by time 1.49 ms < Memory estimate: 0 bytes, allocs estimate: 0.

Using an OffsetArray is clever. Though that will only make a significant difference if lo:hi covers a relatively small part of vs. AFAICT this will only happen when there are lots of NaNs (via fpsort!), right?

That is also the only case I'm aware of. Do you still think it's worth it? It adds a dependency on OffsetArrays.

Maybe not if that's the only situation where it happens.

src/SortingAlgorithms.jl

nalimilan · 2021-10-23T13:00:11Z

src/SortingAlgorithms.jl

+function sort!(vs::AbstractVector, lo::Int, hi::Int, a::RadixSortAlg, o::Ordering)
+    if hi <= lo
+        vs
+    elseif hi - lo < RADIX_SMALL_THRESHOLD
+        sort!(vs, lo, hi, Base.Sort.defalg(vs), o)
+    else
+        _sort!(vs, lo, hi, a, o, similar(vs, lo:hi))
+    end
+end
+function sort!(vs::AbstractVector{T}, lo::Int, hi::Int, a::RadixSortAlg, o::Ordering, ts::AbstractVector{T}) where T
+    if hi <= lo
+        vs
+    elseif hi - lo < RADIX_SMALL_THRESHOLD
+        checkbounds(ts, lo:hi) # Consistently throw an error for insufficient ts.
+        sort!(vs, lo, hi, Base.Sort.defalg(vs), o)
+    else
+        _sort!(vs, lo, hi, a, o, ts)
+    end
+end
+function _sort!(vs::AbstractVector{T}, lo::Int, hi::Int, a::RadixSortAlg, o::Ordering, ts::AbstractVector{T}) where T
+    checkbounds(vs, lo:hi)
+    checkbounds(ts, lo:hi)


Can't you simplify all this by taking the following approach (not tested)?

Suggested change

function sort!(vs::AbstractVector, lo::Int, hi::Int, a::RadixSortAlg, o::Ordering)

if hi <= lo

vs

elseif hi - lo < RADIX_SMALL_THRESHOLD

sort!(vs, lo, hi, Base.Sort.defalg(vs), o)

else

_sort!(vs, lo, hi, a, o, similar(vs, lo:hi))

end

end

function sort!(vs::AbstractVector{T}, lo::Int, hi::Int, a::RadixSortAlg, o::Ordering, ts::AbstractVector{T}) where T

if hi <= lo

vs

elseif hi - lo < RADIX_SMALL_THRESHOLD

checkbounds(ts, lo:hi) # Consistently throw an error for insufficient ts.

sort!(vs, lo, hi, Base.Sort.defalg(vs), o)

else

_sort!(vs, lo, hi, a, o, ts)

end

end

function _sort!(vs::AbstractVector{T}, lo::Int, hi::Int, a::RadixSortAlg, o::Ordering, ts::AbstractVector{T}) where T

checkbounds(vs, lo:hi)

checkbounds(ts, lo:hi)

function sort!(vs::AbstractVector{T}, lo::Int, hi::Int, a::RadixSortAlg, o::Ordering,

ts::Union{AbstractVector{T}, Nothing}=nothing) where T

if hi <= lo

return vs

elseif hi - lo < RADIX_SMALL_THRESHOLD

return sort!(vs, lo, hi, Base.Sort.defalg(vs), o)

elseif ts === nothing

ts = similar(vs, lo:hi)

end

checkbounds(vs, lo:hi)

checkbounds(ts, lo:hi)

Yes. I didn't want to expose sort!(... ::Nothing) for potential compatibility issues, but if this is internal &/or we are okay with declaring sort!(... ::Nothing), then that would be simpler. Is it okay to expand the interface in this case?

We should also put the type check before the fallback & add a ts boundscheck on the fallback if ts is provided to give consistent errors. (Not require users to test large arrays to ensure they are getting errors)

If you think it's okay to expand the interface in this way, I'll make the changes.

@nalimilan, do you think it's okay to expand the interface in this way?

Yes I think that's fine.

nalimilan · 2021-10-23T13:08:49Z

src/SortingAlgorithms.jl

-function sort!(vs::AbstractVector, lo::Int, hi::Int, ::RadixSortAlg, o::Ordering, ts=similar(vs))
-    # Input checking
-    if lo >= hi;  return vs;  end
+function sort!(vs::AbstractVector, lo::Int, hi::Int, ::RadixSortAlg, o::Ordering, ts=similar(vs, 0))


Using an OffsetArray is clever. Though that will only make a significant difference if lo:hi covers a relatively small part of vs. AFAICT this will only happen when there are lots of NaNs (via fpsort!), right?

Defer this to a separate PR

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

LilithHafner · 2022-04-03T01:12:41Z

With radix sort in Base, I'm not as interested in maintaining RadixSort here. If someone has a use case for radix sort that isn't covered by Base, ping me, but until then I'm going to close this in favor of JuliaLang/julia#44230.

radix sort fallback to Base on small input

ae551b7

LilithHafner changed the title ~~radix sort fallback to Base on small input (#50)~~ radix sort fallback to Base on small input (fixes #50) Sep 30, 2021

LilithHafner changed the title ~~radix sort fallback to Base on small input (fixes #50)~~ radix sort fallback to Base on small input Sep 30, 2021

increase test input size to cover radixsort

ee9a5a8

nalimilan reviewed Oct 16, 2021

View reviewed changes

test/runtests.jl Outdated Show resolved Hide resolved

src/SortingAlgorithms.jl Outdated Show resolved Hide resolved

nalimilan mentioned this pull request Oct 16, 2021

Add RadixSort as default sorting algorithm for InlineString types JuliaStrings/InlineStrings.jl#10

Merged

LilithHafner commented Oct 16, 2021

View reviewed changes

src/SortingAlgorithms.jl Outdated Show resolved Hide resolved

LilithHafner and others added 3 commits October 16, 2021 16:30

Update src/SortingAlgorithms.jl

f3190d4

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

Test smaller & larger inputs

6a9b226

Update src/SortingAlgorithms.jl

b662767

hi - lo < 2^RADIX_SIZE implies hi < lo unless we have an overflow issue, but on arrays that big we already fail b.c. bins is a UInt32 array.

quinnj reviewed Oct 17, 2021

View reviewed changes

LilithHafner added 2 commits October 22, 2021 16:35

Handle default arguments manualy

5f005a7

Update Project.toml

edb805c

LilithHafner mentioned this pull request Oct 22, 2021

document and update default choice between :dict and :radixsort JuliaStats/StatsBase.jl#720

Merged

nalimilan reviewed Oct 23, 2021

View reviewed changes

LilithHafner and others added 4 commits October 23, 2021 09:52

Cover uncommon branches of radix_sort

95fb8ea

Rollback consistent exception testing

4bc1464

Defer this to a separate PR

whitespace

97030b0

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

fix tests

c507f37

LilithHafner mentioned this pull request Oct 26, 2021

make counting more robust to input datatype JuliaStats/StatsBase.jl#722

Merged

LilithHafner closed this Apr 3, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

radix sort fallback to Base on small input #51

radix sort fallback to Base on small input #51

LilithHafner commented Sep 30, 2021 •

edited

Loading

codecov-commenter commented Sep 30, 2021 •

edited

Loading

nalimilan left a comment

quinnj Oct 17, 2021

nalimilan Oct 17, 2021

nalimilan Oct 17, 2021

LilithHafner Oct 17, 2021

nalimilan Oct 19, 2021

LilithHafner Oct 23, 2021

LilithHafner Oct 23, 2021

nalimilan Oct 23, 2021

LilithHafner Oct 23, 2021

nalimilan Oct 25, 2021

nalimilan Oct 23, 2021 •

edited

Loading

LilithHafner Oct 23, 2021

LilithHafner Oct 23, 2021

LilithHafner Oct 23, 2021

LilithHafner Oct 24, 2021

nalimilan Oct 25, 2021

nalimilan Oct 23, 2021

LilithHafner commented Apr 3, 2022

radix sort fallback to Base on small input #51

radix sort fallback to Base on small input #51

Conversation

LilithHafner commented Sep 30, 2021 • edited Loading

codecov-commenter commented Sep 30, 2021 • edited Loading

Codecov Report

nalimilan left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nalimilan Oct 23, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LilithHafner commented Apr 3, 2022

LilithHafner commented Sep 30, 2021 •

edited

Loading

codecov-commenter commented Sep 30, 2021 •

edited

Loading

nalimilan Oct 23, 2021 •

edited

Loading