-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
radix sort fallback to Base on small input #51
radix sort fallback to Base on small input #51
Conversation
Codecov Report
@@ Coverage Diff @@
## master #51 +/- ##
==========================================
- Coverage 97.84% 97.66% -0.19%
==========================================
Files 1 1
Lines 325 342 +17
==========================================
+ Hits 318 334 +16
- Misses 7 8 +1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
hi - lo < 2^RADIX_SIZE implies hi < lo unless we have an overflow issue, but on arrays that big we already fail b.c. bins is a UInt32 array.
src/SortingAlgorithms.jl
Outdated
function sort!(vs::AbstractVector, lo::Int, hi::Int, ::RadixSortAlg, o::Ordering, ts=similar(vs)) | ||
# Input checking | ||
if lo >= hi; return vs; end | ||
function sort!(vs::AbstractVector, lo::Int, hi::Int, ::RadixSortAlg, o::Ordering, ts=similar(vs, 0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need ts
as an argument here, right? I'm not aware of anywhere that passes their own ts
, for example. I think we just move ts = similar(vs)
down below the length check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use it in #33, but indeed I've no idea why it's supported here given that it's not used. Anyway better leave this for another PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah OK you mention this because of the addition of the length check. Indeed better move this inside the function then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not aware of very many users of SortingAlgorithms
, but I imagine one might desire to pass in ts
when sorting multiple arrays to avoid allocations and save time. For example:
using SortingAlgorithms, BenchmarkTools
using Base.Sort, Base.Order
using Base.Sort: defalg
function sort_ts!(A::AbstractArray;
dims::Integer,
alg::Algorithm=defalg(A),
lt=isless,
by=identity,
rev::Union{Bool,Nothing}=nothing,
order::Ordering=Forward)
ordr = ord(lt, by, rev, order)
nd = ndims(A)
k = dims
1 <= k <= nd || throw(ArgumentError("dimension out of range"))
remdims = ntuple(i -> i == k ? 1 : size(A, i), nd)
ts = similar(A[ntuple(i -> i == k ? Colon() : 1, nd)...])
for idx in CartesianIndices(remdims)
Av = view(A, ntuple(i -> i == k ? Colon() : idx[i], nd)...)
sort_ts!(Av, alg, ordr, ts)
end
A
end
function sort_ts!(v::AbstractVector, alg::Algorithm, order::Ordering, ts)
inds = axes(v,1)
sort!(v,first(inds),last(inds),alg,order,ts)
end
a = @benchmark sort!(x; dims=1) setup=(x=rand(Int, 5_000, 5_000))
display(a)
b = @benchmark sort!(x; alg=RadixSort, dims=1) setup=(x=rand(Int, 5_000, 5_000))
display(b)
c = @benchmark sort_ts!(x; alg=RadixSort, dims=1) setup=(x=rand(Int, 5_000, 5_000))
display(c)
With results
BenchmarkTools.Trial: 4 samples with 1 evaluation.
Range (min … max): 1.159 s … 1.168 s ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.164 s ┊ GC (median): 0.00%
Time (mean ± σ): 1.164 s ± 3.638 ms ┊ GC (mean ± σ): 0.00% ± 0.00%
█ ██ █
█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
1.16 s Histogram: frequency by time 1.17 s <
Memory estimate: 468.75 KiB, allocs estimate: 10000.
BenchmarkTools.Trial: 4 samples with 1 evaluation.
Range (min … max): 1.210 s … 1.429 s ┊ GC (min … max): 15.85% … 23.46%
Time (median): 1.332 s ┊ GC (median): 20.54%
Time (mean ± σ): 1.326 s ± 110.688 ms ┊ GC (mean ± σ): 22.18% ± 6.99%
█ █ █ █
█▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁█ ▁
1.21 s Histogram: frequency by time 1.43 s <
Memory estimate: 1.11 GiB, allocs estimate: 90000.
BenchmarkTools.Trial: 4 samples with 1 evaluation.
Range (min … max): 1.081 s … 1.286 s ┊ GC (min … max): 14.12% … 26.52%
Time (median): 1.281 s ┊ GC (median): 25.15%
Time (mean ± σ): 1.232 s ± 100.796 ms ┊ GC (mean ± σ): 24.24% ± 7.18%
▁ █▁
█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁██ ▁
1.08 s Histogram: frequency by time 1.29 s <
Memory estimate: 945.74 MiB, allocs estimate: 80004.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that it's not clear whether this is part of the public API or not. In doubt, maybe we should keep it to avoid breaking things.
But I don't think we should silently resize ts
disregarding its size. The cleanest solution would probably to use nothing
as the default. Otherwise, we could allow 0-length ts
as a special case, but that's a bit less clean (and that's one more allocation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the now-current edb805c version is as strict as before the PR: throws a bounds error if ts
or vs
fail to contain all of lo:hi
and otherwise messes with the contents of ts
but not its size.
I also think the current API is identical to the old: one may pass in a correctly sized ts
or leave that argument off.
Theoretically, the only changes this PR makes are:
- when
hi-lo
is small radix sort is not run, and no allocations are made. - when
ts
needs to be allocated (almost always) an OffsetVector spanninglo:hi
is allocated instead of a fullsimilar(vs)
- explicit low cost bounds checks at the beginning of the sorting algorithm that should cover all subsequent bounds errors (though no additional @inbounds are added)
Perhaps 2 and 3 belong in a different PR, but I think both are reasonable (assuming OffsetVectors are implemented as efficiently as Vectors)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the use of an OffsetArray
is free:
julia> @benchmark sort!(x; alg=RadixSort) setup=(x=rand(20_000))
BenchmarkTools.Trial: 7600 samples with 1 evaluation.
Range (min … max): 421.061 μs … 5.636 ms ┊ GC (min … max): 0.00% … 89.58%
Time (median): 512.915 μs ┊ GC (median): 0.00%
Time (mean ± σ): 551.441 μs ± 340.750 μs ┊ GC (mean ± σ): 4.52% ± 6.63%
▁▂▂▂▁▃▄▆██▇▅▃▂▂▂▁▁▁▁ ▁ ▂
▅▆▆███████████████████████▇██▇▇▆▇▆▆▇▇▆▅▆▅▅▆▅▅▅▆▇█▇▇▄▃▃▄▄▂▄▂▄▅ █
421 μs Histogram: log(frequency) by time 871 μs <
Memory estimate: 349.91 KiB, allocs estimate: 16.
julia> @benchmark sort!(x; alg=RadixSort) setup=(x=OffsetArray(rand(20_000), 1:20_000))
BenchmarkTools.Trial: 7511 samples with 1 evaluation.
Range (min … max): 416.961 μs … 6.288 ms ┊ GC (min … max): 0.00% … 90.21%
Time (median): 511.857 μs ┊ GC (median): 0.00%
Time (mean ± σ): 553.784 μs ± 364.445 μs ┊ GC (mean ± σ): 4.80% ± 6.65%
▁▁▂▁▁ ▄▇██▇▆▅▄▃▂▂▂▂▁▁▁▁ ▂
▃▂▆██████▇█████████████████████▇███▇▆▆▆▇▇▆▅▆▅▆▆▅▅▆▅▅▄▆▆██▇▇▆▅ █
417 μs Histogram: log(frequency) by time 816 μs <
Memory estimate: 349.91 KiB, allocs estimate: 16.
julia> @benchmark sort!(x) setup=(x=OffsetArray(rand(20_000), 1:20_000))
BenchmarkTools.Trial: 4266 samples with 1 evaluation.
Range (min … max): 991.905 μs … 1.702 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.053 ms ┊ GC (median): 0.00%
Time (mean ± σ): 1.071 ms ± 65.684 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▁▃▄▆▇██▇▆▃▂▂▃▂▂▂▁▁▁ ▁
▅▇███████████████████████▇▇▇▇▆▆▇▆▆▆▆▇▇▇▆▅▆▆▆▆▅▆▄▄▄▄▅▅▃▃▃▃▃▃▃ █
992 μs Histogram: log(frequency) by time 1.38 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark sort!(x) setup=(x=rand(20_000))
BenchmarkTools.Trial: 4143 samples with 1 evaluation.
Range (min … max): 1.005 ms … 2.248 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.078 ms ┊ GC (median): 0.00%
Time (mean ± σ): 1.103 ms ± 85.456 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▁▄▅▆██▇▅▂▂▃▂▂▁▁▁ ▁▁▁ ▁
▃▆▇███████████████████▇▇▆▇██████▇▆▇▆▇▆▅▆▃▆▅▅▃▅▅▃▄▃▅▅▅▄▅▃▃▆ █
1.01 ms Histogram: log(frequency) by time 1.49 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using an OffsetArray
is clever. Though that will only make a significant difference if lo:hi
covers a relatively small part of vs
. AFAICT this will only happen when there are lots of NaN
s (via fpsort!
), right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is also the only case I'm aware of. Do you still think it's worth it? It adds a dependency on OffsetArrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe not if that's the only situation where it happens.
function sort!(vs::AbstractVector, lo::Int, hi::Int, a::RadixSortAlg, o::Ordering) | ||
if hi <= lo | ||
vs | ||
elseif hi - lo < RADIX_SMALL_THRESHOLD | ||
sort!(vs, lo, hi, Base.Sort.defalg(vs), o) | ||
else | ||
_sort!(vs, lo, hi, a, o, similar(vs, lo:hi)) | ||
end | ||
end | ||
function sort!(vs::AbstractVector{T}, lo::Int, hi::Int, a::RadixSortAlg, o::Ordering, ts::AbstractVector{T}) where T | ||
if hi <= lo | ||
vs | ||
elseif hi - lo < RADIX_SMALL_THRESHOLD | ||
checkbounds(ts, lo:hi) # Consistently throw an error for insufficient ts. | ||
sort!(vs, lo, hi, Base.Sort.defalg(vs), o) | ||
else | ||
_sort!(vs, lo, hi, a, o, ts) | ||
end | ||
end | ||
function _sort!(vs::AbstractVector{T}, lo::Int, hi::Int, a::RadixSortAlg, o::Ordering, ts::AbstractVector{T}) where T | ||
checkbounds(vs, lo:hi) | ||
checkbounds(ts, lo:hi) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you simplify all this by taking the following approach (not tested)?
function sort!(vs::AbstractVector, lo::Int, hi::Int, a::RadixSortAlg, o::Ordering) | |
if hi <= lo | |
vs | |
elseif hi - lo < RADIX_SMALL_THRESHOLD | |
sort!(vs, lo, hi, Base.Sort.defalg(vs), o) | |
else | |
_sort!(vs, lo, hi, a, o, similar(vs, lo:hi)) | |
end | |
end | |
function sort!(vs::AbstractVector{T}, lo::Int, hi::Int, a::RadixSortAlg, o::Ordering, ts::AbstractVector{T}) where T | |
if hi <= lo | |
vs | |
elseif hi - lo < RADIX_SMALL_THRESHOLD | |
checkbounds(ts, lo:hi) # Consistently throw an error for insufficient ts. | |
sort!(vs, lo, hi, Base.Sort.defalg(vs), o) | |
else | |
_sort!(vs, lo, hi, a, o, ts) | |
end | |
end | |
function _sort!(vs::AbstractVector{T}, lo::Int, hi::Int, a::RadixSortAlg, o::Ordering, ts::AbstractVector{T}) where T | |
checkbounds(vs, lo:hi) | |
checkbounds(ts, lo:hi) | |
function sort!(vs::AbstractVector{T}, lo::Int, hi::Int, a::RadixSortAlg, o::Ordering, | |
ts::Union{AbstractVector{T}, Nothing}=nothing) where T | |
if hi <= lo | |
return vs | |
elseif hi - lo < RADIX_SMALL_THRESHOLD | |
return sort!(vs, lo, hi, Base.Sort.defalg(vs), o) | |
elseif ts === nothing | |
ts = similar(vs, lo:hi) | |
end | |
checkbounds(vs, lo:hi) | |
checkbounds(ts, lo:hi) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I didn't want to expose sort!(... ::Nothing)
for potential compatibility issues, but if this is internal &/or we are okay with declaring sort!(... ::Nothing)
, then that would be simpler. Is it okay to expand the interface in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also put the type check before the fallback & add a ts boundscheck on the fallback if ts is provided to give consistent errors. (Not require users to test large arrays to ensure they are getting errors)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you think it's okay to expand the interface in this way, I'll make the changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nalimilan, do you think it's okay to expand the interface in this way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think that's fine.
src/SortingAlgorithms.jl
Outdated
function sort!(vs::AbstractVector, lo::Int, hi::Int, ::RadixSortAlg, o::Ordering, ts=similar(vs)) | ||
# Input checking | ||
if lo >= hi; return vs; end | ||
function sort!(vs::AbstractVector, lo::Int, hi::Int, ::RadixSortAlg, o::Ordering, ts=similar(vs, 0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using an OffsetArray
is clever. Though that will only make a significant difference if lo:hi
covers a relatively small part of vs
. AFAICT this will only happen when there are lots of NaN
s (via fpsort!
), right?
Defer this to a separate PR
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
With radix sort in Base, I'm not as interested in maintaining RadixSort here. If someone has a use case for radix sort that isn't covered by Base, ping me, but until then I'm going to close this in favor of JuliaLang/julia#44230. |
Fixes #50
And defer allocation of the temporary array until after the fallback decision point because we will typically fall back to QuickSort which does not need a temporary array.
I chose 2^RADIX_SIZE because we allocate an array of that size. I validated the choice empirically with these tests: