Bitonic mergesort #62

Open · wants to merge 5 commits into base: main

Changes from all commits
76 changes: 72 additions & 4 deletions src/SortingAlgorithms.jl
@@ -9,12 +9,13 @@ using Base.Order
import Base.Sort: sort!
import DataStructures: heapify!, percolate_down!

export HeapSort, TimSort, RadixSort, CombSort
export HeapSort, TimSort, RadixSort, CombSort, BitonicSort

struct HeapSortAlg <: Algorithm end
struct TimSortAlg <: Algorithm end
struct RadixSortAlg <: Algorithm end
struct CombSortAlg <: Algorithm end
struct BitonicSortAlg <: Algorithm end

const HeapSort = HeapSortAlg()
const TimSort = TimSortAlg()
@@ -46,6 +47,25 @@ Characteristics:
"""
const CombSort = CombSortAlg()

"""
BitonicSort

Indicates that a sorting function should use the bitonic mergesort algorithm.
This algorithm performs a predetermined, data-independent sequence of
comparisons and therefore tends to be very parallelizable; it effectively
implements a sorting network based on merging bitonic sequences.

Characteristics:
Review comment (Member): It would be nice to have some characteristics that help people decide whether this algorithm is appropriate for their use case (e.g. when it could be better than the default sorting algorithms).

- *not stable* does not preserve the ordering of elements which
compare equal (e.g. "a" and "A" in a sort of letters which
ignores case).
- *in-place* in memory.
- *parallelizable* suitable for vectorization with SIMD instructions because
it performs many independent comparisons.

See wikipedia.org/wiki/Bitonic_sorter for more information.
"""
const BitonicSort = BitonicSortAlg()
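
To make the docstring's "predetermined comparisons" concrete, here is a hedged, standalone sketch (not part of this PR; the name `bitonic4!` is hypothetical) of the fixed compare-exchange schedule of a 4-input bitonic sorting network:

```julia
# A 4-input bitonic sorting network. The (i, j) pairs below are fixed in
# advance and independent of the data, which is what makes the network
# parallelizable: comparisons within a stage are independent of each other.
# Each pair is a compare-exchange that leaves the smaller element at index i.
function bitonic4!(v::AbstractVector)
    @assert length(v) == 4
    for (i, j) in ((1, 2), (4, 3),   # sort the two pairs in opposite directions -> bitonic sequence
                   (1, 3), (2, 4),   # first merge stage (gap 2)
                   (1, 2), (3, 4))   # final merge stage (gap 1)
        if v[j] < v[i]
            v[i], v[j] = v[j], v[i]
        end
    end
    return v
end

bitonic4!([3, 1, 4, 2])  # -> [1, 2, 3, 4]
```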


## Heap sort

@@ -609,14 +629,62 @@ function sort!(v::AbstractVector, lo::Int, hi::Int, ::CombSortAlg, o::Ordering)
interval = (3 * (hi-lo+1)) >> 2

while interval > 1
@inbounds for j in lo:hi-interval
a, b = v[j], v[j+interval]
v[j], v[j+interval] = lt(o, b, a) ? (b, a) : (a, b)
for j in lo:hi-interval
@inbounds comparator!(v, j, j+interval, o)
end
interval = (3 * interval) >> 2
end

return sort!(v, lo, hi, InsertionSort, o)
end

function sort!(v::AbstractVector, lo::Int, hi::Int, ::BitonicSortAlg, o::Ordering)
return bitonicsort!(view(v, lo:hi), o::Ordering)
Review comment (Member): Using views in this context sometimes comes with a runtime performance penalty.

Reply (Contributor, author): Is there an alternative? This is only called once, by the way; would it still be relevant?

Reply (Member): The alternative used throughout SortingAlgorithms and Base.Sort is to carry lo and hi around everywhere. It is annoying but efficient. If you can get away with using views without a performance penalty, that would be great, but you would need to benchmark both ways to find out. And yes, it would still be relevant, because all subsequent indexing operations will be on a view rather than a Vector (or other input type).
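
The two conventions the reviewer contrasts can be sketched with hypothetical helpers (names and functions are illustrative, not from this PR):

```julia
# (a) Carry lo/hi explicitly, as SortingAlgorithms and Base.Sort do;
# every access indexes the original vector directly.
function count_greater(v::AbstractVector, lo::Int, hi::Int, x)
    c = 0
    @inbounds for i in lo:hi
        c += v[i] > x
    end
    return c
end

# (b) Wrap the range in a view; every subsequent index goes through
# the SubArray wrapper's index translation.
count_greater_view(v::AbstractVector, lo::Int, hi::Int, x) =
    count(>(x), view(v, lo:hi))
```

Both return the same result; which is faster has to be settled by benchmarking, as the reviewer notes.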

end

function bitonicsort!(data, o::Ordering)
N = length(data)
for n in 1:intlog2(N)
bitonicfirststage!(data, Val(n), o::Ordering)
Review comment (Member): This is type-unstable because the type of Val(n) depends on something other than the input types of bitonicsort!. It is possible, but unlikely, that including a type instability in a good sorting algorithm is a good idea.

I would start with type-stable code and only switch to type-unstable code if you find a convincing reason to do so (in the case of loop unrolling, that would require end-to-end benchmarks showing the unstable version is faster).

Type instabilities also have disadvantages other than runtime performance (e.g. precompilation efficacy and static analysis). In this case, the first time someone sorts a very large list, there will be new compilation for a new n.

Reply (Contributor, author): Can we simply force N and n to be integers somehow? And what makes it unstable in the first place?

The idea is indeed to force compilation with knowledge of the input size, because this seems necessary to trigger compiler optimizations. I saw a lot more vectorization in the machine code with that, and I believe it won't be really interesting unless we do this. We might set a limit above which we don't use compile-time-known values, if that's an issue.

Reply (Member): N and n are already integers (length is documented to return an Integer), but that is not the issue. What makes Val(n) unstable is that typeof(Val(4)) == Val{4} != typeof(Val(3)). The compiler only knows about types, not values, so it knows the type of Val(n::Int) is Val{???}, but it does not know what value n holds, so it does not know the concrete type. This is in contrast with Set(n::Int), which is of type Set{Int}: there the compiler can deduce the result type from the input types alone.

Forcing separate compilation for each step size is possible but should not be necessary to get SIMD. I originally took this approach when implementing radix sort in Julia Base, compiling separately for every chunk size, but eventually realized there was no noticeable benefit and removed the optimization. This is the commit where I removed it: JuliaLang/julia@17f45e8. The parent commit contains an example that avoids dynamic dispatch for small values by using explicit if statements. It is ugly, in my opinion, and I would only support it if it gave substantial speedups.

While it is possible to glean valuable information from @code_llvm and @code_native outputs, it takes a great deal of understanding of CPU architectures to make accurate value judgements from them. I highly recommend benchmarking with @time (or BenchmarkTools' @btime) and profiling with Profile's @profile (or VSCode's @profview), because those tools can quickly and easily point to potential problems and detect improvements or regressions.

For example, @profview for _ in 1:1000 sort!(rand(Int, 80); alg=BitonicSort) end reveals that 95% of sorting time is spent in the bitonicsecondstage! function and less than 1% in bitonicfirststage!. This indicates a substantial performance problem in bitonicsecondstage!.

@profview for _ in 1:200000 sort!(rand(Int, 15); alg=BitonicSort) end indicates that about 65% of the sorting time is spent in the Val function and in runtime dispatch, indicating a problem there, too.
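
The instability the reviewer describes can be reproduced in isolation with a hedged, self-contained sketch (function names are hypothetical):

```julia
using Test: @inferred

# Type-unstable: the concrete return type Val{n} depends on the *value* of n,
# which the compiler cannot see from the argument types alone, so inference
# only gives the abstract Val.
wrap_unstable(n::Int) = Val(n)

# Type-stable: the return type Set{Int} follows from the argument type Int
# (integers are single-element iterables in Julia, so Set(3) == Set([3])).
wrap_stable(n::Int) = Set(n)

@inferred wrap_stable(3)      # passes: inference yields the concrete Set{Int}
# @inferred wrap_unstable(3)  # would throw: return value has type Val{3},
#                             # but inference only recovers Val
```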

for m in n-1:-1:1
bitonicsecondstage!(data, Val(m), o::Ordering)
end
end
return data
end

function bitonicfirststage!(v, ::Val{Gap}, o::Ordering) where Gap
N = length(v)
gap = 1 << Gap
halfgap = 1 << (Gap - 1)
for i in 0:gap:N-1
firstj = max(0, i + gap - N)
for j in firstj:(halfgap-1)
ia = i + j
ib = i + gap - j - 1
@inbounds comparator!(v, ia + 1, ib + 1, o)
end
end
end

function bitonicsecondstage!(v, ::Val{Gap}, o::Ordering) where Gap
N = length(v)
gap = 1 << Gap
for i in 0:gap:N-1
lastj = min(N - 1, N - (i + gap >> 1 + 1))
for j in 0:lastj
ia = i + j
ib = i + j + gap >> 1
@inbounds comparator!(v, ia + 1, ib + 1, o)
end
end
end

# Number of bitonic stages needed: ceil(log2(n)) for n > 1, otherwise 0.
intlog2(n) = (n > 1) ? 8sizeof(n-1)-leading_zeros(n-1) : 0

# Compare-exchange: place the element that sorts first (per ordering `o`)
# at index `i` and the other at index `j`.
Base.@propagate_inbounds function comparator!(v, i, j, o)
a, b = v[i], v[j]
v[i], v[j] = lt(o, b, a) ? (b, a) : (a, b)
end

end # module
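
With a build that includes this PR, the new algorithm plugs into the standard keyword interface exactly like the existing exports (a sketch mirroring the tests below; it assumes the PR branch is installed):

```julia
using SortingAlgorithms  # assumes a version that includes this PR's BitonicSort

v = rand(Int, 1000)
sort!(v; alg=BitonicSort)
@assert issorted(v)

# Like HeapSort and CombSort, BitonicSort is unstable: elements that compare
# equal may be reordered relative to their input positions.
```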
6 changes: 3 additions & 3 deletions test/runtests.jl
@@ -5,7 +5,7 @@ using Random

a = rand(1:10000, 1000)

for alg in [TimSort, HeapSort, RadixSort, CombSort]
for alg in [TimSort, HeapSort, RadixSort, CombSort, BitonicSort]
b = sort(a, alg=alg)
@test issorted(b)
ix = sortperm(a, alg=alg)
@@ -85,7 +85,7 @@ for n in [0:10..., 100, 101, 1000, 1001]
end

# unstable algorithms
for alg in [HeapSort, CombSort]
for alg in [HeapSort, CombSort, BitonicSort]
p = sortperm(v, alg=alg, order=ord)
@test isperm(p)
@test v[p] == si
@@ -99,7 +99,7 @@

v = randn_with_nans(n,0.1)
for ord in [Base.Order.Forward, Base.Order.Reverse],
alg in [TimSort, HeapSort, RadixSort, CombSort]
alg in [TimSort, HeapSort, RadixSort, CombSort, BitonicSort]
# test float sorting with NaNs
s = sort(v, alg=alg, order=ord)
@test issorted(s, order=ord)