Bitonic mergesort #62
base: main
Changes from 3 commits
@@ -9,12 +9,13 @@ using Base.Order
 import Base.Sort: sort!
 import DataStructures: heapify!, percolate_down!

-export HeapSort, TimSort, RadixSort, CombSort
+export HeapSort, TimSort, RadixSort, CombSort, BitonicSort

 struct HeapSortAlg <: Algorithm end
 struct TimSortAlg <: Algorithm end
 struct RadixSortAlg <: Algorithm end
 struct CombSortAlg <: Algorithm end
+struct BitonicSortAlg <: Algorithm end

 const HeapSort = HeapSortAlg()
 const TimSort = TimSortAlg()
@@ -46,6 +47,27 @@ Characteristics:
 """
 const CombSort = CombSortAlg()

+"""
+    BitonicSort
+
+Indicates that a sorting function should use the bitonic mergesort algorithm.
+This algorithm performs a series of pre-determined comparisons and tends to be
+very parallelizable. It effectively implements a sorting network based on
+merging bitonic sequences.
+
+Characteristics:
+  - *not stable*: does not preserve the ordering of elements which compare
+    equal (e.g. "a" and "A" in a sort of letters which ignores case).
+  - *in-place* in memory.
+  - *parallelizable*: suitable for vectorization with SIMD instructions
+    because it performs many independent comparisons.
+
+## References
+- Batcher, K. E., "Sorting networks and their applications", AFIPS '68 (1968). doi: https://doi.org/10.1145/1468075.1468121
+- "Bitonic Sorter", in "Wikipedia" (Oct 2022). https://en.wikipedia.org/wiki/Bitonic_sorter
+"""
+const BitonicSort = BitonicSortAlg()
+
+
 ## Heap sort

Review thread (on the References entries):
- Comment: https://en.wikipedia.org/wiki/Bitonic_sorter is a great reference for this algorithm.
- Reply: I'm not really a big fan of bloating the References section, or of citing Wikipedia, but it's fine by me if you think so.
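The docstring above describes a sorting network built by merging bitonic sequences. As a standalone illustration of that structure (a sketch, not this PR's implementation; it assumes a power-of-two length and plain ascending order rather than an `Ordering`), the mirrored first stage followed by half-cleaner stages looks like this:

```julia
# Compare-exchange: leave v[a] <= v[b].
function cswap!(v, a, b)
    if v[a] > v[b]
        v[a], v[b] = v[b], v[a]
    end
end

# Standalone sketch of bitonic mergesort for power-of-two lengths.
# Illustrative only; the PR's version works on any length and Ordering.
function bitonic_mergesort!(v::AbstractVector)
    N = length(v)
    @assert ispow2(N) "this sketch only handles power-of-two lengths"
    k = trailing_zeros(N)          # log2(N)
    for n in 1:k                   # merge sorted blocks into size 2^n
        gap = 1 << n
        # First stage: mirrored comparisons within each block of size gap.
        for i in 0:gap:N-1, j in 0:(gap >> 1)-1
            cswap!(v, i + j + 1, i + gap - j)
        end
        # Second stages: "half-cleaner" layers with shrinking gaps.
        for m in n-1:-1:1
            g = 1 << m
            for i in 0:g:N-1, j in 0:(g >> 1)-1
                cswap!(v, i + j + 1, i + j + (g >> 1) + 1)
            end
        end
    end
    return v
end

bitonic_mergesort!([6, 2, 7, 4, 1, 8, 3, 5])  # -> [1, 2, 3, 4, 5, 6, 7, 8]
```

Every comparison in a layer touches a distinct pair of indices, which is what makes the network friendly to SIMD and parallel execution.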
@@ -609,14 +631,68 @@ function sort!(v::AbstractVector, lo::Int, hi::Int, ::CombSortAlg, o::Ordering)
     interval = (3 * (hi-lo+1)) >> 2

     while interval > 1
-        @inbounds for j in lo:hi-interval
-            a, b = v[j], v[j+interval]
-            v[j], v[j+interval] = lt(o, b, a) ? (b, a) : (a, b)
+        for j in lo:hi-interval
+            @inbounds comparator!(v, j, j+interval, o)
         end
         interval = (3 * interval) >> 2
     end

     return sort!(v, lo, hi, InsertionSort, o)
 end

+function sort!(v::AbstractVector, lo::Int, hi::Int, ::BitonicSortAlg, o::Ordering)
+    return bitonicsort!(view(v, lo:hi), o::Ordering)
+end

Review thread (on the `view(v, lo:hi)` call):
- Comment: Using views in this context sometimes comes with a runtime performance penalty.
- Reply: Is there an alternative? This is only called once, btw, would it still be relevant?
- Reply: The alternative used throughout SortingAlgorithms and Base.Sort is to carry around `lo` and `hi` everywhere. It is annoying but efficient. If you can get away with using views without a performance penalty that would be great, but you'd need to benchmark both ways to find out. Yes, it would still be relevant, because all future indexing operations will be on a view.
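For context on the thread above: the reviewer's alternative is to thread `lo`/`hi` through every helper instead of wrapping the range in a view once. A minimal sketch of the two call shapes, using a toy quadratic pass in place of the bitonic network (all names here are illustrative; only benchmarking, e.g. with BenchmarkTools' `@btime`, can say which shape is faster for the real code):

```julia
# Toy in-place sorting pass over w[lo:hi]; stands in for the bitonic network.
function toypass!(w, lo, hi)
    for _ in lo:hi, j in lo:hi-1
        if w[j] > w[j+1]
            w[j], w[j+1] = w[j+1], w[j]
        end
    end
    return w
end

# Shape used in this PR: materialize the range as a view once,
# then every helper indexes it as 1:N.
sort_range_view!(v, lo, hi) = (toypass!(view(v, lo:hi), 1, hi - lo + 1); v)

# Shape used elsewhere in SortingAlgorithms/Base.Sort: carry lo and hi
# through every call and index the parent array directly.
sort_range_bounds!(v, lo, hi) = toypass!(v, lo, hi)
```

Both leave elements outside `lo:hi` untouched; on `[5, 4, 3, 2, 1]` with `lo = 2, hi = 4` each yields `[5, 2, 3, 4, 1]`.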
+function bitonicsort!(data, o::Ordering)
+    N = length(data)
+    for n in 1:intlog2(N)
+        bitonicfirststage!(data, Val(n), o::Ordering)
+        for m in n-1:-1:1
+            bitonicsecondstage!(data, Val(m), o::Ordering)
+        end
+    end
+    return data
+end

Review thread (on the `Val(n)` arguments):
- Comment: This is type-unstable, because the type of `Val(n)` depends on the value of `n`. I would start with type-stable code and only switch to type-unstable code if you can find a convincing reason to do so (in the case of loop unrolling, that would require end-to-end benchmarks showing the unstable version is faster). Type instabilities also have disadvantages other than runtime performance (e.g. precompilation efficacy and static analysis). In this case, the first time someone sorts a very large list, there will be new compilation for a new `Val` type.
- Reply: Can we simply force `N` and `n` to be integers somehow? And what makes it unstable in the first place? The idea is indeed to force compilation with knowledge of the input size, because this seems necessary to trigger compiler optimizations. I saw a lot more vectorization in the machine code with that, and I believe it won't be really interesting unless we do this. We might have a limit above which we don't use compile-time-known values, if that's an issue.
- Reply: Forcing separate compilation for each step size is possible, but it should not be necessary to get SIMD. I originally took this approach in implementing radix sort in Julia Base, compiling separately for every chunk size, but eventually realized that there wasn't any noticeable benefit from doing that and removed the optimization, as it wasn't helpful. This is the commit where I removed it: JuliaLang/julia@17f45e8. The parent commit contains an example that avoids dynamic dispatch for small values by using explicit if statements. It is ugly IMO, and I would only support it if it gave substantial speedups. While it is possible to glean valuable information from … For example, …
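The instability discussed above can be reproduced in a toy example: `Val(n)` has type `Val{n}`, so when `n` is only known at run time the call site cannot infer the argument's type and must dispatch dynamically, compiling a fresh method for each new `n`. A sketch contrasting the two styles (illustrative functions, not the PR's; `@code_warntype` on the first driver shows the red `Any`):

```julia
# Gap is a compile-time constant here: one method instance per Gap value.
step_val(::Val{Gap}) where Gap = 1 << Gap

# gap_exp is an ordinary runtime Int: one method instance covers all values.
step_int(gap_exp::Int) = 1 << gap_exp

# Type-unstable driver: Val(n) for a runtime n means dynamic dispatch per call.
total_unstable(k) = sum(step_val(Val(n)) for n in 1:k)

# Type-stable driver: same arithmetic, no dispatch on the value of n.
total_stable(k) = sum(step_int(n) for n in 1:k)

total_unstable(5) == total_stable(5) == 62  # sum of 2^n for n = 1..5
```

Both drivers compute the same result; the difference shows up in inference, compile time for new sizes, and the quality of the generated code.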
+function bitonicfirststage!(v, ::Val{Gap}, o::Ordering) where Gap
+    N = length(v)
+    @inbounds begin
+        gap = 1 << Gap
+        g2 = gap >> 1
+        for i in 0:gap:N-1
+            firstj = max(0, i + gap - N)
+            for j in firstj:(g2-1)
+                ia = i + j
+                ib = i + gap - j - 1
+                @inbounds comparator!(v, ia + 1, ib + 1, o)
+            end
+        end
+    end
+end
+function bitonicsecondstage!(v, ::Val{Gap}, o::Ordering) where Gap
+    N = length(v)
+    @inbounds begin
+        for n in Gap:-1:1
+            gap = 1 << n
+            for i in 0:gap:N-1
+                lastj = min(N - 1, N - (i + gap >> 1 + 1))
+                for j in 0:lastj
+                    ia = i + j
+                    ib = i + j + gap >> 1
+                    @inbounds comparator!(v, ia + 1, ib + 1, o)
+                end
+            end
+        end
+    end
+end

Review thread (on the inner loop bounds):
- Comment: When …, perhaps the bounds of this loop should be tighter.
- Reply: I'm still trying to understand it all, because in the sorting network literature it's all about the delay, or depth... On the Wikipedia page, though, it clearly states the network has … comparators. By eyeballing the diagram, or unrolling these two stages, we have this: … So it's …
- Reply: I agree that that is what the runtime should be, but I think there is an error in the implementation that results in much larger runtimes.
- Reply: Oops, you're right! There was an extra outer for-loop for the second stage... Hopefully that helps a bit, but I'm still skeptical this is as good as it gets. I mean, I'm not even sure it will really be worth it in the end, but I'm not sure we're at the best for this algorithm yet.
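On the comparator count debated above: for N = 2^k inputs, a bitonic network has k(k+1)/2 layers with N/2 compare-exchanges each, i.e. N·k·(k+1)/4 comparators in total, which is Θ(N log² N). A small standalone check of that count against the stage structure used here (one mirrored first stage plus n-1 half-cleaner stages per block size 2^n):

```julia
# Count comparators layer by layer for a bitonic network on N = 2^k wires.
function bitonic_comparator_count(N::Int)
    @assert ispow2(N)
    k = trailing_zeros(N)
    count = 0
    for n in 1:k
        count += N ÷ 2              # first (mirrored) stage: one layer
        count += (n - 1) * (N ÷ 2)  # half-cleaner stages: n - 1 layers
    end
    return count
end

bitonic_comparator_count(8)  # -> 24, matching 8 * 3 * 4 ÷ 4
```

If an implementation executes noticeably more compare-exchanges than this formula predicts, as with the extra outer loop found above, that is a sign of a stage being repeated.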
+intlog2(n) = (n > 1) ? 8sizeof(n-1)-leading_zeros(n-1) : 0
+
+Base.@propagate_inbounds function comparator!(v, i, j, o)
+    a, b = v[i], v[j]
+    v[i], v[j] = lt(o, b, a) ? (b, a) : (a, b)
+end
+
 end # module
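A note on the `intlog2` helper above: for `n > 1` it computes `ceil(log2(n))` from `leading_zeros` of `n - 1`, with `8sizeof(n - 1)` giving the bit width of the integer type, so the driver loop runs enough passes even when the input length is not a power of two. A quick check of that reading (the helper is copied verbatim from the diff):

```julia
# Verbatim from the PR: branch-free ceil(log2(n)) for n > 1, else 0.
intlog2(n) = (n > 1) ? 8sizeof(n-1)-leading_zeros(n-1) : 0

intlog2(8), intlog2(9), intlog2(1)  # -> (3, 4, 0)
```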
Review comment (on the new docstring): It would be nice to have some characteristics that help people decide whether this algorithm is appropriate for their use case (e.g. when it could be better than the default sorting algorithms).