Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PagedMergeSort, a merge sort using O(√n) space #71

Merged
merged 49 commits into from
Aug 18, 2023

Conversation

LSchwerdt
Copy link
Contributor

@LSchwerdt LSchwerdt commented Feb 6, 2023

As far as I am aware, there is no stable sorting algorithm in Julia that uses less than O(n) auxiliary space. This PR fixes that, by implementing PagedMergeSort, a merge sort using O(√n) space while achieving about the same speed as the regular MergeSort from Base.Sort.

This is done by using a merge routine that splits the array into blocks/pages of size √n, merges into these blocks, and than rearranges them using a page table. This is illustrated below. The auxiliary space is on the right.

pagedMerge_130_130

The basic idea is laid out here, but I use a different page table and reordering scheme than the author of that post. By using the inverse permutation in the page table (ie. storing the location of the data belonging in block i in the page table at index i), we can follow the permutation cycles during reordering without swapping, copying the correct data into the previously emptied location.

The additional data movement from reordering is almost negligible compared to the merging, and the main merging loop has only one condition (merge until block is full) as opposed to two (merge until either source array runs out), so the performance is about the same as the regular merge sort.

At deeper recursion levels, where the scratch space is big enough, normal merging is used, where one input is copied into the scratch space. When the scratch space is large enough to hold the complete subarray, the input is merged interleaved from both sides, which increases performance for random data, making PagedMergeSort actually faster than MergeSort. (But it decreases performamce for presorted data). Benchmarks are below. They can be replicated by running this Pluto notebook.

If someone needs a sorting algorithm with less than O(n) memory usage, sorting is probably a significant aspect of their whole program, so I also included a multithreaded version of PagedMergeSort.

Checklist before merging

  • Wait for Add comment to broken test and fix it by using Base.Sort.InitialOptimizations #70 to be merged, and include initial optimizations. This is necessary to pass the test for sorting floats with NaNs.
  • This PR adds a dependency on StaticArrays.jl. Maybe that can be avoided. If not, version compatibility must be checked.
  • PagedMerge! uses Base.midpoint, which was added in version 1.4 in Base.Sort and later moved to Base. To ensure compatibility with older Julia version, this should be accounted for. -> Redefined function here for compatibiliy with older Julia versions.
  • Threads.@Spawn was introduced after Julia 1.0. Since this package aims to be compatible with Julia 1.0, I see two options:
    • Create a second package for multithreaded sorting algorithms with different Julia version requirements.
    • Use the singlethreaded or multithreaded algorithm depending on the Julia version, when calling the multithreaded one. (implemented currently in this PR)
  • I am not 100% happy with using a macro (getNextBlock!) instead of a function, but I could not come up with a better way.
  • Rename buffer/buf to scratch or to t? -> renamed to scratch
  • Improve tests to find errors that were made here
    • Stable sorting with by=Returns(0)
    • Odd input length
  • CI only tests the fallback to PagedMergeSort for ThreadedPagedMergeSort. -> deleted ThreadedPagedMergeSort for now
  • Rebase on current master.

Some notes

  • The basic idea of the algorithm seems not that complicated. Maybe it was already published under a different name? Edit: I found it.
  • There are stable sorting algorithms that are in-place, but this is easier. And I think allocating 1 MiB to sort 8 GiB is practically in-place for most applications.
  • Moving the blocks before merging actually allows for even less data movement, but the required bookeeping is more complicated.
  • The two-ended merge can also be applied at all recursion levels to get an even faster merge sort algorithm, that uses auxilary memory the size of the input.
  • Splitting the processed data into blocks and rearranging them later can also be used for distributive sorting, see IPS4o.

Benchmarks

TLDR: PagedMergeSort is slightly faster than MergeSort, but slower than the default radix sort on random data. It allocates less than 400 KiB to sort a 1 GiB vector. Both MergeSort and PagedMergeSort are faster for sortperm! than the default ScratchQuicksort when sorting large vectors. The speedup depends on memory latency. ThreadedPagedMergeSort is the fastest algorithm for all tested inputs and machines (but it is the only multithreaded one).

Julia Version 1.9.0-beta3
Commit 24204a7344 (2023-01-18 07:20 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 16 × Intel(R) Core(TM) i7-7820X CPU @ 3.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake-avx512)
  Threads: 8 on 16 virtual cores
Environment:
  JULIA_DEPOT_PATH = C:\Users\schwerdt\.julia;D:\Sonstiges\JuliaTest\julia-1.9.0-beta3\local\share\julia;D:\Sonstiges\JuliaTest\julia-1.9.0-beta3\share\julia
  JULIA_IMAGE_THREADS = 1
  JULIA_LOAD_PATH = @;@v#.#;@stdlib
  JULIA_REVISE_WORKER_ONLY = 1

sort! T=UInt64 n=131_072 (1.0 MiB)
default:                      0.003205 seconds (3 allocations: 1.008 MiB)
MergeSort:                    0.009005 seconds (2 allocations: 512.047 KiB)
PagedMergeSort:               0.006894 seconds (2 allocations: 11.625 KiB)
PagedMergeSort, 8 threads:    0.002509 seconds (116 allocations: 103.766 KiB)

sortperm! T=UInt64 n=131_072 (1.0 MiB)
default:                      0.006763 seconds (2 allocations: 1.000 MiB)
MergeSort:                    0.010892 seconds (2 allocations: 512.047 KiB)
PagedMergeSort:               0.010665 seconds (2 allocations: 11.625 KiB)
PagedMergeSort, 8 threads:    0.002941 seconds (117 allocations: 104.031 KiB)

sort! T=UInt64 n=8_388_608 (64.0 MiB)
default:                      0.282875 seconds (3 allocations: 64.008 MiB, 1.28% gc time)
MergeSort:                    0.808292 seconds (2 allocations: 32.000 MiB)
PagedMergeSort:               0.631584 seconds (4 allocations: 90.656 KiB)
PagedMergeSort, 8 threads:    0.148540 seconds (209 allocations: 745.922 KiB)

sortperm! T=UInt64 n=8_388_608 (64.0 MiB)
default:                      4.226174 seconds (2 allocations: 64.000 MiB, 0.01% gc time)
MergeSort:                    1.399869 seconds (2 allocations: 32.000 MiB)
PagedMergeSort:               1.378963 seconds (4 allocations: 90.656 KiB)
PagedMergeSort, 8 threads:    0.453981 seconds (210 allocations: 746.438 KiB)

sort! T=UInt64 n=134_217_728 (1.0 GiB)
default:                      4.341338 seconds (3 allocations: 1.000 GiB, 0.07% gc time)
MergeSort:                   15.697732 seconds (2 allocations: 512.000 MiB, 0.01% gc time)
PagedMergeSort:              12.191662 seconds (4 allocations: 362.219 KiB)
PagedMergeSort, 8 threads:    2.692278 seconds (210 allocations: 2.850 MiB)

sortperm! T=UInt64 n=134_217_728 (1.0 GiB)
default:                    158.911205 seconds (2 allocations: 1.000 GiB, 0.00% gc time)
MergeSort:                   33.823217 seconds (2 allocations: 512.000 MiB, 0.27% gc time)
PagedMergeSort:              34.031642 seconds (4 allocations: 362.219 KiB)
PagedMergeSort, 8 threads:   10.372993 seconds (210 allocations: 2.851 MiB)
Julia Version 1.9.0-beta3
Commit 24204a7344 (2023-01-18 07:20 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 8 × Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
  Threads: 8 on 8 virtual cores
Environment:
  JULIA_DEPOT_PATH = C:\Users\Luke\.julia;C:\Users\Luke\.julia\juliaup\julia-1.9.0-beta3+0.x64.w64.mingw32\local\share\julia;C:\Users\Luke\.julia\juliaup\julia-1.9.0-beta3+0.x64.w64.mingw32\share\julia
  JULIA_IMAGE_THREADS = 1
  JULIA_LOAD_PATH = @;@v#.#;@stdlib
  JULIA_NUM_THREADS = auto
  JULIA_REVISE_WORKER_ONLY = 1

sort! T=UInt64 n=131_072 (1.0 MiB)
default:                      0.010040 seconds (3 allocations: 1.008 MiB)
MergeSort:                    0.015199 seconds (2 allocations: 512.047 KiB)
PagedMergeSort:               0.010011 seconds (2 allocations: 11.625 KiB)
PagedMergeSort, 8 threads:    0.004095 seconds (115 allocations: 103.734 KiB)

sortperm! T=UInt64 n=131_072 (1.0 MiB)
default:                      0.011959 seconds (2 allocations: 1.000 MiB)
MergeSort:                    0.017257 seconds (2 allocations: 512.047 KiB)
PagedMergeSort:               0.018372 seconds (2 allocations: 11.625 KiB)
PagedMergeSort, 8 threads:    0.007372 seconds (117 allocations: 104.031 KiB)

sort! T=UInt64 n=8_388_608 (64.0 MiB)
default:                      0.668358 seconds (3 allocations: 64.008 MiB, 0.57% gc time)
MergeSort:                    0.997630 seconds (2 allocations: 32.000 MiB)
PagedMergeSort:               0.845741 seconds (4 allocations: 90.656 KiB)
PagedMergeSort, 8 threads:    0.223406 seconds (212 allocations: 746.016 KiB)

sortperm! T=UInt64 n=8_388_608 (64.0 MiB)
default:                      5.376910 seconds (2 allocations: 64.000 MiB, 0.01% gc time)
MergeSort:                    1.776138 seconds (2 allocations: 32.000 MiB)
PagedMergeSort:               1.772253 seconds (4 allocations: 90.656 KiB)
PagedMergeSort, 8 threads:    0.673832 seconds (208 allocations: 746.375 KiB)

sort! T=UInt64 n=134_217_728 (1.0 GiB)
default:                      7.397737 seconds (3 allocations: 1.000 GiB, 0.05% gc time)
MergeSort:                   19.119953 seconds (2 allocations: 512.000 MiB, 0.67% gc time)
PagedMergeSort:              15.683231 seconds (4 allocations: 362.219 KiB)
PagedMergeSort, 8 threads:    4.070665 seconds (212 allocations: 2.850 MiB)

sortperm! T=UInt64 n=134_217_728 (1.0 GiB)
default:                    176.440430 seconds (2 allocations: 1.000 GiB, 0.07% gc time)
MergeSort:                   41.310611 seconds (2 allocations: 512.000 MiB, 0.59% gc time)
PagedMergeSort:              40.876463 seconds (4 allocations: 362.219 KiB)
PagedMergeSort, 8 threads:   16.644219 seconds (214 allocations: 2.851 MiB)
Julia Version 1.9.0-beta3
Commit 24204a7344 (2023-01-18 07:20 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 24 × AMD Ryzen 9 3900X 12-Core Processor            
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, znver2)
  Threads: 12 on 24 virtual cores
Environment:
  JULIA_DEPOT_PATH = C:\Users\Luke\.julia;C:\Users\Luke\.julia\juliaup\julia-1.9.0-beta3+0.x64.w64.mingw32\local\share\julia;C:\Users\Luke\.julia\juliaup\julia-1.9.0-beta3+0.x64.w64.mingw32\share\julia
  JULIA_IMAGE_THREADS = 1
  JULIA_LOAD_PATH = @;@v#.#;@stdlib
  JULIA_REVISE_WORKER_ONLY = 1

sort! T=UInt64 n=131_072 (1.0 MiB)
default:                      0.002453 seconds (3 allocations: 1.008 MiB)
MergeSort:                    0.007748 seconds (2 allocations: 512.047 KiB)
PagedMergeSort:               0.006455 seconds (2 allocations: 11.625 KiB)
PagedMergeSort, 12 threads:   0.001617 seconds (134 allocations: 150.641 KiB)

sortperm! T=UInt64 n=131_072 (1.0 MiB)
default:                      0.005796 seconds (2 allocations: 1.000 MiB)
MergeSort:                    0.009661 seconds (2 allocations: 512.047 KiB)
PagedMergeSort:               0.009557 seconds (2 allocations: 11.625 KiB)
PagedMergeSort, 12 threads:   0.002576 seconds (130 allocations: 150.750 KiB)

sort! T=UInt64 n=8_388_608 (64.0 MiB)
default:                      0.274773 seconds (3 allocations: 64.008 MiB)
MergeSort:                    0.693271 seconds (2 allocations: 32.000 MiB)
PagedMergeSort:               0.558476 seconds (4 allocations: 90.656 KiB)
PagedMergeSort, 12 threads:   0.106098 seconds (393 allocations: 1.102 MiB)

sortperm! T=UInt64 n=8_388_608 (64.0 MiB)
default:                      1.129249 seconds (2 allocations: 64.000 MiB)
MergeSort:                    1.175415 seconds (2 allocations: 32.000 MiB)
PagedMergeSort:               1.118836 seconds (4 allocations: 90.656 KiB)
PagedMergeSort, 12 threads:   0.369979 seconds (394 allocations: 1.103 MiB)

sort! T=UInt64 n=134_217_728 (1.0 GiB)
default:                      6.363665 seconds (3 allocations: 1.000 GiB, 0.01% gc time)
MergeSort:                   13.181063 seconds (2 allocations: 512.000 MiB)
PagedMergeSort:              10.717851 seconds (4 allocations: 362.219 KiB)
PagedMergeSort, 12 threads:   1.958344 seconds (401 allocations: 4.285 MiB)

sortperm! T=UInt64 n=134_217_728 (1.0 GiB)
default:                     34.260657 seconds (2 allocations: 1.000 GiB, 0.00% gc time)
MergeSort:                   29.027326 seconds (2 allocations: 512.000 MiB)
PagedMergeSort:              28.057972 seconds (4 allocations: 362.219 KiB)
PagedMergeSort, 12 threads:   7.543024 seconds (394 allocations: 4.286 MiB)

Adds the PagedMergeSort algorithm, a merge sort with O(sqrt n) auxiliary space usage.
@codecov-commenter
Copy link

codecov-commenter commented Feb 7, 2023

Codecov Report

Merging #71 (d367ee8) into master (ef22e53) will increase coverage by 1.63%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master      #71      +/-   ##
==========================================
+ Coverage   89.16%   90.80%   +1.63%     
==========================================
  Files           1        1              
  Lines         360      511     +151     
==========================================
+ Hits          321      464     +143     
- Misses         39       47       +8     
Files Changed Coverage Δ
src/SortingAlgorithms.jl 90.80% <100.00%> (+1.63%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

- unify function signatures
- remove trailing whitespace
- rename variables
- always use space after comma in fuction definitions/calls
- improve comments
@LSchwerdt LSchwerdt mentioned this pull request Feb 11, 2023
10 tasks
@LilithHafner
Copy link
Member

Thank you! This is a valuable contribution! I hope you don't mind some critique :)

It's a very large sorting algorithm so I have yet to read it cover-to-cover. If possible, it would be nice to get similar or better performance with a simpler implementation, but that can be tricky. That said, I have performed some black-box analysis with a couple results:

First, a correctness issue, I ran this script to check stability and found a counterexample:
@testset begin
    for T in (Float64, Int, UInt8)
        for alg in [ThreadedPagedMergeSort, PagedMergeSort]
            for order in [Forward, Reverse, By(identity), By(abs), By(Returns(0)), By(Base.Fix2(÷, 100))]
                fails = Int[]
                for n in vcat(0:30, 40:10:100, 110:50:1000)
                    v = rand(T, n)
                    # @test sort(v; order) == sort(v; alg, order)
                    sort(v; order) == sort(v; alg, order) || push!(fails, n)
                end
                if !isempty(fails)
                    println("$(T) $(alg) $(typeof(order))\n\t$(fails)")
                end
            end
        end
    end
end
julia> issorted(sort(1:100, by=Returns(0), alg=PagedMergeSort))
false

Second, some notes on benchmarking, use BenchmarkTools.@btime rather than @time for more accurate results. @btime executes the test expression repeatedly and takes the minimum to reduce noise. I also prefer to work with benchmarking code that can be pasted into a REPL because that is more widely and quickly reproducible than a Pluto notebook. For example, the following can be includeed or pasted into a REPL

using BenchmarkTools, SortingAlgorithms, Random, Test
versioninfo()
for i in 0:5
    n = 17^i
    println("sort!(rand(Int, $n))")
    v = rand(Int, n)
    print("Default:  "); @btime sort!($v) setup=(rand!($v)) evals=1
    print("Paged MS: "); @btime sort!($v; alg=PagedMergeSort) setup=(rand!($v)) evals=1
    print("Threaded: "); @btime sort!($v; alg=ThreadedPagedMergeSort) setup=(rand!($v)) evals=1
end
On my computer, it produces the following results:
Julia Version 1.9.0-beta4
Commit b75ddb787ff (2023-02-07 21:53 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin21.4.0)
  CPU: 4 × Intel(R) Core(TM) i5-8210Y CPU @ 1.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
  Threads: 2 on 2 virtual cores

sort!(rand(Int, 1))
Default:    63.000 ns (0 allocations: 0 bytes)
Paged MS:   64.000 ns (0 allocations: 0 bytes)
Threaded:   63.000 ns (0 allocations: 0 bytes)
sort!(rand(Int, 17))
Default:    248.000 ns (0 allocations: 0 bytes)
Paged MS:   390.000 ns (2 allocations: 256 bytes)
Threaded:   386.000 ns (2 allocations: 256 bytes)
sort!(rand(Int, 289))
Default:    5.214 μs (1 allocation: 2.44 KiB)
Paged MS:   8.231 μs (2 allocations: 704 bytes)
Threaded:   8.453 μs (2 allocations: 704 bytes)
sort!(rand(Int, 4913))
Default:    91.831 μs (3 allocations: 46.67 KiB)
Paged MS:   206.361 μs (2 allocations: 2.38 KiB)
Threaded:   203.543 μs (2 allocations: 2.38 KiB)
sort!(rand(Int, 83521))
Default:    1.757 ms (3 allocations: 660.80 KiB)
Paged MS:   4.627 ms (2 allocations: 9.38 KiB)
Threaded:   3.074 ms (54 allocations: 23.03 KiB)
sort!(rand(Int, 1419857))
Default:    39.456 ms (3 allocations: 10.84 MiB)
Paged MS:   111.730 ms (3 allocations: 37.48 KiB)
Threaded:   69.457 ms (57 allocations: 79.28 KiB)

Adds the PagedMergeSort algorithm, a merge sort with O(sqrt n) auxiliary space usage.
- unify function signatures
- remove trailing whitespace
- rename variables
- always use space after comma in fuction definitions/calls
- improve comments
Copy link
Member

@LilithHafner LilithHafner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are a few inline comments from looking over the parts of the code I understand. I'll need to get a better understanding of block sorts before I can give a full review.

src/SortingAlgorithms.jl Outdated Show resolved Hide resolved
src/SortingAlgorithms.jl Outdated Show resolved Hide resolved
src/SortingAlgorithms.jl Outdated Show resolved Hide resolved
src/SortingAlgorithms.jl Outdated Show resolved Hide resolved
src/SortingAlgorithms.jl Outdated Show resolved Hide resolved
src/SortingAlgorithms.jl Outdated Show resolved Hide resolved
@LSchwerdt
Copy link
Contributor Author

Thank you for taking the time to review this PR. I am looking forward to the critique :).

First, a correctness issue, I ran this script to check stability and found a counterexample:

That was caused by the initial optimizations. Base sort returned a range, while all algorithms from SortingAlgorithms returned a vector. Fixed now after implementing initial optimizations.

Second, some notes on benchmarking, use BenchmarkTools.@btime rather than @time for more accurate results.

This algorithm is most useful for sorting large amounts of data, where @btime uses only one sample anyways. I only included the smaller input sizes to show the memory usage (allocations). But you are right, @btime would have avoided doubts about the timing accuracy.

I also prefer to work with benchmarking code that can be pasted into a REPL

I found Pluto notebooks that load the correct package from github very convenient when benchmarking on different PCs not setup for package development. Run the notebook -> done. But you are right, if you already checked out the branch for review, a simple script is even more convenient. I have added a benchmark script to PR #72.

it would be nice to get similar or better performance with a simpler implementation

There are some ways to simplify the code, but all are less performant:

  • Use array assignments instead of loops.
  • Use a reshaped view of the input array (blocksize x nBlocks matrix). The last elements that do not fit would need to be handled seperately, though.
  • Dont use twoended_merge!

Feel free to suggest improvements in this regard. I will continue trying to simplify, too. But this is the main reason I did not go for an algorithm with O(1) space. While not especially elegant, this PR is much shorter and concenpually simpler than HolyGrailSort, for example.

@LilithHafner
Copy link
Member

Fixed now after implementing initial optimizations.

I'm still seeing a problem:

julia> PagedMergeSort
Base.Sort.MissingOptimization(
    Base.Sort.BoolOptimization(
        Base.Sort.Small{10}(
            Base.Sort.InsertionSortAlg(),
            Base.Sort.IEEEFloatOptimization(
                SortingAlgorithms.PagedMergeSortAlg()))))

julia> sort(1:100, by=Returns(0), alg=PagedMergeSort)
100-element Vector{Int64}:
   1
   2
   3
   4
   5
   6
   7
  14
  15
  16
   
  98
  99
 100
  83
  84
  85
  86
  87
  88

@LSchwerdt
Copy link
Contributor Author

I'll need to get a better understanding of block sorts before I can give a full review.

This is not a block sort as defined here. That's why I called it PagedMergeSort and not BlockMergeSort, although block merge sort would be a fitting name.

Btw, is there a good way to make the gif in the OP available in the documentation somehow? I think it is very helpful to understand the merge procedure.

LSchwerdt and others added 6 commits February 12, 2023 19:17
Co-authored-by: Lilith Orion Hafner <lilithhafner@gmail.com>
Co-authored-by: Lilith Orion Hafner <lilithhafner@gmail.com>
Co-authored-by: Lilith Orion Hafner <lilithhafner@gmail.com>
@LSchwerdt
Copy link
Contributor Author

LSchwerdt commented Mar 20, 2023

I think that the process of moving the pages into sorted order can be simpler: [...]

If I understand it correctly, what you describe is in essence the current solution. But I will try to improve it to make this obvious when looking at the code. And maybe I'll manage to remove the dependency on StaticArrays, by making the control flow explicit, following your suggestion.

Looping through the three free pages at the end of the subarrays is accomplished using freeBlocksIdx. It starts at 3, and after the 3 pages are moved from the buffer, only the first page in the buffer is used for all the following permutation cycles.

The linear scan is tracked with doneBlockIdx. It is initialized to 1, and while true is exited by doneBlockIdx == nBlocks && return when the linear scan is complete.

Update: Success! The dependency is eliminated and I think the code is more clear now.

and always use "page" instead of "block"
and eliminate dependency on StaticArrays
@LilithHafner
Copy link
Member

Great work! It is much simpler now. I think it can be even simpler, though. I've put some suggestions in a pr to the head branch of this pr because I wanted to make sure that my suggestions worked before recommending them.

Copy link
Member

@LilithHafner LilithHafner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename buffer/buf to scratch or to t?

Base.Sort's "scratch" is an implementation detail and will likely change semantics to use buffers once they are a thing, so it's for the best not to integrate too closely with that system.

However, I still think sharing a common name might be a good idea. IIRC, Base uses t and scratch, the (obsolete) radix sort in SotihngAlgorithsm.jl uses ts.

Right now, PagedMergeSort uses both t and buf which brings the total number of names for this concept up to 3 in this repo and 4 in this repo+base. I recommend using scratch as an abbreviation for "scratch space".

CI only tests the fallback to PagedMergeSort for ThreadedPagedMergeSort.

That's not good. Maybe we should split off PagedMergeSort and try to merge it first and separately, and do ThreadedPagedMergeSort in a second PR.

src/SortingAlgorithms.jl Outdated Show resolved Hide resolved
src/SortingAlgorithms.jl Outdated Show resolved Hide resolved
src/SortingAlgorithms.jl Outdated Show resolved Hide resolved
src/SortingAlgorithms.jl Outdated Show resolved Hide resolved
src/SortingAlgorithms.jl Outdated Show resolved Hide resolved
src/SortingAlgorithms.jl Outdated Show resolved Hide resolved
src/SortingAlgorithms.jl Show resolved Hide resolved
src/SortingAlgorithms.jl Outdated Show resolved Hide resolved
Comment on lines 842 to 855
while_condition1(offset) = (_,_,k) -> k <= offset + pagesize
while a < m-pagesize && b < hi-pagesize
pages = next_page!(pageLocations, pages, pagesize, lo, a)
offset = page_offset(pages.current)
a,b,_ = merge!(while_condition1(offset),v,v,v,o,a,b,offset+1)
end
# merge until either A or B is empty or the last page is reached
k, offset = nothing, nothing
while_condition2(offset) = (a,b,k) -> k <= offset + pagesize && a <= m && b <= hi
while a <= m && b <= hi && pages.currentNumber + 3 < nPages
pages = next_page!(pageLocations, pages, pagesize, lo, a)
offset = page_offset(pages.current)
a,b,k = merge!(while_condition2(offset),v,v,v,o,a,b,offset+1)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These while loops and while_conditions feel strange to me; I wonder if there is a better approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was the best way to implement this I could think of. But of course I am open to suggestions for implrovement.

src/SortingAlgorithms.jl Outdated Show resolved Hide resolved
@LSchwerdt
Copy link
Contributor Author

Right now, PagedMergeSort uses both t and buf which brings the total number of names for this concept up to 3 in this repo and 4 in this repo+base. I recommend using scratch as an abbreviation for "scratch space".

That's not good. Maybe we should split off PagedMergeSort and try to merge it first and separately, and do ThreadedPagedMergeSort in a second PR.

These are good suggestions imho. I have implemented them in 3f337f0 and d3ea719.

Except for rebasing, my checklist of open questions is now completed.

Copy link
Member

@LilithHafner LilithHafner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@LilithHafner LilithHafner merged commit b668837 into JuliaCollections:master Aug 18, 2023
16 checks passed
@LilithHafner
Copy link
Member

Thank you! This is a nice algorithm to have.

@LSchwerdt
Copy link
Contributor Author

Thank you! This is a nice algorithm to have.

Thank you for your efforts reviewing and massively improving the code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants