From 73cdf437350caa9d3a65c64d50e85e86ce8500c4 Mon Sep 17 00:00:00 2001 From: Lilith Hafner Date: Fri, 20 Jan 2023 11:36:12 -0600 Subject: [PATCH 1/9] revise sort.md and docstrings in sort.jl, take 1 --- base/sort.jl | 30 ++++++++++-- doc/src/base/sort.md | 109 ++++++++++++++----------------------------- 2 files changed, 61 insertions(+), 78 deletions(-) diff --git a/base/sort.jl b/base/sort.jl index 985e0e8f597f3..c6ffae174f0c6 100644 --- a/base/sort.jl +++ b/base/sort.jl @@ -1881,9 +1881,9 @@ struct MergeSortAlg <: Algorithm end """ PartialQuickSort{T <: Union{Integer,OrdinalRange}} -Indicate that a sorting function should use the partial quick sort -algorithm. Partial quick sort returns the smallest `k` elements sorted from smallest -to largest, finding them and sorting them using [`QuickSort`](@ref). +Indicate that a sorting function should use the partial quick sort algorithm. +Partial quick sort is like quick sort, but is only required to find and sort the +elements that would end up in `v[k]` were `v` fully sorted. Characteristics: * *not stable*: does not preserve the ordering of elements which @@ -1891,6 +1891,27 @@ Characteristics: ignores case). * *in-place* in memory. * *divide-and-conquer*: sort strategy similar to [`MergeSort`](@ref). + +Note that `PartialQuickSort(k)` does not necessarily sort the whole array. For example, + +```jldoctest +julia> x = rand(100); + +julia> k = 50:100; + +julia> s1 = sort(x; alg=QuickSort); + +julia> s2 = sort(x; alg=PartialQuickSort(k)); + +julia> map(issorted, (s1, s2)) +(true, false) + +julia> map(x->issorted(x[k]), (s1, s2)) +(true, true) + +julia> s1[k] == s2[k] +true +``` """ struct PartialQuickSort{T <: Union{Integer,OrdinalRange}} <: Algorithm k::T @@ -1925,7 +1946,8 @@ Characteristics: * *stable*: preserves the ordering of elements which compare equal (e.g. "a" and "A" in a sort of letters which ignores case). - * *not in-place* in memory. 
+ * *not in-place* in memory — requires a temporary + array of half the size of the input array. * *divide-and-conquer* sort strategy. """ const MergeSort = MergeSortAlg() diff --git a/doc/src/base/sort.md b/doc/src/base/sort.md index e93d9716b1487..6253f96476d4d 100644 --- a/doc/src/base/sort.md +++ b/doc/src/base/sort.md @@ -1,7 +1,7 @@ # Sorting and Related Functions -Julia has an extensive, flexible API for sorting and interacting with already-sorted arrays of -values. By default, Julia picks reasonable algorithms and sorts in standard ascending order: +Julia has an extensive, flexible API for sorting and interacting with already-sorted arrays +of values. By default, Julia picks reasonable algorithms and sorts in ascending order: ```jldoctest julia> sort([2,3,1]) @@ -11,7 +11,7 @@ julia> sort([2,3,1]) 3 ``` -You can easily sort in reverse order as well: +You can sort in reverse order as well: ```jldoctest julia> sort([2,3,1], rev=true) @@ -21,7 +21,8 @@ julia> sort([2,3,1], rev=true) 1 ``` -To sort an array in-place, use the "bang" version of the sort function: +`sort` constructs a sorted copy leaving its input unchanged. 
Use the "bang" version of +the sort function to mutate an existing array: ```jldoctest julia> a = [2,3,1]; @@ -35,8 +36,8 @@ julia> a 3 ``` -Instead of directly sorting an array, you can compute a permutation of the array's indices that -puts the array into sorted order: +Instead of directly sorting an array, you can compute a permutation of the array's +indices that puts the array into sorted order: ```julia-repl julia> v = randn(5) @@ -64,7 +65,7 @@ julia> v[p] 0.382396 ``` -Arrays can easily be sorted according to an arbitrary transformation of their values: +Arrays can be sorted according to an arbitrary transformation of their values: ```julia-repl julia> sort(v, by=abs) @@ -100,9 +101,12 @@ julia> sort(v, alg=InsertionSort) 0.382396 ``` -All the sorting and order related functions rely on a "less than" relation defining a total order +All the sorting and order related functions rely on a "less than" relation defining a +[strict partial order](https://en.wikipedia.org/wiki/Partially_ordered_set#Strict_partial_order) on the values to be manipulated. The `isless` function is invoked by default, but the relation -can be specified via the `lt` keyword. +can be specified via the `lt` keyword, a function that takes two array elements and returns true +if and only if the first argument is "less than" the second. See [Alternate orderings](@ref) for +more info. ## Sorting Functions @@ -134,65 +138,23 @@ Base.Sort.partialsortperm! ## Sorting Algorithms -There are currently four sorting algorithms available in base Julia: +There are currently four sorting algorithms publicly available in base Julia: * [`InsertionSort`](@ref) * [`QuickSort`](@ref) * [`PartialQuickSort(k)`](@ref) * [`MergeSort`](@ref) -`InsertionSort` is an O(n²) stable sorting algorithm. It is efficient for very small `n`, -and is used internally by `QuickSort`. +By default, the `sort` family of functions uses stable sorting algorithms that are fast +on most inputs. 
The exact algorithm choice is an implementation detail to allow for +future performance improvements. Currently, a hybrid of `RadixSort`, `ScratchQuickSort`, +`InsertionSort`, and `CountingSort` is used based on input type, size, and composition. +Implementation details are subject to change but currently available in the extended help +of `??Base.DEFAULT_STABLE` and the docstrings of internal sorting algorithms listed there. -`QuickSort` is a very fast sorting algorithm with an average-case time complexity of -O(n log n). `QuickSort` is stable, i.e., elements considered equal will remain in the same -order. Notice that O(n²) is worst-case complexity, but it gets vanishingly unlikely as the -pivot selection is randomized. - -`PartialQuickSort(k::OrdinalRange)` is similar to `QuickSort`, but the output array is only -sorted in the range of `k`. For example: - -```jldoctest -julia> x = rand(1:500, 100); - -julia> k = 50:100; - -julia> s1 = sort(x; alg=QuickSort); - -julia> s2 = sort(x; alg=PartialQuickSort(k)); - -julia> map(issorted, (s1, s2)) -(true, false) - -julia> map(x->issorted(x[k]), (s1, s2)) -(true, true) - -julia> s1[k] == s2[k] -true -``` - -!!! compat "Julia 1.9" - The `QuickSort` and `PartialQuickSort` algorithms are stable since Julia 1.9. - -`MergeSort` is an O(n log n) stable sorting algorithm but is not in-place – it requires a temporary -array of half the size of the input array – and is typically not quite as fast as `QuickSort`. -It is the default algorithm for non-numeric data. - -The default sorting algorithms are chosen on the basis that they are fast and stable. -Usually, `QuickSort` is selected, but `InsertionSort` is preferred for small data. -You can also explicitly specify your preferred algorithm, e.g. -`sort!(v, alg=PartialQuickSort(10:20))`. - -The mechanism by which Julia picks default sorting algorithms is implemented via the -`Base.Sort.defalg` function. 
It allows a particular algorithm to be registered as the -default in all sorting functions for specific arrays. For example, here is the default -method from [`sort.jl`](https://github.com/JuliaLang/julia/blob/master/base/sort.jl): - -```julia -defalg(v::AbstractArray) = DEFAULT_STABLE -``` - -You may change the default behavior for specific types by defining new methods for `defalg`. +You can explicitly specify your preferred algorithm with the `alg` keyword +(e.g. `sort!(v, alg=PartialQuickSort(10:20))`) or reconfigure the default sorting algorithm +for custom types by adding a specialized method to the `Base.Sort.defalg` function. For example, [InlineStrings.jl](https://github.com/JuliaStrings/InlineStrings.jl/blob/v1.3.2/src/InlineStrings.jl#L903) defines the following method: ```julia @@ -200,22 +162,21 @@ Base.Sort.defalg(::AbstractArray{<:Union{SmallInlineStrings, Missing}}) = Inline ``` !!! compat "Julia 1.9" - The default sorting algorithm (returned by `Base.Sort.defalg`) is guaranteed - to be stable since Julia 1.9. Previous versions had unstable edge cases when sorting numeric arrays. + The default sorting algorithm (returned by `Base.Sort.defalg`) is guaranteed to be stable + since Julia 1.9. Previous versions had unstable edge cases when sorting numeric arrays. ## Alternate orderings -By default, `sort` and related functions use [`isless`](@ref) to compare two -elements in order to determine which should come first. The -[`Base.Order.Ordering`](@ref) abstract type provides a mechanism for defining -alternate orderings on the same set of elements. Instances of `Ordering` define -a [total order](https://en.wikipedia.org/wiki/Total_order) on a set of elements, -so that for any elements `a`, `b`, `c` the following hold: - -* Exactly one of the following is true: `a` is less than `b`, `b` is less than - `a`, or `a` and `b` are equal (according to [`isequal`](@ref)). 
-* The relation is transitive - if `a` is less than `b` and `b` is less than `c` - then `a` is less than `c`. +By default, `sort`, `searchsorted`, and related functions use [`isless`](@ref) to compare +two elements in order to determine which should come first. The +[`Base.Order.Ordering`](@ref) abstract type provides a mechanism for defining alternate +orderings on the same set of elements. Instances of `Ordering` define a +[strict partial order](https://en.wikipedia.org/wiki/Partially_ordered_set#Strict_partial_order). +To be a strict partial order, for any elements `a`, `b`, `c` the following hold: + +* if `a == b`, then `lt(a, b) == false`; +* `lt(a, b) && lt(b, a) == false`; and +* if `lt(a, b) && lt(b, c) == true`, then `lt(a, c) == true` The [`Base.Order.lt`](@ref) function works as a generalization of `isless` to test whether `a` is less than `b` according to a given order. From 7c9cbb14b8def58f32fe4cb154cceeff02cec731 Mon Sep 17 00:00:00 2001 From: Lilith Orion Hafner Date: Mon, 23 Jan 2023 13:00:30 -0600 Subject: [PATCH 2/9] Change "partial" to "weak" Thanks @knuesel --- doc/src/base/sort.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/doc/src/base/sort.md b/doc/src/base/sort.md index 6253f96476d4d..196981e7feff1 100644 --- a/doc/src/base/sort.md +++ b/doc/src/base/sort.md @@ -102,7 +102,7 @@ julia> sort(v, alg=InsertionSort) ``` All the sorting and order related functions rely on a "less than" relation defining a -[strict partial order](https://en.wikipedia.org/wiki/Partially_ordered_set#Strict_partial_order) +[strict weak order](https://en.wikipedia.org/wiki/Weak_ordering#Strict_weak_orderings) on the values to be manipulated. The `isless` function is invoked by default, but the relation can be specified via the `lt` keyword, a function that takes two array elements and returns true if and only if the first argument is "less than" the second. 
See [Alternate orderings](@ref) for @@ -171,12 +171,12 @@ By default, `sort`, `searchsorted`, and related functions use [`isless`](@ref) t two elements in order to determine which should come first. The [`Base.Order.Ordering`](@ref) abstract type provides a mechanism for defining alternate orderings on the same set of elements. Instances of `Ordering` define a -[strict partial order](https://en.wikipedia.org/wiki/Partially_ordered_set#Strict_partial_order). -To be a strict partial order, for any elements `a`, `b`, `c` the following hold: +[strict weak order](https://en.wikipedia.org/wiki/Weak_ordering#Strict_weak_orderings). +To be a strict weak order, for any elements `a`, `b`, `c` the following hold: -* if `a == b`, then `lt(a, b) == false`; -* `lt(a, b) && lt(b, a) == false`; and -* if `lt(a, b) && lt(b, c) == true`, then `lt(a, c) == true` +* `lt(a, b) && lt(b, a) === false`; +* if `lt(a, b) && lt(b, c)`, then `lt(a, c)`; and +* if `!lt(a, b) && !lt(b, c)`, then `!lt(a, c)` The [`Base.Order.lt`](@ref) function works as a generalization of `isless` to test whether `a` is less than `b` according to a given order. From bdf436eb095b1eacca98d974e149d0970671834f Mon Sep 17 00:00:00 2001 From: Lilith Orion Hafner Date: Fri, 7 Jul 2023 18:09:15 -0500 Subject: [PATCH 3/9] Apply suggestions from code review Co-authored-by: Jeremie Knuesel --- doc/src/base/sort.md | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/doc/src/base/sort.md b/doc/src/base/sort.md index e4f8e37b67536..89444dc33f1f9 100644 --- a/doc/src/base/sort.md +++ b/doc/src/base/sort.md @@ -104,8 +104,8 @@ julia> sort(v, alg=InsertionSort) All the sorting and order related functions rely on a "less than" relation defining a [strict weak order](https://en.wikipedia.org/wiki/Weak_ordering#Strict_weak_orderings) on the values to be manipulated. 
The `isless` function is invoked by default, but the relation -can be specified via the `lt` keyword, a function that takes two array elements and returns true -if and only if the first argument is "less than" the second. See [Alternate orderings](@ref) for +can be specified via the `lt` keyword, a function that takes two array elements and returns `true` +if and only if the first argument is "less than" the second. See [`sort!`](@ref) and [Alternate orderings](@ref) for more info. ## Sorting Functions @@ -175,16 +175,15 @@ orderings on the same set of elements: when calling a sorting function like `sort!`, an instance of `Ordering` can be provided with the keyword argument `order`. Instances of `Ordering` define a -[strict weak order](https://en.wikipedia.org/wiki/Weak_ordering#Strict_weak_orderings). -To be a strict weak order, for any elements `a`, `b`, `c` the following hold: +[strict weak order](https://en.wikipedia.org/wiki/Weak_ordering#Strict_weak_orderings) +through the [`Base.Order.lt`](@ref) function, which works as +a generalization of `isless`. +For `lt` to be a strict weak order, for any elements `a`, `b`, `c` the following must hold: * `lt(a, b) && lt(b, a) === false`; * if `lt(a, b) && lt(b, c)`, then `lt(a, c)`; and * if `!lt(a, b) && !lt(b, c)`, then `!lt(a, c)` -The [`Base.Order.lt`](@ref) function works as a generalization of `isless` to -test whether `a` is less than `b` according to a given order. 
- ```@docs Base.Order.Ordering Base.Order.lt From e052d85ef4db15771f724d02d4ccbd6e72fcd107 Mon Sep 17 00:00:00 2001 From: Lilith Hafner Date: Fri, 7 Jul 2023 18:12:49 -0500 Subject: [PATCH 4/9] Apply some more of @knuesel's suggestions --- base/sort.jl | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/base/sort.jl b/base/sort.jl index 31a83b605e69a..0fb010cc047c7 100644 --- a/base/sort.jl +++ b/base/sort.jl @@ -2014,8 +2014,8 @@ struct MergeSortAlg <: Algorithm end PartialQuickSort{T <: Union{Integer,OrdinalRange}} Indicate that a sorting function should use the partial quick sort algorithm. -Partial quick sort is like quick sort, but is only required to find and sort the -elements that would end up in `v[k]` were `v` fully sorted. +`PartialQuickSort(k)` is like `QuickSort`, but is only required to find and +sort the elements that would end up in `v[k]` were `v` fully sorted. Characteristics: * *not stable*: does not preserve the ordering of elements that From db3c42966f5c0cf45aad45679a1576fd98ecd046 Mon Sep 17 00:00:00 2001 From: Lilith Orion Hafner Date: Fri, 7 Jul 2023 18:16:19 -0500 Subject: [PATCH 5/9] Update base/sort.jl --- base/sort.jl | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/base/sort.jl b/base/sort.jl index 0fb010cc047c7..abf0b9ed07682 100644 --- a/base/sort.jl +++ b/base/sort.jl @@ -2078,8 +2078,7 @@ Characteristics: * *stable*: preserves the ordering of elements that compare equal (e.g. "a" and "A" in a sort of letters that ignores case). - * *not in-place* in memory — requires a temporary - array of half the size of the input array. + * *not in-place* in memory. * *divide-and-conquer* sort strategy. * *good performance* for large collections but typically not quite as fast as [`QuickSort`](@ref). 
From 85b61f61bd91abd5491e865d8c3f124dda0f676a Mon Sep 17 00:00:00 2001 From: Lilith Orion Hafner Date: Mon, 10 Jul 2023 11:23:04 -0500 Subject: [PATCH 6/9] Apply suggestions from code review Co-authored-by: Jeremie Knuesel --- doc/src/base/sort.md | 16 ++++++---------- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/doc/src/base/sort.md b/doc/src/base/sort.md index 89444dc33f1f9..6122a565bebb5 100644 --- a/doc/src/base/sort.md +++ b/doc/src/base/sort.md @@ -106,7 +106,7 @@ All the sorting and order related functions rely on a "less than" relation defin on the values to be manipulated. The `isless` function is invoked by default, but the relation can be specified via the `lt` keyword, a function that takes two array elements and returns `true` if and only if the first argument is "less than" the second. See [`sort!`](@ref) and [Alternate orderings](@ref) for -more info. +more information. ## Sorting Functions @@ -174,15 +174,11 @@ two elements in order to determine which should come first. The orderings on the same set of elements: when calling a sorting function like `sort!`, an instance of `Ordering` can be provided with the keyword argument `order`. -Instances of `Ordering` define a -[strict weak order](https://en.wikipedia.org/wiki/Weak_ordering#Strict_weak_orderings) -through the [`Base.Order.lt`](@ref) function, which works as -a generalization of `isless`. -For `lt` to be a strict weak order, for any elements `a`, `b`, `c` the following must hold: - -* `lt(a, b) && lt(b, a) === false`; -* if `lt(a, b) && lt(b, c)`, then `lt(a, c)`; and -* if `!lt(a, b) && !lt(b, c)`, then `!lt(a, c)` +Instances of `Ordering` define an order through the [`Base.Order.lt`](@ref) +function, which works as a generalization of `isless`. +This function must satisfy all the conditions of a +[strict weak order](https://en.wikipedia.org/wiki/Weak_ordering#Strict_weak_orderings). +See [`sort!`](@ref) for details and examples of valid and invalid `lt` functions. 
```@docs Base.Order.Ordering From 1514ea5872d8764fff4dc4376466250710a55962 Mon Sep 17 00:00:00 2001 From: Lilith Orion Hafner Date: Mon, 10 Jul 2023 11:27:58 -0500 Subject: [PATCH 7/9] Line width and "Alternate orderings" => "Alternate Orderings" --- doc/src/base/sort.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/doc/src/base/sort.md b/doc/src/base/sort.md index 6122a565bebb5..998cb37622ca7 100644 --- a/doc/src/base/sort.md +++ b/doc/src/base/sort.md @@ -103,10 +103,10 @@ julia> sort(v, alg=InsertionSort) All the sorting and order related functions rely on a "less than" relation defining a [strict weak order](https://en.wikipedia.org/wiki/Weak_ordering#Strict_weak_orderings) -on the values to be manipulated. The `isless` function is invoked by default, but the relation -can be specified via the `lt` keyword, a function that takes two array elements and returns `true` -if and only if the first argument is "less than" the second. See [`sort!`](@ref) and [Alternate orderings](@ref) for -more information. +on the values to be manipulated. The `isless` function is invoked by default, but the +relation can be specified via the `lt` keyword, a function that takes two array elements +and returns `true` if and only if the first argument is "less than" the second. See +[`sort!`](@ref) and [Alternate Orderings](@ref) for more information. 
## Sorting Functions From 6c774a65ca7d1ad65035959b070c30a5ece756fe Mon Sep 17 00:00:00 2001 From: Lilith Orion Hafner Date: Mon, 10 Jul 2023 11:47:29 -0500 Subject: [PATCH 8/9] Slightly reword alternate orderings In an effort to clarify that it's `Base.Order.lt`'s behavior on custom orders that users need to be worried about, not the function itself --- doc/src/base/sort.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/src/base/sort.md b/doc/src/base/sort.md index 998cb37622ca7..bf154f5053221 100644 --- a/doc/src/base/sort.md +++ b/doc/src/base/sort.md @@ -176,7 +176,7 @@ orderings on the same set of elements: when calling a sorting function like Instances of `Ordering` define an order through the [`Base.Order.lt`](@ref) function, which works as a generalization of `isless`. -This function must satisfy all the conditions of a +This function's behavior on custom `Ordering`s must satisfy all the conditions of a [strict weak order](https://en.wikipedia.org/wiki/Weak_ordering#Strict_weak_orderings). See [`sort!`](@ref) for details and examples of valid and invalid `lt` functions. From d13db49d7789280f825b62df5f599d9aed493243 Mon Sep 17 00:00:00 2001 From: Lilith Orion Hafner Date: Mon, 10 Jul 2023 13:44:17 -0500 Subject: [PATCH 9/9] fix whitespace --- doc/src/base/sort.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/src/base/sort.md b/doc/src/base/sort.md index bf154f5053221..b9d333ef2a939 100644 --- a/doc/src/base/sort.md +++ b/doc/src/base/sort.md @@ -176,7 +176,7 @@ orderings on the same set of elements: when calling a sorting function like Instances of `Ordering` define an order through the [`Base.Order.lt`](@ref) function, which works as a generalization of `isless`. 
-This function's behavior on custom `Ordering`s must satisfy all the conditions of a +This function's behavior on custom `Ordering`s must satisfy all the conditions of a [strict weak order](https://en.wikipedia.org/wiki/Weak_ordering#Strict_weak_orderings). See [`sort!`](@ref) for details and examples of valid and invalid `lt` functions.
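
Reviewer note, not part of the patch: the strict weak order conditions that patch 2/9 spelled out (asymmetry, transitivity of "less than", transitivity of "not less than") can be spot-checked on a small sample of values. The sketch below is an illustration under stated assumptions only. `is_strict_weak_order` is a hypothetical helper that does not exist in Base, and passing the check on a finite sample does not prove that an `lt` function is valid in general.

```julia
# Hypothetical helper (not part of Base): brute-force check of the three
# strict weak order conditions for `lt` over every triple drawn from `xs`.
function is_strict_weak_order(lt, xs)
    for a in xs, b in xs
        # Asymmetry: `a < b` and `b < a` must never both hold.
        lt(a, b) && lt(b, a) && return false
        for c in xs
            # Transitivity of "less than".
            lt(a, b) && lt(b, c) && !lt(a, c) && return false
            # Transitivity of "not less than".
            !lt(a, b) && !lt(b, c) && lt(a, c) && return false
        end
    end
    return true
end

xs = [-2, -1, 0, 1, 2]

valid_plain = is_strict_weak_order(<, xs)                        # strict comparison
valid_abs   = is_strict_weak_order((a, b) -> abs(a) < abs(b), xs) # ties allowed (-1 vs 1)
invalid_le  = is_strict_weak_order(<=, xs)                       # 1 <= 1 breaks asymmetry
```

Here `<` and the absolute-value comparison pass the check, while `<=` fails it because `lt(a, a)` is `true`. This is the kind of invalid `lt` function the `sort!` docstring is meant to warn about.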