From 9481be38bae3dcbb61f7ea50d6a6bded86c25730 Mon Sep 17 00:00:00 2001 From: Jeremie Knuesel Date: Mon, 23 Jan 2023 13:17:46 +0100 Subject: [PATCH] Improve documentation of sort-related functions In particular: * document the `order` keyword in `sort!` * list explicitly the required properties of `lt` in `sort!` * clarify the sequence of "by" transformations if both `by` and `order` are given * show default values in the signatures for `searchsorted` and related functions * add `isunordered` to the manual (it's already exported) --- base/operators.jl | 10 +- base/ordering.jl | 8 +- base/sort.jl | 225 ++++++++++++++++++++++++++++---------- doc/src/base/base.md | 1 + doc/src/base/sort.md | 2 +- doc/src/manual/missing.md | 2 +- 6 files changed, 179 insertions(+), 69 deletions(-) diff --git a/base/operators.jl b/base/operators.jl index da55981c5f7f8..b4fbea547238e 100644 --- a/base/operators.jl +++ b/base/operators.jl @@ -154,13 +154,13 @@ Values that are normally unordered, such as `NaN`, are ordered after regular values. [`missing`](@ref) values are ordered last. -This is the default comparison used by [`sort`](@ref). +This is the default comparison used by [`sort!`](@ref). # Implementation Non-numeric types with a total order should implement this function. Numeric types only need to implement it if they have special values such as `NaN`. Types with a partial order should implement [`<`](@ref). -See the documentation on [Alternate orderings](@ref) for how to define alternate +See the documentation on [Alternate Orderings](@ref) for how to define alternate ordering methods that can be used in sorting and related functions. # Examples @@ -328,6 +328,8 @@ New types with a canonical partial order should implement this function for two arguments of the new type. Types with a canonical total order should implement [`isless`](@ref) instead. +See also [`isunordered`](@ref). 
+ # Examples ```jldoctest julia> 'a' < 'b' @@ -1344,7 +1346,7 @@ corresponding position in `collection`. To get a vector indicating whether each in `items` is in `collection`, wrap `collection` in a tuple or a `Ref` like this: `in.(items, Ref(collection))` or `items .∈ Ref(collection)`. -See also: [`∉`](@ref). +See also: [`∉`](@ref), [`insorted`](@ref), [`contains`](@ref), [`occursin`](@ref), [`issubset`](@ref). # Examples ```jldoctest @@ -1382,8 +1384,6 @@ julia> [1, 2] .∈ ([2, 3],) 0 1 ``` - -See also: [`insorted`](@ref), [`contains`](@ref), [`occursin`](@ref), [`issubset`](@ref). """ in diff --git a/base/ordering.jl b/base/ordering.jl index d0c9cb99f9c72..5383745b1dd1f 100644 --- a/base/ordering.jl +++ b/base/ordering.jl @@ -87,8 +87,8 @@ By(by) = By(by, Forward) """ Lt(lt) -`Ordering` which calls `lt(a, b)` to compare elements. `lt` should -obey the same rules as implementations of [`isless`](@ref). +`Ordering` that calls `lt(a, b)` to compare elements. `lt` must +obey the same rules as the `lt` parameter of [`sort!`](@ref). """ struct Lt{T} <: Ordering lt::T @@ -146,8 +146,8 @@ Construct an [`Ordering`](@ref) object from the same arguments used by Elements are first transformed by the function `by` (which may be [`identity`](@ref)) and are then compared according to either the function `lt` or an existing ordering `order`. `lt` should be [`isless`](@ref) or a function -which obeys similar rules. Finally, the resulting order is reversed if -`rev=true`. +that obeys the same rules as the `lt` parameter of [`sort!`](@ref). Finally, +the resulting order is reversed if `rev=true`. 
Passing an `lt` other than `isless` along with an `order` other than [`Base.Order.Forward`](@ref) or [`Base.Order.Reverse`](@ref) is not permitted, diff --git a/base/sort.jl b/base/sort.jl index 985e0e8f597f3..09acdb4389688 100644 --- a/base/sort.jl +++ b/base/sort.jl @@ -63,8 +63,8 @@ end """ issorted(v, lt=isless, by=identity, rev::Bool=false, order::Ordering=Forward) -Test whether a vector is in sorted order. The `lt`, `by` and `rev` keywords modify what -order is considered to be sorted just as they do for [`sort`](@ref). +Test whether a collection is in sorted order. The keywords modify what +order is considered sorted, as described in the [`sort!`](@ref) documentation. # Examples ```jldoctest @@ -79,6 +79,9 @@ false julia> issorted([(1, "b"), (2, "a")], by = x -> x[2], rev=true) true + +julia> issorted([1, 2, -2, 3], by=abs) +true ``` """ issorted(itr; @@ -94,14 +97,17 @@ maybeview(v, k) = view(v, k) maybeview(v, k::Integer) = v[k] """ - partialsort!(v, k; by=, lt=, rev=false) + partialsort!(v, k; by=identity, lt=isless, rev=false) -Partially sort the vector `v` in place, according to the order specified by `by`, `lt` and -`rev` so that the value at index `k` (or range of adjacent values if `k` is a range) occurs +Partially sort the vector `v` in place so that the value at index `k` (or +range of adjacent values if `k` is a range) occurs at the position where it would appear if the array were fully sorted. If `k` is a single index, that value is returned; if `k` is a range, an array of values at those indices is returned. Note that `partialsort!` may not fully sort the input array. +For the keyword arguments, see the documentation of [`sort!`](@ref). 
+ + # Examples ```jldoctest julia> a = [1, 2, 4, 3, 4] @@ -148,9 +154,9 @@ partialsort!(v::AbstractVector, k::Union{Integer,OrdinalRange}; partialsort!(v, k, ord(lt,by,rev,order)) """ - partialsort(v, k, by=, lt=, rev=false) + partialsort(v, k; by=identity, lt=isless, rev=false) -Variant of [`partialsort!`](@ref) which copies `v` before partially sorting it, thereby returning the +Variant of [`partialsort!`](@ref) that copies `v` before partially sorting it, thereby returning the same thing as `partialsort!` but leaving `v` unmodified. """ partialsort(v::AbstractVector, k::Union{Integer,OrdinalRange}; kws...) = @@ -159,7 +165,7 @@ partialsort(v::AbstractVector, k::Union{Integer,OrdinalRange}; kws...) = # reference on sorted binary search: # http://www.tbray.org/ongoing/When/200x/2003/03/22/Binary -# index of the first value of vector a that is greater than or equal to x; +# index of the first value of vector v that is greater than or equivalent to x; # returns lastindex(v)+1 if x is greater than all values in v. function searchsortedfirst(v::AbstractVector, x, lo::T, hi::T, o::Ordering)::keytype(v) where T<:Integer hi = hi + T(1) @@ -178,7 +184,7 @@ function searchsortedfirst(v::AbstractVector, x, lo::T, hi::T, o::Ordering)::key return lo end -# index of the last value of vector a that is less than or equal to x; +# index of the last value of vector v that is less than or equivalent to x; # returns firstindex(v)-1 if x is less than all values of v.
function searchsortedlast(v::AbstractVector, x, lo::T, hi::T, o::Ordering)::keytype(v) where T<:Integer u = T(1) @@ -195,7 +201,7 @@ function searchsortedlast(v::AbstractVector, x, lo::T, hi::T, o::Ordering)::keyt return lo end -# returns the range of indices of v equal to x +# returns the range of indices of v equivalent to x # if v does not contain x, returns a 0-length range # indicating the insertion point of x function searchsorted(v::AbstractVector, x, ilo::T, ihi::T, o::Ordering)::UnitRange{keytype(v)} where T<:Integer @@ -288,14 +294,18 @@ for s in [:searchsortedfirst, :searchsortedlast, :searchsorted] end """ - searchsorted(a, x; by=, lt=, rev=false) + searchsorted(v, x; by=identity, lt=isless, rev=false) -Return the range of indices of `a` which compare as equal to `x` (using binary search) -according to the order specified by the `by`, `lt` and `rev` keywords, assuming that `a` -is already sorted in that order. Return an empty range located at the insertion point -if `a` does not contain values equal to `x`. +Return the range of indices in `v` where values are equivalent to `x`, or an +empty range located at the insertion point if `v` does not contain values +equivalent to `x`. The vector `v` must be sorted according to the order defined +by the keywords. Refer to [`sort!`](@ref) for the meaning of the keywords and +the definition of equivalence. -See also: [`insorted`](@ref), [`searchsortedfirst`](@ref), [`sort`](@ref), [`findall`](@ref). +The range is generally found using binary search, but there are optimized +implementations for `v` values that are ranges of real numbers. + +See also: [`searchsortedfirst`](@ref), [`sort!`](@ref), [`insorted`](@ref), [`findall`](@ref). 
# Examples ```jldoctest @@ -313,17 +323,28 @@ julia> searchsorted([1, 2, 4, 5, 5, 7], 9) # no match, insert at end julia> searchsorted([1, 2, 4, 5, 5, 7], 0) # no match, insert at start 1:0 + +julia> searchsorted([1, 0, 0, 2, 2, 7, 6], 2) # data unsorted but partitioned with respect to 2 +4:5 + +julia> searchsorted([1, -1, -2, 2, -2, 3, -4, 4], 2, by=abs) # sorted by absolute value, -2 equivalent to 2 +3:5 ``` """ searchsorted """ - searchsortedfirst(a, x; by=, lt=, rev=false) + searchsortedfirst(v, x; by=identity, lt=isless, rev=false) + +Return the index of the first value in `v` greater than or equivalent to `x`. +If `x` is greater than all values in `v` the function returns `lastindex(v) + 1`. -Return the index of the first value in `a` greater than or equal to `x`, according to the -specified order. Return `lastindex(a) + 1` if `x` is greater than all values in `a`. -`a` is assumed to be sorted. +The vector `v` must be sorted according to the order defined by the keywords. +`insert!`ing `x` at the returned index will maintain the sorted order. Refer to +[`sort!`](@ref) for the meaning of the keywords and the definition of +"greater than" and equivalence. -`insert!`ing `x` at this index will maintain sorted order. +The index is generally found using binary search, but there are optimized +implementations for `v` values that are ranges of real numbers. See also: [`searchsortedlast`](@ref), [`searchsorted`](@ref), [`findfirst`](@ref). 
@@ -343,15 +364,27 @@ julia> searchsortedfirst([1, 2, 4, 5, 5, 7], 9) # no match, insert at end julia> searchsortedfirst([1, 2, 4, 5, 5, 7], 0) # no match, insert at start 1 + +julia> searchsortedfirst([1, 0, 0, 2, 2, 7, 6], 2) # data unsorted but partitioned with respect to 2 +4 + +julia> searchsortedfirst([1, -1, -2, 2, -2, 3, -4, 4], 2, by=abs) # sorted by absolute value +3 ``` """ searchsortedfirst """ - searchsortedlast(a, x; by=, lt=, rev=false) + searchsortedlast(v, x; by=identity, lt=isless, rev=false) + +Return the index of the last value in `v` less than or equivalent to `x`. +If `x` is less than all values in `v` the function returns `firstindex(v) - 1`. -Return the index of the last value in `a` less than or equal to `x`, according to the -specified order. Return `firstindex(a) - 1` if `x` is less than all values in `a`. `a` is -assumed to be sorted. +The vector `v` must be sorted according to the order defined by the keywords. +Refer to [`sort!`](@ref) for the meaning of the keywords and the definition of +"less than" and equivalence. + +The index is generally found using binary search, but there are optimized +implementations for `v` values that are ranges of real numbers. # Examples ```jldoctest @@ -369,16 +402,25 @@ julia> searchsortedlast([1, 2, 4, 5, 5, 7], 9) # no match, insert at end julia> searchsortedlast([1, 2, 4, 5, 5, 7], 0) # no match, insert at start 0 + +julia> searchsortedlast([1, 0, 0, 2, 2, 7, 6], 2) # data unsorted but partitioned with respect to 2 +5 + +julia> searchsortedlast([1, -1, -2, 2, -2, 3, -4, 4], 2, by=abs) # sorted by absolute value +5 ``` """ searchsortedlast """ - insorted(x, a; by=, lt=, rev=false) -> Bool + insorted(x, v; by=identity, lt=isless, rev=false) -> Bool + +Determine whether a vector `v` contains any value equivalent to `x`. +The vector `v` must be sorted according to the order defined by the keywords. +Refer to [`sort!`](@ref) for the meaning of the keywords and the definition of +equivalence. 
-Determine whether an item `x` is in the sorted collection `a`, in the sense that -it is [`==`](@ref) to one of the values of the collection according to the order -specified by the `by`, `lt` and `rev` keywords, assuming that `a` is already -sorted in that order, see [`sort`](@ref) for the keywords. +The check is generally done using binary search, but there are optimized +implementations for `v` values that are ranges of real numbers. See also [`in`](@ref). @@ -398,6 +440,12 @@ false julia> insorted(0, [1, 2, 4, 5, 5, 7]) # no match false + +julia> insorted(2, [1, 0, 2, 2, 7, 6]) # data unsorted but partitioned with respect to 2 +true + +julia> insorted(2, [1, -1, -2, 3, -4, 4], by=abs) # sorted by absolute value +true ``` !!! compat "Julia 1.6" @@ -524,7 +572,7 @@ Base.size(v::WithoutMissingVector) = size(v.data) send_to_end!(f::Function, v::AbstractVector; [lo, hi]) Send every element of `v` for which `f` returns `true` to the end of the vector and return -the index of the last element which for which `f` returns `false`. +the index of the last element for which `f` returns `false`. `send_to_end!(f, v, lo, hi)` is equivalent to `send_to_end!(f, view(v, lo:hi))+lo-1` @@ -724,8 +772,8 @@ Insertion sort traverses the collection one element at a time, inserting each element into its correct, sorted position in the output vector. Characteristics: -* *stable*: preserves the ordering of elements which compare equal -(e.g. "a" and "A" in a sort of letters which ignores case). +* *stable*: preserves the ordering of elements that compare equal +(e.g. "a" and "A" in a sort of letters that ignores case). * *in-place* in memory. * *quadratic performance* in the number of elements to be sorted: it is well-suited to small collections but should not be used for large ones. @@ -965,8 +1013,8 @@ is treated as the first or last index of the input, respectively. `lo` and `hi` may be specified together as an `AbstractUnitRange`. 
Characteristics: - * *stable*: preserves the ordering of elements which compare equal - (e.g. "a" and "A" in a sort of letters which ignores case). + * *stable*: preserves the ordering of elements that compare equal + (e.g. "a" and "A" in a sort of letters that ignores case). * *not in-place* in memory. * *divide-and-conquer*: sort strategy similar to [`QuickSort`](@ref). * *linear runtime* if `length(lo:hi)` is constant @@ -1242,7 +1290,7 @@ Otherwise, we dispatch to [`InsertionSort`](@ref) for inputs with `length <= 40` perform a presorted check ([`CheckSorted`](@ref)). We check for short inputs before performing the presorted check to avoid the overhead of the -check for small inputs. Because the alternate dispatch is to [`InseritonSort`](@ref) which +check for small inputs. Because the alternate dispatch is to [`InsertionSort`](@ref) which has efficient `O(n)` runtime on presorted inputs, the check is not necessary for small inputs. @@ -1323,15 +1371,52 @@ defalg(v::AbstractArray{Union{}}) = DEFAULT_UNSTABLE # for method disambiguation """ sort!(v; alg::Algorithm=defalg(v), lt=isless, by=identity, rev::Bool=false, order::Ordering=Forward) -Sort the vector `v` in place. A stable algorithm is used by default. You can select a -specific algorithm to use via the `alg` keyword (see [Sorting Algorithms](@ref) for -available algorithms). The `by` keyword lets you provide a function that will be applied to -each element before comparison; the `lt` keyword allows providing a custom "less than" -function (note that for every `x` and `y`, only one of `lt(x,y)` and `lt(y,x)` can return -`true`); use `rev=true` to reverse the sorting order. These options are independent and can -be used together in all possible combinations: if both `by` and `lt` are specified, the `lt` -function is applied to the result of the `by` function; `rev=true` reverses whatever -ordering specified via the `by` and `lt` keywords. +Sort the vector `v` in place. A stable algorithm is used by default. 
A specific +algorithm can be selected via the `alg` keyword (see [Sorting Algorithms](@ref) +for available algorithms). + +Elements are first transformed with the function `by` and then compared +according to either the function `lt` or the ordering `order`. Finally, the +resulting order is reversed if `rev=true`. The current implementation applies the +`by` transformation before each comparison rather than once per element. + +Passing an `lt` other than `isless` along with an `order` other than +[`Base.Order.Forward`](@ref) or [`Base.Order.Reverse`](@ref) is not permitted, +otherwise all options are independent and can be used together in all possible +combinations. Note that `order` can also include a "by" transformation, in +which case it is applied after that defined with the `by` keyword. For more +information on `order` values see the documentation on [Alternate +Orderings](@ref). + +Relations between two elements are defined as follows (with "less" and +"greater" exchanged when `rev=true`): + +* `x` is less than `y` if `lt(by(x), by(y))` (or `Base.Order.lt(order, by(x), by(y))`) yields true. +* `x` is greater than `y` if `y` is less than `x`. +* `x` and `y` are equivalent if neither is less than the other ("incomparable" + is sometimes used as a synonym for "equivalent"). + +The result of `sort!` is sorted in the sense that every element is greater than +or equivalent to the previous one. + +The `lt` function must define a strict weak order, that is, it must be + +* irreflexive: `lt(x, x)` always yields `false`, +* asymmetric: if `lt(x, y)` yields `true` then `lt(y, x)` yields `false`, +* transitive: `lt(x, y) && lt(y, z)` implies `lt(x, z)`, +* transitive in equivalence: `!lt(x, y) && !lt(y, x)` and `!lt(y, z) && !lt(z, + y)` together imply `!lt(x, z) && !lt(z, x)`. In words: if `x` and `y` are + equivalent and `y` and `z` are equivalent then `x` and `z` must be + equivalent.
+ +For example `<` is a valid `lt` function for `Int` values but `≤` is not: it +violates irreflexivity. For `Float64` values even `<` is invalid as it violates +the fourth condition: `1.0` and `NaN` are equivalent and so are `NaN` and `2.0` +but `1.0` and `2.0` are not equivalent. + +See also [`sort`](@ref), [`sortperm`](@ref), [`sortslices`](@ref), +[`partialsort!`](@ref), [`partialsortperm`](@ref), [`issorted`](@ref), +[`searchsorted`](@ref), [`insorted`](@ref), [`Base.Order.ord`](@ref). # Examples ```jldoctest @@ -1358,6 +1443,29 @@ julia> v = [(1, "c"), (3, "a"), (2, "b")]; sort!(v, by = x -> x[2]); v (3, "a") (2, "b") (1, "c") + +julia> sort(0:3, by=x->x-2, order=Base.Order.By(abs)) # same as sort(0:3, by=x->abs(x-2)) +4-element Vector{Int64}: + 2 + 1 + 3 + 0 + +julia> sort([2, NaN, 1, NaN, 3]) # correct sort with default lt=isless +5-element Vector{Float64}: + 1.0 + 2.0 + 3.0 + NaN + NaN + +julia> sort([2, NaN, 1, NaN, 3], lt=<) # wrong sort due to invalid lt +5-element Vector{Float64}: + 2.0 + NaN + 1.0 + NaN + 3.0 ``` """ function sort!(v::AbstractVector{T}; @@ -1398,15 +1506,15 @@ sort(v::AbstractVector; kws...) = sort!(copymutable(v); kws...) ## partialsortperm: the permutation to sort the first k elements of an array ## """ - partialsortperm(v, k; by=, lt=, rev=false) + partialsortperm(v, k; by=identity, lt=isless, rev=false) Return a partial permutation `I` of the vector `v`, so that `v[I]` returns values of a fully sorted version of `v` at index `k`. If `k` is a range, a vector of indices is returned; if `k` is an integer, a single index is returned. The order is specified using the same -keywords as `sort!`. The permutation is stable, meaning that indices of equal elements -appear in ascending order. +keywords as `sort!`. The permutation is stable: the indices of equal elements +will appear in ascending order. -Note that this function is equivalent to, but more efficient than, calling `sortperm(...)[k]`.
+This function is equivalent to, but more efficient than, calling `sortperm(...)[k]`. # Examples ```jldoctest @@ -1432,7 +1540,7 @@ partialsortperm(v::AbstractVector, k::Union{Integer,OrdinalRange}; kwargs...) = partialsortperm!(similar(Vector{eltype(k)}, axes(v,1)), v, k; kwargs...) """ - partialsortperm!(ix, v, k; by=, lt=, rev=false) + partialsortperm!(ix, v, k; by=identity, lt=isless, rev=false) Like [`partialsortperm`](@ref), but accepts a preallocated index vector `ix` the same size as `v`, which is used to store (a permutation of) the indices of `v`. @@ -1498,7 +1606,7 @@ end Return a permutation vector or array `I` that puts `A[I]` in sorted order along the given dimension. If `A` has more than one dimension, then the `dims` keyword argument must be specified. The order is specified using the same keywords as [`sort!`](@ref). The permutation is guaranteed to be stable even -if the sorting algorithm is unstable, meaning that indices of equal elements appear in +if the sorting algorithm is unstable: the indices of equal elements will appear in ascending order. See also [`sortperm!`](@ref), [`partialsortperm`](@ref), [`invperm`](@ref), [`indexin`](@ref). @@ -1732,7 +1840,8 @@ end sort!(A; dims::Integer, alg::Algorithm=defalg(A), lt=isless, by=identity, rev::Bool=false, order::Ordering=Forward) Sort the multidimensional array `A` along dimension `dims`. -See [`sort!`](@ref) for a description of possible keyword arguments. +See the one-dimensional version of [`sort!`](@ref) for a description of +possible keyword arguments. To sort slices of an array, refer to [`sortslices`](@ref). @@ -1886,8 +1995,8 @@ algorithm. Partial quick sort returns the smallest `k` elements sorted from smal to largest, finding them and sorting them using [`QuickSort`](@ref). Characteristics: - * *not stable*: does not preserve the ordering of elements which - compare equal (e.g. 
"a" and "A" in a sort of letters which + * *not stable*: does not preserve the ordering of elements that + compare equal (e.g. "a" and "A" in a sort of letters that ignores case). * *in-place* in memory. * *divide-and-conquer*: sort strategy similar to [`MergeSort`](@ref). @@ -1903,8 +2012,8 @@ Indicate that a sorting function should use the quick sort algorithm, which is *not* stable. Characteristics: - * *not stable*: does not preserve the ordering of elements which - compare equal (e.g. "a" and "A" in a sort of letters which + * *not stable*: does not preserve the ordering of elements that + compare equal (e.g. "a" and "A" in a sort of letters that ignores case). * *in-place* in memory. * *divide-and-conquer*: sort strategy similar to [`MergeSort`](@ref). @@ -1922,8 +2031,8 @@ subcollection at each step, until the entire collection has been recombined in sorted form. Characteristics: - * *stable*: preserves the ordering of elements which compare - equal (e.g. "a" and "A" in a sort of letters which ignores + * *stable*: preserves the ordering of elements that compare + equal (e.g. "a" and "A" in a sort of letters that ignores case). * *not in-place* in memory. * *divide-and-conquer* sort strategy. diff --git a/doc/src/base/base.md b/doc/src/base/base.md index 9a00a864907ec..da516f6acb6af 100644 --- a/doc/src/base/base.md +++ b/doc/src/base/base.md @@ -126,6 +126,7 @@ Core.:(===) Core.isa Base.isequal Base.isless +Base.isunordered Base.ifelse Core.typeassert Core.typeof diff --git a/doc/src/base/sort.md b/doc/src/base/sort.md index e93d9716b1487..64a832a6599f7 100644 --- a/doc/src/base/sort.md +++ b/doc/src/base/sort.md @@ -203,7 +203,7 @@ Base.Sort.defalg(::AbstractArray{<:Union{SmallInlineStrings, Missing}}) = Inline The default sorting algorithm (returned by `Base.Sort.defalg`) is guaranteed to be stable since Julia 1.9. Previous versions had unstable edge cases when sorting numeric arrays. 
-## Alternate orderings +## Alternate Orderings By default, `sort` and related functions use [`isless`](@ref) to compare two elements in order to determine which should come first. The diff --git a/doc/src/manual/missing.md b/doc/src/manual/missing.md index 9bddcdfbb2ac2..8c8e801ccac9a 100644 --- a/doc/src/manual/missing.md +++ b/doc/src/manual/missing.md @@ -88,7 +88,7 @@ true ``` The [`isless`](@ref) operator is another exception: `missing` is considered -as greater than any other value. This operator is used by [`sort`](@ref), +as greater than any other value. This operator is used by [`sort!`](@ref), which therefore places `missing` values after all other values: ```jldoctest