Skip to content

Latest commit

 

History

History
302 lines (246 loc) · 11.7 KB

2351-is-sorted.md

File metadata and controls

302 lines (246 loc) · 11.7 KB

Summary

Add the methods is_sorted, is_sorted_by and is_sorted_by_key to [T]; add the methods is_sorted and is_sorted_by to Iterator.

Motivation

In quite a few situations, one needs to check whether a sequence of elements is sorted. The most important use cases are probably unit tests and pre-/post-condition checks.

The lack of an is_sorted() function in Rust's standard library has led to countless programmers implementing their own. While it is possible to write a one-liner using iterators (e.g. (0..arr.len() - 1).all(|i| arr[i] <= arr[i + 1])¹), it is still unnecessary mental overhead while writing and reading the code.

In the corresponding issue on the main repository (from which a few comments are referenced) everyone seems to agree on the basic premise: we want such a function.

Having is_sorted() and friends in the standard library would:

  • prevent people from spending time on writing their own,
  • improve readbility of the code by clearly showing the author's intent,
  • and encourage to write more unit tests and/or pre-/post-condition checks.

Another proof of this functions' usefulness is the inclusion in the standard library of many other languages: C++'s std::is_sorted, Go's sort.IsSorted, D's std.algorithm.sorting.is_sorted and others. (Curiously, many (mostly) more high-level programming language – like Ruby, Javascript, Java, Haskell and Python – seem to lack such a function.)

¹ In the initial version of this RFC, this code snippet contained a bug (< instead of <=). This subtle mistake happens very often: in this RFC, in the discussion thread about this RFC, in this StackOverflow answer and in many more places. Thus, avoiding this common bug is another good reason to add is_sorted().

Fast Implementation via SIMD

Lastly, it is possible to implement is_sorted for many common types with SIMD instructions which improves speed significantly. It is unlikely that many programmers will take the time to write SIMD code themselves, thus everyone would benefit if this rather difficult implementation work is done in the standard library.

Guide-level explanation

Possible documentation of the two new methods of Iterator as well as [T]::is_sorted_by_key:

fn is_sorted(self) -> bool
where
    Self::Item: PartialOrd,

Checks if the elements of this iterator are sorted.

That is, for each element a and its following element b, a <= b must hold. If the iterator yields exactly zero or one element, true is returned.

Note that if Self::Item is only PartialOrd, but not Ord, the above definition implies that this function returns false if any two consecutive items are not comparable.

Example

assert!([1, 2, 2, 9].iter().is_sorted());
assert!(![1, 3, 2, 4).iter().is_sorted());
assert!([0].iter().is_sorted());
assert!(std::iter::empty::<i32>().is_sorted());
assert!(![0.0, 1.0, std::f32::NAN].iter().is_sorted());

fn is_sorted_by<F>(self, compare: F) -> bool
where
    F: FnMut(&Self::Item, &Self::Item) -> Option<Ordering>,

Checks if the elements of this iterator are sorted using the given comparator function.

Instead of using PartialOrd::partial_cmp, this function uses the given compare function to determine the ordering of two elements. Apart from that, it's equivalent to is_sorted; see its documentation for more information.


(for [T])

fn is_sorted_by_key<F, K>(&self, f: F) -> bool
where
    F: FnMut(&T) -> K,
    K: PartialOrd,

Checks if the elements of this slice are sorted using the given key extraction function.

Instead of comparing the slice's elements directly, this function compares the keys of the elements, as determined by f. Apart from that, it's equivalent to is_sorted; see its documentation for more information.

Example

assert!(["c", "bb", "aaa"].is_sorted_by_key(|s| s.len()));
assert!(![-2i32, -1, 0, 3].is_sorted_by_key(|n| n.abs()));

The methods [T]::is_sorted and [T]::is_sorted_by will have analogous documentations to the ones shown above.

Reference-level explanation

This RFC proposes to add the following methods to [T] (slices) and Iterator:

impl<T> [T] {
    fn is_sorted(&self) -> bool
    where
        T: PartialOrd,
    { ... }

    fn is_sorted_by<F>(&self, compare: F) -> bool
    where
        F: FnMut(&T, &T) -> Option<Ordering>,
    { ... }

    fn is_sorted_by_key<F, K>(&self, f: F) -> bool
    where
        F: FnMut(&T) -> K,
        K: PartialOrd,
    { ... }
}

trait Iterator {
    fn is_sorted(self) -> bool
    where
        Self::Item: PartialOrd,
    { ... }

    fn is_sorted_by<F>(mut self, compare: F) -> bool
    where
        F: FnMut(&Self::Item, &Self::Item) -> Option<Ordering>,
    { ... }
}

In addition to the changes shown above, the three methods added to [T] should also be added to core::slice::SliceExt as they don't require heap allocations.

To repeat the exact semantics from the prior section: the methods return true if and only if for each element a and its following element b, the condition a <= b holds. For slices/iterators with zero or one element, true is returned. For elements which implement PartialOrd, but not Ord, the function returns false if any two consecutive elements are not comparable (this is an implication of the a <= b condition from above).

A sample implementation can be found here.

Drawbacks

It increases the size of the standard library by a tiny bit.

Rationale and alternatives

Only add the methods to Iterator, but not to [T]

Without is_sorted() defined for slices directly, one can still fairly easily test if a slice is sorted by obtaining an iterator via iter(). So instead of v.is_sorted(), one would need to write v.iter().is_sorted().

This always works for is_sorted() because of the PartialOrd blanket impl which implements PartialOrd for all references to an PartialOrd type. For is_sorted_by it would introduce an additional reference to the closures' arguments (i.e. v.iter().is_sorted_by(|a, b| ...)) where a and b are &&T).

While these two inconveniences are not deal-breakers, being able to call those three methods on slices (and all Deref<Target=[T]> types) directly, could be favourable for many programmers (especially given the popularity of slice-like data structures, like Vec<T>). Additionally, the sort method and friends are defined for slices, thus one might expect the is_sorted() method there, too.

Add the three methods to additional data structures (like LinkedList) as well

Adding these methods to every data structure in the standard library is a lot of duplicate code. Optimally, we would have a trait that represents sequential data structures and would only add is_sorted and friends to said trait. We don't have such a trait as of now; so Iterator is the next best thing. Slices deserve special treatment due to the reasons mentioned above (popularity and sort()).

Iterator::while_sorted, is_sorted_until, sorted_prefix, num_sorted, ...

In the issue on the main repository, concerns about completely consuming the iterator were raised. Some alternatives, such as while_sorted, were suggested. However, consuming the iterator is neither uncommon nor a problem. Methods like count(), max() and many more consume the iterator, too. One comment mentions:

I am a bit skeptical of the equivalent on Iterator just because the return value does not seem actionable -- you aren't going to "sort" the iterator after you find out it is not already sorted. What are some use cases for this in real code that does not involve iterating over a slice?

As mentioned above, Iterator is the next best thing to a trait representing sequential data structures. So to check if a LinkedList, VecDeque or another sequential data structure is sorted, one would simply call collection.iter().is_sorted(). It's likely that this is the main usage for Iterator's is_sorted methods. Additionally, code like if v.is_sorted() { v.sort(); } is not very useful: sort() already runs in O(n) for already sorted arrays.

Suggestions like is_sorted_until are not really useful either: one can easily get a subslice or a part of an iterator (via .take()) and call is_sorted() on that part.

Unresolved questions

Should Iterator::is_sorted_by_key be added as well?

This RFC proposes to add is_sorted_by_key only to [T] but not to Iterator. The latter addition wouldn't be too useful since once could easily achieve the same effect as .is_sorted_by_key(...) by calling .map(...).is_sorted(). It might still be favourable to include said function for consistency and ease of use. The standard library already hosts a number of sorting-related functions all of which come in three flavours: raw, _by and _by_key. By now, programmers could expect there to be an is_sorted_by_key as well.

Add std::cmp::is_sorted instead

As suggested here, one could also add this free function (plus the _by and _by_key versions) to std::cmp:

fn is_sorted<C>(collection: C) -> bool
where
    C: IntoIterator,
    C::Item: Ord,

This can be seen as a better design as it avoids the question about which data structure should get is_sorted methods. However, it might have the disadvantage of being less discoverable and also less convenient (long path or import).

Require Ord instead of only PartialOrd

As proposed in this RFC, is_sorted only requires its elements to be PartialOrd. If two non-comparable elements are encountered, false is returned. This is probably the only useful way to define the function for partially orderable elements.

While it's convenient to call is_sorted() on slices containing only partially orderable elements (like floats), we might want to use the stronger Ord bound:

  • Firstly, for most programmers it's probably not immediately clear how the function is defined for partially ordered elements (the documentation should be sufficient as explanation, though).
  • Secondly, being able to call is_sorted on something will probably make most programmers think, that calling sort on the same thing is possible, too. Having different bounds for is_sorted and sort thus might lead to confusion.
  • Lastly, the is_sorted_by function currently uses a closure which returns Option<Ordering>. This differs from the closure for sort_by and looks a bit more complicated than necessary for most cases.