-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Add is_sorted
to the standard library
#2351
Merged
Merged
Changes from 8 commits
Commits
Show all changes
10 commits
Select commit
Hold shift + click to select a range
6ed2eb4
Add first version of RFC "Add `is_sorted`"
LukasKalbertodt 04e6c33
Minor improvements to "add-is-sorted" RFC
LukasKalbertodt 65199fa
Additional fixes and clarifications (RFC "add is_sorted")
LukasKalbertodt 77310a3
Improve formulation (remove "I")
LukasKalbertodt 1a4bfdc
Update the "is_sorted" RFC to only require PartialOrd (instead of Ord)
LukasKalbertodt 22d12d9
Change `is_sorted` RFC to not add `Iterator::is_sorted_by_key`
LukasKalbertodt 85c3b5f
Mention SIMD impl of `is_sorted` in the motivation section
LukasKalbertodt 970ef5c
Fix subtle bug in RFC "Add `is_sorted`"
LukasKalbertodt e35ec18
Add note about bug in initial version of "is_sorted" RFC
LukasKalbertodt 1aa75ab
RFC 2351
Centril File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,296 @@ | ||
- Feature Name: is_sorted | ||
- Start Date: 2018-02-24 | ||
- RFC PR: (leave this empty) | ||
- Rust Issue: (leave this empty) | ||
|
||
# Summary | ||
[summary]: #summary | ||
|
||
Add the methods `is_sorted`, `is_sorted_by` and `is_sorted_by_key` to `[T]`; | ||
add the methods `is_sorted` and `is_sorted_by` to `Iterator`. | ||
|
||
# Motivation | ||
[motivation]: #motivation | ||
|
||
In quite a few situations, one needs to check whether a sequence of elements | ||
is sorted. The most important use cases are probably **unit tests** and | ||
**pre-/post-condition checks**. | ||
|
||
The lack of an `is_sorted()` function in Rust's standard library has led to | ||
[countless programmers implementing their own](https://github.com/search?l=Rust&q=%22fn+is_sorted%22&type=Code&utf8=%E2%9C%93). | ||
While it is possible to write a one-liner using iterators (e.g. | ||
`(0..arr.len() - 1).all(|i| arr[i] <= arr[i + 1])`), it is still unnecessary | ||
overhead while writing *and* reading the code. | ||
|
||
In [the corresponding issue on the main repository](https://github.com/rust-lang/rust/issues/44370) | ||
(from which a few comments are referenced) everyone seems to agree on the | ||
basic premise: we want such a function. | ||
|
||
Having `is_sorted()` and friends in the standard library would: | ||
- prevent people from spending time on writing their own, | ||
- improve readbility of the code by clearly showing the author's intent, | ||
- and encourage to write more unit tests and/or pre-/post-condition checks. | ||
|
||
Another proof of this functions' usefulness is the inclusion in the | ||
standard library of many other languages: | ||
C++'s [`std::is_sorted`](http://en.cppreference.com/w/cpp/algorithm/is_sorted), | ||
Go's [`sort.IsSorted`](https://golang.org/pkg/sort/#IsSorted), | ||
D's [`std.algorithm.sorting.is_sorted`](https://dlang.org/library/std/algorithm/sorting/is_sorted.html) | ||
and others. (Curiously, many (mostly) more high-level programming language – | ||
like Ruby, Javascript, Java, Haskell and Python – seem to lack such a function.) | ||
|
||
|
||
## Fast Implementation via SIMD | ||
|
||
Lastly, it is possible to implement `is_sorted` for many common types with SIMD | ||
instructions which improves speed significantly. It is unlikely that many | ||
programmers will take the time to write SIMD code themselves, thus everyone | ||
would benefit if this rather difficult implementation work is done in the | ||
standard library. | ||
|
||
|
||
|
||
# Guide-level explanation | ||
[guide-level-explanation]: #guide-level-explanation | ||
|
||
Possible documentation of the two new methods of `Iterator` as well as | ||
`[T]::is_sorted_by_key`: | ||
|
||
> ```rust | ||
> fn is_sorted(self) -> bool | ||
> where | ||
> Self::Item: PartialOrd, | ||
> ``` | ||
> Checks if the elements of this iterator are sorted. | ||
> | ||
> That is, for each element `a` and its following element `b`, `a <= b` | ||
> must hold. If the iterator yields exactly zero or one element, `true` | ||
> is returned. | ||
> | ||
> Note that if `Self::Item` is only `PartialOrd`, but not `Ord`, the above | ||
> definition implies that this function returns `false` if any two | ||
> consecutive items are not comparable. | ||
> | ||
> ## Example | ||
> | ||
> ```rust | ||
> assert!([1, 2, 2, 9].iter().is_sorted()); | ||
> assert!(![1, 3, 2, 4).iter().is_sorted()); | ||
> assert!([0].iter().is_sorted()); | ||
> assert!(std::iter::empty::<i32>().is_sorted()); | ||
> assert!(![0.0, 1.0, std::f32::NAN].iter().is_sorted()); | ||
> ``` | ||
> --- | ||
> | ||
> ```rust | ||
> fn is_sorted_by<F>(self, compare: F) -> bool | ||
> where | ||
> F: FnMut(&Self::Item, &Self::Item) -> Option<Ordering>, | ||
> ``` | ||
> Checks if the elements of this iterator are sorted using the given | ||
> comparator function. | ||
> | ||
> Instead of using `PartialOrd::partial_cmp`, this function uses the given | ||
> `compare` function to determine the ordering of two elements. Apart from | ||
> that, it's equivalent to `is_sorted`; see its documentation for more | ||
> information. | ||
> | ||
> --- | ||
> | ||
> (*for `[T]`*) | ||
> | ||
> ```rust | ||
> fn is_sorted_by_key<F, K>(&self, f: F) -> bool | ||
> where | ||
> F: FnMut(&T) -> K, | ||
> K: PartialOrd, | ||
> ``` | ||
> Checks if the elements of this slice are sorted using the given | ||
> key extraction function. | ||
> | ||
> Instead of comparing the slice's elements directly, this function | ||
> compares the keys of the elements, as determined by `f`. Apart from | ||
> that, it's equivalent to `is_sorted`; see its documentation for more | ||
> information. | ||
> | ||
> ## Example | ||
> | ||
> ```rust | ||
> assert!(["c", "bb", "aaa"].is_sorted_by_key(|s| s.len())); | ||
> assert!(![-2i32, -1, 0, 3].is_sorted_by_key(|n| n.abs())); | ||
> ``` | ||
|
||
The methods `[T]::is_sorted` and `[T]::is_sorted_by` will have analogous | ||
documentations to the ones shown above. | ||
|
||
# Reference-level explanation | ||
[reference-level-explanation]: #reference-level-explanation | ||
|
||
This RFC proposes to add the following methods to `[T]` (slices) and | ||
`Iterator`: | ||
|
||
```rust | ||
impl<T> [T] { | ||
fn is_sorted(&self) -> bool | ||
where | ||
T: PartialOrd, | ||
{ ... } | ||
|
||
fn is_sorted_by<F>(&self, compare: F) -> bool | ||
where | ||
F: FnMut(&T, &T) -> Option<Ordering>, | ||
{ ... } | ||
|
||
fn is_sorted_by_key<F, K>(&self, f: F) -> bool | ||
where | ||
F: FnMut(&T) -> K, | ||
K: PartialOrd, | ||
{ ... } | ||
} | ||
|
||
trait Iterator { | ||
fn is_sorted(self) -> bool | ||
where | ||
Self::Item: PartialOrd, | ||
{ ... } | ||
|
||
fn is_sorted_by<F>(mut self, compare: F) -> bool | ||
where | ||
F: FnMut(&Self::Item, &Self::Item) -> Option<Ordering>, | ||
{ ... } | ||
} | ||
``` | ||
|
||
In addition to the changes shown above, the three methods added to `[T]` should | ||
also be added to `core::slice::SliceExt` as they don't require heap | ||
allocations. | ||
|
||
To repeat the exact semantics from the prior section: the methods return | ||
`true` if and only if for each element `a` and its following element `b`, the | ||
condition `a <= b` holds. For slices/iterators with zero or one element, | ||
`true` is returned. For elements which implement `PartialOrd`, but not `Ord`, | ||
the function returns `false` if any two consecutive elements are not | ||
comparable (this is an implication of the `a <= b` condition from above). | ||
|
||
A sample implementation can be found | ||
[here](https://play.rust-lang.org/?gist=431ff42fe8ba5980fcf9250c8bc4492b&version=stable). | ||
|
||
|
||
# Drawbacks | ||
[drawbacks]: #drawbacks | ||
|
||
It increases the size of the standard library by a tiny bit. | ||
|
||
# Rationale and alternatives | ||
[alternatives]: #alternatives | ||
|
||
### Only add the methods to `Iterator`, but not to `[T]` | ||
Without `is_sorted()` defined for slices directly, one can still fairly easily | ||
test if a slice is sorted by obtaining an iterator via `iter()`. So instead of | ||
`v.is_sorted()`, one would need to write `v.iter().is_sorted()`. | ||
|
||
This always works for `is_sorted()` because of the `PartialOrd` blanket impl | ||
which implements `PartialOrd` for all references to an `PartialOrd` type. For | ||
`is_sorted_by` it would introduce an additional reference to the closures' | ||
arguments (i.e. `v.iter().is_sorted_by(|a, b| ...))` where `a` and `b` are | ||
`&&T`). | ||
|
||
While these two inconveniences are not deal-breakers, being able to call those | ||
three methods on slices (and all `Deref<Target=[T]>` types) directly, could be | ||
favourable for many programmers (especially given the popularity of slice-like | ||
data structures, like `Vec<T>`). Additionally, the `sort` method and friends | ||
are defined for slices, thus one might expect the `is_sorted()` method there, | ||
too. | ||
|
||
|
||
### Add the three methods to additional data structures (like `LinkedList`) as well | ||
Adding these methods to every data structure in the standard libary is a lot of | ||
duplicate code. Optimally, we would have a trait that represents sequential | ||
data structures and would only add `is_sorted` and friends to said trait. We | ||
don't have such a trait as of now; so `Iterator` is the next best thing. Slices | ||
deserve special treatment due to the reasons mentioned above (popularity and | ||
`sort()`). | ||
|
||
|
||
### `Iterator::while_sorted`, `is_sorted_until`, `sorted_prefix`, `num_sorted`, ... | ||
[In the issue on the main repository](https://github.com/rust-lang/rust/issues/44370), | ||
concerns about completely consuming the iterator were raised. Some alternatives, | ||
such as [`while_sorted`](https://github.com/rust-lang/rust/issues/44370#issuecomment-327873139), | ||
were suggested. However, consuming the iterator is neither uncommon nor a | ||
problem. Methods like `count()`, `max()` and many more consume the iterator, | ||
too. [One comment](https://github.com/rust-lang/rust/issues/44370#issuecomment-344516366) mentions: | ||
|
||
> I am a bit skeptical of the equivalent on Iterator just because the return | ||
> value does not seem actionable -- you aren't going to "sort" the iterator | ||
> after you find out it is not already sorted. What are some use cases for this | ||
> in real code that does not involve iterating over a slice? | ||
|
||
As mentioned above, `Iterator` is the next best thing to a trait representing | ||
sequential data structures. So to check if a `LinkedList`, `VecDeque` or | ||
another sequential data structure is sorted, one would simply call | ||
`collection.iter().is_sorted()`. It's likely that this is the main usage for | ||
`Iterator`'s `is_sorted` methods. Additionally, code like | ||
`if v.is_sorted() { v.sort(); }` is not very useful: `sort()` already runs in | ||
O(n) for already sorted arrays. | ||
|
||
Suggestions like `is_sorted_until` are not really useful either: one can easily | ||
get a subslice or a part of an iterator (via `.take()`) and call `is_sorted()` | ||
on that part. | ||
|
||
|
||
# Unresolved questions | ||
[unresolved]: #unresolved-questions | ||
|
||
|
||
### Should `Iterator::is_sorted_by_key` be added as well? | ||
|
||
This RFC proposes to add `is_sorted_by_key` only to `[T]` but not to | ||
`Iterator`. The latter addition wouldn't be too useful since once could easily | ||
achieve the same effect as `.is_sorted_by_key(...)` by calling | ||
`.map(...).is_sorted()`. It might still be favourable to include said function | ||
for consistency and ease of use. The standard library already hosts a number of | ||
sorting-related functions all of which come in three flavours: *raw*, `_by` and | ||
`_by_key`. By now, programmers could expect there to be an `is_sorted_by_key` | ||
as well. | ||
|
||
|
||
### Add `std::cmp::is_sorted` instead | ||
|
||
As suggested [here](https://github.com/rust-lang/rust/issues/44370#issuecomment-345495831), | ||
one could also add this free function (plus the `_by` and `_by_key` versions) | ||
to `std::cmp`: | ||
|
||
```rust | ||
fn is_sorted<C>(collection: C) -> bool | ||
where | ||
C: IntoIterator, | ||
C::Item: Ord, | ||
``` | ||
|
||
This can be seen as a better design as it avoids the question about which data | ||
structure should get `is_sorted` methods. However, it might have the | ||
disadvantage of being less discoverable and also less convenient (long path or | ||
import). | ||
|
||
|
||
### Require `Ord` instead of only `PartialOrd` | ||
|
||
As proposed in this RFC, `is_sorted` only requires its elements to be | ||
`PartialOrd`. If two non-comparable elements are encountered, `false` is | ||
returned. This is probably the only useful way to define the function for | ||
partially orderable elements. | ||
|
||
While it's convenient to call `is_sorted()` on slices containing only | ||
partially orderable elements (like floats), we might want to use the stronger | ||
`Ord` bound: | ||
|
||
- Firstly, for most programmers it's probably not *immediately* clear how the | ||
function is defined for partially ordered elements (the documentation should | ||
be sufficient as explanation, though). | ||
- Secondly, being able to call `is_sorted` on something will probably make | ||
most programmers think, that calling `sort` on the same thing is possible, | ||
too. Having different bounds for `is_sorted` and `sort` thus might lead to | ||
confusion. | ||
- Lastly, the `is_sorted_by` function currently uses a closure which returns | ||
`Option<Ordering>`. This differs from the closure for `sort_by` and looks a | ||
bit more complicated than necessary for most cases. |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps if Rust has a built in operator for function composition and made point-free notation easy, but that is not the case, so let's keep the RFC as is.