Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Add is_sorted to the standard library #2351

Merged
merged 10 commits into from
Aug 19, 2018
Merged

RFC: Add is_sorted to the standard library #2351

merged 10 commits into from
Aug 19, 2018

Conversation

LukasKalbertodt
Copy link
Member

@LukasKalbertodt LukasKalbertodt commented Feb 25, 2018

Add the methods is_sorted, is_sorted_by and is_sorted_by_key to [T] and Iterator.

Rendered
Tracking issue


CC rust-lang/rust#44370

EDIT: I posted a comment summarizing the discussion so far.

@Centril Centril added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Feb 25, 2018
overhead while writing *and* reading the code.

In [the corresponding issue on the main repository](https://github.com/rust-lang/rust/issues/44370)
(from which I will reference a few comments) everyone seems to agree on the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/from which I will reference a few comments/from which a few comments are referenced/

[unresolved]: #unresolved-questions


### Is `Iterator::is_sorted_by_key` useless?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so, precisely for the reason you mention.

We already have quite many methods in Iterator and I think that additions should only be made if they increase readability and helps you express things which are quite tedious and unidiomatic to express otherwise. In this case I think iter.map(extract).is_sorted() is better than iter.is_sorted_by_key(extract).

If there is popular demand for this later we can always add it then, but for now I think it is premature.

C::Item: Ord,
```

This can be seen as a better design as it avoids the question about which data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps if Rust has a built in operator for function composition and made point-free notation easy, but that is not the case, so let's keep the RFC as is.


fn is_sorted_by<F>(mut self, compare: F) -> bool
where
F: FnMut(&Self::Item, &Self::Item) -> Ordering,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the use case for FnMut here as opposed to Fn?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FnMut is less restrictive than Fn in where-clauses

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I am aware. But just because you can do something doesn't mean you ought to do something. So I'm wondering if there are any good reasons why you should be able to pass in a stateful closure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, okay. But this would apply to almost all libstd functions that take closures, then. I don't think any method uses an Fn bound when FnMut suffices. This also applies to all sort methods we already have, so consistency is another argument for FnMut.

Copy link
Member Author

@LukasKalbertodt LukasKalbertodt Feb 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's exactly the reason I chose FnMut: it's everywhere already. In particular: [T]::sort_by_key.

Do you think I should mention this in the RFC? Maybe that would be a good idea to prevent future readers from asking the same question about FnMut...

Oh, and just for protocol: I wondered about the usage of FnMuts, too. Exactly for the "should be stateless" reason you, @Centril, mentioned.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I knew about the consistency reason beforehand, but I think that we should decide on this with a rationale more than "this is what we've done in other places", unless there's some other place where rationale has been given(?).

Copy link

@ExpHP ExpHP Feb 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Centril why should a function deliberately choose a more restrictive Fn than it needs, unless it wants to open itself to implementation changes that may require those restrictions?

The rules for picking Fn traits are simple:

  • If you only need to call it once, ask for FnOnce.
  • If you call it multiple times but do not require reentrancy, ask for FnMut.
  • If you need to call it concurrently, ask for Fn.

...and I rather doubt the stdlib needs to keep itself open to the possibility of changing this function to run in multiple threads.

By requiring Fn you make it difficult for the user to do things like caching and memoization, forcing people to use RefCell and thereby deferring compile-time borrow checks to runtime. There is no upside.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ExpHP great arguments, so FnMut it is.

> ```rust
> fn is_sorted(self) -> bool
> where
> Self::Item: Ord,
Copy link

@ExpHP ExpHP Feb 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why Ord instead of PartialOrd? Testing if a <= b <= c <= ... <= z is a perfectly well-defined operation on a partially-ordered set, and to give floating point numbers the shaft yet again even for something as innocent as this just seems spiteful.

(note: I'm anthropomorphizing the std lib here; not calling you spiteful)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, sorry for not properly thinking about that beforehand. So the comparator for the _by versions may also return an Option<Ordering>, right? And how do we want to handle a None then? I might be too sleepy think... but it's not immediately obvious whether [1.0, NaN, 2.0] should be considered ordered or not, right?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. This is a predicament.

If this existed in a vacuum, then I would say the comparator should produce bool, because an <= operation really is most natural definition for a partial ordering. Option<Ordering> seems to me to be a sort of an allowance to make deriving Ord from PartialOrd possible.

But obviously, this does not exist in a vacuum, and so a bool comparitor could create unnecessary confusion for users. So yes, I suppose we should try Option<Ordering>.


Re: [1.0, NaN, 2.0], I am about to post a much larger comment on this, but in summary, I am of the stance that there is only one sensible implementation of is_sorted on a partially ordered set, and this implementation returns false for a sequence like [1.0, NaN, 2.0].

@ExpHP
Copy link

ExpHP commented Feb 26, 2018

In response to my request for PartialOrd support, @LukasKalbertodt voiced a concern about what it should return for sequences like [2.0, NaN, 1.0]. I want to respond to this in the main comment area where GitHub won't squash it.

Needless to say, there is more than one way that somebody could define is_sorted for T: PartialOrd, and these might produce different results for a sequence like [2.0, NaN, 1.0]. However, I will also argue that there is only one right choice when one considers the wide variety of types that may impl PartialOrd.

The possible implementations of is_sorted

These are the only two reasonable definitions I can think of for is_sorted that take constant memory and linear time. Suppose we have a helper function called Iterator::pairwise that takes an iterator and produces overlapping pairs. That is, an iterator producing a, b, c, ... is adapted to produce (a, b), (b, c), (c, d), .... Then here are the possible definitions of Iterator::is_sorted:

  1. self.pairwise().all(|a, b| a <= b)
  2. self.pairwise().all(|a, b| !(a > b))

Notice that definition 1 implies definition 2 (by antisymmetry), which is to say that definition 1 uses a stricter definition of the term "sorted" than definition 2.

Example impls of PartialOrd

We want to test these definitions against a variety of different possible implementations of PartialOrd which obey the laws of antisymmetry and transitivity. Some interesting examples include:

  • Float comparison, of course, with the pure horror that is NaN.
  • The suffix comparator for strings: a <= b if and only if b.ends_with(a).
    This is interesting because there are pairs of incomparable elements which are otherwise perfectly well-behaved. E.g. "llo" < "hello", while the strings "hello" and "world" cannot be compared.
  • The subset comparator for sets: a <= b if and only if a.is_subset(b)
    This is interesting because there are diamonds in the partial ordering DAG. That is to say, {} < {2} < {2 3}, and {} < {3} < {2 3}, but {2} and `{3} are incomparable.

Test cases

Here is a comparison of the implementations on a variety of test cases:

The definitions agree on the easy cases (i.e. where all elements are pairwise comparable), so I'm going to focus on the trickier cases in this post. The following are all test cases where definition (1) returns false (because, as stated, that's the only time the two definitions may disagree), and it shows the output of definition (2). I hope you will agree the results produced by definition (2) are worthless.

// NOTE:
// - These give the results of 'is_sorted_by_key' for definition 2
//   (the "not greater" definition).

// Test cases for floating point comparison
assert_eq!(by_key_ng(|x| x, vec![nan, nan, nan]), true);
assert_eq!(by_key_ng(|x| x, vec![1.0, nan, 2.0]), true);
assert_eq!(by_key_ng(|x| x, vec![2.0, nan, 1.0]), true);
assert_eq!(by_key_ng(|x| x, vec![2.0, nan, 1.0, 7.0]), true); // [1]
assert_eq!(by_key_ng(|x| x, vec![2.0, nan, 1.0, 0.0]), false);

// Test cases for the subset comparator for sets
assert_eq!(by_key_ng(Subset, vec![set![3], set![2]]), true);
assert_eq!(by_key_ng(Subset, vec![set![2], set![3]]), true);
assert_eq!(by_key_ng(Subset, vec![set![2], set![3], set![2, 3]]), true);
assert_eq!(by_key_ng(Subset, vec![set![2], set![2, 3], set![3]]), false);
assert_eq!(by_key_ng(Subset, vec![set![2], set![2, 3], set![5]]), true);
assert_eq!(by_key_ng(Subset, vec![set![2, 3], set![5], set![2]]), true); // [2]

// Test cases for the suffix comparator for strings
assert_eq!(by_key_ng(Suffix::from, vec!["a", "aa", "aaa", "b", "bb", "bbb"]), true);
assert_eq!(by_key_ng(Suffix::from, vec![ "",  "a",  "aa",  "",  "b",  "bb"]), false);

// [1]: There's an alternate definition of `is_sorted_ng` that could return
//      false for `[2.0, nan, 1.0, 7.0]`, but it has quadratic complexity.
// [2]: This counterexample was added when it almost appeared that `is_sorted_ng`
//      might be useful for a dependency solver like `cargo` to verify that a
//      sequence of packages is topologically ordered. (Hint: It isn't)

Complete code here: https://gist.github.com/ExpHP/c23b51f0a9f5f94f2375c93137299604

@LukasKalbertodt
Copy link
Member Author

@ExpHP So to summarize, you propose to use the stricter version self.pairwise().all(|a, b| a <= b), which in particular means, that is_sorted returns false if any two consecutive elements are not comparable. Right? If so, I agree with you on that.

I'm not quite sure about the "what should the closure return?" issue. I'd love to read more comments on these two questions.

I'll change the RFC later :)

@scottmcm
Copy link
Member

scottmcm commented Feb 26, 2018

Great comment, @ExpHP!

I'd add that I'd expect the following, which also suggests definition 1:

      is_sorted(a) → ∀i,j (i ≤ j → ai ≤ aj ∧ ai < aj → i < j)

(You do obliquely mention this, but I think it's more interesting as a property-but-not-implementation, since transitivity means a linear implementation provides a quadratic guarantee.)

@ExpHP
Copy link

ExpHP commented Feb 26, 2018

So to summarize, you propose to use the stricter version self.pairwise().all(|a, b| a <= b), which in particular means, that is_sorted returns false if any two consecutive elements are not comparable. Right?

Yes, this is my recommendation.

@clarfonthey
Copy link
Contributor

@ExpHP that pairwise function is essentially itertools' coalesce, which IMO is useful enough to add into the standard library.

@ExpHP
Copy link

ExpHP commented Mar 1, 2018

@clarcharr Maybe not coalesce, but rather tuple_windows::<(_, _)>

@eddyb
Copy link
Member

eddyb commented Mar 2, 2018

@ExpHP What about array_windows<[_; 2]>? With const generics you could make 2 itself the parameter, although I'm not sure that will arrive soon enough :(.

@ExpHP
Copy link

ExpHP commented Mar 2, 2018

@eddyb I'm not sure, what makes array_windows better? Just the ability you mentioned to be more general with const generics?

Comparing the two, I feel tuple_windows pulls its weight far better, if only because array_windows is crippled by the many ergonomical issues of arrays. The ability to pattern match against tuples means that things like the following Just Work, even without type annotations:

fn is_sorted<I>(iter: I) -> bool
where I: IntoIterator, I::Item: PartialOrd + Clone,
{
    use itertools::Itertools;
    
    iter.into_iter().tuple_windows().all(|(a, b)| a < b)
}

N.B. my original intent was not to propose pairwise as an alternative (it was just to simplify the definitions in my post). But now that I think about it, maybe it (or tuple_windows or similar) should be considered!

@LukasKalbertodt
Copy link
Member Author

@ExpHP Do you think tuple_windows (or similar) should be considered as an alternative to is_sorted or just as a cool iterator feature, independent of is_sorted?

@ExpHP
Copy link

ExpHP commented Mar 2, 2018

I am currently leaning towards considering them as an alternative. (i.e. for the Alternatives section). These are my thoughts:

  • .tuple_windows()/pairwise() alone solves the hardest part of the problem for the user, by eliminating the need for a for loop. I feel that being able to write pairwise().all(|(a, b)| a < b) is succinct enough that there would be no pressing need for an Iterator::is_sorted.
  • It allows the stdlib to punt on issues like PartialOrd and comparator return types. All use-cases are trivially supported by Iterator::all.

It is perhaps less discoverable than is_sorted, but at the same time, it is much more powerful.

@leonardo-m
Copy link

is_sorted() is sufficiently common and sufficiently clear to deserve a function. If a programmer writes ".all(|(a, b)| a < b)" instead of ".all(|(a, b)| a <= b)" this could lead to troubles.

@LukasKalbertodt
Copy link
Member Author

I agree with @leonardo-m: writing is_sorted yourself is already quite simple. Making it even simpler to implement doesn't quite conflict with the motivation laid out in this RFC. The code snippet .is_sorted() is still easier to read and understand than .pairwise().all(|a, b| a <= b). And as @leonardo-m said, with pairwise() it's still somewhat easy to make a small mistake -- which, interestingly, happened in @ExpHP's last comment (Sorry for bringing this up, I don't want to denounce you or anything °_°).

Apart from that, I proposed to add is_sorted() to slices too. Therefore I think we should discuss pairwise and is_sorted independently from one another.

@eddyb
Copy link
Member

eddyb commented Mar 6, 2018

Fully generic tuple_windows requires variadic generics and doesn't play well itself with generics, while array_windows could use const generics. Today both of them require harcoded impls.
What's really frustrating is that array pattern-matching hasn't been stabilized yet.
Slice pattern-matching has issues, but array pattern-matching should've been stabilized long ago.

@clarfonthey
Copy link
Contributor

I don't think that Iterator should have any by_key functions considering how it already has map.

@ExpHP
Copy link

ExpHP commented Mar 11, 2018

I don't think that Iterator should have any by_key functions considering how it already has map.

Here are all of the "by key" methods in Iterator and Itertools: (i.e. any method that takes a FnMut(Self::Item) -> V where V satisfies a comparison trait)

  • Iterator::min_by_key
  • Iterator::max_by_key
  • Itertools::sorted_by_key (not the same as this RFC; it produces Vec<Self::Item>)
  • Itertools::group_by
  • Itertools::unique_by
  • Itertools::minmax_by_key

None of them could be accomplished with a map followed by a method that places the trait bound on Self::Item.

@scottmcm
Copy link
Member

is_sorted certainly seems a reasonable thing to have, though I think it would be a shame if whatever we did for it didn't make is_strictly_increasing or is_descending easy too.

Hmm, I suppose something like .all_tuples(PartialOrd::le) could work:

    fn all_tuples<F>(&mut self, mut f: F) -> bool
        where Self: Sized, F: FnMut<Self::Item, Output=bool>
    {
        self.all(|x| f.call_mut(x))
    }

@ExpHP
Copy link

ExpHP commented Mar 11, 2018

all_tuples is strictly less powerful than tuple_windows (and IMO not as cleanly orthogonal of a concept), and is subject to the same criticisms. As I (accidentally) demonstrated above, you can easily implement is_strictly_sorted_by by accident when you meant to implement is_sorted_by.

(and while I will admit that I have needed is_strictly_sorted for some things, notably sparse vectors, I believe it is needed much less frequently than is_sorted)

With the specific way you have written it, the function takes multiple args instead of a tuple, which is neat, but also relies on unstable details about the Fn traits (to my understanding, the parenthetical syntax for Fn isn't just sugar; it's the only part that is stable).

@LukasKalbertodt
Copy link
Member Author

LukasKalbertodt commented Mar 11, 2018

@scottmcm Shouldn't it be possible to get the behaviors you mentioned by using is_sorted_by?

  • is_strictly_increasing: is_sorted_by(|a, b| a.cmp(b).then(Ordering::Greater))
  • is_descending: is_sorted_by(|a, b| a.cmp(b).reverse())

@LukasKalbertodt
Copy link
Member Author

LukasKalbertodt commented Apr 12, 2018

I just read an article about implementing is_sorted using SIMD instructions: this is another great argument why we would want this function in the standard library. The standard library could provide several SIMD implementation to speed up the function for common types and users could just benefit from the speed boost. Hardly anyone would write an SIMD implementation themselves, I'd say.

I added a paragraph about this in the RFC.


And since the discussion died down in the last month, a quick summary (although there aren't too many comments):

  • @ExpHP suggested to relax the bound from T: Ord to T: PartialOrd at several places. This was discussed and we concluded that this is sound, thus the RFC was changed and now uses PartialOrd.
  • There is a discussion about whether Iterator::is_sorted_by_key should be added or not (since one can easily use map(...).is_sorted() instead. I don't have a strong preference here, but most people would like to not add this method. Thus I edited the RFC to not add Iterator::is_sorted_by_key.
  • There has been some discussion about additional iterator methods like pairwise() or tuple_windows(), but I think those are not an alternative to is_sorted but rather a pretty orthogonal discussion. Others and I still think is_sorted and friends are useful.
  • While my top comment received two 👎, there doesn't seem to be a comment which doubts the usefulness of the methods in general. If you think those methods shouldn't be added at all, please tell us your reasons :)

So to me it seems like all issues (that were mentioned) were addressed and there are not really major concerns left. So... maybe FCP? ^_^

@gnzlbg
Copy link
Contributor

gnzlbg commented Apr 17, 2018

So I just read the RFC. I wrote a crate that implements Iterator::{is_sorted, is_sorted_by_key, is_sorted_by} (using std::arch) here: https://github.com/gnzlbg/is_sorted

Iterator::{is_sorted, is_sorted_by} require Ord in my implementation, and I agree with the concerns above that how non Ord types should be explicit, in particular for consistency with sort, but haven't looked deeply into whether I agree with PartialOrd as a better default.

Should Iterator::is_sorted_by_key be added as well?

For consistency with sort I'd say yes.

The only other comment I have is that the motivation of this RFC is pretty weak:

In quite a few situations, one needs to check whether a sequence of elements is sorted. The most important use cases are probably unit tests and pre-/post-condition checks.

The most important case is probably implementing a sorting algorithm, since the first thing you want to do there is check whether the elements are sorted, and if the answer is yes, you are done. The check is O(N), so it doesn't change the complexity of the sorting operation. Basically, every user that uses sorting algorithms would directly benefit from having vectorized implementations in the standard library that others can also use in their algorithms.

The vectorized implementations in my library require a long list of unstable features: fn_traits+unboxed_closures+specialization+stdsimd+cfg_target_feature+target_feature, so it is unlikely that users will be able to benefit from these in the near future unless we put them in std.

@LukasKalbertodt
Copy link
Member Author

It might not say so explicitly, but I think the following is taken to be implied:

  • Na: (the "notational" axiom): (a < b) if and only if (b > a)

Seems a bit important to me to be left implicit 😕

Well if we assume that the lt and gt methods have to agree with the partial_cmp method, then my example fails to satisfy this axiom. (By the way: in the case that all methods have to agree, the only reason to implement lt and friends manually is speed. Curiously, the docs seem to imply that lt and friends need to be manually implemented to get the behavior of floats. Seems like a bug in docs.)

This notational axiom also implies irreflexitivity for < and >.

So maybe it is more fitting to think about PartialOrd like this:

PartialOrd defines three disjunct binary relations: <, == and >, meaning that at most one of a < b, a == b and a > b is true. The < and > relations are strict partial orders; == is a partial equivalence relation. And since the == relation comes from PartialEq, it's probably fair to say that PartialOrd represents strictly partially ordered sets.

Additionally, two relations are derived: <= as union(<, ==) and >= as union(>, ==). Interestingly, these two relations aren't necessarily transitive.

No, it fails 1a. (antisymmetry)

Why? X < X is true and X > X is false. Thus we have:

  • a < b implies !(a > b): true implies !false
  • a > b implies !(a < b): false implies !true
    both of which are true. But this isn't important anymore since my example doesn't satisfy (Na).

I still think it should be fairly obvious what is_sorted does given that it's clearly stated in the docs. On the other hand, the bound can be relaxed from Ord to PartialOrd later, so maybe it really shouldn't block this RFC. On the other other hand again, maybe later no one will invest the time to think about this issue and we'll never relax the bound if we don't do it now.

@gnzlbg
Copy link
Contributor

gnzlbg commented Apr 25, 2018

@LukasKalbertodt

I've filled rust-lang/rust#50230 to ask for at least clarification. I agree with you that, at least the way things are currently documented, the requirements (and guarantees) of PartialOrd are unclear.

Please fill free to chime in there with other things that might need to be clarified, like the notational axiom: (a < b) if and only if (b > a) or the fact that le and ge for PartialOrd are default implemented as a < b || a == b.

@LukasKalbertodt
Copy link
Member Author

I'd propose to not let the trait bound decision block merging this RFC.

  • Features proposed in RFCs don't need to be completely figured out before merging the RFC. In fact, often, smaller technical details are far easier to decide on in the time between implementation and stabilization.
  • Pinpointing the exact semantics of PartialOrd in the issue @gnzlbg created might take a while. Discussing this question without a solid foundation is not the most useful thing to do, I think. (Implying that stabilization of is_sorted should be blocked by Document "latent requirements" of PartialOrd  rust#50230)
  • I think the quesiton "do we want the function is_sorted?" does not depend on whether it requires T: Ord or T: PartialOrd.

I also think that we should initially implement T: PartialOrd (as the RFC is currently written). That way we can more easily figure out if this relaxed trait bound can result in strange behavior.

@gnzlbg
Copy link
Contributor

gnzlbg commented May 18, 2018

@LukasKalbertodt I think we can leave that as an unresolved question to resolve before stabilization, but whether it requires Ord or PartialOrd is a pretty important API consideration.

The most conservative thing to do is to require Ord, and those that want to use PartialOrd can just use is_sorted_by(|a,b| a.partial_cmp(b).unwrap()) like it is now the case for sort/sort_by.

The ideal thing would be if we could just require PartialOrd here, but whether that is sound pretty much depends on what do we exactly define PartialOrd to mean.

The lack of an `is_sorted()` function in Rust's standard library has led to
[countless programmers implementing their own](https://github.com/search?l=Rust&q=%22fn+is_sorted%22&type=Code&utf8=%E2%9C%93).
While it is possible to write a one-liner using iterators (e.g.
`(0..arr.len() - 1).all(|i| arr[i] < arr[i + 1])`), it is still unnecessary
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a <= (thanks @orlp)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my secret plan: hide a subtle bug in the "motivation" section that no one notices for months. Now that it has been discovered, it's the perfect motivation that is_sorted is certainly needed! Muhaha!

In all seriousness, it happens all the time.

Will fix this bug in the RFC later. It's blocked right now anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping, given FCP :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, I'd almost love to see a section in the motivation showcasing just how easy it is to make this mistake!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ExpHP Great idea, done!

@SimonSapin
Copy link
Contributor

The libs team discussed this today and we’d like to accept the RFC as-is: three methods for each of slices and iterators, with PartialOrd, and .pairwise().all(|(a, b)| a <= b) semantics. (With the understanding that these choices can still be revised before stabilization.)

@rfcbot fcp merge

This algorithm is just tricky enough to implement that having it in the standard library seems helpful enough to be worth the added API surface.

Some comments have said that adding pairwise / tuple_windows / array_windows would also be useful, but that should be proposed separately.

@rfcbot
Copy link
Collaborator

rfcbot commented Aug 8, 2018

Team member @SimonSapin has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. labels Aug 8, 2018
@gnzlbg
Copy link
Contributor

gnzlbg commented Aug 8, 2018

@LukasKalbertodt If we merge LukasKalbertodt/partial-cmp-docs#1 with the proposed tweaks pretty much as is, the blocker here is resolved anyways.

So with those semantics merged, what would be the appropriate bound for is_sorted and friends?

IIRC:

  • PartialOrd requires that < is a strict partial ordering relation
  • PartialOrd + Eq requires that <= is a non-strict partial ordering relation

EDIT: Floats < is a strict partial ordering relation, but floats <= is not a non-strict partial ordering relation (because floats do not implement Eq).

Under a strict partial ordering relation, all of the following would be sorted (that is, is_sorted returns true for all of them):

  • {-NaN,-1.0,+0.0,+1.0,+NaN}
  • {+NaN,-1.0,+0.0,+1.0,-NaN}
  • {-1.0,-NaN,+0.0,+NaN,+1.0}
  • {-1.0,+0.0,+1.0,-NaN,+NaN}
  • {-1.0,-NaN,+1.0,+NaN,+0.0}
  • {+NaN,-NaN,-1.0,+0.0,+1.0}

Even in the absence of NaN, all following sequences would also be sorted:

  • {-1.0,-0.0,+0.0,-0.0,+1.0}
  • {-1.0,-0.0,+0.0,-0.0,+1.0}
  • {-1.0,-0.0,-0.0,+0.0,+1.0}
  • {-1.0,+0.0,-0.0,-0.0,+1.0}

because -0.0 == +0.0 so when we use <=, the == is only a weak/partial equality relation (if x == y does not imply f(x) == f(y), e.g., with f(x) = 1 / x we get that -0.0 == +0.0, but f(-0.0) = -Inf != f(+0.0) = +Inf).

I don't know, my intuition tells me that if we use <= as the ordering, then it should at least be a non-strict partial order. Also, if I have a sequence, and is_sorted returns false, then I should be able to sort it, but we would need T: Ord for that anyways.

In packed_simd, the problem of sorting floats is currently addressed in a completely different way because floats have two different ordering relations that you might want to use for sorting or comparisons:

  • a non-strict partial order with <=
  • a total order provided in the IEEE spec

Since we can't implement both ordering using PartialOrd anyways (there is no way to implement PartialOrd twice for a type, and then somehow select one of both implementations), what I did there is have a PartiallyOrder<T> and TotallyOrdered<T> wrappers (like Wrapping<T>), that implement the different kinds of orderings. You can use PartiallyOrdered<T> for a < b comparisons, but you cannot use it to sort. For that, you have to use TotallyOrdered<T>.


Another example that's similar to floats is a CaseInsensitveString.

It cannot implement Eq because a == b does not imply f(a) == f(b). Example: "FOO" == "foo" but given f(x) = x[0] == 'o' then f("FOO") = false but f("foo") == true (the strings are case insensitive, but the bytes in the strings are still distinct).

That is, for all permutations of ["FOO", "fOo", "foo"], ["foo", "fOo", "FOO"], ... is_sorted returns true.

@glaebhoerl
Copy link
Contributor

It cannot implement Eq because a == b does not imply f(a) == f(b).

I don't believe that Eq has this (very strong) condition! The docs at least don't mention it. Only reflexivity, symmetry, and transitivity. Among probably others, it would fail for &T (due to as *const T, then comparing for address-equality), and (due to similar reasons) for Cell (which as-implemented compares the current contents of the Cell, while an f could distinguish whether they are the same Cell).

@gnzlbg
Copy link
Contributor

gnzlbg commented Aug 8, 2018

I don't believe that Eq has this (very strong) condition!

I mixed Eq with std::strong_equality. Eq doesn't require that and the new docs don't change that. The f(a) == f(b) requirement is not a requirement of an equivalence relation, and is not necessary for a total order. std::strong_equality requires it because it tells you whether two equivalent values can be distinguished or not and implies substitutability, but it does not require it for every f either. The wording is a bit... fuzzy but basically f(x) can only observe the "public" state of x, and the address of x is not part of x's state, so this is not required to hold for f(x) = &x. EDIT: It would still break Cell though. No it would not. The only state that f can access per the C++ spec is the "comparison-salient state". Each type specifies what this state is, so Cell could just specify that its "comparison-salient state" is that of the wrapped T and call it a day.

@rfcbot
Copy link
Collaborator

rfcbot commented Aug 9, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. and removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. labels Aug 9, 2018
@LukasKalbertodt
Copy link
Member Author

@gnzlbg Unfortunately, I currently don't have the time to work on the PartialOrd docs. But it would be great if you could take LukasKalbertodt/partial-cmp-docs#1 , add all the tweaks and create a PR on rust-lang/rust!

I think then it would be the best to reboot the Ord vs PartialOrd discussion in the tracking issue for this feature (with a good summary comment describing the problem and the discussion so far).

@DDOtten
Copy link

DDOtten commented Aug 18, 2018

I would be very surprised when is_sorted_by_key would not work on an iterator when is_sorted and is_sorted_by do work. We have sort, sort_by, and sort_by_key so I would expect the same variants of is_sorted especially when they do work on slices and are simple to implement. I feel like it is not a question about how easy it is to do it yourself using map but more about completeness of the api.

@rfcbot
Copy link
Collaborator

rfcbot commented Aug 19, 2018

The final comment period, with a disposition to merge, as per the review above, is now complete.

@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this RFC. and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. labels Aug 19, 2018
@Centril Centril merged commit d424128 into rust-lang:master Aug 19, 2018
@Centril
Copy link
Contributor

Centril commented Aug 19, 2018

Huzzah! This RFC has been merged!

Tracking issue: rust-lang/rust#53485

@Centril Centril added A-traits-libstd Standard library trait related proposals & ideas A-iterators Iterator related proposals & ideas labels Nov 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-iterators Iterator related proposals & ideas A-traits-libstd Standard library trait related proposals & ideas disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this RFC. T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.