Unsafe comparison traits (PartialEq, Eq, PartialOrd, Ord) #956

Closed
wants to merge 14 commits

Conversation

theemathas

rendered

See previous discussion at #926

- `a.le(b) <=> a.lt(b) || a.eq(b)`
- `a.ge(b) <=> a.gt(b) || a.eq(b)`
- `a.lt(b) <=> b.gt(a)`
- `a.lt(b) && b.lt(c) => a.lt(c)`
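The four laws above can be checked by brute force for a concrete type. The sketch below (the checker function and the test range are my own illustration, not part of the RFC) verifies them for i32 over a small range:

```rust
// Brute-force check of the four PartialOrd laws listed above,
// here over a small slice of i32 values (names are illustrative).
fn check_laws(xs: &[i32]) -> bool {
    for &a in xs {
        for &b in xs {
            // a.le(b) <=> a.lt(b) || a.eq(b)
            if a.le(&b) != (a.lt(&b) || a.eq(&b)) { return false; }
            // a.ge(b) <=> a.gt(b) || a.eq(b)
            if a.ge(&b) != (a.gt(&b) || a.eq(&b)) { return false; }
            // a.lt(b) <=> b.gt(a)
            if a.lt(&b) != b.gt(&a) { return false; }
            for &c in xs {
                // a.lt(b) && b.lt(c) => a.lt(c)
                if a.lt(&b) && b.lt(&c) && !a.lt(&c) { return false; }
            }
        }
    }
    true
}

fn main() {
    let xs: Vec<i32> = (-3..=3).collect();
    println!("laws hold: {}", check_laws(&xs));
}
```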
Contributor
You should probably add this form of antisymmetry:
a.lt(b) => !(b.lt(a))
This is in the library documentation already. Of course, the implication must be one-way, because for IEEE NaNs a.lt(b) and b.lt(a) will both be false.
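The NaN caveat can be seen directly with the standard library alone:

```rust
// For an IEEE NaN, both a.lt(b) and b.lt(a) are false, so antisymmetry
// can only be a one-way implication, never an equivalence.
fn main() {
    let a = f64::NAN;
    let b = 1.0_f64;
    assert!(!(a < b) && !(b < a)); // both directions are false
    assert!(a != a);               // NaN is not even equal to itself
    println!("NaN rules out a two-way reading of antisymmetry");
}
```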

Author

@quantheory It is implied by the other rules.

Proof:
Suppose that a.lt(b) and b.lt(a)
From Rule 6 of PartialOrd (with a and b swapped) and b.lt(a), we get a.gt(b)
From Rule 1 of PartialOrd and a.lt(b), we get a.partial_cmp(b) == Some(Less)
From Rule 2 of PartialOrd and a.gt(b), we get a.partial_cmp(b) == Some(Greater)
Therefore, a.partial_cmp(b) == Some(Less) and a.partial_cmp(b) == Some(Greater), which is absurd.
Thus, the assumption is false, and at least one of a.lt(b) and b.lt(a) is false.
Therefore, a.lt(b) => !(b.lt(a))

Correct me if this proof is wrong.

Contributor

You're right, I misread the partial_cmp bits as =>. (Well, there is a tiny problem with your proof, which is that b.lt(a) doesn't have to return at all, but that's not what I meant and not relevant here, probably.)

On a different note, I'm pretty sure that the transitive property is no good here (and as written in the current docs). The reason is that you can define two types in different crates that can compare to the same type in some third crate, but not to each other. Then if you use all three together there's suddenly a problem. It's a modularity hazard.

I think that transitivity should only be required if all of the comparisons are defined.

Contributor

a.lt(b) && b.lt(c) => a.lt(c)

It's impossible to guarantee this unless adding an implementation of PartialOrd is a breaking change.

Consider two crates A and B. B defines a type Y. A defines type X and implements the <= operator between X and Y. Now B defines a new type Z and implements the <= operator for Y and Z. Then X <= Y and Y <= Z are defined but X <= Z is not defined.
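The scenario can be sketched in a single file (crate boundaries collapsed into comments; the type names X, Y, Z follow the comment above):

```rust
use std::cmp::Ordering;

// X <= Y and Y <= Z are implemented, but X <= Z is not, so the
// transitivity law cannot even be stated across all three pairs.
struct X(i32); // imagine: defined in crate A
struct Y(i32); // imagine: defined in crate B
struct Z(i32); // imagine: added to crate B later

impl PartialEq<Y> for X {
    fn eq(&self, other: &Y) -> bool { self.0 == other.0 }
}
impl PartialOrd<Y> for X {
    fn partial_cmp(&self, other: &Y) -> Option<Ordering> {
        self.0.partial_cmp(&other.0)
    }
}
impl PartialEq<Z> for Y {
    fn eq(&self, other: &Z) -> bool { self.0 == other.0 }
}
impl PartialOrd<Z> for Y {
    fn partial_cmp(&self, other: &Z) -> Option<Ordering> {
        self.0.partial_cmp(&other.0)
    }
}

fn main() {
    let (x, y, z) = (X(1), Y(2), Z(3));
    assert!(x <= y && y <= z);
    // `x <= z` does not compile: there is no `PartialOrd<Z> for X`.
    println!("X <= Y and Y <= Z are defined, X <= Z is not");
}
```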

Contributor

Oh, I see that @quantheory already mentioned this.

@quantheory
Contributor

I don't think I could get behind this without a more clear motivating example at a minimum, since:

  1. A lot of sorting algorithms can be implemented efficiently with just a single operation (e.g. <). In those cases, the only relevant properties of < are that it is consistent with itself, antisymmetric, and transitive, and none of the relationships between different operators are relevant.

  2. skiplist is not a really compelling example. There are a lot of unsafe blocks, a few of which seem to be big enough to cover a lot of safe operations. I can't pinpoint any particular situation where there is definitely going to be a memory safety error that would be too expensive to avoid. (To the contrary, there are bounds-checked vector operations all over the place.)

    skiplist also seems to require the invariants of Ord, but uses PartialOrd instead because it wants to handle floats, since we don't have refinement types that exclude NaN. That confuses things further. It's not immediately clear to me what specific memory safety issue for that crate requires this RFC.

  3. Items with interior mutability might not compare the same twice in a row, e.g. because they wrap counters that can be shared with other threads. This RFC implicitly forbids any implementation of PartialOrd that depends on, say, a value in a Mutex.

  4. The traits involved here don't really have anything to do with memory, and there's no obvious reason why comparisons that aren't "well-behaved" would cause memory issues in most code. It feels like this is undermining the purpose of unsafe, which is to call out the operations that most directly threaten memory and type safety. In that case, the unsafe should be on functions or traits related to whatever interfaces to a container can go wrong, not on comparison operations or other common, inherently safe code.
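Point 3 can be made concrete with a small sketch (the Counter type here is my own illustration, not from the RFC): a PartialOrd impl that reads a Cell gives different answers for the same pair as the cell changes.

```rust
use std::cell::Cell;
use std::cmp::Ordering;

// A type whose ordering reads a Cell, so two identical
// comparisons may disagree over time.
struct Counter(Cell<i32>);

impl PartialEq for Counter {
    fn eq(&self, other: &Counter) -> bool { self.0.get() == other.0.get() }
}
impl PartialOrd for Counter {
    fn partial_cmp(&self, other: &Counter) -> Option<Ordering> {
        self.0.get().partial_cmp(&other.0.get())
    }
}

fn main() {
    let a = Counter(Cell::new(1));
    let b = Counter(Cell::new(2));
    assert!(a < b);
    a.0.set(3);     // mutate through the shared Cell
    assert!(a > b); // the same pair now compares the other way
    println!("the ordering changed between calls");
}
```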

@theemathas
Author

@quantheory

  1. In practice, people are not going to implement all of the other operators, so making extra guarantees should not cause extra friction for implementing these traits. On the other hand, the users of these traits can get extra guarantees and convenience.
  2. I admit that skiplist is not a good example, which is the reason I only mentioned it once, and not in this pull request. It was just the first example I could find, and I just used it without much thought.
  3. That is an interesting observation, but that is pretty much the point of making these traits unsafe. I cannot think of any use case that requires comparison to depend on Mutex or Cell. Race conditions in comparisons are definitely not what you would want (What would happen if I put such values into, say, a binary search tree?). Comparison of structs that just contain them are sometimes OK, though.
  4. Your approach has some problems. In some cases, the unsafe must be put either on the constructor (e.g. .new()), which is weird, or on all other methods, which is extremely unergonomic. Additionally, it forbids implementing some traits, such as FromIterator, which is required to use .collect().
    There are precedents for putting unsafe on traits that do not directly violate memory safety: Send and Sync. They do absolutely nothing on their own, and yet are unsafe traits.

@Diggsey
Contributor

Diggsey commented Mar 9, 2015

Under "drawbacks" you're missing this case:

  • Might want to implement the traits such that an accidental mistake in the implementation will not cause memory unsafety.

What about adding an additional trait which is safe:

trait SafeCmp {
    type Image;
    fn map(&self) -> <Self as SafeCmp>::Image;
}

Implementing SafeCmp for a type T is equivalent to implementing whichever of PartialEq, Eq, PartialOrd, Ord are implemented by <T as SafeCmp>::Image.

This allows a good amount of flexibility for safe code to implement these traits by simply composing more primitive, unsafe implementations. By returning a tuple, for example, safe code can order by several fields. In addition, the standard library can provide some useful building blocks, such as a Reverse type which is a newtype of T and just inverts the ordering operations. In combination with tuples this allows the full set of orderings across a set of fields.
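A single-file sketch of that composition idea (SafeCmp and Reverse as written here are illustrative stand-ins, not std items; std's own `std::cmp::Reverse` did not exist at the time):

```rust
use std::cmp::Ordering;

// Safe code only supplies a `map` into an already-trusted ordered
// image type; `Reverse` and tuples then compose orderings.
trait SafeCmp {
    type Image: Ord;
    fn map(&self) -> Self::Image;
}

// A newtype that inverts the ordering of its contents.
struct Reverse<T: Ord>(T);

impl<T: Ord> PartialEq for Reverse<T> {
    fn eq(&self, other: &Self) -> bool { self.0 == other.0 }
}
impl<T: Ord> Eq for Reverse<T> {}
impl<T: Ord> PartialOrd for Reverse<T> {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> { Some(self.cmp(other)) }
}
impl<T: Ord> Ord for Reverse<T> {
    fn cmp(&self, other: &Self) -> Ordering { other.0.cmp(&self.0) } // inverted
}

struct Employee { name: &'static str, age: u32 }

impl SafeCmp for Employee {
    // order by name ascending, then by age *descending*
    type Image = (&'static str, Reverse<u32>);
    fn map(&self) -> Self::Image { (self.name, Reverse(self.age)) }
}

fn main() {
    let a = Employee { name: "ada", age: 30 };
    let b = Employee { name: "ada", age: 40 };
    // comparing through the image, as the blanket rule in the comment would
    assert!(a.map() > b.map()); // age 30 compares greater under Reverse
    println!("composed ordering works");
}
```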

@quantheory
Contributor

@theemathas

Race conditions in comparisons are definitely not what you would want (What would happen if I put such values into, say, a binary search tree?).

I don't know what would happen, but I'm asserting that it shouldn't cause a memory safety issue (at worst a panic, unless an unsafe interface is used).

Anyway, there are reasons to compare things that have nothing to do with sorted containers, so defining a comparison on internally mutable types is not such a ridiculous thing to do, even if they wouldn't play nice with some structures.

In some cases, the unsafe must be put either on the constructor (e.g. .new()), which is weird, or on all other methods, which is extremely unergonomic.

I don't see how this follows. Lots of containers provide both safe functions that perform lots of checks, and much faster but unsafe ones for niche cases where the extra speed trumps ergonomics. I'm not sure what cases cause this approach to fail. This is why I think a good example would help.

There are precedents for putting unsafe on traits that do not directly violate memory safety: Send and Sync. They do absolutely nothing on their own, and yet are unsafe traits.

Send and Sync have no purpose other than to represent information about how memory can be accessed, so unsafe traits were invented especially with them in mind, IIRC, and they are kind of a no-brainer.

The comparison operators are different in that they have no inherent semantic relationship to representations in memory, and they have many uses outside of sorted structures. I think that it's much less intuitive, and less likely to aid debugging, to mark them as unsafe.

@theemathas
Author

@Diggsey Your SafeCmp trait is not so safe, since the map function might not return the same value every time it is called (it might do I/O).

@quantheory My point is that most types will simply #[derive] whatever traits they want; I believe most will just do #[derive(PartialEq, Eq)]. In that case, you get both speed and ergonomics. Do you have a lot of code that implements these manually?

@mahkoh
Contributor

mahkoh commented Mar 9, 2015

👍 in general but if we're doing a breaking change anyway then we should fix the current operator system. I've written down why the current system is fundamentally broken and how I'd like to fix it:


Why the current implementation is flawed

Rust has <, >, <=, >=, ==, and != comparison operators that users
are allowed to implement for their types. This is accomplished through four
traits:

  • PartialEq,
  • Eq,
  • PartialOrd, and
  • Ord.

The last two are designed to correspond to the mathematical structures called
partial order and total order. Let's quickly recall the definitions of those
structures:

An operator <= defines a partial order if for all a, b, and c the
following holds:

  • a <= a,
  • a <= b and b <= a implies a = b, and
  • a <= b and b <= c implies a <= c.

An operator <= defines a total order on a set X if it defines a partial
order and for all a and b in X

  • a <= b or b <= a.

Intuitively, total orders look like lines and partial orders look like
graphs without cycles.

Unfortunately these concepts cannot be translated to a programming language
such as Rust as we will now show.

The equality operator

In mathematics the = operator is always defined. For all a and b the
equation a = b is always defined and either true or false. This means that
things like 1 = "hello world" are valid and false.

In programming languages, however, there are other concerns that prevent us from
following suit: Compile time type safety prevents us from writing
1 == "hello world". Compile time guarantees such as this are much more
important than modeling our comparison operator after the comparison operator
used in mathematics.

We can see that the definition of partial order uses the mathematical comparison
operator and does not require that a, b, and c have the same "type". As
we've seen above, this requirement is not necessary. Since we've not modeled our
comparison operator after the mathematical comparison operator, it's
questionable whether we should model <= after "partial order".

Rust works around this problem by introducing the PartialEq trait with the
following definition:

  • if a == b then b == a, and
  • if a == b and b == c then a == c.

Before you are allowed to implement PartialOrd, you first have to implement
PartialEq. This trait is called "partial" because it does not behave like the
comparison operator in mathematics. One fundamental property is that a = a is
always true. But in programming languages this is not always true. For example:

let a = std::f64::NAN;
assert!(a != a);

Since we want to implement <= for f64 we have to implement PartialEq for
f64 and therefore PartialEq cannot require a == a.

The one partial order and coherence rules

Above we wrote the following definition:

An operator <= defines a partial order

In mathematics you can have many different partial orders, each denoted by its
own symbol. However, in Rust, there is only one partial order called <= and all
types that implement PartialOrd are part of it. What does it mean to add one
of your types to the partial order? It means that you are responsible for
ensuring that all of the rules above still hold after you've added your
type.

This is impossible

It's impossible because there is no after. I can implement a type X in my
crate and X <= u32 and you can implement a type Y in your crate and
u32 <= Y and none of us even know that the other crate exists and that we have
to implement X <= Y to satisfy the rules of partial orders.

Consider the following dependency graph:

        std
       /   \
      /     \
my crate   your crate
      \     /
       \   /
      consumer

I cannot implement X <= Y in my crate because I don't depend on your crate.
You cannot implement Y <= X in your crate because you don't depend on my
crate. The consumer cannot implement it because of the coherence rules for trait
implementations.

To fix this we would have to either change the graph to this:

        std
         |
      my crate
         |
     your crate
         |
      consumer

or the other way around. Obviously this is impossible.

Conclusion

I believe we have demonstrated that trying to implement the fundamental
operators <= and == via mathematical structures is not possible in Rust.

How to improve the situation

What we are trying to accomplish:

  • The system should not try to (incorrectly) imitate mathematical notions
  • The system should be as easy to use as the current system
  • Optionally the system should be more powerful than the current one.

Unfortunately the rust trait system is not really powerful enough to express the
symmetry required by operators such as ==. Therefore we will rely on some
compiler magic for built-in traits.

The <= operator

/// A trait that encodes the `<=` operator.
/// 
/// This trait is unsafe because you must follow the following rule:
/// 
/// - Transitivity: ... only necessary if (A, B), (B, C), and (A, C) implement
///   WeakOrd.
unsafe trait WeakOrd<T=Self> {
    /// Returns true if `self <= other`.
    fn le(&self, other: &T) -> bool;
    /// Returns true if `self >= other`.
    fn ge(&self, other: &T) -> bool;
}

When the user implements this trait for the pair (A, B), then the compiler
will automatically implement it for (B, A). The user is not allowed to
implement both.

WeakOrd<Self> can be derived.

Example: We have an enum RustOperator that contains all Rust operators (+,
-, *, ...). We can access their operator precedence via the
operator_precedence method, and want a <= b to be true if the precedence of
a is not higher than the precedence of b.

impl WeakOrd for RustOperator {
    fn le(&self, other: &RustOperator) -> bool {
        self.operator_precedence() <= other.operator_precedence()
    }
    fn ge(&self, other: &RustOperator) -> bool {
        self.operator_precedence() >= other.operator_precedence()
    }
}

The == operator

/// A trait that encodes the `==` operator.
///
/// This trait is unsafe because you have to follow the following rules
///
/// - transitivity: ... but only if all operators are implemented
/// - Exactly one of the following properties holds:
///     * `a == b`
///     * `a != b`
unsafe trait Eq<T=Self> {
    /// Returns true if `self == other`.
    fn eq(&self, other: &T) -> bool;

    /// Returns true if `self != other`.
    fn ne(&self, other: &T) -> bool;
}

When the user implements this trait for the pair (A, B), then the compiler
will automatically implement it for (B, A). The user is not allowed to
implement both.

This trait does not require a == a because such a definition would be
circular.

The < operator

/// A trait that encodes the `<` operator.
///
/// This trait is unsafe because you have to follow the following rules
///
/// - transitivity: ... but only if all operators are implemented
/// - At most one of the following properties holds:
///     * `a < b`
///     * `a > b`
///     * `a == b`
/// - `a <= b` if and only if `a < b` or `a == b`
unsafe trait StrongOrd<T=Self>: Eq<T> {
    fn cmp(&self, other: &T) -> Option<Ordering>;

    fn lt(&self, other: &T) -> bool { .. }
    fn le(&self, other: &T) -> bool { .. }
    fn gt(&self, other: &T) -> bool { .. }
    fn ge(&self, other: &T) -> bool { .. }
}

When the user implements this trait for the pair (A, B), then the compiler
will automatically implement it for (B, A). The user is not allowed to
implement both.

StrongOrd<Self> can be derived.

The user is not allowed to implement both StrongOrd and WeakOrd. If
StrongOrd is implemented, then the compiler automatically implements WeakOrd
to enable the <= operator.

We add another trait to assert the existence of a total order:

unsafe trait Ord: StrongOrd<Self> { }

@theemathas
Author

@mahkoh

I am not sure what you mean. However, I think I can define lt(), gt(), eq(), and ne() in terms of le() and ge():

  • a.lt(b) == a.le(b) && !(a.ge(b))
  • a.gt(b) == a.ge(b) && !(a.le(b))
  • a.eq(b) == a.le(b) && a.ge(b)
  • a.ne(b) == !(a.le(b)) || !(a.ge(b))
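These four identities can be checked against the built-in operators on a well-behaved type (the free functions below are my own illustration):

```rust
// The four identities above, written as free functions over any type
// providing `le` and `ge` (here just PartialOrd's methods, for testing).
fn lt<T: PartialOrd>(a: &T, b: &T) -> bool { a.le(b) && !a.ge(b) }
fn gt<T: PartialOrd>(a: &T, b: &T) -> bool { a.ge(b) && !a.le(b) }
fn eq<T: PartialOrd>(a: &T, b: &T) -> bool { a.le(b) && a.ge(b) }
fn ne<T: PartialOrd>(a: &T, b: &T) -> bool { !a.le(b) || !a.ge(b) }

fn main() {
    // agrees with the built-in operators on a well-behaved type
    for a in -2..3 {
        for b in -2..3 {
            assert_eq!(lt(&a, &b), a < b);
            assert_eq!(gt(&a, &b), a > b);
            assert_eq!(eq(&a, &b), a == b);
            assert_eq!(ne(&a, &b), a != b);
        }
    }
    println!("identities hold on i32");
}
```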

@theemathas
Author

After some discussion on IRC, I have written down the condition for a trait to require unsafe due to extra invariants/properties:

unsafe is required if relying on the invariants/properties of the trait can lead to more efficient code.

Are the cmp traits and ExactSizeIterator the only traits that fit this criterion?

Note: ExactSizeIterator is mentioned in the Unresolved questions section

@theemathas
Author

I'm starting to think that the Relaxed alternative sounds good after talking with @reem on IRC.

@nikomatsakis
Contributor

I'm not sure of my current opinion on this. Some scattered thoughts:

  1. I think the RFC would definitely be stronger with more specific examples and perhaps some performance numbers. Another motivation beyond performance might be correctness: I can easily imagine sorting or other algorithms accidentally assuming consistent results from <= or ==. If those algorithms were implemented in unsafe code, it is certainly plausible that segfaults or other badness could result.
  2. This change seems to exacerbate the ergonomic cost when #[derive] is not suitable for your case. Now not only must you write the annoying impl by hand, but you must make unsafe assertions, trivial though they may be. It'd be nice if that wasn't the case.
  3. In general I'm wary of making unsafe too common. This seems to lie right on the line -- there is a motivation, but Eq and PartialOrd etc are very common.
  4. I think that @quantheory's point about Mutex and Cell is very good. There are (I think) potentially good reasons to implement Ord or Eq on such types, despite the risk that things could go haywire if the contents of the Cell or Mutex change while the value is stored in a data structure (basically, you're taking it into your own hands not to make such changes while the value is stored). This seems ok so long as the result is "the key is not found" rather than "segfault".
  5. I haven't fully digested @mahkoh's suggestion yet, but I know I am reluctant (to say the least) to add a lot of compiler magic, such as automatic inverse pairs and so on. I'll have to re-read the comment again.

@mahkoh
Contributor

mahkoh commented Mar 10, 2015

@nikomatsakis: The two types of compiler magic mentioned above are

  • If X implements WeakOrd<U>, then the compiler automatically implements WeakOrd<X> for U
  • If X implements StrongOrd<U>, then the compiler automatically implements WeakOrd<U> for X

The second one already seems to be possible without compiler magic:

impl<U, T: StrongOrd<U>> WeakOrd<U> for T {
    fn f(&self, u: &U) {
        <T as StrongOrd<U>>::f(self, u)
    }
}

But making the implementation symmetric does not seem to work:

impl<T, U: WeakOrd<T>> WeakOrd<U> for T {
    fn f(&self, u: &U) {
        <U as WeakOrd<T>>::f(u, self)
    }
}

because it also affects T that already implement WeakOrd<U>. Maybe something like negative impls is needed

impl<T: !WeakOrd<U>, U: WeakOrd<T>> WeakOrd<U> for T {

Then no compiler magic would be required.

@mahkoh
Contributor

mahkoh commented Mar 10, 2015

In any case, the required compiler magic doesn't look like something that cannot (later) be moved into the general language.

@Gankra
Contributor

Gankra commented Mar 11, 2015

As a data point, nothing in the stdlib that I'm aware of would do something unsafe given an awful Ord impl (like one that always returns false). This is in spite of the fact that much of the std code that works with orderings is unsafe.

All ordering-based algorithms/structures I've ever seen have to be prepared for things like "this is the biggest/smallest", which is generally what incoherent Ord impls will slam into.

Of course they will produce incoherent results, like unsorted output or "losing" values for the purpose of lookup, but this will always happen in a safe way. In particular, for a collection, an all-element iterator will yield all inserted values, because iterators are always structural and don't actually invoke comparisons. Bad impls can also cause collections to have degenerately bad perf. Still not a safety issue.

I would be very interested to see some real code that empirically runs faster on correct Ord impls by relying on Ord to be coherent.

@theemathas
Author

@gankro As an example, I think that a fast sorting algorithm implementation could do an out-of-bounds array index given a bad Ord implementation. Dealing with those would likely lead to a slow-down (although by a constant factor).

I don't have the time to empirically test this yet. Anybody want to try?

@Gankra
Contributor

Gankra commented Mar 12, 2015

You'll need to be more concrete than that. QuickSort, MergeSort, InsertionSort, BubbleSort; none of those have this problem. All of them need to handle that an arbitrary element is the biggest/smallest for some subrange, which would prevent any out-of-bounds access.

Non-comparison algorithms like RadixSort rely on something more powerful than Ord (bitwise representations).

As for fancier algorithms like WikiSort and GrailSort, I'll be the first to admit I'm not the most experienced with them. From what I recall of WikiSort, there isn't anywhere you "know" that you'll find an x with x < y or something. The sketchiest thing is maybe a sorting network that it uses as a subroutine. However, it's just based on hard-coded comparisons and swaps. It doesn't search at all.

GrailSort's literature is pretty impenetrable, so I really have no idea. I think it's pretty similar to WikiSort, though.

@Gankra
Contributor

Gankra commented Mar 12, 2015

The only thing I can think of, honestly, is that some other unsafe code relies on a data structure or algorithm not behaving entirely insanely. Like, if it doesn't find the given key in the map, or the element doesn't sort to be the minimum, it does something unsafe.

But that's... super theoretical.

@reem

reem commented Mar 12, 2015

Another major problem here actually seems to be that it's extremely hard to figure out exactly what code is relying on what guarantees, and that code is likely to be extremely unpredictable when those guarantees are broken. In other languages, this is no big deal since the worst you can do is throw an exception, but we have to be careful with this situation, because in Rust you can cause memory unsafety.

What we have to be most careful of is ruining our main guarantee: if you write safe rust code, your code will be memory safe. If we have to amend that to "if you write safe rust code, your code will be memory safe, as long as you don't write any incorrect code" then we've done ourselves a heavy blow.

I think the only way this is really feasible is for us to create a guideline that unsafe code behind a safe interface (an unsafe interface can do anything) must not violate memory safety even if safe code does crazy things.

@theemathas
Author

@gankro I am thinking about a quicksort implementation that goes out of bounds if x<x is true. I will implement a "proof of concept" (or whatever you call it) soon.

@reem I suppose that your argument is in favor of this RFC, right? I think that the simplest way to solve this problem is to rely on code in an unsafe trait not doing crazy things. We already rely on sane Sync and Send implementations.
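A sketch of the kind of scan in question (my own illustration, not an actual std sort): a partition loop that omits the bounds check because it assumes the pivot will stop it. A comparator for which x < x holds drives it past the end; in safe Rust that is a panic rather than memory corruption, while in C++ it would be a real out-of-bounds read.

```rust
// A Hoare-style partition scan that trusts the comparator: it advances
// `i` while cmp says "less than pivot", with no bounds check, assuming
// the pivot itself will stop the scan.
fn scan<T, F: Fn(&T, &T) -> bool>(arr: &[T], pivot: &T, lt: F) -> usize {
    let mut i = 0;
    while lt(&arr[i], pivot) { // no `i < arr.len()` check
        i += 1;
    }
    i
}

fn main() {
    let v = [3, 1, 4, 1, 5];
    // sane comparator: the scan stops at the pivot value
    assert_eq!(scan(&v, &4, |a, b| a < b), 2);
    // broken comparator where x < x is "true": the scan never stops
    let broken = std::panic::catch_unwind(|| scan(&v, &9, |_a, _b| true));
    assert!(broken.is_err()); // index out of bounds -> panic, not UB
    println!("the broken comparator drove the scan past the end");
}
```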

@theemathas
Author

@gankro I think I do not have to write a proof-of-concept any more, since I found this real world example with C++.

Now, you cannot say that it is just theoretical.

@reem

reem commented Mar 13, 2015

@theemathas the difference is that Send and Sync both answer questions directly related to memory safety: "is it memory safe to send this type across thread boundaries" and "is it safe to immutably access from multiple threads concurrently", respectively; the comparison traits do not.

nikomatsakis self-assigned this Mar 19, 2015
@huonw
Member

huonw commented Mar 19, 2015

It occurred to me just now that this could be handled in the same way that we handle "trusted iterator length" for Iterator::size_hint (whatever way that is).

@quantheory
Contributor

Thinking about this, what would be wrong with StrictOrd/StrictEq again? (Or OrdStrict or whatever?) All it has to do is signify that the Ord implementation fulfills a certain contract, so the definitions would be simple:

unsafe trait StrictEq: Eq {}
unsafe trait StrictOrd: StrictEq+Ord {}
// ...
unsafe impl StrictEq for i32 {}
unsafe impl StrictOrd for i32 {}

I guess that this is basically the same thing as the RelaxedOrd suggestion, with the difference being that the latter tries to encourage people to implement the unsafe version where possible by making the name shorter. However it's a bit weird that Ord would not itself be the trait used to implement the comparison operators, so you might get the opposite problem, where functions are written to accept Ord when they really only need RelaxedOrd. I'm not sure where the right balance to strike is. StrictOrd is backwards compatible, which leaves it as the only option if this is delayed long enough.

If someone forgets to implement StrictOrd, but you're sure that they could have, and you really need StrictOrd in some external crate, I think that you can work around it with a wrapper pretty easily:

#[derive(Ord, PartialOrd, Eq, PartialEq)]
struct Wrapper(ExternalType);
unsafe impl StrictEq for Wrapper {}
unsafe impl StrictOrd for Wrapper {}

Of course there's the whole issue with "derive", but I kind of feel like we should fix that anyway. Maybe the simplest thing would be to have a marker that says "derive not only this, but all derivable super-traits, recursively". E.g. to pick an arbitrary syntax (I'm sure that something like this has been proposed before):

// Instead of `#[derive(Ord, PartialOrd, Eq, PartialEq, Copy, Clone, Debug)]`
#[derive(Ord:*, Copy, Clone, Debug)]
struct MyType;
// Instead of `#[derive(StrictOrd, Ord, PartialOrd, StrictEq, Eq, PartialEq, Copy, Clone, Debug)]`
#[derive(StrictOrd:*, Copy, Clone, Debug)]
struct MyOtherType;

@theemathas
Author

@huonw The problem is that I still don't know how we did / are going to solve the iterator problem, "whatever way that is". I also believe that the problem with comparisons is easier to solve than the problem with iterators.

@huonw
Member

huonw commented Mar 20, 2015

@theemathas Could you go into more detail about why you think this problem is easier? AFAIK, the problems are isomorphic: there's a constraint about the behaviour of certain methods that unsafe code wants to rely on. I believe every solution proposed for Iterator::size_hint would work for the comparison traits (at least, have the same up/down-sides as they do for Iterator).

E.g. if unsafe methods were to work as we wish:

trait Eq: PartialEq {
     unsafe fn is_correct() -> bool { false }
}

// in std:
impl Eq for u8 {
     unsafe fn is_correct() -> bool { true }
}

// in an external lib:
impl Eq for Dubious {}

#[derive] would call is_correct on all the subcomponents and && them together.
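A compilable sketch of that derive rule (the trait is renamed CheckedEq here only to avoid clashing with std's Eq; all names are illustrative):

```rust
// A correctness flag as a default trait method, `false` unless an
// implementation explicitly opts in, mirroring the comment above.
trait CheckedEq: PartialEq {
    unsafe fn is_correct() -> bool { false }
}

impl CheckedEq for u8 {
    unsafe fn is_correct() -> bool { true }
}
impl CheckedEq for u32 {
    unsafe fn is_correct() -> bool { true }
}

#[derive(PartialEq)]
struct Pair { a: u8, b: u32 }

// What `#[derive]` would generate: `&&` over the field types.
impl CheckedEq for Pair {
    unsafe fn is_correct() -> bool {
        <u8 as CheckedEq>::is_correct() && <u32 as CheckedEq>::is_correct()
    }
}

fn main() {
    let ok = unsafe { <Pair as CheckedEq>::is_correct() };
    assert!(ok);
    println!("Pair inherits correctness from its fields: {}", ok);
}
```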

@theemathas
Author

@huonw The main differences are that making Iterator an unsafe trait will definitely not work out well, and that most implementations of comparison traits use #[derive].

@glaebhoerl
Contributor

@theemathas

Your argument almost makes sense. The main problem is how the opt-in should be done.

I didn't address this only for the simple reason that I don't have any better ideas than what has already been floated. Basically the question is where to put the unsafe. I think most of the other "solutions" (unsafe subtraits like StrictEq, unsafe constructors for the collection, unsafe methods) seem less bad than making Eq and Ord themselves unsafe. Probably every solution is going to be unsatisfying on some level, but at least they localize the pain to code where the unsafety is actually needed. (I don't think renaming Ord to RelaxedOrd makes much sense, unless we want to add the Relaxed prefix to every trait that has implied laws (which, arguably, should be every trait) for consistency's sake.)

I think I might prefer UnsafeEq and UnsafeOrd as names to StrictEq and StrictOrd. Not to force people to write "unsafe" more times, but only for greater clarity as to their purpose. (Eq and Ord are already supposed to be "strict". We merely don't wish to bet the soundness and security of our program on it when not forced to.)

If we had the Glasgow Haskell-like ability to store evidence of trait bounds in datatypes and retrieve it by pattern matching (i.e. data HasUnsafeOrd a where Proof :: UnsafeOrd a => HasUnsafeOrd a), then we could perhaps do something like this:

struct HasUnsafeOrd<T: UnsafeOrd>;

trait Ord: Eq + PartialOrd {
    /* fn compare, etc. */
    const HasUnsafe: Option<HasUnsafeOrd<Self>> = None;
}

unsafe trait UnsafeOrd: Ord { }

fn fast_unsafe_sort<T: UnsafeOrd>(list: &mut [T]) { ... }

fn not_as_fast_safe_sort<T: Ord>(list: &mut [T]) { ... }

fn sort<T: Ord>(list: &mut [T]) {
    if let Some(proof) = <T as Ord>::HasUnsafe {
        // the missing piece: the typechecker would know that `T: UnsafeOrd` here!
        fast_unsafe_sort(list)
    } else {
        not_as_fast_safe_sort(list)
    }
}

Thinking out of the box, while it's not possible to prove that Eq or Ord (or whichever trait) is implemented correctly from within Rust itself, perhaps we could integrate with an external theorem prover where it is possible (Isabelle, Agda?) somehow, and allow writing impl StrictOrd for MyType without any unsafe (in this case the UnsafeOrd name seems less appropriate - maybe VerifiedOrd after Idris?), provided that an external proof from that other system is (somehow) supplied. (I haven't the faintest idea how this could actually be implemented. For unknown reasons I vaguely feel like pinging @kmcallister... but my subconscious may be getting things mixed up.)

@theemathas
Author

@huonw I retract my claim that the safety of comparisons is easier to deal with than of iterators, due to rust-lang/rust#23452

I now think that comparison traits are harder than the Iterator trait.

@comex

comex commented Mar 22, 2015

For amusement purposes, you might like to read about this CTF challenge, where participants had to actually exploit the GNU libstdc++ std::sort implementation by specifying an unfaithful comparison function (in Lua):

https://kitctf.de/writeups/31c3-ctf/saas/

@huonw
Copy link
Member

huonw commented Mar 22, 2015

@theemathas

due to rust-lang/rust#23452

That seems to offer a fairly reasonable way to tackle the comparison traits too, via something like the sketch in #956 (comment) .

@theemathas
Copy link
Author

@huonw Your alternative is interesting, but should we put it in PartialEq, Eq, PartialOrd, Ord, or all of them?

By the way, it might be better for is_correct to be an associated constant, though associated constants are not slated for 1.0. Could that be added backwards-compatibly later?

@dhardy
Copy link
Contributor

dhardy commented Mar 23, 2015

So we have a demonstrated 10% performance gain when removing the checks. What's the worst case if the comparator is not correct? If it's a security hole or seg-fault, I'm not sure if it's worth it...

That said, this (very roughly, probably wrong property syntax) is what I'd propose as an alternative to unsafe traits:

trait MyTrait {
    /// Implementations must guarantee "is_correct", meaning ...
    #[require_guarantee "is_correct"]
    fn my_prop (&self, other: &Self) -> bool;
}
impl MyTrait for X {
    #[guarantee "ABCDEF" "is_correct"]
    fn my_prop (&self, other: &Self) -> bool { ... }
}

There's an added (unnecessary) feature here: ABCDEF is a compiler-generated checksum over the AST of the function implementation and of every function/constant it uses, up to the crate boundary, so that the compiler can warn whenever the guarantee needs to be re-checked.

@petrochenkov
Copy link
Contributor

I like @huonw's approach with is_correct, although ideally is_correct would be an associated constant rather than a function.
I especially like that is_correct is completely optional and only a matter of optimization. If someone doesn't care about the possible extra optimizations, they can simply omit the override of is_correct, keep the default (i.e. false), and everything will work as expected.
Another nice property is that is_correct can be backward-compatibly added to other traits later if deemed necessary. Even PartialEq could be modified once the implementation of associated constants lands, rather than immediately.

@glaebhoerl
Copy link
Contributor

I would give it a different name though. OPT_IN_TO_UNSAFE_SORT_FUNCTIONS or whatever. From is_correct it's not clear what purpose it serves, and at first blush it seems absurd not to make it true - why would you write an implementation that's not correct?!

@nikomatsakis
Copy link
Contributor

I've been reading and re-reading this comment thread and RFC and I confess to feeling somewhat torn. On the one hand, I think that the current PartialEq/Eq (resp. Ord) split isn't buying us much. We have these extra "totality" guarantees, but we can't really rely on them in implementations, and they mean we can't sort an array of floats (not to mention writing more in #[derive] than one would like, though that seems surmountable). If we tied Eq and Ord to stronger guarantees, that would seem to justify them better as separate traits.

On the other hand, I am uncomfortable with inching the line on where unsafe code is required forward. I've certainly had plenty of crashes in C due to writing poor comparators and giving them to qsort. It's easy to forget that the goal of Rust is to reduce crashes in practice -- not just be able to say "it's your fault, you wrote unsafe". Put another way, unsafe is most effective if it is unusual. (Granted, deriving spares you this pain most of the time, but it's not that uncommon to want to compare just one or two fields from a struct or something like that; same goes for inserting mutable data as keys (that you know you won't mutate).)

I guess what I think I really want is to go the opposite direction from this RFC and simplify the current hierarchy, since the "guarantees" we're trying to enforce aren't real guarantees. Drop the PartialEq/Eq (resp. Ord) distinction and just collapse the two traits into one trait with no particular guarantees. Just implement it for floats and say that using Ord on NaN will yield inconsistent results, so we suggest you don't sort an array with NaN in it, or else you'll get some random gibberish (but not a crash). Since the primary gain seems to be from sort implementations, external crates could certainly provide unsafe sort implementations that (unsafely) rely on Ord, or even unsafe traits.

All that said, what I really don't want to do is derail the 1.0 effort over this question, whichever way it is decided. It feels like with specialization we could probably move to the simple, collapsed hierarchy approach via deprecation after 1.0, which is probably a more practical path. (For that matter, depending on the details, some form of specialization might also allow us to have a sort that takes advantage of unsafe traits when they are available and falls back to something else otherwise.)

@Gankra
Copy link
Contributor

Gankra commented Mar 23, 2015

@nikomatsakis you can get far more troublesome results using NaNs in a sorted map (being unable to find things that were inserted).

@Gankra
Copy link
Contributor

Gankra commented Mar 24, 2015

To be clear, this isn't unsafe; it's just nastier than NaNs merely producing a badly sorted array.

@nikomatsakis
Copy link
Contributor

@gankro oh, I know, things can go totally wrong, same as if you use a hashtable key that changes its hash mid-execution.

@theemathas
Copy link
Author

@nikomatsakis The difference is that a hashtable implementation would probably want to hash the value only once.

@bluss
Copy link
Member

bluss commented Mar 29, 2015

Shouldn't this RFC explicitly state "never panic" as part of the rules for the unsafe comparisons, given the connection to the "finally" issue?

@nikomatsakis
Copy link
Contributor

The discussion on this RFC seems to have stalled without a clear consensus. In any case, now that beta is released, I don't think breaking changes like this are in the cards. It seems likely we can layer these kinds of improvements on later if necessary. To that end, I've opened issue #1051 and I'm going to close this RFC as postponed. Thanks all for the discussion and thoughts, and thanks @theemathas for the well-reasoned RFC.
