Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Settle execution order uncertainty for += #28160

Closed
nikomatsakis opened this issue Sep 1, 2015 · 45 comments
Closed

Settle execution order uncertainty for += #28160

nikomatsakis opened this issue Sep 1, 2015 · 45 comments
Labels
A-mir Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. P-medium Medium priority T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@nikomatsakis
Copy link
Contributor

When translating something like a += b to MIR, an uncertainty arose about what the semantics of

a = 0;
a += { a = 22; 2 }

ought to be. Should resulting value of a be 24 or 2, basically? The current MIR yields 24, on the basis that this is more-or-less what will happen in the case of overloading as well (presuming non-lexical lifetimes, since I think otherwise you get a borrowck error).

@nikomatsakis nikomatsakis added I-nominated I-needs-decision Issue: In need of a decision. T-lang Relevant to the language team, which will review and decide on the PR/issue. labels Sep 1, 2015
@eefriedman
Copy link
Contributor

See also #27868.

@llogiq
Copy link
Contributor

llogiq commented Sep 2, 2015

I think this is a degenerate case, and should be at least linted against.

@petrochenkov
Copy link
Contributor

I'm surprised this is allowed by the borrow checker (regardless of non-lexical lifetimes).
In my intuition a is clearly firstly mutably borrowed here (and released when the whole A += B is evaluated)

a += { a = 22; 2 }
^

, but then modified here

a += { a = 22; 2 }
         ^

despite being borrowed.

@llogiq
Copy link
Contributor

llogiq commented Sep 2, 2015

@petrochenkov a could be Copy.

@petrochenkov
Copy link
Contributor

@llogiq
And? You still can't modify borrowed Copy data.

@llogiq
Copy link
Contributor

llogiq commented Sep 2, 2015

There is no borrow. a is taken by value (and copied, because it is Copy). Note that in @nikomatsakis' example the values are integers, so they are Copy by definition.

@petrochenkov
Copy link
Contributor

@llogiq
+= is interested in the location of a, not only in its value, it has to borrow it.

I mean, the effective signature of += is fn(&mut Lhs, Rhs); (the same as in rust-lang/rfcs#953) and this snippet doesn't pass borrow checker after being rewritten with a function.
http://is.gd/itvea6
IMO, built-in += should behave in the same prohibitive way as its overloaded equivalent with regards to borrow checker instead of allowing snippets with weird semantics, like the original example.

@llogiq
Copy link
Contributor

llogiq commented Sep 2, 2015

Ah, I see. You're right, a must be borrowed, because else the mutation couldn't have been observed in the containing code. Btw. with a = a + { a = 22; 2 };, it currently works: http://is.gd/rvPQAm though it has a probably surprising warning that is inconsistent with the result.

@tomaka
Copy link
Contributor

tomaka commented Sep 2, 2015

The following code prints 488447264 on the playpen, which is bad: (http://is.gd/k4cPWw)

fn main() {
    let mut a;
    a += { a = 2; 3 };
    println!("{:?}", a);
}

@llogiq
Copy link
Contributor

llogiq commented Sep 2, 2015

Does it add the address of &a to the value?

@petrochenkov
Copy link
Contributor

Edit: erroneous example removed.

@Stebalien
Copy link
Contributor

@llogiq no. 3 is getting added to uninitialized memory. It's a repeating byte (0x1d) followed by a 0x1d + 3 byte. http://is.gd/Q0Hhr3

@nagisa
Copy link
Member

nagisa commented Sep 2, 2015

IMO += should be simply a sugar for

AddAssign::add_assign(&mut self, rhs);

where AddAssign is some trait added in the future to overload +=.

a = 0;
AddAssign::add_assign(&mut a, &{ a = 22; 2 });

@eddyb
Copy link
Member

eddyb commented Sep 2, 2015

@nagisa Yes, that could be a valid desugaring, but no, the borrow doesn't have to happen before rhs is evaluated (disallowing @tomaka's example).
In fact, it's a current limitation that borrows from autorefs are evaluated like that (see the common issue of vec.push(vec.len()) being denied by borrowck despite being safe).

There can be no memory unsafety here, even if borrowck allows a to be read or written in rhs:

  • as long as a is not moved, it will have the same address
  • a is accessed through the mutable borrow only by the called function, which is after all the other arguments have been evaluated

However, there's an evaluation bug in trans here (looking at @tomaka's example): the LHS must not be read before evaluating the RHS, but the current implementation disobeys that rule and can cause UB AFAICT.

EDIT: clarified some bits here and there.

@arielb1
Copy link
Contributor

arielb1 commented Sep 3, 2015

Currently borrowck evaluates += right-to-left while trans evaluates it left-to-right.

fn main() {
    let a = Box::new(2);
    let mut b = 2;
    *{ *a; &mut b } //~ ERROR use of moved value
        +=
    { drop(a); 1 };
}

=-assignment is both evaluated and borrow-checked right-to-left (trans erroneously tries to elide the copy of the RHS, but that is a different issue).

Unfortunately, evaluating = left-to-right would break code like

    let x: &mut [u8] = calculate();
    x[i] = x[j];

As it would translate into

    %temp : &mut [u8] = call calculate, onerr exit
    x = %temp
    %boundck_1 : bool = i >= x.len
    bt %boundck_1, panic_boundck_1
    %lhs : &mut u8 = &mut x[i]
    %boundck_2: bool = j >= x.len
    bt %boundck_2, panic_boundck_2
    %rhs : u8 = x[j]
    *%lhs = %rhs
    ret
exit:
    unwind

Here the borrow of x[i] conflicts with the read of x[j]. Note that we need some kind of borrow to at least prevent x's length from changing after the bound-check (it does not need to be mutable, but there are more complicated examples where the RHS can change x[j]).

One way to fix this (suggested by @eddyb) is to move the lvalue evaluation (including the bounds check) to after the RHS computation, which relies on lvalue evaluations not being able to invalidate borrows.

We also thought of potentially packing coercions into the lvalue evaluation. This could also make code like v.push(v.len()) work. Unfortunately, it doesn't work well with UFCS. We may want to add an autoref expression for that and/or move addr-ofs too.

@petrochenkov
Copy link
Contributor

Note that we need some kind of borrow to at least prevent x's length from changing after the bound-check

Heh.

fn report(a: &Vec<i32>) {
    println!("{:?} {:?} {:?} {:?}", a.len(), a.capacity(), a[0], &a[0] as *const _);
}

fn main() -> () {
    let mut a = vec![0];
    report(&a);
    let xxx = &a[0] as *const _;
    println!("{:?} {:?}", xxx, unsafe { *xxx });
    a[0] += {
        report(&a);
        a.push(1);
        report(&a);
        a.push(2);
        report(&a);
        3
    };
    report(&a);
    println!("{:?} {:?}", xxx, unsafe { *xxx });
}

1 1 0 0x7f4386024008
0x7f4386024008 0
1 1 0 0x7f4386024008
2 2 0 0x7f4386024008
3 4 0 0x7f4386023000
3 4 0 0x7f4386023000
0x7f4386024008 3

@eddyb
Copy link
Member

eddyb commented Sep 3, 2015

Since several people voiced their disagreement on the need for supporting assignment in the RHS, as a counterexample I have this piece of code written this week (coincidentally) which was miscompiled (|= and any other OP= operators have the same issue as +=).

The relevant bit:

            dirty |= match event {
                E::KeyboardInput(ElementState::Pressed, _, Some(key)) => {
                    dirty |= root.dispatch(&ui::event::KeyDown(key));
                    for e in key_tracker.down(key) {
                        dirty |= root.dispatch(&e);
                    }
                    false
                }
                ...
            };

And the workaround looks like this:

            dirty |= match event {
                E::KeyboardInput(ElementState::Pressed, _, Some(key)) => {
                    let mut dirty = root.dispatch(&ui::event::KeyDown(key));
                    for e in key_tracker.down(key) {
                        dirty |= root.dispatch(&e);
                    }
                    dirty
                }
                ...
            };

Is this an usecase we want to support or not?

If there is an easy way to trigger borrowck errors on such usecases (or warnings, but that seems less likely), can we get a crater run to estimate the prevalence of LHS reads/writes in the RHS of LHS OP= RHS expression, in the wild?

@eddyb
Copy link
Member

eddyb commented Sep 3, 2015

@arielb1 You could argue that bounds-checked indexing is not "pure" and thus cannot avoid being borrowed, in some way, unlike other lvalues.

@eddyb
Copy link
Member

eddyb commented Sep 3, 2015

On IRC I came up with this example which seems to make the smarter schemes fall apart:

let mut boxed = Box::new(vec![...]);
(*boxed).push({
    mem::replace(&mut boxed, Box::new(vec![])).len()
})

Do we need to differentiate between GEP-style lvalue computations (lvalue -> lvalue with only offsets) and lvalue computations that actually load data from memory?
The result of the former (such as struct/tuple field accesses) should be the same before and after any possible expression that may mutate the lvalues in question.

@arielb1
Copy link
Contributor

arielb1 commented Sep 3, 2015

@eddyb

This is annoying because overloaded deref can be both.

@petrochenkov

We know about the bugs with += (the lvalue borrow and the order-of-evaluation mismatch).

@arielb1
Copy link
Contributor

arielb1 commented Sep 3, 2015

One proposal that mostly maintains LTR and handles x[i] = x[j]

Definitions

An lvalue expression is basically the current rustc lvalue expression.

Unresolved Question: do we include overloaded index/deref in index/deref lvalues? this makes more code compile but could be confusing in some cases?

LVALUE_EXPR = 
       MUTABILITY RVALUE_EXPR           - temporary
       | LVALUE_EXPR . FIELD            - field access
       | LVALUE_EXPR [ RVALUE_EXPR ]    - indexing (including overloaded?)
       | *MUTABILITY LVALUE_EXPR        - deref
       | LOCAL                          - local variable

Evaluating an expression "to an lvalue" is just evaluating all the rvalue-expressions in it. Note that after an expression is evaluated to an lvalue, finalizing its evaluation can still read memory and possibly run user code (if we allow overloaded derefs/indexing).

Evaluating an expression "to an rvalue" is just standard evaluation.

To evaluate an expression with a receiver, that's it "LHS = RHS", "LHS OP= RHS", "LHS.foo(ARG1, ARG2)", first the pre-final-autoref receiver is evaluated to an lvalue, then the other operands are evaluated to rvalues, then the receiver evaluation is finalized and autoref-ed if needed.

Unresolved question: should this happen with by-value-self taking methods too? Provide an example for and against.

Examples

Simple assignment

x[I] = x[J];
// equiv
let i = I;
let rhs = x[J];
x[I] = rhs;

Simple function call

a.b.f(a.b.g(), a.b.h())
// equiv
let arg0 = a.b.g();
let arg1 = a.b.h();
a.b.f(arg0, arg1); // potentially with overloaded autoderef

Changing receiver

(*boxed).push({
    mem::replace(&mut boxed, Box::new(vec![])).len()
})
// equiv
let arg0 = mem::replace(&mut boxed, Box::new(vec![])).len();
Vec::push(&mut *boxed, arg0); // this can be surprising, I guess.

Changing receiver, deep

let y = Vec::new();
(***boxed).get({boxed=&&&y; 42})
// equiv
let y = Vec::new();
let ix = {boxed=&&&y; 42};
<[u8]>::get(&<Vec<u8> as Deref>::deref(&***boxed), ix)

Cutting your own receiver

let boxed: Box<&[u8]> = get();
boxed.get({drop(boxed); 4})
// equiv
let boxed: Box<&[u8]> = get();
let ix = { drop(boxed); 4 };
<[u8]>::get(&**boxed) //~ ERROR use of moved value

Cutting your own receiver, by-value

let a: &mut [u32] = &mut [1,2];
let b: &mut [u32] = &mut [3,4];
let c;
a.get_mut({a=b; c=a; 1})
// equiv
let ix = {a=b; c=a; 1}
<&mut [u32]>::get_mut(a, ix) //~ ERROR use of moved value

Chained function call

a.b().c().d()
// equiv
let t0 = a.b(); // this is an rvalue
let t1 = t0.c();
let t2 = t1.d();

Pushing length

x[ix()].push(x[0].len());
// equiv
let index = ix();
let arg0 = x[0].len();
x[index].push(arg0);

Pushing length, via function

x.get_mut().push(x[0].len());
// equiv
let t0 = x.get_mut();
let len = x[0].len(); //~ ERROR cannot borrow
t0.push(len);

Simple assign-op

dirty |= SOMETHING_MODIFYING_DIRTY;
// equiv
let rhs = SOMETHING_MODIFYING_DIRTY;
dirty = dirty | rhs;

Advantages

Most code should compile and do something sane.

Disadvantages

Indexing/deref is handled somewhat differently from normal methods. If we don't allow overloaded deref/indexing, they can behave differently from primitive deref/indexing. If we do, their order-of-evaluation can be somewhat confusing.

The handling of coercions needs to be thought about - if they can occur in the middle of an lvalue, they can break it. It would be nice if someone who knows them could provide an example.

This also changes order-of-evaluation, which could subtly break existing code.

Coercions

Could someone help me there (@eddyb, you know these best)

Lvalues and deref

With the borrow checker, overloaded deref as well as deref of &mut references is very similar to an operation on lvalues, turning an 'a Ptr<T> (or 'a &'b mut T) into an 'a T.

This is actually not totally precise: references can be "leaked". A by-value &'a mut T can be transformed to an 'a mut T. This is similar to a turning a by-value Box<T> into an 'static mut T (the latter transformation is not automatic, as it has the unfortunate side-effect effect of leaking memory), but is a good model in many cases (this "leak" functionality is somewhat confusing).

Dereference of &T always uses the "leak" functionality, but that is less confusing as &T is Copy.

@arielb1
Copy link
Contributor

arielb1 commented Sep 5, 2015

Application

When should the "evaluate lvalue's rvalue components first, then lvalue" rule be applied? Only to method calls and assignment ops? Everywhere an lvalue is required? Does this include the pseudo-lvalue of *?

@llogiq
Copy link
Contributor

llogiq commented Sep 7, 2015

@arielb1 I think the "Simple assignment" should read

x[I] = x[J];
// equiv
let i = I;
let rhs = x[J];
x[i] = rhs;

@arielb1
Copy link
Contributor

arielb1 commented Sep 11, 2015

@nikomatsakis (replying here because discourse refuses to work)

My feeling is that we should keep the evaluation order LTR where possible, but overall it should be > relatively easy to predict. Thus complex orderings that "mostly" preserve LTR seems (to me) to be > overall less good than just saying "assignments are RTL".

I don't really have a problem with a "typically LTR, receiver and assignment RHSLHS last" ordering, but @eddyb is rather surprised by it. However, it may just be that my rules just push the surprisingness into subtler cases.

The interaction with overloaded operators is also an important question. It is not clear to me that self.balance -= x and self.balance = self.balance - x should always be equivalent. They are certainly distinct with overloaded operators, since one desugars into an &mut self operator (or at least that is the proposed desugaring). I think we should strive that the builtin operations can be modeled using overloaded versions, which would imply that self.balance -= foo() will "read" the current value of self.balance only after foo() is evaluated (unlike self.balance = self.balance - foo()). Subtle, but they you go. (I imagine that all languages which desugar -= specially can encounter a similar situation.)

You can't have the trivial desugaring anyway - self.balance = self.balance - x calls both Deref and DerefMut on Self. With my rules it (self.balance -= x) should behave the same as self.balance.sub_by(x), except the adjustment rules are different (and aren't they different anyway?). C# does this naïvely IIRC.

@nikomatsakis
Copy link
Contributor Author

@arielb1 I don't think changing the order of assignment for receiver is a
good idea. Is your goal here to overcome the "nested method call" borrowck
errors?

On Fri, Sep 11, 2015 at 4:48 PM, arielb1 notifications@github.com wrote:

@nikomatsakis https://github.com/nikomatsakis (replying here because
discourse refuses to work)

My feeling is that we should keep the evaluation order LTR where possible,
but overall it should be > relatively easy to predict. Thus complex
orderings that "mostly" preserve LTR seems (to me) to be > overall less
good than just saying "assignments are RTL".

I don't really have a problem with a "typically LTR, receiver and
assignment RHS last" ordering, but @eddyb https://github.com/eddyb is
rather surprised by it. However, it may just be that my rules just push the
surprisingness into subtler cases.

The interaction with overloaded operators is also an important question.
It is not clear to me that self.balance -= x and self.balance =
self.balance - x should always be equivalent. They are certainly distinct
with overloaded operators, since one desugars into an &mut self operator
(or at least that is the proposed desugaring). I think we should strive
that the builtin operations can be modeled using overloaded versions, which
would imply that self.balance -= foo() will "read" the current value of
self.balance only after foo() is evaluated (unlike self.balance =
self.balance - foo()). Subtle, but they you go. (I imagine that all
languages which desugar -= specially can encounter a similar situation.)

You can't have the trivial desugaring anyway - self.balance =
self.balance - x calls both Deref and DerefMut on Self. With my rules it (self.balance
-= x) should behave the same as self.balance.sub_by(x), except the
adjustment rules are different (and aren't they different anyway?). C# does
this naïvely IIRC.


Reply to this email directly or view it on GitHub
#28160 (comment).

@rust-highfive rust-highfive added P-medium Medium priority and removed I-nominated I-needs-decision Issue: In need of a decision. labels Oct 15, 2015
@bstrie bstrie added the I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness label Oct 24, 2015
@bstrie
Copy link
Contributor

bstrie commented Oct 24, 2015

Given that commenters in here have demonstrated undefined behavior in here, I'm tagging this with I-soundness.

Personally I'd like to keep the rules as simple as possible even if it means a trivial amount of breakage (which is permissible from fixing soundness flaws). If the operators can be desugared to method calls as you'd expect everywhere else, that would be ideal.

@arielb1
Copy link
Contributor

arielb1 commented Oct 24, 2015

@bstrie

This will not be unsound with the MIR, just cause more compile-time errors.

@bstrie
Copy link
Contributor

bstrie commented Oct 24, 2015

@arielb1 , good to hear, but I'd prefer not to remove the label until we start trans-ing from MIR, which won't be for a while yet. :)

@nrc nrc added the A-mir Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html label Nov 19, 2015
@shepmaster
Copy link
Member

Now that MIR presumably has had to deal with this, what was the decision?

@nikomatsakis
Copy link
Contributor Author

Actually this has not been formally decided. (I mean, we've always had the impl doing something in any case, and it still does -- but is it the right thing?) I'm pretty dubious about changing the order of execution just to allow for fewer errors though -- I think we should stick to left-to-right whenever possible (a = b seems to be a prominent exception, since there we evaluate b first and then a). I've been meaning for some time to try and write-up some thoughts on this matter and in particular on how we can extend borrowck to avoid some of the errors that we currently see from nested method calls and the like.

@eddyb
Copy link
Member

eddyb commented Jul 1, 2016

@nikomatsakis Huh, I'm very surprised *a() = b() is evaluated RTL in MIR, although I've lost track of the conversations on the matter.
We had a discussion on IRC, related to @rjw245's draft report where they found that *mutex.lock() += expensive_computation(); would keep the mutex locked for the whole duration of the RHS call - apparently it seems with MIR that is not the case?

Only overloaded augmented assignment seems to have LTR ordering in MIR.

@eddyb
Copy link
Member

eddyb commented Jul 1, 2016

@nikomatsakis If I had to give one reason for evaluating assignments LTR it'd be that it enables RVO.
Otherwise there's a necessary copy if the functions can't be inlined or have any side-effects at all.

@strega-nil
Copy link
Contributor

strega-nil commented Jul 1, 2016

@nikomatsakis I agree with @eddyb: LTR is better. It's how the rest of the language works, so it's less confusing.

Also, I'd like it if a += b was equivalent to AddAssign::add_assign(&mut a, b). This means, probably, that a += [expr], where [expr] includes a, would not be allowed. This is obviously not ideal, but I can't think of a much better way...

@nikomatsakis
Copy link
Contributor Author

@cristicbz
Copy link
Contributor

Putting all the examples into the playground now suggests soundness issues have been resolved. Maybe remove the label, since now it's just a matter of figuring out what the right execution order is?

https://is.gd/2xjOGo (prints 5 rather than the uninitialised memory it did previously)
https://is.gd/02GpcD (prints 3 4 3, so I think it's fixed)
https://is.gd/JLN7M6 (doesn't compile now)

@nikomatsakis nikomatsakis removed the I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness label Mar 2, 2017
@nikomatsakis
Copy link
Contributor Author

I'm not sure that we have much leeway to change this at this point anyhow. Maybe a little. =)

@nikomatsakis
Copy link
Contributor Author

It could use someone to kind of draw together all the considerations and make a summary of current status, at minimum.

@Mark-Simulacrum Mark-Simulacrum added the C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. label Jul 22, 2017
@Centril
Copy link
Contributor

Centril commented Nov 10, 2018

Triage, @nikomatsakis any updates?

@RalfJung
Copy link
Member

@nikomatsakis @eddyb are there any updates on this?

@eddyb
Copy link
Member

eddyb commented Apr 14, 2020

I thought this has been resolved forever ago, am surprised this is still open.

@nikomatsakis
Copy link
Contributor Author

I'm going to just close the issue. We're not going to change the MIR we generate now in any major way. If in the process of looking at some of the issues around defining the semantics of MIR we make tweaks here, we can address those individually.

@RalfJung
Copy link
Member

If in the process of looking at some of the issues around defining the semantics of MIR we make tweaks here, we can address those individually.

Note that this here is not a question of MIR semantics but of surface Rust semantics -- so MIR semantics discussion seems entirely orthogonal to me.

@nikomatsakis
Copy link
Contributor Author

nikomatsakis commented Apr 14, 2020

Yes, I know. What I meant was: if we uncover some kind of problem with the current lowering in the course of that work, we might consider a specific proposal, but it seems obvious that we don't want to be altering the observed semantics of running code at this juncture absent a very concrete proposal and strong motivation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-mir Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. P-medium Medium priority T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests