Do move forwarding on MIR #32966

pcwalton · 2016-04-14T20:17:28Z

It'd be cool to rewrite chains of the form "a moves to b, b moves to c" to "a moves to c".

eddyb · 2016-04-28T14:48:58Z

What you describe seems to be "destination (back-)propagation" aka NRVO.
I wanted to do this, but sadly, it requires some form of static drop analysis, otherwise:

let x = f();
if g() {
    let y = x;
}

Turns into:

y = f();
if g() {
    drop(y);
} else {
    forget(y);
}

nikomatsakis · 2016-06-01T23:55:35Z

I'm thinking about suggesting that @scottcarr takes this on. We have to be a bit careful because it interacts with the "memory model" discussion, but I imagine we can start by being conservative, and in particular trying to cleanly separate the part that decides whether or not an intervening write/read is a problem.

nikomatsakis · 2016-06-03T17:04:26Z

We discussed this in the @rust-lang/compiler meeting yesterday. There was general consensus that there are in fact 3 related transformations we might perform:

De-aggregation:
- convert from x = Foo { ... } into direct stores into x.a etc
Destination propagation (NRVO):
- when you have temp = <rvalue>; ... (does not access lvalue) ; lvalue = temp;
- convert to lvalue = <rvalue>; ...
Source propagation:
- when you have temp = <operand>; ... (does not access temp or mutate operand) ; use(temp)
  - convert to ...; use(operand)
- when you have temp = <rvalue>; ...; lvalue = temp
  - convert to ...; lvalue = <rvalue>
  - but this is a bit trickier

The end-goal is to improve certain MIR builder patterns that we have to generate initially but which often introduce unnecessary temporaries, as well as for general optimization purposes. For example, one suboptimal pattern is around aggregates, where an expression like x = Foo { a: <expr-a>, ..., z: ...z } would get translated into MIR like:

tmp0 = <expr-a>;
...
tmp25 = <expr-z>;
x = Foo { a: tmp0, ..., z: tmp25 } // "aggregate rvalue" in MIR

assuming that the expressions don't look at x, as is frequently the case, we would like to eventually have:

x.a = <expr-a>;
...
x.z = <expr-z>;

We might get there in one of two ways. The first might be to "de-aggregate" the aggregate expression, resulting in:

tmp0 = ...
tmp25 = ...
x.a = tmp0
...
x.z = tmp25

this would then be combined with destination propagation to yield the desired result.

Another variation might be to have destination propagation understand aggregates, resulting in:

x.a = <expr-a>;
...
x.z = ...z;
x = Foo { a: x.a, ..., z: x.z} // or remove as it is a noop

and then the final assignment can be recognized as a no-op. Personally the first route seems a bit simpler to me. (Note that we still need to generate the aggregates because they are important to borrowck; but after static analysis is done, they are not needed.)

Another example of intermediate operands that are not needed are the arguments to binary expressions or calls. For example a call like whatever(arg0, 0) yields MIR like:

    bb0: {
        var0 = arg0;                     // we don't need this copy
        tmp0 = var0;                     // this one too (but need source propagation)
        return = whatever(tmp0, const 0u32) -> bb1; // scope 3 at <anon>:2:5: 2:19
    }

This can be significantly optimized to whatever(arg0, const 0u32) through a combination of source and destination propagation. However, it's interesting to note that removing var0 (which is not a temporary) results in a user-defined variable going away -- so we might want to keep track of user-defined variables that were optimized away and track where they went, for debuginfo purposes. This mapping presumably becomes part of the MIR (and may be location-dependent).

Finally there are some classic cases from C++ where destination propagation (aka NRVO) can help:

fn foo() -> [u8; 1024] {
    let x = [0; 1024];
    x // will interact w/ debuginfo
}

Here the goal would be to remove the variables and temporaries and just write the array directly into the return pointer.

nikomatsakis · 2016-06-03T17:05:52Z

One complication: in the formulations above, you can see that I made reference to things like "... (does not access lvalue)". We need to decide what kind of code may legally "access" an lvalue, particularly around usafe/unsafe code. Given that we are actively debating this, the best thing is to try and isolate the code that makes these decisions so we can easily keep it up to date.

For now, some simple rules might be to forego optimization in methods containing unsafe code. Some other simple possibilities for predicates:

in unsafe fns or fns with unsafe blocks, do not optimize (simple)
raw pointers to locals cannot exist without borrows, don't act if there are any borrows (simple)
simple points-to analysis may eventually be handy

Note that in safe code the borrow checker rules can help us as well.

eddyb · 2016-06-03T18:10:26Z

My suggestion around quantifying lvalue accesses is that we do not do anything special about unsafe code, but instead rely on the property that a local cannot be accessed indirectly without borrowing it.
In other words, don't try to assume anything about a local once it, or any field path in it (including array indices, excluding deref projections) has been borrowed.

Most cases with moves that I can think of don't involve borrows, and the ones that do have to do with unsafe APIs such as ptr::{read,write}.
Speaking of which, an interesting transformation is this (b = a.take() after inlining):

tmp0 = &mut var0
tmp1 = None
tmp2 = &mut tmp1
tmp3 = tmp0 as *const T
tmp4 = ptr::read(tmp3)
tmp5 = tmp2 as *const T
tmp6 = tmp0 as *mut T
ptr::copy(tmp5, tmp6)
tmp7 = tmp2 as *mut T
ptr::write(tmp7, tmp4)
var1 = tmp1

After expanding intrinsics:

tmp0 = &mut var0
tmp1 = None
tmp2 = &mut tmp1
tmp3 = tmp0 as *const T
tmp4 = *tmp3
tmp5 = tmp2 as *const T
tmp6 = tmp0 as *mut T
*tmp6 = *tmp5
tmp7 = tmp2 as *mut T
*tmp7 = tmp4
var1 = tmp1

With a bit of reaching definition analysis we can make those accesses direct, assuming we're not violating any MIR rules (the LLVM IR would be identical anyway, so this couldn't cause UB without another MIR pass that assumes otherwise):

tmp0 = &mut var0
tmp1 = None
tmp2 = &mut tmp1
tmp3 = tmp0 as *const T
tmp4 = var0
tmp5 = tmp2 as *const T
tmp6 = tmp0 as *mut T
var0 = tmp1
tmp7 = tmp2 as *mut T
tmp1 = tmp4
var1 = tmp1

Now here's the cool part: we might not be able to do much becase of the borrows, but we can eliminate dead rvalues with no side-effects, first the casts:

tmp0 = &mut var0
tmp1 = None
tmp2 = &mut tmp1
tmp4 = var0
var0 = tmp1
tmp1 = tmp4
var1 = tmp1

Then the actual borrows:

tmp1 = None
tmp4 = var0
var0 = tmp1
tmp1 = tmp4
var1 = tmp1

Destination propagation on var0 -> tmp4 -> tmp1 -> var1:

tmp1 = None
var1 = var0
var0 = tmp1

Source propagation on tmp1:

var1 = var0
var0 = None

The last one is the only hard step IMO, although it might not be harder than destination propagation.
All of this, with no alias analysis to speak of, or any consideration for unsafe/memory model semantics, given that no other MIR passes make more specific assumptions around borrows.

nikomatsakis · 2016-06-03T20:01:47Z

If you put the "return an array" example into play you get the following MIR:

    bb0: {
        var0 = [const 0u8; const 1024usize]; // scope 5 at <anon>:2:13: 2:22
        tmp0 = var0;                     // scope 9 at <anon>:3:12: 3:13
        return = tmp0;                   // scope 9 at <anon>:3:12: 3:13
        return;                          // scope 0 at <anon>:1:1: 4:2
    }

If we do the simple case and eliminate the temporary only (which avoids interaction with debuginfo, at least to start) then we'd expect to get:

    bb0: {
        var0 = [const 0u8; const 1024usize]; // scope 5 at <anon>:2:13: 2:22
        return = var0;                     // scope 9 at <anon>:3:12: 3:13
        return;                          // scope 0 at <anon>:1:1: 4:2
    }

scottcarr · 2016-06-17T16:18:02Z

I'm starting to think about implementing what @nikomatsakis called "destination propagation." I made a internals post about it here: https://internals.rust-lang.org/t/mir-move-up-propagation/3610

My current plan is to implement something pretty simple first and later iterate.

Also I didn't read #34164 even though I am commenting after it. Sorry! I will read it soon to see if there is overlap.

scottcarr · 2016-06-23T22:37:14Z

I've written a transform that can identify candidates for destination propagation. It would be nice to have some more examples (preferably in rust).

My branch: https://github.com/scottcarr/rust/tree/move-up-propagation2

steveklabnik · 2017-09-30T16:05:23Z

Triage: not aware of any movement on this issue.

jamesmunns · 2019-04-10T11:57:10Z

Just ran into (the lack of) this, as it made some embedded code use significantly more stack RAM than expected. I have a 1K buffer type that is returned down the call-stack, and ends up using a total of 4-5K maximum stack in the call chain (this is somewhat significant, as the device has a total of 64K of RAM).

@pftbest put together a reasonable minified version of my problem, showing 2x stack usage over what is necessary with a single function call. It might be useful for anyone looking for a test case.

pub fn bar() {
    let x = BigTest::new();
}

pub struct BigTest {
    arr: [u32; 128]
}

impl BigTest {
    pub fn new() -> BigTest {
        BigTest {
            arr: [123; 128],
        }
    }
}

example::bar:
  sub rsp, 520
  lea rdi, [rsp + 8]
  call qword ptr [rip + example::BigTest::new@GOTPCREL]
  add rsp, 520
  ret

.LCPI1_0:
  .long 123
  .long 123
  .long 123
  .long 123
example::BigTest::new:
  push rbx
  sub rsp, 512
  mov rbx, rdi
  movaps xmm0, xmmword ptr [rip + .LCPI1_0]
  movaps xmmword ptr [rsp], xmm0
  movaps xmmword ptr [rsp + 16], xmm0
  movaps xmmword ptr [rsp + 32], xmm0
  movaps xmmword ptr [rsp + 48], xmm0
  movaps xmmword ptr [rsp + 64], xmm0
  movaps xmmword ptr [rsp + 80], xmm0
  movaps xmmword ptr [rsp + 96], xmm0
  movaps xmmword ptr [rsp + 112], xmm0
  movaps xmmword ptr [rsp + 128], xmm0
  movaps xmmword ptr [rsp + 144], xmm0
  movaps xmmword ptr [rsp + 160], xmm0
  movaps xmmword ptr [rsp + 176], xmm0
  movaps xmmword ptr [rsp + 192], xmm0
  movaps xmmword ptr [rsp + 208], xmm0
  movaps xmmword ptr [rsp + 224], xmm0
  movaps xmmword ptr [rsp + 240], xmm0
  movaps xmmword ptr [rsp + 256], xmm0
  movaps xmmword ptr [rsp + 272], xmm0
  movaps xmmword ptr [rsp + 288], xmm0
  movaps xmmword ptr [rsp + 304], xmm0
  movaps xmmword ptr [rsp + 320], xmm0
  movaps xmmword ptr [rsp + 336], xmm0
  movaps xmmword ptr [rsp + 352], xmm0
  movaps xmmword ptr [rsp + 368], xmm0
  movaps xmmword ptr [rsp + 384], xmm0
  movaps xmmword ptr [rsp + 400], xmm0
  movaps xmmword ptr [rsp + 416], xmm0
  movaps xmmword ptr [rsp + 432], xmm0
  movaps xmmword ptr [rsp + 448], xmm0
  movaps xmmword ptr [rsp + 464], xmm0
  movaps xmmword ptr [rsp + 480], xmm0
  movaps xmmword ptr [rsp + 496], xmm0
  mov rsi, rsp
  mov edx, 512
  call qword ptr [rip + memcpy@GOTPCREL]
  mov rax, rbx
  add rsp, 512
  pop rbx
  ret

Link on Godbolt: https://godbolt.org/z/AHzinO

eddyb · 2019-05-01T23:48:54Z

@jamesmunns I can't wait to get back to #47954, hopefully within the coming months.
I still think it's the ticket forward (pretty sure I had tests similar to yours, and more complex ones even).

doctorn · 2020-05-14T20:11:40Z

@jonas-schievink should this be A-mir-nrvo?

pcwalton added I-slow Issue: Problems and improvements with respect to performance of generated code. C-enhancement Category: An issue proposing an enhancement or a PR with one. A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html labels Apr 14, 2016

nikomatsakis added the T-lang Relevant to the language team, which will review and decide on the PR/issue. label Jun 3, 2016

pcwalton mentioned this issue Jun 12, 2016

[WIP] [MIR] Generic lattice-based dataflow framework, rebased #34164

Closed

emberian mentioned this issue Jul 21, 2016

implement named return value optimization in the frontend rust-lang/rfcs#788

Closed

pcwalton mentioned this issue Aug 8, 2016

array.push(Foo { ... }) should construct Foo in-place #35531

Open

jrmuizel mentioned this issue Aug 3, 2017

Inefficient codegen for function returning String #34861

Closed

povilasb mentioned this issue Oct 30, 2017

Fix clippy warnings maidsafe-archive/crust#878

Merged

scottmcm mentioned this issue Dec 20, 2017

rustc_mir: implement an "lvalue reuse" optimization (aka destination propagation aka NRVO). #46321

Closed

ecstatic-morse mentioned this issue Jul 7, 2019

Unnecessary copy when constructing arrays from returned arrays? #62446

Open

bwesterb mentioned this issue Jul 7, 2019

Add fixed_result_into to prevent copy RustCrypto/hashes#86

Closed

burdges mentioned this issue Sep 27, 2019

Large structs constructed on stack rust-random/rand#817

Closed

jamesmunns mentioned this issue Oct 12, 2019

Too many memcpys #64301

Closed

jonas-schievink added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-embedded Working group: Embedded systems labels Nov 9, 2019

efenniht mentioned this issue Dec 11, 2019

Tracking issue: missing features of Rust for idiomatic refactoring of Hafnium kaist-cp/hafnium-verification#69

Open

jonas-schievink added the A-mir-opt Area: MIR optimizations label Mar 29, 2020

jonas-schievink added the A-mir-opt-nrvo Fixed by NRVO label May 14, 2020

workingjubilee added the C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such label Oct 8, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Do move forwarding on MIR #32966

Do move forwarding on MIR #32966

pcwalton commented Apr 14, 2016

eddyb commented Apr 28, 2016

nikomatsakis commented Jun 1, 2016

nikomatsakis commented Jun 3, 2016 •

edited

Loading

nikomatsakis commented Jun 3, 2016 •

edited

Loading

eddyb commented Jun 3, 2016

nikomatsakis commented Jun 3, 2016

scottcarr commented Jun 17, 2016

scottcarr commented Jun 23, 2016

steveklabnik commented Sep 30, 2017

jamesmunns commented Apr 10, 2019 •

edited

Loading

eddyb commented May 1, 2019 •

edited

Loading

doctorn commented May 14, 2020

Do move forwarding on MIR #32966

Do move forwarding on MIR #32966

Comments

pcwalton commented Apr 14, 2016

eddyb commented Apr 28, 2016

nikomatsakis commented Jun 1, 2016

nikomatsakis commented Jun 3, 2016 • edited Loading

nikomatsakis commented Jun 3, 2016 • edited Loading

eddyb commented Jun 3, 2016

nikomatsakis commented Jun 3, 2016

scottcarr commented Jun 17, 2016

scottcarr commented Jun 23, 2016

steveklabnik commented Sep 30, 2017

jamesmunns commented Apr 10, 2019 • edited Loading

eddyb commented May 1, 2019 • edited Loading

doctorn commented May 14, 2020

nikomatsakis commented Jun 3, 2016 •

edited

Loading

nikomatsakis commented Jun 3, 2016 •

edited

Loading

jamesmunns commented Apr 10, 2019 •

edited

Loading

eddyb commented May 1, 2019 •

edited

Loading