
[MIR] Reconsider invoked function return value strategy #32105

Closed
nagisa opened this issue Mar 7, 2016 · 36 comments
Labels
A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html

Comments

@nagisa (Member) commented Mar 7, 2016

Currently in MIR function calls look like this:

destination = function(...) -> [...]

where destination is an lvalue of some sort. However, this assignment is a little troublesome, because under the covers we need to copy the return value and zero-fill the arguments after the call finishes.

LLVM has two kinds of calls: invoke and call. call is a regular instruction, whereas invoke is a basic block terminator. There’s no problem with the call instruction, because we can always produce a bunch of post-call code after the call and then simply br into the destination block. invoke, however, does not give us that kind of freedom: we must branch into something right after the call. (Remember, we still must copy the return value and zero-fill the arguments!)
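To illustrate the asymmetry (a rough sketch; the builder-style helpers on bcx — call, store, zero_fill, br, invoke — are hypothetical names, not rustc’s actual trans API):

    // With `call`, post-call code can be emitted into the same LLVM block:
    let ret = bcx.call(fn_val, &args);
    bcx.store(ret, dest);        // copy the return value
    bcx.zero_fill(&arg_slots);   // zero-fill the moved arguments
    bcx.br(target_llbb);         // then branch to the MIR target block

    // `invoke` is itself a terminator: its normal and unwind successors
    // are fixed operands, so nothing can be emitted "after the call" here.
    bcx.invoke(fn_val, &args, target_llbb, cleanup_llbb);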

Previously we used to generate an intermediate block and translate the copy into that, but it is problematic and gets even more so once zeroing is considered. Lately we’ve moved to translating drops/copies straight into the target block (the at_start approach) – it turns out this is wrong in its current form, because the target block can easily have more than one predecessor!

The options considered so far are:

  • Pre-trans pass to add empty blocks for all such invokes, then just use the at_start approach;

    1. this is pretty clean, in the sense that it is fully contained within trans, and we get to see the whole mirmap, so we can do the transformation (something that’s not possible when generating blocks during trans);
    2. it lets us keep the at_start approach, which is the cleanest one I’ve thought up so far; but
    3. generating the blocks correctly would be non-trivial.
  • @nikomatsakis had proposed exposing the function return value as an rvalue which must appear in the first statement of the target block, i.e. something like

    bb1: function(...) -> [bb2]
    bb2: destination = funcret
    

    This seems pretty clean, and we could also zero out the arguments as part of funcret, but there are a few issues with this approach that come to mind:

    1. funcret cannot be removed as part of optimisations (special case in optimisation passes);
    2. we’d have to somehow carry the state around trans?

That’s all my thoughts so far. This is pretty important to get fixed, because it causes LLVM assertions for some code.

nagisa added the A-MIR label Mar 7, 2016
@nagisa (Member, Author) commented Mar 7, 2016

Just for the record, I would not like to return to the long comment approach, because alongside fictional phis it now also has to consider cleanuppad which we use extensively.

@eddyb (Member) commented Mar 7, 2016

> Just for the record, I would not like to return to the long comment approach, because alongside fictional phis it now also has to consider cleanuppad which we use extensively.

Just for the record, I had to do exactly that during (yet unpushed) MIR work for #32080, just so I could get libcore to generate unbroken LLVM IR.

@Aatch (Contributor) commented Mar 8, 2016

Your description of the issue with the current at_start seems very similar to critical edges. Critical edges are edges in a CFG that are neither the only edge leaving a block, nor the only edge entering a block.

    bb0
    / \
   /   \
  /     \
bb1    bb2
 |     / \
 |    c   \
 |   /     \
 bb3      bb4

So edge c here is a critical edge, as it's not the only edge leaving bb2 and it's not the only edge entering bb3. The thing with critical edges is that you have to break them (by inserting new blocks) to add computation along them without affecting the other edges, which seems pretty similar to the issue here.

Identifying and breaking critical edges is a known problem, so it shouldn't be too hard to find algorithms for doing both. I don't know enough about what is going on here to know the best overall strategy though.
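For reference, a minimal sketch of the transformation over a toy CFG (the types here are illustrative, not rustc's actual MIR data structures):

    // Toy CFG: block i jumps to the blocks listed in blocks[i].succs.
    struct Block { succs: Vec<usize> }

    // Split every critical edge by routing it through a fresh empty block.
    fn split_critical_edges(blocks: &mut Vec<Block>) {
        // Count the predecessors of each original block.
        let mut preds = vec![0usize; blocks.len()];
        for blk in blocks.iter() {
            for &s in &blk.succs { preds[s] += 1; }
        }
        let n = blocks.len();
        for b in 0..n {
            if blocks[b].succs.len() <= 1 { continue; } // sole out-edge: not critical
            for i in 0..blocks[b].succs.len() {
                let t = blocks[b].succs[i];
                if preds[t] <= 1 { continue; }          // sole in-edge: not critical
                // Edge b -> t is critical: break it with a fresh block,
                // a safe place to insert code for this edge alone.
                let fresh = blocks.len();
                blocks.push(Block { succs: vec![t] });
                blocks[b].succs[i] = fresh;
            }
        }
    }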

@nagisa (Member, Author) commented Mar 8, 2016

@Aatch thanks for the input. I’ve thought about your graph more and came to the conclusion that we may have trouble in the more general case of a target block having more than a single predecessor.

Namely something like

bb0(Drop -> bb2)     bb1(Drop -> bb2)
      \                     /
        \                 /
             bb2(...)

is enough to make at_start go wrong, regardless of how many successors bb0 and bb1 actually have.

@arielb1 (Contributor) commented Mar 8, 2016

@nagisa

I think that forcing the target of an invoke to be a basic block with a single predecessor (established either in MIR construction or in a post-processing pass) would be the best option. This is a simple invariant, and funcret would break RVO.

Why would you use at_start with single-successor Drop?

@arielb1 (Contributor) commented Mar 8, 2016

cleanuppad will stop being a problem when we get non-zeroing drop - we can mark the arguments as dropped before the call - it's not like we have any asynchronous exceptions to be afraid of. Then we only have to copy the destination and set its drop-flag if needed.

@nikomatsakis (Contributor) commented:

OK, I see a variety of options here, most of which @nagisa summarized. These options are all basically the same, but they build on one another in some sense. The key point is that the edge that leads out of an invoke is special. But in each case we are moving this "specialness" further back:

  1. Handle this in MIR trans by making a basic block for invoke edges. The transformation to basic blocks is then actually fairly simple: there is an LLVM basic block for every MIR block AND every MIR edge. However, we optimize out the edge blocks the vast majority of the time; we only include them in those cases where we wish to insert code on the edge (which is where we NOW call at_start). Here the specialness of the edge is specific to trans.
  2. Same as Option 1, but do this "edge-splitting" transformation in the MIR before we enter trans. Here the specialness of the edge is specific to "LIR" (that conceptual variant of MIR where we have performed various optimizations post type-checking).
  3. Same as Option 2, but insert an extra block for edges that we may need to split, and make sure we never strip this block during optimizations and so forth, even if it is empty. Here the specialness of that edge is an invariant we maintain on MIR.
  4. Modify Call and have a special rvalue. Here the specialness of the edge is recognized by the structure of the MIR itself. This is essentially the same as Option 3, really, except that we would never strip the "edge block" as empty because it would never be empty, it would have the special rvalue assignment.

My opinion is roughly that we should go for either 1 or 4. The others feel like silly compromises. The reason to go for Option 4 is basically that trans is not the only piece of code that cares about this edge being a bit special -- dataflow needs to understand that too, since the initialization of the target occurs only if there is no unwinding. So if we adopt Option 4, then this fact is embedded in the MIR, and both trans/dataflow profit. Otherwise, though, it is something they both have to hardcode. That said, the hardcoding may in fact be very simple to achieve, so maybe that's fine!

@nikomatsakis (Contributor) commented:

@arielb1

> This is a simple invariant, and funcret would break RVO.

Can you elaborate on this? I do not believe that funcret would break RVO. In particular, every invoke instruction would have (as an invariant) a successor which begins with funcret. Therefore, we can easily find the destination by peeking at the first instruction of our successor block, so there should be no problem in passing this pointer into the function we are calling.

@arielb1 (Contributor) commented Mar 9, 2016

> LIR

If we want to represent non-zeroing drop in the MIR, we would need to add MIR "design rules" (at least, explicit/implicit MIR drop flags).

We can easily insert the guard blocks late in the pipeline - that's option 2.

> In particular, every invoke instruction would have (as an invariant) a successor which begins with funcret.

That sounds like a non-trivial and hard-to-fix-up invariant. Also, funcret would have a potentially different type on every use, and more implicit annoyances.

@nikomatsakis (Contributor) commented:

@arielb1

> If we want to represent non-zeroing drop in the MIR, we would need to add MIR "design rules" (at least, explicit/implicit MIR drop flags).

My plan was to add explicit DROP flags -- literally boolean temporaries, and then extend DROP to take a "condition" flag (always True at first). Or else make a ConditionalDrop terminator.
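As a concrete example of where such a flag is needed (this is surface Rust; the DROP notation above is MIR pseudo-syntax, not source code):

    // Whether `s` is still initialized at the end of the function depends
    // on a runtime condition, so a boolean temporary has to track it:
    fn example(cond: bool) {
        let s = String::from("hello");
        if cond {
            drop(s); // `s` is moved out on this path only
        }
        // conceptually: DROP(s) if s_flag -> exit,
        // where s_flag is false on the path through the `if` above
    }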

This comes back to the general vision of having a conceptual "LIR" which is represented using the same MIR data structures. This LIR would be produced by borrowck (which removes the "zeroing drop" semantics) and then modified by whatever optimizations we choose to do after that.

> That sounds like a non-trivial and hard-to-fix-up invariant.

It seems no more or less hard to maintain than "invoke must have a block with a single predecessor". But I'm pretty indifferent really. I think I even lean against the funcret rvalue at this point, until there is more evidence of pain with the current setup. In that case, I lean towards "add a special basic block representing this edge in trans" -- minimally invasive to the rest of MIR.

> Also, funcret would have a potentially different type on every use, and more implicit annoyances.

I don't really see why this would be annoying either. Presumably it would be funcret<T> where T is the type of value being returned (or type of function being called).

I guess the bottom line is that funcret still seems harmless to me, but probably not useful enough to go through the trouble of adding.

@nikomatsakis (Contributor) commented:

> It seems no more or less hard to maintain than "invoke must have a block with a single predecessor". But I'm pretty indifferent really. I think I even lean against the funcret rvalue at this point, until there is more evidence of pain with the current setup. In that case, I lean towards "add a special basic block representing this edge in trans" -- minimally invasive to the rest of MIR.

Actually, reading through pnkfelix's recent PR, I'm mildly revising this opinion. Basically: dataflow either wants a successor with one predecessor OR to associate gen/kill sets with the edges themselves. Perhaps the latter is cleanly achievable though.

@arielb1 (Contributor) commented Mar 9, 2016

@nikomatsakis

I thought to use regular if conditionals to handle non-zeroing drop, along with added temporaries. I am calling this a "design rule" because the MIR produced by adding the drop flags is still valid MIR (adding another set of drop flags, either on-stack or inline, would be harmless).

> It seems no more or less hard to maintain than "invoke must have a block with a single predecessor".

We don't need to maintain it through passes. We can just establish it whenever we want, and the MIR stays valid. OTOH, the funcret is an integral part of MIR and must be maintained through all passes.

@nagisa (Member, Author) commented Mar 9, 2016

A problem with funcret is that we would end up copying things we return via outpointer. Optimised code probably does not care, but copying data returned via outpointer is potentially many times more expensive than the copying of primitive values we currently do in no-optimisations mode.

(OTOH, we already copy everything around a bunch of times in non-optimized mode regardless, so that is probably not as critical as it sounds.)

@arielb1 (Contributor) commented Mar 9, 2016

I think @nikomatsakis's proposal was to treat = funcret as a phi.

@nagisa (Member, Author) commented Mar 9, 2016

@arielb1 phis don’t help here. Consider this:

    ...
    function(....) -> success // uses outptr
success:
    sometmp = funcret

when translating the call you don’t know which tmp the funcret will end up being “assigned” to, and thus you get to create an implicit alloca and do a copy afterwards. funcret being phi-like doesn’t really help here, unless we’re trying to do some shenanigans of “looking into future blocks”.
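Roughly, trans would be forced into something like this (the helper names on bcx are hypothetical, not rustc’s API):

    // At the call site the final destination is unknown, so we need a
    // temporary stack slot for the callee to write into:
    let tmp = bcx.alloca(ret_ty);                  // implicit temporary
    bcx.invoke_outptr(fn_val, &args, tmp, target_llbb, cleanup_llbb);
    // ...later, when `sometmp = funcret` is reached in the target block:
    bcx.memcpy(sometmp, tmp, ret_ty);              // the extra copy funcret forces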

@arielb1 (Contributor) commented Mar 9, 2016

@nagisa

That's what I meant by "treating it like a phi". It is handled by setting the destination of the preceding assignment, rather than by doing the move.

Note that doing the phi treatment afterwards would be hard for LLVM because of aliasing.

I don't like that sort of thing, of course.

@nikomatsakis (Contributor) commented:

> We don't need to maintain it through passes. We can just establish it whenever we want, and the MIR stays valid. OTOH, the funcret is an integral part of MIR.

Fair enough.

In any case, after reading #32156 some more, I'm feeling pretty good about just special-casing the effect of the CALL terminator in the dataflow code. It's the only terminator that even affects the dataflow, after all, and we are already doing a match to uncover the successors, so it seems simple enough to also propagate the bit there.

So I stand by my original suggestion: let's handle this internally to any pass that cares.

@nikomatsakis (Contributor) commented:

@arielb1

> I thought to use regular if conditionals to handle non-zeroing-drop, along with added temporaries.

Yeah, I considered this. It'd be fine too. The only reason not to do it that I can see is that it will require creating more basic blocks, rather than just modifying the DROP terminator to a CONDITIONAL_DROP. Seems less efficient overall and it's not clear to me that it really makes anything easier. One less terminator I guess. But I'd be happy either way.

@arielb1 (Contributor) commented Mar 9, 2016

I think @nagisa wants a one-to-one correspondence between MIR basic blocks and LLVM basic blocks.

@nikomatsakis (Contributor) commented:

@arielb1

> I think @nagisa wants a one-to-one correspondence between MIR basic blocks and LLVM basic blocks.

Why?

@nagisa (Member, Author) commented Mar 9, 2016

I’m not strictly adamant about having a one-to-one correspondence, but it makes things considerably simpler when considering the call terminator within trans.

@nikomatsakis (Contributor) commented:

> it makes things considerably simpler when considering the call terminator within trans

How so?

I mean, I don't really object to adding an extra block just before trans. If it makes things cleaner, seems fine. I'm just curious.

@nikomatsakis (Contributor) commented:

But I guess the more that things leak outside of trans the more tempted I am to try and address this in a more systematic way. :)

@nikomatsakis (Contributor) commented:

> But I guess the more that things leak outside of trans the more tempted I am to try and address this in a more systematic way. :)

(Though if we are inserting this block just before trans, one could quite reasonably and correctly say that this is not "leaking outside" of trans.)

@eddyb (Member) commented Mar 9, 2016

@nikomatsakis I would expect we store the just-before-trans MIR in metadata, though.

@arielb1 (Contributor) commented Mar 9, 2016

I prefer to have a "low-level" MIR that is fairly close to the generated LLVM IR. I think that having all the MIR we create share the same data structures and semantics, except for "design rules" (safety checks, undefined behaviour), is a nice design (we may want an untyped MIR for some crazy optimizations, but I am not sure that would have an advantage over LLVM).

@nagisa (Member, Author) commented Mar 9, 2016

> How so?

@nikomatsakis I guess in the end it makes me feel like trans is less hacky and workaround-ish. I really can’t see the trans-time dummy blocks as the solution, especially if they require a whole page of comments explaining why this is correct given a handful of assumptions that hold at the time.

> I would expect we store the just-before-trans MIR in metadata, though.

@eddyb why? We already do some pre-trans analysis passes as part of trans_mir(), and the way I see it, pre-trans transformations would go into exactly the same location.

@eddyb (Member) commented Mar 9, 2016

@nagisa That is a complete waste of time and processing power; we want to cache optimal MIR.

@Aatch (Contributor) commented Mar 10, 2016

So I've been looking through this issue and trying to figure out what the actual problem is, and it seems to be that for an invoke we need stuff that happens after it, and only after it, but that stuff needs to be in a different basic block because of how invoke works in LLVM, right?

The reason why target-block related stuff screws up is because of already-translated code in them, right? Either we mess up the destinations in phi nodes or we break LLVM's phi-first invariant. Well, one thing that makes me think of is altering the order in which we translate MIR blocks. Right now it's effectively random due to the way the MIR is built, but a reverse postorder traversal should mitigate those issues somewhat. Why? Because a reverse postorder traversal visits all of a block's predecessors (except those reached via backedges) before the block itself.

Given @nagisa's example here: #32105 (comment), a reverse postorder traversal would visit both bb0 and bb1 before visiting bb2, meaning that bb2 should be effectively empty (or at least only contain post-call code from predecessor blocks) at that point. As long as we don't inject invariant-breaking code with at_start this should work fine, right? If we need to inject extra blocks along edges we can still do that, as long as we track that we've done so.
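A sketch of the traversal over a toy CFG (the types are illustrative, not rustc's):

    struct Block { succs: Vec<usize> }

    // Reverse postorder: record each block after all of its successors
    // (DFS postorder), then reverse. Along non-backedges this visits every
    // predecessor of a block before the block itself.
    fn reverse_postorder(blocks: &[Block], entry: usize) -> Vec<usize> {
        fn dfs(blocks: &[Block], b: usize, seen: &mut [bool], out: &mut Vec<usize>) {
            if seen[b] { return; }
            seen[b] = true;
            for &s in &blocks[b].succs {
                dfs(blocks, s, seen, out);
            }
            out.push(b); // postorder: emitted only after all successors
        }
        let mut seen = vec![false; blocks.len()];
        let mut order = Vec::new();
        dfs(blocks, entry, &mut seen, &mut order);
        order.reverse();
        order
    }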

@nagisa (Member, Author) commented Mar 10, 2016

I do not think it is always possible to visit a node's predecessors before the node itself in a graph, due to the presence of loops.

@nikomatsakis (Contributor) commented:

> I do not think it is always possible to visit a node's predecessors before the node itself in a graph, due to the presence of loops.

RPO visits all predecessors before successors, except for backedges. The outgoing edge from an invoke call should never be a backedge, because the target of a backedge by definition has more than one predecessor, and we have already said we do not want the target of an invoke to be a node with more than one predecessor.

I think there is basically no problem handling this in trans, but it will break the "one LLVM basic block per MIR block" invariant. I don't care about this personally. It still seems like a very simple translation to me: one basic block per MIR block, plus a basic block for some of the edges where we may need to inject code.

That said, as @arielb1 pointed out somewhere or other, we are going to need a block in which to inject code for other purposes. For example, when borrowck is rewriting code for dynamic drop, it needs to be able to insert code to set a stack flag to 1 after the call to invoke. One could imagine then having one MIR transformation that ensures such a block exists, and using it twice. (This transformation could indeed be made more general, as @Aatch suggested, and just split all critical edges: if we start doing more optimizations ourselves, we will surely want this.)

THAT said, I do not think we should do any optimization or transformations before we run typeck/borrowck -- or at least only very minimal optimizations. It will only cause us trouble by making MIR farther away from the user's source, and it also complicates the authoring of optimizations, because they can't assume LIR, which has fewer invariants, but must think about keeping the MIR fully typed.

This implies that the borrowck case could be handled in the builder, simply by having the call always create a fresh block which is known to have only one predecessor.

It seems to me that we've more or less drilled down to the heart of the issue here, and at this point we are going back and forth over a minor detail. I think we've all agreed we will not transform MIR in the way I proposed; we'll keep the destination in the call terminator, rather than breaking it out into something separate. This implies that we have to split the edge. We can do it in trans, or as a pre-pass. The difference to me seems small, and it is something we can easily refactor later.

My proposal is this: whoever writes a patch first gets to decide. :)

@arielb1 (Contributor) commented Mar 10, 2016

@nikomatsakis

> because they can't assume LIR, which has fewer invariants, but must think about keeping the MIR fully typed.

I am not sure that this is a disadvantage. More invariants also improve optimizations.

@nikomatsakis (Contributor) commented:

@arielb1 perhaps the more important question is whether those optimizations would simplify or improve borrowck. If we have some optimizations which require (and maintain) strong invariants, they can also run before the conceptual "LIR transition point". That doesn't mean they have to run before borrowck. I guess one could imagine optimizations which want to run before dynamic drop has been applied, though I don't know of such an example off hand.


@Aatch (Contributor) commented Mar 11, 2016

So it took a while, but using @eddyb's branch and removing the intermediate blocks, this case:

enum State {
    Both,
    Front,
    Back
}

struct Foo<A: Iterator, B: Iterator> {
    state: State,
    a: A,
    b: B
}

impl<A, B> Foo<A, B> 
where A: Iterator, B: Iterator<Item=A::Item>
{
    fn next(&mut self) -> Option<A::Item> {
        match self.state {
            State::Both => match self.a.next() {
                elt @ Some(..) => elt,
                None => {
                    self.state = State::Back;
                    self.b.next()
                }
            },
            State::Front => self.a.next(),
            State::Back => self.b.next(),
        }
    }
}

Hastily adapted from the Chain iterator, this causes an LLVM assertion about an instruction not dominating all its uses. Specifically, the call to self.b.next() in the Both arm.

@Aatch (Contributor) commented Mar 11, 2016

I have a branch in progress that splits critical edges. It's a general transformation, so it might be a little more aggressive than this case needs, but it fixes the case I posted above quite nicely.

@nikomatsakis (Contributor) commented:

I think this is now settled.
