[MIR] Reconsider invoked function return value strategy #32105
Just for the record, I would not like to return to the long-comment approach, because alongside fictional phis it now also has to consider …
Just for the record, I had to, during (yet unpushed) MIR work for #32080, just so I can get libcore to generate unbroken LLVM IR.
Your description of the issue with the current …
Identifying and breaking critical edges is a known problem, so it shouldn't be too hard to find algorithms for doing both. I don't know enough about what is going on here to know the best overall strategy, though.
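Critical-edge detection is simple enough to sketch directly. Below is a minimal, self-contained illustration over a toy adjacency-list CFG (a hypothetical stand-in for MIR's basic-block graph, not rustc's actual data structures): an edge is critical when its source has multiple successors and its target has multiple predecessors.

```rust
/// Find critical edges in a toy CFG given as block index -> successor indices.
fn critical_edges(succs: &[Vec<usize>]) -> Vec<(usize, usize)> {
    // Count predecessors of every block.
    let mut preds = vec![0usize; succs.len()];
    for ss in succs {
        for &t in ss {
            preds[t] += 1;
        }
    }
    // An edge (s, t) is critical when s has multiple successors
    // and t has multiple predecessors.
    let mut out = Vec::new();
    for (s, ss) in succs.iter().enumerate() {
        if ss.len() > 1 {
            for &t in ss {
                if preds[t] > 1 {
                    out.push((s, t));
                }
            }
        }
    }
    out
}

fn main() {
    // 0 -> {1, 2}, 1 -> {2}: the edge 0 -> 2 is critical.
    let cfg = vec![vec![1, 2], vec![2], vec![]];
    println!("{:?}", critical_edges(&cfg)); // [(0, 2)]
}
```

This matches the standard definition: splitting exactly these edges guarantees every edge either leaves a block with one successor or enters a block with one predecessor.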
@Aatch thanks for the input. I’ve thought about your graph more and came to the conclusion that we may potentially have trouble in the more general case of the target block having more than a single predecessor. Namely, something like … is enough to make …
I think that forcing the target of an invoke to be a basic block with a single predecessor (either in MIR construction or in a post-processing pass) would be the best option. This is a simple invariant, and …

Why would you use …
OK, I see a variety of options here, most of which @nagisa summarized. These options are all basically the same, but they build on one another in some sense. The key point is that the edge that leads out of an invoke is special. But in each case we are moving this "specialness" further back: …
My opinion is roughly that we should go for either 1 or 4. The others feel like silly compromises. The reason to go for Option 4 is basically that trans is not the only piece of code that cares about this edge being a bit special -- dataflow needs to understand that too, since the initialization of the target occurs only if there is no unwinding. So if we adopt Option 4, then this fact is embedded in the MIR, and both trans and dataflow profit. Otherwise, it is something they both have to hardcode. That said, the hardcoding may in fact be very simple to achieve, so maybe that's fine!
Can you elaborate on this? I do not believe that funcret would break RVO. In particular, every invoke instruction would have (as an invariant) a successor which begins with …
If we want to represent non-zeroing drop in the MIR, we would need to add MIR "design rules" (at least, explicit/implicit MIR drop flags). We can easily insert the guard blocks late in the pipeline - that's option 2.
That sounds like a non-trivial and hard-to-fix-up invariant. Also, …
My plan was to add explicit DROP flags -- literally boolean temporaries -- and then extend …

This comes back to the general vision of having a conceptual "LIR" which is represented using the same MIR data structures. This LIR would be produced by borrowck (which removes the "zeroing drop" semantics) and then modified by whatever optimizations we choose to do after that.
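The drop-flag idea can be illustrated with a toy model (all names here are invented for illustration; the real rustc lowering differs): each local carries a boolean temporary that is set on initialization and cleared on move, and the drop terminator consults that flag instead of inspecting zeroed memory.

```rust
// Hypothetical sketch of an explicit drop flag: a boolean temporary that
// tracks whether a local is initialized, so DROP(x) can lower to
// `if flag { run_destructor(x) }` rather than relying on zeroing.
struct DropFlag {
    initialized: bool,
}

impl DropFlag {
    fn new() -> DropFlag { DropFlag { initialized: false } }
    fn on_init(&mut self) { self.initialized = true; }  // x = <value>;
    fn on_move(&mut self) { self.initialized = false; } // y = x; (move out)
    /// Conditional drop: returns whether the destructor would run.
    fn on_drop(&mut self) -> bool {
        let run = self.initialized;
        self.initialized = false;
        run
    }
}

fn main() {
    let mut flag = DropFlag::new();
    flag.on_init();
    flag.on_move();
    assert!(!flag.on_drop()); // moved out: destructor skipped
    flag.on_init();
    assert!(flag.on_drop()); // still initialized: destructor runs
    println!("ok");
}
```

The point of making the flags explicit is that they become ordinary booleans in the IR, so later passes can constant-fold them away when initialization is unconditional.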
It seems no more or less hard to maintain than "invoke must have a target block with a single predecessor". But I'm pretty indifferent, really. I think I even lean against the …
I don't really see why this would be annoying either. Presumably it would be …

I guess the bottom line is that funcret still seems harmless to me, but probably not useful enough to go through the trouble of adding.
Actually, reading through pnkfelix's recent PR, I'm mildly revising this opinion. Basically: dataflow either wants a successor with one predecessor OR to associate gen/kill sets with the edges themselves. Perhaps the latter is cleanly achievable, though.
I thought to use regular …
We don't need to maintain it through passes. We can just establish it whenever we want, and the MIR stays valid. OTOH, the …
A problem with funcret is that we would end up copying things we return via outpointer. Optimised code probably does not care, but copying data returned via outpointer is potentially many times more expensive than the copying of primitive values we do currently in no-optimisations mode. (OTOH, we already copy everything around a bunch of times in non-optimized mode regardless, so that is probably not as critical as it sounds.)
I think @nikomatsakis's proposal was to treat …
@arielb1 phis don’t help here. Consider this: … when translating the call you don’t know which …
That's what I meant by "treating it like a phi". It is handled by setting the destination of the preceding assignment, rather than by doing the move. Note that doing the phi treatment afterwards would be hard for LLVM because of aliasing. I don't like that sort of thing, of course.
Fair enough. In any case, after reading #32156 some more, I'm feeling pretty good about just special-casing the effect of the …

So I stand by my original suggestion: let's handle this internally to any pass that cares.
Yeah, I considered this. It'd be fine too. The only reason not to do it that I can see is that it will require creating more basic blocks, rather than just modifying the DROP terminator to a CONDITIONAL_DROP. Seems less efficient overall, and it's not clear to me that it really makes anything easier. One less terminator, I guess. But I'd be happy either way.
I think @nagisa wants a one-to-one correspondence between MIR basic blocks and LLVM basic blocks.
I’m not strictly adamant about it having a one-to-one correspondence, but it makes things considerably simpler when considering the call terminator within trans.
How so? I mean, I don't really object to adding an extra block just before trans. If it makes things cleaner, seems fine. I'm just curious.
But I guess the more that things leak outside of trans, the more tempted I am to try and address this in a more systematic way. :)
(Though if we are inserting this block just before trans, one could quite reasonably and correctly say that this is not "leaking outside" of trans.)
@nikomatsakis I would expect we store the just-before-trans MIR in metadata, though.
I prefer to have a "low-level" MIR that is fairly close to the generated LLVM IR. I think that having all the MIR we create share the same data structures and semantics, except for "design rules" (safety checks, undefined behaviour), is a nice design (we may want an untyped MIR for some crazy optimizations, but I am not sure that would have an advantage over LLVM).
@nikomatsakis I guess in the end it makes me feel like trans is less hacky and workaround-ish. I really can’t look at the trans-time dummy blocks as the solution, especially if there’s a necessity for a whole page of comment explaining why this is correct given a handful of assumptions at the time.
@eddyb why? We already do some pre-trans analysis passes as part of …
@nagisa That is a complete waste of time and processing power, we want to cache optimal MIR.
So I've been looking through this issue and trying to figure out what the actual problem is, and it seems to be that for an …

The reason why target-block-related stuff screws up is because of already-translated code in them, right? Either we mess up the destinations in phi nodes or we break LLVM's phi-first invariant.

Well, one thing that makes me think of is altering the order in which we translate MIR blocks. Right now it's effectively random order due to the way it's built, but a reverse postorder traversal should mitigate those issues somewhat. Why? Well, a reverse postorder traversal visits all of a block's predecessors (except ones via backedges) before that block is visited. Given @nagisa's example here: #32105 (comment), a post-order traversal would visit both …
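For reference, a reverse postorder can be computed with a plain DFS that records each block after all of its successors and then reverses the list. This is a generic sketch over a toy adjacency-list CFG, not the traversal rustc actually uses:

```rust
/// Compute a reverse postorder over a toy CFG (block index -> successors).
/// In RPO, every block appears after all of its non-backedge predecessors.
fn reverse_postorder(succs: &[Vec<usize>], entry: usize) -> Vec<usize> {
    fn dfs(n: usize, succs: &[Vec<usize>], seen: &mut [bool], post: &mut Vec<usize>) {
        seen[n] = true;
        for &s in &succs[n] {
            if !seen[s] {
                dfs(s, succs, seen, post);
            }
        }
        post.push(n); // postorder: recorded after all successors
    }
    let mut seen = vec![false; succs.len()];
    let mut post = Vec::new();
    dfs(entry, succs, &mut seen, &mut post);
    post.reverse(); // reversing postorder gives RPO
    post
}

fn main() {
    // Diamond: 0 -> {1, 2}, 1 -> 3, 2 -> 3. RPO places 3 after both 1 and 2.
    let cfg = vec![vec![1, 2], vec![3], vec![3], vec![]];
    let rpo = reverse_postorder(&cfg, 0);
    let pos = |b: usize| rpo.iter().position(|&x| x == b).unwrap();
    assert!(pos(1) < pos(3) && pos(2) < pos(3));
    println!("{:?}", rpo);
}
```

On the diamond example, the merge block is visited only after both of its predecessors, which is exactly the property the comment relies on; backedges in loops are the one place the guarantee does not hold.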
I do not think it is always possible to visit a node's predecessors before …
On Wed, Mar 09, 2016 at 09:50:34PM -0800, Simonas Kazlauskas wrote:
RPO visits all predecessors before successors, except for backedges.

I think there is basically no problem handling this in trans, but it …

That said, as @arielb1 pointed out somewhere or other, we are going to …

THAT said, I do not think we should do any optimization or …

This implies that the borrowck case could be handled in builder, …

It seems to me that we've more or less drilled down to the heart of …

My proposal is this: whoever writes a patch first gets to decide. :)
I am not sure that this is a disadvantage. More invariants also improve optimizations.
@arielb1 perhaps the more important question is whether those optimizations …

On Thu, Mar 10, 2016 at 12:50 PM, arielb1 notifications@github.com wrote:
So it took a while, but using @eddyb's branch and removing the intermediate blocks, this case:

```rust
enum State {
    Both,
    Front,
    Back
}

struct Foo<A: Iterator, B: Iterator> {
    state: State,
    a: A,
    b: B
}

impl<A, B> Foo<A, B>
    where A: Iterator, B: Iterator<Item=A::Item>
{
    fn next(&mut self) -> Option<A::Item> {
        match self.state {
            State::Both => match self.a.next() {
                elt @ Some(..) => elt,
                None => {
                    self.state = State::Back;
                    self.b.next()
                }
            },
            State::Front => self.a.next(),
            State::Back => self.b.next(),
        }
    }
}
```

Hastily adapted from the …
I have a branch in progress that splits critical edges; it's a general transformation, so it might be a little more aggressive than this case needs. It fixes the case I posted above quite nicely.
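A critical-edge-splitting pass of the kind described can be sketched in a few lines over a toy adjacency-list CFG (a hypothetical structure, not the actual MIR transform): each critical edge is replaced by a fresh block that has exactly one predecessor and one successor, giving trans a safe place to emit post-invoke copies.

```rust
/// Split every critical edge in a toy CFG (block index -> successors)
/// by inserting a fresh block on the edge.
fn split_critical_edges(succs: &mut Vec<Vec<usize>>) {
    let n = succs.len();
    // Count predecessors of the original blocks.
    let mut preds = vec![0usize; n];
    for ss in succs.iter() {
        for &t in ss {
            preds[t] += 1;
        }
    }
    for s in 0..n {
        if succs[s].len() < 2 {
            continue; // edges out of single-successor blocks are never critical
        }
        for i in 0..succs[s].len() {
            let t = succs[s][i];
            if preds[t] > 1 {
                // Fresh block with a single predecessor (s) and successor (t).
                let fresh = succs.len();
                succs.push(vec![t]);
                succs[s][i] = fresh;
            }
        }
    }
}

fn main() {
    // 0 -> {1, 2}, 1 -> {2}: the edge 0 -> 2 is critical and gets split.
    let mut cfg = vec![vec![1, 2], vec![2], vec![]];
    split_critical_edges(&mut cfg);
    assert_eq!(cfg, vec![vec![1, 3], vec![2], vec![], vec![2]]);
    println!("{:?}", cfg);
}
```

As noted in the thread, this is more aggressive than strictly necessary here: it splits every critical edge, not just the edges leaving invoke terminators.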
I think this is now settled. |
Currently in MIR, function calls look like this: …

where `destination` is an lvalue of some sort. However, this assignment is a little bit troublesome, because under the covers we need to copy the return value and zero-fill the arguments after the call finishes.
LLVM has two kinds of calls: `invoke` and `call`. `call` is a regular instruction, whereas `invoke` is a basic-block terminator. There’s no problem with the `call` instruction, because we can always produce a bunch of post-call code after the `call` and then simply `br` into the destination block. `invoke`, however, does not give us that kind of freedom, and we must branch into something right after the call. (Remember, we still must copy the return value and zero-fill the arguments!)

Previously we used to generate an intermediate block and translate the copy into that, but it is problematic and gets even more so with zeroing considered. Lately we’ve moved to translating drops/copies straight into the target block (the `at_start` approach) – it turns out this is wrong in its current form, because the target block can easily have more than one predecessor!

The options considered and not considered are:
- a pre-trans pass to add empty blocks for all invokes like that, and just use the `at_start` approach;
- … the `at_start` approach, which is the cleanest one I’ve thought up so far; but …

@nikomatsakis had proposed having the function return value as an rvalue which must appear in the first statement of the target block. I.e. something like …
This seems pretty clean, and we could also zero out arguments as a part of `funcret`, but there are a few issues with this approach that come to mind: …

That’s all my thoughts so far. This is pretty important to get fixed, because it causes LLVM assertions for some code.