Semantics of MIR assignments, around aliasing, ordering, and primitives. #68364
Comments
@eddyb I can't quite tell if there is a proposal here. I think you're proposing that we alter MIR construction for some kinds of rvalues to avoid the temporary? |
@nikomatsakis Sorry, I didn't know how to phrase it but... The only immediate proposal here in terms of changing the MIR would be that split of But the discussion I want to start is on the actual semantics, maybe also diving into how explicit we can make the purity, ordering, etc. of various MIR concepts. EDIT: maybe I should've focused more on how |
I agree. We should make sure Miri actually correctly catches this in all cases. We are doing the right thing for "big" copies, but Miri has some optimizations for scalars where we might fail to check things properly. There even is a FIXME for that. ;)
That's just an optimization in Miri though, not part of the spec. MIR values, the simple wayLet us ignore types for a minute. Conceptually, if I were to formally specify MIR, I'd say that an In the special case you considered, In Miri, we don't want to carry around these temporary sequences of bytes, that's why Miri's typed MIR valuesReality is more complex than that, though -- for example, the spec above fails to explain that padding gets "reset" to When evaluating a |
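To make the "evaluate the RHS to a value, then store it" reading concrete, here is a minimal sketch. It assumes a made-up 4-byte layout for a pair `(u8, u16)`: the `u8` at offset 0, one padding byte at offset 1, and the `u16` in little-endian order at offsets 2..4. The `Byte` alias and the functions `decode_pair`, `encode_pair` and `typed_copy` are hypothetical names for illustration only; they are not Miri's or rustc's representation. Uninitialized bytes are modelled as `None`.

```rust
/// One abstract byte of memory: `None` stands for "uninitialized".
/// (Hypothetical toy model, not Miri's actual representation.)
type Byte = Option<u8>;

/// Decode the assumed layout `[a, <padding>, b_lo, b_hi]` into an abstract value.
/// The padding byte at offset 1 is never inspected.
/// Returns `None` if a data byte is uninitialized (a stand-in for UB here).
fn decode_pair(bytes: &[Byte; 4]) -> Option<(u8, u16)> {
    let a = bytes[0]?;
    let lo = bytes[2]?;
    let hi = bytes[3]?;
    Some((a, u16::from_le_bytes([lo, hi])))
}

/// Encode an abstract value back into bytes. The abstract value carries no
/// information about padding, so the padding byte comes out uninitialized.
fn encode_pair(value: (u8, u16)) -> [Byte; 4] {
    let [lo, hi] = value.1.to_le_bytes();
    [Some(value.0), None, Some(lo), Some(hi)]
}

/// A typed copy in this model: load the value from the source, then store it.
fn typed_copy(src: &[Byte; 4]) -> Option<[Byte; 4]> {
    decode_pair(src).map(encode_pair)
}
```

In this toy model, copying `[Some(7), Some(0xAA), Some(0x34), Some(0x12)]` yields `[Some(7), None, Some(0x34), Some(0x12)]`: the data bytes survive, while whatever was stored in the padding byte does not.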
Eagerly loading is unavoidable in codegen, and I wasn't thinking of miri when I wrote that.
This is the same in codegen, for anything you can't eagerly load.
But... there is no place to keep an arbitrary sequence of bytes other than memory. |
codegen should then better make sure that what it does is indistinguishable from the spec I proposed. :) I think it would be a big bad mess to have to deal with that distinction of "small" and "big" values in the spec. And I think that indeed this is indistinguishable for UB-free programs, because of the requirement that the two regions of memory must not overlap. If this wasn't the case, it would not be correct for Miri to perform this internal optimization.
We are writing a spec. Math has all the space we could want. :D |
Sure, but it sounded like an argument for not requiring "no overlap", because if you can read all of the What you want spec-wise, IMO, if you're going down that route, is that |
The argument for not allowing overlap is to give implementations (codegen, Miri) more freedom. But at the same time the spec should be as simple as possible to make reasoning as simple as we can make it.
I am not sure what you have in mind, but it sounds a lot more complicated than even what I proposed.^^ I'd rather separate concerns. There is no reason that |
The spec is not realistically implementable without that "freedom", so I can't think of it as "a nice thing on top". Alright, "value computation" doesn't have to mention In terms of being able to reason about possible optimizations/lowerings of MIR constructs, I highly dislike the possibility of more than one Ignoring unsizing coercions (which we could make its own statement or rely on a shim etc.), the only situations which ever need to require non-overlap, involve bulk copying of data, either to create an aggregate, or just between two And in my experience, optimizing away bulk copies, even if only ones between locals (let alone anything involving alias analysis), is fundamentally much harder than anything SSA-like over primitives, because there is no way to reason about any one past value, without reintroducing more of the bulk copies you're trying to remove. I mean, it's literally graph coloring, heh. So you can see why I don't want that kind of NP "conflict reasoning" where avoidable. Or maybe MIR suffers from the LLVM problem of being too high-level for its own good, just at a different level, and/or I shouldn't care about non- |
Alright, talking to @RalfJung in PM a bit made me realize there is a path forward in terms of making MIR more optimization-friendly without complicating the spec. In terms of having "two MIRs", there's more than one way:
So for example, if we wanted to start using DAGs for pure operations involving primitives, we might have something like this in "rustc MIR":

  x     y
╭─┴─╮ ╭─┴─╮
╰(*)╯ ╰(*)╯
  ╰─(+)─╯
z =  ╯

(I could name the nodes, but I wanted to emphasize the DAG nature instead) That would expand (or "serialize") into this "spec MIR":

tmp0 = x
tmp1 = Mul(tmp0, tmp0)
tmp2 = y
tmp3 = Mul(tmp2, tmp2)
tmp4 = Add(tmp1, tmp3)
z = tmp4

(6 Note that even if you had rustc use MIR that looked like this (which I think it actually does today, or only in some situations), it would produce the same codegen, because of all the Am I understanding your position on this correctly, @RalfJung? |
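The expansion from a DAG into straight-line temporaries can be sketched in a few lines. This is only an illustration, assuming a hypothetical `Node` type and `serialize` function (neither exists in rustc); nodes live in a slice and refer to their inputs by index, and every node is emitted as its own temporary, in post-order.

```rust
/// A node in a hypothetical DAG of pure primitive operations.
/// Operands are indices into the slice that holds the DAG.
#[derive(Debug, Clone, Copy)]
enum Node {
    /// A read of a local, e.g. `x`.
    Local(&'static str),
    Mul(usize, usize),
    Add(usize, usize),
}

/// Serialize the DAG rooted at `root` into one statement per node,
/// followed by the final assignment to `dest`.
fn serialize(nodes: &[Node], root: usize, dest: &str) -> Vec<String> {
    // Post-order walk; `temp_of[i]` remembers which temporary holds node `i`,
    // so a node reached through several edges is only emitted once.
    fn visit(
        nodes: &[Node],
        idx: usize,
        temp_of: &mut Vec<Option<usize>>,
        out: &mut Vec<String>,
    ) -> usize {
        if let Some(t) = temp_of[idx] {
            return t;
        }
        let rhs = match nodes[idx] {
            Node::Local(name) => name.to_string(),
            Node::Mul(a, b) => {
                let ta = visit(nodes, a, temp_of, out);
                let tb = visit(nodes, b, temp_of, out);
                format!("Mul(tmp{}, tmp{})", ta, tb)
            }
            Node::Add(a, b) => {
                let ta = visit(nodes, a, temp_of, out);
                let tb = visit(nodes, b, temp_of, out);
                format!("Add(tmp{}, tmp{})", ta, tb)
            }
        };
        // Each emitted statement defines exactly one temporary, so the
        // statement index doubles as the temporary's number.
        let t = out.len();
        out.push(format!("tmp{} = {}", t, rhs));
        temp_of[idx] = Some(t);
        t
    }

    let mut temp_of = vec![None; nodes.len()];
    let mut out = Vec::new();
    let t = visit(nodes, root, &mut temp_of, &mut out);
    out.push(format!("{} = tmp{}", dest, t));
    out
}
```

With the DAG `[Local("x"), Mul(0, 0), Local("y"), Mul(2, 2), Add(1, 3)]`, root `4` and destination `"z"`, this produces the same six statements as the "spec MIR" listing above.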
Let me see if I can rephrase things to make sure I'm following. First, @eddyb was saying that they wish to be sure that they can copy bytes from the source to the destination without any intermediates in codegen if necessary (i.e., for non-scalar values). One way to do that would be to define the "spec" to carry around a "destination". But @RalfJung proposed instead that we impose the rule that the source/destination cannot overlap and define the spec in terms of loading a value, and then show that this permits the implementation the freedom to do the more optimal thing. Next the question was raised whether we want more than one internal MIR, or if we just want to have "rustc MIR" so long as it has a canonical conversion to "spec MIR". It seems like the latter is a pretty standard construct and quite reasonable. One caveat is that I could also imagine that we might want to have MIR construction generate things (e.g., the false edges that borrowck relies on) that we eventually strip out or compile down to simpler things (but perhaps after borrowck). We already do this for If those things wind up being important to "spec MIR", of course, and we wish for miri to view them, then we either have to retain them throughout optimizations or to execute miri on "pre-optimization" code. |
Not really... I am not saying the compiler has to make any analysis like that, all I am saying is that we should give it the freedom to do so. I think it is a mistake to make the spec "maximally tight" wrt any given implementation. We should leave more freedom where it makes sense so that we can adjust our implementation strategy in the future, or add more optimizations.
There are no Places in Operands -- as I (think I) said, that's just a Miri implementation detail. Well, actually, in a "really proper" interpreter
These are good points. I agree that adding the "overlap" rule as something like an afterthought is ugly, and I would prefer if it would arise more naturally. I just have not found a way to do that, given my constraints of also keeping the evaluation of places and values independent of each other.
So, basically "C-style sequencing points"-light? I am not sure if that's really less ugly than the "overlap" rule...
Yes that sounds very reasonable.
Agreed. :)
I... think so. I am not entirely sure what your proposed lowering from "rustc MIR" to "spec MIR" is relative to the "overlap" rule, but I agree with your high-level statements and that is always a good start. :D |
@eddyb as a follow-up to "it would be better for non-overlap to arise naturally": One alternative I thought of is to define But as far as I can see this does not naturally give rise to a non-overlap rule either, and it fails to explain why padding is "lost" on typed copies and why copying values that violate the validity invariant is UB, so I ditched this in favor of the simpler, first-order approach of directly representing the value. You seem to think that this "destination-passing style" solves the overlap problem, could you explain how? The evaluation of places and operands would still perform a particular sequence of memory operations in a particular order, there is no UB if any of these operations overlap. |
To be clear, I meant "syntactically", in MIR, So to give a perhaps more explicit example, I would think that a borrow ( That's probably less relevant given all the other stuff that has been discussed since, but I wanted to clarify that I wasn't talking about the dynamic semantics.
Glad that proposal makes sense! (even if it's less necessary now that I grasp the overall picture better) I agree there is a potential defined outcome of overlaps (after all, Something we could do is use To answer your question more directly: even with destination-passing, if everything is defined in terms of "byte sequences", spec-wise something would probably have to say "destination can't overlap with any operands" or "first thing we do is write
I was hoping to get that across with the example in #68364 (comment), but here it is: "expand constructs (which don't want to take advantage of overlap), into multiple statements, mentioning at most one non-temporary |
I am not just thinking of a potential outcome. Such a "destination-passing style operand/value" does not have to do the entire copy in a single go -- in fact, considering things like padding, it seems reasonable to assume that the many fields of a struct are all copied separately. That means that even if So this, too, would require some explicit additional ad-hoc clause that determines the "overall memory range" read by the operand/value, and a clause saying that this must not overlap with the destination place. In other words, this is just as ad-hoc and hacky as the variant I proposed (the one you didn't like).
We don't have undef and poison, we have one thing that matches poison (Miri calls it
I don't understand the goal of this -- even if codegen does not want to take advantage of the overlap rule for some constructs, there is no harm in applying the rule anyway. It's not an if-and-only-if rule. |
According to @eddyb, discussion at #71005 (comment) is relevant here. In particular, maybe whatever we use to model "return place must not alias with anything" could also be used to model "left-hand side of assignment must not alias with anything". |
Inspired by #71117 and rust-lang/miri#1330, here is another possible semantics for MIR assignment
This avoids explicitly talking about "overlap" or values having location in memory, i.e., this entirely maintains that the only thing you can do with a place expression is evaluating it to a place, and the only thing you can do with a value expression is evaluating it to an (abstract) value. The fact that Miri avoids explicitly manifesting that abstract value is entirely hidden. If evaluating the value "overlaps" with the place, the fresh tag we added will get removed, which means the write at the end will cause UB. Assignment always terminates, so there is no way to avoid the UB if any overlap happens (thus, we do not need to add protectors). This is in fact eerily similar to formal models of C sequence points that I have seen before.
I think this latest proposal almost achieves that. Evaluating an assignment uses "evaluate the RHS to a value" as an opaque step. The step does not happen before anything specific to assignments, but that seems fine to me -- this is still compositional, even if composition is not a simple |
@RalfJung that model seems very elegant |
@RalfJung this is possible, but this model is slightly more restrictive than I think it needs to be. In particular, it doesn't only affect the computed places on the RHS, but also their inputs. Because of this, I'd suggest instead this model:
Specifically, this allows things like |
This is non-compositional. The RHS is a value expression. I am rather opposed to anything that treats this not as an opaque value expression that is being evaluated to a value. In particular, we should not have to syntactically look "into" that expression -- that's just a disaster to reason about. That said, @tmandry points out that we generate things like |
For calls the LHS and RHS have to be disjoint lest we miscompile them with all existing backends. I think @tmandry's example is a bad example. It only happens because of a special case for AddAssign for integers and floats (where it doesn't hurt) and can easily be fixed using an extra temporary (which doesn't have any runtime cost with cg_clif and I think also with cg_llvm even without optimizations). |
Yeah, this is a good point. I've thought about this a bunch, and have a new suggestion that I think is compatible with existing code and not too bad to reason about. For rvalues Distinguishing those three rvalues might seem arbitrary, but I'd actually like to argue that it is not. Fundamentally, when we worry about overlap we are concerned with the implementation running into a very particular type of bug: It computes one part of the value first, stores it to the place, and then goes and computes another part of the value which is now incorrect because the store aliased. Importantly, this type of bug is only ever possible if the value can be computed in a piecewise fashion. For example, an implementation might do the copying for a Furthermore, although this suggestion might technically be considered non-compositional, I do not think that this case is particularly interesting or concerning. If we really wanted to, we could recover the compositional nature by splitting the rvalue enum into two parts and having two assign statement kinds to clearly show that the semantics differ. Of course I don't actually think we should do this - right now, we can recover the same information by checking which variant the rvalue has. |
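The "store one part, then read another part that the store already clobbered" bug can be reproduced in safe Rust on a plain byte buffer. This is a toy model under stated assumptions: memory is a `[u8]` slice, places are byte offsets, and the function names `copy_via_value`, `copy_piecewise` and `ranges_overlap` are made up for this sketch. It is not how any rustc backend is implemented.

```rust
/// Copy `len` bytes from offset `src` to offset `dst` by first loading the
/// whole value into a temporary and only then storing it.
fn copy_via_value(mem: &mut [u8], dst: usize, src: usize, len: usize) {
    let value: Vec<u8> = mem[src..src + len].to_vec();
    mem[dst..dst + len].copy_from_slice(&value);
}

/// Copy front to back, one byte at a time, with no temporary.
/// This is a piecewise implementation: each store happens before the
/// remaining loads, so an overlapping destination can clobber the source.
fn copy_piecewise(mem: &mut [u8], dst: usize, src: usize, len: usize) {
    for i in 0..len {
        mem[dst + i] = mem[src + i];
    }
}

/// True if `dst..dst+len` and `src..src+len` share at least one byte.
fn ranges_overlap(dst: usize, src: usize, len: usize) -> bool {
    dst < src + len && src < dst + len
}
```

For the buffer `[1, 2, 3, 4, 0, 0]` with `src = 0`, `dst = 2`, `len = 4`, the value-based copy gives `[1, 2, 1, 2, 3, 4]` while the piecewise copy gives `[1, 2, 1, 2, 1, 2]`. When the ranges do not overlap, both functions agree, which is why a non-overlap requirement lets an implementation pick either strategy.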
This issue is about MIR assignments. Calls are not MIR assignments, they are a totally separate language construct ( |
I think a "canonical" thing to do here would be to have both kinds of |
Indeed. |
@bjorn3 (sorry for picking on you, I never know who to ask codegen questions to). To the extent that you know, are current implementations of codegen compatible with the above semantics? I would of course go check this myself before I open a PR that suggests documenting these as the semantics |
Cg_ssa and cg_clif both use memcpy for assignments of values not stored as SSA values I believe. Memcpy doesn't allow overlapping. |
The version I'm referring to requires MIR to be non-overlapping for |
In that case I think current codegen is compatible with your proposal. |
The person you're referring to as @tmandry is, in fact, @tmiasko. |
Fix Dest Prop

Closes rust-lang#82678, rust-lang#79191.

This was not originally a total re-write of the pass but it has gradually turned into one. Notable changes:

1. Significant improvements to documentation all around. The top of the file has been extended with a more precise argument for soundness. The code should be fairly readable, and I've done my best to add useful comments wherever possible. I would very much like for the bus factor to not be one on this code.
3. Improved handling of conflicts that are not visible in normal dataflow. This was the cause of rust-lang#79191. Handling this correctly requires us to make decisions about the semantics and specifically evaluation order of basically all MIR constructs (see specifically rust-lang#68364 and rust-lang#71117). The way this is implemented is based on my preferred resolution to these questions around the semantics of assignment statements.
4. Some re-architecting to improve performance. More details below.
5. Possible future improvements to this optimization are documented, and the code is written with the needs of those improvements in mind. The hope is that adding support for more precise analyses will not require a full re-write of this opt, but just localized changes.

### Regarding Performance

The previous approach had some performance issues; letting `l` be the number of locals and `s` be the number of statements/terminators, the runtime of the pass was `O(l^2 * s)`, both in theory and in practice. This version is smarter about not calculating unnecessary things and doing more caching. Our runtime is now dominated by one invocation of `MaybeLiveLocals` for each "round," and the number of rounds is less than 5 in over 90% of cases. This means it's linear-ish in practice.

r? `@oli-obk` who reviewed the last version of this, but review from anyone else would be more than welcome
In today's MIR, an indirect assignment like `*p = *q;` is similar to, but not exactly the same as, going through a temporary (`tmp = *q; *p = tmp;`).

The differences are:
- `tmp` isn't used elsewhere, allowing codegen to treat it like an SSA value, resulting in `store(p, load(q))`, which is also what `*p = *q` codegens to
- `tmp` must be in memory
- `*p = *q;` is UB (AFAIK) if `p..p+size` overlaps `q..q+size`

For the purposes of this discussion, a primitive is a scalar (`bool`, `char`, integer, float, or pointer/reference).

Scalar pairs likely also should/need to be included, due to how easy they are to support in any code that already handles scalars, and also due to their use in wide pointers/references.

What's interesting about primitives, though, is that some kinds of `Rvalue`s (the RHS of the assignment) always produce primitive values, because they're primitive operations.

The `Rvalue` variants which are always primitive, today, are:
- `Ref` (`&T` / `&mut T` - may become dependent on custom DSTs in the future)
- `AddressOf` (`*const T` / `*mut T` - may become dependent on custom DSTs in the future)
- `Len` (`usize`)
- `Cast`, other than unsizing (scalar)
- `BinaryOp`, `UnaryOp` (scalar, or maybe also vector)
- `CheckedBinaryOp` (pair of integer and `bool` - only if we consider scalar pairs to be primitive)
- `NullaryOp(SizeOf)` (`usize`)
- `NullaryOp(Box)` (`Box<T>`)
- `Discriminant` (integer)

Which leaves these variants as potentially relying on memory operations to write the result:
- `Use` (any type, one copy)
- `Repeat` (`[T; N]`, `N` copies)
- `Cast`, specifically unsizing (any type implementing `CoerceUnsized`, per-field copies)
- `Aggregate` (any ADT, per-field copies)

If we want to remain conservative, we could ignore types for the latter, and just assume that the destination of the assignment cannot overlap any memory read in the `Operand`s of the `Rvalue`.

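The split between the two lists above can be written down as a small classification function. The sketch below uses a deliberately simplified, hypothetical `RvalueKind` enum that only mirrors the variant names mentioned in this issue; it is not rustc's actual `Rvalue` definition, and `is_always_primitive` / `may_need_non_overlap` are made-up helper names. Whether scalar pairs count as primitive is left as a parameter, since the issue leaves that open.

```rust
/// Hypothetical, simplified stand-ins for the MIR concepts named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CastKind {
    Unsize,
    Other,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NullOp {
    SizeOf,
    Box,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RvalueKind {
    Use,
    Repeat,
    Ref,
    AddressOf,
    Len,
    Cast(CastKind),
    BinaryOp,
    UnaryOp,
    CheckedBinaryOp,
    NullaryOp(NullOp),
    Discriminant,
    Aggregate,
}

/// True for the variants the issue lists as always producing a primitive.
/// `CheckedBinaryOp` yields an (integer, bool) pair, so it only qualifies
/// if scalar pairs are considered primitive.
fn is_always_primitive(rv: RvalueKind, scalar_pairs_are_primitive: bool) -> bool {
    match rv {
        RvalueKind::Use
        | RvalueKind::Repeat
        | RvalueKind::Aggregate
        | RvalueKind::Cast(CastKind::Unsize) => false,
        RvalueKind::CheckedBinaryOp => scalar_pairs_are_primitive,
        RvalueKind::Ref
        | RvalueKind::AddressOf
        | RvalueKind::Len
        | RvalueKind::Cast(CastKind::Other)
        | RvalueKind::BinaryOp
        | RvalueKind::UnaryOp
        | RvalueKind::NullaryOp(_)
        | RvalueKind::Discriminant => true,
    }
}

/// The conservative rule: anything that is not always primitive may be
/// written with memory operations, so assume the destination must not
/// overlap memory read by its operands.
fn may_need_non_overlap(rv: RvalueKind, scalar_pairs_are_primitive: bool) -> bool {
    !is_always_primitive(rv, scalar_pairs_are_primitive)
}
```

Keeping the match exhaustive (no catch-all arm) means that adding a new variant forces a decision about which side of the split it belongs to.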
We could even cement the distinction by moving the always-primitive operations into a new `PrimOp` enum, and/or move the other `Rvalue`s to their own statements (e.g. introduce `Copy(*p, *q)`), but that's more aesthetic than anything for the time being.

At the very least, we should probably document these differences, and make sure that `miri` only allows overlaps in the cases we don't consider UB (either abstractly, or due to our choice of codegen).

Another interesting aspect of the always-primitive ops is that they're "pure functions" of their operands (other than `NullaryOp(Box)`, I suppose, but that could be replaced with a call to a lang item returning a `Box<MaybeUninit<T>>`, instead).

This means that if we wanted to, we could replace some of the intermediary locals with a `PrimOp` DAG, a bit like SSA but without φ (`phi`) nodes or a strict instruction stream.

All of the necessary ordering would still happen at the statement level (so this is nowhere near as complex as VSDG), but we might see some benefits in scalar-heavy code.

Asides aside, cc @RalfJung @rust-lang/wg-mir-opt
The latest proposal is here