-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document the current MIR semantics that are clear from existing code #95320
Conversation
I've left a bunch of FIXME's in here. I may try and get to some of them before merging, but they should all be non-critical and can just be added later. The test change has a relevant Zulip conversation Also, I didn't add anything for |
compiler/rustc_middle/src/mir/mod.rs
Outdated
/// pointed to type. | ||
/// | ||
/// **Needs clarification**: What about metadata resulting from dereferencing wide pointers (and | ||
/// possibly from accessing unsized locals - not sure how those work)? That probably deserves to go |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A place stores an address and an optional meta value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is probably the right thing to have. That being said, I'm going to leave this open until I get a chance to more thoroughly go through all of the documentation and deal with DSTs (or probably someone else should do that because I know very little about it)
compiler/rustc_middle/src/mir/mod.rs
Outdated
/// Not all of these are allowed at every [`MirPhase`] - when this is the case, it's stated below. | ||
/// | ||
/// Computing any rvalue begins by evaluating the places and operands in the rvalue in the order in | ||
/// which they appear. These are then used to produce a "value" - the same kind of value that an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be left undefined in which order theybare evaluated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the ordering question to a "needs clarification" section in an effort to only document things which are already decided.
That being said, I don't think this is a good option. First of all, I'm not sure "order is left indeterminate" implies "any attempt on depending on the order is UB" - it might be the case that we instead get two distinct well-defined possibilities. More importantly though, this doesn't feel like a thing that MIRI can implement, which is concerning.
I also can't produce an example, but I would not be surprised if this is actually required for current Rust, since the evaluation order in Rust is well-defined. I can't get MIR building to produce such MIR, but it would at least imply that the temporaries produced by this are required:
pub fn foo(a: &mut i32) -> i32 {
let derived = &mut *a;
*derived + *a
}
// terribly important that they pass the validator. However, I think other passes might | ||
// still see them, in which case they might be surprised. It would probably be better if we | ||
// didn't put this through the MIR pipeline at all. | ||
if matches!(body.source.instance, InstanceDef::Intrinsic(..) | InstanceDef::Virtual(..)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
intrinsics may be codegened in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. I think at that point this check should be refined to verify those bodies that do get codegened only. Even better would be if code elsewhere got changed to stop passing intrinsics that don't get codegened through the MIR validator. cc @oli-obk
compiler/rustc_middle/src/mir/mod.rs
Outdated
/// executed `StorageDead`? How about two `StorageLive`s without an intervening `StorageDead`? | ||
/// Two `StorageDead`s without an intervening `StorageLive`? LLVM says yes, poison, yes. If the | ||
/// answer to any of these is "no," is breaking that rule UB or is it an error to have a path in | ||
/// the CFG that might do this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Miri should be made to match whatever is decided. It is currently very relaxed.
compiler/rustc_middle/src/mir/mod.rs
Outdated
/// | ||
/// **Needs clarification**: We currently require that the LHS place not overlap with any place | ||
/// read as part of computation of the RHS. This requirement is under discussion in [#68364]. As | ||
/// a part of this discussion, it is also unclear in what order the components are evaluated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think order should be left indetermimate so any attempt on depending on the order is UB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(see the comment above - basically the same concerns apply here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think leaving evaluation order indeterminate is a bad idea. We should specify some order eventually, and probably it should be left-to-right. (That's what Miri does.)
The disjointness thing can be enforced without leaving the order indeterminate, using the aliasing model instead. That's better because it can be checked by Miri.
@@ -187,7 +248,7 @@ pub enum TerminatorKind<'tcx> { | |||
/// This allows the memory occupied by "by-value" arguments to be | |||
/// reused across function calls without duplicating the contents. | |||
args: Vec<Operand<'tcx>>, | |||
/// Destination for the return value. If some, the call is converging. | |||
/// Destination for the return value. If none, the call necessarily diverges. | |||
destination: Option<(Place<'tcx>, BasicBlock)>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must not alias the place given to any Operand::Move passed as argument. All codegen backends depend on this property to allow passing Operand::Move by reference in case of a type not fitting inside registers without having to male a copy on the stack or having to restrict the callee to not mutate the value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added in a mention of that in the "needs clarification" section above. Based on the status of the linked issue, it seems that it is still unclear how we are justifying that optimization
cc @RalfJung - want to check that everything written here (of specific concern is the place and operand stuff) is at least compatible with your model of MIR |
compiler/rustc_middle/src/mir/mod.rs
Outdated
/// Assign statements roughly correspond to an assignment in Rust proper (`x = ...`) except | ||
/// without the possibility of dropping the previous value (that must be done separately, if at | ||
/// all). The *exact* way this works is undecided. It probably does something like evaluating | ||
/// the LHS and RHS, and then doing the inverse of a place to value conversion to write the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I think "place to value conversion" is a terrible term, though I know it is standard in C. This operation accesses memory! It shouldn't be described as if it was a simple cast.
So I'd say something more along the lines of: it evaluates the LHS and RHS to places, loads a value from the place denoted by the RHS, and then stores that value into the place denoted by the LHS.
But also, this doesn't even do place-to-value conversion? The RHS is evaluated to a value, and that is stored to the place denoted by the LHS.
/// **Needs clarification**: Is it permitted to have two `StorageLive`s without an intervening | ||
/// `StorageDead`? Two `StorageDead`s without an intervening `StorageLive`? LLVM says poison, | ||
/// yes. If the answer to any of these is "no," is breaking that rule UB or is it an error to | ||
/// have a path in the CFG that might do this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think CFG paths are relevant. What's relevant is the actual sequence of statements that are executed when the function runs. There's basically a boolean flag dead-or-alive for each local that starts out 'dead' for locals with annotations, and that is set appropriately by these annotations. Accessing (evaluating the place expression) of a local is UB if it is currently 'dead'.
Actually it's more than that, each StorageLive allows the local to be put at a different location in memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think CFG paths are relevant
Well, because of the MaybeStorageLiveLocals
in the validator, this is already the case. Certainly CFG paths should not be relevant in the spec/miri, but the docs here have to also mention what is required by the validator, which might have a stronger opinion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the storage check currently present in the validator should be considered to be merely a debugging aid. It is possible to write unsafe code that uses local when it does not have storage. With additional optimizations present such code would fail the check. When we get to that point the check should be disabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this is a good point. With the current setup, we can't turn *r
into x
even if we know that r = &mut x;
, because that might turn UB into an ICE. I'll mention this in the documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Come to think of it, we should probably mention that we often don't want to "remove" UB in MIR optimizations in general, since keeping it visible to the codegen backend is important.
For example, we don't want to lower switchInt [0 => bb3] otherwise: bbUnreachable
to goto bb3
, since that keeps LLVM from knowing about the impossible condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, these kinds of things should definitely be documented, but imo not here. I do want to start an organized way of defining "canonicalized MIR" at some point, because I think that will be increasingly important, and this seems like the kind of thing to include there
compiler/rustc_middle/src/mir/mod.rs
Outdated
/// complicated. They are only legal on parent places that are references, pointers, or `Box`. A | ||
/// `Deref` projection begins by creating a value from the parent place, as if by | ||
/// [`Operand::Copy`]. It then dereferences the resulting pointer, creating a place of the | ||
/// pointed to type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this operation can cause UB: the crated place must be dereferencable and suitably aligned, as determined by its type.
compiler/rustc_middle/src/mir/mod.rs
Outdated
/// One possible model that I believe makes sense is that "part of memory" is actually just the | ||
/// address of the beginning of the referred to range of bytes. For sized types, the size of the | ||
/// range is then stored in the type, and for unsized types it's stored (possibly indirectly, | ||
/// through a vtable) in the metadata. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what Miri does, and I think it is the right model.
compiler/rustc_middle/src/mir/mod.rs
Outdated
/// | ||
/// [value-def]: https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/value-domain.md | ||
/// | ||
/// The most common way to create values is via a place to value conversion. A place to value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah now this is the right place for the rant that I put earlier. ;)
I don't think we should all this a "conversion". That sounds like an as
cast, like just putting something into a different representation. That's not at all what happens though. This is a load from memory, and should be described as that.
@bors r+ |
📌 Commit 8732bf5 has been approved by |
Document the current MIR semantics that are clear from existing code This PR adds documentation to places, operands, rvalues, statementkinds, and terminatorkinds that describes their existing semantics and requirements. In many places the semantics depend on the Rust memory model or other T-Lang decisions - when this is the case, it is just noted as such with links to UCG issues where possible. I'm hopeful that none of the documentation added here can be used to justify optimizations that depend on the memory model. The documentation for places and operands probably comes closest to running afoul of this - if people think that it cannot be merged as is, it can definitely also be taken out. The goal here is to only document parts of MIR that seem to be decided already, or are at least depended on by existing code. That leaves quite a number of open questions - those are marked as "needs clarification." I'm not sure what to do with those in this PR - we obviously can't decide all these questions here. Should I just leave them in as is? Take them out? Keep them in but as `//` instead of `///` comments? If this is too big to review at once, I can split this up. r? rust-lang/mir-opt
…askrgr Rollup of 7 pull requests Successful merges: - rust-lang#95320 (Document the current MIR semantics that are clear from existing code) - rust-lang#95722 (pre-push.sh: Use python3 if python is not found) - rust-lang#95881 (Use `to_string` instead of `format!`) - rust-lang#95909 (rustdoc: Reduce allocations in a `theme` function) - rust-lang#95910 (Fix crate_type attribute to not warn on duplicates) - rust-lang#95920 (use `Span::find_ancestor_inside` to get right span in CastCheck) - rust-lang#95936 (Fix a bad error message for `relative paths are not supported in visibilities` error) Failed merges: r? `@ghost` `@rustbot` modify labels: rollup
This PR adds documentation to places, operands, rvalues, statementkinds, and terminatorkinds that describes their existing semantics and requirements. In many places the semantics depend on the Rust memory model or other T-Lang decisions - when this is the case, it is just noted as such with links to UCG issues where possible. I'm hopeful that none of the documentation added here can be used to justify optimizations that depend on the memory model. The documentation for places and operands probably comes closest to running afoul of this - if people think that it cannot be merged as is, it can definitely also be taken out.
The goal here is to only document parts of MIR that seem to be decided already, or are at least depended on by existing code. That leaves quite a number of open questions - those are marked as "needs clarification." I'm not sure what to do with those in this PR - we obviously can't decide all these questions here. Should I just leave them in as is? Take them out? Keep them in but as
//
instead of///
comments?If this is too big to review at once, I can split this up.
r? rust-lang/mir-opt