-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unions 1.2 #1897
Unions 1.2 #1897
Conversation
- `Eq` (but not `PartialEq`), supported for unions with `Eq` fields. | ||
|
||
Since accessing union fields reliably requires extra knowledge, traits trying to | ||
do it (e.g. `PartialEq`) cannot be derived automatically. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean one cannot derive PartialEq
alone? I don't see how can you derive Eq
without deriving PartialEq
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
derive(PartialEq)
needs to actually generate code while derive(Eq)
is just an assertion like derive(Copy)
. Otherwise #1897 (comment)
u = U { a: NoisyDrop }; // Silence | ||
} | ||
} | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens when we write to an inactive field with Drop
, UB due to a read? Unresolved question?
union U {
a: NoisyDrop1,
b: NoisyDrop2,
}
fn main() {
unsafe {
let mut u = U { a: NoisyDrop1 };
u.b = NoisyDrop2; // calls `NoisyDrop2::drop(&mut u.b)` (UB)? or something else?
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calls NoisyDrop2::drop
if u
contains a valid value of NoisyDrop2
, UB due to a read otherwise.
I will add this example to the text.
used to introduce a union declaration. A declaration `fn union() {}` will not | ||
produce such an error. | ||
The key property of unions is that all fields of a union share common storage. | ||
As a result writes to one field of a union can overwrite its other fields, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can overwrite
will overwrite
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we write into an empty field, e.g. ()
, it will not overwrite anything (#1897 (comment)).
- `Copy`, supported only for unions with `Copy` fields | ||
- `Clone`, supported only for unions implementing `Copy`, which can be cloned | ||
trivially | ||
- `Eq` (but not `PartialEq`), supported for unions with `Eq` fields. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect Eq
to be derivable just fine for union
that implements PartialEq
manually. This requirement seems overly restrictive to me – and also doesn’t really match behaviour of derive(Eq)
elsewhere IIRC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the wording is bad? This
I would expect Eq to be derivable just fine for union that implements PartialEq manually.
is exactly the case.
more detail). | ||
|
||
Write to a union field can potentially overwrite contents of its other fields. | ||
Write to a union field should not modify any bits outside of the modified field |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What’s the point of this guarantee?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To support this.
(This guarantee was in the original RFC as well.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This guarantee only makes sense for specified layouts, then.
u.a = NoisyDrop; // BOOM! | ||
u = U { a: NoisyDrop }; // Silence | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs spec wrt Drop when writen to an inactive fragment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
### Borrow checking | ||
|
||
If struct field is borrowed (mutably or immutably), then the struct as a whole |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/struct/union/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The next sentence is about unions, this one is actually about structs.
Your
|
|
||
Either of the following: | ||
|
||
- primitive scalar type (including references) or `str`, or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably don't need the , or
at the end of each item here since it's a list. I though there was a markdown rendering error at first.
Re the technical details, this is wonderful. I didn't follow the original thread very closely, and had no idea this much static analysis was in the works. There's a lot of "unions are like structs" phrasing however, which makes me worried. Don't get me wrong, I'm perfectly happy with the current syntax of unions as a intentional legacy homage / enabler of basic "copy-paste" FFI, but I fail to see how the semantics of union are at all like structs, except in that they are both "aggregate things" (and so are enums). To anyone learning Rust before C, I think this would be quite confusing. |
This feels... weird, as an RFC. There's a lot of language that makes me feel like it is supposed to be written in a standardese-y sort of way, and then it says stuff like
which... okay, I get what you're trying to go for here, but it's not well specified at all. I'd much rather go the C++ route of saying you can read the common initial subset, but otherwise you get unspecified behavior. |
And I realize most of that is due to the unsafe code guidelines team not having finished very much, but... this feels a bit like overstepping the bounds of guarantees. At most, we should guarantee what C or C++ guarantee, and we may add more guarantees later as part of the unsafe code guidelines team. |
And I realize it may have been in the original, but in the cases where the unsafe {
match u {
MyUnion { f1: 10 } => { println!("ten"); }
MyUnion { f2 } => { println!("{}", f2); }
}
} will always read an invalid member in cases where I'm also not certain of the tagged union case. I'd assume that matchers (conceptually) read the whole value, and only then test whether the thing holds. Therefore, if you match on a tagged union like the following: unsafe {
match v {
Value { tag: I, u: U { i: 0 } } => true,
Value { tag: F, u: U { f: 0.0 } } => true,
_ => false,
}
} In each branch, you're reading the entire thing, and then checking whether tag is |
The section "Active fragment and active field" absolutely squicks me out. This is nowhere near the rules that I would write: There should be one active variant of a union. There are two ways to change the currently active variant; The general case: A specific case, for trivially destructible types: Writes into a non-active variant (like |
If I understand what you mean by that correctly (which I may well not have): making that UB would break well-established use cases for unions. |
@joshtriplett If the current active variant is |
Also, this RFC requires an editor to go over the language used. It's unclear and/or ambiguous in some parts. |
@ubsan I agree about the "trivially destructible" part. However, I think this needs to work: #![feature(untagged_unions)]
struct X {
a: u32,
b: u32,
}
union U {
full: u64,
halves: X
}
fn main() {
let mut value = U { full: 0x0123456789ABCDEF };
unsafe { value.halves.a = 42; }
println!("{:#x} {:#x}", unsafe { value.halves.b }, unsafe { value.full });
} Runnable playground version: https://is.gd/GNHHgL Rough sketch of what makes that safe: every field impls Copy, none implement Drop, nothing reads padding, and all the types involved are "total" (every possible bit value has a safe interpretation without UB). |
@ubsan Both the C and C++ standards allow reading from the "wrong" field of a union as long as the memory contains a valid bit pattern for the type in question. (Whether it does depends on type representation, which is implementation defined, but that's no different from the case in this RFC.) There's no good reason for Rust to be stricter than C on the matter, especially given that the feature's main use case is FFI. The common initial subset rule in C that you allude to is separate, and kind of a mess / not actually implemented by compilers, but has to do with aliasing pointers to the field types in the union, not accesses through the union type itself. Though even there, I can't imagine a world where the Rust unsafe code guidelines end up forbidding pointer transmutes, given the amount of existing code that relies on them. Implied destructor calls should be treated no differently from any other call: valid if the memory has a valid bit pattern for the type being destructed. However, your point about match patterns is well taken. Also, there is one area where this RFC would make stronger guarantees than C: the effect of writes to union fields on unused bits (i.e. bits that correspond to other fields but not the field in question). This RFC says they're unaffected; C says they take unspecified values. It may be worth weakening the guarantee to match C for now. |
Sigh, again I have to correct myself about the C++ standard. It does arguably prohibit reading anything other than the common initial subsequence, a concept it uses in a totally different way from the C standard… Though IMO only arguably. In any case, C explicitly allows it, and in practice it's common for code written in both language to depend on it as one of the "blessed" ways to perform bitcasts (the other is memcpy). |
@comex I'll be honest, I care very little about C. My interest is in C++. Therefore, I write from a C++ mindset. |
@joshtriplett mmmmmmmmhhhhheeehghhgiheghe I can... kinda see how that might be useful, but it at least should be implementation defined. I'm made uncomfortable by it, but I can see the use for things like microsoft's // note: https://msdn.microsoft.com/en-us/library/windows/desktop/aa383713(v=vs.85).aspx |
Specifically for people with C++ mindset I wrote the Type punning section and listed examples that should work and should not be UB. This kind of stuff is used a lot in low-level code (emulation of hardware or interaction with it). |
Discussing stuff like compatible initial sequences and other layout compatibility at type level is the dead end IMO. |
@ubsan Put it another way: The main reason not to allow it would be if a compiler can make optimizations based on treating it as UB, which matters if either (a) it can improve performance of Rust code, or (b) Rust may want to use existing compiler backends that bake in those optimizations. The fact that ~every C++ compiler is also a C compiler, along with the fact that union type punning is common in C++ code too (even if possibly unsanctioned by the standard), strongly suggests that (b) does not apply. (a) is dubious to start with for a FFI- and low-level-focused feature, and is even less likely given that the C standard is mostly designed to maximize optimization potential (with exceptions that don't apply here). Further, as I said, allowing type punning through unions pretty clearly seems to be 'easier' than allowing it through pointer transmutes, yet the latter are effectively guaranteed in Rust at this point due to existing code depending on them. I suppose another reason to forbid it could be to allow sanitizers to complain about it, which could help code that uses unions for general storage purposes (not FFI) and doesn't intentionally type pun. But for that purpose, accesses of inactive variants should be prohibited altogether, without exceptions for initial common subsequences and such; and such a rule should be possible to opt out of, or more probably should be opt-in. |
“Trivial destructible” relies on
All point to that the restriction will be dropped if the implementation is fixed. I think we should spell out the assumption that |
Type punning through unions has to be legal enough for stuff like |
Unsafety in pattern matching, unlike unsafety in expressions, is a new thing (unions are recently implemeted and unsafe borrowing of packed struct fields is still unimplemented) so it doesn't have any conveniences like |
Mostly limitations of current const eval, it cannot reinterpret memory. |
@kennytm |
IGNORE: Missed the copy impl. |
I feel bad about how little attention this has gotten from the lang team so far. Could use some help, so I'm raising the bat signal: @joshtriplett, @whitequark, @solson, @scottmcm, @Ixrec, any of you have time to catch up on this thread and summarize the status? |
I have absolutely no experience with anything FFI-related or union-related, but here goes nothing... With regards to making type punning legal, the high-level design presented here seems like the correct one. @joshtriplett's and @retep998's comments make it pretty clear that we need to "make unions like The problems I see are:
To expand on 2 a bit:
Currently, I think the type punning issue would be better off as its own self-contained RFC. In particular, I really liked the "Requirements and desirable properties" section of this PR, but the fact that it's 2/3rds of the way down means that a lot of the RFC feels unmotivated or doesn't make much sense until you get there. I'd rather see an RFC where "Motivation" describes the LARGE_INTEGER use case, then "Guide-Level Explanation" is mostly the same text as "Requirements and desirable properties", then "Reference-Level Explanation" lists exactly the requirements/guarantees we're adding to support these use cases and nothing else. Right now I think I know what those requirements are, but I'm not entirely sure because they're scattered throughout the RFC, and I'm not at all clear on whether they're supposed to apply to all unions or just #[repr(C)] unions. If we do that, the smaller changes like allowing empty unions, .. syntax, making trivially-destructible assignments safe, the clarification that any union field becoming (un)initialized makes all sibling fields (un)initialized, and the clarification that dynamically-sized types are not allowed in unions (those are all the smaller changes in this PR I'm confident are intentional) seem like they'd only be a few dozen lines in total, so we can probably handle that all of that as a single amendment. As far as I know, the only objection to any of them is the semver compatibility hazard pointed out on the union tracking issue, which seems like something we can hash out here. Smaller questions, some of which may become irrelevant if we do break this up:
|
I feel bad about how little attention this RFC has gotten from me in the recent months. I still need to incorporate the feedback given so far and address unresolved questions. |
@petrochenkov Honestly, I do feel like the feedback about this feeling like a full rewrite in a way that makes it hard to compare to the original is accurate. It's difficult to walk through this RFC and figure out what has changed, what is the same but clarified (e.g. things you discovered while implementing, or nuances that came up later), and what is just rewritten/reworded without any clear rationale for doing so. I feel like those are three independent goals, and doing them all in one giant change/rewrite makes it a challenge to review. I do absolutely think there's value in clarifying/strengthening the specification to match the code; that's one goal. I also think there's value in adding additional union features that haven't yet been implemented; that seems like it should be a separate step, and reviewed separately. |
I'm nominating this for discussion in the @rust-lang/lang meeting. I propose that we assign someone to do a deep read of this document and summarize the pros/cons. @petrochenkov, I'm sorry for the lack of attention this RFC has gotten. If it helps, I think the primary problem is that you did too good of a job! I printed it out once and found that it was going to take me some time to digest and never got back to it. |
@rfcbot fcp postpone We talked this over in the @rust-lang/lang meeting, and the sense was that this RFC is a great start, but it's been long enough that we ought to revisit the topic. In particular, the presentation of the RFC as edits and so forth makes it hard to understand precisely what is being proposed, so it might make sense to try and write a "from scratch" version that lays out the underlying philosophical approach to unions (about which there was some debate, iirc) and then digs more down into the details. |
Team member @nikomatsakis has proposed to postpone this. The next step is review by the rest of the tagged teams: No concerns currently listed. Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. |
This text is mostly "from scratch" already, but I guess I'll just drop all the fragments from the old text and make it a new numbered RFC. |
🔔 This is now entering its final comment period, as per the review above. 🔔 |
2 similar comments
🔔 This is now entering its final comment period, as per the review above. 🔔 |
🔔 This is now entering its final comment period, as per the review above. 🔔 |
Oh wow, I had no idea there's a discussion about type punning happening for, like, a year now! This is clearly unsafe-code-guidelines related but somehow it didn't appear on my radar, sorry for that. My two cents: AFAIK, union type punning restrictions in C(++) are closely tied to type-based alias analysis. Essentially, there are two ways to get a "bad" pointer (pointing to data "of the wrong type") In C: Pointer manipulations (like casting or arithmetic), and unions. Both have to be restricted to make type-based alias analysis legal. Rust doesn't (want to) use TBAA, so we shouldn't need any magic type punning rules. |
@RalfJung that seems entirely reasonable to me - although it's more equivalent to |
The final comment period is now complete. |
Closing as postponed. Thanks again @petrochenkov for your extensive work here; we look forward to a revised RFC! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix error. I trust you'll live up to your current expectations.
If anyone finds an error in it, please let me know!
The existing implementation, guarantees and tradeoffs are described in more detail.
Future directions are outlined.
Few corner cases are permitted (empty unions, union patterns with zero fields and
..
).This RFC also provides documentation necessary for stabilization of the feature.
The "Overview" section can be copy-pasted into the book and "Detailed design" into the reference.
Rendered