Goal: Accept partial initialization + use of records created via such #54987
If there's an invariant of the struct fields you can't use one struct field before all of them are initialized. |
I think @leonardo-m's previous comment is meant to be an argument against accepting the code snippet they quoted. Here are my main responses to that:
|
tagging as NLL-deferred because we know we aren't going to try to implement this as part of the 2018 edition. |
@Centril has an in-progress draft of an RFC here: Centril/rfcs#16 |
Re-triaging for #56754. P-low since any change here deserves a (currently "unwritten" or at least unposted) RFC. |
There are some concerns about the current state of this in #60450 |
How should we handle yielding from a generator while a local variable is partially initialized? For example:
let mut gen = || {
    let mut x: (i32, i32);
    x.0 = 11;
    yield;
    x.1 = 22;
}
I'd prefer that this wasn't allowed when generators stabilize. It would simplify some layout code and potentially enable more optimizations. It's possible we decide to allow it eventually, but I'd rather not be locked into that decision from the start. Note that I'm fine with allowing this in cases where the local is not stored in the generator (i.e. it is never used after the yield). This is only useful in the case of the second example from the issue description. cc #60889 |
See #63230 (comment) for some discussion of this subject, and in particular the interaction with "uninhabited structs are ZST" (which seems trivial to do if we entirely rule out partial initialization). So, not allowing partial initialization is beneficial not just for generators, but also for type layout in general. |
…ized, r=Centril Make use of possibly uninitialized data [E0381] a hard error This is one of the behaviors we no longer allow in NLL. Since it can lead to undefined behavior, I think it's definitely worth making it a hard error without waiting to turn off migration mode (rust-lang#58781). Closes rust-lang#60450. My ulterior motive here is making it impossible to leave variables partially initialized across a yield (see rust-lang#60889, discussion at rust-lang#63035), so tests are included for that. cc rust-lang#54987 --- I'm not sure if bypassing the buffer is a good way of doing this. We could also make a `force_errors_buffer` or similar that gets recombined with all the errors as they are emitted. But this is simpler and seems fine to me. r? @Centril cc @cramertj @nikomatsakis @pnkfelix @RalfJung
As @RalfJung himself noted in the linked discussion, the proposal seems at least very difficult for anything that works with "generic MIR + substs" rather than monomorphizing, chiefly miri. One could imagine performing the equivalent work every time a function is entered, but the cost is likely considerable. A similar but even more serious problem I see is about unsafe code that performs partial or piecewise initialization manually. For example, adapting the usual
fn construct_pair_inplace(
out: &mut MaybeUninit<(String, !)>,
mk_string: impl FnOnce() -> String,
mk_void: impl FnOnce() -> !,
) {
let ptr = out.as_mut_ptr();
ptr::write(&raw mut (*ptr).0, mk_string());
ptr::write(&raw mut (*ptr).1, mk_void());
}
There are no writes to a "sibling of an uninhabited field" visible here, just taking pointers and offsetting them and passing them to other functions (and if you want to consider |
@rkruppe so under a proposal that makes I am not sure if "the proposal" you refer to is "make structs with uninhabited fields ZST", or "make structs with uninhabited fields ZST and still allow partial initialization in safe code". |
Yes, I think it's a problem, otherwise I wouldn't have brought it up. Well, not literally that one piece of code, more the general pattern of piecewise initialization via out pointer (whether If that kind of code suddenly becomes UB as soon as any component of the overall aggregate being initialized is uninhabited, then that would be pretty uncool. Everyone writing such code would need to remember to add subtle additional checks¹ (even more subtle than the special casing already required for ZSTs in some code, as uninhabited types are even rarer and more obscure), and if they care about still executing the side effects of the initialization of the inhabited parts, they'll need a lot of extra complexity for that too. I don't think that'll work out; instead we'll probably get a pile of unsound code that will rarely get in contact with uninhabited types but blow up every time it does. Keep in mind that it will rarely involve an uninhabited type as obviously as in the above (kind of minimal) example; more commonly it will involve generics or a large (possibly auto-generated) struct/enum. Essentially, "what if one of these types is uninhabited" will have to get a dedicated slot on unsafe code audit checklists.
Edit: ¹ or rule out that any uninhabited types can be involved, but this seems difficult to ensure in most code for reasons outlined above, and in any case still adds an extra burden to those writing that code.
Sorry for being unclear, I meant the latter w.r.t. complexity imposed on miri (and other consumers of generic MIR). |
Now I am confused because the rest of your text applies equally to the former.^^ I think for unsafe code it is okay to put the burden on that code to show that it is not initializing something uninhabited. But maybe I am underestimating the complexity of that. Looking at the generalized version of your example:
fn construct_pair_inplace<T, U>(
out: &mut MaybeUninit<(T, U)>,
mk0: impl FnOnce() -> T,
mk1: impl FnOnce() -> U,
) {
let ptr = out.as_mut_ptr();
ptr::write(&raw mut (*ptr).0, mk0());
ptr::write(&raw mut (*ptr).1, mk1());
}
I think saying this is UB if either |
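For anyone who wants to try the generalized function on a stable toolchain, here is a minimal sketch. It assumes `std::ptr::addr_of_mut!` in place of the `&raw mut` syntax and adds the `unsafe` block that the snippet above leaves out. The `build_pair` helper is hypothetical and only exercises the function with inhabited field types, so it says nothing about the uninhabited case being debated.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Initializes a pair field by field through raw pointers, never creating
/// a reference to the not-yet-initialized tuple.
fn construct_pair_inplace<T, U>(
    out: &mut MaybeUninit<(T, U)>,
    mk0: impl FnOnce() -> T,
    mk1: impl FnOnce() -> U,
) {
    let p = out.as_mut_ptr();
    // SAFETY: `p` points to storage for a `(T, U)`. `addr_of_mut!` computes
    // field addresses without reading them, and `ptr::write` does not drop
    // the (uninitialized) old contents.
    unsafe {
        ptr::write(ptr::addr_of_mut!((*p).0), mk0());
        ptr::write(ptr::addr_of_mut!((*p).1), mk1());
    }
}

/// Hypothetical caller used only for illustration.
fn build_pair() -> (String, i32) {
    let mut slot = MaybeUninit::<(String, i32)>::uninit();
    construct_pair_inplace(&mut slot, || String::from("hello"), || 22);
    // SAFETY: both fields were written by `construct_pair_inplace`.
    unsafe { slot.assume_init() }
}
```

Calling `build_pair()` from `main` returns the fully initialized pair. Nothing in the function body mentions inhabitedness, which is the reason the generic version is hard to special-case.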
Ugh, sorry for all the confusion. I was trying to say "the problem for miri applies only to one of these proposals, the other one regarding unsafe code applies to everything", but code patterns like
(I have misgivings about using this very simplified, if generic, example as a running example, but let's go with it for now.) One question to answer is, how can someone rewrite that function to be sound? There are a few ways I can think of, but none of them are good. One could simply bail out if the overall type is a ZST:
but this omits side effects from
if is_uninhabited {
mk0(); // eval into a temporary for side effects and/or divergence
} else {
ptr::write(..., mk0());
}
if is_uninhabited {
mk1(); // eval into a temporary for side effects and/or divergence
} else {
ptr::write(..., mk1());
}
// ... and so on for all other fields
(I can imagine a variant that doesn't try to preserve all side effects and just preserves divergence behavior, but it'll still need an amount of code linear in the number of fields to figure out which part it needs to call to diverge, assuming the exact kind of divergence matters.) This is... not great. And it's just what programmers would write if they knew to do that. As I said before, I think this is an obscure obligation that most programmers will not get right, or even think to try to handle. It's exactly the kind of pitfall that makes it unreasonably difficult to write sound unsafe code IMO. I am really unsure whether being able to make uninhabited aggregates into ZSTs is worth this. Does anyone know of real instances of performance/memory problems caused by them not being ZSTs? (Keep in mind that only product types are affected, |
That's a very good point! MIR construction would have to be very smart about this. @tmandry but isn't this also a problem for generators? What happens with something like
let x: (String, !) = (String::new(), { yield; panic!() });
In MIR, this looks a lot like partial initialization. So won't it have all the same problems?
So you are saying this function actually is sound under our current semantics? Wow, I think you are right. And it wouldn't be sound any more with new semantics using the following safe client:
let mut x = MaybeUninit::uninit();
construct_pair_inplace(&mut x,
    || 0i32,
    || panic!(),
);
I think you convinced me that we cannot make these types ZST. |
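To make the "safe client" scenario concrete, here is a rough sketch of what a caller can observe when the second initializer diverges. It is an illustration, not code from the thread: it assumes `std::ptr::addr_of_mut!` instead of `&raw mut`, uses `std::panic::catch_unwind` to survive the panic, and deliberately sticks to an inhabited `(i32, i32)` so that it does not depend on how uninhabited aggregates are laid out. The helper name `first_field_after_panic` is hypothetical.

```rust
use std::mem::MaybeUninit;
use std::panic::{self, AssertUnwindSafe};
use std::ptr;

/// Same piecewise initializer as discussed in the thread.
fn construct_pair_inplace<T, U>(
    out: &mut MaybeUninit<(T, U)>,
    mk0: impl FnOnce() -> T,
    mk1: impl FnOnce() -> U,
) {
    let p = out.as_mut_ptr();
    // SAFETY: writes go through raw field pointers into storage owned by `out`.
    unsafe {
        ptr::write(ptr::addr_of_mut!((*p).0), mk0());
        ptr::write(ptr::addr_of_mut!((*p).1), mk1());
    }
}

/// Hypothetical helper: returns whether the call unwound, plus the value
/// found in field 0 afterwards.
fn first_field_after_panic() -> (bool, i32) {
    let mut slot = MaybeUninit::<(i32, i32)>::uninit();
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        // `mk1` diverges, so field 1 is never written.
        construct_pair_inplace(&mut slot, || 11, || panic!("mk1 diverges"));
    }));
    let p = slot.as_ptr();
    // SAFETY: field 0 was written before `mk1` panicked; field 1 is not read.
    let first = unsafe { ptr::read(ptr::addr_of!((*p).0)) };
    (result.is_err(), first)
}
```

The write to field 0 lands before `mk1` runs, so the partially initialized state is real and observable from safe code once the panic is caught. That partial state is what would have no storage to live in if a pair with an uninhabited second field were a ZST.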
Worse, since this too can be made generic, there's really nothing MIR construction can do about it (unless we give up on placement entirely and always evaluate subexpressions into isolated temporaries first, but that's unacceptable IMO). It has to be "fixed up" at monomorphization time, with all the problems that entails. |
Today it's not a problem, because of how we emit the MIR (we assign each piece to a temporary, and then build the aggregate all at once). I asked @eddyb if depending on this was too brittle, and they said yes, it might change in the future. So it seems likely that #63230 was just a stopgap and we'll have to be smarter and/or more conservative in the future about the possibility of partial initialization in generators. |
@rkruppe: Can you elaborate on that? The way I see it, this shouldn't be too complicated. Going back to your example user-written code:
if is_uninhabited {
mk0(); // eval into a temporary for side effects and/or divergence
} else {
ptr::write(..., mk0());
}
if is_uninhabited {
mk1(); // eval into a temporary for side effects and/or divergence
} else {
ptr::write(..., mk1());
}
This is just what we'd want monomorphization/miri to do. Setting aside partial initialization for the moment, we know that the uninhabited type we're 'constructing' can never escape the function (as any code after the creation of an uninhabited type is unreachable). For example, when monomorphizing or interpreting:
let x: (String, !) = (String::new(), { yield; panic!() });
We can simply write the While this would result in some additional complexity in codegen and Miri, it would both simplify and optimize actual Rust code. 'Uninhabited types are zero size' is a much nicer rule than 'Uninhabited types are zero size, except under certain circumstances (despite still being unconstructable).' |
Is there any place we are tracking this specifically for generators? Or is the issue here sufficient?
What you are proposing sounds extremely complicated to me.^^ If that chain of ifs is your proposal for what the MIR would look like, one big issue with that is that it duplicates the initializer expression. Sure, here it is just (FWIW, the test could be just "if size of struct greater 0", because if this is a ZST struct then "copying" the result of the computation into the struct is redundant in any case. We don't need a new "is uninhabited" intrinsic.)
Given that this affects MIR semantics, I think it would also result in additional complexity in any other MIR consumer -- borrowcheck and MIR optimizations, specifically. And making the life of these passes as easy as we can is basically the only reason MIR exists at all. So if they all become too complicated through things like this, MIR failed at its job.
'The size of a struct is the sum of the sizes of its fields and padding' is also a much nicer rule than 'The size of a struct is the sum of the sizes of its fields and padding, except under certain circumstances (despite still being partially initializable).' I would not say 'Uninhabited types are zero size, except under certain circumstances (despite still being unconstructable)' is a rule we currently have, so it seems silly to compare with that. There just is no connection between inhabitedness and size, end of story. So we are trading the 'size of a struct is the sum' rule against the 'uninhabited types are ZST' rule. And note that even with your proposal 'uninhabited types are ZST' will likely not be true. For example, there are proposals to make |
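As a small illustration of "no connection between inhabitedness and size", the sketch below prints a few sizes. It is only a sketch under stated assumptions: `PairC` and `sizes` are made-up names, `std::convert::Infallible` stands in for `!` so that it builds on stable, and only the `#[repr(C)]` result follows from documented layout rules. The size of the plain tuple is whatever the compiler in use chooses and is not a language guarantee.

```rust
use std::convert::Infallible;
use std::mem::size_of;

/// Hypothetical struct with one ordinary field and one uninhabited field.
/// `repr(C)` lays fields out in order, so the `u32` keeps its 4 bytes.
#[allow(dead_code)]
#[repr(C)]
struct PairC {
    a: u32,
    never: Infallible,
}

/// Returns (size of the uninhabited type itself,
///          size of the repr(C) struct,
///          size of the default-repr tuple, which is unspecified).
fn sizes() -> (usize, usize, usize) {
    (
        size_of::<Infallible>(),
        size_of::<PairC>(),
        size_of::<(u32, Infallible)>(),
    )
}
```

Printing `sizes()` from `main` shows that the uninhabited type is itself zero-sized while the `repr(C)` aggregate containing it still reserves room for its `u32` field, in line with the "sum of fields and padding" rule.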
Opened #63616. |
Spawned off of #21232
In the long-term, we want to accept code like this:
We probably also want to start accepting this too:
(But that second example is more debatable. I don't think there's any debate about the first example.)
See #54986 for the short-term goal of rejecting all partial initializations until we can actually use the results you get from partial initialization.