Don't ignore generator fields in miri #60889

tmandry · 2019-05-16T19:51:35Z

Right now, saved locals in generators are special-cased in miri, because they might not always be initialized.

With #59897 and #60187, we have the framework to fix this: fields are now only put in generator variants where they are live. The problem remains with aggregate fields, however, e.g.

fn gen() {
  let x: (i32, i32);
  x.0 = 5;
  yield;
  dbg!(x.0);
}

In this case x.1 is never initialized. This still presents a special case, because we cannot have a partially-unitialized field in a regular struct or enum type.

One way to remove the special-casing in miri is to wrap the type of every generator field in MaybeUninitialized when we return it from rustc::ty::layout::field(). This required making MaybeUnitialized a lang item, though, and doesn't actually give us any new features.

Eventually, I think we'd like to handle aggregate fields explicitly, so we can sanitize UB in generators.

The text was updated successfully, but these errors were encountered:

tmandry · 2019-05-16T19:52:00Z

cc @RalfJung @Zoxc @eddyb @cramertj @rust-lang/wg-async-await

RalfJung · 2019-05-17T18:26:02Z

Right now, saved locals in generators are special-cased in miri, because they might not always be initialized.

In Miri and in layout code, and for the same reason.

RalfJung · 2019-05-17T18:29:42Z

This required making MaybeUnitialized a lang item, though, and doesn't actually give us any new features.

Well, it gives us a "more correct" layout for generators, for some notion of correctness. Also technically we wouldn't have to do this for every field, if you know which fields can be partially initialized for a particular yield point you can encode that. And that might give us layout optimizations.

nikomatsakis · 2019-06-04T17:22:44Z

Marking as "deferred" from a stabilization perspective.

Generator optimization: Overlap locals that never have storage live at the same time The specific goal of this optimization is to optimize async fns which use `await!`. Notably, `await!` has an enclosing scope around the futures it awaits ([definition](https://github.com/rust-lang/rust/blob/08bfe16129b0621bc90184f8704523d4929695ef/src/libstd/macros.rs#L365-L381)), which we rely on to implement the optimization. More generally, the optimization allows overlapping the storage of some locals which are never storage-live at the same time. **We care about storage-liveness when computing the layout, because knowing a field is `StorageDead` is the only way to prove it will not be accessed, either directly or through a reference.** To determine whether we can overlap two locals in the generator layout, we look at whether they might *both* be `StorageLive` at any point in the MIR. We use the `MaybeStorageLive` dataflow analysis for this. We iterate over every location in the MIR, and build a bitset for each local of the locals it might potentially conflict with. Next, we assign every saved local to one or more variants. The variants correspond to suspension points, and we include the set of locals live across a given suspension point in the variant. (Note that we use liveness instead of storage-liveness here; this ensures that the local has actually been initialized in each variant it has been included in. If the local is not live across a suspension point, then it doesn't need to be included in that variant.). It's important to note that the variants are a "view" into our layout. For the layout computation, we use a simplified approach. 1. Start with the set of locals assigned to only one variant. The rest are disqualified. 2. For each pair of locals which may conflict *and are not assigned to the same variant*, we pick one local to disqualify from overlapping. Disqualified locals go into a non-overlapping "prefix" at the beginning of our layout. This means they always have space reserved for them. All the locals that are allowed to overlap in each variant are then laid out after this prefix, in the "overlap zone". So, if A and B were disqualified, and X, Y, and Z were all eligible for overlap, our generator might look something like this: You can think of a generator as an enum, where some fields are shared between variants. e.g. ```rust enum Generator { Unresumed, Poisoned, Returned, Suspend0(A, B, X), Suspend1(B), Suspend2(A, Y, Z), } ``` where every mention of `A` and `B` refer to the same field, which does not move when changing variants. Note that `A` and `B` would automatically be sent to the prefix in this example. Assuming that `X` is never `StorageLive` at the same time as either `Y` or `Z`, it would be allowed to overlap with them. Note that if two locals (`Y` and `Z` in this case) are assigned to the same variant in our generator, their memory would never overlap in the layout. Thus they can both be eligible for the overlapping section, even if they are storage-live at the same time. --- Depends on: - [x] #59897 Multi-variant layouts for generators - [x] #60840 Preserve local scopes in generator MIR - [x] #61373 Emit StorageDead along unwind paths for generators Before merging: - [x] ~Wrap the types of all generator fields in `MaybeUninitialized` in layout::ty::field~ (opened #60889) - [x] Make PR description more complete (e.g. explain why storage liveness is important and why we have to check every location) - [x] Clean up TODO - [x] Fix the layout code to enforce that the same field never moves around in the generator - [x] Add tests for async/await - [x] ~Reduce # bits we store by half, since the conflict relation is symmetric~ (note: decided not to do this, for simplicity) - [x] Store liveness information for each yield point in our `GeneratorLayout`, that way we can emit more useful debuginfo AND tell miri which fields are definitely initialized for a given variant (see discussion at #59897 (comment))

…ddyb Generator optimization: Overlap locals that never have storage live at the same time The specific goal of this optimization is to optimize async fns which use `await!`. Notably, `await!` has an enclosing scope around the futures it awaits ([definition](https://github.com/rust-lang/rust/blob/08bfe16129b0621bc90184f8704523d4929695ef/src/libstd/macros.rs#L365-L381)), which we rely on to implement the optimization. More generally, the optimization allows overlapping the storage of some locals which are never storage-live at the same time. **We care about storage-liveness when computing the layout, because knowing a field is `StorageDead` is the only way to prove it will not be accessed, either directly or through a reference.** To determine whether we can overlap two locals in the generator layout, we look at whether they might *both* be `StorageLive` at any point in the MIR. We use the `MaybeStorageLive` dataflow analysis for this. We iterate over every location in the MIR, and build a bitset for each local of the locals it might potentially conflict with. Next, we assign every saved local to one or more variants. The variants correspond to suspension points, and we include the set of locals live across a given suspension point in the variant. (Note that we use liveness instead of storage-liveness here; this ensures that the local has actually been initialized in each variant it has been included in. If the local is not live across a suspension point, then it doesn't need to be included in that variant.). It's important to note that the variants are a "view" into our layout. For the layout computation, we use a simplified approach. 1. Start with the set of locals assigned to only one variant. The rest are disqualified. 2. For each pair of locals which may conflict *and are not assigned to the same variant*, we pick one local to disqualify from overlapping. Disqualified locals go into a non-overlapping "prefix" at the beginning of our layout. This means they always have space reserved for them. All the locals that are allowed to overlap in each variant are then laid out after this prefix, in the "overlap zone". So, if A and B were disqualified, and X, Y, and Z were all eligible for overlap, our generator might look something like this: You can think of a generator as an enum, where some fields are shared between variants. e.g. ```rust enum Generator { Unresumed, Poisoned, Returned, Suspend0(A, B, X), Suspend1(B), Suspend2(A, Y, Z), } ``` where every mention of `A` and `B` refer to the same field, which does not move when changing variants. Note that `A` and `B` would automatically be sent to the prefix in this example. Assuming that `X` is never `StorageLive` at the same time as either `Y` or `Z`, it would be allowed to overlap with them. Note that if two locals (`Y` and `Z` in this case) are assigned to the same variant in our generator, their memory would never overlap in the layout. Thus they can both be eligible for the overlapping section, even if they are storage-live at the same time. --- Depends on: - [x] rust-lang#59897 Multi-variant layouts for generators - [x] rust-lang#60840 Preserve local scopes in generator MIR - [x] rust-lang#61373 Emit StorageDead along unwind paths for generators Before merging: - [x] ~Wrap the types of all generator fields in `MaybeUninitialized` in layout::ty::field~ (opened rust-lang#60889) - [x] Make PR description more complete (e.g. explain why storage liveness is important and why we have to check every location) - [x] Clean up TODO - [x] Fix the layout code to enforce that the same field never moves around in the generator - [x] Add tests for async/await - [x] ~Reduce # bits we store by half, since the conflict relation is symmetric~ (note: decided not to do this, for simplicity) - [x] Store liveness information for each yield point in our `GeneratorLayout`, that way we can emit more useful debuginfo AND tell miri which fields are definitely initialized for a given variant (see discussion at rust-lang#59897 (comment))

@Centril

…ized, r=Centril Make use of possibly uninitialized data [E0381] a hard error This is one of the behaviors we no longer allow in NLL. Since it can lead to undefined behavior, I think it's definitely worth making it a hard error without waiting to turn off migration mode (rust-lang#58781). Closes rust-lang#60450. My ulterior motive here is making it impossible to leave variables partially initialized across a yield (see rust-lang#60889, discussion at rust-lang#63035), so tests are included for that. cc rust-lang#54987 --- I'm not sure if bypassing the buffer is a good way of doing this. We could also make a `force_errors_buffer` or similar that gets recombined with all the errors as they are emitted. But this is simpler and seems fine to me. r? @Centril cc @cramertj @nikomatsakis @pnkfelix @RalfJung

@Centril

…ized, r=Centril Make use of possibly uninitialized data [E0381] a hard error This is one of the behaviors we no longer allow in NLL. Since it can lead to undefined behavior, I think it's definitely worth making it a hard error without waiting to turn off migration mode (rust-lang#58781). Closes rust-lang#60450. My ulterior motive here is making it impossible to leave variables partially initialized across a yield (see rust-lang#60889, discussion at rust-lang#63035), so tests are included for that. cc rust-lang#54987 --- I'm not sure if bypassing the buffer is a good way of doing this. We could also make a `force_errors_buffer` or similar that gets recombined with all the errors as they are emitted. But this is simpler and seems fine to me. r? @Centril cc @cramertj @nikomatsakis @pnkfelix @RalfJung

@Centril

…ized, r=Centril Make use of possibly uninitialized data [E0381] a hard error This is one of the behaviors we no longer allow in NLL. Since it can lead to undefined behavior, I think it's definitely worth making it a hard error without waiting to turn off migration mode (rust-lang#58781). Closes rust-lang#60450. My ulterior motive here is making it impossible to leave variables partially initialized across a yield (see rust-lang#60889, discussion at rust-lang#63035), so tests are included for that. cc rust-lang#54987 --- I'm not sure if bypassing the buffer is a good way of doing this. We could also make a `force_errors_buffer` or similar that gets recombined with all the errors as they are emitted. But this is simpler and seems fine to me. r? @Centril cc @cramertj @nikomatsakis @pnkfelix @RalfJung

RalfJung · 2020-02-28T12:25:59Z

@tmandry in #69257 (not merged yet), I removed the generator hack and everything still seems to work. Is that expected? Has generator layout been fixed (e.g. by #69257 -- though I thought someone said that would not fix this issue)?

tmandry · 2020-03-02T22:05:11Z

@RalfJung IIRC, you're basically relying on an implementation detail of how MIR gets built, which is that it doesn't leave partially-initialized aggregates in the generator prior to yielding (it only creates aggregates once all their pieces are initialized).

But if this changed, it would probably mean the layouts we report for generators are wrong. So it's probably best if this gets relied on and tested by miri (in addition to testing in rustc). Then if we change the way MIR builds aggregates, we'll have a test failure to let us know we have to wrap generator fields in MaybeUninit everywhere.

Just to double check -- would this test fail if the aggregate were actually created?

RalfJung · 2020-03-03T13:54:48Z

Just to double check -- would this test fail if the aggregate were actually created?

It would definitely fail if Miri's layout-guided recursive descend would ever actually hit a field of ! type. As in, if the code actually claimed to have created a (String, !) aggregate (and, say, pass it somewhere by value), that should cause an error.

it doesn't leave partially-initialized aggregates in the generator prior to yielding (it only creates aggregates once all their pieces are initialized).

So, in that test case, the generator does not preserve a (String, !) across the yield point, only a String? I think it has to, or else it would have to say that it is a MaybeUninit<(String, !)>.

nikomatsakis · 2020-03-05T15:54:16Z

you're basically relying on an implementation detail of how MIR gets built, which is that it doesn't leave partially-initialized aggregates in the generator prior to yielding (it only creates aggregates once all their pieces are initialized).

I would not call this an implementation detail. If we were going to change this, we'd have to have some other modification to MIR to "signal" when the aggregate is fully constructed (it's relevant to borrow check, for example, and it affects visible drop order as well). Not to say we can't change it, but we can't do it lightly and without affecting other systems.

tmandry · 2020-03-06T02:11:52Z

So, in that test case, the generator does not preserve a (String, !) across the yield point, only a String? I think it has to, or else it would have to say that it is a MaybeUninit<(String, !)>.

Correct.

I would not call this an implementation detail. If we were going to change this, we'd have to have some other modification to MIR to "signal" when the aggregate is fully constructed (it's relevant to borrow check, for example, and it affects visible drop order as well). Not to say we can't change it, but we can't do it lightly and without affecting other systems.

Okay, that makes me feel better then :)

RalfJung · 2020-03-06T07:25:36Z

I suppose this means we can close this issue? I also extended the Miri test to try harder to catch invalid generator states by doing typed (and thus validated) copies of the entire generator after each yield (and then discarding them again, so this does not constitute "moving" the generator).

tmandry · 2020-03-06T22:04:00Z

Yep! That looks great, closing now.

jonas-schievink added A-async-await Area: Async & Await A-coroutines Area: Coroutines A-miri Area: The miri tool T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels May 16, 2019

tmandry mentioned this issue May 30, 2019

Generator optimization: Overlap locals that never have storage live at the same time #60187

Merged

10 tasks

nikomatsakis added the AsyncAwait-Triaged Async-await issues that have been triaged during a working group meeting. label Jun 4, 2019

This was referenced Jul 27, 2019

Goal: Accept partial initialization + use of records created via such #54987

Open

Make use of possibly uninitialized data [E0381] a hard error #63230

Merged

tmandry mentioned this issue Sep 5, 2019

Use MaybeUninit to produce more correct layouts #63035

Merged

tmandry closed this as completed Mar 6, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't ignore generator fields in miri #60889

Don't ignore generator fields in miri #60889

tmandry commented May 16, 2019

tmandry commented May 16, 2019 •

edited

Loading

RalfJung commented May 17, 2019

RalfJung commented May 17, 2019

nikomatsakis commented Jun 4, 2019

RalfJung commented Feb 28, 2020 •

edited

Loading

tmandry commented Mar 2, 2020 •

edited

Loading

RalfJung commented Mar 3, 2020

nikomatsakis commented Mar 5, 2020 •

edited

Loading

tmandry commented Mar 6, 2020

RalfJung commented Mar 6, 2020

tmandry commented Mar 6, 2020

Don't ignore generator fields in miri #60889

Don't ignore generator fields in miri #60889

Comments

tmandry commented May 16, 2019

tmandry commented May 16, 2019 • edited Loading

RalfJung commented May 17, 2019

RalfJung commented May 17, 2019

nikomatsakis commented Jun 4, 2019

RalfJung commented Feb 28, 2020 • edited Loading

tmandry commented Mar 2, 2020 • edited Loading

RalfJung commented Mar 3, 2020

nikomatsakis commented Mar 5, 2020 • edited Loading

tmandry commented Mar 6, 2020

RalfJung commented Mar 6, 2020

tmandry commented Mar 6, 2020

tmandry commented May 16, 2019 •

edited

Loading

RalfJung commented Feb 28, 2020 •

edited

Loading

tmandry commented Mar 2, 2020 •

edited

Loading

nikomatsakis commented Mar 5, 2020 •

edited

Loading