Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't ignore generator fields in miri #60889

Closed
tmandry opened this issue May 16, 2019 · 11 comments
Closed

Don't ignore generator fields in miri #60889

tmandry opened this issue May 16, 2019 · 11 comments
Labels
A-async-await Area: Async & Await A-coroutines Area: Coroutines A-miri Area: The miri tool AsyncAwait-Triaged Async-await issues that have been triaged during a working group meeting. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@tmandry
Copy link
Member

tmandry commented May 16, 2019

Right now, saved locals in generators are special-cased in miri, because they might not always be initialized.

With #59897 and #60187, we have the framework to fix this: fields are now only put in generator variants where they are live. The problem remains with aggregate fields, however, e.g.

fn gen() {
  let x: (i32, i32);
  x.0 = 5;
  yield;
  dbg!(x.0);
}

In this case x.1 is never initialized. This still presents a special case, because we cannot have a partially-unitialized field in a regular struct or enum type.

One way to remove the special-casing in miri is to wrap the type of every generator field in MaybeUninitialized when we return it from rustc::ty::layout::field(). This required making MaybeUnitialized a lang item, though, and doesn't actually give us any new features.

Eventually, I think we'd like to handle aggregate fields explicitly, so we can sanitize UB in generators.

@tmandry
Copy link
Member Author

tmandry commented May 16, 2019

cc @RalfJung @Zoxc @eddyb @cramertj @rust-lang/wg-async-await

@jonas-schievink jonas-schievink added A-async-await Area: Async & Await A-coroutines Area: Coroutines A-miri Area: The miri tool T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels May 16, 2019
@RalfJung
Copy link
Member

Right now, saved locals in generators are special-cased in miri, because they might not always be initialized.

In Miri and in layout code, and for the same reason.

@RalfJung
Copy link
Member

This required making MaybeUnitialized a lang item, though, and doesn't actually give us any new features.

Well, it gives us a "more correct" layout for generators, for some notion of correctness. Also technically we wouldn't have to do this for every field, if you know which fields can be partially initialized for a particular yield point you can encode that. And that might give us layout optimizations.

@nikomatsakis nikomatsakis added the AsyncAwait-Triaged Async-await issues that have been triaged during a working group meeting. label Jun 4, 2019
@nikomatsakis
Copy link
Contributor

Marking as "deferred" from a stabilization perspective.

bors added a commit that referenced this issue Jun 12, 2019
Generator optimization: Overlap locals that never have storage live at the same time

The specific goal of this optimization is to optimize async fns which use `await!`. Notably, `await!` has an enclosing scope around the futures it awaits ([definition](https://github.com/rust-lang/rust/blob/08bfe16129b0621bc90184f8704523d4929695ef/src/libstd/macros.rs#L365-L381)), which we rely on to implement the optimization.

More generally, the optimization allows overlapping the storage of some locals which are never storage-live at the same time. **We care about storage-liveness when computing the layout, because knowing a field is `StorageDead` is the only way to prove it will not be accessed, either directly or through a reference.**

To determine whether we can overlap two locals in the generator layout, we look at whether they might *both* be `StorageLive` at any point in the MIR. We use the `MaybeStorageLive` dataflow analysis for this. We iterate over every location in the MIR, and build a bitset for each local of the locals it might potentially conflict with.

Next, we assign every saved local to one or more variants. The variants correspond to suspension points, and we include the set of locals live across a given suspension point in the variant. (Note that we use liveness instead of storage-liveness here; this ensures that the local has actually been initialized in each variant it has been included in. If the local is not live across a suspension point, then it doesn't need to be included in that variant.). It's important to note that the variants are a "view" into our layout.

For the layout computation, we use a simplified approach.

1. Start with the set of locals assigned to only one variant. The rest are disqualified.
2. For each pair of locals which may conflict *and are not assigned to the same variant*, we pick one local to disqualify from overlapping.

Disqualified locals go into a non-overlapping "prefix" at the beginning of our layout. This means they always have space reserved for them. All the locals that are allowed to overlap in each variant are then laid out after this prefix, in the "overlap zone".

So, if A and B were disqualified, and X, Y, and Z were all eligible for overlap, our generator might look something like this:

You can think of a generator as an enum, where some fields are shared between variants. e.g.

```rust
enum Generator {
  Unresumed,
  Poisoned,
  Returned,
  Suspend0(A, B, X),
  Suspend1(B),
  Suspend2(A, Y, Z),
}
```

where every mention of `A` and `B` refer to the same field, which does not move when changing variants. Note that `A` and `B` would automatically be sent to the prefix in this example. Assuming that `X` is never `StorageLive` at the same time as either `Y` or `Z`, it would be allowed to overlap with them.

Note that if two locals (`Y` and `Z` in this case) are assigned to the same variant in our generator, their memory would never overlap in the layout. Thus they can both be eligible for the overlapping section, even if they are storage-live at the same time.

---

Depends on:
- [x] #59897 Multi-variant layouts for generators
- [x] #60840 Preserve local scopes in generator MIR
- [x] #61373 Emit StorageDead along unwind paths for generators

Before merging:

- [x] ~Wrap the types of all generator fields in `MaybeUninitialized` in layout::ty::field~ (opened #60889)
- [x] Make PR description more complete (e.g. explain why storage liveness is important and why we have to check every location)
- [x] Clean up TODO
- [x] Fix the layout code to enforce that the same field never moves around in the generator
- [x] Add tests for async/await
- [x] ~Reduce # bits we store by half, since the conflict relation is symmetric~ (note: decided not to do this, for simplicity)
- [x] Store liveness information for each yield point in our `GeneratorLayout`, that way we can emit more useful debuginfo AND tell miri which fields are definitely initialized for a given variant (see discussion at #59897 (comment))
Centril added a commit to Centril/rust that referenced this issue Jun 12, 2019
…ddyb

Generator optimization: Overlap locals that never have storage live at the same time

The specific goal of this optimization is to optimize async fns which use `await!`. Notably, `await!` has an enclosing scope around the futures it awaits ([definition](https://github.com/rust-lang/rust/blob/08bfe16129b0621bc90184f8704523d4929695ef/src/libstd/macros.rs#L365-L381)), which we rely on to implement the optimization.

More generally, the optimization allows overlapping the storage of some locals which are never storage-live at the same time. **We care about storage-liveness when computing the layout, because knowing a field is `StorageDead` is the only way to prove it will not be accessed, either directly or through a reference.**

To determine whether we can overlap two locals in the generator layout, we look at whether they might *both* be `StorageLive` at any point in the MIR. We use the `MaybeStorageLive` dataflow analysis for this. We iterate over every location in the MIR, and build a bitset for each local of the locals it might potentially conflict with.

Next, we assign every saved local to one or more variants. The variants correspond to suspension points, and we include the set of locals live across a given suspension point in the variant. (Note that we use liveness instead of storage-liveness here; this ensures that the local has actually been initialized in each variant it has been included in. If the local is not live across a suspension point, then it doesn't need to be included in that variant.). It's important to note that the variants are a "view" into our layout.

For the layout computation, we use a simplified approach.

1. Start with the set of locals assigned to only one variant. The rest are disqualified.
2. For each pair of locals which may conflict *and are not assigned to the same variant*, we pick one local to disqualify from overlapping.

Disqualified locals go into a non-overlapping "prefix" at the beginning of our layout. This means they always have space reserved for them. All the locals that are allowed to overlap in each variant are then laid out after this prefix, in the "overlap zone".

So, if A and B were disqualified, and X, Y, and Z were all eligible for overlap, our generator might look something like this:

You can think of a generator as an enum, where some fields are shared between variants. e.g.

```rust
enum Generator {
  Unresumed,
  Poisoned,
  Returned,
  Suspend0(A, B, X),
  Suspend1(B),
  Suspend2(A, Y, Z),
}
```

where every mention of `A` and `B` refer to the same field, which does not move when changing variants. Note that `A` and `B` would automatically be sent to the prefix in this example. Assuming that `X` is never `StorageLive` at the same time as either `Y` or `Z`, it would be allowed to overlap with them.

Note that if two locals (`Y` and `Z` in this case) are assigned to the same variant in our generator, their memory would never overlap in the layout. Thus they can both be eligible for the overlapping section, even if they are storage-live at the same time.

---

Depends on:
- [x] rust-lang#59897 Multi-variant layouts for generators
- [x] rust-lang#60840 Preserve local scopes in generator MIR
- [x] rust-lang#61373 Emit StorageDead along unwind paths for generators

Before merging:

- [x] ~Wrap the types of all generator fields in `MaybeUninitialized` in layout::ty::field~ (opened rust-lang#60889)
- [x] Make PR description more complete (e.g. explain why storage liveness is important and why we have to check every location)
- [x] Clean up TODO
- [x] Fix the layout code to enforce that the same field never moves around in the generator
- [x] Add tests for async/await
- [x] ~Reduce # bits we store by half, since the conflict relation is symmetric~ (note: decided not to do this, for simplicity)
- [x] Store liveness information for each yield point in our `GeneratorLayout`, that way we can emit more useful debuginfo AND tell miri which fields are definitely initialized for a given variant (see discussion at rust-lang#59897 (comment))
Centril added a commit to Centril/rust that referenced this issue Aug 6, 2019
…ized, r=Centril

Make use of possibly uninitialized data [E0381] a hard error

This is one of the behaviors we no longer allow in NLL. Since it can
lead to undefined behavior, I think it's definitely worth making it a
hard error without waiting to turn off migration mode (rust-lang#58781).

Closes rust-lang#60450.

My ulterior motive here is making it impossible to leave variables
partially initialized across a yield (see rust-lang#60889, discussion at rust-lang#63035), so
tests are included for that.

cc rust-lang#54987

---

I'm not sure if bypassing the buffer is a good way of doing this. We could also make a `force_errors_buffer` or similar that gets recombined with all the errors as they are emitted. But this is simpler and seems fine to me.

r? @Centril
cc @cramertj @nikomatsakis @pnkfelix @RalfJung
Centril added a commit to Centril/rust that referenced this issue Aug 6, 2019
…ized, r=Centril

Make use of possibly uninitialized data [E0381] a hard error

This is one of the behaviors we no longer allow in NLL. Since it can
lead to undefined behavior, I think it's definitely worth making it a
hard error without waiting to turn off migration mode (rust-lang#58781).

Closes rust-lang#60450.

My ulterior motive here is making it impossible to leave variables
partially initialized across a yield (see rust-lang#60889, discussion at rust-lang#63035), so
tests are included for that.

cc rust-lang#54987

---

I'm not sure if bypassing the buffer is a good way of doing this. We could also make a `force_errors_buffer` or similar that gets recombined with all the errors as they are emitted. But this is simpler and seems fine to me.

r? @Centril
cc @cramertj @nikomatsakis @pnkfelix @RalfJung
Centril added a commit to Centril/rust that referenced this issue Aug 6, 2019
…ized, r=Centril

Make use of possibly uninitialized data [E0381] a hard error

This is one of the behaviors we no longer allow in NLL. Since it can
lead to undefined behavior, I think it's definitely worth making it a
hard error without waiting to turn off migration mode (rust-lang#58781).

Closes rust-lang#60450.

My ulterior motive here is making it impossible to leave variables
partially initialized across a yield (see rust-lang#60889, discussion at rust-lang#63035), so
tests are included for that.

cc rust-lang#54987

---

I'm not sure if bypassing the buffer is a good way of doing this. We could also make a `force_errors_buffer` or similar that gets recombined with all the errors as they are emitted. But this is simpler and seems fine to me.

r? @Centril
cc @cramertj @nikomatsakis @pnkfelix @RalfJung
@RalfJung
Copy link
Member

RalfJung commented Feb 28, 2020

@tmandry in #69257 (not merged yet), I removed the generator hack and everything still seems to work. Is that expected? Has generator layout been fixed (e.g. by #69257 -- though I thought someone said that would not fix this issue)?

@tmandry
Copy link
Member Author

tmandry commented Mar 2, 2020

@RalfJung IIRC, you're basically relying on an implementation detail of how MIR gets built, which is that it doesn't leave partially-initialized aggregates in the generator prior to yielding (it only creates aggregates once all their pieces are initialized).

But if this changed, it would probably mean the layouts we report for generators are wrong. So it's probably best if this gets relied on and tested by miri (in addition to testing in rustc). Then if we change the way MIR builds aggregates, we'll have a test failure to let us know we have to wrap generator fields in MaybeUninit everywhere.

Just to double check -- would this test fail if the aggregate were actually created?

@RalfJung
Copy link
Member

RalfJung commented Mar 3, 2020

Just to double check -- would this test fail if the aggregate were actually created?

It would definitely fail if Miri's layout-guided recursive descend would ever actually hit a field of ! type. As in, if the code actually claimed to have created a (String, !) aggregate (and, say, pass it somewhere by value), that should cause an error.

it doesn't leave partially-initialized aggregates in the generator prior to yielding (it only creates aggregates once all their pieces are initialized).

So, in that test case, the generator does not preserve a (String, !) across the yield point, only a String? I think it has to, or else it would have to say that it is a MaybeUninit<(String, !)>.

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Mar 5, 2020

you're basically relying on an implementation detail of how MIR gets built, which is that it doesn't leave partially-initialized aggregates in the generator prior to yielding (it only creates aggregates once all their pieces are initialized).

I would not call this an implementation detail. If we were going to change this, we'd have to have some other modification to MIR to "signal" when the aggregate is fully constructed (it's relevant to borrow check, for example, and it affects visible drop order as well). Not to say we can't change it, but we can't do it lightly and without affecting other systems.

@tmandry
Copy link
Member Author

tmandry commented Mar 6, 2020

So, in that test case, the generator does not preserve a (String, !) across the yield point, only a String? I think it has to, or else it would have to say that it is a MaybeUninit<(String, !)>.

Correct.

I would not call this an implementation detail. If we were going to change this, we'd have to have some other modification to MIR to "signal" when the aggregate is fully constructed (it's relevant to borrow check, for example, and it affects visible drop order as well). Not to say we can't change it, but we can't do it lightly and without affecting other systems.

Okay, that makes me feel better then :)

@RalfJung
Copy link
Member

RalfJung commented Mar 6, 2020

I suppose this means we can close this issue? I also extended the Miri test to try harder to catch invalid generator states by doing typed (and thus validated) copies of the entire generator after each yield (and then discarding them again, so this does not constitute "moving" the generator).

@tmandry
Copy link
Member Author

tmandry commented Mar 6, 2020

Yep! That looks great, closing now.

@tmandry tmandry closed this as completed Mar 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-async-await Area: Async & Await A-coroutines Area: Coroutines A-miri Area: The miri tool AsyncAwait-Triaged Async-await issues that have been triaged during a working group meeting. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

4 participants