Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimizing Stacked Borrows (part 2): Shrink Item #2315

Merged
merged 2 commits into from
Jul 13, 2022

Conversation

saethlin
Copy link
Member

@saethlin saethlin commented Jul 3, 2022

This moves protectors out of Item, storing them both in a global HashSet which contains all currently-protected tags as well as a Vec<SbTag> on each Frame so that when we return from a function we know which tags to remove from the protected set.

This also bit-packs the 64-bit tag and the 2-bit permission together when they are stored in memory. This means we theoretically run out of tags sooner, but I doubt that limit will ever be hit.

Together these optimizations reduce the memory footprint of Miri when executing programs which stress Stacked Borrows by ~66%. For example, running a test with isolation off which only panics currently peaks at ~19 GB, with this PR it peaks at ~6.2 GB.

To-do

  • Enforce the 62-bit limit
  • Decide if there is a better order to pack the tag and permission in
  • Wait for UnsafeCell to become infectious, or express offsets + tags in the global protector set

Benchmarks before:

Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/backtraces/Cargo.toml
  Time (mean ± σ):      8.948 s ±  0.253 s    [User: 8.752 s, System: 0.158 s]
  Range (min … max):    8.619 s …  9.279 s    5 runs
 
Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/mse/Cargo.toml
  Time (mean ± σ):      2.129 s ±  0.037 s    [User: 1.849 s, System: 0.248 s]
  Range (min … max):    2.086 s …  2.176 s    5 runs
 
Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/serde1/Cargo.toml
  Time (mean ± σ):      3.334 s ±  0.017 s    [User: 3.211 s, System: 0.103 s]
  Range (min … max):    3.315 s …  3.352 s    5 runs
 
Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/serde2/Cargo.toml
  Time (mean ± σ):      3.316 s ±  0.038 s    [User: 3.207 s, System: 0.095 s]
  Range (min … max):    3.282 s …  3.375 s    5 runs
 
Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/unicode/Cargo.toml
  Time (mean ± σ):      6.391 s ±  0.323 s    [User: 5.928 s, System: 0.412 s]
  Range (min … max):    6.090 s …  6.917 s    5 runs

After:

Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/backtraces/Cargo.toml
 Time (mean ± σ):      6.955 s ±  0.051 s    [User: 6.807 s, System: 0.132 s]
 Range (min … max):    6.900 s …  7.038 s    5 runs

Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/mse/Cargo.toml
 Time (mean ± σ):      1.784 s ±  0.012 s    [User: 1.627 s, System: 0.156 s]
 Range (min … max):    1.772 s …  1.797 s    5 runs

Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/serde1/Cargo.toml
 Time (mean ± σ):      2.505 s ±  0.095 s    [User: 2.311 s, System: 0.096 s]
 Range (min … max):    2.405 s …  2.603 s    5 runs

Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/serde2/Cargo.toml
 Time (mean ± σ):      2.449 s ±  0.031 s    [User: 2.306 s, System: 0.100 s]
 Range (min … max):    2.395 s …  2.467 s    5 runs

Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/unicode/Cargo.toml
 Time (mean ± σ):      3.667 s ±  0.110 s    [User: 3.498 s, System: 0.140 s]
 Range (min … max):    3.564 s …  3.814 s    5 runs

The decrease in system time is probably due to spending less time in the page fault handler.

@RalfJung
Copy link
Member

RalfJung commented Jul 3, 2022

This moves protectors out of Item, storing them both in a global HashSet which contains all currently-protected tags as well as a Vec on each Frame so that when we return from a function we know which tags to remove from the protected set.

How is that implementing the correct semantics? Note that with a type like &(i32, Cell<i32>), since #2248 landed this tag is protected for the i32 part of the memory but not for the Cell part.

EDIT: Oh I see you have a TODO that probably relates to this. I assume this currently makes a test fail?

@saethlin
Copy link
Member Author

saethlin commented Jul 3, 2022

Yes this absolutely implements the wrong semantics right now, and the two tests for internal mutability fail. I'm going to touch up the rest of this first and hope you decide on the behavior of UnsafeCell in the meantime.

@RalfJung
Copy link
Member

RalfJung commented Jul 3, 2022

Well I am personally pretty much settled that I want them to completely infect the reference.

But this seems like a big change and I didn't see a lot of other people (in particular the lang team) enthusiastically agree with that...

@RalfJung
Copy link
Member

RalfJung commented Jul 3, 2022

Meanwhile, what about the following scheme: we sacrifice another bit in the item to indicate whether or not this item is protected. That should be enough, because all items for a given tag t are either protected by the same CallId, or not protected; i.e., we never have two items with the same tag protected by two different CallIds.

@saethlin
Copy link
Member Author

saethlin commented Jul 3, 2022

in particular the lang team

🤷 All that I can find related to this decision is the rfcbot comment which got un-nominated with no evidence of discussion. I haven't seen any objections from any lang team members, and it's only been 5 days since it was un-nominated. Maybe I'm just very patient.

I was considering sacrificing another bit anyway to bypass the check in item_popped. But I'm not sure that resolves the ambiguity, don't we still have the problem that for some Tag, it will be protected in the stacks that are not within an UnsafeCell, and it will not be protected in the stacks that are inside an UnsafeCell?

@saethlin
Copy link
Member Author

saethlin commented Jul 3, 2022

Oh 🤦 wait I get it. Yes this is a great idea.

@RalfJung
Copy link
Member

RalfJung commented Jul 3, 2022

All that I can find related to this decision is rust-lang/rust#98017 (comment) which got un-nominated with no evidence of discussion. I haven't seen any objections from any lang team members, and it's only been 5 days since it was un-nominated. Maybe I'm just very patient.

That's a different discussion. If/when that lands (which I expect it will), that brings the doc in sync with what Miri already does since #2248.

What you would have needed is rust-lang/unsafe-code-guidelines#236.
But anyway we seem to agree that compression is possible even with the current semantics. :)

@saethlin
Copy link
Member Author

saethlin commented Jul 3, 2022

Now I just have UI test failures because printing a tag can no longer print the CallId that it was protected by. Is that okay?

@saethlin
Copy link
Member Author

saethlin commented Jul 3, 2022

Also, should we even have a second Item struct, or just use the packed one everywhere? I don't think there's any performance argument to be made, but it would make the codebase simpler overall.

@saethlin saethlin marked this pull request as ready for review July 3, 2022 20:43
@RalfJung
Copy link
Member

RalfJung commented Jul 4, 2022

Also, should we even have a second Item struct, or just use the packed one everywhere? I don't think there's any performance argument to be made, but it would make the codebase simpler overall.

Only having the packed thing would make the code simpler? Sure, then go for it. (I guess for struct that is usually the case; when packing an enum one really wants the unpacked type to do match.)

@bors
Copy link
Contributor

bors commented Jul 5, 2022

☔ The latest upstream changes (presumably #2333) made this pull request unmergeable. Please resolve the merge conflicts.

@bors
Copy link
Contributor

bors commented Jul 6, 2022

☔ The latest upstream changes (presumably #2338) made this pull request unmergeable. Please resolve the merge conflicts.

src/machine.rs Outdated Show resolved Hide resolved
src/stacked_borrows.rs Outdated Show resolved Hide resolved
src/stacked_borrows.rs Outdated Show resolved Hide resolved
src/stacked_borrows.rs Outdated Show resolved Hide resolved
src/stacked_borrows.rs Outdated Show resolved Hide resolved
src/stacked_borrows.rs Outdated Show resolved Hide resolved
src/stacked_borrows.rs Outdated Show resolved Hide resolved
@RalfJung

This comment was marked as off-topic.

@rustbot rustbot added S-waiting-on-author Status: Waiting for the PR author to address review comments and removed S-waiting-on-review Status: Waiting for a review to complete labels Jul 12, 2022
@RalfJung

This comment was marked as off-topic.

@rustbot rustbot added S-waiting-on-review Status: Waiting for a review to complete and removed S-waiting-on-author Status: Waiting for the PR author to address review comments labels Jul 12, 2022
src/machine.rs Outdated Show resolved Hide resolved
src/stacked_borrows.rs Outdated Show resolved Hide resolved
src/machine.rs Outdated Show resolved Hide resolved
@RalfJung
Copy link
Member

@rustbot author
but this is my last round of comments, I promise :D

@rustbot rustbot added S-waiting-on-author Status: Waiting for the PR author to address review comments and removed S-waiting-on-review Status: Waiting for a review to complete labels Jul 12, 2022
src/stacked_borrows.rs Outdated Show resolved Hide resolved
@RalfJung
Copy link
Member

There we go. :)
Please squash the commits a little, then r=me.
@bors delegate+

@bors
Copy link
Contributor

bors commented Jul 13, 2022

✌️ @saethlin can now approve this pull request

Previously, Item was a struct of a NonZeroU64, an Option which was
usually unset or irrelevant, and a 4-variant enum. So collectively, the
size of an Item was 24 bytes, but only 8 bytes were used for the most
part.

So this takes advantage of the fact that it is probably impossible to
exhaust the total space of SbTags, and steals 3 bits from it to pack the
whole struct into a single u64. This bit-packing means that we reduce
peak memory usage when Miri goes memory-bound by ~3x. We also get CPU
performance improvements of varying size, because not only are we simply
accessing less memory, we can now compare a Vec<Item> using a memcmp
because it does not have any padding.
stacked_borrow now has an item module, and its own FrameExtra. These
serve to protect the implementation of Item (which is a bunch of
bit-packing tricks) from the primary logic of Stacked Borrows, and the
FrameExtra we have separates Stacked Borrows more cleanly from the
interpreter itself.

The new strategy for checking protectors also makes some subtle
performance tradeoffs, so they are now documented in Stack::item_popped
because that function primarily benefits from them, and it also touches
every aspect of them.

Also separating the actual CallId that is protecting a Tag from the Tag
makes it inconvienent to reproduce exactly the same protector errors, so
this also takes the opportunity to use some slightly cleaner English in
those errors. We need to make some change, might as well make it good.
@saethlin
Copy link
Member Author

I squashed a bit more than a little, but I think also wrote a fittingly large message for the second commit.

@saethlin
Copy link
Member Author

@bors r+

@bors
Copy link
Contributor

bors commented Jul 13, 2022

📌 Commit 4eff60a has been approved by saethlin

It is now in the queue for this repository.

@bors
Copy link
Contributor

bors commented Jul 13, 2022

⌛ Testing commit 4eff60a with merge db5a2b9...

@bors
Copy link
Contributor

bors commented Jul 13, 2022

☀️ Test successful - checks-actions
Approved by: saethlin
Pushing db5a2b9 to master...

@bors bors merged commit db5a2b9 into rust-lang:master Jul 13, 2022
bors added a commit that referenced this pull request Jul 13, 2022
Add a benchmark of the hang-on-test-failure code path

This is the code pattern that produces the performance problem in #2273

I figured out what I was stuck on in #2315 (comment). For a while I was just doing `let x: &[u8] = &[0u8; 4096];` but that doesn't produce the runtime inside `Stack::item_popped` that I was looking for, I think because this allocation is never deallocated. But with `Vec`, I get the profile I'm looking for.
@saethlin saethlin deleted the shrink-item branch July 13, 2022 23:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: Waiting for the PR author to address review comments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants