Optimizing Stacked Borrows (part 2): Shrink Item #2315

saethlin · 2022-07-03T05:31:42Z

This moves protectors out of Item, storing them both in a global HashSet which contains all currently-protected tags as well as a Vec<SbTag> on each Frame so that when we return from a function we know which tags to remove from the protected set.

This also bit-packs the 64-bit tag and the 2-bit permission together when they are stored in memory. This means we theoretically run out of tags sooner, but I doubt that limit will ever be hit.

Together these optimizations reduce the memory footprint of Miri when executing programs which stress Stacked Borrows by ~66%. For example, running a test with isolation off which only panics currently peaks at ~19 GB, with this PR it peaks at ~6.2 GB.

To-do

Enforce the 62-bit limit
Decide if there is a better order to pack the tag and permission in
Wait for UnsafeCell to become infectious, or express offsets + tags in the global protector set

Benchmarks before:

Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/backtraces/Cargo.toml
  Time (mean ± σ):      8.948 s ±  0.253 s    [User: 8.752 s, System: 0.158 s]
  Range (min … max):    8.619 s …  9.279 s    5 runs
 
Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/mse/Cargo.toml
  Time (mean ± σ):      2.129 s ±  0.037 s    [User: 1.849 s, System: 0.248 s]
  Range (min … max):    2.086 s …  2.176 s    5 runs
 
Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/serde1/Cargo.toml
  Time (mean ± σ):      3.334 s ±  0.017 s    [User: 3.211 s, System: 0.103 s]
  Range (min … max):    3.315 s …  3.352 s    5 runs
 
Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/serde2/Cargo.toml
  Time (mean ± σ):      3.316 s ±  0.038 s    [User: 3.207 s, System: 0.095 s]
  Range (min … max):    3.282 s …  3.375 s    5 runs
 
Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/unicode/Cargo.toml
  Time (mean ± σ):      6.391 s ±  0.323 s    [User: 5.928 s, System: 0.412 s]
  Range (min … max):    6.090 s …  6.917 s    5 runs

After:

Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/backtraces/Cargo.toml
 Time (mean ± σ):      6.955 s ±  0.051 s    [User: 6.807 s, System: 0.132 s]
 Range (min … max):    6.900 s …  7.038 s    5 runs

Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/mse/Cargo.toml
 Time (mean ± σ):      1.784 s ±  0.012 s    [User: 1.627 s, System: 0.156 s]
 Range (min … max):    1.772 s …  1.797 s    5 runs

Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/serde1/Cargo.toml
 Time (mean ± σ):      2.505 s ±  0.095 s    [User: 2.311 s, System: 0.096 s]
 Range (min … max):    2.405 s …  2.603 s    5 runs

Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/serde2/Cargo.toml
 Time (mean ± σ):      2.449 s ±  0.031 s    [User: 2.306 s, System: 0.100 s]
 Range (min … max):    2.395 s …  2.467 s    5 runs

Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/unicode/Cargo.toml
 Time (mean ± σ):      3.667 s ±  0.110 s    [User: 3.498 s, System: 0.140 s]
 Range (min … max):    3.564 s …  3.814 s    5 runs

The decrease in system time is probably due to spending less time in the page fault handler.

RalfJung · 2022-07-03T14:41:09Z

This moves protectors out of Item, storing them both in a global HashSet which contains all currently-protected tags as well as a Vec on each Frame so that when we return from a function we know which tags to remove from the protected set.

How is that implementing the correct semantics? Note that with a type like &(i32, Cell<i32>), since #2248 landed this tag is protected for the i32 part of the memory but not for the Cell part.

EDIT: Oh I see you have a TODO that probably relates to this. I assume this currently makes a test fail?

saethlin · 2022-07-03T15:10:16Z

Yes this absolutely implements the wrong semantics right now, and the two tests for internal mutability fail. I'm going to touch up the rest of this first and hope you decide on the behavior of UnsafeCell in the meantime.

RalfJung · 2022-07-03T15:13:31Z

Well I am personally pretty much settled that I want them to completely infect the reference.

But this seems like a big change and I didn't see a lot of other people (in particular the lang team) enthusiastically agree with that...

RalfJung · 2022-07-03T18:29:25Z

Meanwhile, what about the following scheme: we sacrifice another bit in the item to indicate whether or not this item is protected. That should be enough, because all items for a given tag t are either protected by the same CallId, or not protected; i.e., we never have two items with the same tag protected by two different CallIds.

saethlin · 2022-07-03T19:18:11Z

in particular the lang team

🤷 All that I can find related to this decision is the rfcbot comment which got un-nominated with no evidence of discussion. I haven't seen any objections from any lang team members, and it's only been 5 days since it was un-nominated. Maybe I'm just very patient.

I was considering sacrificing another bit anyway to bypass the check in item_popped. But I'm not sure that resolves the ambiguity, don't we still have the problem that for some Tag, it will be protected in the stacks that are not within an UnsafeCell, and it will not be protected in the stacks that are inside an UnsafeCell?

saethlin · 2022-07-03T19:19:27Z

Oh 🤦 wait I get it. Yes this is a great idea.

RalfJung · 2022-07-03T19:54:41Z

All that I can find related to this decision is rust-lang/rust#98017 (comment) which got un-nominated with no evidence of discussion. I haven't seen any objections from any lang team members, and it's only been 5 days since it was un-nominated. Maybe I'm just very patient.

That's a different discussion. If/when that lands (which I expect it will), that brings the doc in sync with what Miri already does since #2248.

What you would have needed is rust-lang/unsafe-code-guidelines#236.
But anyway we seem to agree that compression is possible even with the current semantics. :)

saethlin · 2022-07-03T19:56:34Z

Now I just have UI test failures because printing a tag can no longer print the CallId that it was protected by. Is that okay?

saethlin · 2022-07-03T20:10:21Z

Also, should we even have a second Item struct, or just use the packed one everywhere? I don't think there's any performance argument to be made, but it would make the codebase simpler overall.

src/stacked_borrows/stack.rs

tests/fail/stacked_borrows/aliasing_mut1.stderr

RalfJung · 2022-07-04T23:31:53Z

Also, should we even have a second Item struct, or just use the packed one everywhere? I don't think there's any performance argument to be made, but it would make the codebase simpler overall.

Only having the packed thing would make the code simpler? Sure, then go for it. (I guess for struct that is usually the case; when packing an enum one really wants the unpacked type to do match.)

src/stacked_borrows/stack.rs

bors · 2022-07-05T10:41:59Z

☔ The latest upstream changes (presumably #2333) made this pull request unmergeable. Please resolve the merge conflicts.

bors · 2022-07-06T14:17:36Z

☔ The latest upstream changes (presumably #2338) made this pull request unmergeable. Please resolve the merge conflicts.

src/machine.rs

src/stacked_borrows.rs

src/machine.rs

src/stacked_borrows.rs

src/machine.rs

tests/fail/stacked_borrows/deallocate_against_barrier1.rs

src/stacked_borrows/stack.rs

RalfJung · 2022-07-12T16:21:14Z

@rustbot author
but this is my last round of comments, I promise :D

src/stacked_borrows.rs

RalfJung · 2022-07-13T00:45:48Z

There we go. :)
Please squash the commits a little, then r=me.
@bors delegate+

bors · 2022-07-13T00:45:50Z

✌️ @saethlin can now approve this pull request

Previously, Item was a struct of a NonZeroU64, an Option which was usually unset or irrelevant, and a 4-variant enum. So collectively, the size of an Item was 24 bytes, but only 8 bytes were used for the most part. So this takes advantage of the fact that it is probably impossible to exhaust the total space of SbTags, and steals 3 bits from it to pack the whole struct into a single u64. This bit-packing means that we reduce peak memory usage when Miri goes memory-bound by ~3x. We also get CPU performance improvements of varying size, because not only are we simply accessing less memory, we can now compare a Vec<Item> using a memcmp because it does not have any padding.

stacked_borrow now has an item module, and its own FrameExtra. These serve to protect the implementation of Item (which is a bunch of bit-packing tricks) from the primary logic of Stacked Borrows, and the FrameExtra we have separates Stacked Borrows more cleanly from the interpreter itself. The new strategy for checking protectors also makes some subtle performance tradeoffs, so they are now documented in Stack::item_popped because that function primarily benefits from them, and it also touches every aspect of them. Also separating the actual CallId that is protecting a Tag from the Tag makes it inconvienent to reproduce exactly the same protector errors, so this also takes the opportunity to use some slightly cleaner English in those errors. We need to make some change, might as well make it good.

saethlin · 2022-07-13T01:10:35Z

I squashed a bit more than a little, but I think also wrote a fittingly large message for the second commit.

saethlin · 2022-07-13T01:43:56Z

@bors r+

bors · 2022-07-13T01:43:58Z

📌 Commit 4eff60a has been approved by saethlin

It is now in the queue for this repository.

bors · 2022-07-13T01:44:03Z

⌛ Testing commit 4eff60a with merge db5a2b9...

bors · 2022-07-13T02:06:35Z

☀️ Test successful - checks-actions
Approved by: saethlin
Pushing db5a2b9 to master...

Add a benchmark of the hang-on-test-failure code path This is the code pattern that produces the performance problem in #2273 I figured out what I was stuck on in #2315 (comment). For a while I was just doing `let x: &[u8] = &[0u8; 4096];` but that doesn't produce the runtime inside `Stack::item_popped` that I was looking for, I think because this allocation is never deallocated. But with `Vec`, I get the profile I'm looking for.

saethlin mentioned this pull request Jul 3, 2022

Iterate in reverse when comparing Stacks #2214

Closed

saethlin force-pushed the shrink-item branch from 08e0754 to 85d3a5d Compare July 3, 2022 17:56

saethlin marked this pull request as ready for review July 3, 2022 20:43

oli-obk reviewed Jul 4, 2022

View reviewed changes

src/stacked_borrows/stack.rs Outdated Show resolved Hide resolved

oli-obk reviewed Jul 4, 2022

View reviewed changes

src/stacked_borrows/stack.rs Outdated Show resolved Hide resolved

RalfJung reviewed Jul 4, 2022

View reviewed changes

tests/fail/stacked_borrows/aliasing_mut1.stderr Outdated Show resolved Hide resolved

RalfJung reviewed Jul 4, 2022

View reviewed changes

src/stacked_borrows/stack.rs Outdated Show resolved Hide resolved

saethlin force-pushed the shrink-item branch 2 times, most recently from d9c5d39 to 548ca37 Compare July 5, 2022 01:11

saethlin mentioned this pull request Jul 5, 2022

move checking ptr tracking on item pop into cold helper function #2328

Merged

saethlin force-pushed the shrink-item branch from 548ca37 to 481ce3e Compare July 5, 2022 23:35

saethlin force-pushed the shrink-item branch from 481ce3e to 48e97ea Compare July 6, 2022 14:34

RalfJung reviewed Jul 6, 2022

View reviewed changes

src/machine.rs Outdated Show resolved Hide resolved

RalfJung reviewed Jul 6, 2022

View reviewed changes

src/stacked_borrows.rs Outdated Show resolved Hide resolved

RalfJung reviewed Jul 6, 2022

View reviewed changes

src/stacked_borrows.rs Outdated Show resolved Hide resolved

RalfJung reviewed Jul 6, 2022

View reviewed changes

src/stacked_borrows.rs Outdated Show resolved Hide resolved

RalfJung reviewed Jul 6, 2022

View reviewed changes

src/stacked_borrows.rs Outdated Show resolved Hide resolved

RalfJung reviewed Jul 6, 2022

View reviewed changes

src/stacked_borrows.rs Outdated Show resolved Hide resolved

RalfJung reviewed Jul 12, 2022

View reviewed changes

src/stacked_borrows.rs Show resolved Hide resolved

RalfJung reviewed Jul 12, 2022

View reviewed changes

src/stacked_borrows.rs Outdated Show resolved Hide resolved

This comment was marked as off-topic.

Sign in to view

rustbot added S-waiting-on-author Status: Waiting for the PR author to address review comments and removed S-waiting-on-review Status: Waiting for a review to complete labels Jul 12, 2022

This comment was marked as off-topic.

Sign in to view

rustbot added S-waiting-on-review Status: Waiting for a review to complete and removed S-waiting-on-author Status: Waiting for the PR author to address review comments labels Jul 12, 2022

RalfJung reviewed Jul 12, 2022

View reviewed changes

src/machine.rs Outdated Show resolved Hide resolved

RalfJung reviewed Jul 12, 2022

View reviewed changes

src/stacked_borrows.rs Outdated Show resolved Hide resolved

RalfJung reviewed Jul 12, 2022

View reviewed changes

src/machine.rs Outdated Show resolved Hide resolved

RalfJung reviewed Jul 12, 2022

View reviewed changes

tests/fail/stacked_borrows/deallocate_against_barrier1.rs Show resolved Hide resolved

RalfJung reviewed Jul 12, 2022

View reviewed changes

src/stacked_borrows/stack.rs Show resolved Hide resolved

rustbot added S-waiting-on-author Status: Waiting for the PR author to address review comments and removed S-waiting-on-review Status: Waiting for a review to complete labels Jul 12, 2022

RalfJung reviewed Jul 13, 2022

View reviewed changes

src/stacked_borrows.rs Outdated Show resolved Hide resolved

saethlin added 2 commits July 12, 2022 21:01

saethlin force-pushed the shrink-item branch from c6df5b2 to 4eff60a Compare July 13, 2022 01:08

bors merged commit db5a2b9 into rust-lang:master Jul 13, 2022

saethlin deleted the shrink-item branch July 13, 2022 23:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimizing Stacked Borrows (part 2): Shrink Item #2315

Optimizing Stacked Borrows (part 2): Shrink Item #2315

saethlin commented Jul 3, 2022 •

edited

Loading

RalfJung commented Jul 3, 2022 •

edited

Loading

saethlin commented Jul 3, 2022

RalfJung commented Jul 3, 2022

RalfJung commented Jul 3, 2022

saethlin commented Jul 3, 2022

saethlin commented Jul 3, 2022

RalfJung commented Jul 3, 2022

saethlin commented Jul 3, 2022

saethlin commented Jul 3, 2022

RalfJung commented Jul 4, 2022 •

edited

Loading

bors commented Jul 5, 2022

bors commented Jul 6, 2022

This comment was marked as off-topic.

This comment was marked as off-topic.

RalfJung commented Jul 12, 2022

RalfJung commented Jul 13, 2022

bors commented Jul 13, 2022

saethlin commented Jul 13, 2022

saethlin commented Jul 13, 2022

bors commented Jul 13, 2022

bors commented Jul 13, 2022

bors commented Jul 13, 2022

Optimizing Stacked Borrows (part 2): Shrink Item #2315

Optimizing Stacked Borrows (part 2): Shrink Item #2315

Conversation

saethlin commented Jul 3, 2022 • edited Loading

RalfJung commented Jul 3, 2022 • edited Loading

saethlin commented Jul 3, 2022

RalfJung commented Jul 3, 2022

RalfJung commented Jul 3, 2022

saethlin commented Jul 3, 2022

saethlin commented Jul 3, 2022

RalfJung commented Jul 3, 2022

saethlin commented Jul 3, 2022

saethlin commented Jul 3, 2022

RalfJung commented Jul 4, 2022 • edited Loading

bors commented Jul 5, 2022

bors commented Jul 6, 2022

This comment was marked as off-topic.

This comment was marked as off-topic.

RalfJung commented Jul 12, 2022

RalfJung commented Jul 13, 2022

bors commented Jul 13, 2022

saethlin commented Jul 13, 2022

saethlin commented Jul 13, 2022

bors commented Jul 13, 2022

bors commented Jul 13, 2022

bors commented Jul 13, 2022

saethlin commented Jul 3, 2022 •

edited

Loading

RalfJung commented Jul 3, 2022 •

edited

Loading

RalfJung commented Jul 4, 2022 •

edited

Loading