-
Notifications
You must be signed in to change notification settings - Fork 353
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Weak memory emulation using store buffers #1963
Conversation
Oh wow, this is amazing! Thanks so much for working on this. :) Unfortunately I will have to put off a detailed review until mid-February or maybe even March, since interview season is approaching rapidly for me. So for now I can just give some high-level comments:
|
Of course! My term has started now as well so I will have less time to work on this than I did over Christmas
Right now it's a uniform random choice using the existing rng in I agree we could have a flag for this with possible strategies like
I saw Promising Semantics was used to handle SC fences over in loom tokio-rs/loom#220, I didn't realise there's a v2. Will have a look! Both the author (Christopher) and the supervisor (Alastair, who actually taught my concurrency course last term) of the race detector paper are on GitHub as well. If we could incorporate something that works with scfix (plus the already-implemented change on simplified release sequences) into the model I'm sure they'll be interested.
Yes. Although by non-preemptive here I meant the scheduler not context switching until an explicit yield (which rules out a lot of weak behaviours that requires certain interleavings), but yes, for speculative loads it will require speculative execution in the interpreter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry because it seems that there are some mistakes occurred, which let me approve this request without review. I consider I am not capable now to review this snippet, so maybe it is a greater idea to seek others to review it. Thanks:)
☔ The latest upstream changes (presumably #2045) made this pull request unmergeable. Please resolve the merge conflicts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry again for the long delay! It is hard to find time to review big complicated changes like this. This is some impressive work. :-)
I went over the code and left some comments. However, I honestly don't understand at least half of it, and I won't have time any time soon to dig into the paper and convince myself that the algorithm makes sense, or to meticulously check that you implemented the algorithm faithfully. This concerns me, because it means in case of any problems I have no way of debugging this. In that sense I quite appreciate that you put most of this into a new file, because it means if it all goes horribly wrong one day I can at least rip it out fairly easily. In fact it might be a good idea to further reduce the data_race.rs
changes and move basically everything that works on AllocExtra::weak_memory
into the new file (including the let ... if
that even test whether that data is present).
So, under these caveats, I think I am okay with landing this. Ideally, if/when a problem comes up, I'll just ping you and you can help. :)
What is the situation with that C++20 revised definition? Is it fair to say that this simply implements C++11 semantics instead? That seems fine for me, it just needs documenting (including in the README). It's probably better to stick to a published algorithm rather than trying to adjust it... I certainly couldn't review those adjustments.
However you say
(CppMem still says it's consistent, but that's because it uses C++11 model)
so does what you implemented disagree with CppMem, or does "The test case RWC+syncs given in Lahav et al.'s paper fails" actually mean "it says it is consistent"? I am confused.
Is there a testcase you can add that should pass/fail but does the opposite, due to this?
It also would be good to have some test case that shows that store buffering is actually happening. Obviously this might have to run like 10 times and assert that it happens at least once, but still, at least a smoke test should be there -- right now, the test case you added would have already passed before your PR, so that's kind of a shame.
And finally, can you make sure that the crossbeam tests still pass with your patch? They have Miri on CI, and it'd be a shame if we broke their CI by landing this.
Thanks for coming back to this!
I'll look into this, though I suspect lifting code that accesses
All possible behaviours under the current implementation are allowed under C++11 semantics, this is what I would call correct or sound (albeit incomplete). Very briefly, C++20 weakened SC loads and stores (accesses), and strengthened SC fences. This gives rises to 3 scenarios
Note that loom doesn't support SC accesses (they are weakened to Acquire/Release). I don't think it's acceptable for us to do the same here because Miri as a dynamic checker shouldn't produce false positives (behaviours not allowed under the memory model), whereas Loom is a static checker and can favour coverage at the expense of overly-weak behaviours. As such, we have to stick to what's given in the Lidbury & Donaldson paper and therefore C++11 model. I'm glad you are on board with this.
My implementation agrees with CppMem. The trouble is Rust's documentation specifies that Rust follows C++20 model, not C++11. Ideally we want to do C++20 (i.e.
The issue is that, whether a buffered store gets read at all is dependent on Miri's scheduler and the RNG that picks one store from potentially many candidates. The randomness can be dealt with by running the test many times, but the scheduler is completely deterministic and multiple runs make no difference. Of course, I can find an assertion that succeeds under the current version, but it's not robust against potentially subtle perturbations to the scheduler behaviour. Nonetheless I'm happy to add such test with extensive comments on why it may fail.
Will do Quite a lot of UI tests are broken after rebasing. Looks like they're due to the same issue. Will investigate later. |
Thanks! I was mostly imagining changes like moving this entire block into the other file, which I don't think should cause borrow checker trouble?
So in contrast, the implementation is not 'correct' under C++20, in the sense of generating behaviors that C++20 disallows? What about 'completeness' (generating all possible behaviors, given enough different random seeds)?
Seems fair, given that that's what other tools also do and that the entire weak memory support can be disabled.
Okay. Please add this as a compile-fail testcase with comments explaining the situation.
The scheduler can be 'nudged' by calling |
Indeed it will generate behaviors that C++20 disallows.
It's not complete. There are categories of allowed weak behaviour this implementation will never produce, such as load buffering. |
Okay. That sounds like a good thing to include into the summary at the top of |
c0be33b
to
20db949
Compare
☔ The latest upstream changes (presumably #2084) made this pull request unmergeable. Please resolve the merge conflicts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some more comments; my understanding is a bunch of my previous ones are also still open.
This PR is getting huge! Do you think there is any way to stage this over multiple PRs, for ease of reviewing?
Sorry I was thinking of adding my would-be responses as comments, though I tend to procrastinate them until after code changes...
Everything in 0b9c8b0 is standalone. I'll cherry-pick it into another PR. All the changes to scheduler-dependent race detector tests are simply adding |
I don't mind them. The hard to review part is in |
Take one for example pub fn test_simple_release() {
let mut a = 0u32;
let b = &mut a as *mut u32;
let c = EvilSend(b);
unsafe {
let j1 = spawn(move || {
*c.0 = 1;
SYNC.store(1, Ordering::Release);
});
let j2 = spawn(move || {
if SYNC.load(Ordering::Acquire) == 1 {
*c.0
} else {
0
}
});
j1.join().unwrap();
assert_eq!(j2.join().unwrap(),1);
}
} This test won't work on real hardware: nothing is guaranteeing Although my PR doesn't change the scheduler, meaning |
I see, makes sense. That makes me wonder if we should let users control the probability of reading an old value, similar to the knob we have to control the probability of cmpxchg_weak spuriously failing. |
Use atomic RMW for `{mutex, rwlock, cond, srwlock}_get_or_create_id` functions This is required for #1963 `{mutex, rwlock, cond, srwlock}_get_or_create_id()` currently checks whether an ID field is 0 using an atomic read, allocate one and get a new ID if it is, then write it in a separate atomic write. This is fine without weak memory. For instance, in `pthread_mutex_lock` which may be called by two threads concurrently, only one thread can read 0, create and then write a new ID, the later-run thread will always see the newly created ID and never 0. ```rust fn pthread_mutex_lock(&mut self, mutex_op: &OpTy<'tcx, Tag>) -> InterpResult<'tcx, i32> { let this = self.eval_context_mut(); let kind = mutex_get_kind(this, mutex_op)?.check_init()?; let id = mutex_get_or_create_id(this, mutex_op)?; let active_thread = this.get_active_thread(); ``` However, with weak memory behaviour, both threads may read 0: the first thread has to see 0 because nothing else was written to it, and the second thread is not guaranteed to observe the latest value, causing a duplicate mutex to be created and both threads "successfully" acquiring the lock at the same time. This is a pretty typical pattern requiring the use of atomic RMWs. RMW *always* reads the latest value in a location, so only one thread can create the new mutex and ID, all others scheduled later will see the new ID.
☔ The latest upstream changes (presumably #2114) made this pull request unmergeable. Please resolve the merge conflicts. |
Co-authored-by: Ralf Jung <post@ralfj.de>
Co-authored-by: Ralf Jung <post@ralfj.de>
…f UB Co-authored-by: Ralf Jung <post@ralfj.de>
997e5fb
to
1b32d14
Compare
Squashed a bit and rebased |
Awesome! Thank you so much for working on this and sticking with me through the many rounds of review. I feel like Miri will be an excellent tool for finding bugs in concurrent software. :) @bors r+ |
📌 Commit 1b32d14 has been approved by |
Thank you for reviewing this massive PR! I learnt quite a lot implementing this, and I'm glad we got to scrutinise Rust's atomics specification as a bonus |
☀️ Test successful - checks-actions |
clarify how Rust atomics correspond to C++ atomics `@cbeuw` noted in rust-lang/miri#1963 that the correspondence between C++ atomics and Rust atomics is not quite as obvious as one might think, since in Rust I can use `get_mut` to treat previously non-atomic data as atomic. However, I think using C++20 `atomic_ref`, we can establish a suitable relation between the two -- or do you see problems with that `@cbeuw?` (I recall you said there was some issue, but it was deep inside that PR and Github makes it impossible to find...) Cc `@thomcc;` not sure whom else to ping for atomic memory model things.
clarify how Rust atomics correspond to C++ atomics ``@cbeuw`` noted in rust-lang/miri#1963 that the correspondence between C++ atomics and Rust atomics is not quite as obvious as one might think, since in Rust I can use `get_mut` to treat previously non-atomic data as atomic. However, I think using C++20 `atomic_ref`, we can establish a suitable relation between the two -- or do you see problems with that ``@cbeuw?`` (I recall you said there was some issue, but it was deep inside that PR and Github makes it impossible to find...) Cc ``@thomcc;`` not sure whom else to ping for atomic memory model things.
clarify how Rust atomics correspond to C++ atomics ``@cbeuw`` noted in rust-lang/miri#1963 that the correspondence between C++ atomics and Rust atomics is not quite as obvious as one might think, since in Rust I can use `get_mut` to treat previously non-atomic data as atomic. However, I think using C++20 `atomic_ref`, we can establish a suitable relation between the two -- or do you see problems with that ``@cbeuw?`` (I recall you said there was some issue, but it was deep inside that PR and Github makes it impossible to find...) Cc ``@thomcc;`` not sure whom else to ping for atomic memory model things.
This implements the second half of the Lidbury & Donaldson paper: weak memory emulation using store buffers. A store buffer is created over a memory range on atomic access. Stores will push store elements into the buffer and loads will search through the buffer in reverse modification order, determine which store elements are valid for the current load, and pick one randomly.
This implementation will never generate weak memory behaviours forbidden by the C++11 model, but it is incapable of producing all possible weak behaviours allowed by the model. There are certain weak behaviours observable on real hardware but not while using this.
Note that this implementation does not take into account of C++20's memory model revision to SC accesses and fences introduced by P0668. This implementation is not fully correct under the revised C++20 model and may generate behaviours C++20 disallows.
Rust follows the C++20 memory model (except for the Consume ordering and some operations not performable through C++'s std::atomic API). It is therefore possible for this implementation to generate behaviours never observable when the same program is compiled and run natively. Unfortunately, no literature exists at the time of writing which proposes an implementable and C++20-compatible relaxed memory model that supports all atomic operation existing in Rust. The closest one is A Promising Semantics for Relaxed-Memory Concurrency by Jeehoon Kang et al. However, this model lacks SC accesses and is therefore unusable by Miri (SC accesses are everywhere in library code).
Safe/sound Rust allows for more operations on atomic locations than the C++20 atomic API was intended to allow, such as non-atomically accessing a previously atomically accessed location, or accessing previously atomically accessed locations with a differently sized operation (such as accessing the top 16 bits of an
AtomicU32
). These scenarios are generally left undefined in formalisations of C++ memory model, even though they became possible in C++20 withstd::atomic_ref<T>
. In Rust, these operations can only be done through a&mut AtomicFoo
reference or one derived from it, therefore these operations can only happen after all previous accesses on the same locations. This implementation is adapted to accommodate these.TODOs:
{mutex, rwlock, cond, srwlock}_get_or_create_id
functions undersrc/shims
to use atomic RMWs instead of separate read -> check if need to create a new one -> write stepsweak_memory.rs