
Remove unnecessary condition in Barrier::wait() #87440

Merged 1 commit into rust-lang:master on Oct 21, 2021

Conversation

twetzel59 (Contributor)

This is my first pull request for Rust, so feel free to call me out if anything is amiss.

After some examination, I realized that the second condition of the "spurious-wakeup-handler" loop in `std::sync::Barrier::wait()` should always evaluate to `true`, making it redundant in the `&&` expression.

Here is the affected function before the fix:

```rust
#[stable(feature = "rust1", since = "1.0.0")]
pub fn wait(&self) -> BarrierWaitResult {
    let mut lock = self.lock.lock().unwrap();
    let local_gen = lock.generation_id;
    lock.count += 1;
    if lock.count < self.num_threads {
        // We need a while loop to guard against spurious wakeups.
        // https://en.wikipedia.org/wiki/Spurious_wakeup
        while local_gen == lock.generation_id && lock.count < self.num_threads { // fixme
            lock = self.cvar.wait(lock).unwrap();
        }
        BarrierWaitResult(false)
    } else {
        lock.count = 0;
        lock.generation_id = lock.generation_id.wrapping_add(1);
        self.cvar.notify_all();
        BarrierWaitResult(true)
    }
}
```

At first glance, it seems that the check `lock.count < self.num_threads` would be necessary in order for a thread A to detect when another thread B has caused the barrier to reach its thread count, making thread B the "leader".

However, the control flow implicitly establishes an invariant that makes observing `!(lock.count < self.num_threads)`, i.e. `lock.count >= self.num_threads`, impossible from thread A.

When thread B, which will be the leader, calls `.wait()` on this shared instance of the `Barrier`, it locks the mutex in the first line and saves the `MutexGuard` in the `lock` variable. It then increments `lock.count` and checks whether `lock.count < self.num_threads`. Since it is the leader, the count (after the increment) is *equal* to the number of threads, so the second branch is taken immediately: `lock.count` is zeroed, the generation ID is incremented (with wrap), and the condition variable is signalled. But the other threads are waiting at the line `lock = self.cvar.wait(lock).unwrap();`, so they cannot resume until thread B's call to `Barrier::wait()` returns, which drops the `MutexGuard` acquired in the first `let` statement and unlocks the mutex.
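To make the leader hand-off concrete, here is a minimal, runnable usage sketch built only on the public `std::sync::Barrier` API (the thread count of 3 is arbitrary); exactly one `wait()` call per generation returns a leader result:

```rust
use std::sync::{Arc, Barrier};
use std::thread;

fn main() {
    // Thread count of 3 is arbitrary for this sketch.
    let barrier = Arc::new(Barrier::new(3));
    let handles: Vec<_> = (0..3)
        .map(|_| {
            let b = Arc::clone(&barrier);
            thread::spawn(move || {
                // Every thread blocks here until the third arrival
                // (the "thread B" of the write-up above).
                let result = b.wait();
                // Exactly one thread observes BarrierWaitResult(true).
                println!("is_leader: {}", result.is_leader());
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
}
```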

The order of events is thus:

  1. A thread A calls `.wait()`
  2. `.wait()` acquires the mutex, increments `lock.count`, and takes the first branch
  3. Thread A enters the `while` loop since the generation ID has not changed and the count is less than the number of threads for the `Barrier`
  4. Spurious wakeups occur, but both conditions still hold, so thread A goes back to waiting on the condition variable
  5. This process repeats N - 2 more times for the other non-leader threads A', A'', etc.
  6. Meanwhile, thread B calls `Barrier::wait()` on the same barrier that threads A, A', A'', etc. are waiting on. The thread count reaches the number of threads for the `Barrier`, so all threads should now proceed, with B being the leader. B acquires the mutex and increments `lock.count` only to find that it is not less than `self.num_threads`. Thus, it immediately resets `lock.count` back down to 0 and increments the generation. Then, it signals the condvar to tell the A (prime) threads that they may continue.
  7. The A, A', A''... threads wake up and attempt to re-acquire the `lock` as per the internal operation of a condition variable. When each A has exclusive access to the mutex, it finds that `lock.generation_id` no longer matches `local_gen`, so the `&&` expression short-circuits; even if the second operand were evaluated, `lock.count` is definitely less than `self.num_threads` because thread B reset it to 0 before dropping its `MutexGuard`.

Therefore, it is my understanding that it would be impossible for the non-leader threads to ever see the second boolean expression evaluate to anything other than `true`. This PR simply removes that condition.
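For reference, with the redundant check removed, the guard loop in the patched function reads:

```rust
// The generation check alone guards against spurious wakeups:
// a waiter may only proceed once the leader has advanced the generation.
while local_gen == lock.generation_id {
    lock = self.cvar.wait(lock).unwrap();
}
```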

Any input would be appreciated. Sorry if this is terribly verbose. I'm new to the Rust community and concurrency can be hard to explain in words. Thanks!

@rust-highfive (Collaborator)

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @kennytm (or someone else) soon.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 24, 2021
@inquisitivecrystal inquisitivecrystal added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Aug 24, 2021
@JohnTitor (Member)

Re-assigning randomly,
r? rust-lang/libs

@rust-highfive rust-highfive assigned yaahc and unassigned kennytm Oct 19, 2021
@yaahc (Member) commented Oct 20, 2021

> This is my first pull request for Rust, so feel free to call me out if anything is amiss.

Eyyy, congrats on the first PR and thank you! I'm just sorry it's taken us so long to review it, hopefully we can do better on future PRs.

> Any input would be appreciated. Sorry if this is terribly verbose. I'm new to the Rust community and concurrency can be hard to explain in words. Thanks!

After about three read-throughs I can't find any issues with your reasoning. Also, there's no need to apologize for the verbosity; as you noted, reasoning about concurrency can be particularly difficult, so in this case the verbosity is very much appreciated. I don't have any issues to call out other than wanting to say thank you again for the contribution! Hopefully it will be the first of many ^_^

@bors r+

@bors (Contributor) commented Oct 20, 2021

📌 Commit d65ab29 has been approved by yaahc

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 20, 2021
JohnTitor added a commit to JohnTitor/rust that referenced this pull request Oct 21, 2021
Remove unnecessary condition in Barrier::wait()

bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 21, 2021
Rollup of 14 pull requests

Successful merges:

 - rust-lang#86984 (Reject octal zeros in IPv4 addresses)
 - rust-lang#87440 (Remove unnecessary condition in Barrier::wait())
 - rust-lang#88644 (`AbstractConst` private fields)
 - rust-lang#89292 (Stabilize CString::from_vec_with_nul[_unchecked])
 - rust-lang#90010 (Avoid overflow in `VecDeque::with_capacity_in()`.)
 - rust-lang#90029 (Add test for debug logging during incremental compilation)
 - rust-lang#90031 (config: add the option to enable LLVM tests)
 - rust-lang#90048 (Add test for line-number setting)
 - rust-lang#90071 (Remove hir::map::blocks and use FnKind instead)
 - rust-lang#90074 (2229 migrations small cleanup)
 - rust-lang#90077 (Make `From` impls of NonZero integer const.)
 - rust-lang#90097 (Add test for duplicated sidebar entries for reexported macro)
 - rust-lang#90098 (Add test to ensure that the missing_doc_code_examples is not triggered on foreign trait implementations)
 - rust-lang#90099 (Fix MIRI UB in `Vec::swap_remove`)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit fb9232b into rust-lang:master Oct 21, 2021
@rustbot rustbot added this to the 1.58.0 milestone Oct 21, 2021
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.