
Implement Mutex::try_lock #71

Merged (2 commits) on Jun 28, 2022
Conversation

@bkragl (Contributor) commented on May 25, 2022

Without `try_lock` it was easier to justify context switches, because
acquire was a right mover (we only needed a context switch before) and
release was a left mover (we only needed a context switch after).
However, with `try_lock` that is not the case anymore. This commit
argues why we need a context switch at the end of `lock` and `try_lock`
(in both the success and failure cases), and why we do not need a context
switch at the beginning of `try_lock` and `MutexGuard::drop`.
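
To make the mover argument concrete, here is a minimal, self-contained sketch (placeholder names only, not Shuttle's actual implementation) of why `try_lock` needs a context switch at the end in both outcomes: unlike a blocking `lock`, a failed `try_lock` is an observable event that later acquirers can causally depend on.

```rust
// Illustrative sketch only; `ThreadId`, `ModelMutexState`, and
// `yield_to_scheduler` are placeholders, not Shuttle's real types.
type ThreadId = usize;

struct ModelMutexState {
    holder: Option<ThreadId>,
}

/// Stand-in for the model checker's context-switch point.
fn yield_to_scheduler() {}

fn try_lock_model(state: &mut ModelMutexState, me: ThreadId) -> bool {
    // No context switch is needed at the beginning: any thread that could
    // have raced with us here was already runnable at the previous switch.
    let acquired = if state.holder.is_none() {
        state.holder = Some(me);
        true
    } else {
        // Even on failure we observed the lock state, so this attempt is
        // neither a pure right mover nor a pure left mover.
        false
    };
    // Context switch at the end, in both the success and the failure case,
    // so other threads get a chance to observe and react to the outcome.
    yield_to_scheduler();
    acquired
}

fn main() {
    let mut state = ModelMutexState { holder: None };
    assert!(try_lock_model(&mut state, 0));
    assert!(!try_lock_model(&mut state, 1));
}
```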


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@jamesbornholt (Member) left a comment
I think all the broken tests are because we have an extra yield point now, so existing schedules don't replay (in which case we should bump the minor version on the next release) and some of the tests that count context switches are now wrong.

This extra yield point also makes the clock_condvar_notify_all_dfs test extremely slow; wonder if we can do something about that.

@bkragl (Contributor, Author) commented on Jun 2, 2022

> I think all the broken tests are because we have an extra yield point now, so existing schedules don't replay (in which case we should bump the minor version on the next release) and some of the tests that count context switches are now wrong.

I fixed all tests.

> This extra yield point also makes the clock_condvar_notify_all_dfs test extremely slow; wonder if we can do something about that.

I just did an experiment on my local machine. On the current main branch, clock_condvar_notify_all_dfs takes 60 seconds. On this PR (including the revision that does a context switch before lock only when we need to block) it takes "only" 40 seconds.


// Update the vector clock stored in the Mutex, because future threads that manage to
// acquire the lock have a causal dependency on this failed `try_lock`.
ExecutionState::with(|s| state.clock.update(s.get_clock(me)));

Member:
I just realized that I had made an implicit optimization with vector clock updates. When a thread successfully acquires a lock (using lock()), it does not update the Mutex's clock with its own clock at the time of acquisition. This was fine in the absence of try_lock, as all other threads would be blocked anyway. But with try_lock, the Mutex's clock has to be updated with the thread's clock in both the else branch below (line 142) and in the lock() call above (line 103).

Contributor (author):
In try_lock that update was already done in the successful case, so I factored it out from the if branch to be done in both cases.
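
A minimal sketch of what the factored-out update might look like (hypothetical `VectorClock` and state types, not Shuttle's actual code): the Mutex's clock absorbs the caller's clock on every `try_lock` attempt, successful or not, because later acquirers causally depend on the attempt either way.

```rust
// Hypothetical types for illustration only.
#[derive(Default)]
struct VectorClock(Vec<u64>);

impl VectorClock {
    /// Pointwise maximum with another clock.
    fn update(&mut self, other: &VectorClock) {
        if self.0.len() < other.0.len() {
            self.0.resize(other.0.len(), 0);
        }
        for (mine, theirs) in self.0.iter_mut().zip(other.0.iter()) {
            *mine = (*mine).max(*theirs);
        }
    }
}

struct LockClockState {
    clock: VectorClock,
    locked: bool,
}

fn try_lock_clock_update(state: &mut LockClockState, my_clock: &VectorClock) -> bool {
    // Factored out of the success branch: update the Mutex's clock
    // unconditionally, in both the success and the failure case.
    state.clock.update(my_clock);
    if !state.locked {
        state.locked = true;
        true
    } else {
        false
    }
}

fn main() {
    let mut state = LockClockState { clock: VectorClock::default(), locked: false };
    let thread_clock = VectorClock(vec![1, 0, 2]);
    assert!(try_lock_clock_update(&mut state, &thread_clock));
    assert!(!try_lock_clock_update(&mut state, &thread_clock));
    assert_eq!(state.clock.0, vec![1, 0, 2]);
}
```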

/// T0 blocks). The following computation tree illustrates all interleavings.
///
/// ```
/// digraph G {

Member:
Nice!

Comment on lines 66 to 71
// Note that we only need a context switch when we are blocked, but not if the lock is
// available. Consider that there is another thread `t` that also wants to acquire the
// lock. At the last context switch (where we were chosen), `t` must have been already
// runnable and could have been chosen by the scheduler instead. Also, if we want to
// re-acquiring the lock immediately after having it released, we know that the release
// had a context switch that allowed other threads to acquire in between.

Member:
With this change I think we can move self.waiters.insert(me) above into the branch, and that probably lends itself to a cleaner way to handle the state here.

Also, is it true that if we don't enter this branch, then state.waiters == {me} (or {} if you move that insert)? Maybe we should assert that invariant.
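
A rough sketch of the suggested refactoring (placeholder names, not Shuttle's real API; the asserted invariant is the one questioned above, not something the PR establishes): insert into the waiter set only on the contended path, and assert that the set is empty on the uncontended one.

```rust
use std::collections::HashSet;

type ThreadId = usize;

struct LockState {
    holder: Option<ThreadId>,
    waiters: HashSet<ThreadId>,
}

/// Stand-ins for the scheduler interaction; not Shuttle's real API.
fn yield_to_scheduler() {}
fn block_until_unlocked(_state: &LockState) {}

fn lock_model(state: &mut LockState, me: ThreadId) {
    if state.holder.is_some() {
        // Contended path: register as a waiter and give the scheduler a
        // chance to run the current holder before we block.
        state.waiters.insert(me);
        yield_to_scheduler();
        block_until_unlocked(state);
        state.waiters.remove(&me);
    } else {
        // Uncontended path: the invariant asked about in the review, namely
        // that no one is recorded as waiting if we did not have to block.
        debug_assert!(state.waiters.is_empty());
    }
    state.holder = Some(me);
}

fn main() {
    let mut state = LockState { holder: None, waiters: HashSet::new() };
    lock_model(&mut state, 0);
    assert_eq!(state.holder, Some(0));
}
```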

Comment on lines 69 to 70
// runnable and could have been chosen by the scheduler instead. Also, if we want to
// re-acquiring the lock immediately after having it released, we know that the release

Member:
Suggested change:
- // runnable and could have been chosen by the scheduler instead. Also, if we want to
- // re-acquiring the lock immediately after having it released, we know that the release
+ // runnable and could have been chosen by the scheduler instead. Also, if we want to
+ // re-acquire the lock immediately after releasing it, we know that the release


#[test]
#[should_panic(expected = "nothing to get")]
fn replay_roducer_consumer_broken1() {

Member:
Suggested change:
- fn replay_roducer_consumer_broken1() {
+ fn replay_producer_consumer_broken1() {


#[test]
#[should_panic(expected = "deadlock")]
fn replay_roducer_consumer_broken2() {

Member:
Suggested change:
- fn replay_roducer_consumer_broken2() {
+ fn replay_producer_consumer_broken2() {

add_thread.join().unwrap();
mul_thread.join().unwrap();

let value = *lock.try_lock().unwrap();

Member:
might be slightly more idiomatic/explicit to do

let value = Arc::try_unwrap(lock).unwrap().into_inner().unwrap();

(and same thing in the other tests) -- we're not trying to test anything about try_lock here
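
For illustration, here is the suggested idiom with std's `Arc` and `Mutex`; the test itself would use Shuttle's types, assuming they expose the same `into_inner` API.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let lock = Arc::new(Mutex::new(0u32));

    let adder = {
        let lock = Arc::clone(&lock);
        thread::spawn(move || *lock.lock().unwrap() += 2)
    };
    adder.join().unwrap();

    // After joining, this is the only remaining Arc, so we can unwrap it and
    // consume the Mutex instead of going through lock/try_lock again.
    let value = Arc::try_unwrap(lock).unwrap().into_inner().unwrap();
    assert_eq!(value, 2);
}
```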

#[test]
#[should_panic(expected = "tried to acquire a Mutex it already holds")]
fn double_lock() {
check(|| {

Member:
trying to get rid of the round-robin scheduler:

Suggested change:
- check(|| {
+ check_dfs(|| {


#[test]
fn double_try_lock() {
check(|| {

Member:
Suggested change:
- check(|| {
+ check_dfs(|| {

@jamesbornholt merged commit 8bd99ef into awslabs:main on Jun 28, 2022.
@bkragl deleted the try_lock branch on January 25, 2023.