Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add utility type WakerSet to the sync module #390

Merged
9 commits merged into from Nov 1, 2019
Merged

Add utility type WakerSet to the sync module #390

9 commits merged into from Nov 1, 2019

Conversation

ghost
Copy link

@ghost ghost commented Oct 25, 2019

This type is shared among Mutex and channel for now, but would also be useful for implementing RwLock and Barrier - it'd make the code cleaner and more efficient at the same time!

We can integrate Registry into Mutex and channel in a follow-up PR. It might even be useful for implementing Condvar.

Comparing benchmarks from PR #370...

master branch:

test mutex_contention        ... bench:   3,535,766 ns/iter (+/- 284,716)
test mutex_mimick_contention ... bench:       1,686 ns/iter (+/- 414)
test mutex_no_contention     ... bench:     347,171 ns/iter (+/- 58,254)
test mutex_unused            ... bench:          35 ns/iter (+/- 2)

PR #370:

test mutex_contention        ... bench:   1,943,831 ns/iter (+/- 175,483)
test mutex_mimick_contention ... bench:       1,242 ns/iter (+/- 152)
test mutex_no_contention     ... bench:     261,742 ns/iter (+/- 66,615)
test mutex_unused            ... bench:           0 ns/iter (+/- 0)

This PR:

test mutex_contention        ... bench:   1,430,417 ns/iter (+/- 121,264)
test mutex_mimick_contention ... bench:       1,359 ns/iter (+/- 142)
test mutex_no_contention     ... bench:     277,573 ns/iter (+/- 46,106)
test mutex_unused            ... bench:           3 ns/iter (+/- 0)

@ghost
Copy link
Author

ghost commented Oct 25, 2019

@nbdd0121 It would be interesting to see how to consolidate efforts of this PR and PR #370. They're not necessarily in conflict - in fact, they're probably complementary.

This PR factors out the Registry utility that is useful in almost all synchronization primitives, not just Mutex. PR #370 essentially improves on this Registry before it was factored out.

@nbdd0121
Copy link
Contributor

It might even be useful for implementing Condvar.

Condvar is likely just going to be implemented as a thin wrapper around Registry, with wait call also handling unlock and re-lock.

The Registry type is pretty much similar to my WakerListLock type, but with better API design. Most improvements in #370 still applies, such as using linked list, and various techniques used to shorten hot path and de-bloat the generated code. I can surely rebase my PR on top on this.

Some of my concerns:

  • About NOTIFY_ONE and NOTIFY_ALL, is there any reason for the change? I can't think of a simple case where it is useful without spurious wake ups. If they're concrete cases supporting it over a simple empty flag I'd be glad to hear.
  • The code have quite a lot SeqCst, which is somehow worrying as their performance isn't ideal on non-TSO, weak memory consistency model systems.
  • Spinlock type is removed. It might be useful to have a copy of it left in sync module to be used by the entire crate, e.g. Use AtomicCell<u64> on targets with target_has_atomic less than 64 #286.

I am quite surprised about the performance improvement by this PR. I think mostly it's due changing the type to AtomicBool. I can dig further on the underlying reason.

@ghost
Copy link
Author

ghost commented Oct 25, 2019

  • About NOTIFY_ONE and NOTIFY_ALL, is there any reason for the change? I can't think of a simple case where it is useful without spurious wake ups. If they're concrete cases supporting it over a simple empty flag I'd be glad to hear.

The reason is that empty flag doesn't tell us much. If the list of registered items is not empty, a notify_one() operation will lock the entry list, get the first entry, and wake it. But it's possible that the first entry was already woken up and contains a None. In that case, we've just wasted time locking the entry list.

In other words, a notify_one() operation has work to do only when the list is non-empty and all items are Some. Similarly, a notify_all() operation has work to do only if the list is non-empty and there is at least one Some entry.

  • The code have quite a lot SeqCst, which is somehow worrying as their performance isn't ideal on non-TSO, weak memory consistency model systems.

Unfortunately, I don't know how to do this more elegantly without SeqCst. Any ideas?

It's not used anywhere anymore, but I have no reservations about keeping it if it is useful.

@nbdd0121
Copy link
Contributor

The reason is that empty flag doesn't tell us much. If the list of registered items is not empty, a notify_one() operation will lock the entry list, get the first entry, and wake it. But it's possible that the first entry was already woken up and contains a None. In that case, we've just wasted time locking the entry list.

In other words, a notify_one() operation has work to do only when the list is non-empty and all items are Some. Similarly, a notify_all() operation has work to do only if the list is non-empty and there is at least one Some entry.

But considering that the item will only be turned into None when they're waken up, and they will either re-register, cancel, or complete in such case, surely most times when we call notify, emptiness usually is a good-enough indicator of whether we need to do any work?

The main reason that I think empty flag is better is that when Registry is switched to use linked list rather than slab, emptiness checking is essentially free and requires no special book-keeping to manage.

@nbdd0121
Copy link
Contributor

Unfortunately, I don't know how to do this more elegantly without SeqCst. Any ideas?

You're right. I am being naive originally thinking some sort of Acq/Rel would be sufficient. I ended up spending a few hours reading through relevant standards and research papers, and realized that this cannot be done without protecting the read in some sort of lock. The fundamental reason is that there are no easy way to turn thread-local sequenced-before relation to inter-thread happens before other than using sequential consistency.

I think the current Mutex's approach of having 1 bit indicating if anything has been blocked is much easier to reason about, as everything is protected by locks.

@nbdd0121
Copy link
Contributor

I don't think it's a Map. It's more like a set (or a list, if backed by a linked list rather than a slab).

I am not convinced by replacing cancel with remove and notify_one. If the waker is never woken up and cancelled, we don't need to wake up anyone.

In general to prefer the old names to the current one. Or maybe we can call the methods park/unpark.

@ghost
Copy link
Author

ghost commented Oct 27, 2019

But considering that the item will only be turned into None when they're waken up, and they will either re-register, cancel, or complete in such case, surely most times when we call notify, emptiness usually is a good-enough indicator of whether we need to do any work?

That's true, but I did find some performance benefits under heavy contention where we avoid locking the entry list when there's nothing really to notify.

Although, I think checking if the first entry is turned into None rather than all of them should work just as well as an optimization...

@ghost
Copy link
Author

ghost commented Oct 27, 2019

I am not convinced by replacing cancel with remove and notify_one. If the waker is never woken up and cancelled, we don't need to wake up anyone.

You're right, that was an oversight.

Okay, so we need the following methods:

  • insert(waker)
  • update(waker, key)
  • complete(key)
  • cancel(key)

Do these method names sound okay?

I renamed the type to WakerMap after suggestion by @yoshuawuyts because it's essentially a mapping from usize to Option<Waker>. The name Registry is a bit meh because it's so generic, wheres I feel it should contain "waker" somewhere in its name tobe a bit more specific... :)

@ghost ghost requested a review from yoshuawuyts October 27, 2019 18:02
@nbdd0121
Copy link
Contributor

Do these method names sound okay?

I renamed the type to WakerMap after suggestion by @yoshuawuyts because it's essentially a mapping from usize to Option. The name Registry is a bit meth because it's so generic, wheres I feel it should contain "waker" somewhere in its name tobe a bit more specific... :)

Vec is also kinda mapping from usize to Option<Waker> as well... The usize is just a handle to locate the waker. So I don't think WakerMap is a good name. Would WakerRegistry be a sensible name?

For the method name I tend to stick with register and reregister, which reflects more information than insert and update, and makes more sense when paired with methods like notify, complete or cancel.

@yoshuawuyts
Copy link
Contributor

@nbdd0121 API-wise the difference between a "map" and a "list" is that with a list you provide the key, but with a map the key is provided for you:

// Map -- key is generated for you.
fn insert(&mut self, val) -> key;

// List -- you provide the key.
// Operations like `push` are just shorthands on top of this.
fn insert(&mut self, index, val);

In this case the key is generated, which means it's more similar to a HashMap than to a Vec.

@nbdd0121
Copy link
Contributor

@nbdd0121 API-wise the difference between a "map" and a "list" is that with a list you provide the key, but with a map the key is provided for you:

// Map -- key is generated for you.
fn insert(&mut self, val) -> key;

// List -- you provide the key.
// Operations like `push` are just shorthands on top of this.
fn insert(&mut self, index, val);

In this case the key is generated, which means it's more similar to a HashMap than to a Vec.

Not really. A map can never generate a key, because the key is always supplied by the user.

// Map: User must supply the key
fn insert(&mut self, k: K, v: V)

You cannot insert anything to a map without providing a key.

On the other hand, set (or vector and list) can return an iterator upon insertion:

// Set: An iterator is provided to the user.
std::pair<iterator,bool> insert( value_type&& value );

The fact that we use a usize and call it a key does not just turn the whole concept from a Set into a Map. In fact, we're just working around Rust's lifetime constraints which made Rust incapable of representing C++'s iterator safely.

@ghost
Copy link
Author

ghost commented Oct 27, 2019

What if we just call it WakerSet?

@yoshuawuyts yoshuawuyts added the enhancement New feature or request label Oct 28, 2019
@ghost
Copy link
Author

ghost commented Oct 30, 2019

Can we do another round of review and merge if this looks good?

@nbdd0121 I still want us to have your optimizations on top of this PR :) I'm also particularly interested in having a doubly linked list because it allows us to have stronger fairness guarantees than what slab provides (i.e. the first waker to be inserted should be woken first).

In fact, it'd be great if we just removed the slab crate as a dependency and replaced it with doubly-linked lists everywhere :) I guess what I'm looking for is our own implementation of a subset of Slab based on a doubly-linked list.

What do you think? Sorry for overriding your already submitted PR, especially given how much effort you've put into it :(

flag |= NOTIFY_ALL;
}

// Use `SeqCst` ordering to synchronize with `WakerSet::lock_to_notify()`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably mean WakerSet::notify_{one, all} here

@nbdd0121
Copy link
Contributor

MutexGuard::drop uses AcqRel now, but it probably needs to be SeqCst. Acquire can only establish an inter-thread happens-before order if we use its value to make decisions. The current code works because swap(false, SeqCst) and swap(false, AcqRel) compiles to the same sequence on x64.

This leads to a broader issue. Currently it's very difficult to use WakerSet correctly, due to NOTIFY_ONE and NOTIFY_ALL fields. Maybe we still want the old approach, to store the BLOCKED bits inside the mutex, and let WakerSet logically be strictly identical to a Mutex<Inner>.

Copy link
Contributor

@yoshuawuyts yoshuawuyts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API-wise this looks good to me. But my knowledge of Atomics is limited, so I can't comment on any of that. It seems @nbdd0121's feedback should probably be addressed before this can be merged.

Still approving this since I don't have any feedback left at this point.

@yoshuawuyts yoshuawuyts changed the title Add utility type Registry to the sync module Add utility type WakerSet to the sync module Oct 31, 2019

/// Set when the entry list is locked.
#[allow(clippy::identity_op)]
const LOCKED: usize = 1 << 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively the clippy lint could be removed by writing

const LOCKED: usize = 0b001;
const NOTIFY_ONE: usize = 0b010;
const NOTIFY_ALL: usize = 0b100;

@ghost
Copy link
Author

ghost commented Oct 31, 2019

I switched to SeqCst orderings now, that suggestion was on spot. Thank you for pointing the issue out! I also switched to SeqCst in the try_lock() method to be on the safe side. We similarly use SeqCst in try_recv() and try_send() inside channels.

@nbdd0121 I agree the previous approach with BLOCKED is easier to follow on its own, but don't think the one in this PR is significantly worse. In fact, it looks exactly the same as in channels and this approach is unifying the same pattern across many synchronization primitives.

Here are latest benchmarks. PR #370 :

test contention    ... bench:   1,927,725 ns/iter (+/- 248,164)
test create        ... bench:           0 ns/iter (+/- 0)
test no_contention ... bench:     243,104 ns/iter (+/- 50,678)

This PR:

test contention    ... bench:   1,286,675 ns/iter (+/- 153,616)
test create        ... bench:           3 ns/iter (+/- 0)
test no_contention ... bench:     251,007 ns/iter (+/- 173,492)

The difference in the highly contended cases is noticeable, at the expense of a smaller speed bump in uncontended cases, even with so many SeqCst orderings.

@ghost ghost merged commit 87de4e1 into async-rs:master Nov 1, 2019
@ghost ghost deleted the refactor-sync branch November 1, 2019 01:45
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants