Improve performance and space of async_std::sync::Mutex #370

nbdd0121 · 2019-10-18T19:00:24Z

Related #362

nbdd0121 · 2019-10-20T00:53:25Z

Some numbers

Before this PR:
test mutex_contention ... bench: 1,866,362 ns/iter (+/- 162,639)
test mutex_mimick_contention ... bench: 1,448 ns/iter (+/- 131)
test mutex_no_contention ... bench: 341,444 ns/iter (+/- 17,727)
test mutex_unused ... bench: 27 ns/iter (+/- 1)
Size of Mutex<()> is 64 bytes

After this PR:
test mutex_contention ... bench: 1,669,542 ns/iter (+/- 167,686)
test mutex_mimick_contention ... bench: 1,034 ns/iter (+/- 148)
test mutex_no_contention ... bench: 258,535 ns/iter (+/- 38,157)
test mutex_unused ... bench: 0 ns/iter (+/- 0)
Size of Mutex<()> is 16 bytes

The PR also ensures that Mutex<T> shares a single copy of its cold path across all Ts, which reduces code bloat.

nbdd0121 · 2019-10-20T15:16:41Z

@stjepang This PR is ready. I scrapped the plan to squeeze everything into 1 usize (this PR makes it 2 usize), because squeezing further would hurt code clarity and the code would not be reusable for RwLock.
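To make the "2 usize" figure concrete, here is a minimal sketch of how a two-word lock state can be measured. The type and field names (`TwoWordState`, `SketchMutex`, `flags`, `waiters`) are hypothetical and are not taken from this PR; the sketch only illustrates that a zero-sized payload such as `()` adds nothing to the footprint.

```rust
use std::cell::UnsafeCell;
use std::mem::size_of;
use std::sync::atomic::AtomicUsize;

/// Hypothetical two-word lock state (names are illustrative only).
pub struct TwoWordState {
    /// Bit flags, for example "locked".
    pub flags: AtomicUsize,
    /// Pointer-sized handle to the list of waiting tasks.
    pub waiters: AtomicUsize,
}

/// Hypothetical mutex layout: lock state followed by the protected value.
pub struct SketchMutex<T> {
    pub state: TwoWordState,
    pub value: UnsafeCell<T>,
}

/// Size of `SketchMutex<T>` measured in machine words.
pub fn size_in_words<T>() -> usize {
    size_of::<SketchMutex<T>>() / size_of::<usize>()
}
```

On a 64-bit target, `size_in_words::<()>()` being 2 corresponds to the 16 bytes reported above for `Mutex<()>`.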

Signed-off-by: Gary Guo <gary@garyguo.net>

All `Mutex`es now internally use `RawMutex` (which is similar to a `Mutex<()>`: it provides locking semantics but holds no data), so instantiating `Mutex` with different types does not duplicate code. This patch does not otherwise change the algorithm used by `Mutex`. Signed-off-by: Gary Guo <gary@garyguo.net>
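The de-bloating idea described in this commit message can be sketched as follows. This is not the PR's code: `RawLock`, `Guarded`, and `try_with` are hypothetical names, and the lock here is a plain non-async try-lock used only to show the split between a non-generic core and a thin generic wrapper.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};

/// Non-generic lock: its code is compiled once, whatever `T` the wrapper holds.
pub struct RawLock {
    locked: AtomicBool,
}

impl RawLock {
    pub const fn new() -> Self {
        RawLock { locked: AtomicBool::new(false) }
    }

    /// Try to take the lock without waiting.
    pub fn try_lock(&self) -> bool {
        self.locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
    }

    pub fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}

/// Generic wrapper: only adds data storage on top of `RawLock`.
pub struct Guarded<T> {
    raw: RawLock,
    value: UnsafeCell<T>,
}

// Safety: access to `value` is serialised by `raw`.
unsafe impl<T: Send> Sync for Guarded<T> {}

impl<T> Guarded<T> {
    pub fn new(value: T) -> Self {
        Guarded { raw: RawLock::new(), value: UnsafeCell::new(value) }
    }

    /// Run `f` with exclusive access, or return `None` if the lock is taken.
    /// If `f` panics the lock stays held; a real implementation would use a guard.
    pub fn try_with<R>(&self, f: impl FnOnce(&mut T) -> R) -> Option<R> {
        if !self.raw.try_lock() {
            return None;
        }
        // Safety: we hold `raw`, so no other reference to `value` exists.
        let result = f(unsafe { &mut *self.value.get() });
        self.raw.unlock();
        Some(result)
    }
}
```

Only `try_with` is instantiated per `T`; the body of `RawLock::try_lock` and `RawLock::unlock` exists once in the binary.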

#[inline] is added to common and trivial functions, and slow paths are separated out from the inlined hot path. Signed-off-by: Gary Guo <gary@garyguo.net>
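A minimal sketch of the hot-path/slow-path split this commit describes, using a simple spinning flag rather than the PR's async lock. `SpinFlag` and `lock_slow` are hypothetical names chosen for the illustration.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Illustrative spinning lock flag (not the async mutex from this PR).
pub struct SpinFlag {
    locked: AtomicBool,
}

impl SpinFlag {
    pub const fn new() -> Self {
        SpinFlag { locked: AtomicBool::new(false) }
    }

    /// Hot path: one compare-exchange, small enough to inline at call sites.
    #[inline]
    pub fn lock(&self) {
        if self
            .locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            self.lock_slow();
        }
    }

    /// Slow path: kept out of line so it is not copied into every caller.
    #[cold]
    #[inline(never)]
    fn lock_slow(&self) {
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
    }

    #[inline]
    pub fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }

    #[inline]
    pub fn is_locked(&self) -> bool {
        self.locked.load(Ordering::Relaxed)
    }
}
```

The caller pays only for the compare-exchange in the uncontended case; the loop lives in a single out-of-line function.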

Signed-off-by: Gary Guo <gary@garyguo.net>

WakerListLock is an optimised version of Spinlock<WakerList> which is more efficient in performance and space. Signed-off-by: Gary Guo <gary@garyguo.net>

Signed-off-by: Gary Guo <gary@garyguo.net>

Moving the try_lock code to before the waker list is touched is sound, because the waker list can only ever be accessed while `blocked` is held; as long as we retry the lock while holding `blocked`, we are okay. This code also sets both LOCK and BLOCKED in the same atomic op, which improves performance by touching the atomic variable one fewer time when inserting the entry. Signed-off-by: Gary Guo <gary@garyguo.net>
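The "set both LOCK and BLOCKED in the same atomic op" idea can be sketched like this. It is an illustration under assumptions, not the PR's implementation: the bit values, the `Attempt` enum, and `lock_or_block` are hypothetical, and the waker list itself is left out.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Assumed bit layout: bit 0 means the mutex is held.
pub const LOCK: usize = 1;
/// Assumed bit layout: bit 1 means the waker list is held.
pub const BLOCKED: usize = 2;

#[derive(Debug, PartialEq, Eq)]
pub enum Attempt {
    /// We acquired the mutex and do not hold the waker list.
    Acquired,
    /// The mutex is held by someone else; we now hold the waker list
    /// and may insert our waker before clearing BLOCKED.
    HoldsList,
    /// Someone else holds the waker list; the caller should retry.
    Retry,
}

/// Try the mutex and the waker-list lock with a single `fetch_or`.
pub fn lock_or_block(state: &AtomicUsize) -> Attempt {
    let prev = state.fetch_or(LOCK | BLOCKED, Ordering::Acquire);
    if prev & LOCK == 0 {
        // The mutex was free, so it is ours now. If we also took the
        // list bit, hand it back because we have nothing to register.
        if prev & BLOCKED == 0 {
            state.fetch_and(!BLOCKED, Ordering::Release);
        }
        Attempt::Acquired
    } else if prev & BLOCKED == 0 {
        Attempt::HoldsList
    } else {
        Attempt::Retry
    }
}
```

In the `HoldsList` case the state is touched once before the waker can be inserted, instead of once for a failed try_lock and once more to take the list.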

We originally called try_lock before testing whether opt_key is None. This commit swaps the order. Doing so removes the need to call deregister_waker when opt_key is None, which makes the hot path (the uncontended case) faster. Signed-off-by: Gary Guo <gary@garyguo.net>

This makes RawLockFuture::poll itself small enough to be suitable for inlining. Signed-off-by: Gary Guo <gary@garyguo.net>

skade · 2019-10-28T03:30:03Z

src/sync/mutex.rs

@@ -219,8 +304,9 @@ impl<T> Mutex<T> {
    /// #
    /// # })
    /// ```
+    #[inline]


In general, forced inlining of trivial operations is considered an antipattern. Do we run into issues where these methods are not inlined?

#[inline] does not force inlining (unlike #[inline(always)]); it only hints to the compiler that the function should probably be inlined.

It's usually a good idea to add the #[inline] annotation if the method is going to be inlined across the crate boundary. In this case Mutex is polymorphic over T, so we may get away without it, but a monomorphic function will never be inlined across a crate boundary because the compiler cannot see the underlying implementation.

Correct, I misspoke there a little.

That's correct for the hint, though it only applies in the absence of LTO. Considering that libstd uses the same annotations, we should follow that practice.
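The distinction drawn in this thread can be illustrated with a small sketch. The function names are hypothetical, and the comments restate the reasoning from the discussion above (without LTO); newer compilers may make different inlining decisions.

```rust
/// Non-generic function: per the discussion, without `#[inline]` (and
/// without LTO) a downstream crate sees only the symbol, not the body.
/// The attribute is a hint, not a guarantee.
#[inline]
pub fn is_power_of_two(x: u32) -> bool {
    x != 0 && x & (x - 1) == 0
}

/// Generic function: it is instantiated in the crate that calls it,
/// so its body is available to that crate's optimiser either way.
pub fn first_or<T: Copy>(items: &[T], default: T) -> T {
    items.first().copied().unwrap_or(default)
}

/// `#[inline(always)]` is the stronger request the reviewer was worried
/// about; it is shown here only for contrast.
#[inline(always)]
pub fn low_bit(x: u32) -> u32 {
    x & 1
}
```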

dignifiedquire · 2020-04-26T16:21:41Z

@nbdd0121 we now have a Mutex implementation based on a WakerList, is this still needed?

nbdd0121 · 2020-04-26T16:34:24Z

@dignifiedquire The current implementation still uses Slab. A linked-list-based implementation would still be better, because it gives a predictable upper bound on performance and avoids the memory leak.

However, I've been quite busy recently. Maybe I'll take another look and rebase next month.

nbdd0121 · 2020-06-21T22:20:55Z

This will no longer be necessary after #822 lands, so I'm closing it.

nbdd0121 force-pushed the mutex branch 4 times, most recently from 50084a1 to 078294b Compare October 20, 2019 00:45

nbdd0121 marked this pull request as ready for review October 20, 2019 00:54

Mutex performance benchmarks

1dd0478

Signed-off-by: Gary Guo <gary@garyguo.net>

nbdd0121 force-pushed the mutex branch from 078294b to cb32973 Compare October 21, 2019 00:51

nbdd0121 added 9 commits October 21, 2019 19:55

Regain some lost performance due to de-bloating.

39ef033

#[inline] is added to common and trivial functions, and slow paths are separated out from the inlined hot path. Signed-off-by: Gary Guo <gary@garyguo.net>

Replace Slab-backed waker list with linked list

5427c41

Signed-off-by: Gary Guo <gary@garyguo.net>

Implement WakerListLock

74186ff

WakerListLock is an optimised version of Spinlock<WakerList> which is more efficient in performance and space. Signed-off-by: Gary Guo <gary@garyguo.net>

Remove the acquired bool from RawLockFuture.

44052b2

Signed-off-by: Gary Guo <gary@garyguo.net>

Unlocking the mutex only has to be Release, not AcqRel.
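A minimal sketch of why Release is enough on unlock, using a spinning lock around a counter rather than the PR's async mutex. `SpinCounter` and `run` are hypothetical names, and the unlock here is a plain store, which may differ from the operation used in the PR.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Illustrative spin-locked counter.
pub struct SpinCounter {
    locked: AtomicBool,
    value: UnsafeCell<u64>,
}

// Safety: `value` is only accessed while `locked` is held.
unsafe impl Sync for SpinCounter {}

impl SpinCounter {
    pub fn new() -> Self {
        SpinCounter { locked: AtomicBool::new(false), value: UnsafeCell::new(0) }
    }

    fn lock(&self) {
        // Acquire: observe every write made by the previous lock holder.
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
    }

    fn unlock(&self) {
        // Release: publish our writes to the next holder. Unlocking reads
        // nothing that another thread published, so no Acquire is needed.
        self.locked.store(false, Ordering::Release);
    }

    pub fn increment(&self) {
        self.lock();
        unsafe { *self.value.get() += 1 };
        self.unlock();
    }

    pub fn get(&self) -> u64 {
        self.lock();
        let v = unsafe { *self.value.get() };
        self.unlock();
        v
    }
}

/// Increment one shared counter from several threads and return the total.
pub fn run(threads: usize, iters: u64) -> u64 {
    let counter = Arc::new(SpinCounter::new());
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..iters {
                    c.increment();
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    counter.get()
}
```

The Acquire on lock pairs with the Release on unlock, which is what orders each critical section after the previous one.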

7a53719

Signed-off-by: Gary Guo <gary@garyguo.net>

Move RawLockFuture::poll cold path to #[cold] functions

43f598d

This makes RawLockFuture::poll itself small enough to be suitable for inlining. Signed-off-by: Gary Guo <gary@garyguo.net>

nbdd0121 force-pushed the mutex branch from cb32973 to 43f598d Compare October 21, 2019 18:55

ghost mentioned this pull request Oct 25, 2019

Add utility type WakerSet to the sync module #390

Merged

skade reviewed Oct 28, 2019


yoshuawuyts added the enhancement New feature or request label Oct 28, 2019

Matthias247 mentioned this pull request Nov 11, 2019

async_std::sync::Mutex is too heavy #362

Open

nbdd0121 closed this Jun 21, 2020

