Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance and space of async_std::sync::Mutex #370

Closed
wants to merge 10 commits into from

Conversation

nbdd0121
Copy link
Contributor

Related #362

@nbdd0121
Copy link
Contributor Author

Some numbers

Before this PR:
test mutex_contention ... bench: 1,866,362 ns/iter (+/- 162,639)
test mutex_mimick_contention ... bench: 1,448 ns/iter (+/- 131)
test mutex_no_contention ... bench: 341,444 ns/iter (+/- 17,727)
test mutex_unused ... bench: 27 ns/iter (+/- 1)
Size of Mutex<()> is 64 bytes

After this PR:
test mutex_contention ... bench: 1,669,542 ns/iter (+/- 167,686)
test mutex_mimick_contention ... bench: 1,034 ns/iter (+/- 148)
test mutex_no_contention ... bench: 258,535 ns/iter (+/- 38,157)
test mutex_unused ... bench: 0 ns/iter (+/- 0)
Size of Mutex<()> is 16 bytes

The PR also makes sure Mutex<T> only uses 1 copy of its cold path across different Ts, reducing bloat.

@nbdd0121 nbdd0121 marked this pull request as ready for review October 20, 2019 00:54
@nbdd0121
Copy link
Contributor Author

@stjepang This PR is ready. I scrapped the plan to squeeze everything into 1 usize (this PR makes it 2 usize), because squeezing further would hurt code clarity and the code would not be reusable for RwLock.

Signed-off-by: Gary Guo <gary@garyguo.net>
All `Mutex`es now internally use `RawMutex` (which is similar to a
`Mutex<()>`, only providing locking semantics but not data), therefore
instantiating `Mutex`es on different types do not duplicate code.

This patch does not otherwise change the algorithm used by `Mutex`.

Signed-off-by: Gary Guo <gary@garyguo.net>
#[inline] are added to common and trivial functions, and slow paths are
separated out from inlined hotpath.

Signed-off-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Gary Guo <gary@garyguo.net>
WakerListLock is an optimised version of Spinlock<WakerList> which is
more efficient in performance and space.

Signed-off-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Gary Guo <gary@garyguo.net>
Moving the try_lock code to before touching the waker list is
sound, because the waker list can only ever be accessed with `blocked`
hold, so as long as we retry lock it while having `blocked` locked, we
are okay.

This code also set both LOCK and BLOCKED in the same atomic op. This
has some performance improvements by touching the atomic variable 1
less time when inserting the entry.

Signed-off-by: Gary Guo <gary@garyguo.net>
We originally check try_lock before test if opt_key is None. This commit
changes its order. Doing so removes the need to deregister_waker when
opt_key is None, therefore makes the hot path (un-contended case) faster.

Signed-off-by: Gary Guo <gary@garyguo.net>
This makes RawLockFuture::poll itself small enough which is suitable
for inlining.

Signed-off-by: Gary Guo <gary@garyguo.net>
@@ -219,8 +304,9 @@ impl<T> Mutex<T> {
/// #
/// # })
/// ```
#[inline]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, forced inlining of trivial operations is considered an antipattern. Do we run into issues that these methods are not inlined?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#[inline] is not forcing inline (different from #[inline(always)]; it is just hinting the compiler that this should probably be inlined.

Usually it's a good idea to have the #[inline] notation if the method is going to be inlined across the crate boundary. In the case Mutex is polymorphic over T, so we may get away with it, but a monomorphic function will never be inlined across crate boundary because the compiler cannot see the underlying implementation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, I misspoke there a little.

That's correct as a hint, this is only in the absence of lto, though. Considering that libstd uses the same annotations, we should follow that practice.

@yoshuawuyts yoshuawuyts added the enhancement New feature or request label Oct 28, 2019
@dignifiedquire
Copy link
Member

@nbdd0121 we now have a Mutex implementation based on a WakerList, is this still needed?

@nbdd0121
Copy link
Contributor Author

@dignifiedquire The currently implementation is still using Slab. It'll still be better to use a linked-list based implementation for predictable performance upper bound and avoid memory leak.

However I'm quite busy recently. Maybe I'll take a look again and rebase next month.

@nbdd0121
Copy link
Contributor Author

This would be no longer necessary after #822 lands, closing.

@nbdd0121 nbdd0121 closed this Jun 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants