Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid race condition in OnceMap #3987

Merged
merged 4 commits into from
Jun 3, 2024
Merged

Conversation

ibraheemdev
Copy link
Member

@ibraheemdev ibraheemdev commented Jun 3, 2024

Summary

Fixes a race condition in OnceMap::wait_blocking where the inserted value could potentially be missed, leading to a deadlock. Fairly certain this will resolve #3724.

@ibraheemdev ibraheemdev requested a review from konstin June 3, 2024 15:24
@charliermarsh
Copy link
Member

So the race here is that the value could be filled between the time we fetch it from the map and set up the notifier?

@ibraheemdev
Copy link
Member Author

ibraheemdev commented Jun 3, 2024

@charliermarsh yes, checking the map again after setting up the notifier is the crucial bit. If the value hasn't been inserted yet, the thread is already in the waiter list ready to be notified. This is a pretty standard algorithm for blocking in a concurrent data-structure (check, register waiter, check again, wait).

Copy link
Member

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I buy what you're selling here. Great find. Notify is quite a bit more subtle than I had thought.

@@ -46,41 +47,61 @@ impl<K: Eq + Hash, V: Clone, H: BuildHasher + Clone> OnceMap<K, V, H> {
///
/// Will hang if [`OnceMap::done`] isn't called for this key.
pub async fn wait(&self, key: &K) -> Option<V> {
let entry = self.items.get(key)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to reason through how a deadlock could happen here. My understanding is that entry here is actually a dashmap::mapref::one::Ref, and that in turn holds a lock while it's alive. (The dashmap docs are woefully incomplete, but it's what the implementation suggests.) If that's true, then once a get happens, then it shouldn't be possible for a done call to insert anything before the case analysis below, right?

Oh... wait... OK. I now see the drop(entry) below just before the notify.notified().await. So the deadlock is that between drop(entry) and notify.notified().await, a done call inserts a filled entry for the given key and notifies waiters before notify.notified() has a chance to register the waiter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a done call inserts a filled entry for the given key and notifies waiters before notify.notified() has a chance to register the waiter.

Yes, exactly.


// Prepare to wait.
let mut notification = pin!(notify.notified());
notification.as_mut().enable();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure this is necessary? We are using Notify::notify_waiters, and it seems like that doesn't benefit from Notified::enable, since notify_waiters doesn't "store" permits.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is, I'm wondering if its superfluous (not deleterious).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is necessary, calling notify.notified() does nothing but construct the future lazily. enable is what actually registers the waiter.

Copy link
Member Author

@ibraheemdev ibraheemdev Jun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I might be wrong on that, from the docs:

The Notified future is guaranteed to receive wakeups from notify_waiters() as soon as it has been created, even if it has not yet been polled.

But looking at the source code I'm not sure how that is guaranteed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah but the docs for Notify::notify_waiters suggests otherwise. And, specifically, not Notify::notify_{one,last}.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Notified future is guaranteed to receive wakeups from notify_waiters() as soon as it has been created, even if it has not yet been polled.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah whoops, my comments came in after your last two, but I hadn't seen those.

Yeah I'm just going on docs. Not on implementation.

I'd be inclined to leave out superfluous things because it can make things more confusing, but if we don't know it's superfluous we could leave it out and see whether the deadlocks re-appear. Or if you're feeling more conservative, add a comment explaining why it's there even though a read of the docs suggests it isn't needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay, Tokio pulls a little trick here by tracking how many times notify_waiters has been called. I missed that, you're correct in that we don't need the enable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the comments to make it clear this only works with notify_waiters.

};

// Wait until the value is inserted.
notification.await;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I buy this, because acquiring the Notification has been de-coupled from awaiting on the Notification. And notify_waiters specifically says:

The purpose of this method is to notify all already registered waiters. Registering for notification is done by acquiring an instance of the Notified future via calling notified().

But my reading here is still that notification.as_mut().enable() is not needed here. Not unless we're using notify_{one,last} somewhere.

Copy link

codspeed-hq bot commented Jun 3, 2024

CodSpeed Performance Report

Merging #3987 will improve performances by 6.72%

Comparing ibraheemdev:deadlock (8ea913b) with main (29ea5d5)

Summary

⚡ 1 improvements
✅ 12 untouched benchmarks

Benchmarks breakdown

Benchmark main ibraheemdev:deadlock Change
wheelname_tag_compatibility[flyte-short-incompatible] 926.7 ns 868.3 ns +6.72%

@ibraheemdev ibraheemdev merged commit 1ffe18d into astral-sh:main Jun 3, 2024
46 checks passed
@zanieb
Copy link
Member

zanieb commented Jun 4, 2024

Awesome!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deadlock during resolution
4 participants