bpo-46205: exit if no workers are alive in runtest_mp #30470
There is a race condition in runtest_mp: if a worker has already pushed its final output to the queue but is still alive, the main thread waits forever on the now-empty queue. This change re-checks the status of the workers after the output.get() call times out (after 30 seconds).
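For illustration, the pattern described above can be sketched roughly as follows. This is not the code from the PR: it uses plain threads and a `queue.Queue`, and the names `collect_results`, `lingering_worker`, and `POLL_TIMEOUT` are hypothetical. The short timeout stands in for the 30 seconds mentioned above.

```python
import queue
import threading
import time

POLL_TIMEOUT = 0.05  # hypothetical value; the description above mentions 30 seconds


def lingering_worker(output, value):
    """Hypothetical worker: pushes its final result, then stays alive briefly."""
    output.put(value)
    time.sleep(0.1)


def collect_results(output, workers, timeout=POLL_TIMEOUT):
    """Read from `output` until it is empty and no worker is alive."""
    results = []
    while True:
        try:
            results.append(output.get(timeout=timeout))
            continue
        except queue.Empty:
            pass
        # get() timed out: re-check the workers instead of blocking forever.
        if not any(w.is_alive() for w in workers):
            # Pick up anything pushed between the timeout and the liveness check.
            try:
                while True:
                    results.append(output.get_nowait())
            except queue.Empty:
                pass
            return results
```

With a blocking `output.get()` and no timeout, the loop would never return once the last result had been consumed. Here the cost is at most one extra timeout period after the last worker exits.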
@vstinner, this is the simplest fix, but I'm not sure it's ideal. There are still potential race conditions between checking whether any workers are alive and reading from the output queue. This means there can be up to an extra 30-second delay (instead of hanging forever) while waiting for [...]. Here's an alternative approach that uses a sentinel worker value: colesbury/nogil@406e5d7. I'm a little worried about the more complicated approach because hangs appear to have been an issue in the past, and I don't want to introduce new bugs. Let me know which approach you think is better.
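As a rough sketch of what a sentinel-based approach can look like in general (this is not the code from the linked commit, and `DONE`, `sentinel_worker`, and `collect_until_done` are hypothetical names): each worker pushes a sentinel as its last message, and the main thread stops once it has seen one sentinel per worker, so it never needs to ask whether a worker is still alive.

```python
import queue
import threading

DONE = object()  # hypothetical sentinel marking a worker's final message


def sentinel_worker(output, items):
    """Hypothetical worker: pushes results, then always pushes the sentinel."""
    try:
        for item in items:
            output.put(item * 2)
    finally:
        # Announce completion even if the work above raised.
        output.put(DONE)


def collect_until_done(output, num_workers):
    """Read results until every worker has sent its sentinel."""
    results = []
    remaining = num_workers
    while remaining:
        item = output.get()
        if item is DONE:
            remaining -= 1
        else:
            results.append(item)
    return results
```

The trade-off is that a worker that dies without sending its sentinel would leave the main thread blocked, so a sketch like this still needs a timeout or liveness check as a fallback.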
lgtm
Thanks @colesbury for the PR, and @corona10 for merging it 🌮🎉. I'm working now to backport this PR to: 3.9, 3.10.
GH-30523 is a backport of this pull request to the 3.10 branch.
GH-30524 is a backport of this pull request to the 3.9 branch.
(cherry picked from commit e13cdca) Co-authored-by: Sam Gross <colesbury@gmail.com>
Thanks for the fix @colesbury!
There is a race condition in runtest_mp where if a worker already
pushed its final output to the queue, but is still alive, then the
main thread waits forever on the now-empty queue.
This re-checks the status of the workers after output.get() call times
out (after 30 seconds).
https://bugs.python.org/issue46205