Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-46205: exit if no workers are alive in runtest_mp #30470

Merged
merged 2 commits into from
Jan 11, 2022

Conversation

colesbury
Copy link
Contributor

@colesbury colesbury commented Jan 7, 2022

There is a race condition in runtest_mp where if a worker already
pushed its final output to the queue, but is still alive, then the
the main thread waits forever on the the now-empty queue.

This re-checks the status of the workers after output.get() call times
out (after 30 seconds).

https://bugs.python.org/issue46205

There is a race condition in runtest_mp where if a worker already
pushed its final output to the queue, but is still alive, then the
the main thread waits forever on the the now-empty queue.

This re-checks the status of the workers after output.get() call times
out (after 30 seconds).
@colesbury
Copy link
Contributor Author

@vstinner, this is the simplest fix, but I'm not sure it's ideal. There are still potential race conditions between checking if any workers are alive and reading from the output queue. This means that there can be up to an extra 30 second delay (instead of an hanging forever), while waiting for self.output.get().

Here's an alternative approach that uses a sentinel worker value: colesbury/nogil@406e5d7. I'm a little worried about the more complicated approach because it looks like hangs have been an issue in the past, and I'm worried about introducing new bugs. Let me know which approach you think is better.

Copy link
Member

@corona10 corona10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@corona10 corona10 merged commit e13cdca into python:main Jan 11, 2022
@miss-islington
Copy link
Contributor

Thanks @colesbury for the PR, and @corona10 for merging it 🌮🎉.. I'm working now to backport this PR to: 3.9, 3.10.
🐍🍒⛏🤖

@bedevere-bot
Copy link

GH-30523 is a backport of this pull request to the 3.10 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.10 only security fixes label Jan 11, 2022
@bedevere-bot
Copy link

GH-30524 is a backport of this pull request to the 3.9 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.9 only security fixes label Jan 11, 2022
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jan 11, 2022
(cherry picked from commit e13cdca)

Co-authored-by: Sam Gross <colesbury@gmail.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jan 11, 2022
(cherry picked from commit e13cdca)

Co-authored-by: Sam Gross <colesbury@gmail.com>
miss-islington added a commit that referenced this pull request Jan 11, 2022
(cherry picked from commit e13cdca)

Co-authored-by: Sam Gross <colesbury@gmail.com>
miss-islington added a commit that referenced this pull request Jan 11, 2022
(cherry picked from commit e13cdca)

Co-authored-by: Sam Gross <colesbury@gmail.com>
@vstinner
Copy link
Member

Thanks for the fix @colesbury!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants