Simplify usage of Queues in nanny #6655
distributed/nanny.py
Outdated
@@ -744,7 +741,7 @@ async def kill(self, timeout: float = 2, executor_wait: bool = True) -> None:
         if self.status == Status.stopping:
             await self.stopped.wait()
             return
-        assert self.status in (Status.starting, Status.running)
+        assert self.status in (Status.starting, Status.running, Status.failed)
We actually hit this whenever a nanny failed. No idea whether this has any implications. Leaving it out actually deadlocked one of the unit tests at some point.
distributed/nanny.py
Outdated
 if msg["uid"] != uid:  # ensure that we didn't cross queues
-    continue
+    raise RuntimeError("Encountered message from a different queue.")
We're using a new queue for every process. How could these messages ever "cross"?
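For illustration, the changed check can be sketched as follows. This is a minimal sketch, not the code in distributed/nanny.py: the helper name `read_start_message`, the `timeout` parameter, and the `"op"` key are hypothetical, and a `queue.Queue` stands in for the per-process multiprocessing queue. The point is only that a uid mismatch is treated as a bug and raised, rather than skipped with `continue`.

```python
import queue
import uuid


def read_start_message(q, uid, timeout=5.0):
    """Return the next message from ``q``; fail loudly on a uid mismatch.

    ``q`` is any object with a ``get(timeout=...)`` method. The helper name
    and signature are hypothetical, chosen for this sketch only.
    """
    msg = q.get(timeout=timeout)
    if msg["uid"] != uid:
        # With one queue per process this branch should be unreachable,
        # so surface it as an error instead of silently skipping the message.
        raise RuntimeError("Encountered message from a different queue.")
    return msg


# Stand-in for the queue that would be created per process.
q = queue.Queue()
uid = uuid.uuid4().hex
q.put({"uid": uid, "op": "start"})  # "op" is an illustrative field
msg = read_start_message(q, uid)
```

If the mismatch really cannot happen, the check never fires; if it ever does, the error points at the bug directly instead of hiding it behind a retry loop.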
Would also be good to use socket.socketpair() instead of a Queue where the queue is only owned by two processes
one step at a time :) I tried to keep these changes as minimal as possible. I'm open to simplifying it further. I do consider the Queue API a bit simpler and I guess it is more familiar to most people.
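To make the trade-off concrete, here is a rough sketch of what two-endpoint messaging over `socket.socketpair()` could look like. It is not part of this PR. The length-prefixed JSON framing and the helper names `send_msg`, `recv_exact`, and `recv_msg` are assumptions made for this example, and both ends live in one process here to keep it short; in real use one end would be handed to the child process.

```python
import json
import socket


def send_msg(sock, obj):
    """Send ``obj`` as JSON with a 4-byte big-endian length prefix."""
    data = json.dumps(obj).encode()
    sock.sendall(len(data).to_bytes(4, "big") + data)


def recv_exact(sock, n):
    """Read exactly ``n`` bytes, or raise EOFError if the peer closed."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("peer closed the socket")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock):
    """Receive one length-prefixed JSON message."""
    size = int.from_bytes(recv_exact(sock, 4), "big")
    return json.loads(recv_exact(sock, size))


# Two connected endpoints; the message contents are illustrative.
parent_end, child_end = socket.socketpair()
try:
    send_msg(child_end, {"op": "started", "pid": 1234})
    reply = recv_msg(parent_end)
finally:
    parent_end.close()
    child_end.close()
```

The sketch also shows why the Queue API can feel simpler: with a raw socket pair, framing and serialization have to be written by hand.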
Unit Test Results
See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.
20 files ±0, 20 suites ±0, 11h 27m 15s ⏱️ +8m 15s
For more details on these failures, see this check.
Results for commit bfbb4fa. ± Comparison against base commit 9255987.
This pull request removes 2 tests.
♻️ This comment has been updated with latest results.
LGTM! test_lots_of_tasks flake seems to be new, but doesn't look related to me.
This is a cleanup of the nanny's usage of multiprocessing queues. It ensures that queues are closed only once and that no exceptions are swallowed.
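As an illustration of the "closed only once, nothing swallowed" goal, the idea could be sketched like this. This is not the implementation in the PR: the `CloseOnce` wrapper and the `FakeQueue` stand-in are hypothetical names invented for this example. The wrapper forwards `close()` to the wrapped object at most once and lets any exception from that call propagate to the caller.

```python
import threading


class CloseOnce:
    """Hypothetical wrapper: call the wrapped ``close()`` at most once."""

    def __init__(self, resource):
        self._resource = resource
        self._lock = threading.Lock()
        self._closed = False

    @property
    def closed(self):
        return self._closed

    def close(self):
        """Close the resource; return True only for the call that did it."""
        with self._lock:
            if self._closed:
                return False
            self._closed = True
        # No try/except here: an error raised by close() reaches the caller.
        self._resource.close()
        return True


class FakeQueue:
    """Stand-in for a queue object; counts how often close() is called."""

    def __init__(self):
        self.close_calls = 0

    def close(self):
        self.close_calls += 1


fake = FakeQueue()
wrapped = CloseOnce(fake)
first = wrapped.close()
second = wrapped.close()
```

Here `first` is `True`, `second` is `False`, and the underlying `close()` has run exactly once, which is the property the description asks for.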