-
-
Notifications
You must be signed in to change notification settings - Fork 719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store ready and constrained tasks in heapsets #6711
Conversation
crusaderky
commented
Jul 11, 2022
- Closes Tasks with resource constraints ignore priority #6137
- Closes Resources are ignored if a task is cancelled and then added again #6710
bfc7ca5
to
09631ae
Compare
09631ae
to
1ddce73
Compare
Unit Test ResultsSee test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 15 files ± 0 15 suites ±0 6h 49m 48s ⏱️ - 12m 41s For more details on these failures, see this check. Results for commit 1ddce73. ± Comparison against base commit 91a9e5d. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love this refactor! _next_ready_task
is great, and using HeapSet
s makes things simpler and more readable.
I'm concerned about this making test_steal_concurrent_simple
and maybe other stealing tests flaky. Indeed, that seems to be happening: https://github.com/dask/distributed/runs/7288303161. Adding something like popright
to HeapSet
would be a very simple fix that would let us get this in quickly (instead of having to rewrite those tests to not rely on race conditions).
@@ -2182,12 +2175,11 @@ def _transition_constrained_executing( | |||
self, ts: TaskState, *, stimulus_id: str | |||
) -> RecsInstrs: | |||
if self.validate: | |||
assert ts.state == "constrained" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I thought these sorts of assertions were superfluous (the transition function is guaranteed to only be called if this is the case (unless of course it's been registered for tasks of the wrong state))?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily; there are a lot of _transition_*
methods that are called directly from other _transition_
methods.
distributed/tests/test_steal.py
Outdated
@@ -1063,12 +1063,12 @@ async def test_steal_concurrent_simple(c, s, *workers): | |||
await asyncio.sleep(0.1) | |||
|
|||
# ready is a heap but we don't need last, just not the next | |||
_, victim_key = w0.state.ready[-1] | |||
victim_key = w0.state.ready.peek().key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment above would make me think peek
isn't right for these (since it is the next). Seems like this test is relying on race conditions already (that move_task_request
happens faster than slowinc
); picking the first task might significantly increase the probability of those race conditions being triggered. I would think assert ws1.has_what
may fail intermittently if ws0
starts executing victim_key
before move_task_request
lands?
In #6598 I added a HeapSet.popright
method which might be useful here: https://github.com/dask/distributed/pull/6598/files#diff-8d01f1d2f0aff66e9a55d31f036edddb0c44d82c8bfcc1eefc015b9ac8d661edR100-R109. Note that this isn't a PR I'd ever expect to merge, so not sure if popright
would ever have any usage beyond this test—just mentioning that it's come up once before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks for spotting it!