-
-
Notifications
You must be signed in to change notification settings - Fork 719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Withhold root tasks [no co assignment] #6614
Withhold root tasks [no co assignment] #6614
Conversation
Idea was that if a `SortedSet` of unrunnable tasks is too expensive, then insertion order is probably _approximately_ priority order, since higher-priority (root) tasks will be scheduled first. This would give us O(1) for all necessary operations, instead of O(logn) for adding and removing. Interestingly, the SortedSet implementation could be hacked to support O(1) `pop` and `popleft`, and inserting a min/max value. In the most common case (root tasks), we're always inserting a value that's greater than the max. Something like this might be the best tradeoff, since it gives us O(1) in the common case but still maintains the sorted gaurantee, which is easier to reason about.
Now task states on the dashboard are listed in the logical order that tasks transition through.
Simpler, though I think basically just an int of 1 may be the most reasonable.
This is just a hack currently, but maybe it would actually be useful?
This reverts commit df11f719b59aad11f39a27ccae7b2fd4dfd9243a.
When there were multiple root task groups, we were just re-using the last worker for every batch because it had nothing processing on it. Unintentionally this also fixes dask#6597 in some cases (because the first task goes to processing, but we measure queued, so we pick the same worker for both task groups)
This reverts commit fdd5fd9.
1. The family metric itself is flawed. Added linear chain traversal, but it's still not good. The maxsize is problematic and probably the wrong way to think about it? a) there's quite likely no maxsize parameter that will ever be right, because you could always have multiple independent crazy substructures that are each maxsize+1. b) even when every task would be in the same family because they're all interconnected, there's still benefit to scheduling subsequent things together, even if you do partition. Minimizing priority partitions is always what you want. Maybe there's something where maxsize is not a hard cutoff, but a cutoff for where to split up interconnected structures? 2. Families probably need to be data structures? When a task completes, you'd like to know if it belongs to a family that actually has more tasks to run on that worker, vs the task just happens to look like it belongs to a family but was never scheduled as a rootish task. Overall I like the family structure for scheduling up/down scaling, but figuring out how to identify them is tricky. Partitioning priority order is great because it totally avoids this problem, of course at the expense of scaling. Can we combine priority and graph structure to identify isolated families when reasonable, partition on priority when not?
Update docstring and add back logic for queuing disabled case
Just easier to explain this way
I think this fix is reasonable? I wonder if occupancy should include queued tasks though?
Unit Test ResultsSee test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 15 files ± 0 15 suites ±0 7h 1m 15s ⏱️ + 18m 18s For more details on these failures, see this check. Results for commit 093d7dc. ± Comparison against base commit 817ead3. ♻️ This comment has been updated with latest results. |
feeling pretty good about just `test_graph_execution_width`
Using it as an API saves having to manage `running` and `idle` in multiple places
hesitant on this, but I don't want to introduce a flaky test
@fjetter I believe all comments have been addressed. For tests, I went with just |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed another commit to address a merge conflict. If CI is green(ish) I'll go ahead and merge
We discussed this early on, before we even started implementation. We agreed to merge this behind the feature flag since this will not change the behavior compared to main. |
@gjoseph92 (and everyone else involved here), thank you! How do I test it out ;)?
Is this right? What range of values should I provide: |
There are many different ways to set dask config, and depending on how your clusters are deployed (local vs dask-gateway/pangeo vs dask-cloudprovider vs coiled, etc.), the way to set that config will vary. Often though, the easiest can be to set an environment variable on the cluster. (Pangeo / dask-gateway docs, coiled docs, saturn docs, dask-cloudprovider seems to support $ DASK_DISTRIBUTED__SCHEDULER__WORKER_SATURATION=1.0 dask-scheduler For a local cluster, when creating your cluster, you can just do: with dask.config.set({"distributed.scheduler.worker-saturation": 1.0}):
client = distributed.Client(n_workers=..., threads_per_worker=...) If you can't get the config to work, it's possible to change the setting on a live cluster. You could also use this to try different settings without re-creating the cluster. Only run this while the scheduler is idle (no tasks). Otherwise, you'll probably break your cluster. # enable queuing (new behavior)
client.run_on_scheduler(lambda dask_scheduler: setattr(dask_scheduler, "WORKER_SATURATION", 1.0))
# disable queuing (old behavior)
client.run_on_scheduler(lambda dask_scheduler: setattr(dask_scheduler, "WORKER_SATURATION", float("inf")))
I would try the 1.0 - 2.0 range. I would expect 1.0 to usually be what you want. We are doing some benchmarking, and hopefully will figure out what a good value is across the board, and remove the need/ability to set this value in the future. @dcherian and anyone who tries this, please report back with your findings, regardless of what they are! We would really like to hear how this works on real-world uses. |
Just checking in here, @dcherian any luck trying things out? Happy to help out |
Great to see this merged (and exciting to see Deepak's results too)! Now that we have 2022.9.2 on the LEAP hub (2i2c-org/infrastructure#1769), I'm trying this out there. Unfortunately I can't seem to set the worker saturation option successfully. 😕 Setting via the gateway cluster options manager isn't working - if I do this
the cluster starts as expected, but if I do this
then it hangs indefinitely on cluster creation. I'm not quite sure where or when I'm supposed to run this I also tried |
Co-authored-by: crusaderky <crusaderky@gmail.com> Co-authored-by: fjetter <fjetter@users.noreply.github.com>
This PR withholds root tasks on the scheduler in a global priority queue. Non-root tasks are unaffected.
Workers are only sent as many root tasks as they have threads, by default. This factor can be configured via
distributed.scheduler.worker-saturation
(1.5 would send workers 1.5x as many tasks than they have threads, for example). Setting this config value toinf
completely disables scheduler-side queuing and retains the current scheduling behavior(minus co-assignment).This disregards root task co-assignment. Benchmarking will determine whether fixing root task overproduction is enough of a gain to be worth giving up (flawed) co-assignment. Root task assignment here is typically worst-possible-case: neighboring tasks will usually all be assigned to different workers.
I also could/will easily add back co-assignment whenEDIT: done. With that, this PR would be entirely feature-flaggable—we could merge it with the default set todistributed.scheduler.worker-saturation
isinf
inf
and see zero change in scheduling out of the box.Closes #6560, closes #6631, closes #6597 (with withholding mode turned off at least)
Supersedes #6584, which did the same, but for all tasks (even non-root). It also co-mingled
unrunnable
tasks (due to restrictions) and queued root tasks, which seemed unwise.pre-commit run --all-files