Simplify decide_worker #6974
Conversation
There is one related test failure.

Unit Test Results
See the test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.
15 files ±0, 15 suites ±0, 7h 7m 55s ⏱️ +38m 47s
For more details on these failures, see this check.
Results for commit 52e0a88. ± Comparison against base commit 6a1b089.
A plot of your data table:

1k workers isn't unreasonable. Even 10k happens sometimes. Another thing to note is that in really large clusters like that, you'd have to have a ton of root tasks in a group for it to exceed the total thread count and take the root-task path at all.
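To put rough numbers on that, here is a quick illustration using the `#tasks < #total_threads` condition quoted in the PR description below; the cluster shape is only an assumption for the example:

```python
# Illustrative only: how many tasks a group needs before the root-task path
# can even apply on a very large cluster.
n_workers = 10_000
threads_per_worker = 2          # assumed; varies by deployment
total_threads = n_workers * threads_per_worker
print(f"a task group needs at least {total_threads} tasks to exceed the thread count")
```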
I think this is the more important point. The main thing that worries me is that TaskGroups are a very brittle way of inferring graph structure. All you have to do is submit a bunch of tasks with UUIDs as keys, and they'll bypass the root task logic since they'll all belong to different TaskGroups (see the sketch after the snippet below). That again points to the importance of determining root-ish-ness from the graph itself, not task names: #6922.

Basically I'd be a little hesitant to just remove this logic. Instead, I think we should first work on the definition of root-ish-ness (#6922). For reference, the code in question:

```python
if len(self.idle) > 100:
    # Many idle workers: pick one at random instead of scanning them all.
    ws = random.choice(self.idle.values())
else:
    # Few idle workers: pick the least loaded one.
    ws = min(self.idle.values(), key=lambda ws: len(ws.processing) / ws.nthreads)
```
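To make the UUID-key point above concrete, here is a minimal sketch; the `group_name` helper is a hypothetical approximation of deriving a group from a key prefix, not dask's actual implementation:

```python
# Hypothetical approximation: task-group membership is inferred from the key
# prefix, so keys that are bare UUIDs each end up in their own group.
import uuid


def group_name(key: str) -> str:
    return key.split("-")[0]


normal_keys = [f"inc-{i}" for i in range(3)]
uuid_keys = [str(uuid.uuid4()) for _ in range(3)]

print({group_name(k) for k in normal_keys})  # a single group: {'inc'}
print({group_name(k) for k in uuid_keys})    # three distinct groups -> root-task logic bypassed
```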
I know this plot looks dramatic, but it is not surprising: what we do here, sorting, scales with the number of workers.

My point is basically that when we normalize these numbers to overhead per task, this no longer feels too dramatic: for 10k workers we are at 1ms/task, and I am doubtful that real-world workloads would really notice. After all, if you are running on 10k workers I expect you're processing a lot of data. Would 1s of overhead for the initial dispatch really matter?

Basically, I think having fewer branches in the decide_worker logic would make reasoning about what happens much easier, and that might be worth the overhead.
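A back-of-envelope check using the per-task overheads quoted in this thread (~0.1 ms/task on 1k workers from the `client.map` example below, ~1 ms/task on 10k workers); these figures are taken from the measurements above, not re-measured:

```python
# Rough arithmetic with the quoted per-task decision overheads.
per_task_overhead = {1_000: 0.1e-3, 10_000: 1e-3}  # workers -> seconds per task
n_tasks = 1_000

for n_workers, seconds in per_task_overhead.items():
    total = n_tasks * seconds
    print(f"{n_workers:>6} workers: ~{total:.1f}s extra to dispatch {n_tasks} tasks")
# ->   1000 workers: ~0.1s extra to dispatch 1000 tasks
# ->  10000 workers: ~1.0s extra to dispatch 1000 tasks
```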
If we dropped this fast path, we could also demote the workers dictionary to a plain dict.
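For context on why the fast path ties us to the current container type: `random.choice` needs positional indexing, which `sortedcontainers.SortedDict` views provide but plain `dict` views don't. A small sketch, assuming the scheduler's worker/idle mappings are `SortedDict`s as the snippet above relies on:

```python
import random
from sortedcontainers import SortedDict

workers = {f"tcp://10.0.0.{i}:8786": f"worker-{i}" for i in range(5)}

sd = SortedDict(workers)
print(random.choice(sd.values()))             # works: SortedDict views support indexing

# random.choice(workers.values())             # TypeError: 'dict_values' is not subscriptable
print(random.choice(list(workers.values())))  # a plain dict needs an explicit copy first
```

If the `len(self.idle) > 100` branch went away, nothing would rely on that positional indexing anymore, which is what would allow the demotion to a plain dict.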
This came up during review of #6614 (comment)
Bottom line: this code path is only there as a performance optimization, and it approximates the decision performed by `decide_worker` (it neglects held memory / `ws.nbytes` and ignores inhomogeneous `nthreads`), i.e. the decision quality is strictly better when using `worker_objective`. In these situations, poor decisions are typically not a big deal, though.

In a real-world scenario, this code path is actually pretty difficult to hit since we introduced the root task logic above. Most tasks that do not hold dependencies will follow the root task decision path, unless the group is too small to properly utilize the cluster, i.e. #tasks < #total_threads.
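To give a feel for the difference being described, here is a simplified, hypothetical rendition for intuition only; `Worker`, `fast_path_key`, and `objective_like_key` are made-up stand-ins, not the scheduler's actual `WorkerState` or `worker_objective`:

```python
from dataclasses import dataclass


@dataclass
class Worker:                  # hypothetical stand-in, not distributed's WorkerState
    name: str
    n_processing: int          # number of queued/processing tasks
    nthreads: int
    nbytes: float              # bytes of data already held on the worker


def fast_path_key(ws: Worker) -> float:
    # Load-only heuristic: memory held on the worker is invisible to it.
    return ws.n_processing / ws.nthreads


def objective_like_key(ws: Worker) -> tuple:
    # Also considers held memory, preferring workers that hold less data.
    return (ws.n_processing / ws.nthreads, ws.nbytes)


workers = [
    Worker("a", n_processing=2, nthreads=4, nbytes=8e9),  # same load, holds a lot of data
    Worker("b", n_processing=2, nthreads=4, nbytes=1e6),  # same load, nearly empty
]

# Equal loads: the load-only key simply returns the first worker ("a"),
# while the objective-like key breaks the tie in favour of "b".
print(min(workers, key=fast_path_key).name)       # -> a
print(min(workers, key=objective_like_key).name)  # -> b
```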
I performed a couple of micro-benchmarks on my machine (basically I extracted the method into a standalone function and ran it on a couple of dicts).
This is the measurement I got on my machine: the time it takes to make the worker decision for 1k tasks.
Basically, we'd slow down embarrassingly parallel submissions, e.g. `client.map(inc, range(1000))`, by ~100ms if scheduled on a 1k-worker cluster. Is this worth the optimization? As I said, most real-world workloads that resemble this will very likely go down the root task path anyway.
Code to reproduce
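A minimal sketch of such a micro-benchmark; `FakeWorker`, `decide_min`, and `decide_random` are hypothetical stand-ins rather than the code actually benchmarked, and absolute timings will vary by machine:

```python
import random
import timeit
from dataclasses import dataclass


@dataclass
class FakeWorker:                      # hypothetical stand-in for WorkerState
    nthreads: int = 2
    n_processing: int = 0


def decide_min(workers):
    # Scan all candidates, like the full decision over the idle pool.
    return min(workers, key=lambda ws: ws.n_processing / ws.nthreads)


def decide_random(workers):
    # Constant-time pick, like the random fast path quoted above for large idle pools.
    return random.choice(workers)


n_tasks = 1_000
for n_workers in (100, 1_000, 10_000):
    workers = [FakeWorker(n_processing=random.randint(0, 10)) for _ in range(n_workers)]
    t_min = timeit.timeit(lambda: decide_min(workers), number=n_tasks)
    t_rand = timeit.timeit(lambda: decide_random(workers), number=n_tasks)
    print(f"{n_workers:>6} workers: min={t_min:.3f}s  random={t_rand:.3f}s  ({n_tasks} decisions)")
```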
cc @gjoseph92