-
-
Notifications
You must be signed in to change notification settings - Fork 719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Turn on queuing by default #7213
Comments
A key question: what should the default Previously, we'd discussed using After re-running some benchmarks, I'm personally still leaning towards 1.0. Comparing non-1.0* to 1.0, using a value >1.0 does not give a definitive decrease in runtime. In many tests, the difference is not statistically significant—it doesn't show up above the normal variation. The cases where it is definitively faster, it's only by a small amount, in the 2-8% range. But >1.0 unquestionably hurts memory usage. We see up to 30% higher memory usage than 1.0. Every benchmark increases in memory usage. Comparing those two charts, memory use shows a strong signal that >1.0 is worse, but runtime does not show much signal that >1.0 is better. Of course, these benchmarks are only run with one cluster configuration (10 2cpu, 8gib workers). We haven't tested how more/larger workers affects this. The main motivation for >1.0 was, "what about super slow corporate networks, or workers in multiple regions, etc?"?
All this said: from a theoretical standpoint, it does make sense that 1.1 would be somewhat faster that 1.0, so I'm surprised that we don't see a clearer runtime difference. * I used 1.2 in this case, which due to worker size was equivalent to 1.1—either one meant 1 extra task per worker. |
Do we have a measure of the scheduler<->worker RTT on coiled? |
This is wrongly posed. What about corporate networks that were set up ~5 years ago and do not receive the amount of constant attention that AWS does? |
How much does Worker latency is captured by Prometheus so we have lots of recent data there, e.g., I've also done a couple tests with scheduler and worker in different zones, and latencies looked like this: (see https://github.com/coiled/platform/issues/165 for more details on that) |
True, it may not be that dramatic. My point is just that, as far as I know, it's already not good Dask practice to have tasks which are so fast that their runtime is the same order of magnitude as network latency (whatever that latency may be). You should probably increase task size in this case regardless, and if you do, it would make the worker-saturation effect pretty much moot. All things being equal, I certainly want to improve performance in these higher-latency cases. It's just that the benchmark data makes it seem like that choice may come at a cost. And I wouldn't want to pick a default to help a less-common, bad-practice use case if it comes at the expense of a more common use-case. |
It actually would be pretty interesting to see the difference between kernel-level ping (dictated almost exclusively by network) and application-level latency, where the GIL plays a major role. If we saw that the latter eclipses the former, it would inform us that even if the network became much worse we probably wouldn't notice as long as ping remains << than GIL-dominated RTT. ~250ms for the 4th quintile in your first plot above is pretty horrid. I'd be surprised if AWS were to blame for it. |
We are currently intending to turn on scheduler-side queuing by default. This will give much better cluster memory use in most cases, and help users avoid the painful "running out of memory on simple workloads" experience #2602.
It does slow things down, but the increased stability and better out-of-the-box user experience seems worthwhile.
Specifically, this will mean changing the default
worker-saturation
value frominf
to1.0
or1.1
or something. Advanced users who need to keep old behavior can get it back by changing this config, which will be documented #7203.TODO:
decide_worker_rootish_queuing_disabled
assertion fails when retiring worker #7063AssertionError: waiting on released dep
when worker-saturation < .inf #7204secede
#7223decide_worker
could be expensive on large clusters with queuing enabled #7246test_nested_compute
#7247The text was updated successfully, but these errors were encountered: