Dask order and task queuing aftermath #8225

Open
fjetter opened this issue Oct 2, 2023 · 0 comments
Labels: discussion (Discussing a topic with no specific actions yet)

fjetter commented Oct 2, 2023

There is currently a WIP PR open that proposes modifying dask.order, see dask/dask#10535. In dask/dask#9995 and linked issues it has been suggested that wrong ordering is/was a major contributor to the success of root task queuing. With this in mind, I set out to test this hypothesis and looked specifically at troubled workloads that were "fixed" by root task queuing.

So far, I looked into one specific test case, listed as test_vorticity in our benchmarks, which was virtually impossible to run efficiently before task queuing. What is the state of this after dask/dask#10535?

With queuing enabled, nothing actually changes. Both memory usage and runtime stay constant. Good news.

With queuing disabled, the picture is very different.

[image: walltime of test_vorticity with queuing disabled]

This is the walltime of the test case; average/peak memory usage looks the same. So we still need queuing enabled. However, this massive difference does not align with the theory that ordering and queuing are so strongly related.

I ran a slightly different test and got very surprising and interesting results.

Effectively, I disabled the entire root-ish classification logic and performed a couple of measurements.

Note that disabling the root-ish logic is quite simple even without modifying code, e.g. with `with dask.annotate(restrictions=list(client.scheduler_info()["workers"])):`, since root-ish classification is disabled for tasks with resources or restrictions.
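For illustration, a minimal sketch of that trick, assuming a running distributed cluster; the array workload below is just a placeholder:

```python
import dask
import dask.array as da
from distributed import Client

client = Client()  # assumes an existing (or local) distributed cluster

# Placeholder workload; any graph with many root tasks feeding reducers works.
x = da.random.random((100_000, 10_000), chunks=(1_000, 1_000))

# Restricting every task to the full set of workers is functionally a no-op,
# but the mere presence of restrictions makes the scheduler skip root-ish
# classification (and therefore queuing and co-assignment) for those tasks.
all_workers = list(client.scheduler_info()["workers"])

with dask.annotate(restrictions=all_workers):
    result = x.sum().compute()
```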

[image: measurements with root-ish classification disabled]

So, the case without queuing takes roughly twice the memory. However, this added memory usage is not because later reducer tasks are not run in time, but because we have a very, very mild root task overproduction. This amount of overproduction is exactly what one would expect due to the scheduler->worker latency. Effectively, the worker keeps loading data until the scheduler allows it to run a reducer. Scheduling a reducer requires a network roundtrip, which is slower than scheduling a new task on the thread pool, i.e. we load about twice as much data.
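As a rough back-of-envelope illustration of that factor of two (the numbers below are hypothetical, not measured):

```python
# Hypothetical numbers, only to illustrate the magnitude of overproduction.
root_task_runtime = 0.05      # seconds to load one chunk on the worker
scheduler_roundtrip = 0.05    # seconds until the scheduler assigns the reducer

# While waiting for the reducer to be assigned, the worker keeps executing
# root tasks that were already queued on its thread pool.
extra_root_tasks = scheduler_roundtrip / root_task_runtime

# With a roundtrip comparable to the task runtime, each reducer is preceded by
# roughly one extra loaded root task, i.e. about twice the data held in memory.
print(f"~{1 + extra_root_tasks:.0f}x the data per reducer")
```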

So, what is happening here? I didn't only disable task queuing; I also disabled root-ish task classification, i.e. I explicitly bypassed the "is root-ish task but not queued" logic as well, which tries to be smart about placing data ("co-assignment").

I also subtly changed how decide_worker behaves by setting restrictions.

It may be worth investigating this difference further and whether root-ish classification could be removed again.

Why would we remove root-ish classification again? It is working well, isn't it?

Well, it only works for some cases (#8005) and it is known to slow down certain workloads.
For instance, the benchmark results of dask/dask#10535 show that some workloads can be up to 50% faster when run without queuing (at the cost of higher memory usage).

[image: benchmark results for dask/dask#10535 with queuing disabled]

Task queuing also doesn't work for tasks with resource, worker, or host restrictions, which can be surprising to users relying on it.
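For reference, a minimal sketch of such restrictions (the resource name and worker address are made up); tasks annotated like this currently bypass queuing:

```python
import dask
import dask.array as da

x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))

# Resource restriction: only honoured by workers started with a matching
# `--resources "GPU=1"`; the annotated tasks are not queued.
with dask.annotate(resources={"GPU": 1}):
    y = x.sum()

# Worker restriction (hypothetical address): these tasks also bypass queuing.
with dask.annotate(workers=["tcp://10.0.0.1:40000"]):
    z = x.sum()
```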

Last but not least, it adds quite a bit of internal complexity.
