Support scheduling_hint=SPREAD|COLOCATE for tasks and actors #18524

Closed
ericl opened this issue Sep 12, 2021 · 13 comments
Labels
P1 Issue that should be fixed within a few weeks performance size:medium

@ericl
Contributor

ericl commented Sep 12, 2021

Several use cases need finer-grained control over scheduling and are served by neither automatic locality-aware scheduling nor placement groups.

Proposal:

# The scheduler will try to spread the tasks across the cluster.
func.options(scheduling_hint="SPREAD").remote()

# The scheduler will only schedule the task on the current node.
func.options(scheduling_hint="COLOCATE").remote()

Data reading tasks: These tasks have no input, but produce large amounts of output. Ideally Ray would spread these tasks across the cluster, but currently there is no way to do so. This causes data imbalance in ML ingest and Dask-on-Ray workloads. Currently Dask-on-Ray recommends a hidden scheduler flag for this: https://docs.ray.io/en/latest/data/dask-on-ray.html#best-practice-for-large-scale-workloads

This is also a blocker for scalable ML ingest without the "resource prefix" hack, since large datasets cause memory imbalance across the cluster without spreading.
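To make the imbalance concrete, here is a minimal toy simulation in plain Python (not Ray code). The `pack` and `spread` helpers, the node names, and the slot count are all hypothetical; the sketch only contrasts filling one node first with distributing read tasks round-robin.

```python
from collections import Counter

def pack(num_tasks, nodes, slots_per_node):
    """Toy packing policy: fill one node's slots before using the next node."""
    return Counter(
        nodes[(i // slots_per_node) % len(nodes)] for i in range(num_tasks)
    )

def spread(num_tasks, nodes):
    """Toy spread policy: assign tasks to nodes round-robin."""
    return Counter(nodes[i % len(nodes)] for i in range(num_tasks))

# Hypothetical 4-node cluster with 8 task slots per node.
nodes = ["node-a", "node-b", "node-c", "node-d"]

# Eight read tasks, each producing a large output block.
packed = pack(8, nodes, slots_per_node=8)
spread_out = spread(8, nodes)

print("packed:", dict(packed))      # all output lands on one node
print("spread:", dict(spread_out))  # output is balanced across nodes
```

In this toy model every packed read task lands on `node-a`, so all of the produced data sits in one node's memory, while the spread policy leaves two blocks on each node.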

Helper tasks relying on local resources: Suppose a task accesses a file locally, but wants to launch sub-tasks for parallelism. There is currently no way to do this except by relying on hacky node ID resources. Another example is the driver forking a "main" task on the head node for easy debugging.
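The node ID resource workaround mentioned above can be sketched as a toy feasibility check in plain Python (not Ray code). The `feasible_nodes` helper, the cluster layout, and the `node:<id>` resource naming are assumptions made for illustration: each node advertises a resource that only it owns, and a sub-task requesting a sliver of that resource can only fit on that node.

```python
def feasible_nodes(cluster, request):
    """Return the nodes whose available resources cover the request."""
    return [
        name
        for name, available in cluster.items()
        if all(available.get(res, 0) >= amount for res, amount in request.items())
    ]

# Hypothetical cluster: every node advertises a resource unique to itself.
cluster = {
    "node-a": {"CPU": 4, "node:node-a": 1.0},
    "node-b": {"CPU": 4, "node:node-b": 1.0},
}

# The parent task runs on node-a and holds the local file.
parent_node = "node-a"

# Sub-task pinned to the parent's node by requesting a tiny amount of its resource.
pinned = feasible_nodes(cluster, {"CPU": 1, f"node:{parent_node}": 0.01})

# Sub-task without the extra resource can land anywhere.
unpinned = feasible_nodes(cluster, {"CPU": 1})

print(pinned)    # only the parent's node qualifies
print(unpinned)  # every node qualifies
```

The proposed COLOCATE hint would express the same intent directly, without the caller having to know or construct a node-specific resource name.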

Related issues: #18465, #5722

@ericl ericl added enhancement Request for new feature and/or capability P1 Issue that should be fixed within a few weeks usability size:large labels Sep 12, 2021
@ericl ericl added this to the Core Backlog milestone Sep 12, 2021
@valiantljk
Contributor

For COLOCATE, does this API allow users to colocate any two or more tasks easily?

I think we also have this requirement; so far we have worked around it by using the node ID as a custom resource.

Good to see this proposal!

@ericl
Contributor Author

ericl commented Sep 12, 2021

In the above proposal, you could force colocation of two tasks if the second task is launched by the first task. If the two tasks are launched independently, you can already force colocation using a placement group. Hope this helps.

@simon-mo
Contributor

This would also generalize to actor placement, right?

@ericl
Contributor Author

ericl commented Sep 13, 2021

Yes, we could support it for both tasks and actors, though the implementation may differ slightly for actors.

@ericl ericl modified the milestones: Core Backlog, Datasets Beta Sep 14, 2021
@mwtian
Member

mwtian commented Sep 24, 2021

For scheduling_hint="SPREAD", do we envision "spread" tasks and actors being launched together or separately, e.g. 100 tasks each launching another, or a single task launching 100 other tasks?

@ericl
Contributor Author

ericl commented Sep 24, 2021

For the Datasets use case they're all launched together (by the driver). I think we'd want the hint to work well in both scenarios. Are there implications per scenario that you're thinking of?

@mwtian
Member

mwtian commented Sep 24, 2021

For the spread hint, it seems we would need either a higher-level collection abstraction to spread tasks within, or an identifier such that tasks / actors with the same identifier value are spread out. The second approach seems to work for both launching scenarios. Maybe there are more elegant approaches. I'm curious to see what final API we decide on.
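The identifier-based approach could look roughly like the following toy sketch in plain Python (not Ray code). The `GroupSpreader` class and the group names are hypothetical; the sketch keeps one round-robin cursor per identifier, so placement depends only on the identifier and not on which task or driver did the launching.

```python
from collections import defaultdict

class GroupSpreader:
    """Toy scheduler: tasks sharing a group identifier are spread round-robin."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        # One independent round-robin cursor per group identifier.
        self._cursor = defaultdict(int)

    def place(self, group_id):
        index = self._cursor[group_id]
        self._cursor[group_id] = index + 1
        return self.nodes[index % len(self.nodes)]

scheduler = GroupSpreader(["node-a", "node-b", "node-c"])

# Scenario 1: a single launcher submits four tasks in the same group.
read_nodes = [scheduler.place("read") for _ in range(4)]

# Scenario 2: two different parents each submit one task in another group;
# the shared identifier is what spreads them, not the launcher.
map_nodes = [scheduler.place("map"), scheduler.place("map")]

print(read_nodes)
print(map_nodes)
```

Because each group has its own cursor, a second group starts from the first node again rather than continuing where the previous group stopped. Whether that is desirable is one of the open design questions.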

@robertnishihara
Collaborator

COLOCATE seems to force colocation, whereas "hint" sounds like a best effort thing.

@clarkzinzow
Contributor

Bump: another OSS user ran into this with Datasets, where read tasks (and therefore downstream map tasks) were packed onto a single node, causing poor performance and cluster instability.

@raulchen
Contributor

func.options(scheduling_hint="SPREAD").remote()

I'm confused about the semantics of this. Does this mean that func will be scheduled to a node that is different from the current node?
If I submit 5 tasks with scheduling_hint="SPREAD", what happens to them? Are those 5 tasks independent or will they be spread?

@ericl
Contributor Author

ericl commented Nov 15, 2021

If I submit 5 tasks with scheduling_hint="SPREAD", what happens to them? Are those 5 tasks independent or will they be spread?

The scheduler will do its best to spread them equally across different nodes, similar to SPREAD in placement groups. No guarantees though. They are independent.
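As a rough illustration of "best effort, no guarantees", here is a toy model in plain Python (not the actual scheduler). The `soft_spread` helper and the capacities are hypothetical: each task is placed independently on the least-loaded node that still has room, so the result is even when capacity allows and uneven when it does not.

```python
def soft_spread(num_tasks, capacity):
    """Place each task independently on the least-loaded node with free capacity."""
    load = {node: 0 for node in capacity}
    for _ in range(num_tasks):
        candidates = [n for n in load if load[n] < capacity[n]]
        if not candidates:
            break  # nothing fits; a real scheduler would queue the task
        target = min(candidates, key=lambda n: load[n])
        load[target] += 1
    return load

# Five tasks on three equally sized nodes: as even as 5 over 3 can be.
balanced = soft_spread(5, {"node-a": 4, "node-b": 4, "node-c": 4})

# Same five tasks, but node-c has no free capacity: the spread degrades.
degraded = soft_spread(5, {"node-a": 4, "node-b": 4, "node-c": 0})

print(balanced)
print(degraded)
```

In the second case all five tasks still run, only less evenly, which matches the hint semantics described above rather than a strict requirement.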

@clay4megtr
Contributor

If I submit 5 tasks with scheduling_hint="SPREAD", what happens to them? Are those 5 tasks independent or will they be spread?

The scheduler will do its best to spread them equally across different nodes, similar to SPREAD in placement groups. No guarantees though. They are independent.

Hmm, it sounds like soft SPREAD in placement groups, but without gang scheduling?

@jjyao
Collaborator

jjyao commented Nov 17, 2021

@clay4444 Yeah, it behaves similarly to soft spread in placement groups.
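The difference between gang scheduling and independent placement can be sketched with a toy model in plain Python (not Ray code). Both helpers and the slot counts are hypothetical: the gang variant places either all tasks or none, while the independent variant places whatever fits.

```python
def independent_place(num_tasks, free_slots):
    """Place tasks one at a time on the node with the most free slots."""
    free = dict(free_slots)
    placed = []
    for _ in range(num_tasks):
        candidates = [n for n, slots in free.items() if slots > 0]
        if not candidates:
            break  # remaining tasks would simply wait in a queue
        target = max(candidates, key=lambda n: free[n])
        free[target] -= 1
        placed.append(target)
    return placed

def gang_place(num_tasks, free_slots):
    """All-or-nothing: return a placement only if every task fits right now."""
    if sum(free_slots.values()) < num_tasks:
        return None
    return independent_place(num_tasks, free_slots)

# Hypothetical cluster with only three free slots for five tasks.
free_slots = {"node-a": 2, "node-b": 1}

gang_result = gang_place(5, free_slots)
independent_result = independent_place(5, free_slots)

print(gang_result)         # nothing is placed until all five fit
print(independent_result)  # three tasks start, two wait
```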

@ericl ericl added performance size:medium and removed enhancement Request for new feature and/or capability usability size:large labels Nov 17, 2021
@ericl ericl changed the title [RFC] Support scheduling_hint=SPREAD|COLOCATE for tasks and actors Support scheduling_hint=SPREAD|COLOCATE for tasks and actors Nov 19, 2021
@ericl ericl closed this as completed Feb 22, 2022
Repository owner moved this from In Progress to Done in Ray Core Public Roadmap Feb 22, 2022

9 participants