Add support for RayCluster #1272
Comments
@alculquicondor Initial thoughts on this? I can try to put together a KEP if this idea doesn't sound crazy :)
Hi, in my opinion an easy way to do this is to have Ray create "Kueue-able" pods for the workers.
It sounds reasonable to me to add support for RayCluster. cc @kerthcet as one of the primary users of Ray+Kueue
+1. I think supporting long-lived resources would be worth it.
Makes sense to me, several points here:
I think we can tell this via the
/assign
Do we still want a KEP for this? I think with the new suspend API (ray-project/kuberay#1667), the implementation should be very straightforward.
Historically, we haven't written KEPs for integrations. FWIW, the implementation should be quite similar to that of Job, whereas the implementation for RayJob will change a little to be more similar to Job. Note that if a user is queuing a RayJob, its RayCluster shouldn't be doubly queued.
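One way to read the "shouldn't be doubly queued" point: the RayCluster integration would skip any cluster that a RayJob created, since the RayJob's own workload already accounts for it. The sketch below is a minimal illustration under stated assumptions. It uses hypothetical, simplified `RayCluster` and `OwnerReference` types (not the real KubeRay or `metav1` types), and it assumes the owning RayJob is recorded as an owner reference with kind `RayJob` in the `ray.io` API group. It is not Kueue's actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// OwnerReference is a hypothetical, simplified stand-in for a Kubernetes
// owner reference; only the fields this sketch needs are included.
type OwnerReference struct {
	APIVersion string
	Kind       string
	Name       string
}

// RayCluster is a hypothetical, simplified stand-in for the KubeRay object.
type RayCluster struct {
	Name            string
	OwnerReferences []OwnerReference
}

// isOwnedByRayJob reports whether any owner reference points at a RayJob
// in the ray.io API group (assumed group, for illustration only).
func isOwnedByRayJob(rc RayCluster) bool {
	for _, ref := range rc.OwnerReferences {
		if ref.Kind == "RayJob" && strings.HasPrefix(ref.APIVersion, "ray.io/") {
			return true
		}
	}
	return false
}

// shouldQueue decides whether a RayCluster integration should create its own
// workload. Clusters created by a RayJob are skipped so their resources are
// not counted twice.
func shouldQueue(rc RayCluster) bool {
	return !isOwnedByRayJob(rc)
}

func main() {
	standalone := RayCluster{Name: "long-lived-cluster"}
	fromJob := RayCluster{
		Name: "rayjob-sample-raycluster",
		OwnerReferences: []OwnerReference{
			{APIVersion: "ray.io/v1", Kind: "RayJob", Name: "rayjob-sample"},
		},
	}
	fmt.Println(shouldQueue(standalone)) // true
	fmt.Println(shouldQueue(fromJob))    // false
}
```

A long-lived, standalone RayCluster is queued on its own, while a cluster spawned by a queued RayJob is left to the RayJob's accounting.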
FYI @vicentefb who is working on the feature now |
What would you like to be added:
Support RayCluster as a queue-able workload in Kueue (much like RayJob).
Why is this needed:
Currently Kueue supports RayJob which works great when managing ray jobs that run on ephemeral ray clusters. However, there are many use-cases and existing workloads that depend on long-lived RayClusters. Being able to account for these RayClusters with Kueue would greatly improve integration of Kueue with Ray.
Completion requirements:
This probably needs a KEP, but very roughly the requirements would be:
This enhancement requires the following artifacts:
The artifacts should be linked in subsequent comments.