added documentation for RayCluster integration (#1607)
Added a RayCluster sample YAML file, added more limitations, and added version requirements.
Showing 4 changed files with 166 additions and 2 deletions.
---
title: "Run A RayCluster"
date: 2024-01-17
weight: 6
description: >
  Run a RayCluster on Kueue.
---

This page shows how to leverage Kueue's scheduling and resource management capabilities when running a [RayCluster](https://docs.ray.io/en/latest/cluster/getting-started.html).

This guide is for [batch users](/docs/tasks#batch-user) who have a basic understanding of Kueue. For more information, see [Kueue's overview](/docs/overview).

## Before you begin

1. Make sure you are using Kueue v0.6.0 or newer and KubeRay v1.1.0 or newer.

2. Check [Administer cluster quotas](/docs/tasks/administer_cluster_quotas) for details on the initial Kueue setup.

3. See [KubeRay Installation](https://ray-project.github.io/kuberay/deploy/installation/) for installation and configuration details of KubeRay.
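
For orientation, a minimal single-queue setup might look like the sketch below. It follows the shape of Kueue's `kueue.x-k8s.io/v1beta1` API, but the names `default-flavor` and `cluster-queue` and the quota values are hypothetical; adapt them to your cluster. The quota is sized so that the example RayCluster on this page, which requests 5 CPUs and 6G of memory across its head Pod and 4 worker Pods, fits.

```yaml
# Hypothetical ResourceFlavor; a single flavor is enough for a homogeneous cluster.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor
---
# Hypothetical ClusterQueue that provides CPU and memory quota to all namespaces.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cluster-queue
spec:
  namespaceSelector: {}        # empty selector: admit workloads from any namespace
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 8        # hypothetical value
      - name: "memory"
        nominalQuota: 16Gi     # hypothetical value
```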

## RayCluster definition

When running [RayClusters](https://docs.ray.io/en/latest/cluster/getting-started.html) on Kueue, take into consideration the following aspects:

### a. Queue selection

The target [local queue](/docs/concepts/local_queue) should be specified in the `metadata.labels` section of the RayCluster configuration.

```yaml
metadata:
  name: raycluster-sample
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: local-queue
```
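
The LocalQueue named in the label has to exist in the same namespace as the RayCluster. A minimal sketch of one, assuming a ClusterQueue with the hypothetical name `cluster-queue` has already been created by your administrator:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: default           # must match the RayCluster's namespace
  name: local-queue            # must match the kueue.x-k8s.io/queue-name label
spec:
  clusterQueue: cluster-queue  # hypothetical ClusterQueue name
```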

### b. Configure the resource needs

Configure the resource needs of the workload in the Pod templates of the head group (`spec.headGroupSpec.template`) and of each worker group (`spec.workerGroupSpecs[].template`). Kueue accounts for quota based on the containers' resource requests.

```yaml
spec:
  headGroupSpec:
    template:
      spec:
        affinity: {}
        containers:
        - env: []
          image: rayproject/ray:2.7.0
          imagePullPolicy: IfNotPresent
          name: ray-head
          resources:
            limits:
              cpu: "1"
              memory: 2G
            requests:
              cpu: "1"
              memory: 2G
          securityContext: {}
          volumeMounts:
          - mountPath: /tmp/ray
            name: log-volume
  workerGroupSpecs:
  - groupName: workergroup
    template:
      spec:
        affinity: {}
        containers:
        - env: []
          image: rayproject/ray:2.7.0
          imagePullPolicy: IfNotPresent
          name: ray-worker
          resources:
            limits:
              cpu: "1"
              memory: 1G
            requests:
              cpu: "1"
              memory: 1G
```
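
As a rough worked example, assuming Kueue counts one head Pod plus `replicas` Pods for each worker group, and taking `replicas: 4` from the example RayCluster on this page, the requests above would add up to the following quota:

$$
\begin{aligned}
\text{cpu} &= 1 + 4 \times 1 = 5 \\
\text{memory} &= 2\,\text{G} + 4 \times 1\,\text{G} = 6\,\text{G}
\end{aligned}
$$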

Note that a RayCluster holds its resource quota for as long as it exists, even when no Ray jobs are running on it. For optimal resource management, delete a RayCluster that is no longer in use.

### c. Limitations
- Limited Worker Groups: Because a Kueue workload can have a maximum of 8 PodSets and the head group uses one of them, `spec.workerGroupSpecs` can have at most 7 entries.
- In-Tree Autoscaling Disabled: Kueue manages resource allocation for the RayCluster; therefore, the cluster's in-tree autoscaling must be disabled.
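
In KubeRay, in-tree autoscaling is controlled by the RayCluster's `spec.enableInTreeAutoscaling` field, and it stays off unless that field is set to `true`. A sketch of keeping it explicitly disabled follows; the head and worker group specs are omitted, so this fragment is not a complete manifest.

```yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-sample
  labels:
    kueue.x-k8s.io/queue-name: local-queue
spec:
  # Keep the Ray autoscaler off so that Kueue alone decides the cluster's resources.
  enableInTreeAutoscaling: false
  # headGroupSpec and workerGroupSpecs omitted for brevity.
```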

## Example RayCluster

An example RayCluster looks like the following:

{{< include "examples/jobs/ray-cluster-sample.yaml" "yaml" >}}

You can submit a Ray job using the [CLI](https://docs.ray.io/en/latest/cluster/running-applications/job-submission/quickstart.html), or log into the Ray head Pod and run a job by following this [example](https://ray-project.github.io/kuberay/deploy/helm-cluster/#end-to-end-example) with a kind cluster.

```yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-sample
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: local-queue
spec:
  headGroupSpec:
    rayStartParams:
      dashboard-host: 0.0.0.0
    serviceType: ClusterIP
    template:
      metadata:
        annotations: {}
      spec:
        affinity: {}
        containers:
        - env: []
          image: rayproject/ray:2.7.0
          imagePullPolicy: IfNotPresent
          name: ray-head
          resources:
            limits:
              cpu: "1"
              memory: 2G
            requests:
              cpu: "1"
              memory: 2G
          securityContext: {}
          volumeMounts:
          - mountPath: /tmp/ray
            name: log-volume
        imagePullSecrets: []
        nodeSelector: {}
        tolerations: []
        volumes:
        - emptyDir: {}
          name: log-volume
  workerGroupSpecs:
  - groupName: workergroup
    maxReplicas: 10
    minReplicas: 1
    rayStartParams: {}
    replicas: 4
    template:
      metadata:
        annotations: {}
      spec:
        affinity: {}
        containers:
        - env: []
          image: rayproject/ray:2.7.0
          imagePullPolicy: IfNotPresent
          name: ray-worker
          resources:
            limits:
              cpu: "1"
              memory: 1G
            requests:
              cpu: "1"
              memory: 1G
          securityContext: {}
          volumeMounts:
          - mountPath: /tmp/ray
            name: log-volume
        imagePullSecrets: []
        nodeSelector: {}
        tolerations: []
        volumes:
        - emptyDir: {}
          name: log-volume
```