[Serve][Doc] Kuberay RayService Autoscaling (ray-project#36590)
This PR adds documentation for KubeRay RayService Autoscaling.

---------

Signed-off-by: Sihan Wang <sihanwang41@gmail.com>
Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
sihanwang41 authored and arvind-chandra committed Aug 31, 2023
1 parent 182c3d8 commit a89092b
Showing 2 changed files with 11 additions and 0 deletions.
@@ -199,6 +199,7 @@ by the following considerations:
Ray and Autoscaler code versions. Running one autoscaler per Ray cluster and matching the code versions
ensures compatibility.

(kuberay-autoscaler-with-ray-autoscaler)=
### Ray Autoscaler with Kubernetes Cluster Autoscaler
The Ray Autoscaler and the
[Kubernetes Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler)
10 changes: 10 additions & 0 deletions doc/source/serve/production-guide/kubernetes.md
@@ -245,6 +245,16 @@ $ kubectl describe rayservice rayservice-sample
In the status, you can see that the `RayService` is preparing a pending cluster.
After the pending cluster is healthy, it becomes the active cluster and the previous cluster is terminated.

## Autoscaling
You can configure autoscaling for your Serve application by setting the autoscaling field in the Serve config. Learn more about the configuration options in [Scaling and Resource Allocation](serve-scaling-and-resource-allocation).
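
As a minimal sketch of what such a Serve config might look like, the fragment below sets autoscaling bounds on one deployment. The application name, import path, and deployment name are hypothetical placeholders, and the exact set of supported `autoscaling_config` keys depends on your Ray version, so treat this as an illustration rather than a reference.

```yaml
# Hypothetical Serve config fragment (names are placeholders, not from this guide).
applications:
  - name: my_app                  # hypothetical application name
    import_path: my_module:app    # hypothetical import path
    deployments:
      - name: MyDeployment        # hypothetical deployment name
        autoscaling_config:
          min_replicas: 1         # lower bound on Serve replicas
          max_replicas: 5         # upper bound on Serve replicas
```

With a config like this, Serve adjusts the number of replicas of `MyDeployment` between the two bounds based on load.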

To enable autoscaling in a KubeRay cluster, set `enableInTreeAutoscaling` to `true` in the Ray cluster configuration. Other options are available to configure the autoscaling behavior. For details, refer to [the documentation](serve-scaling-and-resource-allocation).
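
The fragment below sketches where this setting could sit in a `RayService` manifest. It assumes the `rayservice-sample` resource used earlier in this guide. The worker group name, replica bounds, and idle timeout are illustrative values, and the head group, pod templates, and Serve config are omitted for brevity.

```yaml
# Illustrative RayService fragment; values are assumptions, not recommendations.
kind: RayService
metadata:
  name: rayservice-sample
spec:
  rayClusterConfig:
    enableInTreeAutoscaling: true   # turns on the Ray autoscaler for this cluster
    autoscalerOptions:
      idleTimeoutSeconds: 60        # illustrative: seconds before an idle worker is scaled down
    workerGroupSpecs:
      - groupName: worker-group     # hypothetical group name
        replicas: 1
        minReplicas: 1              # lower bound on worker pods
        maxReplicas: 5              # upper bound on worker pods
        # rayStartParams and the pod template are omitted here
```

The Ray autoscaler then adds or removes worker pods within `minReplicas` and `maxReplicas` according to the resource demand of the Serve replicas.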


:::{note}
In most use cases, enable Kubernetes autoscaling to fully utilize the resources in your cluster. On GKE, you can use an Autopilot Kubernetes cluster. For instructions, see [Create an Autopilot Cluster](https://cloud.google.com/kubernetes-engine/docs/how-to/creating-an-autopilot-cluster). On EKS, you can enable Kubernetes cluster autoscaling with the Cluster Autoscaler. For details, see [Cluster Autoscaler on AWS](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md). To understand the relationship between Kubernetes autoscaling and Ray autoscaling, see [Ray Autoscaler with Kubernetes Cluster Autoscaler](kuberay-autoscaler-with-ray-autoscaler).
:::

## Next Steps

Check out [the end-to-end fault tolerance guide](serve-e2e-ft) to learn more about Serve's failure conditions and how to guard against them.