[Serve][Doc] Kuberay RayService Autoscaling (ray-project#36590)

This PR adds documentation for KubeRay RayService Autoscaling.

---------

Signed-off-by: Sihan Wang <sihanwang41@gmail.com>
Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
lmco · Aug 31, 2023 · a89092b
1 parent 182c3d8

Showing 2 changed files with 11 additions and 0 deletions.
diff --git a/doc/source/cluster/kubernetes/user-guides/configuring-autoscaling.md b/doc/source/cluster/kubernetes/user-guides/configuring-autoscaling.md
@@ -199,6 +199,7 @@ by the following considerations:
   Ray and Autoscaler code versions. Running one autoscaler per Ray cluster and matching the code versions
   ensures compatibility.
 
+(kuberay-autoscaler-with-ray-autoscaler)=
 ### Ray Autoscaler with Kubernetes Cluster Autoscaler
 The Ray Autoscaler and the
 [Kubernetes Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler)

diff --git a/doc/source/serve/production-guide/kubernetes.md b/doc/source/serve/production-guide/kubernetes.md
@@ -245,6 +245,16 @@ $ kubectl describe rayservice rayservice-sample
 In the status, you can see that the `RayService` is preparing a pending cluster.
 After the pending cluster is healthy, it becomes the active cluster and the previous cluster is terminated.
 
+## Autoscaling
+You can configure autoscaling for your Serve application by setting the `autoscaling_config` field on each deployment in the Serve config. In a RayService, the Serve config is the `serveConfigV2` field. To learn more about the configuration options, see [Scaling and Resource Allocation](serve-scaling-and-resource-allocation).
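As a hedged illustration, the excerpt below sketches how an `autoscaling_config` block might sit inside a RayService's `serveConfigV2`. It is not a complete manifest. The application name, import path, and deployment name (`my_app`, `my_module:app`, `MyModel`) are hypothetical placeholders, and the field names follow the Serve autoscaling options in Ray releases from around the time of this change. Field names can differ between Ray versions, so check the Scaling and Resource Allocation guide for your version.

```yaml
# Excerpt of a RayService manifest (not a complete spec).
# my_app, my_module:app, and MyModel are hypothetical placeholders.
spec:
  serveConfigV2: |
    applications:
      - name: my_app
        import_path: my_module:app
        route_prefix: /
        deployments:
          - name: MyModel
            # Don't also set num_replicas when autoscaling_config is present.
            autoscaling_config:
              min_replicas: 1
              max_replicas: 5
              # Target number of ongoing requests per replica; Serve adds
              # replicas when the observed load per replica exceeds this.
              target_num_ongoing_requests_per_replica: 2
```

Serve adjusts the replica count of `MyModel` between 1 and 5 based on load. If those replicas need more Ray resources than the cluster has, Ray cluster autoscaling (configured separately) is what adds worker Pods.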
+
+To enable autoscaling of the Ray cluster itself, set `enableInTreeAutoscaling: true` in the `rayClusterConfig` section of the RayService. KubeRay provides additional options, such as `autoscalerOptions`, for tuning the Ray Autoscaler's behavior. For details, see [KubeRay Autoscaling](kuberay-autoscaling).
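The following excerpt is a sketch, under stated assumptions, of the `rayClusterConfig` fields involved. The worker group name `worker-group` and the specific numbers are hypothetical, and the head and worker Pod templates are omitted, so this is not a deployable manifest on its own.

```yaml
# Excerpt of a RayService manifest (not a complete spec).
spec:
  rayClusterConfig:
    # Runs the Ray Autoscaler as a sidecar container in the Ray head Pod.
    enableInTreeAutoscaling: true
    # Optional tuning for the Ray Autoscaler.
    autoscalerOptions:
      upscalingMode: Default    # One of Conservative, Default, Aggressive
      idleTimeoutSeconds: 60    # Idle time before a worker Pod is scaled down
    headGroupSpec:
      rayStartParams:
        dashboard-host: "0.0.0.0"
      # template: head Pod template omitted from this excerpt
    workerGroupSpecs:
      - groupName: worker-group # hypothetical group name
        replicas: 1
        # The Ray Autoscaler keeps this group's size within these bounds.
        minReplicas: 1
        maxReplicas: 10
        rayStartParams: {}
        # template: worker Pod template omitted from this excerpt
```

With this setup, Serve decides how many replicas each deployment needs, and the Ray Autoscaler adds or removes worker Pods (within `minReplicas` and `maxReplicas`) to supply the Ray resources those replicas request.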
+
+
+:::{note}
+In most cases, we recommend also enabling Kubernetes cluster autoscaling so that the Kubernetes cluster can add and remove nodes as the Ray Autoscaler creates and deletes Pods, which lets you fully utilize the resources in your cluster. If you use GKE, you can use an Autopilot cluster. For instructions, see [Create an Autopilot Cluster](https://cloud.google.com/kubernetes-engine/docs/how-to/creating-an-autopilot-cluster). On EKS, you can enable cluster autoscaling with the Kubernetes Cluster Autoscaler. For details, see [Cluster Autoscaler on AWS](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md). To understand how Kubernetes autoscaling and Ray autoscaling work together, see [Ray Autoscaler with Kubernetes Cluster Autoscaler](kuberay-autoscaler-with-ray-autoscaler).
+:::
+
 ## Next Steps
 
 Check out [the end-to-end fault tolerance guide](serve-e2e-ft) to learn more about Serve's failure conditions and how to guard against them.