[core][autoscaler] Fuse scaling requests together to avoid overloading the Kubernetes API server (ray-project#49150)

## Why are these changes needed?

Without this PR, the Ray Autoscaler sends one patch request to the Kubernetes (K8s) API server for each Ray Pod it scales up or down. That is, if the Ray Autoscaler plans to scale up 10 Pods, it sends 10 patch requests to the K8s API server. This is highly likely to overload the K8s API server when there are multiple Ray clusters within a single K8s cluster.

This PR fuses the requests together to avoid overloading the K8s API server.

## Related issue number

## Checks

* Create an Autoscaler V2 RayCluster CR.
  * head Pod: `num-cpus: 0`
  * worker Pod: Each worker Pod has 1 CPU, and the `maxReplicas` of the worker group is 10.
* Run the following script in the head Pod: https://gist.github.com/kevin85421/6f09368ba48572e28f53654dca854b57
* Results
  * Without this PR, Ray Autoscaler submits 9 patch requests to the K8s API server (from 1 worker Pod -> 10 worker Pods).
    <img width="1440" alt="Screenshot 2024-12-07 at 11 29 17 AM" src="https://github.com/user-attachments/assets/b1757a8c-85df-4d76-a920-c8a81e5b92b2">
  * With this PR, Ray Autoscaler submits 1 patch request to the K8s API server to scale up 9 worker Pods.
    <img width="1440" alt="Screenshot 2024-12-07 at 4 45 10 PM" src="https://github.com/user-attachments/assets/7a42fa56-4671-4b39-bb83-03b0a9a25ec0">

- [ ] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: kaihsun <kaihsun@anyscale.com>
Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
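
## Illustrative sketch

The fusing idea described under "Why are these changes needed?" can be sketched roughly as follows. This is not the code from this PR: `fuse_scale_requests` and its arguments are hypothetical names, and the sketch assumes each worker group is identified by its index in the RayCluster CR's `spec.workerGroupSpecs` list and that scaling is expressed as a JSON Patch on the group's `replicas` field.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def fuse_scale_requests(
    current_replicas: Dict[int, int],
    requests: List[Tuple[int, int]],
) -> List[dict]:
    """Collapse per-Pod scale requests into a single JSON Patch payload.

    current_replicas: hypothetical map of worker group index -> current replicas.
    requests: (worker group index, delta) pairs, e.g. (0, +1) for one more Pod.
    """
    # Sum all deltas per worker group instead of acting on each one separately.
    net_delta: Dict[int, int] = defaultdict(int)
    for group_index, delta in requests:
        net_delta[group_index] += delta

    # Emit one patch operation per group whose replica count actually changes.
    patch = []
    for group_index in sorted(net_delta):
        delta = net_delta[group_index]
        if delta == 0:
            continue
        patch.append(
            {
                "op": "replace",
                # Assumed path layout; adjust to the CR schema in use.
                "path": f"/spec/workerGroupSpecs/{group_index}/replicas",
                "value": current_replicas[group_index] + delta,
            }
        )
    return patch


# Nine single-Pod scale-up requests for worker group 0 (1 -> 10 worker Pods).
requests = [(0, 1)] * 9
payload = fuse_scale_requests({0: 1}, requests)
print(payload)
```

Under these assumptions, the nine requests from the test scenario above collapse into one payload with a single operation, so it can go to the K8s API server in one patch call rather than nine.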