[core][autoscaler] Fuse scaling requests together to avoid overloading the Kubernetes API server #49150
Conversation
@@ -344,6 +344,9 @@ def create_node_with_resources_and_labels(
    def _create_node_with_resources_and_labels(
        self, node_config, tags, count, resources, labels
    ):
        # This function calls `pop`. To avoid side effects, we make a
        # copy of `resources` first.
Without this, the first node in the scaling request will be created correctly. However, the following nodes will be created without CPU/GPU because their resources have already been `pop`ped from `resources`.
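A minimal, hypothetical sketch of the failure mode described above. The helper `create_one_node` is invented for illustration; the real logic lives in `_create_node_with_resources_and_labels`:

```python
import copy

# Shared resource spec for every node in one fused scaling request.
resources = {"CPU": 1, "GPU": 1}

def create_one_node(res):
    # Mutates its argument: every key is popped out of the caller's dict.
    return {k: res.pop(k) for k in list(res)}

# Buggy: the second node sees an already-emptied dict.
first = create_one_node(resources)   # {'CPU': 1, 'GPU': 1}
second = create_one_node(resources)  # {} -- no CPU/GPU left to pop

# Fixed: copy before popping so each node sees the full spec.
resources = {"CPU": 1, "GPU": 1}
first = create_one_node(copy.deepcopy(resources))   # {'CPU': 1, 'GPU': 1}
second = create_one_node(copy.deepcopy(resources))  # {'CPU': 1, 'GPU': 1}
```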
Great catch!
Why are these changes needed?
Without this PR, the Ray Autoscaler sends one patch request to the Kubernetes (K8s) API server for every Ray Pod it scales up or down. That is, if the Autoscaler plans to scale up 10 Pods, it sends 10 patch requests. This is highly likely to overload the K8s API server when multiple Ray clusters run within a single K8s cluster.
This PR fuses the requests together to avoid overloading the K8s API server.
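A minimal sketch of the idea, not the Autoscaler's actual implementation: compute the final replica count for a worker group and issue a single patch to the RayCluster CR, rather than one patch per Pod. The function name `fuse_scale_requests` and the read-modify-write flow are assumptions for illustration; the KubeRay `ray.io`/`rayclusters` CRD coordinates and the Kubernetes Python client calls are real.

```python
from kubernetes import client, config

GROUP, VERSION, PLURAL = "ray.io", "v1", "rayclusters"

def fuse_scale_requests(name: str, namespace: str, group_name: str, delta: int) -> None:
    """Scale a worker group by `delta` Pods with ONE patch request."""
    config.load_incluster_config()  # assumes this runs inside the K8s cluster
    api = client.CustomObjectsApi()

    # Read the RayCluster CR once to get the current replica counts.
    ray_cluster = api.get_namespaced_custom_object(
        GROUP, VERSION, namespace, PLURAL, name
    )
    specs = ray_cluster["spec"]["workerGroupSpecs"]
    for spec in specs:
        if spec["groupName"] == group_name:
            # Apply the whole fused delta (e.g. +9) in one shot.
            spec["replicas"] = spec.get("replicas", 0) + delta

    # One patch request, no matter how many Pods are added or removed.
    api.patch_namespaced_custom_object(
        GROUP, VERSION, namespace, PLURAL, name,
        body={"spec": {"workerGroupSpecs": specs}},
    )
```

Sending the full, updated `workerGroupSpecs` list is deliberate in this sketch: a JSON merge patch replaces lists wholesale, so patching the complete list keeps any other worker groups intact.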
Related issue number
Checks

* Create an Autoscaler V2 RayCluster CR.
  * head Pod: `num-cpus: 0`
  * worker Pod: each worker Pod has 1 CPU, and the `maxReplicas` of the worker group is 10.
* Run the following script in the head Pod: https://gist.github.com/kevin85421/6f09368ba48572e28f53654dca854b57
* Results:
  * Without this PR, the Ray Autoscaler submits 9 patch requests to the K8s API server (scaling from 1 worker Pod to 10 worker Pods).
  * With this PR, the Ray Autoscaler submits 1 patch request to the K8s API server to scale up 9 worker Pods.

- [ ] I've signed off every commit (by using the `-s` flag, i.e., `git commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - [ ] Unit tests
  - [ ] Release tests
  - [ ] This PR is not tested :(