[core][autoscaler] Autoscaler v2 does not honor minReplicas/replicas count of the worker nodes and constantly terminates after idle timeout #48623
Signed-off-by: kaihsun <kaihsun@anyscale.com>
Ah, good catch. Should have definitely tested this one.
```python
    - terminate_nodes_by_type[node_type]
    <= min_count
):
    continue
```
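The diff shows only the tail of the condition. Below is a minimal, hypothetical standalone version of the same guard. It assumes the quantity compared against `min_count` is the number of nodes of that type that would remain after the terminations already selected in this pass. The function name `select_idle_nodes_to_terminate` and the inputs `num_nodes_by_type` and `min_count_by_type` are illustrative assumptions, not Ray's actual code.

```python
from collections import Counter
from typing import Dict, List, Tuple


def select_idle_nodes_to_terminate(
    idle_nodes: List[Tuple[str, str]],
    num_nodes_by_type: Dict[str, int],
    min_count_by_type: Dict[str, int],
) -> List[str]:
    """Hypothetical helper: pick idle nodes without dropping a type below its minimum.

    idle_nodes: (node_id, node_type) pairs whose idle timeout has expired.
    num_nodes_by_type: current number of running nodes per type.
    min_count_by_type: configured minimum (e.g. min_worker_nodes) per type.
    """
    terminate_nodes_by_type: Counter = Counter()
    to_terminate: List[str] = []
    for node_id, node_type in idle_nodes:
        min_count = min_count_by_type.get(node_type, 0)
        # Nodes of this type that remain given the terminations chosen so far.
        remaining = num_nodes_by_type[node_type] - terminate_nodes_by_type[node_type]
        if remaining <= min_count:
            # Terminating one more would leave fewer than min_count nodes.
            continue
        terminate_nodes_by_type[node_type] += 1
        to_terminate.append(node_id)
    return to_terminate


# Three idle workers of one type with a minimum of 2: only one is removed.
print(select_idle_nodes_to_terminate(
    [("n1", "worker"), ("n2", "worker"), ("n3", "worker")],
    num_nodes_by_type={"worker": 3},
    min_count_by_type={"worker": 2},
))
# → ['n1']
```

Counting the terminations already chosen per type is what keeps the check correct when several idle nodes of the same type expire in the same pass; comparing only the original count against the minimum would let all three be removed.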
nit: let's add a debug log?
Updated. Btw, I have a couple of questions:
1. I can't find any place in `scheduler` that calls `setLevel`. How can I set the logger's level to `DEBUG` when I launch the autoscaler via the CLI?
2. How should I decide whether a message should be `INFO` or `DEBUG`?
Thanks!
For 1, I think it uses Ray's configured logging AFAIK, so its level should follow however the Ray logging level is configured.
For 2, it's more arbitrary and a matter of style.
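On question 1: if the autoscaler modules log through standard Python `logging` with `logging.getLogger(__name__)` (an assumption here, not confirmed by this thread), a module logger with no level of its own inherits the level of its nearest configured ancestor. The standard-library sketch below shows that a `debug` call is dropped until the parent logger's level is lowered. The logger name `ray.autoscaler.v2.scheduler` is assumed from that naming convention, and whether Ray's own logging setup configures the `ray` logger this way is not verified here.

```python
import logging
from typing import List


class ListHandler(logging.Handler):
    """Keeps emitted records in memory so they can be inspected."""

    def __init__(self) -> None:
        super().__init__(level=logging.DEBUG)
        self.records: List[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


# Assumption: the scheduler module uses logging.getLogger(__name__), so its
# logger is a descendant of the top-level "ray" logger.
ray_logger = logging.getLogger("ray")
scheduler_logger = logging.getLogger("ray.autoscaler.v2.scheduler")

handler = ListHandler()
ray_logger.addHandler(handler)

# The child logger is NOTSET, so its effective level comes from "ray".
ray_logger.setLevel(logging.INFO)
scheduler_logger.debug("skip terminating idle node")  # dropped: below INFO
scheduler_logger.info("terminating idle node")  # kept

ray_logger.setLevel(logging.DEBUG)
scheduler_logger.debug("skip terminating idle node")  # kept now

print([r.getMessage() for r in handler.records])
# → ['terminating idle node', 'skip terminating idle node']
```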
There seems to be no way to configure the log level when launching the autoscaler via the Ray CLI.
I will verify whether this is correct. If so, I will open a PR to make it configurable.
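One way such a setting could be exposed is sketched below, purely as an illustration: a hypothetical `--log-level` flag on a standalone `argparse` entry point, whose value is applied to a parent logger so child module loggers inherit it. The flag name, the helper functions, and the `ray.autoscaler` logger name are all assumptions; this is not an existing Ray CLI option.

```python
import argparse
import logging
from typing import List, Optional

# Hypothetical choices for a hypothetical flag; not an existing Ray CLI option.
LEVEL_CHOICES = ["debug", "info", "warning", "error", "critical"]


def parse_log_level(argv: Optional[List[str]] = None) -> int:
    """Parse a hypothetical --log-level flag and return the numeric level."""
    parser = argparse.ArgumentParser(description="Hypothetical autoscaler entry point")
    parser.add_argument(
        "--log-level",
        choices=LEVEL_CHOICES,
        default="info",
        help="Threshold for the autoscaler's loggers.",
    )
    args = parser.parse_args(argv)
    # logging.DEBUG, logging.INFO, ... are upper-case module attributes.
    return getattr(logging, args.log_level.upper())


def configure_autoscaler_logging(
    level: int, logger_name: str = "ray.autoscaler"
) -> logging.Logger:
    """Set the level on a parent logger so child module loggers inherit it."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(level)
    return logger


level = parse_log_level(["--log-level", "debug"])
logger = configure_autoscaler_logging(level)
print(logging.getLevelName(logger.level))
# → DEBUG
```

Setting the level once on a parent logger, rather than calling `setLevel` in each module, keeps the per-module loggers untouched and lets a single option control all of them.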
Why are these changes needed?
Currently, Autoscaler V2 deletes idle nodes without considering `min_worker_nodes`. This PR skips termination if the current number of nodes of that type is less than or equal to `min_worker_nodes`.
Reproduce
- `minReplicas` is 2.
- Change the image to one that includes this PR.
- Note that the screenshot shows the message at INFO level, but this PR logs it at DEBUG.
Related issue number
Closes #47578
Checks
- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.