AWS: Add note about suspending AZRebalance #1802

Closed
wants to merge 1 commit into from

@mgalgs mgalgs commented Mar 18, 2019

According to some user reports [1], you can actually run
cluster-autoscaler against an ASG that spans multiple AZs;
you just have to suspend the AZRebalance scaling process
to avoid unexpected node termination.

[1] https://kubernetes.slack.com/archives/C8SH2GSL9/p1552600210276600?thread_ts=1552420686.257000&cid=C8SH2GSL9


NOTE: I've only been running in this configuration for about 1 day, so I can't personally vouch for the correctness of this workaround. As mentioned in the commit text above, other users have reported running in this configuration without issues. Would be great to get confirmation from an expert though ;)
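
For reference, a minimal sketch of what suspending AZRebalance could look like from Python. It assumes a boto3 Auto Scaling client is passed in; the helper name `suspend_az_rebalance` and the ASG name in the usage note are hypothetical. It only suspends the one scaling process and does not address any other multi-AZ concerns.

```python
from typing import Any


def suspend_az_rebalance(autoscaling_client: Any, asg_name: str) -> None:
    """Suspend only the AZRebalance scaling process on one ASG.

    `autoscaling_client` is assumed to be a boto3 Auto Scaling client,
    e.g. boto3.client("autoscaling"). Other scaling processes
    (Launch, Terminate, ...) are left untouched.
    """
    autoscaling_client.suspend_processes(
        AutoScalingGroupName=asg_name,
        ScalingProcesses=["AZRebalance"],
    )
```

Usage would be along the lines of `suspend_az_rebalance(boto3.client("autoscaling"), "my-multi-az-asg")`, where the group name is a placeholder.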

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: losipiuk

If they are not already assigned, you can assign the PR to them by writing /assign @losipiuk in a comment when ready.


@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 18, 2019
@mgalgs
Contributor Author

mgalgs commented Mar 18, 2019

/joke

@k8s-ci-robot
Contributor

@mgalgs: Why was the broom late for the meeting? He overswept.


@MaciekPytel
Contributor

Rebalancing is not the only reason CA doesn't support multi-AZ node groups. CA's core logic takes a random existing node and assumes that any new node in the same ASG will look exactly the same. In a multi-AZ ASG, the new node can land in a different zone than CA assumes, which can lead to incorrect autoscaling decisions (an unnecessary scale-up, or no scale-up when one is needed). This commonly causes issues, especially when using PVs in multi-AZ clusters. A recent example: kubernetes/kubernetes#75402.

It may work ok-ish with a multi-AZ ASG if you disable rebalancing, don't use storage, don't use podAffinity with a topology other than host, don't use nodeAffinity on the zone label, never scale any zone to 0, ...
The list of ifs is long, so while it may work in some very specific clusters, it's not supported in the general case.
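
To make the failure mode concrete, here is a toy model of the "template node" assumption. This is a simplified illustration, not CA's actual code: `Node`, `Pod`, `fits`, and `classify_scale_up` are hypothetical names, and the only scheduling constraint modeled is a required zone (for example, the zone of a bound PV).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    zone: str


@dataclass(frozen=True)
class Pod:
    # Zone the pod is pinned to (e.g. by a PV); None means any zone works.
    required_zone: Optional[str] = None


def fits(pod: Pod, node: Node) -> bool:
    """Toy scheduling check: only the zone constraint is considered."""
    return pod.required_zone is None or pod.required_zone == node.zone


def classify_scale_up(pod: Pod, template: Node, launched: Node) -> str:
    """Compare the prediction made from the template node with reality.

    `template` is the existing node the autoscaler sampled from the ASG.
    `launched` is the node the ASG launches (or would launch) next.
    """
    predicted = fits(pod, template)
    real = fits(pod, launched)
    if predicted and not real:
        return "unnecessary scale-up"  # node added, pod still pending
    if not predicted and real:
        return "missed scale-up"  # a new node would have helped
    return "correct"


pod = Pod(required_zone="us-east-1b")
# Template sampled from 1b, but the ASG launches the new node in 1a.
print(classify_scale_up(pod, Node("us-east-1b"), Node("us-east-1a")))
# Template sampled from 1a, but the ASG would have launched in 1b.
print(classify_scale_up(pod, Node("us-east-1a"), Node("us-east-1b")))
```

With a single-AZ ASG the template and the launched node always share a zone, so the prediction and the outcome cannot diverge on this constraint.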

@mgalgs mgalgs deleted the patch-1 branch March 19, 2019 20:06
@mgalgs
Contributor Author

mgalgs commented Mar 19, 2019

Got it, thanks for the explanation!
