From 0b968325187d51acbe5b12924eeeb57230aa0aef Mon Sep 17 00:00:00 2001
From: Mitchel Humpherys
Date: Mon, 18 Mar 2019 16:44:06 -0700
Subject: [PATCH] AWS: Add note about suspending AZRebalance

According to some user reports [1], you can actually run cluster-autoscaler
against an ASG that spans multiple AZs; you just have to suspend the
AZRebalance scaling process to avoid unexpected node termination.

[1] https://kubernetes.slack.com/archives/C8SH2GSL9/p1552600210276600?thread_ts=1552420686.257000&cid=C8SH2GSL9
---
 cluster-autoscaler/cloudprovider/aws/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cluster-autoscaler/cloudprovider/aws/README.md b/cluster-autoscaler/cloudprovider/aws/README.md
index b9fce062676c..d23a28cf4664 100644
--- a/cluster-autoscaler/cloudprovider/aws/README.md
+++ b/cluster-autoscaler/cloudprovider/aws/README.md
@@ -143,7 +143,7 @@ If you'd like to scale node groups from 0, an `autoscaling:DescribeLaunchConfigu
 
 ## Common Notes and Gotchas:
 - The `/etc/ssl/certs/ca-certificates.crt` should exist by default on your ec2 instance. If you use Amazon Linux 2 (EKS worker node AMI by default), use `/etc/kubernetes/pki/ca.crt` instead for the volume hostPath in your cluster autoscaler manifest.
-- Cluster autoscaler does not support Auto Scaling Groups which span multiple Availability Zones; instead you should use an Auto Scaling Group for each Availability Zone and enable the [--balance-similar-node-groups](../../FAQ.md#im-running-cluster-with-nodes-in-multiple-zones-for-ha-purposes-is-that-supported-by-cluster-autoscaler) feature. If you do use a single Auto Scaling Group that spans multiple Availability Zones you will find that AWS unexpectedly terminates nodes without them being drained because of the [rebalancing feature](https://docs.aws.amazon.com/autoscaling/ec2/userguide/auto-scaling-benefits.html#arch-AutoScalingMultiAZ).
+- Cluster autoscaler does not support Auto Scaling Groups which span multiple Availability Zones; instead you should use an Auto Scaling Group for each Availability Zone and enable the [--balance-similar-node-groups](../../FAQ.md#im-running-cluster-with-nodes-in-multiple-zones-for-ha-purposes-is-that-supported-by-cluster-autoscaler) feature. If you do use a single Auto Scaling Group that spans multiple Availability Zones you will find that AWS unexpectedly terminates nodes without them being drained because of the [rebalancing feature](https://docs.aws.amazon.com/autoscaling/ec2/userguide/auto-scaling-benefits.html#arch-AutoScalingMultiAZ). Alternatively, you can suspend the [AZRebalance scaling process](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-suspend-resume-processes.html).
 - EBS volumes cannot span multiple AWS Availability Zones. If you have a Pod with a Persistent Volume in an AZ, it must be running on a k8s/EKS node in the same Availability Zone as the Persistent Volume. If the AWS Auto Scaling Group launches a new k8s/EKS node in a different AZ and moves this Pod onto the new node, the Persistent Volume in the previous AZ will not be available from the new AZ. The Pod will stay in Pending status. The workaround is to use a single AZ for the k8s/EKS nodes.
 - By default, cluster autoscaler will not terminate nodes running pods in the kube-system namespace. You can override this default behaviour by passing in the `--skip-nodes-with-system-pods=false` flag.
 - By default, cluster autoscaler will wait 10 minutes between scale down operations; you can adjust this using the `--scale-down-delay-after-add`, `--scale-down-delay-after-delete`, and `--scale-down-delay-after-failure` flags. E.g. `--scale-down-delay-after-add=5m` to decrease the scale down delay to 5 minutes after a node has been added.
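
For reference, the AZRebalance suspension this patch points to can be done with the AWS CLI. This is a minimal sketch, not part of the patch itself; the group name `my-asg` is a placeholder:

```shell
# Suspend only the AZRebalance process so the ASG stops terminating
# instances to rebalance capacity across Availability Zones; other
# scaling processes (Launch, Terminate, etc.) keep running.
aws autoscaling suspend-processes \
  --auto-scaling-group-name my-asg \
  --scaling-processes AZRebalance

# Confirm which processes are currently suspended.
aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names my-asg \
  --query 'AutoScalingGroups[0].SuspendedProcesses'
```

Resuming later is the symmetric `aws autoscaling resume-processes` call with the same arguments.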
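Likewise, a sketch of how the scale-down delay flags from the last context line might be passed on the cluster-autoscaler command line; the node-group spec `1:10:my-asg` and the durations are illustrative, not taken from the patch:

```shell
# Illustrative invocation only; adjust the durations and node group to your setup.
./cluster-autoscaler \
  --cloud-provider=aws \
  --nodes=1:10:my-asg \
  --scale-down-delay-after-add=5m \
  --scale-down-delay-after-delete=5m \
  --scale-down-delay-after-failure=5m
```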