From 05f03d55f1940c4d20a2606223b7fb658055ef6e Mon Sep 17 00:00:00 2001
From: Mitchel Humpherys
Date: Tue, 19 Mar 2019 15:10:32 -0700
Subject: [PATCH] README: Add note about cluster-autoscaler not supporting multiple AZs

---
 README.md | 39 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 2731b79e7f4..deb20ad6589 100644
--- a/README.md
+++ b/README.md
@@ -130,7 +130,7 @@ To use a 3-5 node Auto Scaling Group, run:
 eksctl create cluster --name=cluster-5 --nodes-min=3 --nodes-max=5
 ```
 
-> NOTE: You will still need to install and configure autoscaling. See the "Enable Autoscaling" section below.
+> NOTE: You will still need to install and configure autoscaling. See the "Enable Autoscaling" section below. Also note that depending on your workloads you might need to use a separate nodegroup for each AZ. See [Zone-aware Autoscaling](#zone-aware-autoscaling) below for more info.
 
 To use 30 `c4.xlarge` nodes and prevent updating current context in `~/.kube/config`, run:
 
@@ -410,6 +410,43 @@ and `k8s.io/cluster-autoscaler/` tags, so nodegroup discovery shoul
 
 [cluster autoscaler]: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md
 
+#### Zone-aware Autoscaling
+
+If your workloads are zone-specific you'll need to create separate nodegroups for each zone. This is because the `cluster-autoscaler` assumes that all nodes in a group are exactly equivalent. So, for example, if a scale-up event is triggered by a pod which needs a zone-specific PVC (e.g. an EBS volume), the new node might get scheduled in the wrong AZ and the pod will fail to start.
+
+You won't need a separate nodegroup for each AZ if your environment meets the following criteria:
+
+- The AZRebalance scaling process is [suspended](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-suspend-resume-processes.html).
+- No zone-specific storage requirements.
+- No podAffinity with topology other than host.
+- No nodeAffinity on zone label.
+- Never scale any zone to 0.
+
+(Read more [here](https://github.com/kubernetes/autoscaler/pull/1802#issuecomment-474295002) and [here](https://github.com/weaveworks/eksctl/pull/647#issuecomment-474698054).)
+
+If you meet all of the above requirements (and possibly others) then you should be safe with a single nodegroup which spans multiple AZs. Otherwise you'll want to create separate, single-AZ nodegroups:
+
+BEFORE:
+
+```yaml
+nodeGroups:
+  - name: ng1-public
+    instanceType: m5.xlarge
+    # availabilityZones: ["eu-west-2a", "eu-west-2b"]
+```
+
+AFTER:
+
+```yaml
+nodeGroups:
+  - name: ng1-public-2a
+    instanceType: m5.xlarge
+    availabilityZones: ["eu-west-2a"]
+  - name: ng1-public-2b
+    instanceType: m5.xlarge
+    availabilityZones: ["eu-west-2b"]
+```
+
 ### VPC Networking
 
 By default, `eksctl create cluster` will build a dedicated VPC, in order to avoid interference with any existing resources for a