## Deployment Specification
Your deployment configuration should look something like this:
```
You can hint to GitHub that this should be formatted as yaml with: "```yaml". Same with the json above.
cool didn't know that
```
Note:
- The `/etc/ssl/certs/ca-certificates.crt` file should exist by default on your EC2 instance.
- At the time of writing this, the cluster autoscaler is unaware of availability zones; the availability zone of the instance should be configured by the autoscaling group. Although autoscaling groups can contain instances in multiple availability zones, the autoscaling group should span 1 availability zone for the cluster autoscaler to work.
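As an illustration of the note above, here is a hedged sketch of creating an ASG pinned to a single availability zone with the aws-sdk-go autoscaling client; the group name, launch configuration name, and zone are placeholders (not values from this PR), and you would create one such group per zone if you want to cover multiple zones:

```go
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/autoscaling"
)

func main() {
	// All names and the zone below are placeholders.
	svc := autoscaling.New(session.Must(session.NewSession()))

	_, err := svc.CreateAutoScalingGroup(&autoscaling.CreateAutoScalingGroupInput{
		AutoScalingGroupName:    aws.String("k8s-workers-us-east-1a"),
		LaunchConfigurationName: aws.String("k8s-worker-lc"),
		MinSize:                 aws.Int64(1),
		MaxSize:                 aws.Int64(10),
		// Exactly one availability zone, so every node this group launches
		// looks the same to the cluster autoscaler.
		AvailabilityZones: []*string{aws.String("us-east-1a")},
	})
	if err != nil {
		log.Fatalf("creating ASG: %v", err)
	}
}
```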
Hi, really excited to see this PR!!
Excuse me if I'm misreading, but I couldn't figure out why the autoscaling group should span 1 availability zone for the cluster autoscaler to work.
As noted, the cluster autoscaler is unaware of AZs, but an ASG is aware of them. If so configured, spreading EC2 instances over multiple AZs is done completely out of band by AWS AutoScaling. So I believe you can just assign 2 or more AZs to an ASG to effectively make the cluster autoscaler multi-AZ aware, as long as the AWS implementation of Cluster Autoscaler delegates adding/removing instances to AWS AutoScaling.
And it seems so (according to https://github.com/kubernetes/contrib/pull/1377/files#diff-ade7b95627ea0dd6b6f4deee7f24fa7eR124 it is calling SetDesiredCapacity to delegate adding/removing instances).
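For reference, a minimal sketch of that delegation using the aws-sdk-go autoscaling client; the group name and capacity below are placeholders for illustration, not values taken from this PR:

```go
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/autoscaling"
)

func main() {
	svc := autoscaling.New(session.Must(session.NewSession()))

	// Scaling is delegated to AWS AutoScaling: the caller only sets the
	// desired capacity; which AZ a new instance lands in is decided by the ASG.
	_, err := svc.SetDesiredCapacity(&autoscaling.SetDesiredCapacityInput{
		AutoScalingGroupName: aws.String("k8s-workers"), // placeholder name
		DesiredCapacity:      aws.Int64(4),              // placeholder target size
		HonorCooldown:        aws.Bool(false),
	})
	if err != nil {
		log.Fatalf("setting desired capacity: %v", err)
	}
}
```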
@mumoshu, it will likely work with a multi-AZ ASG, but there may be some caveats if Kubernetes is configured to be zone-aware. Pardon the long-winded explanation. Here is the reasoning:
The cluster-autoscaler asks the AWS CloudProvider for a sample Node from the NodeGroup (backed by an ASG), and uses it to make scaling decisions. It assumes that the sample Node is equivalent to all other nodes in the ASG - e.g. same instance type, storage, etc. When it needs to scale up, for example, it will know with certainty that new Nodes will have the same capacity and will be able to accommodate the pending pods.
The cluster-autoscaler has logic that simulates the Scheduler's decisions to see if a new Node will be able to accommodate pending workloads. If the Scheduler is zone-aware, it may specifically want to distribute workloads across AZs.
Consider this scenario:
- The Scheduler wants to run some Pods in Zone A because there are already equivalent pods in Zone B and it wants a multi-AZ Pod distribution.
- There are not enough Nodes in Zone A
- In this situation the Pods will be pending, waiting for a Node in Zone A
- The cluster autoscaler sees pending pods and starts looking for a NodeGroup (an ASG) that can accommodate these Pods
- Say we have a NodeGroup whose ASG spans Zones A and B
- This NodeGroup returns a random sampleNode that is in Zone A
- The cluster-autoscaler says "Great, this NodeGroup has an appropriate node, let me scale it up", and it increases the DesiredCapacity by 1
- The ASG launches a Node in Zone B (because it is trying to keep the Zones balanced)
- The new Node appears in Kubernetes and the Scheduler sees that it is in Zone B
- The Scheduler continues to wait for a Node in Zone A in order to schedule the pending Pods
- The new Node did not help accommodate the pending Pods
Eventually the cluster-autoscaler will try to launch another Node because there are still Pods pending. If, by chance, a new Node lands in Zone A, the Pods will get scheduled and things will work, but this would be non-deterministic and would make the process less reliable.
The crux of the matter is the contract between the NodeGroup, which provides a sampleNode, and the scaling logic, which expects that a sampleNode is equivalent to all other nodes in that NodeGroup.
This is theoretical and I haven't tested these scenarios, but from my understanding of the Node sampling logic, something like this would happen.
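To make that contract concrete, here is a small, self-contained sketch of the failure mode described above; the type and method names are simplified stand-ins for illustration, not the actual cluster-autoscaler API:

```go
package main

import (
	"fmt"
	"math/rand"
)

// Simplified stand-ins for illustration; not the real cluster-autoscaler types.

type Node struct {
	Zone string
}

// NodeGroup models a group backed by one ASG: the scaler asks for a sample
// Node and assumes every Node the group adds will be equivalent to it.
type NodeGroup interface {
	SampleNode() Node
	IncreaseSize(delta int) Node // returns the node the ASG actually launched
}

// multiAZGroup simulates an ASG spanning two zones. Which zone a new
// instance lands in is decided out of band by AWS AutoScaling's balancing,
// not by the cluster autoscaler.
type multiAZGroup struct {
	zones []string
}

func (g multiAZGroup) SampleNode() Node {
	return Node{Zone: g.zones[rand.Intn(len(g.zones))]}
}

func (g multiAZGroup) IncreaseSize(delta int) Node {
	return Node{Zone: g.zones[rand.Intn(len(g.zones))]}
}

func main() {
	var group NodeGroup = multiAZGroup{zones: []string{"us-east-1a", "us-east-1b"}}
	pendingPodZone := "us-east-1a" // the Scheduler wants capacity here

	sample := group.SampleNode()
	if sample.Zone == pendingPodZone {
		// The scaler assumes the new Node will look like the sample...
		launched := group.IncreaseSize(1)
		fmt.Printf("wanted a node in %s, sample was in %s, ASG launched one in %s\n",
			pendingPodZone, sample.Zone, launched.Zone)
		// ...but with a multi-AZ ASG the launched zone may differ, so the
		// pending Pods can stay pending and scale-up becomes non-deterministic.
	}
}
```

Run it a few times: the launched zone sometimes differs from the sample's zone even though the scale-up decision was based on the sample, which is exactly the non-determinism described above.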
On that note, maybe we should say in the README that if one wants the Scheduler to be Zone-aware and distribute workloads evenly across Zones, then the 1-zone-per-ASG rule must be followed. Otherwise, it doesn't matter.
Thoughts?
- At the time of writing this, the cluster autoscaler is unaware of availability zones; the availability zone of the instance should be configured by the autoscaling group. Although autoscaling groups can contain instances in multiple availability zones, the cluster autoscaler will not evenly distribute pods across zones. Use a multi-AZ ASG at your own risk.
??
@osxi @pbitty @andrewsykim Your explanation really helped me understand what's under the hood. Thanks!
So, to be clear,
- As long as we keep a single AZ per ASG, cluster autoscaling works reliably. That's because the cluster autoscaler isn't AZ-aware. Multiple AZs per ASG can result in a node being added to an unneeded AZ, because the cluster autoscaler doesn't know, and is unable to control, which AZ a node gets added to.
- If you are O.K. with the non-deterministic behavior @pbitty described, you can assign multiple AZs to an ASG, but it isn't necessary because, as @pbitty pointed out in AWS Cluster Autoscaler README #1552 (comment), keeping 1-zone-per-ASG fixes the reliability gotcha without any unwanted side effect
?
Then, IMHO, something like the below makes sense to me:
The autoscaling group should span exactly 1 availability zone for the cluster autoscaler to work. If you want to distribute workloads evenly across zones, set up multiple ASGs, each in a distinct availability zone.
At the time of writing this, the cluster autoscaler is unaware of availability zones. Although autoscaling groups can contain instances in multiple availability zones when configured to do so, the cluster autoscaler can't reliably add nodes to the desired zone in that case. That's because AWS AutoScaling determines which zone to add a node to completely out of band from the cluster autoscaler. For more information, see #1552 (comment)
looks good to me, I'll make the changes
Is the current state of the doc acceptable?
]
}
```
Unfortunately, AWS does not support ARNs for autoscaling groups yet, so you must use "*" as the resource.
We could probably add this link: http://docs.aws.amazon.com/autoscaling/latest/userguide/IAM.html#UsingWithAutoScaling_Actions
LGTM after adding a link to the autoscaling permissions docs!
There is one: gcr.io/google_containers/cluster-autoscaler:v0.3.0-beta2, but it is a beta image. I will update the doc once we get the final release.
lgtm
Automatic merge from submit-queue AWS Cluster Autoscaler README under kubernetes-retired/contrib#1311
kubernetes-retired/contrib#1552 (comment) seems to explain the reasoning behind multiple ASGs much better than the previous link target.