
kube-aws: Kubernetes' Cluster Autoscaler on AWS requires separate ASG for each AZ #668

Closed
mumoshu opened this issue Sep 14, 2016 · 2 comments

Comments

@mumoshu
Contributor

mumoshu commented Sep 14, 2016

As long as kube-aws relies on a single ASG associated with multiple AZs to achieve H/A for k8s workers, the Cluster Autoscaler won't work reliably.

Please take a look at the Note in the cluster-autoscaler documentation and the related discussion in a PR.

How should we tackle this? By fixing the doc to say e.g. "kube-aws doesn't support k8s' cluster autoscaler when you've provided multiple AZs in cluster.yaml", or by creating an ASG for each AZ?

@colhom
Contributor

colhom commented Sep 14, 2016

@mumoshu we should probably look to rectify the situation. I see two ways:

  1. Modify kube-aws to deploy an ASG per zone.
  2. Teach the cluster-autoscaler to deal with multi-zone ASGs. I personally don't see any problem with the autoscaler not having control over which zone an instance pops up in... I just want it to keep compute resources available across all configured zones.

I think I'm favoring the second option at this point.

\cc @cgag

@mumoshu
Contributor Author

mumoshu commented Sep 14, 2016

@colhom If it were possible, I'd also like to go with the latter option. Though, AFAIK, there's no straightforward way for us to tell an ASG which zone to create the "next instance" in when the k8s cluster autoscaler needs one there.

For example, let's assume that we have

  • an ASG with ap-northeast-1a and ap-northeast-1b assigned
  • 2 instances in 1a and 1 instance in 1b in that ASG
  • a replica set X currently running 1 pod in 1a and 2 pods in 1b
  • a bunch of other pods consuming all the remaining resources

When the k8s cluster autoscaler notices that a pod in replica set X that should be scheduled in 1a is pending due to insufficient resources, it should add a new EC2 instance in 1a to balance the number of pods between AZs, even though the ASG already has more instances in 1a than in 1b.

On the other hand, all we can do via the AWS API to scale up the number of instances in an ASG is to "increase desired capacity", which automatically balances the number of instances between AZs, not pods. So, in this case, we have to increase the desired capacity by 2 to have sufficient resources in 1a, possibly resulting in the newly added instance in 1b sitting at very low utilization. I believe this is also explained by @pbitty in kubernetes-retired/contrib#1552 (comment) from a slightly different perspective.
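
For illustration, this is roughly all the AWS API gives us for a scale-up via aws-sdk-go (the group name is made up; note there's no parameter to say which AZ the new instances should land in):

```go
// Minimal sketch of the only scale-up knob a multi-AZ ASG exposes:
// raising the group-wide desired capacity. The group name is hypothetical.
// Which AZ the new instances land in is decided by the ASG's own
// AZ-rebalancing, not by the caller.
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/autoscaling"
)

func main() {
	svc := autoscaling.New(session.Must(session.NewSession()))

	// Scale the worker ASG from 3 to 5 instances. We cannot choose whether
	// the two new instances go to ap-northeast-1a or 1b.
	_, err := svc.SetDesiredCapacity(&autoscaling.SetDesiredCapacityInput{
		AutoScalingGroupName: aws.String("kube-aws-workers"), // hypothetical name
		DesiredCapacity:      aws.Int64(5),
		HonorCooldown:        aws.Bool(false),
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("desired capacity raised; AZ placement is up to the ASG")
}
```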

Maybe fixing the cluster autoscaler to work around the ASG limitation by temporarily restricting the ASG to 1a before increasing the desired capacity would work? It seems a bit hacky to me, although I'm O.K. with it if it works.
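
A rough sketch of that work-around, assuming the group is defined by subnets (group name and subnet IDs are made up):

```go
// Hedged sketch of the work-around above: temporarily restrict the ASG to the
// 1a subnet, raise the desired capacity by one, then restore both subnets.
// All identifiers are hypothetical; error handling is minimal.
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/autoscaling"
)

func main() {
	svc := autoscaling.New(session.Must(session.NewSession()))
	asg := aws.String("kube-aws-workers") // hypothetical
	subnet1a := "subnet-aaaaaaaa"         // hypothetical, in ap-northeast-1a
	subnet1b := "subnet-bbbbbbbb"         // hypothetical, in ap-northeast-1b

	// 1. Pin the group to the 1a subnet only.
	if _, err := svc.UpdateAutoScalingGroup(&autoscaling.UpdateAutoScalingGroupInput{
		AutoScalingGroupName: asg,
		VPCZoneIdentifier:    aws.String(subnet1a),
	}); err != nil {
		log.Fatal(err)
	}

	// 2. Raise desired capacity by one; the new instance can only go to 1a.
	if _, err := svc.SetDesiredCapacity(&autoscaling.SetDesiredCapacityInput{
		AutoScalingGroupName: asg,
		DesiredCapacity:      aws.Int64(4),
	}); err != nil {
		log.Fatal(err)
	}

	// 3. Restore both subnets so the group is multi-AZ again. The ASG's own
	//    AZ rebalancing may later shuffle instances, which is part of why
	//    this feels hacky.
	if _, err := svc.UpdateAutoScalingGroup(&autoscaling.UpdateAutoScalingGroupInput{
		AutoScalingGroupName: asg,
		VPCZoneIdentifier:    aws.String(subnet1a + "," + subnet1b),
	}); err != nil {
		log.Fatal(err)
	}
}
```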

As I understand it, having a distinct ASG for each AZ solves this without that work-around.
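
With one ASG per AZ (made-up names below), the scale-up becomes a single call against the group that backs the zone with the pending pod, something like:

```go
// Sketch of a targeted scale-up once there is one ASG per AZ.
// The zone-to-group mapping and group names are hypothetical.
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/autoscaling"
)

// hypothetical zone -> ASG mapping
var asgForZone = map[string]string{
	"ap-northeast-1a": "kube-aws-workers-1a",
	"ap-northeast-1b": "kube-aws-workers-1b",
}

// scaleUpZone sets a new desired capacity on the ASG serving the given zone.
func scaleUpZone(svc *autoscaling.AutoScaling, zone string, newCapacity int64) error {
	_, err := svc.SetDesiredCapacity(&autoscaling.SetDesiredCapacityInput{
		AutoScalingGroupName: aws.String(asgForZone[zone]),
		DesiredCapacity:      aws.Int64(newCapacity),
	})
	return err
}

func main() {
	svc := autoscaling.New(session.Must(session.NewSession()))
	// Pending pod in ap-northeast-1a: grow only that zone's group (2 -> 3).
	if err := scaleUpZone(svc, "ap-northeast-1a", 3); err != nil {
		log.Fatal(err)
	}
}
```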
