
Autoscaling group rebalancing #3319

Closed

mikesplain opened this issue Aug 30, 2017 · 9 comments

Comments

@mikesplain
Contributor

We use the AWS cluster autoscaler so we can scale up and down (for instance) when Jenkins needs more resources for builds. A number of times I've seen it launch a boatload of instances, then AWS decides to rebalance the ASG, improperly killing instances mid-build. Seems like we could stop this by allowing users to suspend the AZRebalance process on an ASG.

http://docs.aws.amazon.com/autoscaling/latest/userguide/as-suspend-resume-processes.html

I'm planning to test out some code for this, and wanted to make sure I haven't missed that this is already possible in kops, and to ask what people think the default should be (suspend AZRebalance or not).
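For reference, suspending AZRebalance on an existing ASG is a single AutoScaling API call. A rough boto3 sketch of what any kops support would need to do under the hood (the group name and region below are placeholders, not anything kops produces today):

import boto3

# Suspend only the AZRebalance process; every other scaling process keeps running.
autoscaling = boto3.client("autoscaling", region_name="us-east-1")
autoscaling.suspend_processes(
    AutoScalingGroupName="my-asg",          # placeholder ASG name
    ScalingProcesses=["AZRebalance"],
)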

@chrislovecnm
Contributor

Is this an autoscaler issue or a kops issue?

@dewet22

dewet22 commented Oct 17, 2017

This is an issue for us at the moment; we fire up temporary fleets of spot IGs to do batch processing, so balancing is not a concern for us. Our bids are pretty aggressive, and differences in pricing between AZs will often result in completely unbalanced IGs, which autoscaling unhelpfully and continually tries to rebalance, sometimes quite disruptively when it manages to start a new instance that almost immediately gets outbid again. So disabling AZRebalance would be useful for our use case.

@chrislovecnm
Contributor

So a value on the IG API to do this? Possibly a value set when we launch the ASG?

@dewet22

dewet22 commented Oct 17, 2017

@chrislovecnm Indeed, this is a function of the autoscaling group itself. The programmatic equivalent is

aws autoscaling suspend-processes --auto-scaling-group-name my-asg --scaling-processes AZRebalance

See https://docs.aws.amazon.com/autoscaling/latest/userguide/as-suspend-resume-processes.html for more.

Terraform also picks up on these changes, and undoes them every time we apply by hand:

  ~ aws_autoscaling_group.spot-m4-16xl-xxxx
      suspended_processes.#:         "1" => "0"
      suspended_processes.1234567: "AZRebalance" => ""
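
If it helps anyone, a quick way to check whether the suspension survived a terraform apply or kops update is to read SuspendedProcesses back from the group (boto3 sketch; the ASG name is a placeholder):

import boto3

autoscaling = boto3.client("autoscaling")

# List which scaling processes are currently suspended on the group.
resp = autoscaling.describe_auto_scaling_groups(AutoScalingGroupNames=["my-asg"])
for group in resp["AutoScalingGroups"]:
    suspended = [p["ProcessName"] for p in group["SuspendedProcesses"]]
    print(group["AutoScalingGroupName"], suspended)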

@ffjia

ffjia commented Nov 16, 2017

@mikesplain Are you using Spot or On-Demand instances?

@dewet22 What happens if you disable AZRebalance when, say, your bid is below the spot price in one AZ?

@dewet22

dewet22 commented Nov 16, 2017

@ffjia We are happy with a completely unbalanced deployment. What usually happens is that one of the two zones in the region has its spot price spike, so ideally only the instances in that zone have to migrate and the cluster then stabilises. However, AZRebalance continually tries to restart instances in the more expensive AZ, and that causes all the instability, since those instances are very likely to get outbid again right after starting.

@chrislovecnm
Contributor

@mikesplain I am wondering if this would be an API option?

@ghost

ghost commented Dec 20, 2017

Note that the cluster autoscaler has the flag --balance-similar-node-groups, which should help a bit with this (that way the autoscaler will try to keep the zones balanced when scaling down); see https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#im-running-cluster-with-nodes-in-multiple-zones-for-ha-purposes-is-that-supported-by-cluster-autoscaler

@mikesplain
Contributor Author

@chrislovecnm do you mean as mentioned in my PR? #3829 (comment)

@ffjia We use both, in separate IGs.
