Extend current worker_group ASG creation behavior (1 AZ per ASG) #346
Comments
Interesting issue!
I would say almost everyone will be running the cluster-autoscaler.
This is the first I've heard of this, and I have always run the cluster-autoscaler with multi-AZ ASGs. But I've never seen rebalancing happen or an instance terminated by the ASG (unless an EC2 health check fails).
Interesting idea! I'm keen to get other opinions here as I've never seen or heard of this issue before. Also, couldn't this problem simply be avoided by passing |
This is a good point. I guess yes, that would prevent the issues with rebalancing. Like I said, it's not blocking at the moment, but I still think adding this flexibility could be interesting for users of the module. |
I would say this problem isn't avoided just by passing that. If you have AZ-aware workloads, CA can request an ASG to schedule an instance, but the ASG may schedule it into a zone that does not meet the requirements of the pod. The issue has been explained/addressed by people who are much more intimate with CA and the use cases: kubernetes-retired/contrib#1552 (comment) |
I think this is helped in 1.12: https://kubernetes.io/blog/2018/10/11/topology-aware-volume-provisioning-in-kubernetes/ A common example is pods that have an EBS volume. But I don't think it's related to this issue? Whether we have an ASG per AZ or a single ASG, I don't think CA or pre-1.12 k8s can help with this. |
@mascah is correct: use of the cluster-autoscaler with Auto Scaling Groups which span more than one Availability Zone is unsupported. A separate ASG should be created per zone. As noted, this is primarily because EBS volumes are scoped to a single zone.
When there is an ASG per AZ, cluster-autoscaler can increase the desired capacity in the group located in the appropriate zone. I believe it goes something like this:
The 1.12 Topology-Aware Volume Provisioning comes at this from the other direction, which primarily helps when the cluster-autoscaler is not in use.
The default remains as before:
|
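To make the one-ASG-per-zone layout concrete, a minimal sketch might look like the following; the subnet variables, group names, and sizes are assumptions, not taken from this thread:

```hcl
# One worker group, and therefore one ASG, per Availability Zone, so that
# cluster-autoscaler can grow capacity in the specific zone a pod needs.
worker_groups = [
  {
    name                 = "workers-az-a"
    instance_type        = "m5.large"
    asg_desired_capacity = 1
    subnets              = [var.private_subnet_az_a] # assumed: one subnet in AZ a
  },
  {
    name                 = "workers-az-b"
    instance_type        = "m5.large"
    asg_desired_capacity = 1
    subnets              = [var.private_subnet_az_b] # assumed: one subnet in AZ b
  }
]
```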
Unless you suspend rebalancing on the ASG. We have a large PVC for prometheus in one cluster, so we defined a second worker group like this, with a label for scheduling in k8s:
|
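A sketch of what such a single-AZ worker group with a node label might look like; the subnet variable, instance size, and label are assumptions, and `kubelet_extra_args` is used here on the assumption that your module version supports it in worker group maps:

```hcl
worker_groups = [
  {
    # Hypothetical single-AZ group for the Prometheus volume: pinning to one
    # subnet keeps the nodes in the same AZ as the EBS-backed PVC.
    name                 = "prometheus"
    instance_type        = "r5.large"
    asg_desired_capacity = 1
    asg_max_size         = 1
    subnets              = [var.prometheus_subnet_id]          # assumed variable
    kubelet_extra_args   = "--node-labels=workload=prometheus" # label targeted by a nodeSelector
  }
]
```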
Maybe do the reverse so as not to break compatibility and see if there's a clean way of doing it in TF? All the worker group related code is quite complicated at the moment 😬 |
That's a reasonable workaround for the time being. I'm also optimistic that CA will support multi-AZ ASGs before we need to dump a bunch more worker-group-related code here to be able to handle it 😄 |
It is required to have a separate ASG for each availability zone configured in an EKS cluster. If you like, try to kill a pod or increase your workloads to see whether the cluster autoscales. |
This is still needed: when autoscaling resources pinned to AZ-A, the autoscaler has a chance of spawning the additional capacity in another AZ, because it cannot scale based on zone when multiple AZs are in a single ASG. |
To avoid code duplication in `worker_groups`, you can do something like this:

```hcl
worker_groups = [
  for subnet in data.aws_subnet_ids.private.ids :
  {
    subnets              = [subnet],
    asg_desired_capacity = 3,
    instance_type        = "m5.large",
    ami_id               = data.aws_ami.eks_worker.id
  }
]
```

It is also useful in case you don't know how many subnets you're going to have initially. |
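For completeness, the snippet above assumes data sources roughly like the following are defined elsewhere in the configuration; the VPC variable, tag filter, and AMI name pattern are assumptions:

```hcl
data "aws_subnet_ids" "private" {
  vpc_id = var.vpc_id # assumed variable

  tags = {
    Tier = "private" # assumed tagging scheme for private subnets
  }
}

data "aws_ami" "eks_worker" {
  most_recent = true
  owners      = ["602401143452"] # Amazon's EKS-optimized AMI account (most regions)

  filter {
    name   = "name"
    values = ["amazon-eks-node-*"]
  }
}
```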
Nice 😃 |
Good to see this thread. Any plan to natively integrate that into the module without any need for "for loops"? |
Do we need to? That solution by @mgar is very elegant? |
I'd also need this, but I'll (hopefully) try @mgar's solution over the next few days and report back. 👍 |
Quick update: @mgar's solution seems to be working well for us 🤞 |
Great. I think we can close this issue then, right? |
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further. |
I have issues
I'm submitting a...
What is the current behavior?
Currently, `worker_groups` will take a list of `subnets` and will create a single ASG spreading across all the requested subnets (availability zones). While this is a useful behavior in some conditions, it's incompatible with the use of `cluster-autoscaler`, which does not support Auto Scaling Groups that span multiple Availability Zones. Assuming that a fair number of people will use Kubernetes alongside the cluster-autoscaler, would it be possible to consider extending this feature?
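In other words, a configuration along these lines (values are illustrative only) produces one ASG whose instances may land in any of the listed subnets:

```hcl
worker_groups = [
  {
    # A single group: the module creates one ASG attached to every subnet below,
    # so instances can be launched in any of those Availability Zones.
    instance_type        = "m5.large"
    asg_desired_capacity = 3
    subnets              = var.private_subnets # e.g. one subnet per AZ
  }
]
```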
What is the desired behavior?
I'm suggesting switching the current behavior under a flag like:
And implement the new default behavior to create as many ASGs as the number of declared subnets.
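As a purely hypothetical sketch of what such a toggle could look like (this variable does not exist in the module):

```hcl
# Hypothetical input, not an existing module variable: when false (the
# suggested new default), create one ASG per declared subnet; when true,
# keep the current behavior of a single ASG spanning all subnets.
variable "worker_group_single_asg" {
  description = "Keep the legacy behavior of one ASG spanning all worker subnets"
  type        = bool
  default     = false
}
```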
Environment details
Notes:
It is not currently impossible to achieve this with the existing code, but it leads to a lot of code duplication to make it happen.
Example: