
Capacity Rebalance support #1124

Closed
2 of 4 tasks
zeyaddeeb opened this issue Nov 27, 2020 · 9 comments · Fixed by #1326

Comments

@zeyaddeeb

I have issues

I was looking to add capacity rebalance to some worker launch templates and ran into issues:

Example:

resource "aws_autoscaling_group" "example" {
  capacity_rebalance = true
}

as per this PR: hashicorp/terraform-provider-aws#16127

I'm submitting a...

  • bug report
  • feature request
  • support request - read the FAQ first!
  • kudos, thank you, warm fuzzy

What is the current behavior?

Adding capacity_rebalance = true to a worker group launch template does not work, since the module has no corresponding lookup value.

If this is a bug, how to reproduce? Please include a code sample if relevant.

What's the expected behavior?

Are you able to fix this problem and submit a PR? Link here if you have already.

Environment details

  • Affected module version: latest
  • OS: ubuntu18.04
  • Terraform version: 0.13

Any other relevant info

hashicorp/terraform-provider-aws#16127

@AdamTylerLynch

AdamTylerLynch commented Nov 30, 2020

@zeyaddeeb you mention a Launch Template, but your example is showing creating an Auto Scaling group. Can you please share your code so we can help troubleshoot?

@AdamTylerLynch

Capacity Rebalance is not a property of a launch template, though; it is a property of the Auto Scaling group. You can create an Auto Scaling group and associate it with a launch template (or multiple launch templates in the latest versions of the SDK).
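
For illustration only, a rough sketch of that association using the plain AWS provider resources (the resource names, AMI variable, instance type, and subnet variable below are placeholders, not anything defined by this module):

resource "aws_launch_template" "workers" {
  name_prefix   = "workers-"
  image_id      = var.worker_ami_id # placeholder variable
  instance_type = "m5.large"
}

resource "aws_autoscaling_group" "workers" {
  name                = "workers"
  min_size            = 1
  max_size            = 3
  vpc_zone_identifier = var.subnet_ids # placeholder variable

  # Capacity Rebalance is set on the ASG, not on the launch template.
  capacity_rebalance = true

  launch_template {
    id      = aws_launch_template.workers.id
    version = "$Latest"
  }
}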

@zeyaddeeb
Author

Maybe I'm using the wrong terminology, but it would look something like this:

...

  worker_groups_launch_template = [
    {
      name                                     = "worker1"
      subnets                                  = var.subnet_ids
      asg_min_size                             = 1
      asg_desired_capacity                     = 1
      asg_max_size                             = 1
      autoscaling_enabled                      = true
      capacity_rebalance                       = true   # desired variable
    },

...

@AdamTylerLynch

AdamTylerLynch commented Dec 2, 2020

@zeyaddeeb I misunderstood. Thank you for the clarification.

@AdamTylerLynch

After reviewing: as of this posting, AWS EKS/ECS does not support the Capacity Rebalance feature.

@petekneller

@AdamTylerLynch When you say Capacity Rebalancing is not supported by EKS, can you elaborate please? I don't understand how a feature on the ASG requires direct support in EKS. We use spot instances managed by EKS and were, like the OP, looking to enable Capacity Rebalance. I enabled it manually for an ASG via the console and it seemed to have the expected effect. When I then went looking for the correct option on this TF module I found my way here. Are you able to clarify what would need to happen in EKS before this feature was available in the module?

@davidgp1701

Hi,

I would also be interested in adding this to the module. I'm even willing to submit a PR for it, but as @petekneller commented, I'm not sure I understand why @AdamTylerLynch says it is not compatible with EKS.

So, my idea of why this feature would be interesting is the following. I have deployed both the Cluster Autoscaler (https://docs.aws.amazon.com/eks/latest/userguide/cluster-autoscaler.html) and the AWS Node Termination Handler (https://github.com/aws/aws-node-termination-handler). In my mind the workflow would be like this:

  1. At some point a spot instance in the ASG will get a Rebalance Recommendation notification (https://docs.aws.amazon.com/autoscaling/ec2/userguide/capacity-rebalance.html), meaning the instance is at elevated risk of being reclaimed. This notification arrives much earlier than the spot interruption notice, which gives only 120 seconds.
  2. On arrival of that notification, you can configure the AWS Node Termination Handler to cordon the node (mark it unschedulable) and, since last month (feat: add ability to drain on rebalance aws/aws-node-termination-handler#400), also to drain it. That is what I would like.
  3. At the same time, if your spot instances use the "capacity-optimized" allocation strategy and the ASG has "Capacity Rebalance" enabled, the ASG will automatically launch a replacement instance for the one that is about to be reclaimed.
  4. The new instance is created and automatically registered in EKS, the at-risk node is drained, and its pods migrate to the new one. After the draining process ends, two things can happen: the instance is terminated by AWS, or the Cluster Autoscaler sees that no pod has used it for the configured time limit and removes it from the ASG.

That should optimize the usage of spot instances. Maybe I'm missing something. As far as I can see, this option is supported by the ASG Terraform resource: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/autoscaling_group#capacity_rebalance . A rough sketch of such a configuration is shown below.
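
Not something the module exposes yet, but as a sketch of step 3 above using the plain ASG resource (all names, sizes, the subnet variable, and the launch template reference are placeholders):

resource "aws_autoscaling_group" "spot_workers" {
  name                = "spot-workers"
  min_size            = 1
  max_size            = 5
  desired_capacity    = 2
  vpc_zone_identifier = var.subnet_ids # placeholder variable

  # Proactively replace spot instances that receive a rebalance recommendation.
  capacity_rebalance = true

  mixed_instances_policy {
    instances_distribution {
      on_demand_base_capacity                  = 0
      on_demand_percentage_above_base_capacity = 0
      spot_allocation_strategy                 = "capacity-optimized"
    }

    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.workers.id # placeholder launch template
        version            = "$Latest"
      }
    }
  }
}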

@LAKostis
Contributor

#1326

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 20, 2022