EKS Node Group fails to recreate when using launch template, on minor template update #1152
After updating the userdata again, I looked back and found that the actual trigger of the reload was a change in the node type:
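For illustration only (the key name and instance types here are assumed, not taken from the original plan output), the change was of this shape:

```hcl
node_groups = {
  apps = {
    # Changing the node (instance) type is what forced the replacement.
    instance_types = ["m5.xlarge"] # previously ["m5.large"]
    # ...
  }
}
```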
Hi @jcam, I am facing the same issue with managed node groups. I tried adding what you mentioned, but it's not working; it's still recreating the node group for me.
I'm failing to understand the correct behaviour expected by the original description in this issue (above). In my mind, the Managed Node Group must not be recreated when the Launch Template is updated. The Managed Node Group uses a certain version of a Launch Template. When that Launch Template is updated, a new version of the same Launch Template becomes available (same Launch Template id). As a result, the existing Managed Node Group can be updated to use the new version of the same Launch Template. The crucial point is: the Managed Node Group should NOT be recreated when a Launch Template is updated. Instead, the new version of the same Launch Template is made available to the Managed Node Group, but it is a separate decision to update the Managed Node Group to use this new Launch Template version.
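For illustration, with the plain AWS provider resources the desired in-place behaviour looks roughly like this (a sketch with hypothetical names, not the module's actual code):

```hcl
resource "aws_launch_template" "workers" {
  name_prefix = "eks-workers-"
  user_data   = base64encode(file("${path.module}/userdata.sh"))
}

resource "aws_eks_node_group" "workers" {
  cluster_name  = aws_eks_cluster.this.name
  node_role_arn = aws_iam_role.workers.arn
  subnet_ids    = var.subnet_ids

  scaling_config {
    desired_size = 2
    max_size     = 3
    min_size     = 1
  }

  launch_template {
    id = aws_launch_template.workers.id
    # Pointing at latest_version turns a template update into an in-place
    # node group update (AWS rolls the nodes), not a destroy/recreate.
    version = aws_launch_template.workers.latest_version
  }
}
```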
Btw, check out the following: #1109 (comment). You might be experiencing the same issue, but with different symptoms. TL;DR: avoid
I agree the existing node group should be updated, but that is not what terraform tries to do.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I think #1138 fixes this.
Still hitting this with v15.2.0, which AFAICT includes #1138. In my case, the previous state came from a run with a private copy of the module that included a prerelease of #1138, and the replacement is forced by:
The error is still: Error: error creating EKS Node Group (xxxxxxxxxxxxxxx-1-apps-mng-1-on-cicada): ResourceInUseException: NodeGroup already exists with name xxxxxxxxxxxxxxxx-1-apps-mng-1-on-cicada and cluster name xxxxxxxxxxxxx
HTH
Yes, facing this also on 15.2
Can confirm this is still a problem on 15.2
@Chinikins @olegcoreq @mc-meta does #1372 resolve this issue?
Hello @barryib, I've replicated the cluster state as in my previous comment. In my case, while v15.2.0 and v16.1.0 still fail with the same error, the code from #1372 seems to work fine and MNGs are recreated as expected without errors. A couple of attention points:
HTH
Also confirming that if I try #1372 and bump up the aws provider, this fixes the issue for me.
Thank you for your feedback. Just bumped the required version.
How do we fix the max length issue? Would be nice to truncate the name somewhere...
I think we could drop the name prefixing done in terraform-aws-eks/modules/node_groups/locals.tf (lines 30 to 38 at 9022013).
Node groups are already namespaced under the cluster:
As a workaround:

```hcl
node_groups = {
  foobar = {
    name_prefix = "foobar"
    # ...
  }
}
```
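If truncation is preferred instead, a minimal sketch (local and variable names assumed, not the module's actual code) that caps a generated name at the 63-character EKS node group name limit:

```hcl
locals {
  # Hypothetical: clamp the generated node group name to EKS's 63-char limit.
  node_group_name = substr(
    join("-", [var.cluster_name, "foobar", random_pet.node_groups["foobar"].id]),
    0,
    63
  )
}
```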
Bumped into this issue using this provider: terraform { … } I don't think updating the provider would resolve this issue.
@barryib
Which version of the module are you using?
@daroga0002 It seems to be 17.1.0.
Please update to the latest release, as there were multiple improvements in that area, and then share the error you are getting.
I, too, am having the exact same problem with eks module version 17.23.0, the latest release available, and the latest aws provider, v3.64.2. My case is (perhaps) a little different - I'm adding key_name and source_security_group_ids to my node_groups, as sketched below - but it's the same situation: it forces terraform to delete/recreate instead of updating in place.
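For reference, the change was along these lines (values are placeholders):

```hcl
node_groups = {
  apps = {
    # Both of these feed the node group's remote_access settings; adding
    # them after creation makes Terraform plan a delete/recreate.
    key_name                  = "my-ssh-key"
    source_security_group_ids = ["sg-0123456789abcdef0"]
    # ...
  }
}
```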
In the end I had to create new nodegroups, drain the old nodegroups, delete them via eksctl, and remove them from my .tf files. The next terraform apply deleted and recreated the new nodegroups, but finally accepted them as "terraform managed", and subsequent terraform plans were clean. No amount of terraform state importing, moving, or editing was able to get me around terraform wanting to delete all my nodegroups and recreate them. I finally let it go, even though there was a 5-minute outage while there were 0 worker nodes, because we could take the outage and it was faster than digging through this issue further.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
I have issues (while true, that is separate from this bug report ;) )
I'm submitting a...
What is the current behavior?
After creating an EKS node group with a launch template, updating the template causes a "node group already exists" (ResourceInUseException) failure.
If this is a bug, how to reproduce? Please include a code sample if relevant.
Create an EKS node group with a custom launch template, using custom userdata
Update the user data being consumed by the launch template.
On apply, terraform will instruct AWS to update the launch template in place, creating a new version of the template.
Terraform will also instruct AWS to create a new node_group using the updated launch template version, but will fail with: Error: error creating EKS Node Group (...): ResourceInUseException: NodeGroup already exists with name ... and cluster name ... A repro config is sketched below.
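A minimal sketch against the module's node_groups input (key names and userdata path are assumptions, not verbatim from my config):

```hcl
resource "aws_launch_template" "workers" {
  name_prefix = "eks-workers-"
  # Any edit to this user data creates a new launch template version.
  user_data = base64encode(file("${path.module}/userdata.sh"))
}

module "eks" {
  source = "terraform-aws-modules/eks/aws"
  # ...cluster settings elided...

  node_groups = {
    apps = {
      launch_template_id      = aws_launch_template.workers.id
      launch_template_version = aws_launch_template.workers.latest_version
    }
  }
}
```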
This is caused by the configuration of the random_pet for the node_group, which currently does not have the launch template version as one of its keepers.
What's the expected behavior?
The random_pet name should be updated; then a new node group will be created, EKS will migrate all services to the new node group, and the old group will be shut down.
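For context, the node group resource in the module uses create_before_destroy, which is why the generated name has to change before the replacement can be created (sketched, not the module's exact code):

```hcl
resource "aws_eks_node_group" "workers" {
  # ...arguments elided...

  lifecycle {
    # The replacement group is created before the old one is destroyed,
    # so if the name does not change, AWS rejects the create with
    # ResourceInUseException: NodeGroup already exists.
    create_before_destroy = true
  }
}
```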
Are you able to fix this problem and submit a PR? Link here if you have already.
In this file: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/modules/node_groups/random.tf#L20
Add the following as line 21:
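The idea is to add the launch template version to the random_pet keepers so the generated name rotates whenever the template changes; a hedged sketch (keeper and lookup key names assumed):

```hcl
resource "random_pet" "node_groups" {
  for_each = local.node_groups_expanded

  keepers = {
    # ...existing keepers...
    # Proposed addition: tie the name to the launch template version.
    launch_template_version = lookup(each.value, "launch_template_version", "")
  }
}
```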
Environment details
Any other relevant info