Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Drop random pets from Managed Node Groups #1372

Merged
merged 4 commits into from
May 27, 2021

Conversation

barryib
Copy link
Member

@barryib barryib commented May 19, 2021

PR o'clock

Description

BREAKING CHANGES: We now decided to remove random_pet resources in Managed Node Groups (MNG). Those were used to recreate MNG if something changed and also simulate the new argument node_group_name_prefix. But they were causing a lot of issues. To upgrade the module without recreating your MNG, you will need to explicitly reuse their previous name and set them in your MNG name argument. Please see upgrade docs for more details.

Checklist

@barryib barryib changed the title fix: Create random pets before destroy for Managed Node Groups fix: Create random pets before Managed Node Groups are destroyed May 19, 2021
@barryib barryib self-assigned this May 19, 2021
@barryib barryib changed the title fix: Create random pets before Managed Node Groups are destroyed feat: Drop random pets from Managed Node Groups May 19, 2021
@barryib
Copy link
Member Author

barryib commented May 19, 2021

cc @ArchiFleKs @daroga0002 @stevehipwell could you review this please ?

@stevehipwell
Copy link
Contributor

/lgtm

@ArchiFleKs
Copy link
Contributor

LGTM, I think this will prevent a lot of issues :)

Copy link
Contributor

@daroga0002 daroga0002 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, the only thing which is missing me is some docs on how to upgrade the module to a newer version.

I think a solution can be removal from terraform state current node groups, creating new groups from new module versions, and legacy node groups from AWS console/cli.

Maybe next week I will find some time to find upgrade path as my workload is using module version 13.2.1

@stevehipwell
Copy link
Contributor

@daroga0002 I think this is only a change if you've not been providing name for the groups? To solve this I expect that you just need to set the name key to the dynamic name.

@daroga0002
Copy link
Contributor

@daroga0002 I think this is only a change if you've not been providing name for the groups? To solve this I expect that you just need to set the name key to the dynamic name.

after introducing this change I believe it will force all group to recreate (because node_group_name_prefix will force replacement), so this should be marked as Breaking change.

@stevehipwell
Copy link
Contributor

@daroga0002 does the launch templates being re-created actually matter? I don't think that the ASGs will be re-created as we're using name there.

@daroga0002
Copy link
Contributor

@stevehipwell we are recreating whole aws_eks_node_group as we moving from node_group_name to node_group_name_prefix.

But let me test it just to ensure

@barryib
Copy link
Member Author

barryib commented May 19, 2021

@stevehipwell we are recreating whole aws_eks_node_group as we moving from node_group_name to node_group_name_prefix.

But let me test it just to ensure

Yes, I think the node group will be recreated. Should we let users to define a specific name like before ?

Something like this:

  node_group_name_prefix = lookup(each.value, "name", null) == null ? lookup(each.value, "name_prefix", join("-", [var.cluster_name, each.key])) : null
  node_group_name        = lookup(each.value, "name", null)

@stevehipwell
Copy link
Contributor

stevehipwell commented May 19, 2021

I missed the move to prefix over name, how come we're doing this?

Just saw your reply @barryib.

@stevehipwell
Copy link
Contributor

@barryib I'd like to see the ability to specify a name directly.

@daroga0002
Copy link
Contributor

But let me test it just to ensure

Just giving here test result:

Terraform will perform the following actions:

  # module.eks.module.node_groups.aws_eks_node_group.workers["example"] must be replaced
+/- resource "aws_eks_node_group" "workers" {
      ~ arn                    = "arn:aws:eks:us-west-2:*******:nodegroup/test-eks-u8D6IkXR/test-eks-u8D6IkXR-example-grand-kitten/12bcc295-5e10-ba79-5b8a-4d1b61bae030" -> (known after apply)
      ~ id                     = "test-eks-u8D6IkXR:test-eks-u8D6IkXR-example-grand-kitten" -> (known after apply)
      ~ node_group_name        = "test-eks-u8D6IkXR-example-grand-kitten" -> (known after apply)
      + node_group_name_prefix = "test-eks-u8D6IkXR-example-" # forces replacement
      ~ release_version        = "1.17.12-20210512" -> (known after apply)
      ~ resources              = [
          - {
              - autoscaling_groups              = [
                  - {
                      - name = "eks-12bcc295-5e10-ba79-5b8a-4d1b61bae030"
                    },
                ]
              - remote_access_security_group_id = ""
            },
        ] -> (known after apply)
      ~ status                 = "ACTIVE" -> (known after apply)
        tags                   = {
            "Environment" = "test"
            "ExtraTag"    = "example"
            "GithubOrg"   = "terraform-aws-modules"
            "GithubRepo"  = "terraform-aws-eks"
        }
      ~ version                = "1.17" -> (known after apply)
        # (9 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }

  # module.eks.module.node_groups.random_pet.node_groups["example"] will be destroyed
  - resource "random_pet" "node_groups" {
      - id        = "grand-kitten" -> null
      - keepers   = {
          - "ami_type"                  = "AL2_x86_64"
          - "capacity_type"             = "SPOT"
          - "disk_size"                 = "50"
          - "iam_role_arn"              = "arn:aws:iam::*******:role/test-eks-u8D6IkXR20210519143621640500000009"
          - "instance_types"            = "m5.large"
          - "key_name"                  = ""
          - "node_group_name"           = "test-eks-u8D6IkXR-example"
          - "source_security_group_ids" = ""
          - "subnet_ids"                = "subnet-0f4e1d832b0b777b8|subnet-024a5e4004c12cf31|subnet-07223e6126567f61c"
        } -> null
      - length    = 2 -> null
      - separator = "-" -> null
    }

Plan: 1 to add, 0 to change, 2 to destroy.

@stevehipwell
Copy link
Contributor

@daroga0002 I assume that's without the suggested changes in #1372 (comment)?

@daroga0002
Copy link
Contributor

yes, this is original PR, those changes proposed by @barryib will solve this issue and give additional possibility to switch to prefixed names (which from my side is desired feature)

@jack1902
Copy link

is there a way to extend this so that a launch_template is always created? since #1348 resulted in two node groups not being able to communicate with one another because one used a launch-template defined as part of the module and one didn't (i know EKS created a copy under the hood)

@barryib
Copy link
Member Author

barryib commented May 19, 2021

@daroga0002 @ArchiFleKs @stevehipwell just updated my PR according to your reviews.

is there a way to extend this so that a launch_template is always created? since #1348 resulted in two node groups not being able to communicate with one another because one used a launch-template defined as part of the module and one didn't (i know EKS created a copy under the hood)

@jack1902 good point. We can't always create LT, because some users don't need it. But maybe we should create the LT if LT is needed and stop supporting external LT ?

@jack1902
Copy link

@barryib maybe, (the worker_sg piece really caught me off guard, since by default, the worker_sg is not attached to nodes created unless a launch template is defined)

Might be worth updating the description on the Security Group to reflect this caveat OR like you say, allow people to extend a pre-defined launch template? [Happy to move this to thread since the change in this PR is a decent one, to swap nodes out without having to go through the process of creating a new managed node group through the petname piece]

@stevehipwell
Copy link
Contributor

/lgtm

@barryib
Copy link
Member Author

barryib commented May 19, 2021

@ArchiFleKs
Copy link
Contributor

@barryib maybe, (the worker_sg piece really caught me off guard, since by default, the worker_sg is not attached to nodes created unless a launch template is defined)

Might be worth updating the description on the Security Group to reflect this caveat OR like you say, allow people to extend a pre-defined launch template? [Happy to move this to thread since the change in this PR is a decent one, to swap nodes out without having to go through the process of creating a new managed node group through the petname piece]

What about https://github.com/terraform-aws-modules/terraform-aws-eks#input_worker_create_cluster_primary_security_group_rules ? I personally always set it and this allow all the different type to worker to communicate together ?

@fractos
Copy link

fractos commented May 19, 2021

Heh, turned out I was logged in as my test user for the above comment.

@barryib
Copy link
Member Author

barryib commented May 19, 2021

One thing I noted around this issue is that, in the previous version, if you had supplied a name for the node group then the create_before_destroy=true lifecycle rule meant the Terraform apply failed if you did something like changing an EC2 instance type. I just wanted to highlight this as I can see that this is potentially an issue here as well.

I'm don't use MNG, so don't test this intensively. So if changing something like ami_id will re-create the node group, then we should stick with the name_prefix to avoir node group name collision because of create_before_destroy.

Edit:
Just noticed that almost every change in node group will force it's re-creatiom. See https://github.com/hashicorp/terraform-provider-aws/blob/main/aws/resource_aws_eks_node_group.go#L43 :)

@fractos
Copy link

fractos commented May 19, 2021

Edit: Just noticed that almost every change in node group will force it's re-creatiom. See https://github.com/hashicorp/terraform-provider-aws/blob/main/aws/resource_aws_eks_node_group.go#L43 :)

yep, exactly. I've had to vend this module and drop the lifecycle rule in my current project because of this.

@barryib
Copy link
Member Author

barryib commented May 20, 2021

What is the best upgrade path when you're using Managed Node Groups with Terrarform ?

@stevehipwell
Copy link
Contributor

@barryib AMI ID is in the launch template and doesn't force node group re-creation but enough arguments that might be changed regularly do so I'm thinking that name_prefix with create_before_destroy=true would be a better fit. This would also be clearer when names are duplicated which is entirely possible (unlike map keys).

@stevehipwell
Copy link
Contributor

@barryib any progress on this?

@barryib
Copy link
Member Author

barryib commented May 27, 2021

@barryib any progress on this?

@stevehipwell still working on it. Just wanted to test it more.

@barryib barryib force-pushed the node-group-random-pet branch 2 times, most recently from 0514f9d to 3b2c357 Compare May 27, 2021 21:43
@barryib
Copy link
Member Author

barryib commented May 27, 2021

@ArchiFleKs @daroga0002 @stevehipwell a final review please.

@barryib barryib merged commit 6d7d6f6 into terraform-aws-modules:master May 27, 2021
@barryib barryib deleted the node-group-random-pet branch May 27, 2021 23:50
@barryib
Copy link
Member Author

barryib commented May 27, 2021

Thanks everyone for your help on this.

ArchiFleKs pushed a commit to ArchiFleKs/terraform-aws-eks that referenced this pull request Jun 1, 2021
…s#1372)

BREAKING CHANGES: We now decided to remove `random_pet` resources in Managed Node Groups (MNG). Those were used to recreate MNG if something change and also simulate the newly added argument `node_group_name_prefix`. But they were causing a lot of troubles. To upgrade the module without recreating your MNG, you will need to explicitly reuse their previous name and set them in your MNG `name` argument. Please see [upgrade docs](https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/docs/upgrades.md#upgrade-module-to-v1700-for-managed-node-groups) for more details.
@philicious
Copy link
Contributor

philicious commented Jun 16, 2021

for what its worth: I can report TF lets me upgrade from <=14.0.0 to 17.0, w/o having to recreate node-groups !
which is why we got stuck below 14 ... and I didnt yet dare to manually patch TF state around...gladly

so thanks for this change !

steps to reproduce, might help someone coming here:

@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 14, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.