
Update a launch template of a managed node group without recreating that managed node group #1109

Closed
1 task done
pre opened this issue Nov 19, 2020 · 7 comments · Fixed by #1372

Comments

@pre

pre commented Nov 19, 2020

I have issues with updating a launch template of a Managed Node Group

I'm submitting a...

  • bug report

What is the current behavior?

Practically any change to a Launch Template will force recreation of a Managed Node Group.

If this is a bug, how to reproduce? Please include a code sample if relevant.

What's the expected behavior?

  • Changing the Launch Template happens in-place (as it currently does); this works, and the Launch Template gets a new version number.

  • The existing Managed Node Group is updated to the new Launch Template version number:

      launch_template_id      = aws_launch_template.ceph_osd.id
      launch_template_version = aws_launch_template.ceph_osd.default_version

  • The existing Managed Node Group is not replaced with a new resource. Instead, the existing MNG is kept.

  • Something causes ami_type, disk_size and node_group_name to change and force a replacement. I have tried setting them explicitly in node_groups_defaults or in the node_group (see the sketch below), but it has no effect. I have also tried leaving them out entirely - no effect, Terraform still tries to recreate the MNG.
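A minimal sketch of what "setting them explicitly in node_groups_defaults" means (a sketch only; the values just mirror the attributes shown in the Terraform output below):

module "eks" {
  # ... other module arguments unchanged ...

  node_groups_defaults = {
    ami_type  = "AL2_x86_64" # same value as in the plan output below
    disk_size = 0            # same value as in the plan output below
  }
}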

Environment details

  • Affected module version: 13.2.0
  • OS: MacOS
  • Terraform version:
Terraform v0.13.5
+ provider registry.terraform.io/hashicorp/aws v3.15.0
+ provider registry.terraform.io/hashicorp/kubernetes v1.13.3
+ provider registry.terraform.io/hashicorp/local v1.4.0
+ provider registry.terraform.io/hashicorp/null v2.1.2
+ provider registry.terraform.io/hashicorp/random v3.0.0
+ provider registry.terraform.io/hashicorp/template v2.2.0

Any other relevant info

Terraform output

  # module.eks.module.node_groups.aws_eks_node_group.workers["example"] must be replaced
+/- resource "aws_eks_node_group" "workers" {
      ~ ami_type        = "AL2_x86_64" -> (known after apply) # forces replacement
      ~ arn             = "arn:aws:eks:eu-central-1:161325215566:nodegroup/aws-eu-central-1-2/aws-eu-central-1-2-example/62baf003-22ef-a6e1-a4bc-ee8e1eda4d27" -> (known after apply)
        cluster_name    = "aws-eu-central-1-2"
      ~ disk_size       = 0 -> (known after apply) # forces replacement
      ~ id              = "aws-eu-central-1-2:aws-eu-central-1-2-example" -> (known after apply)
        instance_types  = []
      ~ labels          = {} -> (known after apply)
      ~ node_group_name = "aws-eu-central-1-2-example" -> (known after apply) # forces replacement
        node_role_arn   = "arn:aws:iam::161325215566:role/aws-eu-central-1-22020111907234856210000000a"
      ~ release_version = "1.18.9-20201117" -> (known after apply)
      ~ resources       = [
          - {
              - autoscaling_groups              = [
                  - {
                      - name = "eks-62baf003-22ef-a6e1-a4bc-ee8e1eda4d27"
                    },
                ]
              - remote_access_security_group_id = ""
            },
        ] -> (known after apply)
      ~ status          = "ACTIVE" -> (known after apply)
        subnet_ids      = [
            "subnet-0a97ca5e61048b727",
        ]
      ~ tags            = {
          - "CephRole" = "mon"
          - "NodeRole" = "storage"
          - "az"       = "eu-central-1c"
        } -> (known after apply)
      ~ version         = "1.18" -> (known after apply)

      ~ launch_template {
            id      = "lt-0054877e118c06bfc"
          ~ name    = "aws-eu-central-1-2-example" -> (known after apply)
          ~ version = "2" -> (known after apply)
        }

        scaling_config {
            desired_size = 1
            max_size     = 1
            min_size     = 1
        }
    }

  # module.eks.module.node_groups.random_pet.node_groups["example"] must be replaced
+/- resource "random_pet" "node_groups" {
      ~ id        = "capable-serval" -> (known after apply)
      ~ keepers   = {
          - "ami_type"                  = "AL2_x86_64"
          - "disk_size"                 = "0"
          - "iam_role_arn"              = "arn:aws:iam::161325215566:role/aws-eu-central-1-22020111907234856210000000a"
          - "instance_type"             = "t3.large"
          - "key_name"                  = ""
          - "launch_template"           = "lt-0054877e118c06bfc"
          - "node_group_name"           = "aws-eu-central-1-2-example"
          - "source_security_group_ids" = ""
          - "subnet_ids"                = "subnet-0a97ca5e61048b727"
        } -> (known after apply) # forces replacement
        length    = 2
        separator = "-"
    }
Node Group
node_groups = {
  example = {
    name             = "${var.cluster_name}-example"
    subnets          = [module.vpc.private_subnets[2]]
    desired_capacity = 1
    max_capacity     = 1
    min_capacity     = 1

    launch_template_id      = aws_launch_template.example.id
    instance_type           = aws_launch_template.example.instance_type
    launch_template_version = aws_launch_template.example.default_version

    additional_tags = {
      NodeRole = "storage"
      az       = "eu-central-1c"
    }
  }
}
Launch Template
# https://github.com/terraform-aws-modules/terraform-aws-eks/blob/v13.2.0/examples/launch_templates_with_managed_node_groups/launchtemplate.tf

data "template_file" "launch_template_userdata_mon" {
  template = file("${path.module}/templates/userdata.sh.tpl")

  vars = {
    cluster_name        = var.cluster_name
    endpoint            = module.eks.cluster_endpoint
    cluster_auth_base64 = module.eks.cluster_certificate_authority_data
  }
}

resource "aws_launch_template" "example" {
  name                   = "${var.cluster_name}-example"
  description            = "Example Launch-Template"
  update_default_version = true

  user_data = base64encode(
    data.template_file.launch_template_userdata_mon.rendered
  )

  instance_type = "t3.large"

  block_device_mappings {
    device_name = "/dev/xvda"

    ebs {
      volume_size           = 25
      volume_type           = "gp2"
      delete_on_termination = true
    }
  }

  monitoring {
    enabled = true
  }

  network_interfaces {
    associate_public_ip_address = false
    delete_on_termination       = true
    security_groups             = [module.eks.worker_security_group_id]
  }

  tag_specifications {
    resource_type = "instance"

    tags = {
      NodeRole = "storage"
    }
  }

  tag_specifications {
    resource_type = "volume"

    tags = {
      NodeRole = "storage"
    }
  }

  tags = {
    NodeRole = "storage"
  }

  lifecycle {
    create_before_destroy = true
  }
}

@pre
Author

pre commented Nov 19, 2020

I tried removing instance_type from the node_group, since the instance type is already defined by the Launch Template:

  • No changes to the Managed Node Group (as expected)
  • But: module.eks.module.node_groups.random_pet.node_groups["example"] must be replaced

Maybe this is a clue? I don't understand what purpose module.eks.module.node_groups.random_pet.node_groups["example"] serves in the first place, as it contains attributes which belong to the Launch Template. Also, even though I have defined an explicit name, the random_pet is still created - is there a way to avoid the random_pet?

Terraform will perform the following actions:

  # module.eks.module.node_groups.random_pet.node_groups["example"] must be replaced
+/- resource "random_pet" "node_groups" {
      ~ id        = "amazed-seasnail" -> (known after apply)
      ~ keepers   = { # forces replacement
            "ami_type"                  = "AL2_x86_64"
            "disk_size"                 = "0"
            "iam_role_arn"              = "arn:aws:iam::161325215566:role/aws-eu-central-1-22020111907234856210000000a"
          ~ "instance_type"             = "t3.large" -> "m4.large"
            "key_name"                  = ""
            "launch_template"           = "lt-0054877e118c06bfc"
            "node_group_name"           = "aws-eu-central-1-2-example"
            "source_security_group_ids" = ""
            "subnet_ids"                = "subnet-0a97ca5e61048b727"
        }
        length    = 2
        separator = "-"
    }

Plan: 1 to add, 0 to change, 1 to destroy.

@pre pre changed the title How to update a launch template of a managed node group Update a launch template of a managed node group without recreating the managed node group Nov 19, 2020
@pre pre changed the title Update a launch template of a managed node group without recreating the managed node group Update a launch template of a managed node group without recreating that managed node group Nov 19, 2020
@pre
Author

pre commented Jan 20, 2021

I finally took a deeper look into this issue.

Learnings:

  • There is only a single link between a Managed Node Group and its random_pet, and it is node_group_name

  • You may define an explicit node_group name and it will be used.

  • When you use an explicit node_group name, the random_pet resource for that Node Group is still created (module.eks.module.node_groups.random_pet.node_groups["test-workers"]), but it is not used for anything

  • When the random_pet resource is created, changing any attribute in a Launch Template used by a Node Group will result in recreating that Node Group.

=> RECREATING the Node Group is incorrect behaviour.
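For context, the pattern in modules/node_groups/random.tf looks roughly like this. This is a sketch reconstructed from the keepers visible in the plan output above, not the module's exact code; in particular the for_each expression is illustrative:

resource "random_pet" "node_groups" {
  for_each = var.node_groups # illustrative; the module iterates its node group map

  length    = 2
  separator = "-"

  # Any change to one of these keepers forces a new pet name, and because the
  # pet name feeds into node_group_name, the Managed Node Group is replaced too.
  keepers = {
    ami_type        = lookup(each.value, "ami_type", "")
    disk_size       = lookup(each.value, "disk_size", "")
    instance_type   = lookup(each.value, "instance_type", "")
    launch_template = lookup(each.value, "launch_template_id", "")
    # ... plus iam_role_arn, key_name, node_group_name,
    #     source_security_group_ids and subnet_ids, as in the plan output
  }
}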

Changing the Launch Template will create a new version of that Launch Template. However, updating the Managed Node Group to use this new version of the Launch Template is a separate action.

A Managed Node Group can use either of the following (see the sketch below):

  • launch_template_version = 1 # an explicit version number
  • launch_template_version = aws_launch_template.worker_lolcat.default_version # the Node Group always uses the latest version

Launch Template versions are meant to be used as a managed way to distribute updates to Managed Node Groups. If the Managed Node Group is recreated from scratch, rather than updated in-place, it defeats the purpose.
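In the module's node_groups syntax, a minimal sketch of the two options (the worker_lolcat template name is just the placeholder from the list above):

node_groups = {
  test-workers = {
    # ...
    launch_template_id = aws_launch_template.worker_lolcat.id

    # Either pin an explicit version:
    # launch_template_version = 1
    # ...or always follow the template's default (latest) version:
    launch_template_version = aws_launch_template.worker_lolcat.default_version
  }
}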

Workaround - The random_pet resource is harmful

The issue can be worked around as follows:

  • Define an explicit name for your node_group (so random_pet is not used for anything in the node_group):
  node_groups = {
    test-workers = {
      name                    = "${var.cluster_name}-test-workers"
      # ..
    }
  }

TL;DR: With an explicit node_group name, modules/node_groups/random.tf is not used for anything, but its mere existence causes this whole issue.

Avoid random_pet - it is a nice helper for getting started easily, but it will only cause debugging pain in a live environment.

@pinkavaj

pinkavaj commented Feb 28, 2021

This is particularly an issue when Terraform is applied from a Linux machine and then from a Windows machine (or vice versa), because the rendered userdata.sh.tpl differs due to the different newlines ...
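A possible mitigation is to normalize line endings before encoding the userdata. This is only a sketch against the launch template shown earlier in this issue, not something the module does for you:

resource "aws_launch_template" "example" {
  # ...

  user_data = base64encode(
    # Strip carriage returns so the rendered userdata is byte-identical
    # whether userdata.sh.tpl was checked out with LF or CRLF line endings.
    replace(data.template_file.launch_template_userdata_mon.rendered, "\r", "")
  )
}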

@ngocketit

This is indeed a pain for production use. @pre How did you manage to delete random.tf?

@pre
Author

pre commented Apr 5, 2021

This is indeed a pain for production use. @pre How did you manage to delete random.tf?

I forked this module and stopped dreaming about community-maintained Terraform.

@drunkirishcoder

We just ran into this problem as well. Is there a plan to fix this in the module?

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 21, 2022