Cannot upgrade cluster version from 1.16 to 1.17 #1003

Closed
randrusiak opened this issue Sep 8, 2020 · 7 comments

@randrusiak

I have issues

Hi there!

I have a problem with upgrading a cluster to the latest version of EKS. After changing cluster_version to 1.17 and running terraform apply, I got this error:

Error: Cycle: module.eks.random_pet.workers_launch_template[1] (destroy deposed 08cff672), module.eks.random_pet.workers_launch_template[0] (destroy deposed 7972b6ab), module.eks (close)

I'm wondering if this is caused by Terraform 0.13.1.

I've tried to do the upgrade with 0.12.29, but I can't refresh the state with the older version of Terraform.

I'm submitting a...

  • bug report
  • feature request
  • support request - read the FAQ first!
  • kudos, thank you, warm fuzzy

What is the current behavior?

I described it above.

If this is a bug, how to reproduce? Please include a code sample if relevant.

I just changed cluster_version to 1.17 in the following module block:

module "eks" {
  source                                             = "terraform-aws-modules/eks/aws"
  version                                            = "12.2.0"
  cluster_name                                       = var.cluster_name
  cluster_version                                    = "1.16"
  subnets                                            = module.vpc.private_subnets
  enable_irsa                                        = true # Whether to create OpenID Connect Provider for EKS to enable IRSA
  config_output_path                                 = "./"
  write_kubeconfig                                   = true
  cluster_enabled_log_types                          = ["api", "audit", "authenticator", "controllerManager", "scheduler"]
  cluster_log_retention_in_days                      = 30
  cluster_endpoint_private_access                    = true
  cluster_endpoint_public_access_cidrs               = var.cluster_endpoint_public_access_cidrs
  worker_create_security_group                       = true
  worker_create_cluster_primary_security_group_rules = true

  tags = {
    "k8s.io/cluster-autoscaler/${var.cluster_name}" = "owned"
    "k8s.io/cluster-autoscaler/enabled"             = "true"
  }

  vpc_id = module.vpc.vpc_id

  worker_groups_launch_template = [
    {
      name                    = "generic-spot-workers-01"
      public_ip               = false
      enable_monitoring       = false
      root_volume_size        = 10
      override_instance_types = ["m5.large", "m5a.large"]
      spot_price              = "0.115" # maximum price equals the current on-demand price for an m5.large instance
      asg_max_size            = 5
      asg_desired_capacity    = 1
      kubelet_extra_args      = "--node-labels=node.kubernetes.io/lifecycle=spot,NodePurpose=generic"
    },
    {
      name                                     = "on-demand-workers-01"
      public_ip                                = false
      enable_monitoring                        = false
      root_volume_size                         = 10
      override_instance_types                  = ["m5.large", "m5a.large"]
      asg_max_size                             = 1
      asg_desired_capacity                     = 1
      on_demand_percentage_above_base_capacity = 0 # for testing purposes use only spot instances; after testing, set to 100
      kubelet_extra_args                       = "--node-labels=node.kubernetes.io/lifecycle=normal --register-with-taints=Lifecycle=normal:PreferNoSchedule"
    },
  ]

  map_users = [
    for key, value in merge(var.cluster_users, var.existing_cluster_users) :
    {
      userarn  = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:user/${key}"
      username = key
      groups   = value.groups
    }
  ]
}
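
To be explicit, the only change applied before running terraform apply is this single attribute; everything else stays exactly as above:

module "eks" {
  # ... all other arguments unchanged from the block above ...
  cluster_version = "1.17" # previously "1.16"
}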

Here is the plan output:

  # module.eks.aws_eks_cluster.this[0] will be updated in-place
  ~ resource "aws_eks_cluster" "this" {
        arn                       = "arn:aws:eks:eu-central-1:00000:cluster/some-cluster"
        certificate_authority     = [
            {
                data = "xxxx"
            },
        ]
        created_at                = "2020-09-08 05:20:37.78 +0000 UTC"
        enabled_cluster_log_types = [
            "api",
            "audit",
            "authenticator",
            "controllerManager",
            "scheduler",
        ]
        endpoint                  = "https://some-endpoint.gr7.eu-central-1.eks.amazonaws.com"
        id                        = "some-cluster"
        identity                  = [
            {
                oidc = [
                    {
                        issuer = "https://oidc.eks.eu-central-1.amazonaws.com/id/00000000"
                    },
                ]
            },
        ]
        name                      = "some-cluster"
        platform_version          = "eks.3"
        role_arn                  = "arn:aws:iam::00000000:role/some-cluster20200908052034122400000001"
        status                    = "ACTIVE"
        tags                      = {
            "k8s.io/cluster-autoscaler/some-cluster" = "owned"
            "k8s.io/cluster-autoscaler/enabled"        = "true"
        }
      ~ version                   = "1.16" -> "1.17"

        timeouts {
            create = "30m"
            delete = "15m"
        }

        vpc_config {
            cluster_security_group_id = "sg-0402dd9a48f3af3cb"
            endpoint_private_access   = true
            endpoint_public_access    = true
            public_access_cidrs       = [
                "0.0.0.0/0",
            ]
            security_group_ids        = [
                "sg-045bae69b359f1725",
            ]
            subnet_ids                = [
                "subnet-026e8145f409986ee",
                "subnet-0633454877ad17cf6",
                "subnet-064ef1da8021a688d",
            ]
            vpc_id                    = "vpc-0a50477dfcb7a82dc"
        }
    }

  # module.eks.aws_launch_template.workers_launch_template[0] will be updated in-place
  ~ resource "aws_launch_template" "workers_launch_template" {
        arn                     = "arn:aws:ec2:eu-central-1:xxxxx:launch-template/lt-06595302f6ba38e7a"
        default_version         = 1
        disable_api_termination = false
        ebs_optimized           = "true"
        id                      = "lt-06595302f6ba38e7a"
      ~ image_id                = "ami-0b2edbf190fe05b92" -> "ami-047e3ad49b70ed809"
        instance_type           = "m4.large"
      ~ latest_version          = 1 -> (known after apply)
        name                    = "some-cluster-generic-spot-workers-012020090805351684350000000d"
        name_prefix             = "some-cluster-generic-spot-workers-01"
        security_group_names    = []
        tags                    = {
            "k8s.io/cluster-autoscaler/some-cluster" = "owned"
            "k8s.io/cluster-autoscaler/enabled"        = "true"
        }
        user_data               = "xxx="
        vpc_security_group_ids  = []

        block_device_mappings {
            device_name = "/dev/xvda"

            ebs {
                delete_on_termination = "true"
                encrypted             = "false"
                iops                  = 0
                volume_size           = 10
                volume_type           = "gp2"
            }
        }

        credit_specification {
            cpu_credits = "standard"
        }

        iam_instance_profile {
            name = "some-cluster2020090805324091800000000b"
        }

        metadata_options {
            http_endpoint               = "enabled"
            http_put_response_hop_limit = 0
            http_tokens                 = "optional"
        }

        monitoring {
            enabled = false
        }

        network_interfaces {
            associate_public_ip_address = "false"
            delete_on_termination       = "true"
            device_index                = 0
            ipv4_address_count          = 0
            ipv4_addresses              = []
            ipv6_address_count          = 0
            ipv6_addresses              = []
            security_groups             = [
                "sg-0dbbed1cc6fbed9e1",
            ]
        }

        tag_specifications {
            resource_type = "volume"
            tags          = {
                "Name"                                     = "some-cluster-generic-spot-workers-01-eks_asg"
                "k8s.io/cluster-autoscaler/some-cluster" = "owned"
                "k8s.io/cluster-autoscaler/enabled"        = "true"
            }
        }
        tag_specifications {
            resource_type = "instance"
            tags          = {
                "Name"                                     = "some-cluster-generic-spot-workers-01-eks_asg"
                "k8s.io/cluster-autoscaler/some-cluster" = "owned"
                "k8s.io/cluster-autoscaler/enabled"        = "true"
            }
        }
    }

  # module.eks.aws_launch_template.workers_launch_template[1] will be updated in-place
  ~ resource "aws_launch_template" "workers_launch_template" {
        arn                     = "arn:aws:ec2:eu-central-1:xxxxx:launch-template/lt-025fdfba223ae7492"
        default_version         = 1
        disable_api_termination = false
        ebs_optimized           = "true"
        id                      = "lt-025fdfba223ae7492"
      ~ image_id                = "ami-0b2edbf190fe05b92" -> "ami-047e3ad49b70ed809"
        instance_type           = "m4.large"
      ~ latest_version          = 1 -> (known after apply)
        name                    = "some-cluster-on-demand-workers-012020090805351698020000000f"
        name_prefix             = "some-cluster-on-demand-workers-01"
        security_group_names    = []
        tags                    = {
            "k8s.io/cluster-autoscaler/some-cluster" = "owned"
            "k8s.io/cluster-autoscaler/enabled"        = "true"
        }
        user_data               = "xxxx="
        vpc_security_group_ids  = []

        block_device_mappings {
            device_name = "/dev/xvda"

            ebs {
                delete_on_termination = "true"
                encrypted             = "false"
                iops                  = 0
                volume_size           = 10
                volume_type           = "gp2"
            }
        }

        credit_specification {
            cpu_credits = "standard"
        }

        iam_instance_profile {
            name = "some-cluster2020090805324229910000000c"
        }

        metadata_options {
            http_endpoint               = "enabled"
            http_put_response_hop_limit = 0
            http_tokens                 = "optional"
        }

        monitoring {
            enabled = false
        }

        network_interfaces {
            associate_public_ip_address = "false"
            delete_on_termination       = "true"
            device_index                = 0
            ipv4_address_count          = 0
            ipv4_addresses              = []
            ipv6_address_count          = 0
            ipv6_addresses              = []
            security_groups             = [
                "sg-0dbbed1cc6fbed9e1",
            ]
        }

        tag_specifications {
            resource_type = "volume"
            tags          = {
                "Name"                                     = "some-cluster-on-demand-workers-01-eks_asg"
                "k8s.io/cluster-autoscaler/some-cluster" = "owned"
                "k8s.io/cluster-autoscaler/enabled"        = "true"
            }
        }
        tag_specifications {
            resource_type = "instance"
            tags          = {
                "Name"                                     = "some-cluster-on-demand-workers-01-eks_asg"
                "k8s.io/cluster-autoscaler/some-cluster" = "owned"
                "k8s.io/cluster-autoscaler/enabled"        = "true"
            }
        }
    }

  # module.eks.random_pet.workers_launch_template[0] must be replaced
+/- resource "random_pet" "workers_launch_template" {
      ~ id        = "related-bobcat" -> (known after apply)
      ~ keepers   = {
          - "lt_name" = "some-cluster-generic-spot-workers-012020090805351684350000000d-1"
        } -> (known after apply) # forces replacement
        length    = 2
        separator = "-"
    }

  # module.eks.random_pet.workers_launch_template[1] must be replaced
+/- resource "random_pet" "workers_launch_template" {
      ~ id        = "live-buck" -> (known after apply)
      ~ keepers   = {
          - "lt_name" = "some-cluster-on-demand-workers-012020090805351698020000000f-1"
        } -> (known after apply) # forces replacement
        length    = 2
        separator = "-"
    }

What's the expected behavior?

I expect to be able to upgrade the cluster version without any errors.

Are you able to fix this problem and submit a PR? Link here if you have already.

Unfortunately, I'm not.

Environment details

  • Affected module version: 12.2.0
  • OS: Ubuntu 20.04
  • Terraform version: 0.13.1

Any other relevant info

@dpiddockcmp
Contributor

Terraform 0.13 is giving us lots of problems. This is a slightly different error from the one we've seen before when changes to the launch template were required: #939

I'm not sure how module.eks ends up depending on the random_pets.

@randrusiak
Author

So should I revert to the older version of Terraform? In general, using an old version is not a problem for me, but I don't know how to roll the state back to the older version. Do you know how to do that?

@randrusiak
Author

randrusiak commented Sep 11, 2020

For everyone who has a similar problem: don't try downgrading the remote state according to the documentation (https://support.hashicorp.com/hc/en-us/articles/360001147287-Downgrading-Terraform); in my case it didn't work.
If you have versioning enabled on your remote backend, just restore an earlier (pre-upgrade) version of the state file instead; it will save you time :)
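
For example, assuming an S3 backend with versioning enabled (bucket and key names below are placeholders), an earlier state version can be restored with the AWS CLI:

# List the stored versions of the state object
aws s3api list-object-versions \
  --bucket my-terraform-state-bucket \
  --prefix path/to/terraform.tfstate

# Copy a known-good pre-upgrade version back over the current object
aws s3api copy-object \
  --bucket my-terraform-state-bucket \
  --key path/to/terraform.tfstate \
  --copy-source "my-terraform-state-bucket/path/to/terraform.tfstate?versionId=PRE_UPGRADE_VERSION_ID"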

@dpiddockcmp
I have already upgraded EKS from 1.16 to 1.17 with Terraform 0.12.29 without any issues, so I think it's currently better to stay with the older version of Terraform. Maybe you should add a warning about that to the README?

@barryib
Member

barryib commented Nov 11, 2020

@randrusiak Can you check whether you're still experiencing this issue with the latest version of this module and TF >= 0.13.4? We fixed some cycle issues for the random pets.

FWIW, destroying the random pets shouldn't destroy anything else (by default). They were added to force ASG recreation when the LT/LC changes, but only when asg_recreate_on_change is set to true.
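
If you do want the ASGs recreated when a launch template changes, a minimal sketch of that setting (assuming asg_recreate_on_change is a per-worker-group option, like the other asg_* settings in this module) would look like:

worker_groups_launch_template = [
  {
    name                   = "generic-spot-workers-01"
    asg_recreate_on_change = true # recreate the ASG whenever the launch template changes
    # ... other settings as before ...
  },
]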

@randrusiak
Author

@barryib I will try to test the latest version of the module by the end of the week and I'll let you know.

@randrusiak
Author

@barryib I've checked the upgrade process with the latest versions and the issue hasn't occurred.
I've also tried the same with the older versions I originally reported, and I couldn't reproduce the bug. So I'm a little confused, but we can assume the problem is solved.

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 23, 2022