
GKE Cluster gets destroyed and reconfigured on reapply #1442

Closed
jlevertov opened this issue Oct 30, 2022 · 8 comments
Labels
bug Something isn't working triaged Scoped and ready for work

Comments

@jlevertov

jlevertov commented Oct 30, 2022

TL;DR

Because of the random shuffle of zones that is used when the cluster is regional, no matter what, when I reapply the module (even if no change was made) Terraform plans the cluster for recreation (destroy and create).

Expected behavior

When no change is made in the module configuration, nothing should happen.

Observed behavior

When I reapply the module (even if no change was made), the cluster is recreated (destroyed and created).
Plan example:

 # module.cloud-infra.module.gke_configuration.module.gke.random_shuffle.available_zones must be replaced
-/+ resource "random_shuffle" "available_zones" {
      ~ id           = "-" -> (known after apply)
      ~ input        = [
          - "zone-a",
          - "zone-b",
          - "zone-c",
        ] -> (known after apply) # forces replacement
      ~ result       = [
          - "zone-b",
          - "zone-a",
          - "zone-c",
        ] -> (known after apply)
        # (1 unchanged attribute hidden)
    }

Logs Example:

module.cloud-infra.module.gke_configuration.module.gke.random_shuffle.available_zones: Destroying... [id=-]
module.cloud-infra.module.gke_configuration.module.gke.random_shuffle.available_zones: Destruction complete after 0s
.
.
.
module.cloud-infra.module.gke_configuration.module.gke.data.google_compute_zones.available: Read complete after 0s [id=<redacted>]
module.cloud-infra.module.gke_configuration.module.gke.data.google_container_engine_versions.zone: Reading...
module.cloud-infra.module.gke_configuration.module.gke.random_shuffle.available_zones: Creating...
module.cloud-infra.module.gke_configuration.module.gke.random_shuffle.available_zones: Creation complete after 0s [id=-]
module.cloud-infra.module.gke_configuration.module.gke.data.google_container_engine_versions.zone: Read complete after 1s [id=2022-10-30 15:07:05.538703735 +0000 UTC]

Terraform Configuration

module "gke" {
  source                            = "terraform-google-modules/kubernetes-engine/google//modules/private-cluster"
  version                           = "23.0.0"
  project_id                        = var.project_id
  name                              = var.config.name
  kubernetes_version                = var.config.kubernetes_version
  regional                          = true
  region                            = var.config.region
  zones                             = ["zone-a", "zone-b", "zone-c"]
  network                           = var.config.network_config.name
  subnetwork                        = var.config.network_config.subnet
  enable_private_nodes              = true
  ip_range_pods                     = var.config.network_config.ip_range_pods
  ip_range_services                 = var.config.network_config.ip_range_services
  master_ipv4_cidr_block            = var.config.network_config.master_ipv4_cidr_block
  master_authorized_networks        = var.config.network_config.master_authorized_networks
  
  node_pools = var.config.node_pools

  node_pools_oauth_scopes = {
    all = [
      ...
    ]
  }
}

Terraform Version

v1.3.1

Additional information

The cluster itself seems to not change and all the workloads and nodes stay the same.

@jlevertov jlevertov added the bug Something isn't working label Oct 30, 2022
@bharathkkb
Member

Thanks for the report @jlevertov.

~ input        = [
          - "zone-a",
          - "zone-b",
          - "zone-c",
        ] -> (known after apply) # forces replacement

This seems to me like something is deferring the data source read whose result is the input to the random_shuffle. Are you using depends_on with the module?

An option could be to make this conditional and only create the random resource if var.zones is not set.
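
For illustration, a minimal sketch of what that conditional could look like (the resource and data source names match the plan above, but the guard and the local are only assumptions about how it might be wired up):

resource "random_shuffle" "available_zones" {
  # Only shuffle when the caller has not pinned zones explicitly.
  count        = length(var.zones) == 0 ? 1 : 0
  input        = data.google_compute_zones.available.names
  result_count = 3
}

locals {
  # Fall back to the caller-supplied zones when the shuffle is skipped.
  node_zones = length(var.zones) == 0 ? random_shuffle.available_zones[0].result : var.zones
}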

@bharathkkb bharathkkb added the triaged Scoped and ready for work label Nov 1, 2022
@jlevertov
Author

Hi,
I do use depends_on: I make the GKE module depend on the VPC network being created first, and the VPC does not change.
This also happens when I set var.zones. Is the option you suggested meant for me, or for discussion as to how to resolve this in the module code?
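
For reference, the usage pattern being described looks roughly like this (the module names are illustrative, not the real configuration); a module-level depends_on like this is what can defer the data source reads inside the GKE module to apply time, making the shuffle's input "(known after apply)":

module "gke_configuration" {
  source = "./modules/gke_configuration"

  # Waiting on the VPC this way means everything inside the GKE module,
  # including its google_compute_zones data source, waits until apply.
  depends_on = [module.vpc]
}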

@sloniki

sloniki commented Jan 20, 2023

Any update on this? Having the same issue.

@NissesSenap
Contributor

Couldn't this be solved by adding keepers to the random resource? Then it would only change if a new zone were opened in the region, which is something that doesn't happen very often.

  keepers = {
    # Re-shuffle only when the set of available zones in the region changes
    zone_names = join(",", data.google_compute_zones.available.names)
  }

Or I guess we could key it on some other variable as well.
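
Putting that together, a rough sketch of the full resource with keepers attached (illustrative only, not the module's actual code; keepers values must be strings, hence the join):

resource "random_shuffle" "available_zones" {
  input        = data.google_compute_zones.available.names
  result_count = 3

  keepers = {
    # Force a new shuffle only when the joined zone list itself changes.
    zone_names = join(",", data.google_compute_zones.available.names)
  }
}

Note that keepers only force a new shuffle when the joined zone string changes; the input attribute still forces replacement on its own when it changes, as the plan above shows, so input would also need to stay stable (or be ignored) for the plan to fully settle.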

NissesSenap added a commit to NissesSenap/terraform-google-kubernetes-engine that referenced this issue Aug 10, 2023
This way we will still get random zones but only the first time the cluster is created.
Solves terraform-google-modules#1442

Signed-off-by: Edvin Norling <edvin.norling@kognic.com>
@ericyz ericyz closed this as completed Aug 22, 2023
@ericyz
Collaborator

ericyz commented Aug 22, 2023

#1709

@wyardley
Contributor

It's presumably safe to apply the deletion of the random_shuffle resource that shows up in the plan after this change?

  # module.gke.random_shuffle.available_zones[0] will be destroyed
  # (because index [0] is out of range for count)
  # (moved from module.gke.random_shuffle.available_zones)
  - resource "random_shuffle" "available_zones" {
      - id           = "-" -> null
      - input        = [
          - "us-west2-a",
          - "us-west2-b",
          - "us-west2-c",
        ] -> null
      - result       = [
          - "us-west2-c",
          - "us-west2-a",
          - "us-west2-b",
        ] -> null
      - result_count = 3 -> null
    }

@NissesSenap
Contributor

Yes @wyardley, I have done it many times.

@vcolombo

vcolombo commented Aug 6, 2024

I know this issue is closed, but I'm running into problems with the random_shuffle trying to recreate my cluster. I'm using the beta-autopilot-private-cluster submodule and creating a regional cluster in us-central1, which is the only region with four zones. I am not supplying a zone list, since that rather defeats the purpose of a regional cluster, so the random_shuffle is being invoked.

Given that node_locations is optional, why is the random_shuffle necessary at all for a regional cluster? Since it returns three results, it will return all zones in every region other than us-central1, unless I'm missing something. Couldn't this module simply not set node_locations for regional clusters unless node_locations is explicitly set? And if there's a reason I'm missing why node_locations is necessary, couldn't it be set to data.google_compute_zones.available[0].names without the random shuffle?

I commented out node_locations in the submodule as a test and applied to my environment. It worked as I expected and I see all four zones of us-central1 listed in "Default node zones."
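
A sketch of the alternative being described (illustrative only, not the submodule's actual code): assigning null to an optional argument is equivalent to omitting it, so a regional cluster would default to all zones in its region unless zones were explicitly requested.

locals {
  # Only constrain node_locations when the caller explicitly asked for zones.
  node_locations = length(var.zones) > 0 ? var.zones : null
}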
