wait-for-cluster.sh infinite loop when multiple clusters meet the filter name #733

Closed · rwong2888 opened this issue Nov 2, 2020 · 2 comments · Fixed by #734

@rwong2888 (Contributor) commented Nov 2, 2020

terraform -v
Terraform v0.13.5

+ provider registry.terraform.io/hashicorp/external v2.0.0
+ provider registry.terraform.io/hashicorp/google v3.46.0
+ provider registry.terraform.io/hashicorp/kubernetes v1.13.3
+ provider registry.terraform.io/hashicorp/null v3.0.0
+ provider registry.terraform.io/hashicorp/random v3.0.0

gcloud -v
Google Cloud SDK 316.0.0
alpha 2020.10.23
beta 2020.10.23
bq 2.0.62
core 2020.10.23
gsutil 4.54
kubectl 1.15.11

environment:
abc-xyz already exists and was created successfully via Terraform.
abc is a new cluster that is stuck in terraform apply.

gcloud container clusters list --project=REDACTED --filter=name:"abc"
NAME     LOCATION  MASTER_VERSION   MASTER_IP  MACHINE_TYPE   NODE_VERSION     NUM_NODES  STATUS
abc      us-east4  1.16.13-gke.401  REDACTED   n1-standard-2  1.16.13-gke.401  3          RUNNING
abc-xyz  us-east4  1.16.13-gke.401  REDACTED   n1-standard-2  1.16.13-gke.401  4          RUNNING

behaviour:

module.gke.module.gcloud_wait_for_cluster.null_resource.run_command[0]: Provisioning with 'local-exec'...
module.gke.module.gcloud_wait_for_cluster.null_resource.run_destroy_command[0]: Creation complete after 0s [id=4940799199719502166]
module.gke.module.gcloud_wait_for_cluster.null_resource.run_command[0] (local-exec): Executing: ["/bin/sh" "-c" "PATH=/google-cloud-sdk/bin:$PATH\n.terraform/modules/gke/modules/private-cluster/scripts/wait-for-cluster.sh REDACTED REDACTED\n"]
module.gke.module.gcloud_wait_for_cluster.null_resource.run_command[0] (local-exec): Waiting for cluster REDACTED in project REDACTED to reconcile...
module.gke.module.gcloud_wait_for_cluster.null_resource.run_command[0] (local-exec): .
module.gke.module.gcloud_wait_for_cluster.null_resource.run_command[0] (local-exec): .
module.gke.module.gcloud_wait_for_cluster.null_resource.run_command[0]: Still creating... [10s elapsed]
module.gke.module.gcloud_wait_for_cluster.null_resource.run_command[0] (local-exec): .
module.gke.module.gcloud_wait_for_cluster.null_resource.run_command[0] (local-exec): .
module.gke.module.gcloud_wait_for_cluster.null_resource.run_command[0]: Still creating... [20s elapsed]
module.gke.module.gcloud_wait_for_cluster.null_resource.run_command[0] (local-exec): .
module.gke.module.gcloud_wait_for_cluster.null_resource.run_command[0]: Still creating... [30s elapsed]
module.gke.module.gcloud_wait_for_cluster.null_resource.run_command[0] (local-exec): .
module.gke.module.gcloud_wait_for_cluster.null_resource.run_command[0] (local-exec): .
module.gke.module.gcloud_wait_for_cluster.null_resource.run_command[0]: Still creating... [40s elapsed]
module.gke.module.gcloud_wait_for_cluster.null_resource.run_command[0] (local-exec): .
module.gke.module.gcloud_wait_for_cluster.null_resource.run_command[0] (local-exec): .
module.gke.module.gcloud_wait_for_cluster.null_resource.run_command[0]: Still creating... [50s elapsed]
module.gke.module.gcloud_wait_for_cluster.null_resource.run_command[0] (local-exec): .
...

expected behaviour:
The script should return once the cluster reaches RUNNING status.

replication:
In wait-for-cluster.sh

current_status=$(gcloud container clusters list --project="$PROJECT" --filter=name:"$CLUSTER_NAME" --format="value(status)")

current_status="RUNNING RUNNING" instead of "RUNNING"

In this case the filter is returning multiple clusters instead of the cluster exactly named 'abc'
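For illustration, a minimal sketch of the polling logic, assuming the script takes the project and cluster name as positional arguments (the real script may differ in details):

#!/bin/sh
# Minimal sketch of wait-for-cluster.sh's polling loop (assumed shape, simplified).
PROJECT="$1"
CLUSTER_NAME="$2"

echo "Waiting for cluster $CLUSTER_NAME in project $PROJECT to reconcile..."

current_status=$(gcloud container clusters list --project="$PROJECT" \
  --filter=name:"$CLUSTER_NAME" --format="value(status)")

# With two matching clusters, gcloud prints one status per match, so
# current_status holds both values ("RUNNING RUNNING" in the report above)
# and never equals the single string "RUNNING"; the loop spins forever.
while [ "$current_status" != "RUNNING" ]; do
  printf "."
  sleep 5
  current_status=$(gcloud container clusters list --project="$PROJECT" \
    --filter=name:"$CLUSTER_NAME" --format="value(status)")
done

echo "Cluster $CLUSTER_NAME is RUNNING."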

workaround:
Manually delete the cluster, firewall rules, and remote state, then pass skip_provisioners=true (as per the documentation and #413) to skip the RUNNING check and wait for the cluster to reach RUNNING manually.

@bharathkkb (Member)

Thanks for reporting @rwong2888
Perhaps we can filter by endpoint, which will be unique for a given cluster.
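That could look something like the sketch below; $ENDPOINT is a hypothetical variable the script would need to receive:

# $ENDPOINT is hypothetical; the cluster endpoint would have to be passed
# to the script alongside $PROJECT. Endpoints are unique per cluster.
current_status=$(gcloud container clusters list --project="$PROJECT" \
  --filter="endpoint=$ENDPOINT" --format="value(status)")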

@bharathkkb (Member)

Looks like we have been using key:pattern as opposed to key=value. I believe an equality match on name AND location should prevent this behavior.
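A minimal sketch of that change, assuming the script is also passed the cluster location (in gcloud filter syntax, key=value is an equality match while key:pattern is a pattern match):

# $LOCATION is assumed to be passed in alongside $PROJECT and $CLUSTER_NAME.
current_status=$(gcloud container clusters list --project="$PROJECT" \
  --filter="name=$CLUSTER_NAME AND location=$LOCATION" --format="value(status)")

Matching exactly on both name and location means at most one cluster can match, so the status check compares a single value.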
