
node-pool autoscaling clause causes Terraform to timeout when creating a zonal GKE cluster #2061

Closed
huang-jy opened this issue Sep 15, 2018 · 2 comments

@huang-jy

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
  • If an issue is assigned to the "modular-magician" user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to "hashibot", a community member has claimed the issue already.

Terraform Version

0.11.8

Affected Resource(s)

  • google_container_node_pool
  • google_container_cluster

Terraform Configuration Files

data "google_container_engine_versions" "europe-west2-c" {
  zone = "${var.zone}"
}

variable tags {
  default = ["kubernetes"]
}

variable cluster-name {
  default = "kubernetes-test"
}

variable region {
  default = "europe-west2"
}

variable zone {
  default = "europe-west2-c"
}

variable network {
  default = "{gcp-network}"
}

variable subnetwork {
  default = "{gcp-subnetwork}"
}

provider "google" {
  credentials = "${file("credentials-file.json")}"
  project     = "gcp-project"
  version     = "1.17.1"

  #region = "${var.region}"

  zone = "${var.zone}"
}

## Node pools here

resource "google_container_node_pool" "node-pool" {
  name = "node-pool"

  cluster = "${google_container_cluster.primary.name}"

  zone       = "${var.zone}"
  node_count = 1

  # autoscaling {
  #   min_node_count = 1
  #   max_node_count = 10
  # }

  management {
    auto_repair  = true
    auto_upgrade = true
  }
  node_config {
    oauth_scopes = [
      "https://www.googleapis.com/auth/compute",
      "https://www.googleapis.com/auth/devstorage.read_write",
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
      "https://www.googleapis.com/auth/servicecontrol",
      "https://www.googleapis.com/auth/service.management.readonly",
      "https://www.googleapis.com/auth/cloud-platform",
    ]

    preemptible = false

    tags = ["${var.tags}"]

    machine_type = "n1-highmem-4"
  }
}

resource "google_container_cluster" "primary" {
  name = "${var.cluster-name}"

  zone = "${var.zone}"

  network    = "${var.network}"
  subnetwork = "${var.subnetwork}"

  lifecycle {
    ignore_changes = ["node_pool"]
  }

  node_pool {
    name = "default-pool"
  }

  remove_default_node_pool = true

  addons_config {
    horizontal_pod_autoscaling {
      disabled = false
    }

    http_load_balancing {
      disabled = false
    }

    kubernetes_dashboard {
      disabled = false
    }
  }

  maintenance_policy {
    daily_maintenance_window {
      start_time = "00:00"
    }
  }
}
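
For reference, the failing variant is the same node-pool resource with the autoscaling block uncommented. A minimal sketch using only the values already shown in the commented block above (management and node_config are unchanged and omitted here for brevity):

resource "google_container_node_pool" "node-pool" {
  name       = "node-pool"
  cluster    = "${google_container_cluster.primary.name}"
  zone       = "${var.zone}"
  node_count = 1

  # Enabling this block is what triggers the timeout described below.
  autoscaling {
    min_node_count = 1
    max_node_count = 10
  }

  # management { ... } and node_config { ... } as in the full resource above
}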

Debug Output

(Available if required)

Panic Output

N/A

Expected Behavior

The cluster should be created with the specified configuration.

Actual Behavior

Terraform timed out after about 13 minutes while creating the node pool.

Steps to Reproduce

  1. Place the above file in a directory
  2. terraform init
  3. terraform plan -out tfplan
  4. terraform apply tfplan
  5. The cluster should be created successfully
  6. Uncomment the autoscaling clause:
  # autoscaling {
  #   min_node_count = 1
  #   max_node_count = 10
  # }
  7. terraform destroy
  8. terraform plan -out tfplan
  9. terraform apply tfplan
  10. Terraform will create the cluster, then the node pool, but time out on the node pool after around 12-13 minutes:
google_container_node_pool.node-pool: Still creating... (11m20s elapsed)
google_container_node_pool.node-pool: Still creating... (11m30s elapsed)
google_container_node_pool.node-pool: Still creating... (11m40s elapsed)
google_container_node_pool.node-pool: Still creating... (11m50s elapsed)
google_container_node_pool.node-pool: Still creating... (12m0s elapsed)

Error: Error applying plan:

1 error(s) occurred:

* google_container_node_pool.node-pool: 1 error(s) occurred:

* google_container_node_pool.node-pool: Error waiting for creating GKE NodePool: All cluster resources were brought up, but the cluster API is reporting that: component "kube-apiserver" from endpoint "gke-94f5f5ee1ceb4de83211-0e4c" is unhealthy
goroutine 111502849 [running]:
runtime/debug.Stack(0xc00d55858b, 0x3, 0x2dc1b7a)
	third_party/go/gc/src/runtime/debug/stack.go:24 +0xa7
google3/cloud/kubernetes/engine/common/errdesc.(*GKEErrorDescriptor).createErr(0x55277e0, 0xc000120e00)
	cloud/kubernetes/engine/common/error_desc.go:199 +0x26
google3/cloud/kubernetes/engine/common/errdesc.(*GKEErrorDescriptor).WithDetail(0x55277e0, 0x312a4a0, 0xc00edef6a0, 0xc00edef6a0, 0x3121ac0)
	cloud/kubernetes/engine/common/error_desc.go:166 +0x40
google3/cloud/kubernetes/engine/common/healthcheck.glob..func1.1(0x0, 0xc00f805e60)
	cloud/kubernetes/engine/common/healthcheck.go:141 +0x7bb
google3/cloud/kubernetes/engine/common/call.WithTimeout(0x318d620, 0xc014e7cb70, 0x77359400, 0x8bb2c97000, 0xc019f7dd08, 0xc014e7cb70, 0xc013a16880)
	cloud/kubernetes/engine/common/call.go:36 +0x153
google3/cloud/kubernetes/engine/common/healthcheck.glob..func1(0x318d620, 0xc014e7cb70, 0xc00e06bec0, 0xc002a9d880, 0xc00a394dd0, 0x8bb2c97000, 0x0, 0x0)
	cloud/kubernetes/engine/common/healthcheck.go:137 +0x33b
google3/cloud/kubernetes/engine/server/deploy/deploy.upgradeMasterAndVerify.func3(0xc0031b6e00, 0x318d560, 0xc00b3434c0, 0x7f1c80010940, 0xc00aba1680, 0xc011535680, 0x0, 0xc018173ef0, 0xc00e06bec0, 0xc002a9d880, ...)
	cloud/kubernetes/engine/server/deploy/update.go:969 +0x1b3
google3/cloud/kubernetes/engine/server/deploy/deploy.upgradeMasterAndVerify(0x318d560, 0xc00b3434c0, 0xc0031b6e00, 0x7f1c80010940, 0xc00aba1680, 0xc00e06bec0, 0xc011535680, 0x0, 0x1, 0x0, ...)
	cloud/kubernetes/engine/server/deploy/update.go:975 +0x13f
google3/cloud/kubernetes/engine/server/deploy/deploy.(*Deployer).recreateMasterReplicas.func2(0x0, 0x0)
	cloud/kubernetes/engine/server/deploy/update.go:546 +0x23c
google3/cloud/kubernetes/engine/common/errors.CollectFns.func1(0xc00c0d39e0, 0xc00bd592c0)
	cloud/kubernetes/engine/common/errors.go:162 +0x27
created by google3/cloud/kubernetes/engine/common/errors.CollectFns
	cloud/kubernetes/engine/common/errors.go:162 +0x82
.

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.
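
Note that the failure above is surfaced by the GKE API itself ("kube-apiserver ... is unhealthy") rather than by Terraform simply giving up, so lengthening the client-side wait is unlikely to be a real fix. If more headroom on the create wait is wanted while this is investigated, one option is an explicit timeouts block on the node pool. A minimal sketch, assuming the provider release in use exposes configurable timeouts for google_container_node_pool; the 45m value is an arbitrary example:

resource "google_container_node_pool" "node-pool" {
  # ... same arguments as in the configuration above ...

  # Assumption: configurable timeouts are available for this resource in the
  # pinned provider version; 45m is an arbitrary example, not a recommendation.
  timeouts {
    create = "45m"
  }
}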

Important Factoids

References

@ghost added the bug label Sep 15, 2018
@danawillow (Contributor)

Closing as duplicate of #2022

@ghost commented Nov 16, 2018

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!

@ghost locked and limited conversation to collaborators Nov 16, 2018