
node-pool autoscaling clause causes Terraform to timeout when creating a zonal GKE cluster #2061

Closed
huang-jy opened this issue Sep 15, 2018 · 2 comments

@huang-jy

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
  • If an issue is assigned to the "modular-magician" user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to "hashibot", a community member has claimed the issue already.

Terraform Version

0.11.8

Affected Resource(s)

  • google_container_node_pool
  • google_container_cluster

Terraform Configuration Files

data "google_container_engine_versions" "europe-west2-c" {
  zone = "${var.zone}"
}

variable tags {
  default = ["kubernetes"]
}

variable cluster-name {
  default = "kubernetes-test"
}

variable region {
  default = "europe-west2"
}

variable zone {
  default = "europe-west2-c"
}

variable network {
  default = "{gcp-network}"
}

variable subnetwork {
  default = "{gcp-subnetwork}"
}

provider "google" {
  credentials = "${file("credentials-file.json")}"
  project     = "gcp-project"
  version     = "1.17.1"

  #region = "${var.region}"

  zone = "${var.zone}"
}

## Node pools here

resource "google_container_node_pool" "node-pool" {
  name = "node-pool"

  cluster = "${google_container_cluster.primary.name}"

  zone       = "${var.zone}"
  node_count = 1

  # autoscaling {
  #   min_node_count = 1
  #   max_node_count = 10
  # }

  management {
    auto_repair  = true
    auto_upgrade = true
  }
  node_config {
    oauth_scopes = [
      "https://www.googleapis.com/auth/compute",
      "https://www.googleapis.com/auth/devstorage.read_write",
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
      "https://www.googleapis.com/auth/servicecontrol",
      "https://www.googleapis.com/auth/service.management.readonly",
      "https://www.googleapis.com/auth/cloud-platform",
    ]

    preemptible = false

    tags = ["${var.tags}"]

    machine_type = "n1-highmem-4"
  }
}

resource "google_container_cluster" "primary" {
  name = "${var.cluster-name}"

  zone = "${var.zone}"

  network    = "${var.network}"
  subnetwork = "${var.subnetwork}"

  lifecycle {
    ignore_changes = ["node_pool"]
  }

  node_pool {
    name = "default-pool"
  }

  remove_default_node_pool = true

  addons_config {
    horizontal_pod_autoscaling {
      disabled = false
    }

    http_load_balancing {
      disabled = false
    }

    kubernetes_dashboard {
      disabled = false
    }
  }

  maintenance_policy {
    daily_maintenance_window {
      start_time = "00:00"
    }
  }
}
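
For reference, the failing variant is the same node-pool resource with the autoscaling block uncommented. A minimal sketch using only the values already shown in the commented block above (management and node_config are unchanged and omitted here for brevity):

resource "google_container_node_pool" "node-pool" {
  name       = "node-pool"
  cluster    = "${google_container_cluster.primary.name}"
  zone       = "${var.zone}"
  node_count = 1

  # Enabling this block is what triggers the timeout described below.
  autoscaling {
    min_node_count = 1
    max_node_count = 10
  }

  # management { ... } and node_config { ... } as in the full resource above
}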

Debug Output

(Available if required)

Panic Output

N/A

Expected Behavior

The cluster should be created with the specified configuration.

Actual Behavior

Terraform timed out after about 13 minutes while creating the node pool.

Steps to Reproduce

  1. Place the above file in a directory
  2. terraform init
  3. terraform plan -out tfplan
  4. terraform apply tfplan
  5. The cluster should be created successfully
  6. Uncomment the autoscaling clause:
  # autoscaling {
  #   min_node_count = 1
  #   max_node_count = 10
  # }
  7. terraform destroy
  8. terraform plan -out tfplan
  9. terraform apply tfplan
  10. Terraform will create the cluster, then the node pool, but time out on the node pool after around 12-13 minutes:
google_container_node_pool.node-pool: Still creating... (11m20s elapsed)
google_container_node_pool.node-pool: Still creating... (11m30s elapsed)
google_container_node_pool.node-pool: Still creating... (11m40s elapsed)
google_container_node_pool.node-pool: Still creating... (11m50s elapsed)
google_container_node_pool.node-pool: Still creating... (12m0s elapsed)

Error: Error applying plan:

1 error(s) occurred:

* google_container_node_pool.node-pool: 1 error(s) occurred:

* google_container_node_pool.node-pool: Error waiting for creating GKE NodePool: All cluster resources were brought up, but the cluster API is reporting that: component "kube-apiserver" from endpoint "gke-94f5f5ee1ceb4de83211-0e4c" is unhealthy
goroutine 111502849 [running]:
runtime/debug.Stack(0xc00d55858b, 0x3, 0x2dc1b7a)
	third_party/go/gc/src/runtime/debug/stack.go:24 +0xa7
google3/cloud/kubernetes/engine/common/errdesc.(*GKEErrorDescriptor).createErr(0x55277e0, 0xc000120e00)
	cloud/kubernetes/engine/common/error_desc.go:199 +0x26
google3/cloud/kubernetes/engine/common/errdesc.(*GKEErrorDescriptor).WithDetail(0x55277e0, 0x312a4a0, 0xc00edef6a0, 0xc00edef6a0, 0x3121ac0)
	cloud/kubernetes/engine/common/error_desc.go:166 +0x40
google3/cloud/kubernetes/engine/common/healthcheck.glob..func1.1(0x0, 0xc00f805e60)
	cloud/kubernetes/engine/common/healthcheck.go:141 +0x7bb
google3/cloud/kubernetes/engine/common/call.WithTimeout(0x318d620, 0xc014e7cb70, 0x77359400, 0x8bb2c97000, 0xc019f7dd08, 0xc014e7cb70, 0xc013a16880)
	cloud/kubernetes/engine/common/call.go:36 +0x153
google3/cloud/kubernetes/engine/common/healthcheck.glob..func1(0x318d620, 0xc014e7cb70, 0xc00e06bec0, 0xc002a9d880, 0xc00a394dd0, 0x8bb2c97000, 0x0, 0x0)
	cloud/kubernetes/engine/common/healthcheck.go:137 +0x33b
google3/cloud/kubernetes/engine/server/deploy/deploy.upgradeMasterAndVerify.func3(0xc0031b6e00, 0x318d560, 0xc00b3434c0, 0x7f1c80010940, 0xc00aba1680, 0xc011535680, 0x0, 0xc018173ef0, 0xc00e06bec0, 0xc002a9d880, ...)
	cloud/kubernetes/engine/server/deploy/update.go:969 +0x1b3
google3/cloud/kubernetes/engine/server/deploy/deploy.upgradeMasterAndVerify(0x318d560, 0xc00b3434c0, 0xc0031b6e00, 0x7f1c80010940, 0xc00aba1680, 0xc00e06bec0, 0xc011535680, 0x0, 0x1, 0x0, ...)
	cloud/kubernetes/engine/server/deploy/update.go:975 +0x13f
google3/cloud/kubernetes/engine/server/deploy/deploy.(*Deployer).recreateMasterReplicas.func2(0x0, 0x0)
	cloud/kubernetes/engine/server/deploy/update.go:546 +0x23c
google3/cloud/kubernetes/engine/common/errors.CollectFns.func1(0xc00c0d39e0, 0xc00bd592c0)
	cloud/kubernetes/engine/common/errors.go:162 +0x27
created by google3/cloud/kubernetes/engine/common/errors.CollectFns
	cloud/kubernetes/engine/common/errors.go:162 +0x82
.

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.
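
Note that the failure above is surfaced by the GKE API itself ("kube-apiserver ... is unhealthy") rather than by Terraform simply giving up, so lengthening the client-side wait is unlikely to be a real fix. If more headroom on the create wait is wanted while this is investigated, one option is an explicit timeouts block on the node pool. A minimal sketch, assuming the provider release in use exposes configurable timeouts for google_container_node_pool; the 45m value is an arbitrary example:

resource "google_container_node_pool" "node-pool" {
  # ... same arguments as in the configuration above ...

  # Assumption: configurable timeouts are available for this resource in the
  # pinned provider version; 45m is an arbitrary example, not a recommendation.
  timeouts {
    create = "45m"
  }
}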

Important Factoids

References

@ghost added the bug label Sep 15, 2018
@danawillow (Contributor)

Closing as duplicate of #2022

@ghost commented Nov 16, 2018

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!

@ghost locked and limited conversation to collaborators Nov 16, 2018