Error: NodePool was created in the error state RUNNING_WITH_ERROR #10823

Closed
smmnazar opened this issue Jan 4, 2022 · 5 comments

Comments

smmnazar commented Jan 4, 2022

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

Terraform Version

1.1.2

Affected Resource(s)

  • google_container_node_pool

Terraform Configuration Files

# GKE cluster
resource "google_container_cluster" "primary" {
  #count    = var.destroy_infra ? 1 : 0
  name     = var.clustername
  location = var.regionname
  remove_default_node_pool = var.remove_defaultnode
  initial_node_count       = var.initialnode_count

  network    = google_compute_network.vpc.name
  subnetwork = google_compute_subnetwork.subnet.name
}

# Separately Managed Node Pool
resource "google_container_node_pool" "primary_nodes" {
  #count    	 = var.destroy_infra ? 1 : 0
  name       = "${google_container_cluster.primary.name}-node"
  location   = var.regionname
  cluster    = google_container_cluster.primary.name
  node_count = var.gke_num_nodes

  node_config {
    oauth_scopes = [
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
    ]

    labels = {
      env = var.project_id
    }

    # preemptible  = true
    machine_type = "g1-small"
    tags         = ["gke-node", var.clustername]
    metadata = {
      disable-legacy-endpoints = "true"
    }
  }

  lifecycle {
    ignore_changes = [
      initial_node_count
    ]
  }
}

Debug Output

Panic Output

Error: NodePool cp-ofs-poc-gke-cluster-node was created in the error state "RUNNING_WITH_ERROR"

Expected Behavior

The node pool should be created and its status should be RUNNING.

Actual Behavior

The node pool was created in the error state "RUNNING_WITH_ERROR".

Steps to Reproduce

  1. terraform apply
@smmnazar smmnazar added the bug label Jan 4, 2022

bon77 commented Feb 14, 2022

Not sure if it is related, but I get a similar problem in asia-northeast1 and asia-northeast2, while the same code works fine in asia-northeast3.

rileykarson (Collaborator) commented

If you can capture debug logs with export TF_LOG=DEBUG, that would help! There are entirely valid reasons for a node pool to be in an error state, so this may indicate a GCP problem rather than a provider one (e.g. regional differences, as @bon77 pointed out).

There's probably room to improve the error message at least, provided the API returns a useful one.

haggishunk commented

Can you check the node pool in the GCP console?

I recently ran into this issue, and the console gave more information about the error: IP exhaustion in the secondary IP range. I was using /24 blocks for the secondary IP range (set by another Terraform module or via the console), and the cluster was being created with the default 110 pods per node. This article helped me quite a bit in understanding how that exhaustion can happen:

https://cloud.google.com/kubernetes-engine/docs/how-to/multi-pod-cidr

I changed the default pods per node to a modest 16 and voilà.
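
For anyone hitting the same wall, here is a minimal sketch of how the pod density can be lowered in Terraform, assuming a VPC-native cluster. As far as I can tell the relevant arguments are default_max_pods_per_node on google_container_cluster and max_pods_per_node on google_container_node_pool; the secondary range names and the values below are illustrative only, not taken from the original config.

# Hedged sketch: lower pod density so a /24 secondary range is not exhausted.
# Assumes a VPC-native (alias IP) cluster; range names and values are illustrative.
resource "google_container_cluster" "primary" {
  name     = var.clustername
  location = var.regionname

  network    = google_compute_network.vpc.name
  subnetwork = google_compute_subnetwork.subnet.name

  # VPC-native networking is required for per-node pod limits.
  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"     # hypothetical secondary range name
    services_secondary_range_name = "services" # hypothetical secondary range name
  }

  # Cluster-wide default; with the default of 110 pods per node, GKE reserves
  # a full /24 of pod IPs per node, which exhausts a small secondary range fast.
  default_max_pods_per_node = 16

  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "primary_nodes" {
  name       = "${google_container_cluster.primary.name}-node"
  location   = var.regionname
  cluster    = google_container_cluster.primary.name
  node_count = var.gke_num_nodes

  # Optional per-pool override of the cluster-wide default.
  max_pods_per_node = 16

  node_config {
    machine_type = "g1-small"
  }
}

If I read the linked doc correctly, capping pods at 16 per node means GKE reserves a /27 of pod IPs per node instead of a /24, so the same secondary range supports many more nodes.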

smmnazar (Author) commented

> If you can capture debug logs with export TF_LOG=DEBUG, that would help! There are entirely valid reasons for a node pool to be in an error state, so this may indicate a GCP problem rather than a provider one (e.g. regional differences, as @bon77 pointed out).
>
> There's probably room to improve the error message at least, provided the API returns a useful one.

This helped to resolve the issue. Thanks

github-actions (bot) commented

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked this issue as resolved and limited conversation to collaborators on Sep 25, 2022