
nvidia taint along custom taints in google_container_node_pool #7928

Closed
andre-lx opened this issue Dec 3, 2020 · 21 comments
Labels: breaking-change, forward/linked, persistent-bug, service/container, size/s

Comments

@andre-lx

andre-lx commented Dec 3, 2020

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

Terraform Version

terraform -v

Terraform v0.13.5
+ provider registry.terraform.io/hashicorp/google v3.49.0
+ provider registry.terraform.io/hashicorp/google-beta v3.49.0

Affected Resource(s)

google_container_node_pool

Terraform Configuration Files

resource "google_container_node_pool" "gpu_pool_test" {
  ...

    taint = [
      {
        effect = "NO_SCHEDULE"
        key    = "nvidia.com/gpu"
        value  = "present"
      },
      {
        key    = "another_taint"
        value  = "true"
        effect = "NO_SCHEDULE"
      },
    ]

....
}

Debug Output

Right now we have a lot of pools, and our GPU pools have our own taints, but we need to comment out this taint in the first deploy:

{
  effect = "NO_SCHEDULE"
  key    = "nvidia.com/gpu"
  value  = "present"
}

Otherwise, terraform will output the error:

Error: error creating NodePool: googleapi: Error 400: Found more than one taint with key nvidia.com/gpu and effect NO_SCHEDULE., badRequest

After the first deploy, we need to uncomment it for the subsequent deploys (terraform apply), or terraform will replace the node_pool each time we run the apply command:

          ~ taint             = [ # forces replacement
                {
                    effect = "NO_SCHEDULE"
                    key    = "another_taint"
                    value  = "true"
                },
              - {
                  - effect = "NO_SCHEDULE"
                  - key    = "nvidia.com/gpu"
                  - value  = "present"
                },
            ]
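To make the workaround concrete, this is a sketch of the first-apply configuration (attribute values are illustrative), with the nvidia taint commented out until the pool exists:

```hcl
resource "google_container_node_pool" "gpu_pool_test" {
  # ...

  node_config {
    # ...

    taint = [
      # Commented out for the first apply: GKE adds this taint itself,
      # and the API rejects a duplicate at creation time.
      # {
      #   effect = "NO_SCHEDULE"
      #   key    = "nvidia.com/gpu"
      #   value  = "present"
      # },
      {
        key    = "another_taint"
        value  = "true"
        effect = "NO_SCHEDULE"
      },
    ]
  }
}
```

Once the pool exists, the commented block is restored so the configuration matches the taint GKE added.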

Important Factoids

Authenticating as a service account instead of a user.

b/299312479

@ghost ghost added the bug label Dec 3, 2020
@edwardmedia edwardmedia self-assigned this Dec 4, 2020
@edwardmedia
Contributor

@andre-lx help me understand how it should work after you uncomment the block?

@andre-lx
Author

andre-lx commented Dec 4, 2020

@andre-lx help me understand how it should work after you uncomment the block?

Hi @edwardmedia. I'm not sure I understand your question correctly.

After uncommenting the nvidia taint, everything works correctly on updates.

The problem is with the first deploy using terraform apply, if the GPU pool has more than one taint.

I will provide a more extensive example:

First terraform apply:

gke-cluster.tf

resource "google_container_cluster" "gke_cluster" {
....
}

resource "google_container_node_pool" "gpu_pool" {
  name     = "gpu-pool"
  project  = project.id
  location = zone

  ...

  cluster            = google_container_cluster.gke_cluster.name

  ...

  node_config {
    machine_type = machine_type

    taint = [
      {
        key    = "my_own_taint"
        value  = "true"
        effect = "NO_SCHEDULE"
      },
    ]
  }

  ...

}

This configuration works, and the pool is created correctly.
If I want to use my own taint in a GPU pool, I need to create the pool without the GPU taint, or terraform will output the error:

Error: error creating NodePool: googleapi: Error 400: Found more than one taint with key nvidia.com/gpu and effect NO_SCHEDULE., badRequest

The next terraform apply:

gke-cluster.tf

resource "google_container_cluster" "gke_cluster" {
....
}

resource "google_container_node_pool" "gpu_pool" {
  name     = "gpu-pool"
  project  = project.id
  location = zone

  ...

  cluster            = google_container_cluster.gke_cluster.name

  ...

  node_config {
    machine_type = machine_type

    taint = [
      {
        key    = "my_own_taint"
        value  = "true"
        effect = "NO_SCHEDULE"
      },
      {
        effect = "NO_SCHEDULE"
        key    = "nvidia.com/gpu"
        value  = "present"
      },
    ]
  }

  ...

}

If I don't include the gpu taint together with our own taints as in the previous file, terraform will "force replace" my pools every time, since the taint is not present in the configuration file.

  # google_container_node_pool.gpu_pool must be replaced
-/+ resource "google_container_node_pool" "gpu_pool" {

        ......

          ~ taint             = [ # forces replacement
                {
                    effect = "NO_SCHEDULE"
                    key    = "another_taint"
                    value  = "true"
                },
              - {
                  - effect = "NO_SCHEDULE"
                  - key    = "nvidia.com/gpu"
                  - value  = "present"
                },
            ]

           .....

That's why I need to comment it out in the first deploy, and uncomment it in the subsequent deploys.

An image with the terraform plan output (with the taint commented out):

[Screenshot 2020-12-04 at 18:02:58]

@edwardmedia
Contributor

edwardmedia commented Dec 6, 2020

@andre-lx I have tested by providing either one of the taints below or both together. All tests pass for me on the first tf apply; I can't hit your error. Changing any taint afterward does show a forced replacement in the following tf apply, which is expected. I noticed the error "more than one taint with key nvidia.com/gpu". Are you aware the key is already in place? Do you provide any other settings in the config that might affect this?

Found more than one taint with key nvidia.com/gpu and effect NO_SCHEDULE., badRequest
resource "google_container_node_pool" "gpu_pool_test" {
  ...

    taint = [
      {
        effect = "NO_SCHEDULE"
        key    = "nvidia.com/gpu"
        value  = "present"
      },
      {
        key    = "another_taint"
        value  = "true"
        effect = "NO_SCHEDULE"
      },
    ]

....
}

@andre-lx
Author

andre-lx commented Dec 6, 2020

Hi @edwardmedia .

Thanks for the quick response.

Since the nvidia taint is the default for GPU node pools created by GKE itself (even if you create the node pools manually), the only configuration missing from my examples that could actually affect this is the guest_accelerator, as in the following example:

  node_config {
    machine_type = ....

    taint = [
      {
        key    = "another_taint"
        value  = "true"
        effect = "NO_SCHEDULE"
      },
      {
        effect = "NO_SCHEDULE"
        key    = "nvidia.com/gpu"
        value  = "present"
      },
    ]

    guest_accelerator = [
      {
        count = 1
        type  = "nvidia-tesla-k80"
      },
    ]
  }

Thanks!

@ghost ghost removed the waiting-response label Dec 6, 2020
@edwardmedia
Contributor

edwardmedia commented Dec 7, 2020

@andre-lx below is the state from my first run. Did I miss anything? There are many incompatible configs, but that seems beyond what the Terraform provider can control. If you see other cases, can you share your FULL terraform code so I can repro the issue? Another thing you may want to try is to see if you can create the pools using the gcloud container ... command.

resource "google_container_node_pool" "primary_preemptible_nodes" {
    cluster             = "issue7928-gke-cluster"
    id                  = "projects/myproject/locations/asia-east1-a/clusters/issue7928-gke-cluster/nodePools/issue7928-node-pool"
    initial_node_count  = 1
    instance_group_urls = [
        "https://www.googleapis.com/compute/v1/projects/myproject/zones/asia-east1-a/instanceGroupManagers/gke-issue7928-gke-cl-issue7928-node-p-8fea93f4-grp",
    ]
    location            = "asia-east1-a"
    name                = "issue7928-node-pool"
    node_count          = 1
    node_locations      = [
        "asia-east1-a",
    ]
    project             = "sunedward-1-autotest"
    version             = "1.16.15-gke.4300"
    management {
        auto_repair  = true
        auto_upgrade = true
    }
    node_config {
        disk_size_gb      = 100
        disk_type         = "pd-standard"
        guest_accelerator = [
            {
                count = 1
                type  = "nvidia-tesla-t4"
            },
        ]
        image_type        = "COS"
        labels            = {}
        local_ssd_count   = 0
        machine_type      = "n1-standard-1"
        metadata          = {
            "disable-legacy-endpoints" = "true"
        }
        oauth_scopes      = [
            "https://www.googleapis.com/auth/logging.write",
            "https://www.googleapis.com/auth/monitoring",
        ]
        preemptible       = true
        service_account   = "default"
        taint             = [
            {
                effect = "NO_SCHEDULE"
                key    = "nvidia.com/gpu"
                value  = "present"
            },
        ]
        shielded_instance_config {
            enable_integrity_monitoring = true
            enable_secure_boot          = false
        }
    }
    upgrade_settings {
        max_surge       = 1
        max_unavailable = 0
    }
}

@andre-lx
Author

andre-lx commented Dec 7, 2020

Hi @edwardmedia.

You didn't miss anything. Below is my full config:

resource "google_container_cluster" "gke_cluster" {
  provider = google-beta
  name     = "my-cluster"
  project  = "my-project"
  location = "europe-west1-b"

  min_master_version = "1.16.15-gke.4300"
  network            = google_compute_network.vpc_gke_cluster.name
  subnetwork         = google_compute_subnetwork.subnet_gke_cluster.name
  networking_mode    = "VPC_NATIVE"

  remove_default_node_pool = true
  initial_node_count       = 1

  logging_service    = "logging.googleapis.com/kubernetes"
  monitoring_service = "monitoring.googleapis.com/kubernetes"

  ip_allocation_policy {
    cluster_ipv4_cidr_block  = "/20"
    services_ipv4_cidr_block = "/20"
  }

  resource_labels = {
    "application" = "my_platform"
  }

  master_auth {

    username = ""
    password = ""

    client_certificate_config {
      issue_client_certificate = false
    }
  }
}

resource "google_container_node_pool" "primary_preemptible_nodes" {
    cluster             = google_container_cluster.gke_cluster.name
    initial_node_count  = 1

    location            = "europe-west1-b"
    name                = "issue7928-node-pool"

    project             = "my-project"
    version             = "1.16.15-gke.4300"
    management {
        auto_repair  = true
        auto_upgrade = true
    }
    node_config {
        disk_size_gb      = 100
        disk_type         = "pd-standard"
        guest_accelerator = [
            {
                count = 1
                type  = "nvidia-tesla-k80"
            },
        ]
        image_type        = "COS"
        labels            = {}
        local_ssd_count   = 0
        machine_type      = "n1-standard-1"
        metadata          = {
            "disable-legacy-endpoints" = "true"
        }
        oauth_scopes      = [
            "https://www.googleapis.com/auth/logging.write",
            "https://www.googleapis.com/auth/monitoring",
        ]
        preemptible       = true
        service_account   = "default"
        taint             = [
            {
                effect = "NO_SCHEDULE"
                key    = "nvidia.com/gpu"
                value  = "present"
            },
        ]
        shielded_instance_config {
            enable_integrity_monitoring = true
            enable_secure_boot          = false
        }
    }
    upgrade_settings {
        max_surge       = 1
        max_unavailable = 0
    }
}

I just copied and pasted your google_container_node_pool into my files and ran tf apply. The following error occurred:

Error: error creating NodePool: googleapi: Error 400: Found more than one taint with key nvidia.com/gpu and effect NO_SCHEDULE., badRequest

The full tf apply output:

Terraform will perform the following actions:

  # google_container_node_pool.primary_preemptible_nodes will be created
  + resource "google_container_node_pool" "primary_preemptible_nodes" {
      + cluster             = "my-cluster"
      + id                  = (known after apply)
      + initial_node_count  = 1
      + instance_group_urls = (known after apply)
      + location            = "europe-west1-b"
      + max_pods_per_node   = (known after apply)
      + name                = "issue7928-node-pool"
      + name_prefix         = (known after apply)
      + node_count          = (known after apply)
      + node_locations      = (known after apply)
      + project             = "my-project"
      + version             = "1.16.15-gke.4300"

      + management {
          + auto_repair  = true
          + auto_upgrade = true
        }

      + node_config {
          + disk_size_gb      = 100
          + disk_type         = "pd-standard"
          + guest_accelerator = [
              + {
                  + count = 1
                  + type  = "nvidia-tesla-k80"
                },
            ]
          + image_type        = "COS"
          + labels            = (known after apply)
          + local_ssd_count   = 0
          + machine_type      = "n1-standard-1"
          + metadata          = {
              + "disable-legacy-endpoints" = "true"
            }
          + oauth_scopes      = [
              + "https://www.googleapis.com/auth/logging.write",
              + "https://www.googleapis.com/auth/monitoring",
            ]
          + preemptible       = true
          + service_account   = "default"
          + taint             = [
              + {
                  + effect = "NO_SCHEDULE"
                  + key    = "nvidia.com/gpu"
                  + value  = "present"
                },
            ]

          + shielded_instance_config {
              + enable_integrity_monitoring = true
              + enable_secure_boot          = false
            }

          + workload_metadata_config {
              + node_metadata = (known after apply)
            }
        }

      + upgrade_settings {
          + max_surge       = 1
          + max_unavailable = 0
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions in workspace "my-workspace"?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

google_container_node_pool.primary_preemptible_nodes: Creating...

Error: error creating NodePool: googleapi: Error 400: Found more than one taint with key nvidia.com/gpu and effect NO_SCHEDULE., badRequest

Creating the pool using the gcloud container command, with the same service account as terraform (also tested with my admin account using email):

gcloud container node-pools create issue7928-node-pool --accelerator type=nvidia-tesla-t4,count=1 --cluster my-cluster --machine-type n1-standard-1 --zone europe-west1-b --node-taints nvidia.com/gpu=present:NoSchedule

Output:

ERROR: (gcloud.container.node-pools.create) ResponseError: code=400, message=Found more than one taint with key nvidia.com/gpu and effect NO_SCHEDULE.

This makes sense, since the nvidia taint is already added by default on GPU node pools by GKE itself.

On the terraform side, if you don't add this taint, the GPU pool is created successfully. The problem, as I already described, is on updates, since terraform always shows the "forces replacement".

It's important to note that if you don't need custom taints (that is, without specifying the taint block in the config file), creation and updates work fine at the moment, and the nvidia taint is added by terraform to the state file, as shown below.

First tf apply:

Terraform will perform the following actions:

  # google_container_node_pool.primary_preemptible_nodes will be created
  + resource "google_container_node_pool" "primary_preemptible_nodes" {
      + cluster             = "my-cluster"
      + id                  = (known after apply)
      + initial_node_count  = 1
      + instance_group_urls = (known after apply)
      + location            = "europe-west1-b"
      + max_pods_per_node   = (known after apply)
      + name                = "issue7928-node-pool"
      + name_prefix         = (known after apply)
      + node_count          = (known after apply)
      + node_locations      = (known after apply)
      + project             = "my-project"
      + version             = "1.16.15-gke.4300"

      + management {
          + auto_repair  = true
          + auto_upgrade = true
        }

      + node_config {
          + disk_size_gb      = 100
          + disk_type         = "pd-standard"
          + guest_accelerator = [
              + {
                  + count = 1
                  + type  = "nvidia-tesla-k80"
                },
            ]
          + image_type        = "COS"
          + labels            = (known after apply)
          + local_ssd_count   = 0
          + machine_type      = "n1-standard-1"
          + metadata          = {
              + "disable-legacy-endpoints" = "true"
            }
          + oauth_scopes      = [
              + "https://www.googleapis.com/auth/logging.write",
              + "https://www.googleapis.com/auth/monitoring",
            ]
          + preemptible       = true
          + service_account   = "default"
          + taint             = (known after apply)

          + shielded_instance_config {
              + enable_integrity_monitoring = true
              + enable_secure_boot          = false
            }

          + workload_metadata_config {
              + node_metadata = (known after apply)
            }
        }

      + upgrade_settings {
          + max_surge       = 1
          + max_unavailable = 0
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions in workspace "my-workspace"?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

google_container_node_pool.primary_preemptible_nodes: Creating...
....
google_container_node_pool.primary_preemptible_nodes: Still creating... [1m20s elapsed]
google_container_node_pool.primary_preemptible_nodes: Creation complete after 1m24s [id=projects/my-project/locations/europe-west1-b/clusters/my-cluster/nodePools/issue7928-node-pool]

Subsequent tf apply (with the taint block comment or uncomment):

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Running terraform state show google_container_node_pool.primary_preemptible_nodes for a pool without the taint block, you can see that the nvidia taint was added to the state file:

        taint             = [
            {
                effect = "NO_SCHEDULE"
                key    = "nvidia.com/gpu"
                value  = "present"
            },
        ]

On the next tf apply, terraform checks that the resource in GKE is equal to the state file, and no replacement is needed.

...

Running terraform state show google_container_node_pool.primary_preemptible_nodes3 for a pool with the taint block, but only with the custom taint, you can also see the nvidia taint being added to the state file along with the custom one:

        taint             = [
            {
                effect = "NO_SCHEDULE"
                key    = "another_taint"
                value  = "true"
            },
            {
                effect = "NO_SCHEDULE"
                key    = "nvidia.com/gpu"
                value  = "present"
            },
        ]

So it's really strange that terraform thinks the GPU pool needs a replacement:

          ~ taint             = [ # forces replacement
                {
                    effect = "NO_SCHEDULE"
                    key    = "another_taint"
                    value  = "true"
                },
              - {
                  - effect = "NO_SCHEDULE"
                  - key    = "nvidia.com/gpu"
                  - value  = "present"
                },
            ]

The question is: why does terraform force replacement of an array that is equal to the same resource in the state file when custom taints are used? With only the nvidia taint, the taint array is successfully added to the state file, and on the subsequent tf apply they match perfectly, so no replacement is needed.

Thanks!

@ghost ghost removed the waiting-response label Dec 7, 2020
@andre-lx andre-lx changed the title nvidia taint in node_pool nvidia taint along custom taints in google_container_node_pool Dec 7, 2020
@edwardmedia
Contributor

edwardmedia commented Dec 7, 2020

@andre-lx forceReplacement on taint is by design. Can you explain why it should not trigger node pool recreation?
Do you still have questions regarding Found more than one taint with key nvidia.com/gpu...? I think running the gcloud command has explained why.

@andre-lx
Author

andre-lx commented Dec 7, 2020

Hi @edwardmedia.

In short:
Since I can't create the node pool with the nvidia taint (it's a default from GKE), how can I prevent the pool recreation each time I run tf apply? How can I set custom taints at the same time as the nvidia taint? Right now, as I said, I need to comment out the nvidia taint on pool creation and uncomment it in the subsequent apply to ensure that the pool is not recreated. After these two steps, I can run tf apply forever and the pool is never recreated.

Why is the pool recreated if the nvidia taint is the default from GKE?

And why is the pool not recreated if no custom taints are used (or rather, if only the nvidia taint exists)?

@ghost ghost removed waiting-response labels Dec 7, 2020
@edwardmedia
Contributor

@andre-lx I am not sure I understand what you said correctly. In my tests, I tried putting 1) both the nvidia and a custom taint together, and 2) either one of the taints alone, in new node pools. All 3 cases were fine; no exceptions were received. I don't understand what you meant below.

Since I can't create the node pool with the nvidia taint, ...

Where do you see nvidia taint is the default by gke? Can you share a document?

From the provider's perspective, any change to taints will trigger pool recreation, because I don't see that the GCP API provides a way to update taints directly. If you run kubectl, you can update the taints, but that is not something Terraform can manage. Does this make sense to you?
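As an illustration of that out-of-band path, taints can be edited directly with kubectl; this sketch assumes a hypothetical node name:

```shell
# Add a taint to a node outside of Terraform's management
# (the node name is hypothetical):
kubectl taint nodes gke-my-cluster-gpu-pool-1234 another_taint=true:NoSchedule

# Remove the same taint again (the trailing dash removes it):
kubectl taint nodes gke-my-cluster-gpu-pool-1234 another_taint:NoSchedule-
```

Changes made this way are invisible to Terraform unless the taint field is reconciled in the config or ignored via lifecycle.ignore_changes.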

@andre-lx
Author

andre-lx commented Dec 7, 2020

@edwardmedia the nvidia taint is created by default on gpu node pools as you can see here:
https://cloud.google.com/kubernetes-engine/docs/how-to/gpus#create

That's why (I think) I can't add the taint to node pools at creation time, as I explained in the other comments, and that's why terraform and gcloud give me the error:

Error: error creating NodePool: googleapi: Error 400: Found more than one taint with key nvidia.com/gpu and effect NO_SCHEDULE., badRequest

Because of this, I don't understand how you managed to create the GPU pool with the nvidia taint specified.

I understand that if you change the taints, either in the Google console or via terraform, terraform will recreate the pool; that makes a lot of sense and I was not expecting another way (since the state file is different from the resource itself). The problem here is that, using custom taints, I can't create the pool with the nvidia taint, and I can't tf apply an unchanged pool without specifying the nvidia taint after creation.

And that's why I need to comment out the nvidia taint on creation (since it is added by GKE itself), and uncomment it in the subsequent tf apply.

I will lay this out as examples; maybe that makes it easier:

1 - No taints in the config file:
1.1 - I create the pool with no taints (taint = [])
1.2 - The pool is created successfully, and the nvidia taint is added to the state file (again, since this is created automatically by GKE)
1.3 - All future tf apply runs will work perfectly, since the taint is in the state file as well as in GKE.

2 - With both the nvidia and a custom taint:
2.1 - I try to create the pool, but the pool can't be created because of the error:

Error: error creating NodePool: googleapi: Error 400: Found more than one taint with key nvidia.com/gpu and effect NO_SCHEDULE., badRequest

2.2 - Solution: create the pool as in example 3 below

3 - Only one custom taint in the config:
3.1 - I create with a taint like this:

        taint             = [
            {
                effect = "NO_SCHEDULE"
                key    = "another_taint"
                value  = "true"
            },
       ]

3.2 - The pool is created successfully, and the custom taint as well as the nvidia taint is added to the state file (again, since the latter is created automatically by GKE)
3.3 - All future tf apply runs will ask for pool replacement. Why? That is the part that doesn't make sense: the state file includes the nvidia taint as well as the custom taint created in step 3.1.
3.4 - Solution: add the nvidia taint to the taint block:

        taint             = [
            {
                effect = "NO_SCHEDULE"
                key    = "another_taint"
                value  = "true"
            },
            {
                effect = "NO_SCHEDULE"
                key    = "nvidia.com/gpu"
                value  = "present"
            },
        ]

3.5 - All future tf apply runs will work perfectly.

@ghost ghost removed waiting-response labels Dec 7, 2020
@edwardmedia
Contributor

edwardmedia commented Dec 7, 2020

@andre-lx I see. Thanks for the link. In my tests, all node pools were added to a new cluster, which is different from adding pools to an existing cluster. That explains why it works for me and not for you:

When you add a GPU node pool to an existing cluster that already runs a non-GPU node pool, GKE automatically taints the GPU nodes with the nvidia.com/gpu taint

All the behaviors you have experienced appear to be controlled by GKE/Kubernetes. I don't think the provider has much room to act here. I am glad you have found a workaround.

@edwardmedia
Contributor

@andre-lx closing this issue then. Feel free to reopen if you see there is something the provider can help. Thank you

@rileykarson rileykarson assigned slevenick and unassigned slevenick Dec 7, 2020
@rileykarson rileykarson added this to the Goals milestone Dec 14, 2020
@andre-lx
Author

Hi. Some update from my side, as a workaround:

From the docs:

taint - (Optional) A list of Kubernetes taints to apply to nodes. GKE's API can only set this field on cluster creation. However, GKE will add taints to your nodes if you enable certain features such as GPUs. If this field is set, any diffs on this field will cause Terraform to recreate the underlying resource. Taint values can be updated safely in Kubernetes (eg. through kubectl), and it's recommended that you do not use this field to manage taints. If you do, lifecycle.ignore_changes is recommended. Structure is documented below.

So you can set only one taint (without the nvidia taint), and ignore the changes with the lifecycle block:

resource "google_container_node_pool" "primary_preemptible_nodes" {
  node_config {
    machine_type = ....

    taint = [
      {
        key    = "another_taint"
        value  = "true"
        effect = "NO_SCHEDULE"
      },
    ]

    guest_accelerator = [
      {
        count = 1
        type  = "nvidia-tesla-k80"
      },
    ]
  }

  lifecycle {
    ignore_changes = [
      node_config[0].taint,
    ]
  }
}

With this, you are able to create and update without losing the nvidia taint. The only problem I found is if you need to update the taints in your Terraform recipe: those changes are also ignored.

@AndreaGiardini

Just adding my voice here as well. This is a problem and it's very annoying, since terraform tries to re-create the nodepool every time because the taints do not match.

The workarounds are:

  • Ignore all the taints altogether (as suggested above)
  • Create the nodepools without the nvidia taint and add them after the first run

More people are discussing the problem here: terraform-google-modules/terraform-google-kubernetes-engine#703

@nader-bitstrapped-com

nader-bitstrapped-com commented Jul 1, 2021

@andre-lx ignoring taint changes with the lifecycle block is a great workaround. Much better than commenting/uncommenting. By the way, I did the same with more than one taint and it works:

resource "google_container_node_pool" "kubeflow_primary_gpu" {
    # ...

  node_config {
    # ...
    taint = [
      {
        key    = "preemptible"
        value  = "true"
        effect = "NO_EXECUTE"
      },
      {
        key    = "cloud.google.com/gke-preemptible"
        value  = "true"
        effect = "NO_SCHEDULE"
      },
    ]
  }

  lifecycle {
    ignore_changes = [
      node_config[0].taint,
    ]
  }
}

@rileykarson
Collaborator

Taints are likely to get fixed in a future major release. The current model for them has proven difficult enough to work with that I don't think we can fix it by adding behaviours in a backwards-compatible way.

@rileykarson
Collaborator

Closed in GoogleCloudPlatform/magic-modules#9011

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 23, 2023
9 participants