Error scenarios with google_container_cluster result in broken state when creating a GKE cluster #3033

joestump · 2019-02-11T23:12:24Z

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment
If an issue is assigned to the "modular-magician" user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to "hashibot", a community member has claimed the issue already.

Terraform Version

v0.11.11

Affected Resource(s)

google_container_cluster

Expected Behavior

It should be possible to re-run terraform plan and make progress after GCP's API returns certain errors, even when the GKE cluster was actually created.

The provider should not call d.SetId("") when GCP's API returns recoverable errors.

Actual Behavior

We've run into an issue in us-central1 when creating region-wide GKE clusters. When we create GKE clusters, we sometimes get this error:

 * google_container_cluster.gke-cluster: Error waiting for creating GKE cluster: Try a different location, or try again later: Google Compute Engine does not have enough resources available to fulfill request: us-central1-a.

The problem is that any error returned results in d.SetId("") being called, which nukes the GKE cluster from Terraform's state. However, the region-wide GKE cluster still exists inside of GCP.

Because of this, running terraform plan again results in this error:

 * google_container_cluster.gke-cluster: googleapi: Error 409: Already exists: projects/gsf-mgmt-devmvp-ajna/locations/us-central1/clusters/ajna-devmvp-gke-cluster., alreadyExists

We've seen this issue crop up with capacity errors as well as with certain network connectivity errors on create, leaving us in a corrupt state.

Also, is there a specific reason the Google provider doesn't use TF's built-in retry helpers?


joestump · 2019-02-11T23:29:51Z

Also, it should be noted that terraform import fails on the region-wide cluster, so we can't import and delete. This makes me think that the resource should be tainted on certain errors when this purgatory error state arises.

rileykarson · 2019-02-11T23:32:14Z

I've run into this before as well. We probably want to send a delete to the API in these unrecoverable error scenarios.

dgonzalezruiz · 2019-02-12T09:59:54Z

Just ran into this.

ghost · 2019-03-17T13:51:09Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!

ghost added the bug label Feb 11, 2019

emilymye mentioned this issue Feb 13, 2019

Clean up failed cluster creation GoogleCloudPlatform/magic-modules#1381

Merged

modular-magician closed this as completed in GoogleCloudPlatform/magic-modules#1381 Feb 14, 2019

ghost locked and limited conversation to collaborators Mar 17, 2019

github-actions bot added service/container forward/review In review; remove label to forward labels Jan 15, 2025
