
container_cluster resources error on create but leave dangling resources #3875

Closed
chrisst opened this issue Jun 18, 2019 · 3 comments · Fixed by GoogleCloudPlatform/magic-modules#2030
Assignees
Labels
bug, forward/review (In review; remove label to forward), service/container

Comments

@chrisst
Contributor

chrisst commented Jun 18, 2019

Affected Resource(s)

  • google_container_cluster

Terraform Configuration Files

Problem

Several categories of Create failures will result in the cluster existing in GCP without being persisted to state. This can include timeouts while waiting for the create operation, or exhausting retries due to quota issues. In these situations the cluster is often created, but a subsequent terraform apply will attempt to create it again, usually resulting in a conflict because it already exists.

@chrisst chrisst self-assigned this Jun 18, 2019
@chrisst chrisst added the bug label Jun 18, 2019
@nimahak

nimahak commented Jun 19, 2019

An example of such failed cluster:

$ gcloud beta container clusters describe xxx --region us-central1 --project xxx
...
status: ERROR
statusMessage: 'Try a different location, or try again later: Google Compute Engine
  does not have enough resources available to fulfill request: us-central1-b.'

A subsequent terraform apply or destroy is bound to fail since this cluster is not persisted in the state. Example from destroy which fails while trying to delete the network that was part of the config:

1 error occurred:                                                                                                                             
        * google_compute_subnetwork.default (destroy): 1 error occurred:                                                                      
        * google_compute_subnetwork.default: Error reading Subnetwork: googleapi: 
Error 400: The subnetwork resource 'projects/xxx/regions/us-central1/subnetworks/xxx' is already
being used by 'projects/xxx/zones/us-central1-a/instances/gke-xxx-sg7j', resourceInUseByAnotherResource   

@chrisst
Contributor Author

chrisst commented Jul 9, 2019

I haven't been able to find any examples in our test suite where a dangling cluster was left after a stockout. I have found a couple of examples of stockouts failing to create the cluster, but so far our cleanup logic has handled things correctly and removed the cluster.

Since the cleanup doesn't retry the delete call, I suspect that what is happening is that the call to clean up the cluster fails, at which point the cluster is still removed from state.

I've added handling for that condition but it's only speculative at this point. @nimahak if you see this again and are able to capture the debug log output it would help me confirm for sure.

@ghost

ghost commented Aug 10, 2019

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!

@ghost ghost locked and limited conversation to collaborators Aug 10, 2019
@github-actions github-actions bot added service/container forward/review In review; remove label to forward labels Jan 15, 2025