
Terraform: Helm module should remove itself at destroy time before cluster deletion #1422

Closed
markmandel opened this issue Mar 19, 2020 · 12 comments
Labels
- area/operations: Installation, updating, metrics etc.
- kind/feature: New features for Agones
- obsolete
- wontfix: Sorry, but we're not going to do that.

Comments

@markmandel
Collaborator

Is your feature request related to a problem? Please describe.
This is particularly frustrating with GKE; I'm not sure how it is with other providers.

When you delete a GKE cluster, if there is a Service set up, the firewall rules and load balancers are left in place and aren't deleted. So you can sometimes hit quota limits and/or incur extra charges for load balancers, external IPs, etc. that you aren't using.

Describe the solution you'd like

What I would suggest is that when a destroy event occurs on a cluster, the Terraform Helm module should run a helm delete --purge on the installed chart before the cluster is removed, to ensure this gets cleaned up.

See https://www.terraform.io/docs/provisioners/index.html#destroy-time-provisioners for the hooks to implement this.
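A minimal sketch of what such a destroy-time hook could look like (resource and release names are illustrative, Helm 2 syntax as was current for this issue):

```hcl
resource "helm_release" "agones" {
  # ... chart, repository, and values as in the existing module ...

  # Destroy-time provisioner: runs before Terraform destroys this
  # resource, giving the GCE controller a chance to tear down the
  # load balancers and firewall rules the chart's Services created.
  provisioner "local-exec" {
    when    = destroy
    command = "helm delete --purge agones" # Helm 2 syntax
  }
}
```

As the comments below show, this naive form can race with the provider's own destroy of the release.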

Describe alternatives you've considered

Writing a bash script to clean up orphaned resources, but load balancers in GCP are a combination of various other resources, and it gets complicated 😕

Additional context
https://github.com/pantheon-systems/kube-gce-cleanup
kubernetes/ingress-gce#136

@markmandel markmandel added kind/feature New features for Agones area/operations Installation, updating, metrics etc labels Mar 19, 2020
@aLekSer
Collaborator

aLekSer commented Mar 20, 2020

Relates to #1403 .

@aLekSer
Collaborator

aLekSer commented Mar 23, 2020

First of all, the solution is already mentioned in this repo:
https://github.com/GoogleCloudPlatform/terraform-google-examples/tree/master/example-gke-k8s-helm#cleanup

Delete the nginx-ingress helm release first so that the forwarding rule and firewall rule are cleaned up by the GCE controller

I have added this into my PR #1375 and tested that no leftovers could be found in the Firewall rules and Forwarding rules (load balancer) tabs.
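In shell terms, the documented cleanup order from the linked example is simply (Helm 2 syntax, nginx-ingress release name per that example):

```shell
# Delete the chart's release first, so the GCE controller removes the
# forwarding rule and firewall rule it created for the Service...
helm delete --purge nginx-ingress
# ...then destroy the cluster and the rest of the infrastructure.
terraform destroy
```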

@aLekSer
Collaborator

aLekSer commented Mar 23, 2020

I have tried using a provisioner:

resource "helm_release" "agones" {
  name = "agones"
  ...
  provisioner "local-exec" {
    when    = destroy
    command = "helm delete --purge agones"
  }
}

And it does run helm delete, but it then fails: after the provisioner succeeds, Terraform goes on to destroy the actual helm_release resource and can no longer find the agones release:

module.helm_agones.helm_release.agones: Refreshing state... [id=agones]
module.gke_cluster.null_resource.test-setting-variables: Destroying... [id=2666900790194419227]                                                                                                          
module.gke_cluster.null_resource.test-setting-variables: Destruction complete after 0s                                                                                                                   
module.gke_cluster.google_compute_firewall.default: Destroying... [id=game-server-firewall-firewall-test-c]                                                                                              
module.helm_agones.helm_release.agones: Destroying... [id=agones]                                                                                                                                        
module.helm_agones.helm_release.agones: Provisioning with 'local-exec'...                                                                                                                                
module.helm_agones.helm_release.agones (local-exec): Executing: ["/bin/sh" "-c" "echo 'Destroy' && helm delete --purge agones"]                                                                          
module.helm_agones.helm_release.agones (local-exec): Destroy                                                                                                                                             
module.gke_cluster.google_compute_firewall.default: Destruction complete after 9s                                                                                                                        
module.helm_agones.helm_release.agones: Still destroying... [id=agones, 10s elapsed]                                                                                                                     
module.helm_agones.helm_release.agones (local-exec): release "agones" deleted                                                                                                                            
module.helm_agones.helm_release.agones: Still destroying... [id=agones, 20s elapsed]                                                                                                                     
                                                                                                                                                                                                         
Error: rpc error: code = Unknown desc = release: "agones" not found   

@markmandel
Collaborator Author

/cc @chrisst any thoughts on this?

Is there something erroneous with our helm approach?

@markmandel
Collaborator Author

markmandel commented Mar 30, 2020

Weird that it fails when you do that, @aLekSer -- if I run a script to delete all Helm releases before destroying everything else, it's fine.

A couple of theories:

  1. The helm delete operation is async by default, so it's actually running, but Terraform isn't waiting for it to finish before deleting the cluster?
  2. As a thought: if you run command = "helm delete agones" (no --purge) on destroy like above, would it leave behind an empty Helm release that is then still available for TF to delete? Might be worth a shot.

@chrisst
Contributor

chrisst commented Mar 30, 2020

Unfortunately I'm not very experienced with mixing Helm and Terraform, so my thoughts are more educated guesses at this point. I don't think using a local-exec provisioner is heading down the correct path. If a Terraform resource, in this case the Helm release, is removed or modified by an external process (local-exec), it is almost always going to be problematic for Terraform. Cleaning up after a resource should be handled by the resource's destroy call, so in this case the helm_release should be cleaning up the resources it created. Otherwise, to me, it's the equivalent of shelling out to gcloud to delete a GCP resource because the resource's destroy is buggy.

It does look like helm should be doing an uninstall on destroy, so if there are dangling resources after that call happens, I would think it's a bug in the cleanup of the helm_release.

I suspect it's failing because Terraform is trying to delete the Helm release, which has already been removed through the provisioner call. You can try looking at the debug logs for more information.
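For those debug logs, a sketch using Terraform's standard logging environment variables:

```shell
# Capture provider-level detail about the failing destroy to a file.
TF_LOG=DEBUG TF_LOG_PATH=./destroy-debug.log terraform destroy
```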

@akremsa
Contributor

akremsa commented Apr 7, 2020

@aLekSer and I tried the following approaches:

  1. Adding
  provisioner "local-exec" {
    when    = "destroy"
    command = "helm delete agones"
  }

to resource "helm_release" "agones"

Result:
There is the following error during the destroy step:

module.helm_agones.helm_release.agones: Provisioning with 'local-exec'...
module.helm_agones.helm_release.agones (local-exec): Executing: ["/bin/sh" "-c" "helm delete agones"]
module.helm_agones.helm_release.agones (local-exec): Error: Get https://35.228.125.186/api/v1/namespaces/kube-system/pods?labelSelector=app%3Dhelm%2Cname%3Dtiller: error executing access token command "/opt/google-cloud-sdk/bin/gcloud config config-helper --format=json": err=fork/exec /opt/google-cloud-sdk/bin/gcloud: no such file or directory output= stderr=
module.gke_cluster.google_compute_firewall.default: Destruction complete after 8s
Warning: Quoted keywords are deprecated
on ../../../install/terraform/modules/gke/cluster.tf line 134, in resource "google_container_cluster" "primary":
134: when = "destroy"

  2. Adding
  provisioner "local-exec" {
    when    = "destroy"
    command = "echo 'Destroy-time provisioner' && sleep 60 && echo 'done'"
  }

to resource "google_container_cluster" "primary" in install/terraform/modules/gke/cluster.tf

Result:
Everything is deleted except the ingress.

@aLekSer aLekSer self-assigned this Aug 18, 2020
@advissor

Hi,
I got a similar issue with an EKS cluster and randomly found this topic.
For me, everything provisioned inside the Kubernetes cluster was only attempted to be destroyed after the cluster was removed.
This left ingresses, security groups, and helm_releases yelling.

So the fix was, prior to destroy, to run: terraform refresh
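The workaround sketched here, assuming state has drifted after the cluster was removed, is just:

```shell
# Reconcile state with reality first, so resources that already
# disappeared along with the cluster are dropped from state...
terraform refresh
# ...then destroy what actually remains.
terraform destroy
```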

@charliesmith-mindera

Hi,

I also found a similar issue, where the helm resource deletion was not completing before the namespace was removed. In my case the only thing being deployed to the namespace was the helm chart handled by the helm_release resource, so I added the below, which then meant it deleted cleanly, rather than creating the namespace separately.
resource "helm_release" "prometheus_install" {
  name             = "prometheus-for-amp"
  repository       = "https://prometheus-community.github.io/helm-charts"
  chart            = "prometheus"
  create_namespace = true
  namespace        = var.prometheus_namespace
}

@github-actions

github-actions bot commented Jun 1, 2023

This issue is marked as Stale due to inactivity for more than 30 days. To avoid being marked as 'stale', please add the 'awaiting-maintainer' label or add a comment. Thank you for your contributions.

@github-actions github-actions bot added the stale Pending closure unless there is a strong objection. label Jun 1, 2023
@github-actions

This issue is marked as obsolete due to inactivity for the last 60 days. To avoid the issue getting closed in the next 30 days, please add a comment or add the 'awaiting-maintainer' label. Thank you for your contributions.

@github-actions github-actions bot added obsolete and removed stale Pending closure unless there is a strong objection. labels Jul 15, 2023
@github-actions github-actions bot added the wontfix Sorry, but we're not going to do that. label Aug 15, 2023
@github-actions

We are closing this as there was no activity in this issue for the last 90 days. Please reopen if you’d like to discuss anything further.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Aug 15, 2023