
unable to delete the flux-system namespace #67

Closed
neeraj-a-22 opened this issue Dec 23, 2020 · 22 comments

@neeraj-a-22

We are using Terraform to instantiate the Flux operator. These steps were tried:

  • Used Terraform to create the namespace "flux-system" following the documentation example. flux_install is able to create all the requisite pods and CRDs. After creation, terraform destroy was run to delete the flux_install pods; as expected, this also deleted the Flux namespace. This is our resource definition:
    resource "kubernetes_namespace" "flux_system" {
    metadata {
    name = var.flux-namespace
    }
    /*
    lifecycle {
    ignore_changes = [
    metadata[0].labels,
    ]
    }
    */

}

  • Added flux_sync to the mix. With Terraform it created all the connections to Git and works fine, but when we run terraform destroy, everything is deleted except the namespace, which gets stuck:

module.eks-flux-controller.kubernetes_namespace.flux_system: Still destroying... [id=flux-system, 10s elapsed]
module.eks-flux-controller.kubernetes_namespace.flux_system[0]: Still destroying... [id=flux-system, 20s elapsed]
module.eks-flux-controller.kubernetes_namespace.flux_system[0]: Still destroying... [id=flux-system, 30s elapsed]

On checking the EKS cluster via kubectl, we found this:
kubectl api-resources --verbs=list --namespaced -o name | xargs -n 1 kubectl get --show-kind --ignore-not-found -n flux-system
NAME READY STATUS AGE
kustomization.kustomize.toolkit.fluxcd.io/flux-system True Applied revision: main/2c0fdb975aefe40f26eb81bb73a649458c2e1c4e 83m

--
kubectl get ns
NAME STATUS AGE
default Active 3h3m
flux-system Terminating 84m
kube-node-lease Active 3h3m
kube-public Active 3h3m
kube-system Active 3h3m

And this CRD (kustomizations.kustomize.toolkit.fluxcd.io) still exists:
kubectl get crd
NAME CREATED AT
eniconfigs.crd.k8s.amazonaws.com 2020-12-23T17:05:28Z
kustomizations.kustomize.toolkit.fluxcd.io 2020-12-23T18:43:54Z
securitygrouppolicies.vpcresources.k8s.aws 2020-12-23T17:05:32Z

Please help with a clean deletion. We tried force options, but nothing seems to work except deleting and recreating the EKS cluster. Let me know if you need any other screenshots or logs.

@stefanprodan
Member

You have to implement the uninstall logic from here https://github.com/fluxcd/flux2/blob/main/cmd/flux/uninstall.go

@neeraj-a-22
Author

neeraj-a-22 commented Dec 25, 2020

Thanks for replying. I tried this, but it was not able to delete the CRDs; it gets stuck, and the flux-system namespace shows as Terminating but never gets cleared. For your info, I am trying this with AWS EKS.
(attached screenshot: Screen Shot 2020-12-24 at 5.01.42 PM)

flux --kubeconfig /tmp/kubeconfig_eks117-playbx-us-west-2 uninstall --dry-run --namespace=flux-system
► uninstalling custom resources
No resources found
No resources found
No resources found
kustomization.kustomize.toolkit.fluxcd.io "flux-system" deleted (server dry run)
► uninstalling components
No resources found
namespace "flux-system" deleted (server dry run)
✔ uninstall finished
flux --kubeconfig /tmp/kubeconfig_eks117-playbx-us-west-2 uninstall --resources --crds --namespace=flux-system
Are you sure you want to delete the flux-system namespace: y
► uninstalling custom resources
No resources found
No resources found
No resources found
kustomization.kustomize.toolkit.fluxcd.io "flux-system" deleted

Is there a way to force-uninstall this "kustomization.kustomize.toolkit.fluxcd.io"? I have already tried force commands and they don't delete it. Just for your info, the GitHub repo has no other customizations; it's a pure Flux v2 installation and destroy.

@phillebaba
Member

This is a tricky issue, as Terraform and finalizers do not work too well together right now. The reason this happens is that the controller responsible for removing the finalizer is deleted before the resource is, which blocks the deletion of the namespace.

This is a general problem with Terraform and Kubernetes, and it does not really have a great solution right now. A quick, not-so-nice solution is to manually remove the namespace from the Terraform state, as it will be removed from the cluster when you delete the cluster.
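
For reference, a minimal sketch of that state removal, using the resource address from the destroy output above (adjust it to match your own state):

terraform state rm 'module.eks-flux-controller.kubernetes_namespace.flux_system[0]'

After that, terraform destroy no longer waits on the namespace, and the object is left to be cleaned up when the cluster itself is deleted.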

If I come up with a better solution I will link to it here.

@neeraj-a-22
Author

Thanks for the details. Currently I am using the exact same idea to remove it, but this creates an issue if you want to recreate the Flux namespace with the old name. To get a clean slate, I have to recreate the entire EKS cluster.

Will it help if I use the pure flux CLI to create and destroy?

@rogermakram

I'm facing the same issue as well. I haven't tested this yet, but another not-so-nice solution would be to use a null_resource (from the null provider) with a local-exec provisioner to create the flux-system namespace. This would mean that you need to have the kubectl binary accessible from the Terraform module root, but it allows you to have a single execution and avoids manipulating the state.

@abstrask

I've developed this gritty workaround, which seems to work reliably for us. The Flux CRDs, any workloads managed by Flux, and the namespace all appear to be removed gracefully on destroy:

resource "null_resource" "flux_namespace" {
  triggers = {
    namespace  = local.namespace
    kubeconfig = var.kubeconfig_path # Variables cannot be accessed by destroy-phase provisioners, only the 'self' object (including triggers)
  }

  provisioner "local-exec" {
    command = "kubectl --kubeconfig ${self.triggers.kubeconfig} create namespace ${self.triggers.namespace}"
  }

  /*
  Marking the flux-system namespace for deletion will cause finalizers to be applied to any Flux CRDs in use. The controllers that would process those finalizers, however, have already been deleted, causing the namespace and CRDs to get stuck in 'Terminating'.

  After marking the namespace for deletion, wait an arbitrary amount of time for the cascade delete to remove the workloads managed by Flux.

  Finally, remove any finalizers from the Flux CRDs, allowing these and the namespace to transition out of 'Terminating' and actually be deleted.
  */

  provisioner "local-exec" {
    when       = destroy
    command    = "kubectl --kubeconfig ${self.triggers.kubeconfig} delete namespace ${self.triggers.namespace} --cascade=true --wait=false && sleep 120"
  }

  provisioner "local-exec" {
    when       = destroy
    command    = "kubectl --kubeconfig ${self.triggers.kubeconfig} patch customresourcedefinition helmcharts.source.toolkit.fluxcd.io helmreleases.helm.toolkit.fluxcd.io helmrepositories.source.toolkit.fluxcd.io kustomizations.kustomize.toolkit.fluxcd.io -p '{\"metadata\":{\"finalizers\":null}}'"
    on_failure = continue
  }
}

@abstrask

abstrask commented Jan 22, 2021

Well, after what feels like a hundred rounds of trial and error, it unfortunately appears that the Flux-managed workloads do not reliably get removed. The namespace and CRDs do, however, and Terraform doesn't fail, so we can still use the workaround for our automated QA.

Can anyone spot flaws in the workaround? I can't say I understand exactly what the referenced uninstall routine does.

@phillebaba
Member

Do you mean that you are able to remove the namespace and Kustomization resources, but the resources deployed by, for example, kustomize-controller (KC) still exist?

@abstrask

@phillebaba Exactly. Though in several previous tests today, it also removed these workloads (podinfo and FluentD specifically).

@phillebaba
Member

That becomes a tricky problem, as we would need to introduce some ordering when deleting resources from the cluster. Garbage collection is done by the individual controllers, so if you delete KC or HC (helm-controller) before a Kustomization or HelmRelease is deleted, the related resources will not be garbage collected. This means the flux-system Kustomization and GitRepository have to be deleted first, and then we would have to wait for the controllers to clean up everything that derives from them. This might be an impossible problem to solve, but I can understand that one would expect everything to be cleaned up when the resources are removed.

@abstrask

abstrask commented Jan 22, 2021

Sorry, I misunderstood. All the Flux controllers are consistently removed. As are the CRDs.

As far as I can tell, only the workloads deployed via Flux are not (consistently) deleted upon removal.

Earlier today, though, I could trigger a cleanup of the workload resources by cascade-deleting the flux-system namespace; now they seem to get left behind 🤷.

@phillebaba
Member

For the time being this will be a limitation we have to live with, until Terraform becomes more intelligent about the deletion order of resources. I can't really see how we would solve this otherwise. If you come up with a solution, I would love to hear from you.

@tachang

tachang commented Mar 29, 2021

I'm running into this as well. Is this something useful? https://medium.com/@craignewtondev/how-to-fix-kubernetes-namespace-deleting-stuck-in-terminating-state-5ed75792647e

@abstrask

@tachang That's pretty much what this bit from my code above does. It clears the finalizers for the CRDs, but it could be modified to apply to the namespace if needed.

  provisioner "local-exec" {
    when       = destroy
    command    = "kubectl --kubeconfig ${self.triggers.kubeconfig} patch customresourcedefinition helmcharts.source.toolkit.fluxcd.io helmreleases.helm.toolkit.fluxcd.io helmrepositories.source.toolkit.fluxcd.io kustomizations.kustomize.toolkit.fluxcd.io -p '{\"metadata\":{\"finalizers\":null}}'"
    on_failure = continue
  }
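
For the namespace itself, a rough sketch of that modification might look like the following (untested; namespaces keep a 'kubernetes' finalizer under spec.finalizers, so this clears it through the /finalize subresource rather than a plain patch, and it assumes the same triggers as the null_resource above):

  provisioner "local-exec" {
    when       = destroy
    # Hypothetical sketch: strip the 'kubernetes' finalizer via the namespace /finalize subresource
    command    = "kubectl --kubeconfig ${self.triggers.kubeconfig} get namespace ${self.triggers.namespace} -o json | sed 's/\"kubernetes\"//' | kubectl --kubeconfig ${self.triggers.kubeconfig} replace --raw /api/v1/namespaces/${self.triggers.namespace}/finalize -f -"
    on_failure = continue
  }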

@tachang

tachang commented Mar 31, 2021

Patching out the finalizers seems correct, as it cleared things for me, but local-exec seems like a hack. I don't exactly have kubectl available; I just have the provisioners. Is there a way to do it without using local-exec?

@phillebaba
Member

Sadly, no. The issue derives from respecting finalizers and the resources created by the controllers; to handle this properly we would need dependency logic that currently just is not possible with Terraform. My solution when deleting a cluster with Terraform has always been to just remove all Kubernetes resources from the state, as it is generally unreliable otherwise.
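
As a rough sketch of that bulk removal (an assumption on my part, not an official workflow; it relies on every Kubernetes resource address in the state containing "kubernetes_"):

# Remove all kubernetes_* resources from state so destroy skips them
terraform state list | grep 'kubernetes_' | xargs -n 1 terraform state rm

The objects stay in the cluster and disappear when the cluster itself is deleted, which matches the behaviour described above.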

@cmd-werner-diers

Facing the same issue

@nuriel77

Facing the same issue

@bartekus

The article posted by @tachang (thank you, good sir!) outlines the procedure to alleviate this issue; however, I did not test it on a Terraform-created Flux installation, but rather on one set up using the https://github.com/fluxcd/flux2-kustomize-helm-example repo.
This is how I applied it to remove a flux-system namespace stuck in the Terminating state.

  • Dump the descriptor as JSON to a file
kubectl get namespace flux-system -o json > flux-system.json
  • Edit flux-system.json and remove kubernetes from the finalizers array

From this:

...
"spec": {
        "finalizers": [
            "kubernetes"
        ]
    },
...

To this:

...
"spec": {
        "finalizers": []
    },
...
  • Execute the cleanup command
kubectl replace --raw "/api/v1/namespaces/flux-system/finalize" -f ./flux-system.json

I hope this helps anyone else stuck in this position!
Happy Coding!

@ChrisJBurns
Contributor

For those who are using the Flux provider with Terraform: is it agreed at this point that there is no way of having an apply and destroy that is idempotent each time, in the sense that it can create the cluster with Flux installed and then, when needed, destroy it all in good order? Am I correct in saying that if any destroy is done, there will have to be workarounds to clean up the remaining resources?

@theartusz

@tachang That's pretty much what this bit from my code above does. It clears the finalizers for the CRDs, but it could be modified to apply to the namespace if needed.

  provisioner "local-exec" {
    when       = destroy
    command    = "kubectl --kubeconfig ${self.triggers.kubeconfig} patch customresourcedefinition helmcharts.source.toolkit.fluxcd.io helmreleases.helm.toolkit.fluxcd.io helmrepositories.source.toolkit.fluxcd.io kustomizations.kustomize.toolkit.fluxcd.io -p '{\"metadata\":{\"finalizers\":null}}'"
    on_failure = continue
  }

I also added gitrepositories.source.toolkit.fluxcd.io to the list of CRDs in the provisioner.
I observed that the customresourcecleanup.apiextensions.k8s.io finalizer was sometimes added to the gitrepositories.source.toolkit.fluxcd.io CRD, which again blocked the namespace from being deleted.

The final result would then look like this:

  provisioner "local-exec" {
    when       = destroy
    command    = "kubectl patch customresourcedefinition helmcharts.source.toolkit.fluxcd.io helmreleases.helm.toolkit.fluxcd.io helmrepositories.source.toolkit.fluxcd.io kustomizations.kustomize.toolkit.fluxcd.io gitrepositories.source.toolkit.fluxcd.io -p '{\"metadata\":{\"finalizers\":null}}'"
    on_failure = continue
  }

@Kevinwoolworth

(quoting @bartekus's finalizer-removal procedure above)

This one-line command also works for me:

kubectl get namespace flux-system -o json  | sed 's/\"kubernetes\"//'  | kubectl replace --raw /api/v1/namespaces/flux-system/finalize -f -
