unable to delete the flux-system namespace #67
Comments
You have to implement the uninstall logic from here: https://github.com/fluxcd/flux2/blob/main/cmd/flux/uninstall.go
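If you can run the flux CLI from the same place Terraform runs, that logic is exposed as the uninstall command. A minimal sketch (a plain CLI call, not something the provider does for you; flags may differ between flux versions):
# Roughly the teardown implemented in cmd/flux/uninstall.go: deletes the Flux
# custom resources (clearing their finalizers), the components, and the namespace.
flux uninstall --namespace=flux-system --silent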
So this is a tricky issue, as Terraform and finalizers do not work too well together right now. The reason this happens is that the controller that needs to remove the finalizer is deleted before the resource is, which blocks deletion of the namespace. This is a general problem with Terraform and Kubernetes and does not really have a great solution right now. A quick, not-so-nice solution is to manually remove the namespace from the Terraform state, as it will be removed from the cluster when you delete the cluster. If I come up with a better solution I will link to it here.
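A minimal sketch of that state-removal workaround, assuming the namespace is managed by a resource addressed as kubernetes_namespace.flux_system (check terraform state list for the real address in your module):
# Confirm the resource address first; the one below is only an assumption.
terraform state list | grep kubernetes_namespace
# Drop the namespace from state so Terraform no longer tries to delete it in-cluster.
terraform state rm kubernetes_namespace.flux_system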
Thanks for the details. Currently I am using the exact same idea to remove it, but it creates an issue if you want to recreate the flux namespace with the old name. To get a clean slate, I have to recreate the entire EKS cluster. Would it help if I used the pure flux command to create and destroy?
I'm facing the same issue as well. I haven't tested this yet, but another not-so-nice solution would be to use a null_resource with a local-exec provisioner to create the flux-system namespace. This would mean that you need to have the kubectl binary accessible from the Terraform module root, but it allows you to have a single execution and avoids manipulating the state.
I've developed this gritty workaround, which seems to work reliably well for us. Flux CRDs, any workloads managed by Flux, and the namespace are all removed gracefully on destroy:
resource "null_resource" "flux_namespace" {
triggers = {
namespace = local.namespace
kubeconfig = var.kubeconfig_path # Variables cannot be accessed by destroy-phase provisioners, only the 'self' object (including triggers)
}
provisioner "local-exec" {
command = "kubectl --kubeconfig ${self.triggers.kubeconfig} create namespace ${self.triggers.namespace}"
}
/*
Marking the flux-system namespace for deletion will cause finalizers to be applied to any Flux CRDs in use. The finalizing controllers, however, have already been deleted, causing the namespace and CRDs to get stuck 'terminating'.
After marking the namespace for deletion, wait an arbitrary amount of time for the cascade delete to remove workloads managed by Flux.
Finally, remove any finalizers from the Flux CRDs, allowing these and the namespace to transition from 'terminating' and actually be deleted.
*/
provisioner "local-exec" {
when = destroy
command = "kubectl --kubeconfig ${self.triggers.kubeconfig} delete namespace ${self.triggers.namespace} --cascade=true --wait=false && sleep 120"
}
provisioner "local-exec" {
when = destroy
command = "kubectl --kubeconfig ${self.triggers.kubeconfig} patch customresourcedefinition helmcharts.source.toolkit.fluxcd.io helmreleases.helm.toolkit.fluxcd.io helmrepositories.source.toolkit.fluxcd.io kustomizations.kustomize.toolkit.fluxcd.io -p '{\"metadata\":{\"finalizers\":null}}'"
on_failure = continue
}
}
Well, after what feels like a hundred trials and errors, unfortunately it appears the Flux-managed workloads do not reliably get removed. The namespace and CRDs do, however, and Terraform doesn't fail, so we can still use the workaround for our automated QA. Can anyone spot flaws in the workaround? I can't say I understand exactly what the referenced uninstall routine does.
Do you mean that you are able to remove the namespace and Kustomization resources, but the resources deployed by, for example, kustomize-controller (KC) still exist?
@phillebaba Exactly. Though in several previous tests today, it also removed these workloads (podinfo and FluentD specifically).
So that becomes a tricky problem, as we would need to introduce some ordering when deleting resources from the cluster. Garbage collection is done by the individual controllers, so if you delete KC or HC before a Kustomization or HelmRelease is deleted, the related resources will not be garbage collected. This means that the flux-system Kustomization and GitRepository have to be deleted first, and then we would have to wait for the controllers to clean up everything that derives from them. This might be an impossible problem to solve, but I can understand that one would expect everything to be cleaned up when the resources are removed.
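A hedged sketch of that ordering, assuming the default flux-system Kustomization and GitRepository names from a standard bootstrap (adjust the names to your setup):
# Delete the root Kustomization first and wait, so kustomize-controller can
# garbage-collect the workloads it created while it is still running.
kubectl -n flux-system delete kustomization flux-system --wait
# Then delete the GitRepository source.
kubectl -n flux-system delete gitrepository flux-system --wait
# Only after the controllers have pruned everything downstream should the
# controllers, CRDs and namespace themselves be removed.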
Sorry, I misunderstood. All the Flux controllers are consistently removed, as are the CRDs. As far as I can tell, only the workloads deployed via Flux are not (consistently) deleted upon removal. But I could trigger a cleanup of the workload resources by cascade deleting the
This will, for the time being, be a limitation that we will have to live with until Terraform becomes more intelligent about the deletion order of resources. I can't really see how we would solve this otherwise. If you come up with a solution I would love to hear from you.
I'm running into this as well. Is this something useful? https://medium.com/@craignewtondev/how-to-fix-kubernetes-namespace-deleting-stuck-in-terminating-state-5ed75792647e
@tachang That's pretty much what this bit from my code above does. It clears the finalizers for the CRDs, but could be modified to apply to the namespace if needed:
provisioner "local-exec" {
when = destroy
command = "kubectl --kubeconfig ${self.triggers.kubeconfig} patch customresourcedefinition helmcharts.source.toolkit.fluxcd.io helmreleases.helm.toolkit.fluxcd.io helmrepositories.source.toolkit.fluxcd.io kustomizations.kustomize.toolkit.fluxcd.io -p '{\"metadata\":{\"finalizers\":null}}'"
on_failure = continue
}
Patching it out seems correct, as it cleared things for me, but local-exec seems like a hack. I don't exactly have kubectl available; I just have the provisioners. Is there a way to do it without using local-exec?
Sadly no, the issue derives from respecting finalizers and resources created by the controllers. To do this we would need to build dependency logic that currently just is not possible with Terraform. My solution when deleting a cluster with Terraform has always been to just remove all Kubernetes resources from state, as it is generally unreliable otherwise.
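A rough sketch of that bulk state removal, assuming all in-cluster objects are managed through the kubernetes provider and so share the kubernetes_ address prefix (review the list before removing anything; addresses with for_each keys may need extra quoting):
# List what the kubernetes provider manages, then drop it all from state.
# terraform state rm accepts multiple addresses in one invocation.
terraform state list | grep 'kubernetes_' | xargs terraform state rm
# Destroying the cluster afterwards removes the actual in-cluster objects.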
Facing the same issue |
Facing the same issue |
The article posted by @tachang (thank you, good sir!) outlines the procedure to alleviate this issue; however, I did not test it on Terraform-created Flux but rather one which was set up using the
I hope this helps anyone else stuck in this position!
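For reference, the core of that article's procedure is roughly the standard fix for a namespace stuck terminating: dump the namespace, remove the kubernetes entry from spec.finalizers, and push it back through the finalize subresource (namespace name assumed to be flux-system):
kubectl get namespace flux-system -o json > flux-system.json
# Edit flux-system.json and remove "kubernetes" from spec.finalizers, then:
kubectl replace --raw "/api/v1/namespaces/flux-system/finalize" -f ./flux-system.json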
For those who are using the flux provider with Terraform, is it agreed at this point that there is no way of having an
I also added gitrepositories.source.toolkit.fluxcd.io to the patch command. The final result would then look like this:
provisioner "local-exec" {
when = destroy
command = "kubectl patch customresourcedefinition helmcharts.source.toolkit.fluxcd.io helmreleases.helm.toolkit.fluxcd.io helmrepositories.source.toolkit.fluxcd.io kustomizations.kustomize.toolkit.fluxcd.io gitrepositories.source.toolkit.fluxcd.io -p '{\"metadata\":{\"finalizers\":null}}'"
on_failure = continue
}
This one-line command also works for me:
We are using Terraform to instantiate the Flux operator. These steps were tried:
resource "kubernetes_namespace" "flux_system" {
metadata {
name = var.flux-namespace
}
/*
lifecycle {
ignore_changes = [
metadata[0].labels,
]
}
*/
}
module.eks-flux-controller.kubernetes_namespace.flux_system: Still destroying... [id=flux-system, 10s elapsed]
module.eks-flux-controller.kubernetes_namespace.flux_system[0]: Still destroying... [id=flux-system, 20s elapsed]
module.eks-flux-controller.kubernetes_namespace.flux_system[0]: Still destroying... [id=flux-system, 30s elapsed]
On checking the EKS cluster via kubectl we found this:
kubectl api-resources --verbs=list --namespaced -o name | xargs -n 1 kubectl get --show-kind --ignore-not-found -n flux-system
NAME READY STATUS AGE
kustomization.kustomize.toolkit.fluxcd.io/flux-system True Applied revision: main/2c0fdb975aefe40f26eb81bb73a649458c2e1c4e 83m
--
kubectl get ns
NAME STATUS AGE
default Active 3h3m
flux-system Terminating 84m
kube-node-lease Active 3h3m
kube-public Active 3h3m
kube-system Active 3h3m
And this CRD (kustomizations.kustomize.toolkit.fluxcd.io) still exists:
kubectl get crd
NAME CREATED AT
eniconfigs.crd.k8s.amazonaws.com 2020-12-23T17:05:28Z
kustomizations.kustomize.toolkit.fluxcd.io 2020-12-23T18:43:54Z
securitygrouppolicies.vpcresources.k8s.aws 2020-12-23T17:05:32Z
Please help with a clean deletion. We tried force options, but nothing seems to work except deleting and recreating the EKS cluster. Let me know if you need any other screenshots or logs.
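Based on the earlier workarounds in this thread, one thing worth trying in this state (a suggestion, not a verified fix) is clearing the finalizer on the leftover Kustomization so the namespace can finish terminating:
# Remove the finalizer from the stuck Kustomization; the namespace should then
# be able to complete its own termination.
kubectl -n flux-system patch kustomization flux-system --type=merge -p '{"metadata":{"finalizers":null}}'
# If it is still stuck, the CRD-level patch from earlier in the thread can be
# applied to kustomizations.kustomize.toolkit.fluxcd.io as well.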