I think NEG finalizers are making my namespaces take 10+ mins to delete #1720

Closed
red8888 opened this issue May 25, 2022 · 13 comments
Labels
kind/support Categorizes issue or PR as a support question. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

@red8888

red8888 commented May 25, 2022

It always takes a very long time to delete namespaces.

First I delete all resources in a namespace and confirm it's empty:

kubectl get all -n derps
No resources found in derps namespace.

Then I try to delete the namespace:
kubectl delete namespace derps

This hangs for 10+ minutes but eventually removes the namespace.

While the namespace is stuck in the Terminating phase, I see this:

apiVersion: v1
kind: Namespace
metadata:
  annotations:
    cnrm.cloud.google.com/project-id: derps
  creationTimestamp: "2022-05-24T19:43:34Z"
  deletionTimestamp: "2022-05-25T14:01:00Z"
  labels:
    kubernetes.io/metadata.name: derps
  name: derps
  resourceVersion: "235757709"
  uid: bd05b4d8-31da-4e76-9178-6dff37d52314
spec:
  finalizers:
  - kubernetes
status:
  conditions:
  - lastTransitionTime: "2022-05-25T14:01:07Z"
    message: All resources successfully discovered
    reason: ResourcesDiscovered
    status: "False"
    type: NamespaceDeletionDiscoveryFailure
  - lastTransitionTime: "2022-05-25T14:01:07Z"
    message: All legacy kube types successfully parsed
    reason: ParsedGroupVersions
    status: "False"
    type: NamespaceDeletionGroupVersionParsingFailure
  - lastTransitionTime: "2022-05-25T14:01:07Z"
    message: All content successfully deleted, may be waiting on finalization
    reason: ContentDeleted
    status: "False"
    type: NamespaceDeletionContentFailure
  - lastTransitionTime: "2022-05-25T14:01:07Z"
    message: 'Some resources are remaining: ingresses.extensions has 1 resource instances,
      ingresses.networking.k8s.io has 1 resource instances, servicenetworkendpointgroups.networking.gke.io
      has 1 resource instances'
    reason: SomeResourcesRemain
    status: "True"
    type: NamespaceContentRemaining
  - lastTransitionTime: "2022-05-25T14:01:07Z"
    message: 'Some content in the namespace has finalizers remaining: networking.gke.io/ingress-finalizer-V2
      in 2 resource instances, networking.gke.io/neg-finalizer in 1 resource instances'
    reason: SomeFinalizersRemain
    status: "True"
    type: NamespaceFinalizersRemaining
  phase: Terminating

It looks like the NEG finalizers are causing this. Is this normal? I see this on ALL deployments where I'm using GKE ingresses. I want to know if this is expected behavior, because it's quite clunky.
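
To pull out only the conditions that are currently holding up deletion, rather than reading the whole YAML, something like the following may help. It is a minimal sketch: `blocking_conditions` is a hypothetical helper name, and it assumes `kubectl` is already pointed at the cluster in question.

```shell
# Sketch: print every condition on a namespace whose status is "True".
# blocking_conditions is a hypothetical helper name, not part of kubectl.
blocking_conditions() {
  local ns="$1"
  # The jsonpath filter keeps only conditions with status "True" and prints
  # one "type: message" line per condition.
  kubectl get namespace "$ns" \
    -o jsonpath='{range .status.conditions[?(@.status=="True")]}{.type}{": "}{.message}{"\n"}{end}'
}
```

Running `blocking_conditions derps` against the namespace above would be expected to surface the NamespaceContentRemaining and NamespaceFinalizersRemaining entries, since those are the two conditions whose status is "True".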

@kundan2707
Contributor

/kind support

@k8s-ci-robot k8s-ci-robot added the kind/support Categorizes issue or PR as a support question. label Jun 2, 2022
@red8888
Author

red8888 commented Jun 13, 2022

Just to follow up: GCP support told me this is expected behavior and that I can't do anything to speed it up.

@swetharepakula
Member

swetharepakula commented Jun 15, 2022

Yes, this is expected behavior. The NEG finalizers exist to make sure that the ServiceNetworkEndpointGroup CRs (a status-only API) stay around until the respective NEGs in GCE have been deleted. The NEG GC loop runs every 2 minutes. NEG resources cannot be deleted while GCE resources such as BackendServices are still using them.

Most likely, the Ingress resources are still being cleaned up, so NEG deletion fails until all Ingress-related resources have been deleted; only then can the NEG resources be deleted.
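
To watch this ordering play out, it can help to look at the finalizers and deletion timestamps on the objects themselves. The following is a minimal sketch, not an official procedure: `show_finalizers` is a hypothetical helper name, and the resource names are taken from the namespace status in the original report.

```shell
# Sketch: show kind, name, deletionTimestamp and finalizers for the Ingress
# and ServiceNetworkEndpointGroup objects in a namespace.
# show_finalizers is a hypothetical helper name, not part of kubectl.
show_finalizers() {
  local ns="$1"
  kubectl get \
    ingresses.networking.k8s.io,servicenetworkendpointgroups.networking.gke.io \
    -n "$ns" \
    -o custom-columns='KIND:.kind,NAME:.metadata.name,DELETING_SINCE:.metadata.deletionTimestamp,FINALIZERS:.metadata.finalizers'
}
```

An object that has a deletion timestamp but still lists a finalizer is waiting on its controller, which matches the "may be waiting on finalization" message in the namespace status.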

The command kubectl get all -n derps does not show every resource in the namespace; it only shows a selected set of core K8s resources. In this case, Ingresses and ServiceNetworkEndpointGroups (a CRD) are not shown. For a more accurate picture of what exists in the namespace, try kubectl api-resources as described in kubernetes/kubectl#151 (comment).

kubectl api-resources --verbs=list --namespaced -o name \
  | xargs -n 1 kubectl get --show-kind --ignore-not-found -l <label>=<value> -n <namespace>

Namespace deletion will be faster if all Ingresses and NEGs are deleted before the namespace. If those resources are already gone, the namespace deletion should not hang. That said, deleting the namespace is the easier way to systematically clean up everything in it.

@swetharepakula
Member

/assign

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 14, 2022
@red8888
Author

red8888 commented Sep 15, 2022

Just to follow up, FYI: I have not experienced the behavior described here: "Namespace deletion will be faster if all Ingresses and Negs are deleted before the namespace."

I deploy with helm and run helm uninstall before removing the namespace. Even removing all resources first does not seem to speed up the namespace deletion. Maybe you need to wait a minute or two after deleting the ingress before deleting the namespace?

@swetharepakula
Member

@red8888, are you deleting the namespace right after all the NEG and Ingress resources are deleted or waiting for those resources to be completely deleted? A quick check is to run the following command to ensure that no SvcNeg CRs or Ingresses remain in the namespace before deleting the namespace.

kubectl api-resources --verbs=list --namespaced -o name \
  | xargs -n 1 kubectl get --show-kind --ignore-not-found -n <namespace>

As mentioned in the above comment, the problem is that GC takes time. When you delete the namespace, any resources that still carry finalizers will block the namespace deletion. Those finalizers are required, though, to ensure that the controllers are able to clean up the GCE resources they created.
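
The "wait until nothing remains, then delete the namespace" check can be scripted. The following is a minimal sketch under stated assumptions: `wait_until_gone`, its arguments, and the default timeout and interval are all hypothetical, and it only looks at the two resource types discussed in this thread.

```shell
# Sketch: poll until no Ingress or ServiceNetworkEndpointGroup objects remain.
# wait_until_gone, its arguments, and the defaults are hypothetical.
# Usage: wait_until_gone <namespace> [timeout_seconds] [interval_seconds]
wait_until_gone() {
  local ns="$1" timeout="${2:-900}" interval="${3:-10}" waited=0 remaining
  while :; do
    # -o name prints one line per remaining object and nothing when none exist.
    remaining="$(kubectl get \
      ingresses.networking.k8s.io,servicenetworkendpointgroups.networking.gke.io \
      -n "$ns" -o name)" || return 2   # kubectl itself failed
    [ -z "$remaining" ] && return 0    # nothing left, safe to proceed
    if [ "$waited" -ge "$timeout" ]; then
      printf 'still present:\n%s\n' "$remaining" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

A possible use is `wait_until_gone derps && kubectl delete namespace derps`. This does not make the GC loop itself any faster; it only avoids starting the namespace deletion while finalizers are still pending.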

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 15, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot closed this as not planned Won't fix, can't repro, duplicate, stale Nov 14, 2022
@antdking

antdking commented Sep 4, 2023

This is still an issue, and has resources lingering for much longer than the "2 minute loop" mentioned above

@radekcz

radekcz commented Dec 4, 2023

This is still an issue, and has resources lingering for much longer than the "2 minute loop" mentioned above

Same issue for me: when I try to delete a k8s namespace, it takes more than 8 minutes before the deletion completes, due to:

Some resources are remaining: servicenetworkendpointgroups.networking.gke.io has 1 resource instances

But all resources managed by Helm have been deleted successfully using the command:

helm uninstall

Would it be possible to make this deletion faster, e.g. by marking some k8s (GKE) resources as deleted?

@bkanuka

bkanuka commented Feb 26, 2024

Is this closed as expected behaviour? I have been waiting 20 minutes for a namespace to delete (which is quite a long time for an accidental delete 🙈 )
