Node deleted but still streaming DeletingNode events #109

Closed
tomfotherby opened this issue Jun 8, 2017 · 1 comment

I installed a Kubernetes cluster on AWS and CoreOS hosts with Tack, and the cluster-autoscaler is included as an add-on. This is the yaml they use: https://github.com/kz8s/tack/blob/master/addons/autoscaler/cluster-autoscaler.yml (uses v0.5.2)

After a bit of time with a successful but empty cluster, the autoscaler kicked in and killed 1 of the 3 workers.

The node is no longer shown when doing kubectl get nodes.

The problem is that the deleted worker node is still generating DeletingNode events, thousands of them, along the lines of:

Deleting Node ip-10-56-0-138.ec2.internal because it's not present according to cloud provider

Example:

$ kubectl get events
LASTSEEN   FIRSTSEEN   COUNT     NAME                          KIND      SUBOBJECT   TYPE      REASON         SOURCE              MESSAGE
3s         6h          4780      ip-10-56-0-138.ec2.internal   Node                  Normal    DeletingNode   controllermanager   Node ip-10-56-0-138.ec2.internal event: Deleting Node ip-10-56-0-138.ec2.internal because it's not present according to cloud provider

(note: count: 4780!)
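
For reference, the event stream can be narrowed down to just this node and this reason with a plain grep (a sketch; newer kubectl releases can also filter events via --field-selector, but I haven't verified that flag on this cluster version):

$ kubectl get events | grep DeletingNode | grep ip-10-56-0-138.ec2.internal | wc -l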

Checking the configmap that the autoscaler creates shows that the removed worker node is still somehow registered, i.e.

  Nodes: Healthy (ready=5 unready=0 notStarted=0 longNotStarted=0 registered=6)

Is there a problem with the autoscaler? Is it supposed to unregister the node or is this normal?

Is there a way I can get more info about why the DeletingNode event is appearing so often? There must be a reason the node cannot be fully deleted. At one point a StatefulSet put a PV and PVC on the worker that was deleted - I'm not sure if this could cause an issue with it being unregistered. The PV and PVC were manually removed, with no luck curbing the continuing DeletingNode event stream.
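
Since the events list controllermanager as the source, the only other place I've thought to look is the kube-controller-manager logs (a sketch, assuming the controller-manager runs as a pod in kube-system as it does on this Tack cluster; the pod name below is a placeholder):

$ kubectl -n kube-system get pods | grep controller-manager
$ kubectl -n kube-system logs <controller-manager-pod> | grep ip-10-56-0-138.ec2.internal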

Sorry if this issue is not appropriate; feel free to close it if that's the case. (It's hard to tell whether this is a bug with the autoscaler or just my use-case.)


The config map in full:

$ kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml
apiVersion: v1
data:
  status: |+
    Cluster-autoscaler status at 2017-06-08 17:30:00.417692456 +0000 UTC:
    Cluster-wide:
      Health:      Healthy (ready=5 unready=0 notStarted=0 longNotStarted=0 registered=6)
                   LastProbeTime:      2017-06-08 17:29:59.812893761 +0000 UTC
                   LastTransitionTime: 2017-06-08 10:26:35.872670968 +0000 UTC
      ScaleUp:     NoActivity (ready=5 registered=6)
                   LastProbeTime:      2017-06-08 17:29:59.812893761 +0000 UTC
                   LastTransitionTime: 2017-06-08 10:26:35.872670968 +0000 UTC
      ScaleDown:   NoCandidates (candidates=0)
                   LastProbeTime:      2017-06-08 17:30:00.119227722 +0000 UTC
                   LastTransitionTime: 2017-06-08 10:46:54.809754422 +0000 UTC

    NodeGroups:
      Name:        worker-general-test
      Health:      Healthy (ready=2 unready=0 notStarted=0 longNotStarted=0 registered=2 cloudProviderTarget=2 (minSize=1, maxSize=5))
                   LastProbeTime:      2017-06-08 17:29:59.812893761 +0000 UTC
                   LastTransitionTime: 2017-06-08 10:26:35.872670968 +0000 UTC
      ScaleUp:     NoActivity (ready=2 cloudProviderTarget=2)
                   LastProbeTime:      2017-06-08 17:29:59.812893761 +0000 UTC
                   LastTransitionTime: 2017-06-08 10:26:35.872670968 +0000 UTC
      ScaleDown:   NoCandidates (candidates=0)
                   LastProbeTime:      2017-06-08 17:30:00.119227722 +0000 UTC
                   LastTransitionTime: 2017-06-08 10:46:54.809754422 +0000 UTC

kind: ConfigMap
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/last-updated: 2017-06-08 17:30:00.417692456 +0000
      UTC
  creationTimestamp: 2017-06-08T10:26:25Z
  name: cluster-autoscaler-status
  namespace: kube-system
  resourceVersion: "60900"
  selfLink: /api/v1/namespaces/kube-system/configmaps/cluster-autoscaler-status
  uid: ed1780d0-4c34-11e7-bb12-0afa88f15a64

tomfotherby commented Jun 9, 2017

I'm not 100% sure, but I think my problem is fixed by PR kubernetes/kubernetes#45923:

Fix log spam due to unnecessary status update when node is deleted.

Which I found in the Kubernetes v1.7.0-beta.1 CHANGELOG. So hopefully the fix is coming on 28/Jun/17.

frobware pushed a commit to frobware/autoscaler that referenced this issue Jul 9, 2019
…etter-delete-nodes-v2

UPSTREAM: <carry>: openshift: Rework logic in DeleteNodes()
yaroslava-serdiuk pushed a commit to yaroslava-serdiuk/autoscaler that referenced this issue Feb 22, 2024
Enable templates by creating ISSUE_TEMPLATE folder