Node deleted but still streaming DeletingNode events #109

Closed
tomfotherby opened this issue Jun 8, 2017 · 1 comment

I installed a Kubernetes cluster on AWS and CoreOS hosts with Tack, and the cluster-autoscaler is included as an add-on. This is the yaml they use: https://github.com/kz8s/tack/blob/master/addons/autoscaler/cluster-autoscaler.yml (uses v0.5.2)

After a bit of time with a successful but empty cluster, the autoscaler kicked in and killed 1 of the 3 workers.

The node is no longer shown when doing kubectl get nodes.

The problem is that the deleted worker node is still generating DeletingNode events, thousands of them, along the lines of:

Deleting Node ip-10-56-0-138.ec2.internal because it's not present according to cloud provider

Example:

$ kubectl get events
LASTSEEN   FIRSTSEEN   COUNT     NAME                          KIND      SUBOBJECT   TYPE      REASON         SOURCE              MESSAGE
3s         6h          4780      ip-10-56-0-138.ec2.internal   Node                  Normal    DeletingNode   controllermanager   Node ip-10-56-0-138.ec2.internal event: Deleting Node ip-10-56-0-138.ec2.internal because it's not present according to cloud provider

(note: count: 4780!)
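
For reference, the event stream can be narrowed down to just this node and this reason with a plain grep (a sketch; newer kubectl releases can also filter events via --field-selector, but I haven't verified that flag on this cluster version):

$ kubectl get events | grep DeletingNode | grep ip-10-56-0-138.ec2.internal | wc -l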

Checking the configmap that the autoscaler creates shows that the removed worker node is still somehow registered, i.e.

  Nodes: Healthy (ready=5 unready=0 notStarted=0 longNotStarted=0 registered=6)

Is there a problem with the autoscaler? Is it supposed to unregister the node or is this normal?

Is there a way I can get more info about why the DeletingNode event is appearing so often? There must be a reason the node cannot be fully deleted. At one point a StatefulSet put a PV and PVC on the worker that was deleted - I'm not sure if this could cause an issue with it being unregistered. The PV and PVC were manually removed, with no luck curbing the continuing DeletingNode event stream.
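
Since the events list controllermanager as the source, the only other place I've thought to look is the kube-controller-manager logs (a sketch, assuming the controller-manager runs as a pod in kube-system as it does on this Tack cluster; the pod name below is a placeholder):

$ kubectl -n kube-system get pods | grep controller-manager
$ kubectl -n kube-system logs <controller-manager-pod> | grep ip-10-56-0-138.ec2.internal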

Sorry if this issue is not appropriate; feel free to close it if that's the case. (It's hard to tell whether this is a bug with the autoscaler or just my use-case.)


The config map in full:

$ kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml
apiVersion: v1
data:
  status: |+
    Cluster-autoscaler status at 2017-06-08 17:30:00.417692456 +0000 UTC:
    Cluster-wide:
      Health:      Healthy (ready=5 unready=0 notStarted=0 longNotStarted=0 registered=6)
                   LastProbeTime:      2017-06-08 17:29:59.812893761 +0000 UTC
                   LastTransitionTime: 2017-06-08 10:26:35.872670968 +0000 UTC
      ScaleUp:     NoActivity (ready=5 registered=6)
                   LastProbeTime:      2017-06-08 17:29:59.812893761 +0000 UTC
                   LastTransitionTime: 2017-06-08 10:26:35.872670968 +0000 UTC
      ScaleDown:   NoCandidates (candidates=0)
                   LastProbeTime:      2017-06-08 17:30:00.119227722 +0000 UTC
                   LastTransitionTime: 2017-06-08 10:46:54.809754422 +0000 UTC

    NodeGroups:
      Name:        worker-general-test
      Health:      Healthy (ready=2 unready=0 notStarted=0 longNotStarted=0 registered=2 cloudProviderTarget=2 (minSize=1, maxSize=5))
                   LastProbeTime:      2017-06-08 17:29:59.812893761 +0000 UTC
                   LastTransitionTime: 2017-06-08 10:26:35.872670968 +0000 UTC
      ScaleUp:     NoActivity (ready=2 cloudProviderTarget=2)
                   LastProbeTime:      2017-06-08 17:29:59.812893761 +0000 UTC
                   LastTransitionTime: 2017-06-08 10:26:35.872670968 +0000 UTC
      ScaleDown:   NoCandidates (candidates=0)
                   LastProbeTime:      2017-06-08 17:30:00.119227722 +0000 UTC
                   LastTransitionTime: 2017-06-08 10:46:54.809754422 +0000 UTC

kind: ConfigMap
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/last-updated: 2017-06-08 17:30:00.417692456 +0000
      UTC
  creationTimestamp: 2017-06-08T10:26:25Z
  name: cluster-autoscaler-status
  namespace: kube-system
  resourceVersion: "60900"
  selfLink: /api/v1/namespaces/kube-system/configmaps/cluster-autoscaler-status
  uid: ed1780d0-4c34-11e7-bb12-0afa88f15a64

tomfotherby commented Jun 9, 2017

I'm not 100% sure, but I think my problem is fixed by PR kubernetes/kubernetes#45923:

Fix log spam due to unnecessary status update when node is deleted.

Which I found in the Kubernetes v1.7.0-beta.1 CHANGELOG. So hopefully the fix is coming on 28/Jun/17.

frobware pushed a commit to frobware/autoscaler that referenced this issue Jul 9, 2019
…etter-delete-nodes-v2

UPSTREAM: <carry>: openshift: Rework logic in DeleteNodes()
yaroslava-serdiuk pushed a commit to yaroslava-serdiuk/autoscaler that referenced this issue Feb 22, 2024
Enable templates by creating ISSUE_TEMPLATE folder