cluster.yaml shows node that does not exist and is not listed with kubectl get nodes #4311
Comments
Hi @goran-insby, thank you for reporting this. Could you please describe in more detail how the cluster reached that state? What steps lead to the node being removed but still present in cluster.yaml? Could you please share an …
Hi @ktsakalozos, the role assigned in cluster.yaml is 2, which I think means spare.
Hi @goran-insby, were you able to find a solution for this? I'm in a similar situation: the node was removed from the cluster but still appears in cluster.yaml and, as a consequence, in the "datastore standby nodes" list when running microk8s status. I have, however, not seen any weird behaviour from dqlite itself or Calico, but I would like to remove this to avoid a potential future burden. To answer @ktsakalozos's question, in my case the dead node has its role set to '1' in cluster.yaml:
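The actual file contents were not preserved in this transcript. For context, an entry in MicroK8s' dqlite cluster.yaml looks roughly like the sketch below; the addresses and IDs are made-up placeholders, and the role encoding (0 = voter, 1 = stand-by, 2 = spare) matches what the comments above imply:

```yaml
# Hypothetical excerpt of /var/snap/microk8s/current/var/kubernetes/backend/cluster.yaml
# Role: 0 = voter, 1 = stand-by, 2 = spare
- Address: 10.0.0.1:19001
  ID: 3297041220608546238
  Role: 0
- Address: 10.0.0.2:19001   # the dead node, lingering with Role 1
  ID: 8473829104827364051
  Role: 1
```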
I tried to force-remove the node, but with no luck, even though the last line of the output could be interpreted as the node being removed from dqlite:
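The exact command and its output were not captured here. Assuming the standard MicroK8s CLI, a force-removal attempt would look something like this (the address is a hypothetical placeholder):

```bash
# Force-remove a node that is no longer reachable; --force tells MicroK8s
# not to wait for the departing node to acknowledge the removal.
# 10.0.0.2 is a placeholder for the dead node's address.
sudo microk8s remove-node 10.0.0.2 --force
```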
Any help appreciated :)
Hi @zoc, unfortunately no luck; a couple of months later I reinstalled the whole cluster. This extra entry in the configuration didn't seem to have a big effect on the cluster.
Same issue here. I can add something for @zoc which worked for me. Problem state:
The following command allowed me to clean that up:
After which cluster.yaml showed that the node got removed there too:
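To confirm what the datastore currently believes, one can inspect cluster.yaml directly; the path below is the standard MicroK8s dqlite backend location:

```bash
# List the datastore members recorded on this node.
sudo cat /var/snap/microk8s/current/var/kubernetes/backend/cluster.yaml
```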
Thanks @d33psky, but unless I'm blind I cannot see any difference from what I have already mentioned and tried. I tried again today, after updating to the latest release, with no more luck.
Was your cluster.yaml updated? Meanwhile, I've reproduced the add-node problem that caused this broken state.
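The reproduction steps themselves were not captured in this transcript. For reference, the generic MicroK8s add-node flow is the two-step token exchange below; the join address and token are hypothetical placeholders for what add-node actually prints:

```bash
# On an existing cluster member: print a join command with a one-time token.
sudo microk8s add-node

# On the joining node: run the join command that add-node printed.
sudo microk8s join 10.0.0.1:25000/92b2db237428470dc4fcfc4ebbd9dc81
```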
The last command produced:
And then we have the problem state back:
Repeating the last command occasionally shows
and a second later reports (only) node 1.
Hey @ktsakalozos, is there anything you want me to test on this cluster? I probably have a few hours tomorrow to do so, before I have to wipe both servers in an attempt to create a working cluster.
No, it wasn't.
Summary
In cluster.yaml I can see the IP address of a node that was removed some time ago. The node listed in cluster.yaml is not shown by `kubectl get nodes` (which is correct). The problem is that Calico constantly tries to contact this server, which I think drives dqlite crazy, spiking above 100% CPU.
This situation looks like "lost quorum", but the node is listed only in cluster.yaml and does not appear in `kubectl get nodes`, which is strange.
What Should Happen Instead?
The expected behaviour is that cluster.yaml is aligned with the output of `kubectl get nodes`.
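A quick way to check that alignment, assuming the standard snap paths, is to compare the datastore member list with the Kubernetes node list:

```bash
# Datastore view: addresses recorded in dqlite's cluster.yaml.
sudo grep 'Address:' /var/snap/microk8s/current/var/kubernetes/backend/cluster.yaml

# Kubernetes view: nodes the API server knows about.
microk8s kubectl get nodes -o wide
```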
Reproduction Steps
This is the situation on one cluster. I tried stopping all nodes, manually removing the IP address from cluster.yaml, and starting them again, but unfortunately the entry came back in cluster.yaml, so I assume the file is regenerated from the datastore's internal state and is not the place to make the change.