-
Notifications
You must be signed in to change notification settings - Fork 616
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Role reconciler needs to look at all the nodes present and not present on startup #2548
Comments
Some background questions for my understanding:
|
In memory: https://github.com/docker/swarmkit/blob/master/manager/state/raft/raft.go#L106
In memory: https://github.com/docker/swarmkit/blob/master/manager/state/store/nodes.go
I think so - otherwise, if there is no leadership change, the leader should be processing node updates that indicate a demotion. Even if it can't remove the raft member, it is keeping track of that member that needs to be removed in an in-memory queue. The problem is that the queue is lost when there's a leadership change - the information can be re-synthesized from the node list, but currently it only checks the existing nodes, and doesn't check if any nodes corresponding to a raft member are missing from that list. |
Closing this in favor of #2565, which is a more general issue about node promotion/demotion issues. |
An instance was seen where in a snapshot dump, a node was listed in the membership but not in the node objects in the store. Potentially this could have happened because the snapshot occurred after the node role was changed, but before the role reconciler ticked. However, I was looking into whether it could be possible for this to be a more permanent issue.
When a node is demoted, first we check to see if demoting the node would produce a loss of quorum. If not, then the node is demoted by setting its desired role to
worker
. On the node update event, a reconciliation loop inmanager/role_manager.go
removes the node from the raft consensus group by callingraftNode.RemoveMember
, which again checks to see if removing the node would cause a loss of quorum, and if it does or if there is some error with committing the change, the reconciliation loop tries again later.If there is a loss of quorum (say one or more of the remaining node becomes unreachable) between when the desired role changes and when the node is removed from the raft consensus group,
RemoveMember
, and there is then a change in leadership, the new reconciliation loop reconciles all the nodes on startup.However, it does not check for any nodes that have been removed. If a node demotion happened, then a node deletion (because we make sure the node desired role has changed, but not that the demotion process has finished, before we allow a node removal), and then there was a loss of quorum + leader election before the reconciler kicked in, it might be possible that a node may remain in the cluster membership but not the node list.
Since the node is gone from the node list, the new reconciler loop when it starts up will not attempt to reconcile it.
The text was updated successfully, but these errors were encountered: