Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Role reconciler needs to look at all the nodes present and not present on startup #2548

Closed
cyli opened this issue Mar 9, 2018 · 3 comments
Closed
Assignees

Comments

@cyli
Copy link
Contributor

cyli commented Mar 9, 2018

An instance was seen where in a snapshot dump, a node was listed in the membership but not in the node objects in the store. Potentially this could have happened because the snapshot occurred after the node role was changed, but before the role reconciler ticked. However, I was looking into whether it could be possible for this to be a more permanent issue.

When a node is demoted, first we check to see if demoting the node would produce a loss of quorum. If not, then the node is demoted by setting its desired role to worker. On the node update event, a reconciliation loop in manager/role_manager.go removes the node from the raft consensus group by calling raftNode.RemoveMember, which again checks to see if removing the node would cause a loss of quorum, and if it does or if there is some error with committing the change, the reconciliation loop tries again later.

If there is a loss of quorum (say one or more of the remaining node becomes unreachable) between when the desired role changes and when the node is removed from the raft consensus group, RemoveMember, and there is then a change in leadership, the new reconciliation loop reconciles all the nodes on startup.

However, it does not check for any nodes that have been removed. If a node demotion happened, then a node deletion (because we make sure the node desired role has changed, but not that the demotion process has finished, before we allow a node removal), and then there was a loss of quorum + leader election before the reconciler kicked in, it might be possible that a node may remain in the cluster membership but not the node list.

Since the node is gone from the node list, the new reconciler loop when it starts up will not attempt to reconcile it.

@cyli cyli self-assigned this Mar 9, 2018
@cyli cyli changed the title Role reconciler needs to look at all the nodes on startup Role reconciler needs to look at all the nodes present and not present on startup Mar 10, 2018
@anshulpundir
Copy link
Contributor

anshulpundir commented Mar 10, 2018

Some background questions for my understanding:

  1. What constitutes the cluster membership (in memory and/or on-disk) ?
  2. What constitutes the node list (in memory and/or on-disk) ?
  3. Is leadership change a necessary condition ?

@cyli
Copy link
Contributor Author

cyli commented Mar 13, 2018

What constitutes the cluster membership (in memory and/or on-disk) ?

In memory: https://github.com/docker/swarmkit/blob/master/manager/state/raft/raft.go#L106
On disk: in snapshots: https://github.com/docker/swarmkit/blob/master/api/snapshot.proto#L30, and on WALs, they'd be raft conf entry changes

What constitutes the node list (in memory and/or on-disk) ?

In memory: https://github.com/docker/swarmkit/blob/master/manager/state/store/nodes.go
On disk: in snapshot: https://github.com/docker/swarmkit/blob/master/api/snapshot.proto#L17, and in WALs, it'd be a normal entry.

Is leadership change a necessary condition ?

I think so - otherwise, if there is no leadership change, the leader should be processing node updates that indicate a demotion. Even if it can't remove the raft member, it is keeping track of that member that needs to be removed in an in-memory queue. The problem is that the queue is lost when there's a leadership change - the information can be re-synthesized from the node list, but currently it only checks the existing nodes, and doesn't check if any nodes corresponding to a raft member are missing from that list.

@cyli
Copy link
Contributor Author

cyli commented Mar 21, 2018

Closing this in favor of #2565, which is a more general issue about node promotion/demotion issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants