check if server is in configuration when receiving a voteRequest #526
Conversation
…eceiving a voteRequest
raft_test.go (Outdated)

```go
	t.Fatalf("err: %v", err)
}

//set that follower term to higher term to simulate a partitioning
```
Do we need to simulate a partitioning? Isn't that what c.Disconnect is doing?
Yes, disconnect prohibits communication both from the specified node to any node and from any node to that node.
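As a toy illustration of that point (this is a hypothetical model, not the real test-harness API), a full disconnect blocks traffic in both directions while leaving the rest of the cluster untouched:

```go
package main

import "fmt"

// cluster is a toy reachability model: nodes in blocked are cut off
// from everyone, in both directions, like the test harness's disconnect.
type cluster struct {
	blocked map[string]bool
}

func (c *cluster) disconnect(node string) { c.blocked[node] = true }

// canSend reports whether from can reach to: both endpoints must be
// unblocked, so a disconnected node can neither send nor receive.
func (c *cluster) canSend(from, to string) bool {
	return !c.blocked[from] && !c.blocked[to]
}

func main() {
	c := &cluster{blocked: map[string]bool{}}
	c.disconnect("node0")
	fmt.Println(c.canSend("node0", "node1")) // false: outbound blocked
	fmt.Println(c.canSend("node1", "node0")) // false: inbound blocked
	fmt.Println(c.canSend("node1", "node2")) // true: others unaffected
}
```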
… the bug manifest Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
Thanks for the new PR! I'd still recommend reordering the term check as proposed in #525 (comment) instead of modifying the voter/server check, though, for 2 reasons:
Any thoughts?
Hey @szechuen
Only a leader can add a server to the configuration and is responsible for replicating that configuration to the other nodes. In that case I think the scenario you describe here could not happen, as this check only verifies that a request for vote comes from a server already in the configuration, which should always be the case from the perspective of the leader. Am I missing something?
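For context, here is a minimal sketch of what such membership checks look like. It mirrors the shape of the library's `inConfiguration`/`hasVote` helpers, not their exact code; in particular the `IsVoter` field is a simplification of the library's suffrage handling:

```go
package main

import "fmt"

type ServerID string

// Server is a simplified configuration entry; the real library tracks
// suffrage (Voter/Nonvoter/Staging) rather than a plain bool.
type Server struct {
	ID      ServerID
	IsVoter bool
}

type Configuration struct{ Servers []Server }

// inConfiguration reports whether the ID appears in the configuration
// at all, voter or not.
func inConfiguration(c Configuration, id ServerID) bool {
	for _, s := range c.Servers {
		if s.ID == id {
			return true
		}
	}
	return false
}

// hasVote reports whether the ID is present and is a voter.
func hasVote(c Configuration, id ServerID) bool {
	for _, s := range c.Servers {
		if s.ID == id {
			return s.IsVoter
		}
	}
	return false
}

func main() {
	cfg := Configuration{Servers: []Server{
		{ID: "node1", IsVoter: true},
		{ID: "node2", IsVoter: false},
	}}
	fmt.Println(inConfiguration(cfg, "node0")) // false: removed/unknown server
	fmt.Println(hasVote(cfg, "node2"))         // false: present but a non-voter
	fmt.Println(hasVote(cfg, "node1"))         // true: a voter
}
```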
Hi @dhiaayachi, sorry for the late reply. What I meant by (1) was that we need to consider the behavior when the disconnected server is removed but not subsequently added back to the cluster as a non-voter (by Serf). This is not hypothetical, since Vault does not run Serf and only adds a server as part of its initial bootstrap: https://github.com/hashicorp/vault/blob/a2d818bf0ad59bac9a44b2e2a62bc2ae4bad49dd/vault/logical_system_raft.go#L351-L356

However, I was mistaken that this would result in a livelock, since the leader would not try to replicate to the removed server and hence does not trigger the higher-term step-down. On the contrary, only updating our term in

The scenario can be reproduced by omitting https://github.com/hashicorp/raft/pull/526/files#diff-0022e96fabdba704c82038d060a73c969456f9d3a0ae768652d03c74e9fa36c1R2679-R2683 from the test. I think the best thing we can do is to maintain a stable leadership with the remaining servers and expect a manual rejoin of the disconnected server in that case.

Point (2) is still valid though. What I'm proposing is thus:

```go
candidateID := ServerID(req.ID)
if len(r.configurations.latest.Servers) > 0 && !inConfiguration(r.configurations.latest, candidateID) {
	r.logger.Warn("ignoring vote request since node is not in configuration", "from", candidate)
	return
}

if req.Term < r.getCurrentTerm() {
	return
}

if req.Term > r.getCurrentTerm() {
	r.logger.Debug("lost leadership because received a requestVote with a newer term")
	r.setState(Follower)
	r.setCurrentTerm(req.Term)
	resp.Term = req.Term
}

if len(r.configurations.latest.Servers) > 0 && !hasVote(r.configurations.latest, candidateID) {
	r.logger.Warn("rejecting vote request since node is not a voter", "from", candidate)
	return
}

if leaderAddr, leaderID := r.LeaderWithID(); leaderAddr != "" && leaderAddr != candidate && !req.LeadershipTransfer {
	r.logger.Warn("rejecting vote request since we have a leader", "from", candidate, "leader", leaderAddr, "leader-id", string(leaderID))
	return
}
```
@szechuen sorry for the delay, I was not able to go through your answer earlier. If I understand your proposal right, you are suggesting to:
I think that would work. That said, based on the checks we already have in place, this would only handle differently the one case where we receive a vote request from a non-voter with a term equal to the current term. Am I missing something here? If that's the case, I don't think it's a realistic scenario: if a node is a non-voter from the perspective of the leader and is on the same term, that means the node knows it's a non-voter and would never send a request for vote in the first place. That said, I think there is no harm in adding that extra check to protect against the case where we have a node with the same term but a completely different history.
I think the right behaviour from a Raft perspective is the one you observed: that node stays excluded from the cluster while the cluster stays stable. IMHO Raft should not manage adding or removing nodes automatically; those changes need to be done by calling the API (AddVoter, AddNonVoter, RemoveServer), and any non-leader server that is not part of the stable configuration, managed by the leader, should not be able to overwrite that configuration by hijacking the leadership.
Yep this is mostly accurate, with the caveat that we are not accepting the vote immediately after stepping down in (3). We would only vote for the node if it is a voter.
The difference here is that we would step down for a higher-term non-voter, but not vote for it.
Yep fully agreed. My point is that we need to consider and make sure the remaining cluster still ends up in a stable state when that happens.
… but don't grant a vote
@szechuen Thank you for the clarification, I can see the nuance now and it seems reasonable to me.
Yeah, I think that makes sense. Even though the leader will eventually step down anyway once it starts replicating to the higher-term node (i.e. no functional difference here), this is the right logic in preparation for the pre-vote optimization, where nodes should not be disrupted by a higher-term vote request if there's a stable leader.
Thanks @dhiaayachi, looks good to me, although I have to admit that despite following these PRs I still had to read through your PR description again to remember why this was needed and more correct than before!
My only inline feedback is to consider making the comment on the non-obvious bit of code being added here even more explicit in acknowledging that it seems counter-intuitive but is there for a good reason!
```go
// if we get a request for vote from a nonVoter and the request term is higher,
// step down and update term, but reject the vote request.
// This could happen when a node, previously a voter, is converted to a non-voter
```
Do you think it's worth a note or a link back to this issue here? I suspect that reading this I'd be puzzled about why we allow a non-voter to "disrupt" leaders at all, because the nuance involved in this issue is high! Ideally people would view blame before changing this back to ignoring votes from non-voters, but we could potentially use this comment to flag that the non-intuitive behaviour is there for a good reason.
This PR changes the check introduced in #477.
Based on the bug reported in #524, when a non-voter joins the cluster with a higher term (this can happen when a node that was pruned by autopilot returns to the cluster after a restart), it causes the cluster to become unstable.
The scenario where this happens is the following:
- A cluster is formed with `node0`, `node1`, `node2`. All nodes are voters.
- `node0` is removed from the cluster by autopilot's `CleanupDeadServers`.
- `node0` can't see the change (its configuration still has `node0`, `node1`, `node2`) and it still thinks it's a voter.
At the same time, node0 is still running as a candidate (it still thinks it's part of the cluster and a voter) but, because of the fix introduced in #477, its vote requests are rejected. This has the effect of increasing node0's term and making the unstable state permanent: node0's term keeps inflating and the cluster term will never catch up to it.
The fix introduced in this PR breaks this loop by allowing node0 to request a vote as a non-voter, which has the effect of levelling the terms and letting the cluster move forward. This is not ideal, as it destabilizes the cluster even though node0 is guaranteed not to win the election.
An ideal fix would be to add pre-vote (#474) in conjunction with this, to avoid term inflation and keep the cluster stable.
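The term-inflation loop described above can be sketched with a toy simulation (this is an illustrative model, not the raft library itself; `simulate`, its parameters, and the starting term of 5 are all made up for the example). Each election timeout, the removed candidate increments its term; without the step-down from this PR, the cluster never learns the higher term, so the gap only grows:

```go
package main

import "fmt"

// simulate models a removed candidate repeatedly timing out and starting
// elections. If stepDownOnHigherTerm is true (this PR's behaviour), the
// receiving node adopts the higher term even though it rejects the vote,
// so the terms level out; otherwise the cluster term stays frozen.
func simulate(stepDownOnHigherTerm bool, rounds int) (candidateTerm, clusterTerm uint64) {
	candidateTerm, clusterTerm = 5, 5 // arbitrary common starting term
	for i := 0; i < rounds; i++ {
		candidateTerm++ // candidate starts a new election each timeout
		if stepDownOnHigherTerm && candidateTerm > clusterTerm {
			clusterTerm = candidateTerm // adopt the term, still reject the vote
		}
	}
	return
}

func main() {
	c, cl := simulate(false, 10)
	fmt.Println(c, cl) // 15 5: candidate term inflates, cluster lags
	c, cl = simulate(true, 10)
	fmt.Println(c, cl) // 15 15: terms level, cluster can move forward
}
```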