check if server is in configuration when receiving a voteRequest #526
Conversation
…eceiving a voteRequest
raft_test.go (Outdated)

```go
	t.Fatalf("err: %v", err)
}

//set that follower term to higher term to simulate a partitioning
```
Do we need to simulate a partitioning? Isn't that what c.Disconnect is doing?
Yes, disconnect prohibits communication both from the specified node to any node and from any node to that node.
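As a toy illustration of that point (this is a hypothetical model, not the real test-harness API), a full disconnect blocks traffic in both directions while leaving the rest of the cluster untouched:

```go
package main

import "fmt"

// cluster is a toy reachability model: nodes in blocked are cut off
// from everyone, in both directions, like the test harness's disconnect.
type cluster struct {
	blocked map[string]bool
}

func (c *cluster) disconnect(node string) { c.blocked[node] = true }

// canSend reports whether from can reach to: both endpoints must be
// unblocked, so a disconnected node can neither send nor receive.
func (c *cluster) canSend(from, to string) bool {
	return !c.blocked[from] && !c.blocked[to]
}

func main() {
	c := &cluster{blocked: map[string]bool{}}
	c.disconnect("node0")
	fmt.Println(c.canSend("node0", "node1")) // false: outbound blocked
	fmt.Println(c.canSend("node1", "node0")) // false: inbound blocked
	fmt.Println(c.canSend("node1", "node2")) // true: others unaffected
}
```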
… the bug manifest Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
Thanks for the new PR! I'd still recommend reordering the term check as proposed in #525 (comment) instead of modifying the voter/server check, though, for 2 reasons:
Any thoughts?
Hey @szechuen
Only a leader can add a server to the configuration and is responsible for replicating that configuration to the other nodes. In that case I think the scenario you describe here could not happen, as this check only verifies that a request for vote comes from a server already in the configuration, which should always be the case from the perspective of the leader. Am I missing something?
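For context, here is a minimal sketch of what such membership checks look like. It mirrors the shape of the library's `inConfiguration`/`hasVote` helpers, not their exact code; in particular the `IsVoter` field is a simplification of the library's suffrage handling:

```go
package main

import "fmt"

type ServerID string

// Server is a simplified configuration entry; the real library tracks
// suffrage (Voter/Nonvoter/Staging) rather than a plain bool.
type Server struct {
	ID      ServerID
	IsVoter bool
}

type Configuration struct{ Servers []Server }

// inConfiguration reports whether the ID appears in the configuration
// at all, voter or not.
func inConfiguration(c Configuration, id ServerID) bool {
	for _, s := range c.Servers {
		if s.ID == id {
			return true
		}
	}
	return false
}

// hasVote reports whether the ID is present and is a voter.
func hasVote(c Configuration, id ServerID) bool {
	for _, s := range c.Servers {
		if s.ID == id {
			return s.IsVoter
		}
	}
	return false
}

func main() {
	cfg := Configuration{Servers: []Server{
		{ID: "node1", IsVoter: true},
		{ID: "node2", IsVoter: false},
	}}
	fmt.Println(inConfiguration(cfg, "node0")) // false: removed/unknown server
	fmt.Println(hasVote(cfg, "node2"))         // false: present but a non-voter
	fmt.Println(hasVote(cfg, "node1"))         // true: a voter
}
```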
Hi @dhiaayachi, sorry for the late reply. What I meant by (1) was that we need to consider the behavior when the disconnected server is removed but not subsequently added back to the cluster as a non-voter (by Serf). This is not hypothetical, since Vault does not run Serf and only adds a server as part of its initial bootstrap: https://github.com/hashicorp/vault/blob/a2d818bf0ad59bac9a44b2e2a62bc2ae4bad49dd/vault/logical_system_raft.go#L351-L356

However, I was mistaken that this would result in a livelock, since the leader would not try to replicate to the removed server and hence does not trigger the higher-term step-down. On the contrary, only updating our term in

The scenario can be reproduced by omitting https://github.com/hashicorp/raft/pull/526/files#diff-0022e96fabdba704c82038d060a73c969456f9d3a0ae768652d03c74e9fa36c1R2679-R2683 from the test. I think the best thing we can do is to maintain a stable leadership with the remaining servers and expect a manual rejoin of the disconnected server in that case.

Point (2) is still valid though. What I'm proposing is thus:

```go
candidateID := ServerID(req.ID)
if len(r.configurations.latest.Servers) > 0 && !inConfiguration(r.configurations.latest, candidateID) {
	r.logger.Warn("ignoring vote request since node is not in configuration", "from", candidate)
	return
}

if req.Term < r.getCurrentTerm() {
	return
}

if req.Term > r.getCurrentTerm() {
	r.logger.Debug("lost leadership because received a requestVote with a newer term")
	r.setState(Follower)
	r.setCurrentTerm(req.Term)
	resp.Term = req.Term
}

if len(r.configurations.latest.Servers) > 0 && !hasVote(r.configurations.latest, candidateID) {
	r.logger.Warn("rejecting vote request since node is not a voter", "from", candidate)
	return
}

if leaderAddr, leaderID := r.LeaderWithID(); leaderAddr != "" && leaderAddr != candidate && !req.LeadershipTransfer {
	r.logger.Warn("rejecting vote request since we have a leader", "from", candidate, "leader", leaderAddr, "leader-id", string(leaderID))
	return
}
```
@szechuen sorry for the delay, I was not able to go through your answer earlier. If I understand your proposal right, you are suggesting to:
I think that would work. That said, based on the checks we already have in place, this would only handle differently the one case where we receive a vote request from a non-voter with a term equal to the current term. Am I missing something here? If that's the case, I don't think it's a realistic scenario: if a node is a non-voter from the perspective of the leader and is on the same term, that means the node knows it's a non-voter and would never send a request for vote in the first place. That said, I think there is no harm in adding that extra check to protect against the case where we have a node with the same term but a completely different history.
I think the right behaviour from a Raft perspective is the one you observed: that node stays excluded from the cluster while the cluster stays stable. IMHO Raft should not manage adding or removing nodes automatically; those changes need to be done by calling the API (AddVoter, AddNonVoter, RemoveServer), and any non-leader server that is not part of the stable configuration, managed by the leader, should not be able to overwrite that configuration by hijacking the leadership.
Yep this is mostly accurate, with the caveat that we are not accepting the vote immediately after stepping down in (3). We would only vote for the node if it is a voter.
The difference here is that we would step down for a higher-term non-voter, but not vote for it.
Yep fully agreed. My point is that we need to consider and make sure the remaining cluster still ends up in a stable state when that happens.
… but don't grant a vote
@szechuen Thank you for the clarification, I can see the nuance now and it seems reasonable to me.
Yeah, I think that makes sense. Even though the leader will eventually step down anyway once it starts replicating to the higher-term node (i.e. no functional difference here), this is the right logic in preparation for the pre-vote optimization, where nodes should not be disrupted by a higher-term vote request if there's a stable leader.
Thanks @dhiaayachi, looks good to me, although I have to admit that despite following these PRs I still had to read through your PR description again to remember why this was needed and more correct than before!
My only inline feedback is to consider making the comment on the non-obvious bit of code being added here even more explicit in acknowledging that it seems counter-intuitive but is there for a good reason!
```go
// if we get a request for vote from a nonVoter and the request term is higher,
// step down and update term, but reject the vote request.
// This could happen when a node, previously a voter, is converted to a non-voter
```
Do you think it's worth a note or a link back to this issue here? I suspect that reading this I'd be puzzled about why we allow a non-voter to "disrupt" leaders at all, because the nuance involved in this issue is high! Ideally people would view blame before changing this back to ignoring votes from non-voters, but we could potentially use this comment to flag that the non-intuitive behaviour is there for a good reason.
This PR changes the check introduced in #477.
Based on the bug reported in #524, when a non-voter joins the cluster with a higher term (this can happen when a node that was pruned by autopilot returns to the cluster after a restart), it causes the cluster to become unstable.
The scenario where this happens is the following:
- A cluster is formed with `node0`, `node1`, `node2`. All nodes are voters.
- `node0` is removed from the cluster by autopilot's `CleanupDeadServers`.
- `node0` can't see the change (its configuration still has `node0`, `node1`, `node2`) and it still thinks it's a voter.
At the same time, node0 is still running as a candidate (it still thinks it's part of the cluster and a voter) but, because of the fix introduced in #477, its vote requests are rejected. This has the effect of increasing node0's term and making the unstable state permanent: node0's term keeps inflating and the cluster term will never catch up to it.
The fix introduced in this PR breaks this loop by allowing node0 to request a vote as a non-voter, which has the effect of levelling the terms and letting the cluster move forward. This is not ideal, as it destabilizes the cluster even though node0 is guaranteed not to win the election.
An ideal fix would be to add pre-vote (#474) in conjunction with this, to avoid term inflation and keep the cluster stable.
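The term-inflation loop described above can be sketched with a toy simulation (this is an illustrative model, not the raft library itself; `simulate`, its parameters, and the starting term of 5 are all made up for the example). Each election timeout, the removed candidate increments its term; without the step-down from this PR, the cluster never learns the higher term, so the gap only grows:

```go
package main

import "fmt"

// simulate models a removed candidate repeatedly timing out and starting
// elections. If stepDownOnHigherTerm is true (this PR's behaviour), the
// receiving node adopts the higher term even though it rejects the vote,
// so the terms level out; otherwise the cluster term stays frozen.
func simulate(stepDownOnHigherTerm bool, rounds int) (candidateTerm, clusterTerm uint64) {
	candidateTerm, clusterTerm = 5, 5 // arbitrary common starting term
	for i := 0; i < rounds; i++ {
		candidateTerm++ // candidate starts a new election each timeout
		if stepDownOnHigherTerm && candidateTerm > clusterTerm {
			clusterTerm = candidateTerm // adopt the term, still reject the vote
		}
	}
	return
}

func main() {
	c, cl := simulate(false, 10)
	fmt.Println(c, cl) // 15 5: candidate term inflates, cluster lags
	c, cl = simulate(true, 10)
	fmt.Println(c, cl) // 15 15: terms level, cluster can move forward
}
```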