
Fix pre-vote implementation where leader's pre-vote is rejected #605

Merged
4 commits merged into hashicorp:main on Aug 22, 2024

Conversation

k-jingyang
Contributor

@k-jingyang k-jingyang commented Aug 17, 2024

Fixes #606

@k-jingyang k-jingyang requested review from a team as code owners August 17, 2024 02:37
@k-jingyang k-jingyang requested review from rboyer and removed request for a team August 17, 2024 02:37

hashicorp-cla-app bot commented Aug 17, 2024

CLA assistant check
All committers have signed the CLA.

@jmurret jmurret requested a review from dhiaayachi August 17, 2024 15:35
@dhiaayachi
Contributor

Hey @k-jingyang
Thank you for reporting this and providing a fix. I will go ahead and merge this once CI passes.

Just making sure I fully grasp the impact of this bug: before this fix, a leader that loses leadership while pre-vote is on will not be able to be re-elected, and leadership will likely transfer to another node, but this won't introduce any extra scenarios where leadership becomes unstable or no leader can be elected.

@k-jingyang
Contributor Author

k-jingyang commented Aug 19, 2024

> Hey @k-jingyang Thank you for reporting this and providing a fix. I will go ahead and merge this once CI passes.
>
> Just making sure I fully grasp the impact of this bug: before this fix, a leader that loses leadership while pre-vote is on will not be able to be re-elected, and leadership will likely transfer to another node, but this won't introduce any extra scenarios where leadership becomes unstable or no leader can be elected.

Thanks! I realised that the CI pipeline is failing. I tried to run `make test` and it failed even on the current master branch; not sure if I'm missing something here.

Yes, you are correct. However, the situation becomes more serious in the following scenario:

  • 3-node cluster
  • 1 node is down, 2 nodes are still up (node A, the leader, and node B)
  • node A somehow thinks that it lost leadership
  • node B can still heartbeat to node A, so it doesn't call for an election
  • node A tries to call for an election again, but pre-vote prevents it from doing so, as node B keeps rejecting it as per the bug

This was what we encountered. I didn't manage to dig into how node A came to think that it lost leadership.

So even though a majority of the cluster (2 out of 3) is still up, there is no actual leader.
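The failure mode in the scenario above can be sketched as a tiny pre-vote check. This is a hypothetical illustration with assumed names, not the actual hashicorp/raft code: before the fix, a follower that still tracked a known leader rejected every pre-vote request, including one coming from that leader itself, so a stepped-down leader could never regain leadership.

```go
package main

import "fmt"

// grantPreVote sketches the decision a follower makes when it receives a
// pre-vote request. All identifiers here are assumptions for illustration.
// Buggy rule: reject any pre-vote while we believe a leader exists.
// Fixed rule: still grant when the requester *is* that known leader.
func grantPreVote(candidateID, knownLeaderID string, candidateTerm, currentTerm uint64) bool {
	if knownLeaderID != "" && knownLeaderID != candidateID {
		// Some other node is campaigning while we still track a healthy
		// leader: reject, which is pre-vote working as intended.
		return false
	}
	return candidateTerm >= currentTerm
}

func main() {
	// Node B still records node A as leader; node A asks for a pre-vote.
	fmt.Println(grantPreVote("node-A", "node-A", 5, 5)) // granted after the fix
	// A third node asking while a leader is known is still rejected.
	fmt.Println(grantPreVote("node-C", "node-A", 5, 5)) // rejected
}
```

With the buggy rule (rejecting whenever `knownLeaderID != ""`), the first call would also return false, which is exactly the deadlock described above.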

@dhiaayachi
Contributor

@k-jingyang
I'm looking into the tests; it seems they are broken for some reason not related to this PR.

> Yes, you are correct. However, the situation becomes more serious in the following scenario:
>
>   • 3-node cluster
>   • 1 node is down, 2 nodes are still up (node A, the leader, and node B)
>   • node A somehow thinks that it lost leadership
>   • node B can still heartbeat to node A, so it doesn't call for an election
>   • node A tries to call for an election again, but pre-vote prevents it from doing so, as node B keeps rejecting it as per the bug
>
> This was what we encountered. I didn't manage to dig into how node A came to think that it lost leadership.
>
> So even though a majority of the cluster (2 out of 3) is still up, there is no actual leader.

I'm not sure I understand. If node A thinks it's not a leader anymore, it should stop sending heartbeats to all the other nodes, which will lead to node B timing out the heartbeat, becoming a Candidate, and eventually becoming the leader. Node B, being a follower, should not be sending heartbeats to node A. Am I missing something?

@k-jingyang
Contributor Author

> @k-jingyang I'm looking into the tests; it seems they are broken for some reason not related to this PR.
>
> > Yes, you are correct. However, the situation becomes more serious in the following scenario:
> >
> >   • 3-node cluster
> >   • 1 node is down, 2 nodes are still up (node A, the leader, and node B)
> >   • node A somehow thinks that it lost leadership
> >   • node B can still heartbeat to node A, so it doesn't call for an election
> >   • node A tries to call for an election again, but pre-vote prevents it from doing so, as node B keeps rejecting it as per the bug
> >
> > This was what we encountered. I didn't manage to dig into how node A came to think that it lost leadership.
> > So even though a majority of the cluster (2 out of 3) is still up, there is no actual leader.
>
> I'm not sure I understand. If node A thinks it's not a leader anymore, it should stop sending heartbeats to all the other nodes, which will lead to node B timing out the heartbeat, becoming a Candidate, and eventually becoming the leader. Node B, being a follower, should not be sending heartbeats to node A. Am I missing something?

I believe you're right; my understanding of the issue could be slightly wrong. Let me dig into the logs to see why node B still thought node A was the leader even though node A was a Candidate.

@k-jingyang
Contributor Author

k-jingyang commented Aug 20, 2024

@dhiaayachi,

After digging through the logs, I don't have a definitive conclusion. You're right that only the leader should be heartbeating to the followers, and once node A lost leadership, it should not heartbeat anymore.

What I can be sure of:

  1. Node B rejected node A's pre-vote because it still thought node A was the leader (the bug). One of our log lines:

```
raft: rejecting pre-vote request since we have a leader from:"" leader:"NODE_A_IP" leader-id:NODE_A
```

  2. Even as a Candidate, node A has logs about failed heartbeats to the downed node (we log Stats() every second):

```
# t=1
{..., "state":"Candidate","term":"1049"}

# t=2
raft: failed to heartbeat to peer=DOWNED_NODE backoff time:10m error:dial tcp DOWNED_NODE connect: connection refused
```

Since we don't seem to log successful heartbeats, and the heartbeat metrics have expired from our telemetry system, I'm not sure whether successful heartbeats were still ongoing to node B (even while node A was a Candidate). If so, this would explain why node B did not trigger an election, and it would also indicate some form of bug or edge case.

Member

@banks banks left a comment


@k-jingyang thanks for helping us understand this. I also think this fix is worth merging, and that should be possible soon, but it's really useful for us to understand the impact of the issue to help us decide how urgently we need to get the fix tested and merged into our products!

In your logs above, how far apart in wall-clock time are the t=1 and t=2 logs? When the old leader switches to candidate state, it shuts down the replication goroutines, which are the things sending the heartbeats and logging that error. If the two logs are milliseconds apart, I suspect this is just a timing thing: a replication thread happened to try sending a heartbeat around the time the leader changed, failed, and the error didn't get logged until after the switch to candidate. If many seconds pass while the old node is a candidate but it is still attempting multiple heartbeats, then yes, that sounds more concerning. Would you be able to post your whole logs, either here or privately if we provide an email address?

@k-jingyang
Contributor Author

k-jingyang commented Aug 20, 2024

@banks

The time between when the node became a candidate and when it failed to heartbeat was more than 10s. Regarding logs, they're a bit difficult to export, and honestly they wouldn't be very helpful without query support. Let me check if I can share an investigation doc.

I also find it hard to believe that the library has such an edge case without someone else hitting it, given how widely it is used. 😕

Let me spend some time to try to understand the issue a bit deeper.

Contributor

@dhiaayachi dhiaayachi left a comment


Thank you again @k-jingyang! The tests should be better now if you rebase the PR on top of the latest changes from main. I left a few suggestions.

@k-jingyang
Contributor Author

@dhiaayachi, you're welcome! I've addressed your comments! 😄

Contributor

@dhiaayachi dhiaayachi left a comment


LGTM! Thank you for the contribution @k-jingyang.

I will wait for @banks to resolve the remaining conversation and merge this!

@k-jingyang
Contributor Author

Hey @banks, would it be possible to provide an email address so that I can share an investigation doc?

@banks
Member

banks commented Aug 22, 2024 via email

@dhiaayachi dhiaayachi merged commit 497108f into hashicorp:main Aug 22, 2024
13 checks passed
@k-jingyang
Contributor Author

@banks Thanks! I've sent the doc over.

@dhiaayachi Do we have an estimated date for the v1.7.1 release?

Successfully merging this pull request may close these issues.

Follower rejecting leader's pre-vote