Step down leadership on establishLeader failures #8470

Closed
notnoop opened this issue Jul 20, 2020 · 3 comments · Fixed by #12293

notnoop (Contributor) commented Jul 20, 2020

Raft currently manages Nomad cluster leadership: Raft conducts the leader election and then notifies Nomad when leadership is gained or lost. This works well the vast majority of the time. However, if Nomad fails unexpectedly in the establishLeadership method, Raft may still consider the server to be the leader even though the server is not performing its leadership duties, which leaves the cluster wedged. The cluster can be recovered by terminating that server to force a new election.

On failure, the leadership loop should instead force a step-down. The raft library offers a LeadershipTransfer method (raft.LeadershipTransfer), which Nomad can use in these cases. The documentation suggests that we may also need to upgrade to Raft protocol version 3, but it is not very clear on this point. (See #7208 for the related issue on defaulting to Raft v3.)

Consul hit the same issue: it is noted in hashicorp/consul#5047 and was fixed in hashicorp/consul#5247.

ketzacoatl (Contributor) commented

Has this gotten a triage review, or is the Raft v3 migration on the roadmap in any capacity?

lgfa29 self-assigned this Mar 14, 2022
lgfa29 added this to the 1.3.0 milestone Mar 14, 2022

lgfa29 (Contributor) commented Mar 14, 2022

Hi @ketzacoatl 👋

Yes, the plan is to have this in the next Nomad release.

github-actions bot commented

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Oct 10, 2022