Default to raft protocol 3, handling the upgrade from protocol 2 #7208

Closed
langmartin opened this issue Feb 20, 2020 · 1 comment · Fixed by #11572
Comments

langmartin commented Feb 20, 2020

Background

Raft protocol 3 is a requirement for autopilot, so we should default
to it in 1.0. We should avoid putting too much effort into managing
this protocol upgrade, since it's unlikely to be repeated, but we
should put enough effort into it that upgrading to 1.0 is not
unusually manual or dangerous.

The Bug

  1. Follower B is stopped
  2. Follower C is stopped
  3. B' is restarted on version 1.0
  4. Its serf tags are gossiped to the leader A
  5. A removes B. The future returns when the configuration change is committed
  6. A adds B'
  7. A is stopped
  8. C' is restarted. C' only knows about servers {A, B}, both of which are now gone

The Fix

The bug won't happen if a rolling upgrade waits until the previous
membership change has been committed and only then upgrades a machine
that has an up-to-date log. For simplicity of tooling, we could just
wait until all cluster members have caught up before marking the
cluster safe to upgrade.

There are two implementation options: provide operator tooling to
allow the cluster administrator to safely perform a manual rolling
upgrade, or handle the protocol upgrade internally so that no
additional operator burden is imposed.

Operator Tooling

We could bundle a tool with 1.0 that checks the commit index of every
machine in the cluster to determine whether it's safe to perform the
next upgrade. This could just be a query that hits every server to
ensure that the index of the latest configuration change has been
accepted on all servers. In an ordinary rolling upgrade, this check
will pass in less time than it takes for stats dashboards to catch up,
so a typical operator would just experience a fairly simple but
necessary check at every step of the upgrade process.

This check could be generalized into a new requirement for the
upgrade process, and it may be a good place to ship data
version/consistency preflight checks in the future.
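
A minimal sketch of that per-step check, assuming a hypothetical
fetchLastIndex query (for example a per-server RPC or HTTP stats
endpoint) that returns the last raft index each server has applied;
none of these names exist in the current codebase:

```go
// Sketch only: walk the cluster and confirm every server has applied the
// index of the latest configuration change before allowing the next step.
package upgradecheck

import "fmt"

// fetchLastIndex stands in for a per-server query (RPC or HTTP) that returns
// the last raft log index that server has applied.
type fetchLastIndex func(addr string) (uint64, error)

// safeToUpgradeNext reports whether every server has caught up to the index
// at which the most recent configuration change was committed, so removing
// and re-adding the next server cannot strand a stale follower.
func safeToUpgradeNext(servers []string, configIndex uint64, fetch fetchLastIndex) (bool, error) {
	for _, addr := range servers {
		idx, err := fetch(addr)
		if err != nil {
			return false, fmt.Errorf("querying %s: %w", addr, err)
		}
		if idx < configIndex {
			return false, nil
		}
	}
	return true, nil
}
```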

The main risk of operator tooling is that this 1.0 release, which
communicates stability, will require a new manual upgrade step and
could cause a cluster to need to be recovered if upgraded too quickly.

Automatic Upgrade

  1. upgrade all the servers to 1.0
  2. the leader waits until ServersMeetMinimumVersion shows that all servers are running 1.0
  3. the leader calls a new UpgradeProtocol RPC on a follower
  4. the leader's raft library commits to raft a RemoveServer configuration change for that follower
  5. that follower receives the RPC and
    1. disconnects
    2. upgrades its raft instance
    3. rejoins the cluster
  6. the leader adds the protocol 3 instance to the raft cluster
  7. the leader waits until the follower's log is up to date
  8. the leader repeats 3-7 for all remaining protocol 2 followers (a rough sketch of this loop follows the list)
  9. the leader removes itself from the cluster and upgrades locally
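
A rough, non-authoritative sketch of the loop in steps 3-8, assuming
two helper callbacks that would be new (upgradeProtocolRPC for the RPC
in step 3 and waitCaughtUp for the check in step 7), and glossing over
the exact interleaving of steps 3-5, server ID bookkeeping, and step
9's leadership handoff:

```go
// Sketch only: leader-side loop that upgrades protocol 2 followers one at a
// time so the quorum size never shrinks mid-upgrade.
package upgradesketch

import (
	"time"

	"github.com/hashicorp/raft"
)

type serverInfo struct {
	ID      raft.ServerID
	Address raft.ServerAddress
	Proto   int // raft protocol version advertised in serf tags
}

func upgradeFollowers(
	r *raft.Raft,
	followers []serverInfo,
	upgradeProtocolRPC func(serverInfo) error, // assumed: tells the follower to restart its raft instance on protocol 3
	waitCaughtUp func(raft.ServerID) error, // assumed: blocks until the follower's log is up to date
) error {
	for _, s := range followers {
		if s.Proto >= 3 {
			continue // already upgraded
		}
		// The protocol 2 instance must leave the cluster before the protocol 3
		// instance joins, since two raft servers may not share the same Addr.
		if err := r.RemoveServer(s.ID, 0, 0).Error(); err != nil {
			return err
		}
		// Tell the follower to disconnect, upgrade its raft instance, and rejoin.
		if err := upgradeProtocolRPC(s); err != nil {
			return err
		}
		// Add the protocol 3 instance back as a voter (AddStaging is currently
		// a stub over AddVoter in the raft library).
		if err := r.AddVoter(s.ID, s.Address, 0, 10*time.Second).Error(); err != nil {
			return err
		}
		// Wait for the new instance to catch up before touching the next server.
		if err := waitCaughtUp(s.ID); err != nil {
			return err
		}
	}
	return nil
}
```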

More notes:

  1. the raft protocol 2 instance must leave the cluster first to avoid
    the illegal state of two raft servers having the same Addr
  2. the raft protocol 3 instance is added to the raft cluster with
    raft.AddStaging, but that method is a stub that just does an
    AddVoter (there's a TODO comment in the raft library). Checking
    the instance's log index is the only way to ensure that it's
    caught up (see the sketch below).
  3. waiting for the follower to become integrated with the cluster
    before moving on to the next server is necessary to prevent
    RemoveServer from decreasing the quorum size. We want to keep the
    quorum size constant while upgrading.
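
One possible shape for the catch-up check from note 2, assuming a
hypothetical fetchAppliedIndex query against the follower; since
hashicorp/raft doesn't appear to expose per-follower replication
progress directly, this sketch assumes the leader asks the follower
for its own applied index:

```go
// Sketch only: poll the follower's applied index until it reaches the
// leader's last log index or the deadline expires.
package upgradesketch

import (
	"fmt"
	"time"

	"github.com/hashicorp/raft"
)

func waitCaughtUp(
	r *raft.Raft,
	id raft.ServerID,
	fetchAppliedIndex func(raft.ServerID) (uint64, error), // assumed remote stats query
	deadline time.Duration,
) error {
	target := r.LastIndex() // leader's last log index when the wait starts
	stop := time.Now().Add(deadline)
	for time.Now().Before(stop) {
		idx, err := fetchAppliedIndex(id)
		if err == nil && idx >= target {
			return nil
		}
		time.Sleep(500 * time.Millisecond)
	}
	return fmt.Errorf("server %s did not catch up to index %d", id, target)
}
```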
github-actions commented

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Oct 11, 2022