raft: default to protocol v3 #11572

Merged
merged 3 commits into from
Feb 3, 2022
7 changes: 7 additions & 0 deletions .changelog/11572.txt
@@ -0,0 +1,7 @@
```release-note:improvement
raft: The default raft protocol version is now 3.
```

```release-note:deprecation
Raft protocol version 2 is deprecated and will be removed in Nomad 1.4.0.
```
1 change: 1 addition & 0 deletions command/agent/config.go
@@ -953,6 +953,7 @@ func DefaultConfig() *Config {
Enabled: false,
EnableEventBroker: helper.BoolToPtr(true),
EventBufferSize: helper.IntToPtr(100),
RaftProtocol: 3,
StartJoin: []string{},
ServerJoin: &ServerJoin{
RetryJoin: []string{},
2 changes: 1 addition & 1 deletion website/content/docs/configuration/server.mdx
@@ -161,7 +161,7 @@ server {
required as the agent internally knows the latest version, but may be useful
in some upgrade scenarios.

- `raft_protocol` `(int: 2)` - Specifies the Raft protocol version to use when
- `raft_protocol` `(int: 3)` - Specifies the Raft protocol version to use when
communicating with other Nomad servers. This affects available Autopilot
features and is typically not required as the agent internally knows the
latest version, but may be useful in some upgrade scenarios.
80 changes: 80 additions & 0 deletions website/content/docs/upgrade/index.mdx
@@ -153,3 +153,83 @@ differences may require specific steps.
[node-status]: /docs/commands/node/status
[server-members]: /docs/commands/server/members
[upgrade-specific]: /docs/upgrade/upgrade-specific

## Upgrading to Raft Protocol 3

This section provides details on upgrading to Raft protocol version 3.
Raft protocol version 3 requires all servers to run Nomad 0.8.0 or
newer. Raft protocol version 2 will be removed in Nomad 1.4.0.

To see the version of the Raft protocol in use on each server, use the
`nomad operator raft list-peers` command.

Note that the format of `peers.json` used for outage recovery is
different when running with the latest Raft protocol. See [Manual
Recovery Using
peers.json](https://learn.hashicorp.com/tutorials/nomad/outage-recovery#manual-recovery-using-peersjson)
for a description of the required format.

When using Raft protocol version 3, servers are identified by their
`node-id` instead of their IP address when Nomad makes changes to its
internal Raft quorum configuration. This means that once a cluster has
been upgraded with servers all running Raft protocol version 3, it
will no longer allow servers running any older Raft protocol versions
to be added.

### Upgrading a Production Cluster to Raft Version 3

For production Raft clusters with 3 or more members, the easiest way
to upgrade servers is to have each server leave the cluster, upgrade
its [`raft_protocol`](/docs/configuration/server#raft_protocol) version
in the `server` stanza, and then add it back. Make sure the new server
joins successfully and that the cluster is stable before rolling the
upgrade forward to the next server. It's also possible to stand up a
new set of servers, and then slowly stand down each of the older
servers in a similar fashion.

For in-place raft protocol upgrades, perform the following for each
server, leaving the leader until last to reduce the chance of leader
elections that will slow down the process:

* Stop the server
* Run `nomad server force-leave $server_name`
* Update the `raft_protocol` in the server's configuration file to 3.
* Restart the server
* Run `nomad operator raft list-peers` to verify that the `raft_vsn`
for the server is now 3.
* On the server, run `nomad agent-info` and check that the
`last_log_index` is of a similar value to the other servers. This
step ensures that raft is healthy and changes are replicating to the
new server.
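
The last check in the list above can be sketched as a small shell
helper. This is only an illustration: the function name
`log_index_close`, the default tolerance of 100 entries, and the index
values are all hypothetical, and the two `last_log_index` values are
assumed to have been copied by hand from `nomad agent-info` on the
upgraded server and on one of its peers.

```shell
#!/usr/bin/env bash

# log_index_close A B [TOLERANCE]
# Succeeds when two last_log_index values differ by at most TOLERANCE
# entries. The default of 100 is an arbitrary, hypothetical threshold.
log_index_close() {
  local a b tolerance diff
  a=$1
  b=$2
  tolerance=${3:-100}
  if [ "$a" -ge "$b" ]; then
    diff=$((a - b))
  else
    diff=$((b - a))
  fi
  [ "$diff" -le "$tolerance" ]
}

# Made-up values standing in for the upgraded server and one peer.
if log_index_close 48211 48230; then
  echo "replication looks healthy"
else
  echo "upgraded server is lagging; wait before moving on"
fi
```

A suitable tolerance depends on how busy the cluster is, so treat the
default as a starting point rather than a recommendation.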

### Upgrading a Single Server Cluster to Raft Version 3

If you are running a single Nomad server, restarting it in-place will
result in that server not being able to elect itself as a leader. To
avoid this, create a new [`peers.json`][peers-json] file before
restarting the server with the new configuration. If you have `jq`
installed, you can run the following script on the server's host to
write the correct `peers.json` file:

```
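#!/usr/bin/env bash
set -euo pipefail

# Read the data directory and the leader address from the running agent.
# On a single-server cluster the leader address is this server's own address.
NOMAD_DATA_DIR=$(nomad agent-info -json | jq -r '.config.DataDir')
NOMAD_ADDR=$(nomad agent-info -json | jq -r '.stats.nomad.leader_addr')
# Raft protocol 3 identifies servers by the ID stored in the node-id file.
NODE_ID=$(cat "$NOMAD_DATA_DIR/server/node-id")

# Write a single-voter peers.json for the server to read on its next start.
cat <<EOF > "$NOMAD_DATA_DIR/server/raft/peers.json"
[
  {
    "id": "$NODE_ID",
    "address": "$NOMAD_ADDR",
    "non_voter": false
  }
]
EOF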
```

After running this script, update the `raft_protocol` in the server's
configuration to 3 and restart the server.
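
Before restarting, it may be worth confirming that the file written by
the script really names this server. The helper below is a minimal
sketch, not part of Nomad: `peers_file_has_id` is a hypothetical name,
and it does a plain text match on the `"id": "..."` form that the
script above emits rather than parsing the JSON.

```shell
#!/usr/bin/env bash

# peers_file_has_id PEERS_FILE NODE_ID_FILE
# Succeeds when the ID stored in NODE_ID_FILE appears as an "id" value in
# PEERS_FILE. This is a fixed-string match, not a JSON validation.
peers_file_has_id() {
  local peers_file node_id_file node_id
  peers_file=$1
  node_id_file=$2
  # Strip the trailing newline and any stray whitespace from the ID.
  node_id=$(tr -d '[:space:]' < "$node_id_file") || return 1
  [ -n "$node_id" ] || return 1
  grep -qF "\"id\": \"$node_id\"" "$peers_file"
}
```

With the variables from the script above still set, the check would be
`peers_file_has_id "$NOMAD_DATA_DIR/server/raft/peers.json" "$NOMAD_DATA_DIR/server/node-id"`.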

[peers-json]: https://learn.hashicorp.com/tutorials/nomad/outage-recovery#manual-recovery-using-peersjson
51 changes: 16 additions & 35 deletions website/content/docs/upgrade/upgrade-specific.mdx
@@ -13,6 +13,18 @@ upgrade. However, specific versions of Nomad may have more details provided for
their upgrades as a result of new features or changed behavior. This page is
used to document those details separately from the standard upgrade flow.

## Nomad 1.3.0

#### Raft Protocol Version 2 Deprecation

Raft protocol version 2 will be removed in the next major release,
Nomad 1.4.0.

In Nomad 1.3.0, the default Raft protocol version has been updated to 3.
If [`raft_protocol`] is not explicitly set, upgrading a server will
automatically upgrade that server's Raft protocol. See the
[Upgrading to Raft Protocol 3] guide.
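
Whether a server picks up the new default therefore depends on whether
its configuration pins the option. One rough way to check is to search
the configuration directory, as in the sketch below. The function name
`raft_protocol_settings` is hypothetical, and the pattern only
recognizes HCL-style `raft_protocol = N` lines, so JSON configuration
files would need a different pattern.

```shell
#!/usr/bin/env bash

# raft_protocol_settings CONFIG_DIR
# Prints every uncommented HCL line that sets raft_protocol in the files
# under CONFIG_DIR, prefixed with the file name. Fails when none are found,
# which suggests the server will take the new default on upgrade.
raft_protocol_settings() {
  local config_dir
  config_dir=$1
  grep -rE '^[[:space:]]*raft_protocol[[:space:]]*=' "$config_dir"
}
```

For example, `raft_protocol_settings /etc/nomad.d` would list any pinned
values, assuming the agent's configuration lives in `/etc/nomad.d`; that
path is an assumption and varies by installation.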

## Nomad 1.2.4

#### `nomad eval status -json` deprecated
@@ -959,7 +971,7 @@ will be interpolated properly. Please see the
### Raft Protocol Version Compatibility

When upgrading to Nomad 0.8.0 from a version lower than 0.7.0, users will need
to set the [`raft_protocol`](/docs/configuration/server#raft_protocol) option in
to set the [`raft_protocol`] option in
their `server` stanza to 1 in order to maintain backwards compatibility with the
old servers during the upgrade. After the servers have been migrated to version
0.8.0, `raft_protocol` can be moved up to 2 and the servers restarted to match
@@ -997,40 +1009,6 @@ In order to enable all
servers in a Nomad cluster must be running with Raft protocol version 3 or
later.

#### Upgrading to Raft Protocol 3

This section provides details on upgrading to Raft Protocol 3 in Nomad 0.8 and
higher. Raft protocol version 3 requires Nomad running 0.8.0 or newer on all
servers in order to work. See [Raft Protocol Version
Compatibility](/docs/upgrade/upgrade-specific#raft-protocol-version-compatibility)
for more details. Also the format of `peers.json` used for outage recovery is
different when running with the latest Raft protocol. See [Manual Recovery Using
peers.json](https://learn.hashicorp.com/tutorials/nomad/outage-recovery#manual-recovery-using-peersjson)
for a description of the required format.

Please note that the Raft protocol is different from Nomad's internal protocol
as shown in commands like `nomad server members`. To see the version of the Raft
protocol in use on each server, use the `nomad operator raft list-peers`
command.

The easiest way to upgrade servers is to have each server leave the cluster,
upgrade its `raft_protocol` version in the `server` stanza, and then add it
back. Make sure the new server joins successfully and that the cluster is stable
before rolling the upgrade forward to the next server. It's also possible to
stand up a new set of servers, and then slowly stand down each of the older
servers in a similar fashion.

When using Raft protocol version 3, servers are identified by their `node-id`
instead of their IP address when Nomad makes changes to its internal Raft quorum
configuration. This means that once a cluster has been upgraded with servers all
running Raft protocol version 3, it will no longer allow servers running any
older Raft protocol versions to be added. If running a single Nomad server,
restarting it in-place will result in that server not being able to elect itself
as a leader. To avoid this, either set the Raft protocol back to 2, or use
[Manual Recovery Using
peers.json](https://learn.hashicorp.com/tutorials/nomad/outage-recovery#manual-recovery-using-peersjson)
to map the server to its node ID in the Raft quorum configuration.

### Node Draining Improvements

Node draining via the [`node drain`][drain-cli] command or the [drain
@@ -1224,6 +1202,8 @@ deleted and then Nomad 0.3.0 can be launched.
[preemption]: /docs/internals/scheduling/preemption
[proxy_concurrency]: /docs/job-specification/sidecar_task#proxy_concurrency
[`sidecar_task.config`]: /docs/job-specification/sidecar_task#config
[raft protocol version]: /docs/configuration/server#raft_protocol
[`raft_protocol`]: /docs/configuration/server#raft_protocol
[reserved]: /docs/configuration/client#reserved-parameters
[task-config]: /docs/job-specification/task#config
[tls-guide]: https://learn.hashicorp.com/tutorials/nomad/security-enable-tls
@@ -1248,3 +1228,4 @@ deleted and then Nomad 0.3.0 can be launched.
[cap_add_exec]: /docs/drivers/exec#cap_add
[cap_drop_exec]: /docs/drivers/exec#cap_drop
[`log_file`]: /docs/configuration#log_file
[Upgrading to Raft Protocol 3]: /docs/upgrade#upgrading-to-raft-protocol-3