From 4db2f53f3c52c7224e990269a10b30c0a94151c3 Mon Sep 17 00:00:00 2001
From: Tim Gross
Date: Thu, 2 Dec 2021 13:56:16 -0500
Subject: [PATCH] documentation improvements from code review, changelog

---
 .changelog/11572.txt                          |  7 ++
 website/content/docs/upgrade/index.mdx        | 80 +++++++++++++++++++
 .../content/docs/upgrade/upgrade-specific.mdx | 66 +++------------
 3 files changed, 96 insertions(+), 57 deletions(-)
 create mode 100644 .changelog/11572.txt

diff --git a/.changelog/11572.txt b/.changelog/11572.txt
new file mode 100644
index 000000000000..04921537b804
--- /dev/null
+++ b/.changelog/11572.txt
@@ -0,0 +1,7 @@
+```release-note:improvement
+raft: The default raft protocol version is now 3.
+```
+
+```release-note:deprecation
+Raft protocol version 2 is deprecated and will be removed in Nomad 1.4.0.
+```
diff --git a/website/content/docs/upgrade/index.mdx b/website/content/docs/upgrade/index.mdx
index fc16b1ce3c94..ec762aabc6cd 100644
--- a/website/content/docs/upgrade/index.mdx
+++ b/website/content/docs/upgrade/index.mdx
@@ -153,3 +153,83 @@ differences may require specific steps.
 [node-status]: /docs/commands/node/status
 [server-members]: /docs/commands/server/members
 [upgrade-specific]: /docs/upgrade/upgrade-specific
+
+## Upgrading to Raft Protocol 3
+
+This section provides details on upgrading to Raft Protocol 3. Raft
+protocol version 3 requires Nomad running 0.8.0 or newer on all
+servers in order to work. Raft protocol version 2 will be removed in
+Nomad 1.4.0.
+
+To see the version of the Raft protocol in use on each server, use the
+`nomad operator raft list-peers` command.
+
+Note that the format of `peers.json` used for outage recovery is
+different when running with the latest Raft protocol. See [Manual
+Recovery Using
+peers.json](https://learn.hashicorp.com/tutorials/nomad/outage-recovery#manual-recovery-using-peersjson)
+for a description of the required format.
+
+When using Raft protocol version 3, servers are identified by their
+`node-id` instead of their IP address when Nomad makes changes to its
+internal Raft quorum configuration. This means that once a cluster has
+been upgraded with servers all running Raft protocol version 3, it
+will no longer allow servers running any older Raft protocol versions
+to be added.
+
+### Upgrading a Production Cluster to Raft Version 3
+
+For production raft clusters with 3 or more members, the easiest way
+to upgrade servers is to have each server leave the cluster, upgrade
+its [`raft_protocol`] version in the `server` stanza, and then add it
+back. Make sure the new server joins successfully and that the cluster
+is stable before rolling the upgrade forward to the next server. It's
+also possible to stand up a new set of servers, and then slowly stand
+down each of the older servers in a similar fashion.
+
+For in-place raft protocol upgrades, perform the following for each
+server, leaving the leader until last to reduce the chance of leader
+elections that will slow down the process:
+
+* Stop the server
+* Run `nomad server force-leave $server_name`
+* Update the `raft_protocol` in the server's configuration file to 3.
+* Restart the server
+* Run `nomad operator raft list-peers` to verify that the `raft_vsn`
+  for the server is now 3.
+* On the server, run `nomad agent-info` and check that the
+  `last_log_index` is of a similar value to the other servers. This
+  step ensures that raft is healthy and changes are replicating to the
+  new server.
+
+### Upgrading a Single Server Cluster to Raft Version 3
+
+If you are running a single Nomad server, restarting it in-place will
+result in that server not being able to elect itself as a leader. To
+avoid this, create a new [`peers.json`][peers-json] file before
+restarting the server with the new configuration. If you have `jq`
+installed you can run the following script on the server's host to
+write the correct `peers.json` file:
+
+```
+#!/usr/bin/env bash
+
+NOMAD_DATA_DIR=$(nomad agent-info -json | jq -r '.config.DataDir')
+NOMAD_ADDR=$(nomad agent-info -json | jq -r '.stats.nomad.leader_addr')
+NODE_ID=$(cat "$NOMAD_DATA_DIR/server/node-id")
+
+cat <<EOF > "$NOMAD_DATA_DIR/server/raft/peers.json"
+[
+  {
+    "id": "$NODE_ID",
+    "address": "$NOMAD_ADDR",
+    "non_voter": false
+  }
+]
+EOF
+```
+
+After running this script, update the `raft_protocol` in the server's
+configuration to 3 and restart the server.
+
+[peers-json]: https://learn.hashicorp.com/tutorials/nomad/outage-recovery#manual-recovery-using-peersjson
diff --git a/website/content/docs/upgrade/upgrade-specific.mdx b/website/content/docs/upgrade/upgrade-specific.mdx
index c0bfdc7e49bc..75ec5e9c1a50 100644
--- a/website/content/docs/upgrade/upgrade-specific.mdx
+++ b/website/content/docs/upgrade/upgrade-specific.mdx
@@ -15,12 +15,15 @@ used to document those details separately from the standard upgrade flow.
 
 ## Nomad 1.3.0
 
-#### Default Raft Protocol Version
+#### Raft Protocol Version 2 Deprecation
 
-In Nomad 1.3.0, the default raft protocol version has been updated
-to 3. If the [`raft_protocol_version`] is not explicitly set,
-upgrading a server will automatically upgrade that server's raft
-protocol. See the [Upgrading to Raft Protocol 3] guide below.
+Raft protocol version 2 will be removed in the next major release of
+Nomad, 1.4.0.
+
+In Nomad 1.3.0, the default raft protocol version has been updated to
+3. If the [`raft_protocol_version`] is not explicitly set, upgrading a
+server will automatically upgrade that server's raft protocol. See the
+[Upgrading to Raft Protocol 3] guide.
 
 ## Nomad 1.2.2
 
@@ -973,57 +976,6 @@ In order to enable all servers in a Nomad cluster must be running with
 Raft protocol version 3 or later.
 
-#### Upgrading to Raft Protocol 3
-
-This section provides details on upgrading to Raft Protocol 3 in Nomad 0.8 and
-higher. Raft protocol version 3 requires Nomad running 0.8.0 or newer on all
-servers in order to work. See [Raft Protocol Version
-Compatibility](/docs/upgrade/upgrade-specific#raft-protocol-version-compatibility)
-for more details. Also the format of `peers.json` used for outage recovery is
-different when running with the latest Raft protocol. See [Manual Recovery Using
-peers.json](https://learn.hashicorp.com/tutorials/nomad/outage-recovery#manual-recovery-using-peersjson)
-for a description of the required format.
-
-Please note that the Raft protocol is different from Nomad's internal protocol
-as shown in commands like `nomad server members`. To see the version of the Raft
-protocol in use on each server, use the `nomad operator raft list-peers`
-command.
-
-When using Raft protocol version 3, servers are identified by their `node-id`
-instead of their IP address when Nomad makes changes to its internal Raft quorum
-configuration. This means that once a cluster has been upgraded with servers all
-running Raft protocol version 3, it will no longer allow servers running any
-older Raft protocol versions to be added.
-
-~> **Warning:** If you are running a single Nomad server, restarting it
-in-place will result in that server not being able to elect itself as
-a leader. To avoid this, either set the Raft protocol back to 2, or
-use [Manual Recovery Using
-peers.json](https://learn.hashicorp.com/tutorials/nomad/outage-recovery#manual-recovery-using-peersjson)
-to map the server to its node ID in the Raft quorum configuration.
-
-The easiest way to upgrade servers is to have each server leave the cluster,
-upgrade its [`raft_protocol`] version in the `server` stanza, and then add it
-back. Make sure the new server joins successfully and that the cluster is stable
-before rolling the upgrade forward to the next server. It's also possible to
-stand up a new set of servers, and then slowly stand down each of the older
-servers in a similar fashion.
-
-For in-place raft protocol upgrades, perform the following for each
-server, leaving the leader until last to reduce the chance of leader
-elections that will slow down the process:
-
-* Stop the server
-* Run `nomad server force-leave $server_name`
-* Update the `raft_protocol` in the server's configuration file to 3.
-* Restart the server
-* Run `nomad operator raft list-peers` to verify that the `raft_vsn`
-  for the server is now 3.
-* On the server, run `nomad agent-info` and check that the
-  `last_log_index` is of a similar value to the other servers. This
-  step ensures that raft is healthy and changes are replicating to the
-  new server.
-
 ### Node Draining Improvements
 
 Node draining via the [`node drain`][drain-cli] command or the [drain
@@ -1243,4 +1195,4 @@ deleted and then Nomad 0.3.0 can be launched.
 [cap_add_exec]: /docs/drivers/exec#cap_add
 [cap_drop_exec]: /docs/drivers/exec#cap_drop
 [`log_file`]: /docs/configuration#log_file
-[Upgrading to Raft Protocol 3]: /docs/upgrade/upgrade-specific#upgrading-to-raft-protocol-3
+[Upgrading to Raft Protocol 3]: /docs/upgrade#upgrading-to-raft-protocol-3