raft: default to protocol v3 #11572

Merged
tgross merged 3 commits into main from raft_default_v3
Feb 3, 2022

tgross
Member

@tgross tgross commented Nov 24, 2021

DO NOT MERGE TILL 1.3.0

Many of Nomad's Autopilot features require raft protocol version 3. Set the default raft protocol to 3, and improve the upgrade documentation.

Fixes #7208. Our concerns about the corner case of a server which is shut down during the upgrade can be alleviated by operators upgrading one server at a time, as we generally recommend for updates. I've tentatively marked this issue for 1.3.0 because it seems like it's got enough risk associated with it that it should be in a major update.


I've tested out the docs by using the 3-server cluster described in our Vagrant file, with the following configuration file (replace the IPs for server02 and server03 as usual):

server config
log_level  = "debug"
data_dir   = "/var/nomad/data"
bind_addr  = "192.168.56.11"
plugin_dir = "/opt/nomad/plugins"

server {
  enabled          = true
  bootstrap_expect = 3
  # raft_protocol    = 2

  server_join {
    retry_join = ["192.168.56.12", "192.168.56.13"]
    retry_interval = "10s"
  }

}

client {
  enabled = false
}

acl {
  enabled = true
}

Note all server names:

$ nomad server members
Name                   Address        Port  Status  Leader  Protocol  Build  Datacenter  Region
nomad-server01.global  192.168.56.11  4648  alive   false   2         1.2.1  dc1         global
nomad-server02.global  192.168.56.12  4648  alive   true    2         1.2.1  dc1         global
nomad-server03.global  192.168.56.13  4648  alive   false   2         1.2.1  dc1         global
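
The Protocol column above can also be checked mechanically while the upgrade is in progress. A minimal sketch, assuming the column layout shown in this output (Name first, Protocol sixth); the function name is hypothetical and not part of Nomad:

```shell
#!/usr/bin/env bash
# Sketch: list servers whose Protocol column differs from the expected raft version.
# Reads `nomad server members` output on stdin. Assumes the column order shown
# above: Name Address Port Status Leader Protocol Build Datacenter Region.

servers_not_on_protocol() {
  local want="$1"
  # Skip the header row; column 1 is Name, column 6 is Protocol.
  awk -v want="$want" 'NR > 1 && NF >= 6 && $6 != want { print $1 }'
}

# Usage against a live cluster (needs NOMAD_ADDR and NOMAD_TOKEN set):
#   nomad server members | servers_not_on_protocol 3
```

An empty result means every listed server reports the expected protocol version.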

For server01:

  • on server01: comment out raft_protocol = 2 in the config file, so that it falls back to the new default of 3
  • on server01: sudo systemctl stop nomad
  • nomad server force-leave nomad-server01.global
  • on server01: sudo systemctl start nomad
  • verify the raft version on this node:
curl -H "X-Nomad-Token: $NOMAD_TOKEN" -s "$NOMAD_ADDR/v1/agent/members" | jq -r --arg addr "$SERVER_01_ADDR" '.Members[] | select(.Addr == $addr) | .Tags.raft_vsn'
3
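
The stop, force-leave, and start steps above could be wrapped in a small helper so they run the same way on each server. This is only a sketch under assumptions that are not part of this PR: the servers are reachable over ssh at their bind addresses, and the names `run` and `upgrade_server` are hypothetical. Editing the config file is left as a manual step. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the per-server steps above. Prints the commands by default;
# set DRY_RUN=0 to execute them for real.
# Assumption (not from the PR): each server is reachable over ssh at its address.

DRY_RUN="${DRY_RUN:-1}"

# Echo the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

upgrade_server() {
  local name="$1" addr="$2"
  # The raft_protocol line must already be commented out in the server's
  # config file before this runs; that edit is not automated here.
  run ssh "$addr" sudo systemctl stop nomad
  # Remove the stopped server from the cluster before it rejoins.
  run nomad server force-leave "$name"
  run ssh "$addr" sudo systemctl start nomad
}

# Usage (one server at a time, verifying raft_vsn in between):
#   upgrade_server nomad-server01.global 192.168.56.11
```

Running one server at a time and checking the raft version before moving on keeps the procedure in line with the upgrade order described above.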

Also note that the server raft configuration has changed to show a UUID as the member ID for the upgraded server:

$ curl -H "X-Nomad-Token: $NOMAD_TOKEN" -s "$NOMAD_ADDR/v1/agent/self" | jq -r .stats.raft.latest_configuration
[{Suffrage:Voter ID:192.168.56.12:4647 Address:192.168.56.12:4647} {Suffrage:Voter ID:192.168.56.13:4647 Address:192.168.56.13:4647} {Suffrage:Voter ID:3f1e8921-f376-e791-18d8-5ee2ce6b99a1 Address:192.168.56.11:4647}]

Repeat for the other two servers. Once we're all done:

$ curl -H "X-Nomad-Token: $NOMAD_TOKEN" -s "$NOMAD_ADDR/v1/agent/self" | jq -r .stats.raft.latest_configuration
[{Suffrage:Voter ID:192.168.56.13:4647 Address:192.168.56.13:4647} {Suffrage:Voter ID:3f1e8921-f376-e791-18d8-5ee2ce6b99a1 Address:192.168.56.11:4647} {Suffrage:Voter ID:d33ed365-d8d6-30aa-ff1e-b096fb75e66f Address:192.168.56.12:4647}]
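
To track progress without reading the configuration by eye, the UUID-style member IDs can be counted. A rough sketch, assuming the `latest_configuration` string keeps the `ID:<value>` layout shown above; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: count raft members whose ID is a UUID rather than an address.
# Reads the latest_configuration string on stdin. Assumes the
# "ID:<value> Address:<value>" layout shown in the output above.

count_uuid_ids() {
  # Match "ID:" followed by an 8-4-4-4-12 lowercase hex UUID; emit one match per line.
  grep -oE 'ID:[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}' | wc -l | tr -d ' '
}

# Usage against a live cluster:
#   curl -H "X-Nomad-Token: $NOMAD_TOKEN" -s "$NOMAD_ADDR/v1/agent/self" \
#     | jq -r .stats.raft.latest_configuration | count_uuid_ids
```

Members still identified by `address:port` are not counted, so the number rises as each server comes back with a UUID.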

@tgross tgross added this to the 1.3.0 milestone Nov 24, 2021
@tgross tgross self-assigned this Nov 24, 2021
Member

@jrasell jrasell left a comment

I made a minor inline comment, but I am not sure if it's important. Otherwise, LGTM!

website/content/docs/upgrade/upgrade-specific.mdx (outdated, resolved)
Member

@schmichael schmichael left a comment

I have some concern about the single server case, but I think we can move forward. This is way overdue and putting it off is only going to make things worse if we want to improve the standalone server case.

website/content/docs/upgrade/upgrade-specific.mdx (outdated, resolved)
tgross and others added 3 commits February 3, 2022 14:29
Many of Nomad's Autopilot features require raft protocol version
3. Set the default raft protocol to 3, and improve the upgrade
documentation.
@tgross
Member Author

tgross commented Feb 3, 2022

I've just rebased this on main and will merge once CI is ✅

@tgross tgross merged commit e3009f1 into main Feb 3, 2022
@tgross tgross deleted the raft_default_v3 branch February 3, 2022 20:03
@github-actions

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 29, 2022
Development

Successfully merging this pull request may close these issues.

Default to raft protocol 3, handling the upgrade from protocol 2
3 participants