Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VAULT-14048: raft-autopilot appears to refuse to remove a node which has left and wouldn't impact stability #19472

Merged
merged 27 commits into from
Apr 3, 2023

Conversation

peteski22
Copy link

@peteski22 peteski22 commented Mar 7, 2023

This bug was detected during manual testing for the 1.13.0 release and as a precaution the following PRs were reverted:

After some investigation the problem seems to be that the NodeType isn't being supplied by Vault when autopilot asks it for its KnownServers during pruning. As a result there are no potential voters flagged in the voter registry, we have a maxRemoval which is < 1 which triggers the debug message about removing a majority of voting servers being unsafe and autopilot refuses to remove the failed voting server.

This PR adds the NodeType for the leader and, where possible, for the follower nodes. It also restores the autopilot version to v0.2.0.

Files of note:

  • go.mod
  • physical/raft/raft_autopilot.go
  • vault/external_tests/raft/raft_autopilot_test.go

Most other changed files are due to exporting GetRaftBackend.

Tests added:

  • TestDelegate_KnownServers_NodeType (Ent)
  • TestRaft_Autopilot_DeadServerCleanup

Following a successfully merged PR I will raise an Enterprise specific PR to update the packages and add another test.

@peteski22 peteski22 added this to the 1.13.1 milestone Mar 7, 2023
@peteski22 peteski22 added bug Used to indicate a potential bug core Issues and Pull-Requests specific to Vault Core labels Mar 7, 2023
Copy link
Collaborator

@ncabatoff ncabatoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the absence of a new test because we don't think it's necessary, or because we're not sure how to write one?

changelog/19472.txt Outdated Show resolved Hide resolved
physical/raft/raft_autopilot.go Outdated Show resolved Hide resolved
@peteski22 peteski22 self-assigned this Mar 14, 2023
@peteski22 peteski22 removed this from the 1.13.1 milestone Mar 24, 2023
@peteski22 peteski22 marked this pull request as ready for review March 24, 2023 20:45
@peteski22 peteski22 requested review from a team, jasonodonnell and raskchanky March 24, 2023 20:46
@peteski22 peteski22 added this to the 1.13.2 milestone Mar 24, 2023
@kubawi kubawi requested a review from a team as a code owner March 27, 2023 10:15
@peteski22 peteski22 requested a review from a team March 27, 2023 10:44
helper/testhelpers/testhelpers_oss.go Outdated Show resolved Hide resolved
physical/raft/raft_autopilot.go Show resolved Hide resolved
vault/external_tests/raft/raft_autopilot_test.go Outdated Show resolved Hide resolved
@peteski22 peteski22 requested a review from a team March 28, 2023 16:40
@peteski22 peteski22 enabled auto-merge (squash) April 3, 2023 13:33
physical/raft/raft_autopilot.go Outdated Show resolved Hide resolved
physical/raft/raft_autopilot.go Show resolved Hide resolved
physical/raft/raft_autopilot.go Outdated Show resolved Hide resolved
vault/external_tests/raft/raft_autopilot_test.go Outdated Show resolved Hide resolved
vault/external_tests/raft/raft_autopilot_test.go Outdated Show resolved Hide resolved
vault/raft.go Outdated Show resolved Hide resolved
@peteski22 peteski22 merged commit 2054ffc into main Apr 3, 2023
@peteski22 peteski22 deleted the VAULT-14048/fix-autopilot-refuse-removal branch April 3, 2023 15:59
@peteski22 peteski22 restored the VAULT-14048/fix-autopilot-refuse-removal branch April 29, 2023 13:23
@peteski22 peteski22 modified the milestones: 1.13.2, 1.13.3 Apr 29, 2023
@peteski22 peteski22 deleted the VAULT-14048/fix-autopilot-refuse-removal branch May 2, 2023 07:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Used to indicate a potential bug core Issues and Pull-Requests specific to Vault Core
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants