Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Leadership transfer cmd #14132

Merged
merged 49 commits into from
Nov 14, 2022
Merged

Leadership transfer cmd #14132

merged 49 commits into from
Nov 14, 2022

Conversation

dhiaayachi
Copy link
Collaborator

Description

This is an attempt to implement a command that would trigger a raft leadership transfer.
The idea about this trace back to a customer issue where we needed to trigger leader transfer to be able to upgrade a leader node.

This dependent to a change in raft that fix some of the gaps in leadership transfer, those gaps would make the transfer fail. hashicorp/raft#487

I made an attempt in naming the sub command but open to any suggestions.

Testing & Reproduction steps

Added tests:

  • RPC call without and with ACL
  • API call negative test (with one agent)
  • CMD call negative test (with one agent)

@dhiaayachi dhiaayachi changed the title Leadership transfer cmd 2 Leadership transfer cmd Aug 10, 2022
@github-actions github-actions bot added theme/api Relating to the HTTP API interface theme/cli Flags and documentation for the CLI interface labels Aug 10, 2022
@dhiaayachi dhiaayachi requested a review from mkeeler August 10, 2022 19:40
@dhiaayachi
Copy link
Collaborator Author

@mkeeler I undusted this PR, if you happen to have some spare time. I had to create a new one and close #12182 because of some git weirdness but I went though all your comments

Could we add one more test for OSS to verify the ACL enforcement and one enterprise test to verify that validateEnterpriseToken enforcement

Added an OSS test I will add one for enterprise after it get merged.

Instead of structs.DCSpecificRequest could we use a new request type which also includes an optional parameter representing the raft.ServerID of a target server to transfer leadership to.

Added the ID to the request and threaded through to the API and the CLI.

We could call LeadershipTransfer on Raft itself. This other function is no longer necessary as every 1.9+ cluster will unconditionally use leadership transfers for autopilot purposes so we can rely on it being available here.

I changed attemptLeadershipTransfer to be able to call the raft API with or without an ID. I left the version check, I can remove it if you think it's not useful anymore

@jkirschner-hashicorp
Copy link
Contributor

@dhiaayachi : what are your thoughts on aligning the naming of this command more with the existing Vault name for similar functionality: step-down rather than leader-transfer?

Vault provides an operator-level CLI command and HTTP API endpoint for forcing the current leader to gracefully step down. Someone familiar with this capability in Vault might look for something with the same name in Consul (internally, at least one person has asked if Consul had a similar "step down" capability).

That said, I can see the advantages of consul operator raft leader-transfer instead of consul operator raft step-down. Semantically, it's not raft that's stepping down... the leader is stepping down. Perhaps we could keep the name leader-transfer but, in the docs, mention that this is analogous to vault operator step-down in Vault?

Just wanted to share these thoughts with you to consider.


Side question: when running this command, can the leader immediately be re-elected? Or is a different server agent always elected (if available)?

Copy link
Contributor

@jkirschner-hashicorp jkirschner-hashicorp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some thoughts / optional suggestions.

Thanks for seeing this through, @dhiaayachi !

command/registry.go Outdated Show resolved Hide resolved
.changelog/14132.txt Show resolved Hide resolved
Copy link
Member

@mkeeler mkeeler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code generally looks good.

The biggest concern is that we would be adding additional net-new RPC endpoints that are not gRPC. I would love to see protobuf definitions for the request/response types and a gRPC service. The HTTP/cli bits as well as the internal attemptLeaderTransfer code shouldn't need alteration.

agent/consul/operator_raft_endpoint_test.go Outdated Show resolved Hide resolved
agent/consul/server.go Outdated Show resolved Hide resolved
agent/structs/structs.go Outdated Show resolved Hide resolved
Copy link
Contributor

@jkirschner-hashicorp jkirschner-hashicorp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dhiaayachi : left a few comments on the docs pages. Thanks!

website/content/api-docs/operator/raft.mdx Outdated Show resolved Hide resolved
website/content/api-docs/operator/raft.mdx Outdated Show resolved Hide resolved
website/content/api-docs/operator/raft.mdx Outdated Show resolved Hide resolved
website/content/api-docs/operator/raft.mdx Outdated Show resolved Hide resolved
website/content/commands/operator/raft.mdx Outdated Show resolved Hide resolved
website/content/commands/operator/raft.mdx Outdated Show resolved Hide resolved
website/content/commands/operator/raft.mdx Outdated Show resolved Hide resolved
website/content/commands/operator/raft.mdx Outdated Show resolved Hide resolved
website/content/commands/operator/raft.mdx Outdated Show resolved Hide resolved
website/content/commands/operator/raft.mdx Outdated Show resolved Hide resolved
dhiaayachi and others added 3 commits October 31, 2022 12:36
Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>
Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>
Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>
@dhiaayachi dhiaayachi added the type/docs Documentation needs to be created/updated/clarified label Oct 31, 2022
Copy link
Contributor

@boruszak boruszak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the edits to api-docs/operator/raft.mdx and commands/operator/raft.mdx`. Please implement the suggestions - and be sure to align both pages to either "node id" or "server id."

Please let me know if you need any additional review or assistance.

Approving on behalf of consul-docs.

website/content/api-docs/operator/raft.mdx Outdated Show resolved Hide resolved
website/content/api-docs/operator/raft.mdx Outdated Show resolved Hide resolved
website/content/api-docs/operator/raft.mdx Outdated Show resolved Hide resolved
website/content/api-docs/operator/raft.mdx Outdated Show resolved Hide resolved
website/content/api-docs/operator/raft.mdx Outdated Show resolved Hide resolved
website/content/api-docs/operator/raft.mdx Outdated Show resolved Hide resolved
website/content/commands/operator/raft.mdx Outdated Show resolved Hide resolved
website/content/commands/operator/raft.mdx Outdated Show resolved Hide resolved
website/content/commands/operator/raft.mdx Outdated Show resolved Hide resolved
dhiaayachi and others added 2 commits November 14, 2022 11:39
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
@dhiaayachi dhiaayachi merged commit 225ae55 into main Nov 14, 2022
@dhiaayachi dhiaayachi deleted the leadership-transfer-cmd-2 branch November 14, 2022 20:35
@jkirschner-hashicorp
Copy link
Contributor

The leadership transfer command will be available starting in Consul 1.15.0!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme/api Relating to the HTTP API interface theme/cli Flags and documentation for the CLI interface type/docs Documentation needs to be created/updated/clarified
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants