migrate autopilot implementation to raft-autopilot #14441
Nomad's original autopilot was importing from a private package in Consul. It has been moved out to a shared library. Switch Nomad to use this library so that we can eliminate the import of Consul, which is necessary to build Nomad ENT with the current version of the Consul SDK. This also will let us pick up autopilot improvements shared with Consul more easily.
7d59d25 to 0163f19
This looks great! It may be good to test (if you haven't yet) interoperability between servers using the old Consul autopilot and servers using this one, as would happen during a rolling cluster upgrade.
@@ -40,7 +40,6 @@ require (
	github.com/gorilla/websocket v1.5.0
	github.com/gosuri/uilive v0.0.4
	github.com/grpc-ecosystem/go-grpc-middleware v1.3.0
	github.com/hashicorp/consul v1.7.8
🎉
// immediately so we'll spawn a goroutine for it.)
func (d *AutopilotDelegate) RemoveFailedServer(failedSrv *autopilot.Server) {
	go func() {
		err := d.server.RemoveFailedNode(failedSrv.Name)
Is Name here the agent's name? Would it be possible to remove by ID instead, in case the cluster has servers with duplicate names (which I think is allowed, though not a good idea 😅)?
It's the serf.Member.Name, which I think is the same thing. I originally had this by ID (which is what Consul does), but our existing RemoveFailedNode code only accepts the name and not the ID. I didn't want to change the behavior here.
LGTM!
@@ -40,7 +40,6 @@ require (
	github.com/gorilla/websocket v1.5.0
	github.com/gosuri/uilive v0.0.4
	github.com/grpc-ecosystem/go-grpc-middleware v1.3.0
	github.com/hashicorp/consul v1.7.8
🥳
	// TODO: replace this with our own helper
	"github.com/hashicorp/consul/sdk/testutil/retry"
it's on my list 😞
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
Fixes #9570 (associated with https://github.com/hashicorp/nomad-enterprise/pull/836)