Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consul/connect: dynamically select envoy sidecar at runtime #8945

Merged
merged 1 commit into from
Oct 13, 2020

Conversation

shoenig
Copy link
Member

@shoenig shoenig commented Sep 22, 2020

As newer versions of Consul are released, the minimum version of Envoy
it supports as a sidecar proxy also gets bumped. Starting with the upcoming
Consul v1.9.X series, Envoy v1.11.X will no longer be supported. Current
versions of Nomad hardcode a version of Envoy v1.11.2 to be used as the
default implementation of Connect sidecar proxy.

This PR introduces a change such that each Nomad Client will query its
local Consul for a list of Envoy proxies that it supports (hashicorp/consul#8545)
and then launch the Connect sidecar proxy task using the latest supported version
of Envoy. If the SupportedProxies API component is not available from
Consul, Nomad will fallback to the old version of Envoy supported by old
versions of Consul.

Setting the meta configuration option meta.connect.sidecar_image or
setting the connect.sidecar_task stanza will take precedence as is
the current behavior for sidecar proxies.

Setting the meta configuration option meta.connect.gateway_image
will take precedence as is the current behavior for connect gateways.

meta.connect.sidecar_image and meta.connect.gateway_image may make
use of the special ${NOMAD_envoy_version} variable interpolation, which
resolves to the newest version of Envoy supported by the Consul agent.

Addresses #8585 #7665

@shoenig
Copy link
Member Author

shoenig commented Sep 22, 2020

TODO

  • e2e testing
  • changelog entry

@shoenig shoenig marked this pull request as ready for review September 22, 2020 20:22
Copy link
Member

@schmichael schmichael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only comments are just dreaming up possible rough edges. Not blockers

// Set the resulting image.
h.logger.Trace("setting task envoy image", "image", image)
request.Task.Config["image"] = image
response.Done = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should actually skip this to ensure we re-run it in the event Envoy fails? Seems cheap enough we might as well.

Bootstrap has a similar problem though, so it might not be worth worrying about here if there's a chance we're just going to fail later due to bootstrap changes not being picked up or something.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking the opposite - if Envoy gets restarted the generated envoy config and the envoy version that gets launched should continue to work together. It's kind of interesting in that Envoy will try twice - interpreting the config at versions N and N-1, but it'd be more consistent to just have the version and bootstrap hooks behave the same. OTOH, maybe now it would make sense to query the version and regenerate the config on every sidecar task restart.


// We either need to acquire Consul's preferred Envoy version or fallback
// to the legacy default. Query Consul and use the (possibly empty) result.
proxies, err := h.proxiesClient.Proxies()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Consul is being restarted when this is called it will fail the task and go into a restart loop which... seems ok?

In configurations where Consul does not start before Nomad there could be spurious task failures, but I think that's already true for Connect? Bootstrap has that awkward bulitin retry that avoids spurious task failures, but it's not a very robust approach.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The envoy bootstrap hook's retry block I believe really targets the race condition between launching the sidecar and Nomad actually registering the service to be proxied - potentially a common occurrence. I'm thinking /v1/agent/self response failures are either rare enough or catastrophic enough that letting the task do a restart in those cases is probably fine

Copy link
Member

@nickethier nickethier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, I really like the way this turned out!

As newer versions of Consul are released, the minimum version of Envoy
it supports as a sidecar proxy also gets bumped. Starting with the upcoming
Consul v1.9.X series, Envoy v1.11.X will no longer be supported. Current
versions of Nomad hardcode a version of Envoy v1.11.2 to be used as the
default implementation of Connect sidecar proxy.

This PR introduces a change such that each Nomad Client will query its
local Consul for a list of Envoy proxies that it supports (hashicorp/consul#8545)
and then launch the Connect sidecar proxy task using the latest supported version
of Envoy. If the `SupportedProxies` API component is not available from
Consul, Nomad will fallback to the old version of Envoy supported by old
versions of Consul.

Setting the meta configuration option `meta.connect.sidecar_image` or
setting the `connect.sidecar_task` stanza will take precedence as is
the current behavior for sidecar proxies.

Setting the meta configuration option `meta.connect.gateway_image`
will take precedence as is the current behavior for connect gateways.

`meta.connect.sidecar_image` and `meta.connect.gateway_image` may make
use of the special `${NOMAD_envoy_version}` variable interpolation, which
resolves to the newest version of Envoy supported by the Consul agent.

Addresses #8585 #7665
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants