Backport of Use strict DNS for mesh gateways with hostnames into release/1.17.x #19396

hc-github-team-consul-core

Backport

This PR is auto-generated from #19268 to be assessed for backporting due to the inclusion of the label backport/1.17.

The below text is copied from the body of the original PR.


Description

This fixes #17557. To support mesh gateways fronted by AWS load balancers, #14917 introduced a code path for peered mesh gateways that uses Envoy clusters backed by the LOGICAL_DNS cluster discovery type. The problem is that when replicas of mesh gateways exist in a peered connection, the dialing peer hits this code path and attempts to add multiple endpoints for the targeted mesh gateways. Envoy does not support multiple endpoints with LOGICAL_DNS, so it rejects the xDS configuration it receives from Consul and logs errors.

In Kubernetes, a mesh gateway that restarts in this state never finishes initializing and is never marked healthy, so its pod restarts continually and the gateway becomes unusable.

Because the bug is only triggered when hostnames rather than IP addresses are registered as the WAN addresses of mesh gateways, it likely affects mostly Consul users on AWS, where hostnames are used for LoadBalancer services and are therefore registered for LoadBalancer-type mesh gateways. It also affects users who manually (or with annotations) register mesh gateways with multiple FQDNs.

Note that this appears to affect only the dialing cluster in a peered connection. The accepting cluster uses a different code path that only ever uses a single mesh gateway target and doesn't attempt to load-balance between multiple mesh gateways.

Testing & Reproduction steps

I was able to recreate this pretty easily outside of AWS by pinning the FQDN of the mesh gateways in the accepting cluster via something like:

meshGateway:
  enabled: true
  replicas: 2
  wanAddress:
    source: "Static"
    static: "gateway.nanosleep.cloud"

which gives this for my dialing cluster:

curl https://${DC2_CONSUL}/v1/peerings ... | jq
...
"PeerServerAddresses": [
  "gateway.nanosleep.cloud:443",
  "gateway.nanosleep.cloud:443"
],
...
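
To check whether a peering is exposed to this problem, one can count how many hostname-based entries appear in PeerServerAddresses. The following Go sketch does that with the standard library only. The helper name hostnameEndpoints is hypothetical, and the IP address in the sample input is a made-up documentation address, not taken from the reproduction.

```go
package main

import (
	"fmt"
	"net"
)

// hostnameEndpoints counts, per hostname, how many host:port entries in
// addrs refer to it. Entries whose host is an IP literal are skipped,
// since only hostname-based addresses are relevant to this issue.
func hostnameEndpoints(addrs []string) (map[string]int, error) {
	counts := make(map[string]int)
	for _, addr := range addrs {
		host, _, err := net.SplitHostPort(addr)
		if err != nil {
			return nil, fmt.Errorf("parsing %q: %w", addr, err)
		}
		if net.ParseIP(host) != nil {
			continue // IP literal, not a hostname
		}
		counts[host]++
	}
	return counts, nil
}

func main() {
	addrs := []string{
		"gateway.nanosleep.cloud:443",
		"gateway.nanosleep.cloud:443",
		"192.0.2.10:443", // hypothetical IP-based entry, ignored
	}
	counts, err := hostnameEndpoints(addrs)
	if err != nil {
		panic(err)
	}
	for host, n := range counts {
		fmt.Printf("%s contributes %d endpoint(s)\n", host, n)
	}
}
```

Any total above one hostname-based endpoint corresponds to the multi-endpoint situation that Envoy rejects for LOGICAL_DNS clusters.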

The dialing cluster's Consul logs then show:

2023-10-17T22:37:29.016Z [ERROR] agent.envoy.xds.mesh_gateway: got error response from envoy proxy: service_id=default/default/consul-consul-mesh-gateway-6ff745887b-5c5s2 typeUrl=type.googleapis.com/envoy.config.cluster.v3.Cluster xdsVersion=v3 nonce=00000006 error="rpc error: code = Internal desc = Error adding/updating cluster(s) server.dc1.peering.303380e1-f1a6-fb04-4ca6-c562e4951539.consul: LOGICAL_DNS clusters must have a single locality_lb_endpoint and a single lb_endpoint"

and the dialing cluster's mesh gateway logs show:

2023-10-17T22:36:20.838Z+00:00 [warning] envoy.config(14) gRPC config for type.googleapis.com/envoy.config.cluster.v3.Cluster rejected: Error adding/updating cluster(s) server.dc1.peering.303380e1-f1a6-fb04-4ca6-c562e4951539.consul: LOGICAL_DNS clusters must have a single locality_lb_endpoint and a single lb_endpoint
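
When reproducing this, it can help to pull the rejected cluster name and the reason out of log lines like the ones above. The Go sketch below does so with a regular expression. The pattern and the parseRejection helper are assumptions based only on the two log lines quoted in this PR, and other Envoy or Consul versions may format the message differently.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rejected matches the "Error adding/updating cluster(s) <name>: <reason>"
// fragment found in both log lines quoted in this PR.
var rejected = regexp.MustCompile(`Error adding/updating cluster\(s\) ([^:\s]+): (.+)$`)

// parseRejection extracts the cluster name and rejection reason from a
// single log line. ok is false when the line does not match the pattern.
func parseRejection(line string) (cluster, reason string, ok bool) {
	m := rejected.FindStringSubmatch(line)
	if m == nil {
		return "", "", false
	}
	// The Consul agent log wraps the message in quotes; drop a trailing one.
	return m[1], strings.TrimSuffix(m[2], `"`), true
}

func main() {
	line := `2023-10-17T22:36:20.838Z+00:00 [warning] envoy.config(14) gRPC config for type.googleapis.com/envoy.config.cluster.v3.Cluster rejected: Error adding/updating cluster(s) server.dc1.peering.303380e1-f1a6-fb04-4ca6-c562e4951539.consul: LOGICAL_DNS clusters must have a single locality_lb_endpoint and a single lb_endpoint`
	if cluster, reason, ok := parseRejection(line); ok {
		fmt.Println("cluster:", cluster)
		fmt.Println("reason: ", reason)
	}
}
```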

Swapping to STRICT_DNS allows the mesh gateway to finish configuration and boot properly.

Links

Strict DNS in Envoy.

PR Checklist

  • updated test coverage
  • external facing docs updated
  • appropriate backport labels added
  • not a security concern

Overview of commits

@hc-github-team-consul-core hc-github-team-consul-core force-pushed the backport/net-4786/mesh-strict-dns/secondly-pleasant-thrush branch from 2b12be1 to d3e927a Compare October 26, 2023 20:08
@hc-github-team-consul-core hc-github-team-consul-core force-pushed the backport/net-4786/mesh-strict-dns/secondly-pleasant-thrush branch from 1a0bbf1 to 369da8e Compare October 26, 2023 20:08
@github-actions github-actions bot added the theme/envoy/xds Related to Envoy support label Oct 26, 2023
Auto approved Consul Bot automated PR

@hc-github-team-consul-core hc-github-team-consul-core merged commit b7055a0 into release/1.17.x Oct 26, 2023
88 of 89 checks passed
@hc-github-team-consul-core hc-github-team-consul-core deleted the backport/net-4786/mesh-strict-dns/secondly-pleasant-thrush branch October 26, 2023 20:28