-
Notifications
You must be signed in to change notification settings - Fork 690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
internal/envoy: Set the dns lookup family on externalName type clusters #2894
internal/envoy: Set the dns lookup family on externalName type clusters #2894
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2894 +/- ##
==========================================
+ Coverage 74.91% 74.94% +0.02%
==========================================
Files 87 87
Lines 5582 5604 +22
==========================================
+ Hits 4182 4200 +18
- Misses 1310 1314 +4
Partials 90 90
|
@stevesloka Can you give more detail about why v4-only is the right approach here? What was the underlying problem? |
@jpeach the issue describes the problem in more detail. The TLDR is: When configuring a route to an externalName service, requests don't get fulfilled properly, but return 503 errors. When Envoy does a DNS lookup to some external resources, a ipv6 address is returned for the external resource and proxying fails with a 503 error. Setting this field ensures an IPv4 is returned so Envoy proxies correctly. I tested this in a GKE cluster as well as my home lab, both showed the same issue and was also resolved with the same fix. Chatting with @moderation in an Envoy issue (envoyproxy/envoy#13037 (comment)), he verified that v3 resolved the issue which we haven't yet moved to, or by setting the dns lookup family might also help. |
Based on https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/cluster.proto#envoy-api-enum-cluster-dnslookupfamily, it sounds to me like this should be working properly (i.e. fallback to v4 resolution if v6 fails), so perhaps an Envoy bug? I guess my only question re: this change is, are there any scenarios where specifying v4-only would cause issues for users by not supporting v6? |
Thanks for linking the Envoy issue. It's not clear to me from the issue whether the problem was that Envoy couldn't connect to the v6 target for some reason, or whether it hit an internal bug first. Possibly, the bug is that it does't fall back to v4 on a connection failure? |
To clarify I don't think changing to v3 fixed the issue, it was the addition of |
I had some more discussion in the upstream Envoy issue and think this isn't 100% an Envoy issue. The Envoy setting Since some sites are returning ipv6 addresses, it's then up to the network capabilities of the cluster's network to allow the routing to work. I think this is why we're seeing the errors in the issue referenced. To fully resolve I think we should add a config setting which will allow users to determine which setting to pass to envoy, be that the ipv4, ipv6, or auto (which should remain the default). @youngnick @skriss thoughts? |
I think that's the right way to do this, thanks @stevesloka. |
4377594
to
8c95549
Compare
Ok this is updated with a new config file option. I named it "cluster", but not sure if that makes sense or not. This new setting defines what is used for the |
8c95549
to
c4aa289
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach looks reasonable to me -- it seems unlikely that this would need to be toggled on a per-route basis so a global config file setting makes sense.
Need to update configuration.md
as well.
|
||
// ClusterConfig holds various configurable Envoy cluster values that can | ||
// be set in the config file. | ||
ClusterConfig `yaml:"cluster,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really have a better suggestion for naming here. I guess this could just be a top-level field in the config file too. 🤷♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup that's where I started, but if we ever needed to add another "cluster" type config it's nice to have this section for it.
c4aa289
to
a9b8e70
Compare
a9b8e70
to
6cadac7
Compare
Still need to update |
6cadac7
to
e2730d0
Compare
@skriss added configuration.md as well. =) |
Changes look good; you've got some merge conflicts now. |
e2730d0
to
efa1a86
Compare
@@ -1,33 +1,27 @@ | |||
<p>Packages:</p> | |||
<ul> | |||
<li> | |||
<a href="#projectcontour.io%2fv1">projectcontour.io/v1</a> | |||
<a href="#projectcontour.io%2fv1alpha1">projectcontour.io/v1alpha1</a> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why this file was changed. Maybe another PR didn't update properly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, maybe an unstable sort within the doc generator? Looks like it switched the order of v1 and v1alpha1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I check out your branch and re-run the generate, it undoes this change..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I spotted the issue in the API docs generator, TLDR it's using a map somewhere along the line which potentially throws away the sort
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just remove this file from the PR since you didn't actually change anything in the API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
It'd be nice to figure out how to resolve the API ref docs sorting issue separately now that we have >1 group, otherwise we're gonna get a lot of noise from this going fwd. We can try getting a patch upstream (looks like there might be an open PR with a fix that we could make some noise on), else we may need to fork the tool.
Are we sure that this makes sense as a global config option? To me, it seems likely that you may need to set this differently for different targets, which argues for an annotation. I could see the utility of setting a default in addition to that though. |
I think this is most likely to be an install-wide problem, caused by the shenanigans around AAAA and A records when you've got V4 and V6 around, so that's why I supported the global config first. I think that if we need more configurability, we can add it later, with this as the default. |
It's an external service, managed by an entirely separate org. That separate org is the one publishing DNS and both the org and the I remember work paths are factors in determining whether IPv6 connectivity works. It's extremely likely that only one out of many external services would exhibit similar problems. At least spin off an issue for the per-service config. What was the root cause in the problem that triggered this PR? Was it a local install problem or a external service problem? |
You're right, it was an external service problem. |
The problem I ran into is the local cluster wouldn't route ipv6. The external service returned an ipv6 address first since that's what we configured envoy to do when using the default of "auto". That's said, this is a local cluster problem. Nothing to do with the external one. I could see a different issue to add seperate configs for each service type, but not needed today. |
So, I've thought about this some more, and I'm still supportive of this config being a global config, for two reasons:
|
Adds a config option for DnsLookupFamily allowing users to define what dns lookup family is used on any cluster that is referenced via an externalName type cluster. This ensures that lookups to external resources are resolved correctly. Fixes projectcontour#2873 Signed-off-by: Steve Sloka <slokas@vmware.com>
efa1a86
to
35e524b
Compare
Adds a config file option
Cluster.DnsLookupFamily
which allows users to define what dns lookup family is used for anyexternalName type cluster. This ensures that lookups to external resources are resolved correctly.
Fixes #2873
Signed-off-by: Steve Sloka slokas@vmware.com