net: DNS address resolution quirks (AAAA records inconsistency) #25321
Comments
/cc @mikioh |
Can you test at tip? The Go resolver has been largely rewritten. |
Just tested with master 65c365b, for reference:
Results at tip
With AAAA queries returned from server,
With AAAA queries returned from server,
With AAAA queries NOT returned from server,
With AAAA queries NOT returned from server,
Conclusion
My conclusion is that the problem is still present.
@gdm85, it looks like there might have been two bugs.
Does that sound right? If so, I think the problem may be a bug in Dial's fallback logic. |
@iangudger yes, I missed that but with 65c365b the top-right scenario has indeed been fixed. As for the 2nd bug, I have dug a bit deeper. These are the relevant sysctl params (notice the second one):
For the record: on this (and similar) boxes no interface has any IPv6 address enabled/used. I have patched a unit test to quickly determine what
diff --git a/src/net/ipsock_test.go b/src/net/ipsock_test.go
index aede354..204cda3 100644
--- a/src/net/ipsock_test.go
+++ b/src/net/ipsock_test.go
@@ -7,6 +7,9 @@ package net
 import (
 	"reflect"
 	"testing"
+	"internal/poll"
+	"runtime"
+	"syscall"
 )
 var testInetaddr = func(ip IPAddr) Addr { return &TCPAddr{IP: ip.IP, Port: 5682, Zone: ip.Zone} }
@@ -280,3 +283,59 @@ func TestAddrListPartition(t *testing.T) {
 		}
 	}
 }
+
+// Probe probes IPv4, IPv6 and IPv4-mapped IPv6 communication
+// capabilities which are controlled by the IPV6_V6ONLY socket option
+// and kernel configuration.
+//
+// Should we try to use the IPv4 socket interface if we're only
+// dealing with IPv4 sockets? As long as the host system understands
+// IPv4-mapped IPv6, it's okay to pass IPv4-mapped IPv6 addresses to
+// the IPv6 interface. That simplifies our code and is most
+// general. Unfortunately, we need to run on kernels built without
+// IPv6 support too. So probe the kernel to figure it out.
+func TestIPv6(t *testing.T) {
+	s, err := sysSocket(syscall.AF_INET, syscall.SOCK_STREAM, syscall.IPPROTO_TCP)
+	switch err {
+	case syscall.EAFNOSUPPORT, syscall.EPROTONOSUPPORT:
+	case nil:
+		poll.CloseFunc(s)
+		t.Log("p.ipv4Enabled = true")
+	}
+	var probes = []struct {
+		laddr TCPAddr
+		value int
+	}{
+		// IPv6 communication capability
+		{laddr: TCPAddr{IP: ParseIP("::1")}, value: 1},
+		// IPv4-mapped IPv6 address communication capability
+		{laddr: TCPAddr{IP: IPv4(127, 0, 0, 1)}, value: 0},
+	}
+	switch runtime.GOOS {
+	case "dragonfly", "openbsd":
+		// The latest DragonFly BSD and OpenBSD kernels don't
+		// support IPV6_V6ONLY=0. They always return an error
+		// and we don't need to probe the capability.
+		probes = probes[:1]
+	}
+	for i := range probes {
+		s, err := sysSocket(syscall.AF_INET6, syscall.SOCK_STREAM, syscall.IPPROTO_TCP)
+		if err != nil {
+			continue
+		}
+		defer poll.CloseFunc(s)
+		syscall.SetsockoptInt(s, syscall.IPPROTO_IPV6, syscall.IPV6_V6ONLY, probes[i].value)
+		sa, err := probes[i].laddr.sockaddr(syscall.AF_INET6)
+		if err != nil {
+			continue
+		}
+		if err := syscall.Bind(s, sa); err != nil {
+			continue
+		}
+		if i == 0 {
+			t.Log("p.ipv6Enabled = true")
+		} else {
+			t.Log("p.ipv4MappedIPv6Enabled = true")
+		}
+	}
+}
Result of running this test:
Relevant lines:
So in order to reproduce the bug(s), the machine must have IPv6 disabled while an IPv6 socket can still bind to an IPv4 (IPv4-mapped) address. To toggle AAAA functionality server-side (DNS) I am using unbound with a modified Python filter (I got some clues from https://github.com/berstend/unbound-no-aaaa). If needed I can reproduce the tests with Go 1.10.1 and master; however, before running further tests, I am also trying to identify all the test matrix dimensions to eventually create some sort of test suite. Tentatively:
Ideally we should be able to identify the desired behaviour for all the (valid) permutations; for the record, on Linux all permutations are valid (might not be the case on Windows/BSD/etc). Another relevant aspect is the total time spent on resolution. Edit: I notice now that the func
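For anyone who wants to check a box without patching the net package, here is a rough user-space approximation of the probe above. It is only a sketch: `canListen`, `stackInfo` and `probeStack` are hypothetical names, it uses only the public `net.Listen` API, and it does not cover the IPv4-mapped IPv6 case, which needs the IPV6_V6ONLY socket option.

```go
package main

import "net"

// canListen reports whether a TCP listener can be opened on addr
// for the given network ("tcp4" or "tcp6"). Hypothetical helper.
func canListen(network, addr string) bool {
	ln, err := net.Listen(network, addr)
	if err != nil {
		return false
	}
	ln.Close()
	return true
}

// stackInfo summarizes the probe result. Hypothetical type,
// not part of the net package.
type stackInfo struct {
	IPv4 bool // a listener on 127.0.0.1 could be opened
	IPv6 bool // a listener on ::1 could be opened
}

// probeStack tries to bind an ephemeral port on the IPv4 and IPv6
// loopback addresses. Unlike the patched test, it does not probe
// IPv4-mapped IPv6 support.
func probeStack() stackInfo {
	return stackInfo{
		IPv4: canListen("tcp4", "127.0.0.1:0"),
		IPv6: canListen("tcp6", "[::1]:0"),
	}
}
```

Calling `probeStack()` on a box with IPv6 disabled should report `IPv6: false`, assuming the loopback interface has no `::1` address configured.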
Probably this is why Go apps compiled without CGO (netdns) enabled are throwing
Can the Go netdns be fixed so that it falls back to IPv4 if it fails to bind to IPv6? Or, rather, so that it does not even try IPv6 in the first place if it sees the
More people are starting to use Samsung DeX; they are going to be blocked by this issue and forced to recompile the software rather than just using it directly.
+1. FTR: I experience this on Google Kubernetes Engine, where IPv6 is generally not available, when using pure Go applications from inside the cluster and connecting to some Google Cloud endpoints with IPv6 records (such as www.googleapis.com). We receive sporadic »cannot assign requested address« errors from inside the cluster, and Google Cloud Support pointed us in the direction of this issue. Using the cgo-based resolver would require a different build configuration, which is something I would rather avoid. In parallel, an issue has been raised in the Google Kubernetes Engine issue tracker to teach kube-dns to drop the AAAA records if IPv6 is not available inside a cluster.
This has been affecting us in our environment as well, which is containerized linux/amd64 with IPv6 completely disabled:
For us, it occasionally manifests when calling the Lets Encrypt ACME API:
We've tried with CGO enabled and disabled at build time, with both resolvers (netgo vs cgo) at runtime, and also with/without setting
The issue persists for us, even with Go 1.16.
Debugging
Here's what I found when trying to dig in on the dial side; hopefully it helps other folks running into this:
Based on the discussion above, at first I thought it was an issue with the ordering of the resolved addrs, and that the TCP dialer was not handling this case properly. However, after reading through a lot of the
To me, this implies that the Go resolver sometimes ONLY returns IPv6 addresses, when it should also return IPv4 addresses; it seems like the resolver sometimes just isn't including A records at all, leaving the AAAA record's IPv6 address as the only option. Considering it's happening with both the netgo and cgo resolvers, it's also possible the environment's resolver was misbehaving. Perhaps this situation just happens less frequently with the CGO resolver, however. It's hard to say if the A record actually failed to resolve, or something else is causing it to not be included in the returned list.
Possible workarounds
I considered working around this by setting the dialer's
The one option that I do believe works is explicitly specifying the
Further, setting a custom
Update - April 12, 2021
We ended up setting a custom dialer where we explicitly pass
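For readers looking for a concrete shape of the custom-dialer workaround, a minimal sketch follows. The network value the comment passed is missing from the text above; `"tcp4"` is an assumption here, chosen because it makes the dialer use IPv4 addresses only. `ipv4Network` and `newIPv4OnlyClient` are hypothetical names, and the timeouts are arbitrary.

```go
package main

import (
	"context"
	"net"
	"net/http"
	"time"
)

// ipv4Network maps a generic network name to its IPv4-only variant.
// Names that are already family-specific are returned unchanged.
func ipv4Network(network string) string {
	switch network {
	case "tcp":
		return "tcp4"
	case "udp":
		return "udp4"
	case "ip":
		return "ip4"
	}
	return network
}

// newIPv4OnlyClient returns an HTTP client whose transport rewrites
// the dial network so that "tcp" becomes "tcp4". This is an assumed
// reconstruction of the workaround, not the commenter's actual code.
func newIPv4OnlyClient() *http.Client {
	dialer := &net.Dialer{Timeout: 10 * time.Second}
	transport := &http.Transport{
		DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
			return dialer.DialContext(ctx, ipv4Network(network), addr)
		},
	}
	return &http.Client{Transport: transport, Timeout: 30 * time.Second}
}
```

The trade-off is that a client built this way never uses IPv6, even on hosts where IPv6 works, so it only suits environments where IPv6 is known to be disabled.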
DNS issues sometimes involve multiple layers. Using slightly modified code (https://play.golang.org/p/4q-AeFKhj_o) with the log statements uncommented, I saw no drop in either IPv4 or IPv6 results. Looking into the HTTP transport, it appears that the network "tcp" is passed as the dial parameter rather than a specific IP version, which results in both IPv4 and IPv6 name resolution. This test ran on Ubuntu, so CGO mode would have used glibc to resolve the name via cgoLookupIPCNAME. With the "tcp" network (neither tcp4 nor tcp6) it should just pass AF_UNSPEC to glibc's getaddrinfo, which relies on the DNS options to control whether the IPv4 and IPv6 lookups run in parallel. With single-request set, as per the resolv.conf reference, we would clearly see if either family were missing, but I didn't see anything missing; the Go resolver also supports single-request. Based on this, my conclusion now leans toward the latest update from @anitgandhi: the problem seems related to a failure to resolve a DNS record, which I suspect might be caused by the DNS server itself.
Yeah I meant to leave an update on my previous comment, that as of Go 1.17, explicitly passing
In our environment (IPv6 disabled), using
We observed intermittent failures during deployment, due to Go resolving Google API domains into IPv6 addresses, even though the Cloud Shell environment has IPv6 disabled. Until the Go issue (golang/go#25321) has been resolved, we have to patch the `/etc/hosts` file on the Cloud Shell machine to ensure that these domains are resolved using IPv4 only. PiperOrigin-RevId: 495598377
We observed intermittent failures during deployment, due to Go resolving Google API domains into IPv6 addresses, even though the Cloud Shell environment has IPv6 disabled. Until the Go issue (golang/go#25321) has been resolved, we have to patch the `/etc/hosts` file on the Cloud Shell machine to ensure that these domains are resolved using IPv4 only. PiperOrigin-RevId: 495779871
This is a workaround for golang/go#25321, and is related to hashicorp/terraform-provider-google#6782
This bug report is about an inconsistency in how resolution is handled between the Go resolver and the CGO one.
I do not expect a bugfix (although probably beneficial, but I leave that estimation to others) but at least an understanding of why the Go resolver behaves this way.
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
The latest release is 1.10.2 at the time of writing; not tested, but judging by the release notes, nothing should have changed in the relevant code.
What operating system and processor architecture are you using (go env)?
Test setup
issue25321.go can be obtained from https://play.golang.org/p/kE_Unq4VvkO
IPv6 is disabled on this box; the DNS server may or may not return AAAA records (I have a toggle for that).
When AAAA answers are allowed:
When they are not allowed:
But in both cases, an A query works:
Test results
Reminder: IPv6 is always disabled, only the netdns resolver and the responses of the DNS are varying for the below tests.
| Command | AAAA answers allowed | AAAA answers not allowed |
|---|---|---|
| `go run issue25321.go` | 2018/05/10 00:52:01 dial failed www.googleapis.com Get http://[2a00:1450:4001:821::200a]:80/: dial tcp [2a00:1450:4001:821::200a]:80: connect: cannot assign requested address<br>2018/05/10 00:52:11 HTTP failed www.googleapis.com Get http://www.googleapis.com/: dial tcp [2a00:1450:4001:821::200a]:80: connect: cannot assign requested address | 2018/05/10 00:43:23 dial failed www.googleapis.com lookup www.googleapis.com on 127.0.0.1:53: read udp 127.0.0.1:59700->127.0.0.1:53: i/o timeout<br>2018/05/10 00:43:33 HTTP failed www.googleapis.com Get http://www.googleapis.com/: dial tcp: lookup www.googleapis.com on 127.0.0.1:53: read udp 127.0.0.1:43538->127.0.0.1:53: i/o timeout |
| `GODEBUG=netdns=cgo go run issue25321.go` | 2018/05/10 00:47:23 dial failed www.googleapis.com Get http://[2a00:1450:4001:81d::200a]:80/: dial tcp [2a00:1450:4001:81d::200a]:80: connect: cannot assign requested address<br>2018/05/10 00:47:23 OK www.googleapis.com 192.168.1.12:57248 -> 216.58.207.74:80 | 2018/05/10 00:43:07 OK www.googleapis.com 192.168.1.12:57156 -> 216.58.207.74:80 |
Forgive the horrible representation, but there are two log lines at most in those table cells, you can see them better by copy/pasting their content.
Worthy of note: in the case of CGO resolver and AAAA answers allowed, first there is a failure (dialer) and then a success (HTTP request).
Another note: resolving www.bing.com is not affected by this problem, so the problem must be related to how the records are returned from the DNS.
Expected results
The expected result for all 4 combinations would be (since IPv6 is disabled on this box): do not try AAAA and use an A record, as in the bottom-right cell of the test matrix.
Questions arising from this test
AAAA records somehow? (I am inclined to think so)
AAAA is returned? This would arguably be the most serious part of the bug (if acknowledged), although it should first be determined whether it is a problem on the DNS server side.
Related