Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tun: replace the DNS client in netstack with net.Resolver #43

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

benburkert
Copy link

Hello,

I used the new netstack package to setup an in-process tunnel for
tunneling SSH connections over WireGuard. It worked quite well, I was
very happy with how easy it was. Here are some changes I made as part of
my work. The biggest one is replacing the custom DNS client with an
instance of net.Resolver.

The TUN type includes a DNS client for sending hostname lookup queries
to the DNS server(s) over the WireGuard connection. The builtin client
resolver in the net package also uses the dnsmessage package to
implement a DNS client, which can bypass the servers specified in
resolv.conf when a custom Dial func is provided. This is a patch to
replace the netstack package's DNS client with an instance of
net.Resolver configured to connect to the specified servers instead of
the system's configured DNS servers (via resolv.conf). It was tested on
and with support from Fly.io.

Also included is a change to update the modules in the netstack package
and fixes for the client & server examples, along with a change to add a
copyright file to the netstack package to fix https://pkg.go.dev/
support.

@zx2c4
Copy link
Member

zx2c4 commented Mar 2, 2021

Thanks for the patches.

First, some house keeping -- this repo uses Signed-off-by: lines. Can you add yours using git commit -s --amend --no-edit?

With regards to the DNS change, I'm a bit lost in the diff, but my current understand is:

  • net.Resolver lets you override the Dial function.
  • You use this to hijack the dialing, replacing the DNS server passed by net with the one specified by the user for tnet.
  • If the user doesn't specify any tnet resolvers, you allow the ones passed by net.Resolver to be used.

Is this more or less what's happening? If so, I'm wondering what happens when there are no system resolvers specified in /etc/resolv.conf. Will net.Resolver still pass an IP to be hijacked to the Dial function, or will it just fail? What is the behavior when a tnet has 1 DNS server, but the system specifies multiple DNS servers? Will tnet's 1 server be queried multiple times?

@benburkert
Copy link
Author

benburkert commented Mar 2, 2021

Thanks, I added the signoffs, I also noticed some wonkiness with go mod tidy, it seems to make a change that breaks go build for me. Still trying to track down what's going on there.

I realized I forgot to document some of the behavioral changes this PR introduces. It's not easy to describe without an excruciatingly long explainer on how DNS works in Go apps, so here goes.

The net package has two ways of looking up a hostname: native OS functions/syscalls or pure Go resolver. Since the host is unaware of the netstack networking layer, if lookups are handled by the OS they will not be sent over the WireGuard connection, and internal names won't be resolved. So for any new connections over the WireGuard tunnel, it makes sense to also tunnel DNS queries. Using a net.Resolver with PreferGo set and Dial set to tnet's will force this behavior, but unfortunately that's not the end of it.

Even with the pure Go resolver configured, the local DNS config (e.g. resolv.conf, nsswitch.conf) is still used to determine the DNS server to contact. (If no config is present, I believe it default to "localhost:53".) This can cause the pure Go resolver to lookup a hostname over the WireGuard tunnel, but attempt to contact the server specified in the host config. On my machine, it was attempting to send DNS queries to 192.168.1.1:53, which was not part of the allowed ips address range.

So the dialDNS method attempts to correct this by swapping out the DNS server IP supplied by the system config for the DNS server IP(s) specified in the CreateNetTUN parameter. Unfortunately, the net package does not expose the DNS server(s) it discovers via reading system configs, so this dialDNS method instead replaces any connection attempts with a random IP in dnsServers, unless the address is already in dnsServers. (The range over a map makes the order random.)

The previous version of Dial was resolving all IPs for the address with the custom resolver implemented in LookupContextHost. That version would query each DNS server in dnsServers until it received a valid response. This version uses the net.Resolver to do the lookup, but it overrides the IP address that net.Resolver attempts to dial, via the dialDNS method. This causes a subtle change in behavior: LookupContextHost returns the first valid record from a DNS server, dialDNS returns the first valid connection to a DNS server. If the record cannot be resolved by this server, it won't attempt to query another server. That behavior could be preserved by adding a wrapper around the call to tnet.resolver.LookupHost, if it is important to keep.

Let me know if you would like any of these details documented in the commit message.

@zx2c4-bot zx2c4-bot force-pushed the master branch 5 times, most recently from 5861be1 to 2f788f7 Compare March 6, 2021 15:48
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
@zx2c4
Copy link
Member

zx2c4 commented Mar 6, 2021

Thanks, I added the signoffs, I also noticed some wonkiness with go mod tidy, it seems to make a change that breaks go build for me. Still trying to track down what's going on there.

gVisor has a separate branch for use with real Go, since they have some forked Go. I fixed things up in slightly different ways for both the pkgsite license and the api bump. Thanks for bringing that all to my attention.

I realized I forgot to document some of the behavioral changes this PR introduces. It's not easy to describe without an excruciatingly long explainer on how DNS works in Go apps, so here goes.

Okay, so your explanation matches my three bullet points. I think we're on the same page. I worry, though, isn't there a problem with:

So the dialDNS method attempts to correct this by swapping out the DNS server IP supplied by the system config for the DNS server IP(s) specified in the CreateNetTUN parameter. Unfortunately, the net package does not expose the DNS server(s) it discovers via reading system configs, so this dialDNS method instead replaces any connection attempts with a random IP in dnsServers, unless the address is already in dnsServers. (The range over a map makes the order random.)

I asked about a corner case of this in my prior message when I wrote:

I'm wondering what happens when there are no system resolvers specified in /etc/resolv.conf.

It looks like in that case, lookups will fail. But lookups shouldn't fail for that reason.

So as much as I'd like to stop open coding DNS resolver logic in this module and use something in the library, it doesn't look like Go's resolver is flexible enough to make that work. I briefly looked into what it'd take to hijack Go's internal resolver logic using unsafe, but it looks like there's a lot of build tags and such in there complicating it. I wonder if this would be a good proposal for Go 1.17?

zx2c4 added 4 commits March 6, 2021 10:50
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
It should be enough to check for the trailing zero name.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
@benburkert
Copy link
Author

I'm wondering what happens when there are no system resolvers specified in /etc/resolv.conf.

127.0.0.1:53 & [::1]:53 are used when resolv.conf doesn't include a nameserver or when there is an error reading the config.

I briefly looked into what it'd take to hijack Go's internal resolver logic using unsafe, but it looks like there's a lot of build tags and such in there complicating it. I wonder if this would be a good proposal for Go 1.17?

I think hijacking the internal resolver would be overstepping if it forces all DNS lookups for the process to go over the tunnel. And you can already do this (minus loading the resolv.conf config) by modifying net. DefaultResolver. Specifying a config for a net.Resolver would be great, but I think it's been avoided b/c of how a lot of the configuration can be platform specific.

Use the net.Resolver DNS client to send domain lookups to the server
specified as a CreateNetTUN parameter. The net package's DNS client
handles DNS request and response parsing when PreferGo is true.

Like the previous DNS client it replaces, the net.Resolver instances
also sends DNS queries over the WireGuard connection. Tested on and with
support from Fly.io.

Signed-off-by: Ben Burkert <ben@benburkert.com>
@zx2c4
Copy link
Member

zx2c4 commented Mar 6, 2021

I'm wondering what happens when there are no system resolvers specified in /etc/resolv.conf.

127.0.0.1:53 & [::1]:53 are used when resolv.conf doesn't include a nameserver or when there is an error reading the config.

Oh, great! That's very promising. So we always get some request. Two questions:

  • How's this work on Windows, or non _unix platforms?
  • What happens if the tnet has more DNS servers than resolvconf? I see that currently you try dialing all of them, for each call to dialDNS, but this has two problems:
    • UDP dialing won't fail fast like TCP does, so it'll always return a valid conn and not error out.
    • If it does error, then it means that dialDNS might be called for the next resolv.conf address, in which case you now are making N*M connections.

I briefly looked into what it'd take to hijack Go's internal resolver logic using unsafe, but it looks like there's a lot of build tags and such in there complicating it. I wonder if this would be a good proposal for Go 1.17?

I think hijacking the internal resolver would be overstepping if it forces all DNS lookups for the process to go over the tunnel. And you can already do this (minus loading the resolv.conf config) by modifying net. DefaultResolver. Specifying a config for a net.Resolver would be great, but I think it's been avoided b/c of how a lot of the configuration can be platform specific.

Yea, I wasn't thinking hijacking DefaultResolver, but just an instance of Resolver. But it looks like resolverConfig might be global anyway, which is a bummer.

@benburkert
Copy link
Author

  • How's this work on Windows, or non _unix platforms?

I don't think it would, because AFAIK none of them support the PreferGo stuff.

  • UDP dialing won't fail fast like TCP does, so it'll always return a valid conn and not error out.

Yes, but if subsequent lookup fails with a "temporary" DNS error (like a timeout), the resolver will redial then resend the query, until it runs out of servers. But dialDNS could pick the same server on a redial.

  • If it does error, then it means that dialDNS might be called for the next resolv.conf address, in which case you now are making N*M connections.

The max is 3*len(dnsServers) connections for TCP and 3 for UDP, because the dnsConfig allows at most 3 servers.

None of this seems ideal, so perhaps a better approach is to keep the DNS lookups handled by Net, but to use a github.com/miekg/dns client?

@benburkert
Copy link
Author

benburkert commented Mar 6, 2021

After thinking a bit more about this a bit more, I'm no longer in favor of using a different DNS client. What I really want is a way to override the LookupHost method, if I so choose. That way there doesn't have to be a DNS server running on the other side of the connection for internal DNS to work (e.g. tunnel DOH instead). Using a different DNS client is a very round-about way of allowing for this.

What do you think about extracting the DNS client here into an exported type with just a LookupHost(context.Context, string) ([]string, error) method, and swapping the dnsServers argument for a LookupHostfunc argument? That way, a net.Resolver.LookupHost could be used in place of this client. Or just export a TunnelDNSTo(dnsServers []net.IP) func(....) func instead of the entire DNS client type.

@zx2c4-bot zx2c4-bot force-pushed the master branch 2 times, most recently from 6b5293b to 6005c57 Compare March 9, 2021 04:32
@zx2c4-bot zx2c4-bot force-pushed the master branch 6 times, most recently from 58beb0f to 54dbe24 Compare April 12, 2021 21:35
@zhsj
Copy link
Contributor

zhsj commented Jul 22, 2022

Hi,

Not sure if it's right place to chime in. I also want to config the netstack to use DoH/DoT.

Something that has changed over the time. PreferGo is supported on Windows since go1.19. Though I don't think we must use net.Resolver.

How about adding a resolver interface, like LookupHost(ctx context.Context, host string) (addrs []string, err error), and allow it to be passed to CreateNetTUN. To allow the resolve traffic to go through netstack, the DialContext probably should be added to that LookupHost function as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants