Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DoH improvements #405

Merged
merged 16 commits into from
Jan 7, 2020
Merged

DoH improvements #405

merged 16 commits into from
Jan 7, 2020

Conversation

kimbo
Copy link
Contributor

@kimbo kimbo commented Dec 23, 2019

Here are some improvements/additions to #393.

This adds a few things that @rthalley requested to dns.query.https() (see comment), as well as some other things I thought would be useful.

I added or modified for the following arguments to dns.query.https():

  • q - used to be called url, renamed to q
  • source (source address)
  • source_port
  • af - address family to use
  • verify - defaults to True, determines whether or not to verify server TLS certificates
  • where - a URL or an IP address (used to only allow for URLs)
  • path - default is /dns-query, the path to use when where is an IP address
  • port - default is 443, the port to use when where is an IP address

About the where parameter, formerly called url. Before it had to be a URL, but now it can be either a full URL (e.g. https://dns.google/dns-query) or an IP address (e.g. 8.8.8.8). If it's the latter, the URL will be built like so: https://<where>:<port><path>. If where is not an IP address, it is assumed that port and path and included in where and they will not be used to build the URL. I added this functionality because I think it makes sense for https() to have a similar API as the other functions in dns.query (udp(), tcp(), and tls()).

Added the requests package for all its goodness.
Added requests-toolbelt to set source port/address. The only thing it uses from there is a HttpAdapter, which isn't too long and could be put directly into dnspython somewhere if that's desirable.

Also added a few tests for GET, POST, and passing in an IP address. I don't really like the idea of hard-coding in DNS resolvers for tests, so might be good to change eventually.

I haven't touched dns.resolver yet. Figured I'd submit this first to see if there are any thoughts about it.

This PR allows for where to be a plain http url. Obviously that would defeat the main purpose of DoH, but it could be useful to someone in setting up/debugging/implementing a DoH service. I don't feel strongly either way.

dns/query.py Outdated Show resolved Hide resolved
@rthalley
Copy link
Owner

I'm mostly good with these, but I am not happy about setting urllib3.util.connection.allowed_gai_family as that is not thread-safe. This could cause a dnspython thread in some larger program to break the program. Is there something else we can do?

@kimbo
Copy link
Contributor Author

kimbo commented Dec 24, 2019

I'm mostly good with these, but I am not happy about setting urllib3.util.connection.allowed_gai_family as that is not thread-safe. This could cause a dnspython thread in some larger program to break the program. Is there something else we can do?

Yeah that's a good point. Here are some thoughts, though none of them sound that great to me.

There are some ideas in this SO post, but most of them are similarly hackish solutions. One thing that might be worthwhile is to make a little wrapper for requests that allows us to pass in source, source_port, and af directly to requests.get(), requests.post(), and so on. Found this snippet linked in one of the answers that adds the family parameter to allow setting the address family.

Another option is to use a mutex (e.g. threading.Lock()). Sounds really slow though, which would probably defeat the purpose of the multi-threading.

@kimbo
Copy link
Contributor Author

kimbo commented Dec 24, 2019

@rthalley Another option is to not support the af parameter altogether. Thoughts about this:

  • lots of people won't ever need to set the af param
  • many DoH providers have a list of hostnames/IP addresses that are not hard to find
  • if someone wants to set the address family to, e.g. AF_INET, they could manually do the A lookup first (e.g. using dns.query.udp()) and pass the IP from the answer in to dns.query.https() as the where param, thereby bypassing getaddrinfo(). There are a few downsides of this, one of which is they might have to include verify=False much of the time because not everyone uses the SAN extension.

@rthalley
Copy link
Owner

I'm leaning towards "no af parameter" at the moment, as the "wrapper" for requests is kind of painfully large. I don't want to do any of the monkeypatching solutions. Using a mutex doesn't really work, as that would only protect dnspython threads from each other, but if an application used requests for dnspython and something else, there would still be trouble.

We can always add an af parameter later if the requests API gets fixed.

@rthalley
Copy link
Owner

If I had realized the af was a hard thing for requests or urllib, I wouldn't have asked for it :)

@kimbo
Copy link
Contributor Author

kimbo commented Dec 24, 2019

@rthalley K I'll remove it.

not supporting setting address family for DoH until there is a better
way to do it with the requests api.
@kimbo
Copy link
Contributor Author

kimbo commented Dec 24, 2019

@kimbo
Copy link
Contributor Author

kimbo commented Dec 26, 2019

Looking at this again. If someone wants to do a lot of DNS lookups over HTTPS, this is going to be really slow. Mainly because there's a brand new session for every query (aka opening and closing a new connection every time). I ran some perf tests and compared this implementation of dns.query.https() to dns.query.udp(). After I saw how slow dns.query.https() is, I tested out dns.query.tcp() and dns.query.tls(), suspecting they would be similarly slow. However, dns.query.tcp() was almost exactly as fast as dns.query.udp(), and dns.query.tls() was slower than dns.query.udp(), but not nearly as slow as dns.query.https(). My guess is dns.query.tcp() is re-using the same connection (i.e. leaving it open).

Ideas to fix this:

  1. Pass in a list (or any iterable) of dns.message.Message and look them all up before closing the session.
  2. Add a session parameter to dns.query.https() so you can pass in a request.sessions.Session object.

I tried the first way on Cloudflare, Google, and Quad9 with tcp, tls, and https. Ended up with a HUGE speedup for https in all cases, a medium-to-large-sized speedup for tls (but sometimes Google would close the connection after a while), and almost no speedup for tcp. Basically, if you can to 1000 dns queries over the same tcp connection, it gets pretty close to the same as doing them over udp.

@rthalley
Copy link
Owner

The UDP and TCP code deals with the needs of higher performance callers by providing send_udp/tcp() and receive_tcp/udp(). Could we have send_https() and receive_https() but with a requests Session instead of a socket? Then caller can take over the connection management.

@kimbo
Copy link
Contributor Author

kimbo commented Dec 28, 2019

I'll try and get on this in the next week or so

@filips123
Copy link
Contributor

Many DoH clients also support "bootstrap address". Without that address, when you pass resolver URL, it will be resolved using system DNS resolver. However, if bootstrap address is provided, it will make HTTP query to that address instead of getting address with DNS. This makes DoH independent of system resolver. For some more details, see this (description of how Firefox does resolving).

@kimbo
Copy link
Contributor Author

kimbo commented Dec 30, 2019

@rthalley Lmk what you think about the send_https() function (see the test in 27e24bf, for example). I don't think there's a possibility to make a receive_https() function, as the send_https() function already returns the response.

Originally I was thinking it would make more sense to modify dns.query.https(), so I made it have a required session parameter. I'm not so sure that's a good idea. Should I remove it or make it optional? If we made it optional, when they don't provide it, we would create a new session and close it all within https(); when they did provide it, we would just use it (and not close it).

I also added a bootstrap address option per @filips123 comment. It uses another HTTPAdapter class from the requests-toolbelt package.

This is getting bigger than I was first envisioning, so I don't want to do too much more on this PR. Just lmk what needs to be fixed.

@rthalley rthalley merged commit 0ed9948 into rthalley:master Jan 7, 2020
nrhall pushed a commit to nrhall/dnspython that referenced this pull request Jun 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants