Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use AF_UNSPEC for name resolution #389

Merged
merged 2 commits into from
Apr 8, 2022
Merged

Conversation

jeckersb
Copy link
Contributor

This reverts most of 1ad97fb, but
keeps tests which still have general applicability.

The reason the original change was made was to try and work around a
bug[1] in the eventlet library. Eventlet monkey-patches the
socket.getaddrinfo function and replaces it with its own async,
eventlet-aware implementation. The reason name resolution was broken
in the first place is because eventlet was consulting DNS first, and
then if that failed, falling back to /etc/hosts, which is just flat
out incorrect behavior.

It's important to note that this was only when running py-amqp under
eventlet, and only for specific versions of eventlet that have long
been fixed. So this workaround is not even needed anymore.

With "normal" (non-eventlet) use, socket.getaddrinfo instead calls
into the glibc getaddrinfo implementation, which ultimately uses
libnss to resolve hostnames.

However, there is an issue with the original workaround when using the
default (glibc) getaddrinfo. The workaround (current) implementation
explicitly forces resolution to use AF_INET (IPv4) and then only if
that does not succeed, it in turn will try with AF_INET6 (IPv6). This
generally works well for IPv4-only hosts, but can be unnecessarily
slow for dual-stack IPv4/IPv6 hosts.

Consider the following:

  • We want to connect to example.org

  • The /etc/hosts file contains an IPv6 entry:

    example.org f00d::1

  • The /etc/nsswitch.conf file contains typical (simplified) hosts
    config:

    hosts: files dns

In this case, the current code will involve nss iterating through the
modules:

  • files (with AF_INET): fails, because there is no IPv4 address in
    /etc/hosts

  • dns (with AF_INET): may or may not succeed per-site, depending on
    how DNS is configured. If DNS is slow/misconfigured, this may incur
    a delay and block for a significant amount of time.

  • files (with AF_INET6): succeeds, and getaddrinfo returns f00d::1.

Now in the same scenario as before, with this fix which reverts back
to using AF_UNSPEC instead:

  • files (with AF_UNSPEC) succeeds, and getaddrinfo returns f00d::1.

There is no need to involve DNS at all. Even a well-configured,
quick-to-respond DNS server is going to be many orders of magnitude
slower than consulting with /etc/hosts which libnss keeps cached in
memory.

[1] https://bugs.launchpad.net/neutron/+bug/1696094/comments/22

This reverts most of 1ad97fb, but
keeps tests which still have general applicability.

The reason the original change was made was to try and work around a
bug[1] in the eventlet library.  Eventlet monkey-patches the
socket.getaddrinfo function and replaces it with its own async,
eventlet-aware implementation.  The reason name resolution was broken
in the first place is because eventlet was consulting DNS first, and
then if that failed, falling back to /etc/hosts, which is just flat
out incorrect behavior.

It's important to note that this was *only* when running py-amqp under
eventlet, and *only* for specific versions of eventlet that have long
been fixed.  So this workaround is not even needed anymore.

With "normal" (non-eventlet) use, socket.getaddrinfo instead calls
into the glibc getaddrinfo implementation, which ultimately uses
libnss to resolve hostnames.

However, there is an issue with the original workaround when using the
default (glibc) getaddrinfo.  The workaround (current) implementation
explicitly forces resolution to use AF_INET (IPv4) and then only if
that does not succeed, it in turn will try with AF_INET6 (IPv6).  This
generally works well for IPv4-only hosts, but can be unnecessarily
slow for dual-stack IPv4/IPv6 hosts.

Consider the following:

- We want to connect to example.org

- The /etc/hosts file contains an IPv6 entry:

    example.org f00d::1

- The /etc/nsswitch.conf file contains typical (simplified) hosts
  config:

    hosts: files dns

In this case, the current code will involve nss iterating through the
modules:

- files (with AF_INET): fails, because there is no IPv4 address in
  /etc/hosts

- dns (with AF_INET): may or may not succeed per-site, depending on
  how DNS is configured.  If DNS is slow/misconfigured, this may incur
  a delay and block for a significant amount of time.

- files (with AF_INET6): succeeds, and getaddrinfo returns f00d::1.

Now in the same scenario as before, with this fix which reverts back
to using AF_UNSPEC instead:

- files (with AF_UNSPEC) succeeds, and getaddrinfo returns f00d::1.

There is no need to involve DNS at all.  Even a well-configured,
quick-to-respond DNS server is going to be many orders of magnitude
slower than consulting with /etc/hosts which libnss keeps cached in
memory.

[1] https://bugs.launchpad.net/neutron/+bug/1696094/comments/22
auvipy
auvipy previously requested changes Mar 24, 2022
Copy link
Member

@auvipy auvipy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the intent of the patch, however can you provide some additional unit & integration tests for the changes proposed?

@jeckersb
Copy link
Contributor Author

I've added a unit test to ensure AF_UNSPEC is used with getaddrinfo. I'm not sure if an integration test makes sense here.

@auvipy
Copy link
Member

auvipy commented Mar 24, 2022

just for sanity, can you dig it to see if integration test is viable in this case?

@jeckersb
Copy link
Contributor Author

just for sanity, can you dig it to see if integration test is viable in this case?

Sorry, I somehow missed your reply here.

No, I don't think it makes sense to have an integration test for this. What would we even be testing it against? Standing up a DNS server and making sure it replies the way we expect? I think the unit test ensuring that we are using AF_UNSPEC is enough, and to trust that the resolver is working as documented.

@jeckersb
Copy link
Contributor Author

I'm a little bit concerned about the security implications of this patch. At least according to this answer on StackOverflow it seems that using AF_UNSPEC exposes most Linux machines to malicious DNS attacks.

It's 2022, nobody should still be using a vulnerable, unpatched glibc from 2015.

@auvipy auvipy requested a review from matusvalo April 4, 2022 09:20
Copy link
Member

@matusvalo matusvalo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not expert for low level TCP/IP but seems OK and integration tests passed.

@auvipy auvipy merged commit 98f6d36 into celery:master Apr 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants