This repository has been archived by the owner on Jan 25, 2024. It is now read-only.

Incomplete gethostbyname response #4

Closed
picnoir opened this issue Mar 7, 2023 · 8 comments · Fixed by twosigma/nsncd#71
Labels
bug Something isn't working

Comments

@picnoir
Member

picnoir commented Mar 7, 2023

There's something wrong with the getaddrinfo operation.

Running hostname --fqdn and dumping the nscd socket:

NSNCD:

mars 07 14:24:18 framework sockdump[96606]: 14:24:18.768 >>> process hostname [98811 -> 70129] len 18(18)
mars 07 14:24:18 framework sockdump[96606]: 0000  02 00 00 00 0d 00 00 00  06 00 00 00 68 6f 73 74  ............host
mars 07 14:24:18 framework sockdump[96606]: 0010  73 00                                             s.
mars 07 14:24:18 framework sockdump[96606]: 14:24:18.768 >>> process hostname [98811 -> 70129] len 22(22)
mars 07 14:24:18 framework sockdump[96606]: 0000  02 00 00 00 05 00 00 00  0a 00 00 00 66 72 61 6d  ............fram
mars 07 14:24:18 framework sockdump[96606]: 0010  65 77 6f 72 6b 00                                 ework.
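The NSNCD capture above contains only client requests and no reply. Each request follows glibc's nscd `request_header` layout: three native-endian int32 fields (version, request type, key length including the trailing NUL), then the key. Read against the `request_type` enum in glibc's nscd/nscd-client.h, the first request is GETFDHST (13) with key "hosts", and the second is GETHOSTBYNAMEv6 (5) with key "framework". A minimal Rust sketch of that decoding, assuming a little-endian host and modelling only the request types that show up in this thread (the `Request` and `RequestType` names are illustrative, not nsncd's):

```rust
use std::convert::TryFrom;

/// Request types seen in this thread, numbered as in glibc's
/// `request_type` enum (nscd/nscd-client.h); anything else is `Other`.
#[derive(Debug, PartialEq, Eq)]
pub enum RequestType {
    GetHostByName,   // 4
    GetHostByNameV6, // 5
    GetFdHst,        // 13
    Other(i32),
}

impl From<i32> for RequestType {
    fn from(v: i32) -> Self {
        match v {
            4 => RequestType::GetHostByName,
            5 => RequestType::GetHostByNameV6,
            13 => RequestType::GetFdHst,
            other => RequestType::Other(other),
        }
    }
}

/// A decoded client request (illustrative type).
#[derive(Debug, PartialEq, Eq)]
pub struct Request {
    pub version: i32,
    pub ty: RequestType,
    pub key: String,
}

/// Reads a native-endian int32 at byte offset `off`.
fn read_i32(buf: &[u8], off: usize) -> Option<i32> {
    let bytes = <[u8; 4]>::try_from(buf.get(off..off + 4)?).ok()?;
    Some(i32::from_ne_bytes(bytes))
}

/// Parses request_header { version, type, key_len } followed by the key.
pub fn parse_request(buf: &[u8]) -> Option<Request> {
    let version = read_i32(buf, 0)?;
    let ty = RequestType::from(read_i32(buf, 4)?);
    let key_len = usize::try_from(read_i32(buf, 8)?).ok()?;
    let raw = buf.get(12..12 + key_len)?;
    // key_len counts the trailing NUL; strip it before decoding.
    let (&nul, text) = raw.split_last()?;
    if nul != 0 {
        return None;
    }
    let key = String::from_utf8(text.to_vec()).ok()?;
    Some(Request { version, ty, key })
}
```

Feeding it the 22-byte request yields version 2, `GetHostByNameV6` and key "framework", consistent with the later finding that hostname --fqdn ends up issuing GETHOSTBYNAMEv6.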


NSCD:

mars 07 14:25:53 framework sockdump[96606]: 14:25:53.784 >>> process hostname [101011 -> 100886] len 22(22)
mars 07 14:25:53 framework sockdump[96606]: 0000  02 00 00 00 05 00 00 00  0a 00 00 00 66 72 61 6d  ............fram
mars 07 14:25:53 framework sockdump[96606]: 0010  65 77 6f 72 6b 00                                 ework.
mars 07 14:25:53 framework sockdump[96606]: 14:25:53.784 >>> process nscd [100886 -> 101011] len 90(90)
mars 07 14:25:53 framework sockdump[96606]: 0000  02 00 00 00 01 00 00 00  1c 00 00 00 01 00 00 00  ................
mars 07 14:25:53 framework sockdump[96606]: 0010  0a 00 00 00 10 00 00 00  01 00 00 00 00 00 00 00  ................
mars 07 14:25:53 framework sockdump[96606]: 0020  66 72 61 6d 65 77 6f 72  6b 2e 61 6c 74 65 72 6e  framework.altern
mars 07 14:25:53 framework sockdump[96606]: 0030  61 74 69 76 65 62 69 74  2e 66 72 00 0a 00 00 00  ativebit.fr.....
mars 07 14:25:53 framework sockdump[96606]: 0040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 01  ................
mars 07 14:25:53 framework sockdump[96606]: 0050  66 72 61 6d 65 77 6f 72  6b 00                    framework.
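The nscd reply is a serialized hostent. It starts with glibc's `hst_response_header`, eight native-endian int32 fields (version, found, h_name_len, h_aliases_cnt, h_addrtype, h_length, h_addr_list_cnt, error), followed by the NUL-terminated canonical name, one int32 length per alias, the raw addresses, and finally the alias strings. In this dump that gives h_name = framework.alternativebit.fr (0x1c = 28 bytes including the NUL), one alias "framework", and a single AF_INET6 (10) address, ::1. A sketch of a decoder for this layout, assuming a little-endian host; the field order after the header is read off this one capture rather than from a specification, and `HostEnt`/`parse_hostent_reply` are illustrative names:

```rust
use std::convert::TryFrom;
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// Decoded GETHOSTBYNAME(v6) reply (illustrative type, not nsncd's).
#[derive(Debug, PartialEq, Eq)]
pub struct HostEnt {
    pub name: String,
    pub aliases: Vec<String>,
    pub addrs: Vec<IpAddr>,
}

/// Reads a native-endian int32 at `off` and converts it to a length.
fn read_len(buf: &[u8], off: usize) -> Option<usize> {
    let bytes = <[u8; 4]>::try_from(buf.get(off..off + 4)?).ok()?;
    usize::try_from(i32::from_ne_bytes(bytes)).ok()
}

/// Turns a NUL-terminated byte slice into a String.
fn c_string(raw: &[u8]) -> Option<String> {
    let (&nul, text) = raw.split_last()?;
    if nul != 0 {
        return None;
    }
    String::from_utf8(text.to_vec()).ok()
}

/// Parses a "found" reply; returns None on truncation or found != 1.
pub fn parse_hostent_reply(buf: &[u8]) -> Option<HostEnt> {
    if read_len(buf, 4)? != 1 {
        return None; // `found` field
    }
    let name_len = read_len(buf, 8)?;
    let aliases_cnt = read_len(buf, 12)?;
    let addr_len = read_len(buf, 20)?; // h_length: 4 or 16 bytes
    let addr_cnt = read_len(buf, 24)?;
    let mut off = 32; // end of hst_response_header

    let name = c_string(buf.get(off..off + name_len)?)?;
    off += name_len;

    // One int32 length per alias comes right after the name.
    let mut alias_lens = Vec::new();
    for i in 0..aliases_cnt {
        alias_lens.push(read_len(buf, off + 4 * i)?);
    }
    off += 4 * aliases_cnt;

    let mut addrs = Vec::new();
    for _ in 0..addr_cnt {
        let raw = buf.get(off..off + addr_len)?;
        addrs.push(match addr_len {
            4 => IpAddr::V4(Ipv4Addr::from(<[u8; 4]>::try_from(raw).ok()?)),
            16 => IpAddr::V6(Ipv6Addr::from(<[u8; 16]>::try_from(raw).ok()?)),
            _ => return None,
        });
        off += addr_len;
    }

    // The alias strings themselves come last.
    let mut aliases = Vec::new();
    for len in alias_lens {
        aliases.push(c_string(buf.get(off..off + len)?)?);
        off += len;
    }
    Some(HostEnt { name, aliases, addrs })
}
```

Run against the 90-byte nscd reply, this recovers framework.alternativebit.fr, the fully qualified name hostname --fqdn is after.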

Nixpkgs issue: NixOS/nixpkgs#196934

@picnoir picnoir added the bug Something isn't working label Mar 7, 2023
@picnoir
Member Author

picnoir commented Mar 7, 2023

We're responding with the request's hostname:

    canon_name: hostname.to_string(),

That's probably the root issue here.

There's likely more information in the canonname field of the AddrInfo struct: https://docs.rs/dns-lookup/latest/dns_lookup/struct.AddrInfo.html

@picnoir
Member Author

picnoir commented Mar 7, 2023

Wait a minute, it actually seems like nsncd is not responding to hostname at all!

Playing a bit more with nscd and this sockdump systemd service:

    systemd.services.sockdump = {
      wantedBy = [ "multi-user.target" ];
      path = [
        # necessary for bcc to unpack kernel headers and invoke modprobe
        pkgs.gnutar
        pkgs.xz.bin
        pkgs.kmod
      ];
      environment.PYTHONUNBUFFERED = "1";

      serviceConfig = {
        ExecStart = "${pkgs.sockdump}/bin/sockdump /var/run/nscd/socket";
        Restart = "on-failure";
        RestartSec = "1";
        Type = "simple";
      };
    };

I don't see any nsncd response on the socket (while I do see nscd's). I can still resolve test.localhost, though, indicating the NSS setup still works. No error logs in nsncd.

Is it:

  1. Nsncd being completely broken somehow (it used to work when we released the thing).
  2. An issue with sockdump.

I need to instrument my nsncd a bit more to debug that further. I'll come back to this.

@picnoir
Member Author

picnoir commented Mar 7, 2023

So, it seems like hostname --fqdn is using the GETHOSTBYNAMEv6 operation in the end.

Sadly for us, the getaddrinfo function provided by the dns_lookup crate is not returning the canonical name :(

@picnoir
Member Author

picnoir commented Mar 7, 2023

Okay, I found the issue. In GETHOSTBYNAMEv6 (and v4), we're using getaddrinfo instead of gethostbyname (https://man7.org/linux/man-pages/man3/gethostbyname.3.html).

There are no bindings for that function in either nix or dns_lookup. Looks like we need to write the FFI call ourselves.
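A minimal sketch of such an FFI call using only the Rust standard library, assuming Linux with glibc: it declares `struct hostent` and glibc's reentrant `gethostbyname2_r` by hand, hard-codes the Linux values of AF_INET, AF_INET6 and ERANGE (a real build would take them from the libc crate), doubles the scratch buffer on ERANGE, and copies the result into an owned Rust struct. The `Hostent` type and the `gethostbyname2` wrapper name are hypothetical, not nsncd's API:

```rust
use std::ffi::{CStr, CString};
use std::os::raw::{c_char, c_int};
use std::{mem, ptr, slice};

/// Mirror of glibc's `struct hostent` (<netdb.h>).
#[repr(C)]
struct RawHostent {
    h_name: *mut c_char,
    h_aliases: *mut *mut c_char,
    h_addrtype: c_int,
    h_length: c_int,
    h_addr_list: *mut *mut c_char,
}

extern "C" {
    fn gethostbyname2_r(
        name: *const c_char,
        af: c_int,
        ret: *mut RawHostent,
        buf: *mut c_char,
        buflen: usize,
        result: *mut *mut RawHostent,
        h_errnop: *mut c_int,
    ) -> c_int;
}

// Linux values, hard-coded for this sketch (assumption).
pub const AF_INET: c_int = 2;
pub const AF_INET6: c_int = 10;
const ERANGE: c_int = 34;

/// Owned copy of a hostent (hypothetical type).
#[derive(Debug)]
pub struct Hostent {
    pub name: String,
    pub aliases: Vec<String>,
    pub addr_type: c_int,
    pub addrs: Vec<Vec<u8>>,
}

/// Collects the entries of a NULL-terminated `char **` array.
unsafe fn entries(mut p: *mut *mut c_char) -> Vec<*mut c_char> {
    let mut out = Vec::new();
    while !p.is_null() && !(*p).is_null() {
        out.push(*p);
        p = p.add(1);
    }
    out
}

/// Copies a C string into an owned String (lossy on invalid UTF-8).
unsafe fn owned(p: *const c_char) -> String {
    CStr::from_ptr(p).to_string_lossy().into_owned()
}

/// Ok(hostent) when glibc returns a non-null result (h_errno ignored),
/// Err(h_errno) when the result pointer is null.
pub fn gethostbyname2(name: &str, af: c_int) -> Result<Hostent, c_int> {
    let c_name = CString::new(name).map_err(|_| -1)?; // interior NUL byte
    let mut buf: Vec<c_char> = vec![0; 1024];
    loop {
        let mut ret: RawHostent = unsafe { mem::zeroed() };
        let mut result: *mut RawHostent = ptr::null_mut();
        let mut herrno: c_int = 0;
        let rc = unsafe {
            gethostbyname2_r(c_name.as_ptr(), af, &mut ret, buf.as_mut_ptr(),
                             buf.len(), &mut result, &mut herrno)
        };
        if rc == ERANGE {
            // Scratch buffer too small: double it and retry.
            let bigger = buf.len() * 2;
            buf.resize(bigger, 0);
            continue;
        }
        if result.is_null() {
            return Err(herrno);
        }
        // Every pointer in the hostent points into `ret` or `buf`, both alive here.
        let host = unsafe {
            let h = &*result;
            let len = h.h_length as usize;
            Hostent {
                name: owned(h.h_name),
                aliases: entries(h.h_aliases).into_iter().map(|p| owned(p)).collect(),
                addr_type: h.h_addrtype,
                addrs: entries(h.h_addr_list)
                    .into_iter()
                    .map(|a| slice::from_raw_parts(a as *const u8, len).to_vec())
                    .collect(),
            }
        };
        return Ok(host);
    }
}
```

For a numeric address such as 127.0.0.1, glibc is expected to fill the hostent directly without going through NSS, which makes it a convenient smoke test; for a real hostname the answer depends on the local nsswitch.conf.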

@picnoir
Member Author

picnoir commented Mar 7, 2023

So, overall, to fix this:

  1. Write the gethostbyname FFI call in a Nix fork. EDIT: let's vendor it in to start with. We'd need to port it across different Unix architectures if we wanted to upstream it to Nix directly.
  2. Temporarily point Nsncd to this Nix fork.
  3. When the Nix PR gets merged, move Nsncd back to mainline Nix.

Bonus points for:

  • Write a getaddrinfo FFI to Nix.
  • Drop the dns_lookup crate dependency.

@picnoir
Member Author

picnoir commented May 18, 2023

Yet another bug report in Nixpkgs. We've procrastinated on this long enough; it needs to be fixed.

Flokli and I booked a pairing session to fix that tomorrow afternoon.

@picnoir picnoir changed the title from "Incomplete GETAI response" to "Incomplete gethostbyname response" May 18, 2023
@RaitoBezarius
Member

Just encountered this bug in production :<

@picnoir
Member Author

picnoir commented Jun 8, 2023

Erf. A fix is on the way: https://github.com/nix-community/nsncd/tree/nin/ghbn

I'm doing this on the side, so I can't give a clear ETA, but barring a massive plot twist, we're almost there.

picnoir pushed a commit that referenced this issue Oct 8, 2023
We internally used getai to respond to the gethostbyname operations.
Sadly, it did not behave as expected and broke some tools (like
hostname --fqdn, see #4).

We FFI the right Glibc gethostbyname_2r (now deprecated) function and
use it to back the GETHOSTBYNAME and GETHOSTBYNAME6 Nscd interfaces.

Fixes  #4
picnoir added a commit that referenced this issue Oct 8, 2023
picnoir added a commit that referenced this issue Oct 8, 2023
picnoir added a commit that referenced this issue Oct 8, 2023
picnoir added a commit that referenced this issue Oct 8, 2023
Mic92 pushed a commit that referenced this issue Oct 11, 2023
picnoir added a commit that referenced this issue Oct 11, 2023
picnoir added a commit that referenced this issue Oct 11, 2023
picnoir added a commit that referenced this issue Oct 16, 2023
Reviewed-by: Jörg Thalheim <joerg@thalheim.io>
picnoir added a commit to picnoir/nsncd that referenced this issue Oct 19, 2023
We internally used getai to respond to the gethostbyname operations.
Sadly, it did not behave as expected and broke some tools (like
hostname --fqdn, see nix-community#4).

We FFI the right Glibc gethostbyname_2r (now deprecated) function and
use it to back the GETHOSTBYNAME and GETHOSTBYNAME6 Nscd interfaces.

Using sockburp, we realized the hostent serialization function was
bogus: we totally forgot to serialize the aliases. This commit fixes
this and makes sure we're producing bit-for-bit identical results with
Nscd for gethostbyname/getaddrinfo.

Fixes  nix-community#4
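To illustrate the alias fix, here is a sketch of serializing a found hostent in the same layout as the nscd reply captured at the top of this issue: header, NUL-terminated name, one int32 length per alias, raw addresses, then the NUL-terminated alias strings. Dropping the two alias sections is the kind of omission this commit describes. It assumes a little-endian host, addresses of a single family, and NSCD_VERSION = 2 as seen in the dumps; `Hostent` and `serialize_hostent` are hypothetical names:

```rust
use std::net::IpAddr;

/// Owned hostent to serialize (hypothetical type for this sketch).
pub struct Hostent {
    pub name: String,
    pub aliases: Vec<String>,
    pub addrs: Vec<IpAddr>, // assumed to be all the same family
}

const NSCD_VERSION: i32 = 2;
const AF_INET: i32 = 2; // Linux value
const AF_INET6: i32 = 10; // Linux value

fn put_i32(out: &mut Vec<u8>, v: i32) {
    out.extend_from_slice(&v.to_ne_bytes());
}

/// Encodes a found hostent: header, name, alias lengths, addresses, aliases.
pub fn serialize_hostent(h: &Hostent) -> Vec<u8> {
    let (af, addr_len) = match h.addrs.first() {
        Some(IpAddr::V6(_)) => (AF_INET6, 16),
        _ => (AF_INET, 4),
    };
    let mut out = Vec::new();
    // hst_response_header: eight int32 fields.
    put_i32(&mut out, NSCD_VERSION);
    put_i32(&mut out, 1); // found
    put_i32(&mut out, h.name.len() as i32 + 1); // +1 for the NUL
    put_i32(&mut out, h.aliases.len() as i32);
    put_i32(&mut out, af);
    put_i32(&mut out, addr_len);
    put_i32(&mut out, h.addrs.len() as i32);
    put_i32(&mut out, 0); // error: 0 because a hostent was found
    // Canonical name.
    out.extend_from_slice(h.name.as_bytes());
    out.push(0);
    // Alias section, part one: one length per alias (NUL included).
    for alias in &h.aliases {
        put_i32(&mut out, alias.len() as i32 + 1);
    }
    for addr in &h.addrs {
        match addr {
            IpAddr::V4(a) => out.extend_from_slice(&a.octets()),
            IpAddr::V6(a) => out.extend_from_slice(&a.octets()),
        }
    }
    // Alias section, part two: the alias strings, after the addresses.
    for alias in &h.aliases {
        out.extend_from_slice(alias.as_bytes());
        out.push(0);
    }
    out
}
```

Serializing framework.alternativebit.fr with alias "framework" and address ::1 this way reproduces the 90 bytes nscd sent, which is the kind of bit-for-bit comparison the commit relies on.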
picnoir added a commit to picnoir/nsncd that referenced this issue Oct 19, 2023
We internally used getai to respond to the
gethostbyname/gethostbyaddr operations. Sadly, it does not behave as
expected and breaks some tools (like hostname --fqdn, see
nix-community#4).

We FFI the right Glibc gethostbyname_2r/gethostbyaddr_2r (now
deprecated) functions and use them to back the GETHOSTBYNAME,
GETHOSTBYNAME6, GETHOSTBYADDR, and GETHOSTBYADDR6 Nscd interfaces.

Using sockburp, we realized the hostent serialization function was
bogus: we totally forgot to serialize the aliases. This commit fixes
this and makes sure we're producing bit-for-bit identical results with
Nscd for gethostbyname/getaddrinfo.

It took me three tries to get this right. This is actually the third
full rewrite.

The Nscd behaviour for these two legacy functions is *really*
confusing. We're supposed to ignore the herrno (herrno != errno!!) and
set it to 0 if gethostbyaddr/name returns a non-null hostent. If we
end up with a null hostent, we return the herrno together with a dummy
hostent header.

I tried to keep things as safe as possible by extracting the glibc
hostent to a proper Rust structure. This structure mirrors the libc
hostent in a Rust idiomatic way. We should probably try to upstream
the FFI part of this commit to the Nix crate at some point.

Fixes nix-community#4
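The herrno rule in the commit message above can be sketched as a small function that builds the eight-field `hst_response_header`: when a hostent comes back, `found` is 1 and `error` is forced to 0 whatever h_errno says; when it is null, the reply is a header-only "dummy" carrying h_errno. The placeholder values in the dummy (found = 0, zero counts, -1 for h_addrtype and h_length) are assumptions to check against glibc's nscd/hstcache.c, and `FoundHost`/`response_header` are hypothetical names:

```rust
/// glibc's hst_response_header: eight native-endian int32 fields.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct HstResponseHeader {
    pub version: i32,
    pub found: i32,
    pub h_name_len: i32,
    pub h_aliases_cnt: i32,
    pub h_addrtype: i32,
    pub h_length: i32,
    pub h_addr_list_cnt: i32,
    pub error: i32,
}

pub const NSCD_VERSION: i32 = 2;

/// Shape of a successful lookup, as needed for the header (hypothetical).
pub struct FoundHost {
    pub name_len: i32, // includes the trailing NUL
    pub aliases_cnt: i32,
    pub addrtype: i32,
    pub addr_len: i32,
    pub addr_cnt: i32,
}

/// Non-null hostent: found = 1 and error = 0, ignoring h_errno.
/// Null hostent: header-only reply that carries h_errno.
pub fn response_header(host: Option<&FoundHost>, herrno: i32) -> HstResponseHeader {
    match host {
        Some(h) => HstResponseHeader {
            version: NSCD_VERSION,
            found: 1,
            h_name_len: h.name_len,
            h_aliases_cnt: h.aliases_cnt,
            h_addrtype: h.addrtype,
            h_length: h.addr_len,
            h_addr_list_cnt: h.addr_cnt,
            error: 0,
        },
        None => HstResponseHeader {
            version: NSCD_VERSION,
            found: 0,
            h_name_len: 0,
            h_aliases_cnt: 0,
            h_addrtype: -1, // assumed placeholder
            h_length: -1,   // assumed placeholder
            h_addr_list_cnt: 0,
            error: herrno,
        },
    }
}

impl HstResponseHeader {
    /// Encodes the header in native byte order, field by field.
    pub fn to_bytes(&self) -> Vec<u8> {
        let fields = [
            self.version, self.found, self.h_name_len, self.h_aliases_cnt,
            self.h_addrtype, self.h_length, self.h_addr_list_cnt, self.error,
        ];
        let mut out = Vec::with_capacity(32);
        for f in fields.iter() {
            out.extend_from_slice(&f.to_ne_bytes());
        }
        out
    }
}
```

On a little-endian host, the success header for the framework lookup (28-byte name, one alias, one AF_INET6 address) encodes to the same first 32 bytes as the nscd reply captured at the top of this issue.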
picnoir added a commit that referenced this issue Oct 19, 2023
picnoir added a commit that referenced this issue Oct 19, 2023
picnoir added a commit that referenced this issue Oct 20, 2023
picnoir added a commit that referenced this issue Oct 20, 2023
picnoir added a commit to picnoir/nsncd that referenced this issue Oct 20, 2023
picnoir added a commit that referenced this issue Oct 30, 2023
This adds support for the GETAI, GETHOSTBYADDR, GETHOSTBYADDRv6,
GETHOSTBYNAME, GETHOSTBYNAMEv6 request types.

For the more complex GETAI lookup, we use the dns_lookup crate.

In previous iterations of this change, we also used the same underlying
getaddrinfo call to respond to the gethostbyname/gethostbyaddr
operations.

Even though gethostbyname/gethostbyaddr are officially deprecated,
many tools still use them and rely on them behaving
differently.

So it's important to still implement them, with exactly the same
behaviour, to prevent some tools from breaking (like `hostname --fqdn`,
see #4).

We FFI the right Glibc gethostbyname_2r/gethostbyaddr_2r (now
deprecated) functions and use them to back the GETHOSTBYNAME,
GETHOSTBYNAME6, GETHOSTBYADDR, and GETHOSTBYADDR6 Nscd interfaces.

It took me three tries to get this right. This is actually the third
full rewrite.

The Nscd behaviour for these two legacy functions is *really*
confusing. We're supposed to ignore the herrno (herrno != errno!!) and
set it to 0 if gethostbyaddr/name returns a non-null hostent. If we
end up with a null hostent, we return the herrno together with a dummy
hostent header.

I tried to keep things as safe as possible by extracting the glibc
hostent to a proper Rust structure. This structure mirrors the libc
hostent in a Rust idiomatic way. We should probably try to upstream
the FFI part of this commit to the Nix crate at some point.

Fixes #4

Co-Authored-By: Florian Klink <flokli@flokli.de>
picnoir added a commit that referenced this issue Oct 30, 2023
flokli added a commit that referenced this issue Oct 31, 2023