hostname --fqdn broken with nsncd #261269

Closed
Majiir opened this issue Oct 15, 2023 · 6 comments · Fixed by #263634

Comments

@Majiir
Contributor

Majiir commented Oct 15, 2023

Describe the bug

hostname --fqdn only returns the hostname, even when both networking.hostName and networking.domain are set.

Steps To Reproduce

Steps to reproduce the behavior:

  1. Configure networking.hostName and networking.domain.
  2. Run hostname --fqdn.

Expected behavior

hostname --fqdn should return the FQDN defined by networking.hostName and networking.domain.

Additional context

The breakage happens when nsncd is used, which is now the default in NixOS after #214153. Stopping nsncd.service causes hostname --fqdn to return the correct value.

nixosTests.hostname.explicitDomain reproduces the issue. It broke at fbfe290, which is when nsncd became enabled by default. It's still broken on master, which is b137e2f at the time of writing.

Possibly related issues:

Notify maintainers

@flokli @lukegb @NickCao @rnhmjoj @NinjaTrappeur @mweinelt

Metadata

$ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 6.1.56, NixOS, 23.05 (Stoat), 23.05.20231011.bd1cde4`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.13.5`
 - channels(root): `""`
 - channels(majiir): `""`
 - nixpkgs: `/nix/store/mbz4hixfgxq5b6vc0k3pp2iglcd4c353-source`
@lukegb
Contributor

lukegb commented Oct 15, 2023

This seems to happen because the resolution flow (when nscd is running) looks like:

  1. hostname calls gethostname, which uses the nodename (uname -n), which is the short hostname
  2. hostname then calls gethostbyname, which invokes the nss machinery
  3. nscd gets the request and processes the request according to the NSS rules; on my NixOS system this is mymachines resolve files myhostname dns
  4. mymachines (from systemd) doesn't match
  5. resolve (from systemd) does match and returns a synthesized record. systemd internally determines the hostname for this record using only the nodename, so unless the nodename is already the FQDN, the lookup goes wrong here, before /etc/hosts is ever checked.
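
The two lookups in steps 1 and 2 can be approximated from Python, which may help when comparing results with nsncd.service running and stopped. This is only a rough sketch: it assumes Python's `socket.gethostbyname_ex` ends up in the same glibc NSS path that hostname uses, and the function name `resolve_fqdn` is made up for illustration.

```python
import socket


def resolve_fqdn():
    """Approximate the flow above: read the nodename, then ask NSS for its canonical name."""
    # Step 1: the nodename, i.e. what `uname -n` prints (usually the short hostname).
    short_name = socket.gethostname()
    try:
        # Step 2: a gethostbyname-style lookup; the first tuple element is the
        # canonical name reported by whichever NSS module answered.
        canonical, _aliases, _addresses = socket.gethostbyname_ex(short_name)
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        # If nothing resolves the nodename, fall back to it unchanged.
        return short_name
    return canonical


if __name__ == "__main__":
    print(resolve_fqdn())
```

If the canonical name comes back without a domain while the daemon is running and with one when it is stopped, that would match the behaviour reported in this issue.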

I wonder if we can fix this by patching out the support in systemd-resolved for looking up the nodename, which will make it fall back to the result from /etc/hosts instead; on NixOS we can (probably) always assume that /etc/hosts will actually have the correct system hostname in it.
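
To make the /etc/hosts fallback concrete, here is a small sketch of how a files-style lookup picks the canonical name: the first name on a matching line wins, and the remaining names are aliases. The sample entry for `myhost.example.org` is hypothetical, and the assumption that NixOS writes the FQDN before the short name should be checked against the actual /etc/hosts on the machine.

```python
def canonical_from_hosts(hosts_text, name):
    """Return the canonical (first) name of the first hosts line that lists `name`."""
    for raw_line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        names = fields[1:]  # fields[0] is the address
        if name in names:
            return names[0]
    return None


# Hypothetical contents, for illustration only.
SAMPLE_HOSTS = """\
127.0.0.1 localhost
::1 localhost
127.0.0.2 myhost.example.org myhost
"""

if __name__ == "__main__":
    print(canonical_from_hosts(SAMPLE_HOSTS, "myhost"))  # myhost.example.org
```

The sketch compares names case-sensitively and ignores address families, which a real NSS module would not.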

@flokli
Contributor

flokli commented Oct 15, 2023

No, this is most likely nix-community/nsncd#9.

We wanted to polish that PR in the next few days and, once it is merged, bump our nixpkgs pin (and resurrect the upstreaming attempts).

If you want, can you check if using nsncd from this PR fixes the problem for you?

@lukegb
Contributor

lukegb commented Oct 15, 2023

Ah, actually, I think I'm just talking out of my ass. systemd-resolved seems to be able to cope with this properly when I query it over DNS, so it probably isn't to blame.

@Majiir
Contributor Author

Majiir commented Oct 15, 2023

> No, this is most likely nix-community/nsncd#9.
> If you want, can you check if using nsncd from this PR fixes the problem for you?

nsncd tests fail at that PR. Setting doCheck = false gets a successful build, which then passes the hostname --fqdn test in nixosTests.hostname.explicitDomain, but it fails on the dnsdomainname test.
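
For reference, the two checks relate as follows: per the hostname man page, the DNS domain name is the part of the FQDN after the first dot. The sketch below only illustrates that split; `split_fqdn` is a made-up helper, and it does not explain why one check passes while the other fails at that PR.

```python
def split_fqdn(fqdn):
    """Split an FQDN into (short hostname, DNS domain) at the first dot."""
    host, _dot, domain = fqdn.partition(".")
    return host, domain


if __name__ == "__main__":
    print(split_fqdn("myhost.example.org"))  # ('myhost', 'example.org')
    print(split_fqdn("myhost"))              # ('myhost', '')
```

A lookup that returns only the short name therefore yields an empty domain.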

@picnoir
Member

picnoir commented Oct 16, 2023

> nsncd tests fail at that PR

This is pretty weird behavior. What's (likely) failing here is the IPv6 resolution of localhost. The test passes on some NixOS setups, like my desktop (and the Ubuntu CI), but fails on my laptop. I'm still not sure what's happening: ::1 localhost is present in /etc/hosts. Something is wrong with some NixOS setups, and it'd be nice to fix that as well. By any chance, did you disable systemd-resolved?
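
A quick way to check the IPv6 side on an affected machine might be something like the sketch below. It is not the nsncd test itself, just a standalone probe using Python's `socket.getaddrinfo`; the function name is made up for illustration.

```python
import socket


def localhost_ipv6_addresses():
    """Return the IPv6 addresses that `localhost` resolves to, or [] if the lookup fails."""
    try:
        infos = socket.getaddrinfo(
            "localhost", None, family=socket.AF_INET6, type=socket.SOCK_STREAM
        )
    except OSError:
        # socket.gaierror is an OSError subclass; treat any failure as "no result".
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    print(localhost_ipv6_addresses())
```

An empty list on a machine whose /etc/hosts contains ::1 localhost would point at the lookup path rather than the hosts file.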

Thanks for the heads up wrt. dnsdomainname. I'll have a look today.

@Majiir
Contributor Author

Majiir commented Oct 16, 2023

The behavior I described is from the Nix build environment. I used the nixpkgs derivation for nsncd and updated the source to the PR.

picnoir added a commit to picnoir/nixpkgs that referenced this issue Oct 26, 2023
Note: we decided to rewrite the history of the fork, which somehow got
out of hand. Feature-wise, this version bump fixes the various faulty
host behaviours. See the
nix-community/nsncd#9 and
nix-community/nsncd#10 PRs for more details.

We're in the process of upstreaming this change to twosigma/nsncd,
however, upstream has been pretty slow to review our PRs so far. Since
the hostname bug surfaces quite regularly in the Nixpkgs issue
tracker, we decided to use the nix-community fork as canon for Nixpkgs
for now.

Fixes: NixOS#132646
Fixes: NixOS#261269