Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

resolvconf.conf: Remove forced NSCD service restart #42569

Merged
merged 1 commit into from
Jul 6, 2018

Conversation

spacefrogg
Copy link
Contributor

Motivation for this change

Forcibly restarting NSCD is unnecessary and breaks setups that use SSSD for
authentication. NSCD is capable of detecting changes to /etc/resolv.conf and
invalidating its caches internally. Restarting NSCD/SSSD breaks user name and
UID resolution.

Modifying /etc/resolv.conf is correctly notified by the running nscd service which is watching that file.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Fits CONTRIBUTING.md.

Forcibly restarting NSCD is unnecessary and breaks setups that use SSSD for
authentication. NSCD is capable of detecting changes to /etc/resolv.conf and
invalidating its caches internally. Restarting NSCD/SSSD breaks user name and
UID resolution.
@GrahamcOfBorg GrahamcOfBorg added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux labels Jun 25, 2018
@spacefrogg
Copy link
Contributor Author

cc @timor

@matthewbauer matthewbauer merged commit 2b4d722 into NixOS:master Jul 6, 2018
@jerith666
Copy link
Contributor

I've had to revert this locally.

I began noticing after a nixos-unstable update in July that my DNS was ... flaky. The main symptom was that, usually but not always, after connecting to my vpnc VPN, DNS lookups would still resolve to the public IPs rather than the private IPs. But lookups would fail intermittently in other ways as well.

I finally had time to strace one such failure, and in so doing learned of the existence of nscd. That gave me another thing to git log --grep for, which is how I found this commit. sudo systemctl restart nscd immediately fixed the DNS misbehaviour I was diagnosing. And reverting this commit seems to have cleared up all of the flakiness for me since I did it five days ago.

@spacefrogg
Copy link
Contributor Author

spacefrogg commented Sep 23, 2018 via email

@jerith666
Copy link
Contributor

/etc/resolv.conf is not a symlink (at least not for me). Here's how it gets modified when the connection is activated:

$ ls -l /etc/resolv.conf; cp -v /etc/resolv.conf pre; nmcli connection up {redacted}\ vpn; ls -l /etc/resolv.conf; diff -U100 pre /etc/resolv.conf ; nmcli connection down {redacted}\ vpn; rm pre
-rw-r--r-- 1 root root 63 Sep 25 21:50 /etc/resolv.conf
'/etc/resolv.conf' -> 'pre'
Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/13)
-rw-r--r-- 1 root root 134 Sep 25 21:51 /etc/resolv.conf
--- pre 2018-09-25 21:50:59.853075674 -0400
+++ /etc/resolv.conf    2018-09-25 21:51:00.174074764 -0400
@@ -1,3 +1,6 @@
 # Generated by resolvconf
+search {redacted}.com
+nameserver 10.{redacted}.10
+nameserver 10.{redacted}.6
 nameserver 192.168.0.1
 options edns0
Connection '{redacted} vpn' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/13)

@spacefrogg
Copy link
Contributor Author

@jerith666 Thank you for the data. One other thing you should also see. The journal of nscd.service should show you that nscd recognized that /etc/resolv.conf was written to:

<date> <host> nscd[<pid>]: <pid> monitored file `/etc/resolv.conf` was written to

If this does not happen, this would explain why nscd does not resolve hosts correctly after switching to vpn. This would look like:

  • A bug in nscd's file change detection logic and/or
  • A filesystem bug hiding changes from nscd

As a mitigation, could you please do one more thing. Instead of restarting nscd, could you run nscd --invalidate hosts and see if that fixes your problem? If so, we could merge that back into resolvconf.conf instead of reverting the change altogether.

@jerith666
Copy link
Contributor

I do see the monitored file '/etc/resolv.conf' was written to line in journalctl -u nscd, so it seems like it should be working.

I'll try with 5214655 for a few days and see if the flakiness returns.

With that change, journalctl -u nscd says this:

Sep 26 22:26:08 alpha nscd[17574]: 17574 monitored file `/etc/resolv.conf` was written to
Sep 26 22:26:08 alpha nscd[17574]: 17574 monitoring file `/etc/resolv.conf` (5)
Sep 26 22:26:08 alpha nscd[17574]: 17574 monitoring directory `/etc` (2)

@jerith666
Copy link
Contributor

Flakiness still present. Very odd, here's some more detail:

I have a user-level systemd timer that essentially does:

nmcli connection up vpn
git fetch
nmcli connection down vpn

The fetch is pulling from multiple repos all on the same host behind that vpn. The flakiness manifests in the form of some of those repos failing. But it's all in the same git invocation!

Anyway, this flakiness was reliably absent when I was running with the full nscd restart in resolvconf.conf. Last night, after applying 5214655, it returned.

Here is some journalctl output from the run on the 26th (successful, with full nscd restart) and the 27th (failed, with nscd invalidate):

$ for d in 26 27; do journalctl --since "2018-09-$d 1:29" --until "2018-09-$d 1:35" | egrep 'alpha (kernel|NetworkManager|nm-dispatcher|nscd|sh|systemd|dbus)' | sed 's/Sep .. ..:..:..//' | sed 's/\[[0-9]*\]/[pid]/g' > $d.log; done
$ diff -w -U100 26.log 27.log
--- 26.log      2018-09-27 13:25:01.088972119 -0400
+++ 27.log      2018-09-27 13:25:01.146971937 -0400
@@ -1,103 +1,111 @@
  alpha systemd[pid]: Starting Fetch git.-redacted-.com...
- alpha sh[pid]: Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/15)
+ alpha kernel: tun: Universal TUN/TAP device driver, 1.6
+ alpha sh[pid]: Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/3)
  alpha NetworkManager[pid]: vpnc: can't open pidfile /nix/store/7f8k82k8ajn7z7y2lzlg9y75lqwdd9gg-NetworkManager-vpnc-gnome-1.2.6/var/run/NetworkManager/nm-vpnc-6f82b0c6-cee4-4390-aab1-d83180f1081e.pid for writing
- alpha nscd[pid]: 27695 monitored file `/etc/resolv.conf` was written to
- alpha systemd[pid]: Stopping Name Service Cache Daemon...
- alpha systemd[pid]: Stopped Name Service Cache Daemon.
- alpha systemd[pid]: Starting Name Service Cache Daemon...
- alpha dbus-daemon[pid]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.5' (uid=0 pid=1230 comm="/nix/store/hljilsl7g70nw9l91569ga57vrcq5awh-networ")
+ alpha nscd[pid]: 1243 monitored file `/etc/resolv.conf` was written to
+ alpha nscd[pid]: 1243 monitoring file `/etc/resolv.conf` (5)
+ alpha nscd[pid]: 1243 monitoring directory `/etc` (2)
+ alpha dbus-daemon[pid]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.5' (uid=0 pid=1223 comm="/nix/store/hljilsl7g70nw9l91569ga57vrcq5awh-networ")
  alpha systemd[pid]: Starting Network Manager Script Dispatcher Service...
- alpha nscd[pid]: 30440 monitoring file `/etc/passwd` (1)
- alpha nscd[pid]: 30440 monitoring directory `/etc` (2)
- alpha nscd[pid]: 30440 monitoring file `/etc/group` (3)
- alpha nscd[pid]: 30440 monitoring directory `/etc` (2)
- alpha nscd[pid]: 30440 monitoring file `/etc/hosts` (4)
- alpha nscd[pid]: 30440 monitoring directory `/etc` (2)
- alpha nscd[pid]: 30440 monitoring file `/etc/resolv.conf` (5)
- alpha nscd[pid]: 30440 monitoring directory `/etc` (2)
  alpha dbus-daemon[pid]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
  alpha systemd[pid]: Started Network Manager Script Dispatcher Service.
  alpha nm-dispatcher[pid]: req:1 'vpn-up' [tun0]: new request (1 scripts)
  alpha nm-dispatcher[pid]: req:1 'vpn-up' [tun0]: start running ordered scripts...
  alpha nm-dispatcher[pid]: req:2 'up' [tun0]: new request (1 scripts)
- alpha systemd[pid]: Started Name Service Cache Daemon.
  alpha nm-dispatcher[pid]: req:2 'up' [tun0]: start running ordered scripts...
  alpha sh[pid]: Fetching -redacted-
  alpha sh[pid]: From ssh://git.-redacted-.com/srv/git/code
  alpha sh[pid]:  -redacted git fetch output-
  alpha sh[pid]:  -redacted git fetch output-
  alpha sh[pid]:  -redacted git fetch output-
  alpha sh[pid]:  -redacted git fetch output-
  alpha sh[pid]:  -redacted git fetch output-
+ alpha sh[pid]:  -redacted git fetch output-
+ alpha sh[pid]:  -redacted git fetch output-
  alpha sh[pid]: Auto packing the repository in background for optimum performance.
  alpha sh[pid]: See "git help gc" for manual housekeeping.
  alpha sh[pid]: Fetching -redacted-
- alpha sh[pid]: Auto packing the repository in background for optimum performance.
- alpha sh[pid]: See "git help gc" for manual housekeeping.
+ alpha sh[pid]: ssh: Could not resolve hostname git.-redacted-.com: Name or service not known
+ alpha sh[pid]: fatal: Could not read from remote repository.
+ alpha sh[pid]: Please make sure you have the correct access rights
+ alpha sh[pid]: and the repository exists.
+ alpha sh[pid]: error: Could not fetch authoring-history
  alpha sh[pid]: Fetching -redacted-
- alpha sh[pid]: Auto packing the repository in background for optimum performance.
- alpha sh[pid]: See "git help gc" for manual housekeeping.
+ alpha sh[pid]: ssh: Could not resolve hostname git.-redacted-.com: Name or service not known
+ alpha sh[pid]: fatal: Could not read from remote repository.
+ alpha sh[pid]: Please make sure you have the correct access rights
+ alpha sh[pid]: and the repository exists.
+ alpha sh[pid]: error: Could not fetch buildprocess-history
  alpha sh[pid]: Fetching -redacted-
- alpha sh[pid]: Auto packing the repository in background for optimum performance.
- alpha sh[pid]: See "git help gc" for manual housekeeping.
+ alpha sh[pid]: ssh: Could not resolve hostname git.-redacted-.com: Name or service not known
+ alpha sh[pid]: fatal: Could not read from remote repository.
+ alpha sh[pid]: Please make sure you have the correct access rights
+ alpha sh[pid]: and the repository exists.
+ alpha sh[pid]: error: Could not fetch -redacted--root-history
  alpha sh[pid]: Fetching -redacted-
- alpha sh[pid]: Auto packing the repository in background for optimum performance.
- alpha sh[pid]: See "git help gc" for manual housekeeping.
+ alpha sh[pid]: ssh: Could not resolve hostname git.-redacted-.com: Name or service not known
+ alpha sh[pid]: fatal: Could not read from remote repository.
+ alpha sh[pid]: Please make sure you have the correct access rights
+ alpha sh[pid]: and the repository exists.
+ alpha sh[pid]: error: Could not fetch curriculum-history
  alpha sh[pid]: Fetching -redacted-
- alpha sh[pid]: Auto packing the repository in background for optimum performance.
- alpha sh[pid]: See "git help gc" for manual housekeeping.
+ alpha sh[pid]: ssh: Could not resolve hostname git.-redacted-.com: Name or service not known
+ alpha sh[pid]: fatal: Could not read from remote repository.
+ alpha sh[pid]: Please make sure you have the correct access rights
+ alpha sh[pid]: and the repository exists.
+ alpha sh[pid]: error: Could not fetch lib-history
  alpha sh[pid]: Fetching -redacted-
- alpha sh[pid]: Auto packing the repository in background for optimum performance.
- alpha sh[pid]: See "git help gc" for manual housekeeping.
+ alpha sh[pid]: ssh: Could not resolve hostname git.-redacted-.com: Name or service not known
+ alpha sh[pid]: fatal: Could not read from remote repository.
+ alpha sh[pid]: Please make sure you have the correct access rights
+ alpha sh[pid]: and the repository exists.
+ alpha sh[pid]: error: Could not fetch runtime-history
  alpha sh[pid]: Fetching -redacted-
- alpha sh[pid]: Auto packing the repository in background for optimum performance.
- alpha sh[pid]: See "git help gc" for manual housekeeping.
+ alpha sh[pid]: ssh: Could not resolve hostname git.-redacted-.com: Name or service not known
+ alpha sh[pid]: fatal: Could not read from remote repository.
+ alpha sh[pid]: Please make sure you have the correct access rights
+ alpha sh[pid]: and the repository exists.
+ alpha sh[pid]: error: Could not fetch scripts-history
  alpha sh[pid]: Fetching -redacted-
- alpha sh[pid]: Auto packing the repository in background for optimum performance.
- alpha sh[pid]: See "git help gc" for manual housekeeping.
+ alpha sh[pid]: ssh: Could not resolve hostname git.-redacted-.com: Name or service not known
+ alpha sh[pid]: fatal: Could not read from remote repository.
+ alpha sh[pid]: Please make sure you have the correct access rights
+ alpha sh[pid]: and the repository exists.
+ alpha sh[pid]: error: Could not fetch web-history
  alpha sh[pid]: Fetching -redacted-
  alpha sh[pid]: Auto packing the repository in background for optimum performance.
  alpha sh[pid]: See "git help gc" for manual housekeeping.
  alpha sh[pid]: Fetching -redacted-
  alpha sh[pid]: Auto packing the repository in background for optimum performance.
  alpha sh[pid]: See "git help gc" for manual housekeeping.
  alpha sh[pid]: Fetching -redacted-
  alpha sh[pid]: Auto packing the repository in background for optimum performance.
  alpha sh[pid]: See "git help gc" for manual housekeeping.
  alpha sh[pid]: Fetching -redacted-
  alpha sh[pid]: Auto packing the repository in background for optimum performance.
  alpha sh[pid]: See "git help gc" for manual housekeeping.
  alpha sh[pid]: Fetching -redacted-
  alpha sh[pid]: Auto packing the repository in background for optimum performance.
  alpha sh[pid]: See "git help gc" for manual housekeeping.
  alpha sh[pid]: Auto packing the repository in background for optimum performance.
  alpha sh[pid]: See "git help gc" for manual housekeeping.
- alpha sh[pid]: Connection '-redacted- vpn' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/15)
- alpha nscd[pid]: 30440 monitored file `/etc/resolv.conf` was written to
- alpha systemd[pid]: Stopping Name Service Cache Daemon...
- alpha systemd[pid]: Stopped Name Service Cache Daemon.
- alpha systemd[pid]: Starting Name Service Cache Daemon...
+ alpha sh[pid]: Connection '-redacted- vpn' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/3)
+ alpha nscd[pid]: 1243 monitored file `/etc/resolv.conf` was written to
+ alpha nscd[pid]: 1243 monitoring file `/etc/resolv.conf` (5)
+ alpha nscd[pid]: 1243 monitoring directory `/etc` (2)
  alpha systemd[pid]: Started Fetch git.-redacted-.com.
- alpha nscd[pid]: 30613 monitoring file `/etc/passwd` (1)
- alpha nscd[pid]: 30613 monitoring directory `/etc` (2)
- alpha nscd[pid]: 30613 monitoring file `/etc/group` (3)
- alpha nscd[pid]: 30613 monitoring directory `/etc` (2)
- alpha nscd[pid]: 30613 monitoring file `/etc/hosts` (4)
- alpha nscd[pid]: 30613 monitoring directory `/etc` (2)
- alpha nscd[pid]: 30613 monitoring file `/etc/resolv.conf` (5)
- alpha nscd[pid]: 30613 monitoring directory `/etc` (2)
- alpha dbus-daemon[pid]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.5' (uid=0 pid=1230 comm="/nix/store/hljilsl7g70nw9l91569ga57vrcq5awh-networ")
- alpha systemd[pid]: Starting Network Manager Script Dispatcher Service...
+ alpha dbus-daemon[pid]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.5' (uid=0 pid=1223 comm="/nix/store/hljilsl7g70nw9l91569ga57vrcq5awh-networ")
  alpha NetworkManager[pid]: vpnc: select: Interrupted system call
  alpha NetworkManager[pid]: vpnc: terminated by signal: 15
+ alpha systemd[pid]: Starting Network Manager Script Dispatcher Service...
  alpha dbus-daemon[pid]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
  alpha systemd[pid]: Started Network Manager Script Dispatcher Service.
  alpha nm-dispatcher[pid]: req:1 'vpn-down' [tun0]: new request (1 scripts)
  alpha nm-dispatcher[pid]: req:1 'vpn-down' [tun0]: start running ordered scripts...
- alpha systemd[pid]: Started Name Service Cache Daemon.
+ alpha NetworkManager[pid]: VPNC started in foreground...
  alpha nm-dispatcher[pid]: req:2 'down' [tun0]: new request (1 scripts)
  alpha nm-dispatcher[pid]: req:2 'down' [tun0]: start running ordered scripts...
- alpha NetworkManager[pid]: VPNC started in foreground...
- alpha dbus-daemon[pid]: [session uid=1002 pid=649] Activating service name='org.kde.kwalletd5' requested by ':1.6' (uid=1002 pid=725 comm="kded5 [kdeinit5]                                  ")
- alpha dbus-daemon[pid]: [session uid=1002 pid=649] Failed to activate service 'org.kde.kwalletd5': timed out (service_start_timeout=120000ms)
- alpha dbus-daemon[pid]: [session uid=1002 pid=649] Activating service name='org.kde.kwalletd5' requested by ':1.6' (uid=1002 pid=725 comm="kded5 [kdeinit5]                                  ")
+ alpha dbus-daemon[pid]: [session uid=1000 pid=2137] Activating service name='org.kde.kwalletd5' requested by ':1.6' (uid=1000 pid=2215 comm="kded5 [kdeinit5]                                  ")
+ alpha dbus-daemon[pid]: [session uid=1000 pid=2137] Successfully activated service 'org.kde.kwalletd5'
+ alpha dbus-daemon[pid]: [session uid=1000 pid=2137] Activating service name='org.kde.kwalletd' requested by ':1.27' (uid=1000 pid=4278 comm="/nix/store/d49nah2sa12mkd9nkrjll9khrmf0d4k8-kwalle")
+ alpha dbus-daemon[pid]: [session uid=1000 pid=2137] Activated service 'org.kde.kwalletd' failed: Failed to execute program org.kde.kwalletd: No such file or directory

@spacefrogg
Copy link
Contributor Author

Okay, looking around the Internet shows that nscd caching is a constant source of trouble, not just but also DNS. I'm sorry to trouble you further, but could you try to disable the nscd dns cache in /etc/nscd.conf and see if the problems go away? (Of course without restarting nscd after changing resolv.conf.)

@jerith666
Copy link
Contributor

Okay, trying with c880547, will report back results.

@jerith666
Copy link
Contributor

While making that commit, I also noticed that https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/services/system/nscd.nix#L61 seems to be trying to cause nscd to restart when /etc/hosts (and a few others) changes. But the unit file ends up with paths into the store, which of course will never change, so it has no effect:

$ systemctl cat nscd | grep -i restart-trigger
X-Restart-Triggers=/nix/store/lx2s0ama9994yiv1430yizk7s9fn5k4a-etc-hosts /nix/store/8danxnnq8h86csp8f7fflzllxwpp4a6h-etc-nsswitch.conf /nix/store/x1sg9m5hmy59jb1njgaaj2qqmggcvjwm-etc-nscd.conf

@jerith666
Copy link
Contributor

Sorry for the long radio silence here.

I tried with c880547 for a week or two. I didn't see any lookup failures in the nightly vpn + git fetch job. However, I did see occasional failures in another job that runs every 15 mins.

However, after reverting back to the original state (with libc_restart='${pkgs.systemd}/bin/systemctl try-restart --no-block nscd.service), I also saw a few failures in that every-15-mins job. So it's kind of inconclusive, alas. :-/

@arianvp arianvp mentioned this pull request Nov 26, 2018
9 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants