-
-
Notifications
You must be signed in to change notification settings - Fork 14.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nixos/nscd: DynamicUser and other cleanups #64268
Conversation
This postStart step was introduced on 2014-04-24 with the comment that "Nscd forks into the background before it's ready to accept connections." However, that was fixed upstream almost two months earlier, on 2014-03-03, with the comment that "This, along with setting the nscd service type to forking in its systemd configuration file, allows systemd to be certain that the nscd service is ready and is accepting connections." The fix was released several months later in glibc 2.20, which was merged in NixOS sometime before 15.09, so it certainly should be safe to remove this workaround by now.
Previously this module created both /var/db/nscd and /run/nscd using shell commands in a preStart script. Note that both of these paths are hard-coded in the nscd source. (Well, the latter is actually /var/run/nscd but /var/run is a symlink to /run so it works out the same.) /var/db/nscd is only used if the nscd.conf "persistent" option is turned on for one or more databases, which it is not in our default config file. I'm not even sure persistent mode can work under systemd, since `nscd --shutdown` is not synchronous so systemd will always unceremoniously kill nscd without reliably giving it time to mark the databases as unused. Nonetheless, if someone wants to use that option, they can ensure the directory exists using systemd.tmpfiles.rules. systemd can create /run/nscd for us with the RuntimeDirectory directive, with the added benefit of causing systemd to delete the directory on service stop or restart. The default value of RuntimeDirectoryMode is 755, the same as the mode which this module was using before. I don't think the `rm -f /run/nscd/nscd.pid` was necessary after NixOS switched to systemd and used its PIDFile directive, because systemd deletes the specified file after the service stops, and because the file can't persist across reboots since /run is a tmpfs. Even if the file still exists when nscd starts, it's only a problem if the pid it contains has been reused by another process, which is unlikely. Anyway, this change makes that deletion even less necessary, because now systemd deletes the entire /run/nscd directory when the service stops.
nscd doesn't create any files outside of /run/nscd unless the nscd.conf "persistent" option is used, which we don't do by default. Therefore it doesn't matter what UID/GID we run this service as, so long as it isn't shared with any other running processes. /run/nscd does need to be owned by the same UID that the service is running as, but systemd takes care of that for us thanks to the RuntimeDirectory directive. If someone wants to turn on the "persistent" option, they need to manually configure users.users.nscd and systemd.tmpfiles.rules so that /var/db/nscd is owned by the same user that nscd runs as. In an all-defaults boot.isContainer configuration of NixOS, this removes the only user which did not have a pre-assigned UID.
These options were being set to the same value as the defaults that are hardcoded in nscd. Delete them so it's clear which settings are actually important for NixOS. One exception is `threads 1`, which is different from the built-in default of 4. However, both values are equivalent because nscd forces the number of threads to be at least as many as the number of kinds of databases it supports, which is 5.
On first glance, this looks fine to me. |
@@ -67,6 +55,9 @@ in | |||
serviceConfig = | |||
{ ExecStart = "@${pkgs.glibc.bin}/sbin/nscd nscd"; | |||
Type = "forking"; | |||
User = "nscd"; | |||
DynamicUser = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that means that systemd will no longer try to use nscd to check the dynamic user? How do they implement that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory, DynamicUser
should only work when nscd
is on, as nscd
dispatches to the correct nss_systemd.so
. So this indeed sounds like something that is not gonna work, given that nscd
is not running when nscd
is starting up :P
However, note I think that nscd
bypasses delegating to nscd
when doing any nss
related syscalls itself (for good reason; otherwise it would end up in an endless loop), so maybe all the nss
related syscalls in the nscd
binary hit the same codepath as requests from others programs that go through the nscd
socket? Because in that case, this will work.
Non of this is documented so we could find out by reading the C source code... Or just disable DynamicUser
for nscd
so that we're on the safe side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
glibc always has to be able to answer passwd queries and such whether nscd is running or not. How does it know? It tries opening the socket that nscd listens on, and if that fails, it does the lookup itself. (There's also a glibc-internal function which nscd itself calls to prevent even checking for the socket, as you noted.)
So this works fine: systemd does the lookup of a user and group named nscd
before it has started the service, the socket doesn't work, so glibc falls back to having libnss_files.so
handle the query in-process. If you want to configure a static user for nscd, you can do it in /etc/passwd
and systemd will find it.
Also: any code in the systemd
package doesn't need nscd to find its own nsswitch modules, because they're already in its library path. So without nscd, systemd would only get the wrong answers for these passwd/group lookups on systems configured with nsswitch modules that are neither the glibc builtins nor systemd's extras.
That means the real constraint here is that you can't override this DynamicUser
setting by defining an nscd
user in LDAP, Google OS Login, etc. Maybe that's worth documenting, but it's really a nonsense thing to even try doing...
This does make me want to try getting the username nscd
on some large LDAP-based network somewhere and see how much I can break though. 😈
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Any service started under DynamicUser
must have either libnss_systemd.so
in its LD_LIBRARY_PATH
or have access to the nscd
socket in order for DynamicUser
to work.
My worry was because nscd
itself doesn't access the socket it wouldn't work. But obviously nscd
has libnss_systemd.so
in LD_LIBRARY_PATH
so it will indeed work.
So indeed, it seems we can leave DynamicUser = true
with no issues.
On a related note, I think you can drop the User = nscd
option by the way. Because the service name is nscd.service
and DynamicUser = true
, systemd already automatically sets User = nscd
The user and group name to use may be configured via
User= and Group= (see above). If these options are not used and
dynamic user/group allocation is enabled for a unit, the name of
the dynamic user/group is implicitly derived from the unit name. If
the unit name without the type suffix qualifies as valid user name
it is used directly, otherwise a name incorporating a hash of it is
used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point about removing User
, thanks! I've retested with that change and indeed the selected username is still "nscd", as expected.
Thanks to @arianvp for pointing out that when DynamicUser is true, systemd defaults the value of User to be the name of the unit, which in this case is already "nscd".
I just tried running Those tests didn't instantiate at all on master; I've submitted a fix for that in #64387. But after fixing that, those tests all pass on master but two of them fail on my branch. In particular, running this in the test driver: print($client2->execute('id -u test-ldap-user')) returns:
So the client configuration where The
So I'm... puzzled. |
Okay, I got something vaguely useful from: print($client2->execute('/nix/store/.../bin/strace -f -p $(pgrep nscd) & sleep 1; getent passwd test-ldap-user; kill -INT %1')) Specifically:
So apparently when I want to investigate this a bit further, but if nscd absolutely must be started as root, then I think I'll have to drop the DynamicUser patch for that reason. |
NixOS usually needs nscd just to have a single place where LD_LIBRARY_PATH can be set to include all NSS modules, but nscd is also useful if some of the NSS modules need to read files which are only accessible by root. For example, nixos/modules/config/ldap.nix needs this when users.ldap.enable = true; users.ldap.daemon.enable = false; and users.ldap.bind.passwordFile exists. In that case, the module creates an /etc/ldap.conf which is only readable by root, but which the NSS module needs to read in order to find out what LDAP server to connect to and with what credentials. If nscd is started as root and configured with the server-user option in nscd.conf, then it gives each NSS module the opportunity to initialize itself before dropping privileges. The initialization happens in the glibc-internal __nss_disable_nscd function, which pre-loads all the configured NSS modules for passwd, group, hosts, and services (but not netgroup for some reason?) and, for each loaded module, calls an init function if one is defined. After that finishes, nscd's main() calls nscd_init() which ends by calling finish_drop_privileges(). There are provisions in systemd for using DynamicUser with a service which needs to drop privileges itself, so this patch does that.
Okay, I fixed it! Today I learned that systemd has an option for using DynamicUser but still starting the service as root and letting the service drop privileges when it's ready to. Now all the LDAP tests pass, as do the other tests I tried previously. So I think this PR is ready to merge now. |
When mutableUsers = false and all users and groups have assigned UID/GIDs, we can precompute exactly what update-users-groups.pl would output for /etc/passwd and /etc/group. In those cases we don't need to run it at activation time; we can generate the right files at build time and just symlink them into place. Note that most NixOS configs can't currently trigger this optimization, because normally everyone uses the nscd module, which defines a user with a null UID. I've separately proposed fixes for that issue in pull request NixOS#64268. There's no point generating /etc/shadow this way because the generated files are already world-readable in the Nix store, but that isn't a big deal as long as suitably strong passwords, salts, and hashes are used. That said, for users with 'passwordFile' or 'password' set, we can't construct the password hash at build time. But we can limit the work at activation time to building /etc/shadow, and only for users who need it. Tested by instantiating a minimal configuration.nix with: boot.isContainer = true; services.nscd.enable = false; users.mutableUsers = false; users.users.root.initialHashedPassword = ""; and running the resulting container. It builds and boots fine, and I can log in as root without giving a password, as expected. I also tested with several permutations of users with 'password', 'passwordFile', etc set.
Great! that looks very handy. Could you perhaps add a comment to that Also, LGTM |
Good call @arianvp; I've added a commit with a little detail about why it uses the combination of DynamicUser and "!". Now I just need somebody to merge it 😅 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice set of changes. I'll test this on my machine.
@GrahamcOfBorg test login systemd
Seems you can't order our trusty bot around from a review. @GrahamcOfBorg test login systemd |
@abbradar, thanks for reviewing! FWIW: tests.systemd failed on aarch64-linux for the bot in the same way that it does on my machine for x86_64-linux. For me, it fails with or without these patches. But it doesn't fail for the bot on x86_64-linux. I don't understand why the |
Yeah, I tested it locally and it succeeded for me. |
I tested it on several machines and it works nicely; @jameysharp given that you work on our LDAP support I suppose you tried random PAM modules with this too? |
I think you mean NSS, not PAM, right? As far as I know nscd doesn't interact with PAM at all. I don't want to give any false impressions: I don't personally use LDAP. It just got in my way while trying to eliminate activation scripts, which I'm doing as part of something else entirely... I'm several layers down in yak shaving at this point. So beyond running the existing NixOS LDAP tests, no, I have not tried other NSS (or PAM) modules with these patches. That said, I expect that if the LDAP modules can work, any other NSS modules will too, because:
PAM configurations can be a lot more complicated so I'd definitely be more concerned if these patches touched that subsystem. |
Ah, yes, I meant NSS, I'm just thinking for quite a long time already how to improve our PAM modules and you reminded me of that :D
Yeah, that's my idea too - if we tested LDAP with NixOS tests other modules should run too. I thought you used this in daily life too because of your LDAP work.
Anyway, this seems to be tested nicely. I use it on several of my machines too. I suppose we can merge it in several days.
…On July 17, 2019 8:39:24 PM GMT+03:00, Jamey Sharp ***@***.***> wrote:
I think you mean NSS, not PAM, right? As far as I know nscd doesn't
interact with PAM at all.
I don't want to give any false impressions: I don't personally use
LDAP. It just got in my way while trying to eliminate activation
scripts, which I'm doing as part of something else entirely... I'm
several layers down in yak shaving at this point.
So beyond running the existing NixOS LDAP tests, no, I have not tried
other NSS (or PAM) modules with these patches.
That said, I expect that if the LDAP modules can work, any other NSS
modules will too, because:
- NSS modules can't be relying on creating files in `/run/nscd` because
they're supposed to work when you aren't using `nscd` as well, so
anything we do to that directory can't matter to them;
- and the LDAP modules sometimes rely on reading config files as root,
so if they can do that, other modules can access any resources they
need at startup too.
PAM configurations can be a lot more complicated so I'd definitely be
more concerned if these patches touched that subsystem.
--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#64268 (comment)
--
Nikolay.
|
Per review comments from @arcnmx and @infinisil on NixOS#64594, don't use auto-detection to determine whether to prebuild /etc/passwd and /etc/group. Instead offer a new config option, by analogy with users.enforceIdUniqueness: users.enforceStaticIds provides an assertion that all users and groups have an ID statically assigned. If this option is turned on and the assertions pass, _and_ users.mutableUsers is turned off, then /etc/passwd and /etc/group will be generated at configuration build time instead of activation time. This allows admins to declare their intent and get a build failure if that intent is violated. Adding a configuration option also provided a natural place to document this feature (thanks again to @arcnmx for pointing that out) so this commit adds some documentation accordingly. I didn't want to rebase this PR but I've tested this commit both as-is and after a merge from current git master, by instantiating a minimal configuration.nix with several variations on: boot.isContainer = true; users.mutableUsers = false; users.enforceStaticIds = true; users.users.root.initialHashedPassword = ""; On this commit, the new assertions fire unless this option is also set: services.nscd.enable = false; Since NixOS#64268 has been merged to master, after merging master onto this branch, that option is no longer necessary.
Motivation for this change
I was looking at which users and groups have a
null
value foruid
orgid
, and there was only one in a default-configurationboot.isContainer
build of NixOS:nscd
. As far as I can tell, that service can use systemd'sDynamicUser
directive instead.While I was checking that, I found that some very old code in this module doesn't need to exist any more, if I'm not mistaken. So I've included separate commits for those dead-code removals.
Things done
I ran:
nix-build release.nix -A tests.login.x86_64-linux
(passed)nix-build release.nix -A tests.systemd.x86_64-linux
(passed except the "file system with x-initrd.mount is not unmounted" test fails for me, but it does that without my patches too)I also built the following expression against this branch and booted it with
systemd-nspawn
:Within that container, I verified that:
getent
can talk tonscd
,nscd
is running under a dynamic UID,/run/nscd
and its contents are owned by that UID,/var/db/nscd
doesn't exist but everything still works,users-groups.json
has no"uid"
or"gid"
that isnull
.sandbox
innix.conf
on non-NixOS)nix-shell -p nix-review --run "nix-review wip"
./result/bin/
)nix path-info -S
before and after)