-
-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prebuild /etc/passwd and /etc/group if possible #64594
Conversation
The same code was used for the system activation script and the user activation script; factor it out to a common function.
There are several empty snippets in a standard configuration. They are present only for backward compatibility with other snippets which may have declared dependencies on them. This patch eliminates the per-snippet boilerplate for empty snippets. If, someday, we get to the point where there are no non-empty activation snippets, then this will lead to an empty string where the snippets used to be. Note that I didn't want newlines inserted where empty scripts used to be either, so rather than using textClosureMap with its built-in "\n" separators, I switched to using the underlying textClosureList, and adding an extra newline only for non-empty snippets.
After opening this PR I discovered that:
I've force-pushed fixes for these issues. |
Factor the "lines of colon-separated fields" format out to a function, so the subuid and subgid formatting can work at a more abstract level of lists of lists of fields. I find this easier to read and reason about, and it's also reusable for other files in /etc.
If environment.etc is being used to set up /etc/passwd or /etc/group, then we can't rely on getpwnam and getgrnam until we've already put all the new /etc files in place. The easiest way to fix that is to make all the symlinks and do all the copies first, and then go back to each copied file and fix its ownership and permissions. Since we're deferring ownership changes, we have to also defer permissions changes; otherwise an unprivileged service could lose access to its config files during this window. Leaving files readable by everyone for a bit isn't a security problem because the source of these files is already world-readable in the Nix store.
In order to create /etc/passwd and /etc/group via environment.etc, the "etc" snippet becomes a dependency of "users", which is the snippet that other snippets depend on. But when "users" creates those files instead, "etc" needs to depend on "users". This patch splits "users" into two snippets: - users-before-etc does everything users currently does, and etc depends on it. - users is now an empty snippet which exists only to support ordering constraints for other snippets. It depends on etc. This resolves the potential circular dependency.
In a subsequent commit I'll eliminate update-users-groups in the case where the contents of /etc/passwd and /etc/group can be precomputed at build time. But that requires that either there are no createHome users or that creating homedirs is done by something other than that script. So in this commit I've implemented the latter approach. Tested by instantiating a minimal configuration.nix with: boot.isContainer = true; services.mingetty.autologinUser = "root"; both with and without a `isNormalUser = true` user. Without any createHome users, the "users" snippet is empty so it does not show up in the activation script and /home does not get created. With a normal user (which implies createHome), the "users" snippet appears in the activation script, and in a container which did not previously contain /home, the snippet successfully creates /home and the user's homedir under it. After changing the mode and ownership of the user's homedir, re-running the activation script preserves the new mode but resets the ownership, which is consistent with the perl version.
Although this feature isn't documented, it looks like update-users-groups.pl is supposed to recognize a user's primary group "name" that's composed entirely of digits as a GID instead. However it did this only for single-digit GIDs, which I can't imagine is what was intended. So this patch generalizes it to accept GIDs of any length.
When mutableUsers = false and all users and groups have assigned UID/GIDs, we can precompute exactly what update-users-groups.pl would output for /etc/passwd and /etc/group. In those cases we don't need to run it at activation time; we can generate the right files at build time and just symlink them into place. Note that most NixOS configs can't currently trigger this optimization, because normally everyone uses the nscd module, which defines a user with a null UID. I've separately proposed fixes for that issue in pull request NixOS#64268. There's no point generating /etc/shadow this way because the generated files are already world-readable in the Nix store, but that isn't a big deal as long as suitably strong passwords, salts, and hashes are used. That said, for users with 'passwordFile' or 'password' set, we can't construct the password hash at build time. But we can limit the work at activation time to building /etc/shadow, and only for users who need it. Tested by instantiating a minimal configuration.nix with: boot.isContainer = true; services.nscd.enable = false; users.mutableUsers = false; users.users.root.initialHashedPassword = ""; and running the resulting container. It builds and boots fine, and I can log in as root without giving a password, as expected. I also tested with several permutations of users with 'password', 'passwordFile', etc set.
If /etc/passwd or /etc/group change while nscd is running, it will automatically invalidate its caches for those files---unless someone has gone to the trouble of turning off the `check-files` option for those databases, in which case I'd hope they know what they're doing. (And if somebody builds a kernel without inotify, the check won't happen immediately, but again that's really their own fault.) So update-users-groups.pl doesn't need to call `nscd --invalidate` after writing either file. Removing these calls lets us remove glibc.bin from the PATH that activation snippets use. Also, NixOS supports configurations without nscd, in which case these calls are clearly unnecessary, albeit harmless. Furthermore, it takes a fairly careful reading of Perl documentation to determine that these uses of the `system` function do not invoke /bin/sh. Since the activation snippet that calls this script doesn't declare a dependency on the "binsh" snippet, I was concerned about whether this was safe. Turns out it is safe, but deleting the calls means nobody has to wonder.
Also my commit 6bab2aef337b23b3c0b9f08f774fb48c63d242b4, "just symlink subuid/subgid", broke tests.login. I think the existing implementation is broken and the tests aren't checking the right thing, but since that's a separate issue I've opened #64606 and dropped this patch from this series. |
I need to go through this in detail in order to be able to say anything sensible about it. |
@jameysharp this sentence I don't fully understand what you mean here. Could you go into a bit more detail what |
@arianvp, sure, happy to. Here's the problematic bit in current master: nixpkgs/nixos/modules/services/system/nscd.nix Lines 42 to 45 in 215497e
Because this This PR detects that case, where at least one user has My other PR takes advantage of the fact that Question: Do I need to document the constraints on when static |
This looks like a nice change!
I'd like an option that acts as an So, I wonder how this change interacts with ID scarcity and @infinisil's recent conclusion that static IDs should stop being assigned entirely. I know I've encountered services that do not have IDs assigned that I've manually done so for the purpose of having stable IDs for mutable data in /var/lib/etc, though wasn't sure whether to report it as the warning in I'm not sure if I have a personal opinion on this (maybe stable IDs are important or maybe IDs should just be a system-local implementation detail) but I will say avoiding activation scripts is always a fantastic benefit. It seems unfortunate if taking advantage of this PR will boil down to "you'll just have to maintain your own static id map" but maybe that's the best that can be done without reserving a bigger range for everything (in which case there should probably be a well-documented way/range to do this?) |
Yeah so as described in the issue, we don't have many static ips left in the normal range (below 400), and NixOS has had good support for non-static uid's for some time now. So I'd like for NixOS to move to what I describe in #60732 (comment), dynamic uid's for all future services. And regarding this PR, even if we ignore that issue, the benefits of it (<1MB less closure size only if all users have a static uid) just don't outweigh the costs (complicating user management code a lot). |
If we want IDs to be transient I think it's worth considering moving to systemd's
I'd argue the benefits come more from immutability, reducing impure activation script logic, and not having to maintain local system state rather than the impact on closure size. Moving the user management logic to build time feels more like a reduction of complexity; unfortunately having to support mutableUsers and backward compatibility makes that point moot. It's also maybe worth noting that the actual closure size impact could be more like 50MB if one were able to remove perl from the system, though that is a bit of a pipe dream. |
Yeah I also just replied in #60732 (comment), I also very much prefer The reduction in complexity would be nice yeah, but unfortunately there's no way this is going to happen, because as you said we can't just remove support for |
Oh no! I'm super discouraged to hear this. NixOS is so close to being usable as a static disk image builder; please don't do things that make it harder to use it that way. For my use cases I don't see any way to use NixOS if I can't configure out usage of the For what it's worth, the proof of concept configuration I have here actually doesn't have Perl (or Python, or Ruby, or node.js) in its closure. Together with things like this PR, a lot of the current Perl-scripted boot process can be replaced with stuff that systemd has built-in. So I don't think it's a pipe dream at all. @arcnmx, I like your suggestion of an option that triggers an assertion if the files can't be statically generated. That raises the question of whether the auto-detection should still happen if both |
Ah, you didn't mention anything about wanting to use this for static disk images before. I've never done anything with those, but I assume the point is that you can do something like mount the whole thing ro and have it work like that? That is a much better benefit for this than just the tiny closure size decrease. So your intention is to rid your NixOS of |
Oh, I guess I've only mentioned that in some of my other recent pull requests, you're right. There are a couple other things that get a bit tricky, but yes, dynamic users are the big challenge for me. I used a much simpler version of this patch for my proof of concept–I didn't care about passwords, which cause most of the complexity here–and for my purposes I didn't need any of the other activation scripts to make my read-only squashfs root file system image work. I have gotten several other activation scripts replaced on master already though–yes, usually with |
Okay so I think I'd be alright with this if you made it with such an option like |
Per review comments from @arcnmx and @infinisil on NixOS#64594, don't use auto-detection to determine whether to prebuild /etc/passwd and /etc/group. Instead offer a new config option, by analogy with users.enforceIdUniqueness: users.enforceStaticIds provides an assertion that all users and groups have an ID statically assigned. If this option is turned on and the assertions pass, _and_ users.mutableUsers is turned off, then /etc/passwd and /etc/group will be generated at configuration build time instead of activation time. This allows admins to declare their intent and get a build failure if that intent is violated. Adding a configuration option also provided a natural place to document this feature (thanks again to @arcnmx for pointing that out) so this commit adds some documentation accordingly. I didn't want to rebase this PR but I've tested this commit both as-is and after a merge from current git master, by instantiating a minimal configuration.nix with several variations on: boot.isContainer = true; users.mutableUsers = false; users.enforceStaticIds = true; users.users.root.initialHashedPassword = ""; On this commit, the new assertions fire unless this option is also set: services.nscd.enable = false; Since NixOS#64268 has been merged to master, after merging master onto this branch, that option is no longer necessary.
Done! Your proposal is better for my purposes since it means that a configuration intended for a read-only filesystem can turn on these asserts, rather than hoping the auto-detection triggers. I named the option I also tested on both my branch, which is already almost 2000 commits behind master, as well as after merging master into my branch. So I think this is safe to merge despite the amount of churn it's missed. |
Cool, I think a code review is still needed though. I'm out of time for now but I'll ask myself for one. |
# TODO: make this a systemd service wantedBy/before nss-user-lookup.target | ||
# because neither activation scripts nor systemd need to check passwords | ||
# but it also needs to run during activation on an already-booted system | ||
system.activationScripts.shadow = stringAfter [ "users" ] (optionalString accountsNeedShadow ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why after users
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember, and re-reading the diffs isn't reminding me. Hmmm.
|
||
subuidFile = concatStrings (map mkSubuidEntry (attrValues cfg.users)); | ||
accountsAreDeterministic = !cfg.mutableUsers && cfg.enforceStaticIds; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand it, with this check it won't build static /etc/{shadow,groups} if you have mutableUsers = true
. I think it would be better to not have any detection for this, just use the value of the option directly to determine whether it should be static or not, and have an assertion verify that mutableUsers = false
. Because why would anybody want to turn on this option if it doesn't end up doing anything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assertion which enforceStaticIds
now turns on (all uid
s/gid
s are non-null
) still applies even if mutableUsers = true
. I'm not certain whether anyone would care about the assertion in that case, but it does do something.
I'm having an easier time thinking about explaining this feature when implemented this way than if the new option directly controls the static file generation. "Here's what mutableUsers
does, and here's a property of IDs which we could enforce for you, and if you put those together then you get this other thing" feels simpler than starting with static file generation and then explaining the preconditions on it.
But I don't feel strongly either way, especially if somebody has a better way to document this.
Once upon a time
So bringing back fully static generation feels a bit like a step back to me. |
@edolstra Okay, I understand those reasons, but do you have alternative suggestions for booting off a read-only filesystem? |
@jameysharp Maybe put |
I'd need I was thinking about a compromise, like making I'm still not real happy with this option since it requires some activation scripts, which are otherwise entirely unnecessary for my purposes. On a read-only system with a tmpfs on So I'm not sure where to go from here. |
I guess it's fair to say that nixos wasn't meant or designed to build immutable systems - there are a number of problematic areas beyond just what this PR targets. I may disagree with a few points made here against it, but understandably the main concern here is the increased maintenance burden for an admittedly niche use-case. It just feels unfortunate since it seems like it could get very close to being usable as a replacement for yocto/etc...
I suppose it mostly comes down to going with a fork or heavy customization approach as the alternative, and hoping the least intrusive changes can still be upstreamed? (specifically on the issue of passwd, bind mounts should be a usable alternative over symlinks?) |
Yeah, okay, apparently I can create an out-of-tree module that sets Is there a better way to let an out-of-tree module hook into this process, rather than using I think these commits are still worth merging as general cleanups to And these are general cleanups to I also have a pile of other cleanups on my https://github.com/jameysharp/nixpkgs/commits/deactivation branch, which don't have much to do with each other aside from eliminating various activation scripts. Should I pile everything that's ready into one new PR, or try to group them vaguely logically? |
|
Motivation for this change
When
mutableUsers = false
and all users and groups have assigned UID/GIDs, we can precompute exactly whatupdate-users-groups.pl
would output for/etc/passwd
and/etc/group
. In those cases we don't need to run it at activation time; we can generate the right files at build time and just symlink them into place.These patches make that happen automatically when it's safe to do so. Until #64268 gets merged most NixOS configurations won't see any benefit because
nscd
doesn't have an assigned UID, but you can always manually assign UIDs and GIDs as needed to make these changes trigger. In my testing I just disabled nscd instead.The first few commits are just cleanups that I think are worth applying regardless. In particular, "no output for empty snippets" just makes the generated activation script easier to read. Since the snippets I introduce in later commits are empty under various circumstances, it's nice to have that cleanup in place first.
I didn't do this because of closure size, but it does reduce the closure size for my test configurations by more than I expected: 187kB savings if no users have
password
orpasswordFile
set. If any users do have one of those options, that adds 32kB to pull inmkpasswd
, which is still a 155kB savings. Of course, that's all rounding errors in these 539MB closures...Hey @peterhoeg and @arianvp, you seemed interested in this work, so here's the pull request I promised.
Things done
sandbox
innix.conf
on non-NixOS)nix-shell -p nix-review --run "nix-review wip"
./result/bin/
)nix path-info -S
before and after)