Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prebuild /etc/passwd and /etc/group if possible #64594

Closed
wants to merge 10 commits into from

Conversation

jameysharp
Copy link
Contributor

@jameysharp jameysharp commented Jul 10, 2019

Motivation for this change

When mutableUsers = false and all users and groups have assigned UID/GIDs, we can precompute exactly what update-users-groups.pl would output for /etc/passwd and /etc/group. In those cases we don't need to run it at activation time; we can generate the right files at build time and just symlink them into place.

These patches make that happen automatically when it's safe to do so. Until #64268 gets merged most NixOS configurations won't see any benefit because nscd doesn't have an assigned UID, but you can always manually assign UIDs and GIDs as needed to make these changes trigger. In my testing I just disabled nscd instead.

The first few commits are just cleanups that I think are worth applying regardless. In particular, "no output for empty snippets" just makes the generated activation script easier to read. Since the snippets I introduce in later commits are empty under various circumstances, it's nice to have that cleanup in place first.

I didn't do this because of closure size, but it does reduce the closure size for my test configurations by more than I expected: 187kB savings if no users have password or passwordFile set. If any users do have one of those options, that adds 32kB to pull in mkpasswd, which is still a 155kB savings. Of course, that's all rounding errors in these 539MB closures...

Hey @peterhoeg and @arianvp, you seemed interested in this work, so here's the pull request I promised.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nix-review --run "nix-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

The same code was used for the system activation script and the user
activation script; factor it out to a common function.
There are several empty snippets in a standard configuration. They are
present only for backward compatibility with other snippets which may
have declared dependencies on them.

This patch eliminates the per-snippet boilerplate for empty snippets.
If, someday, we get to the point where there are no non-empty activation
snippets, then this will lead to an empty string where the snippets used
to be.

Note that I didn't want newlines inserted where empty scripts used to be
either, so rather than using textClosureMap with its built-in "\n"
separators, I switched to using the underlying textClosureList, and
adding an extra newline only for non-empty snippets.
@ofborg ofborg bot added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux labels Jul 10, 2019
@jameysharp
Copy link
Contributor Author

After opening this PR I discovered that:

  • I misunderstood the passwordFile option and over-complicated how I implemented it;
  • I needed to take additional steps to get nscd to reload its caches when using this mode;
  • and the steps that update-users-groups.pl took to get nscd to reload its caches are not necessary.

I've force-pushed fixes for these issues.

Factor the "lines of colon-separated fields" format out to a function,
so the subuid and subgid formatting can work at a more abstract level of
lists of lists of fields.

I find this easier to read and reason about, and it's also reusable for
other files in /etc.
If environment.etc is being used to set up /etc/passwd or /etc/group,
then we can't rely on getpwnam and getgrnam until we've already put all
the new /etc files in place.

The easiest way to fix that is to make all the symlinks and do all the
copies first, and then go back to each copied file and fix its ownership
and permissions.

Since we're deferring ownership changes, we have to also defer
permissions changes; otherwise an unprivileged service could lose access
to its config files during this window. Leaving files readable by
everyone for a bit isn't a security problem because the source of these
files is already world-readable in the Nix store.
In order to create /etc/passwd and /etc/group via environment.etc, the
"etc" snippet becomes a dependency of "users", which is the snippet that
other snippets depend on. But when "users" creates those files instead,
"etc" needs to depend on "users".

This patch splits "users" into two snippets:

- users-before-etc does everything users currently does, and etc depends
  on it.
- users is now an empty snippet which exists only to support ordering
  constraints for other snippets. It depends on etc.

This resolves the potential circular dependency.
In a subsequent commit I'll eliminate update-users-groups in the case
where the contents of /etc/passwd and /etc/group can be precomputed at
build time. But that requires that either there are no createHome users
or that creating homedirs is done by something other than that script.
So in this commit I've implemented the latter approach.

Tested by instantiating a minimal configuration.nix with:

    boot.isContainer = true;
    services.mingetty.autologinUser = "root";

both with and without a `isNormalUser = true` user.

Without any createHome users, the "users" snippet is empty so it does
not show up in the activation script and /home does not get created.

With a normal user (which implies createHome), the "users" snippet
appears in the activation script, and in a container which did not
previously contain /home, the snippet successfully creates /home and the
user's homedir under it. After changing the mode and ownership of the
user's homedir, re-running the activation script preserves the new mode
but resets the ownership, which is consistent with the perl version.
Although this feature isn't documented, it looks like
update-users-groups.pl is supposed to recognize a user's primary group
"name" that's composed entirely of digits as a GID instead.

However it did this only for single-digit GIDs, which I can't imagine is
what was intended.

So this patch generalizes it to accept GIDs of any length.
When mutableUsers = false and all users and groups have assigned
UID/GIDs, we can precompute exactly what update-users-groups.pl would
output for /etc/passwd and /etc/group. In those cases we don't need to
run it at activation time; we can generate the right files at build time
and just symlink them into place.

Note that most NixOS configs can't currently trigger this optimization,
because normally everyone uses the nscd module, which defines a user
with a null UID. I've separately proposed fixes for that issue in pull
request NixOS#64268.

There's no point generating /etc/shadow this way because the generated
files are already world-readable in the Nix store, but that isn't a big
deal as long as suitably strong passwords, salts, and hashes are used.

That said, for users with 'passwordFile' or 'password' set, we can't
construct the password hash at build time. But we can limit the work at
activation time to building /etc/shadow, and only for users who need it.

Tested by instantiating a minimal configuration.nix with:

    boot.isContainer = true;
    services.nscd.enable = false;
    users.mutableUsers = false;
    users.users.root.initialHashedPassword = "";

and running the resulting container. It builds and boots fine, and I can
log in as root without giving a password, as expected.

I also tested with several permutations of users with 'password',
'passwordFile', etc set.
If /etc/passwd or /etc/group change while nscd is running, it will
automatically invalidate its caches for those files---unless someone has
gone to the trouble of turning off the `check-files` option for those
databases, in which case I'd hope they know what they're doing. (And if
somebody builds a kernel without inotify, the check won't happen
immediately, but again that's really their own fault.)

So update-users-groups.pl doesn't need to call `nscd --invalidate` after
writing either file.

Removing these calls lets us remove glibc.bin from the PATH that
activation snippets use.

Also, NixOS supports configurations without nscd, in which case these
calls are clearly unnecessary, albeit harmless.

Furthermore, it takes a fairly careful reading of Perl documentation to
determine that these uses of the `system` function do not invoke
/bin/sh. Since the activation snippet that calls this script doesn't
declare a dependency on the "binsh" snippet, I was concerned about
whether this was safe. Turns out it is safe, but deleting the calls
means nobody has to wonder.
@jameysharp
Copy link
Contributor Author

Also my commit 6bab2aef337b23b3c0b9f08f774fb48c63d242b4, "just symlink subuid/subgid", broke tests.login. I think the existing implementation is broken and the tests aren't checking the right thing, but since that's a separate issue I've opened #64606 and dropped this patch from this series.

@peterhoeg
Copy link
Member

I need to go through this in detail in order to be able to say anything sensible about it.

@arianvp
Copy link
Member

arianvp commented Jul 12, 2019

These patches make that happen automatically when it's safe to do so. Until #64268 gets merged most NixOS configurations won't see any benefit because nscd doesn't have an assigned UID, but you can always manually assign UIDs and GIDs as needed to make these changes trigger. In my testing I just disabled nscd instead.

@jameysharp this sentence I don't fully understand what you mean here. Could you go into a bit more detail what nscd has to do with this?

@jameysharp
Copy link
Contributor Author

@arianvp, sure, happy to. Here's the problematic bit in current master:

users.users.nscd =
{ isSystemUser = true;
description = "Name service cache daemon user";
};

Because this users.users.nscd declaration doesn't specify uid, the only way to ensure that the allocated UID remains stable on a given NixOS installation is to use update-users-groups.pl to check if a UID has previously been assigned to that user on that installation, and only allocate a new UID if it hasn't been set yet.

This PR detects that case, where at least one user has uid == null, and falls back to the existing update-users-groups.pl strategy to handle it. There are two ways to get this PR to work with the existing definition for nscd: either manually pick a UID and assign it to users.users.nscd.uid in your configuration, or set services.nscd.enable = false. Of course the latter option has bad consequences for most NixOS installations.

My other PR takes advantage of the fact that nscd does not need a stable UID, and deletes the user definition entirely, which of course is another way to allow this PR to work.

Question: Do I need to document the constraints on when static passwd/group are possible, since that doesn't seem to have been clear? If so, where?

@arcnmx
Copy link
Member

arcnmx commented Jul 27, 2019

This looks like a nice change!

Question: Do I need to document the constraints on when static passwd/group are possible, since that doesn't seem to have been clear? If so, where?

I'd like an option that acts as an accountsAreDeterministic assertion to catch when you've unknowingly enabled a service that makes them nondeterministic. Default to false of course, and would double as a good place to document this feature? Alternatively just expose it as an readonly output option where the default is calculated from the config. (then the user can assert if they want to, or maybe this is too weird but: make it pseudo-readonly so if it's set by user configuration and reality differs from the expectations implied by setting/mkForcing it, assert fail)

So, I wonder how this change interacts with ID scarcity and @infinisil's recent conclusion that static IDs should stop being assigned entirely. I know I've encountered services that do not have IDs assigned that I've manually done so for the purpose of having stable IDs for mutable data in /var/lib/etc, though wasn't sure whether to report it as the warning in ids.nix gave me pause.

I'm not sure if I have a personal opinion on this (maybe stable IDs are important or maybe IDs should just be a system-local implementation detail) but I will say avoiding activation scripts is always a fantastic benefit. It seems unfortunate if taking advantage of this PR will boil down to "you'll just have to maintain your own static id map" but maybe that's the best that can be done without reserving a bigger range for everything (in which case there should probably be a well-documented way/range to do this?)

@infinisil
Copy link
Member

Yeah so as described in the issue, we don't have many static ips left in the normal range (below 400), and NixOS has had good support for non-static uid's for some time now. So I'd like for NixOS to move to what I describe in #60732 (comment), dynamic uid's for all future services.

And regarding this PR, even if we ignore that issue, the benefits of it (<1MB less closure size only if all users have a static uid) just don't outweigh the costs (complicating user management code a lot).

@arcnmx
Copy link
Member

arcnmx commented Jul 27, 2019

So I'd like for NixOS to move to what I describe in #60732 (comment), dynamic uid's for all future services.

If we want IDs to be transient I think it's worth considering moving to systemd's DynamicUser instead, and mostly alleviate NixOS scripts of the responsibility of managing ids - which also happens to be a solution that makes this PR's approach viable with dynamic IDs.

the benefits of it (<1MB less closure size only if all users have a static uid) just don't outweigh the costs (complicating user management code a lot).

I'd argue the benefits come more from immutability, reducing impure activation script logic, and not having to maintain local system state rather than the impact on closure size. Moving the user management logic to build time feels more like a reduction of complexity; unfortunately having to support mutableUsers and backward compatibility makes that point moot. It's also maybe worth noting that the actual closure size impact could be more like 50MB if one were able to remove perl from the system, though that is a bit of a pipe dream.

@infinisil
Copy link
Member

infinisil commented Jul 27, 2019

Yeah I also just replied in #60732 (comment), I also very much prefer DynamicUser if it's applicable.

The reduction in complexity would be nice yeah, but unfortunately there's no way this is going to happen, because as you said we can't just remove support for mutableUsers and force all users to assign static uid's.

@jameysharp
Copy link
Contributor Author

Oh no! I'm super discouraged to hear this. NixOS is so close to being usable as a static disk image builder; please don't do things that make it harder to use it that way. For my use cases I don't see any way to use NixOS if I can't configure out usage of the update-users-groups script. My current approach is to just skip running any activation scripts and kludge enough things back into place that my specific configuration can boot, but I really hoped to get enough things upstream to be able to quit doing that.

For what it's worth, the proof of concept configuration I have here actually doesn't have Perl (or Python, or Ruby, or node.js) in its closure. Together with things like this PR, a lot of the current Perl-scripted boot process can be replaced with stuff that systemd has built-in. So I don't think it's a pipe dream at all.

@arcnmx, I like your suggestion of an option that triggers an assertion if the files can't be statically generated. That raises the question of whether the auto-detection should still happen if both mutableUsers and the new option are false, or if the only way to turn on this behavior would be by enabling the new option. Of course, all that is irrelevant if this PR is rejected, so I'm not going to think about it too hard yet.

@infinisil
Copy link
Member

Ah, you didn't mention anything about wanting to use this for static disk images before. I've never done anything with those, but I assume the point is that you can do something like mount the whole thing ro and have it work like that? That is a much better benefit for this than just the tiny closure size decrease.

So your intention is to rid your NixOS of activationScripts and replace them all with things that can be set up statically (like systemd.tmpfiles?). And the users is one thing that can't work like that due to it being dynamic, unless you make all users have a static uid (the same for groups). That right?

@jameysharp
Copy link
Contributor Author

Oh, I guess I've only mentioned that in some of my other recent pull requests, you're right.

There are a couple other things that get a bit tricky, but yes, dynamic users are the big challenge for me. I used a much simpler version of this patch for my proof of concept–I didn't care about passwords, which cause most of the complexity here–and for my purposes I didn't need any of the other activation scripts to make my read-only squashfs root file system image work.

I have gotten several other activation scripts replaced on master already though–yes, usually with tmpfiles.d, but I've worked out a number of tricks with one-shot units or ExecStartPre for more complex cases–and I have a pile of additional patches which are waiting for me to organize them into reviewable pull requests: https://github.com/jameysharp/nixpkgs/tree/deactivation

@infinisil
Copy link
Member

Okay so I think I'd be alright with this if you made it with such an option like staticIds which when enabled will enforce static uid/gid's for all users/groups and do this static /etc/{passwd,groups} setup (and doesn't do it if it's disabled, to make it more predictable).

Per review comments from @arcnmx and @infinisil on NixOS#64594, don't use
auto-detection to determine whether to prebuild /etc/passwd and
/etc/group.

Instead offer a new config option, by analogy with
users.enforceIdUniqueness: users.enforceStaticIds provides an assertion
that all users and groups have an ID statically assigned.

If this option is turned on and the assertions pass, _and_
users.mutableUsers is turned off, then /etc/passwd and /etc/group will
be generated at configuration build time instead of activation time.

This allows admins to declare their intent and get a build failure if
that intent is violated.

Adding a configuration option also provided a natural place to document
this feature (thanks again to @arcnmx for pointing that out) so this
commit adds some documentation accordingly.

I didn't want to rebase this PR but I've tested this commit both as-is
and after a merge from current git master, by instantiating a minimal
configuration.nix with several variations on:

    boot.isContainer = true;
    users.mutableUsers = false;
    users.enforceStaticIds = true;
    users.users.root.initialHashedPassword = "";

On this commit, the new assertions fire unless this option is also set:

    services.nscd.enable = false;

Since NixOS#64268 has been merged to master, after merging master onto this
branch, that option is no longer necessary.
@jameysharp
Copy link
Contributor Author

Done! Your proposal is better for my purposes since it means that a configuration intended for a read-only filesystem can turn on these asserts, rather than hoping the auto-detection triggers.

I named the option users.enforceStaticIds for some consistency with the existing users.enforceIdUniqueness option.

I also tested on both my branch, which is already almost 2000 commits behind master, as well as after merging master into my branch. So I think this is safe to merge despite the amount of churn it's missed.

@infinisil
Copy link
Member

Cool, I think a code review is still needed though. I'm out of time for now but I'll ask myself for one.

# TODO: make this a systemd service wantedBy/before nss-user-lookup.target
# because neither activation scripts nor systemd need to check passwords
# but it also needs to run during activation on an already-booted system
system.activationScripts.shadow = stringAfter [ "users" ] (optionalString accountsNeedShadow (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why after users?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember, and re-reading the diffs isn't reminding me. Hmmm.


subuidFile = concatStrings (map mkSubuidEntry (attrValues cfg.users));
accountsAreDeterministic = !cfg.mutableUsers && cfg.enforceStaticIds;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand it, with this check it won't build static /etc/{shadow,groups} if you have mutableUsers = true. I think it would be better to not have any detection for this, just use the value of the option directly to determine whether it should be static or not, and have an assertion verify that mutableUsers = false. Because why would anybody want to turn on this option if it doesn't end up doing anything

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assertion which enforceStaticIds now turns on (all uids/gids are non-null) still applies even if mutableUsers = true. I'm not certain whether anyone would care about the assertion in that case, but it does do something.

I'm having an easier time thinking about explaining this feature when implemented this way than if the new option directly controls the static file generation. "Here's what mutableUsers does, and here's a property of IDs which we could enforce for you, and if you put those together then you get this other thing" feels simpler than starting with static file generation and then explaining the preconditions on it.

But I don't feel strongly either way, especially if somebody has a better way to document this.

@edolstra
Copy link
Member

Once upon a time mutableUsers = false implied that you had to provide static ID assignments, but we got rid of that and switched to doing dynamic ID allocations via update-users-groups.pl (1a75958). The reasons were that

  • Users shouldn't be required to manually manage ID assignments (just like you don't have to manage inode numbers).
  • Static ID assignments are not compositional (there is no way to prevent out-of-tree NixOS modules from clashing with each other).
  • We have only one code path to handle the mutableUsers = false and mutableUsers = true cases.

So bringing back fully static generation feels a bit like a step back to me.

@jameysharp
Copy link
Contributor Author

@edolstra Okay, I understand those reasons, but do you have alternative suggestions for booting off a read-only filesystem?

@edolstra
Copy link
Member

edolstra commented Aug 2, 2019

@jameysharp Maybe put /etc on a tmpfs (or symlink it to /run/etc)? Then passwd and group will be regenerated by the activation script at boot time.

@jameysharp
Copy link
Contributor Author

I'd need update-users-groups to not be written in Perl or the closure size will be too big for my use cases, but that can be a separate issue.

I was thinking about a compromise, like making /etc/passwd and the rest be symlinks to someplace in /var, but it turns out that the shadow suite opens those files with O_NOFOLLOW, so that doesn't work. Unless we rebuild pkgs.shadow with -DPASSWD_FILE=/var/... and so on, which I guess might not be the worst thing ever.

I'm still not real happy with this option since it requires some activation scripts, which are otherwise entirely unnecessary for my purposes. On a read-only system with a tmpfs on /var, which is a common situation for embedded Linux systems, everything that activation scripts or phase-2 init currently do can be done without them.

So I'm not sure where to go from here.

@arcnmx
Copy link
Member

arcnmx commented Aug 2, 2019

I guess it's fair to say that nixos wasn't meant or designed to build immutable systems - there are a number of problematic areas beyond just what this PR targets. I may disagree with a few points made here against it, but understandably the main concern here is the increased maintenance burden for an admittedly niche use-case. It just feels unfortunate since it seems like it could get very close to being usable as a replacement for yocto/etc...

So I'm not sure where to go from here.

I suppose it mostly comes down to going with a fork or heavy customization approach as the alternative, and hoping the least intrusive changes can still be upstreamed? mkForce or selective imports can be somewhat flexible tools I guess ><

(specifically on the issue of passwd, bind mounts should be a usable alternative over symlinks?)

@jameysharp
Copy link
Contributor Author

Yeah, okay, apparently I can create an out-of-tree module that sets system.activationScripts.users = lib.mkForce "" and then does what this PR does in lieu of the default behavior.

Is there a better way to let an out-of-tree module hook into this process, rather than using mkForce?

I think these commits are still worth merging as general cleanups to activation-script.nix:

And these are general cleanups to update-users-groups.pl:

I also have a pile of other cleanups on my https://github.com/jameysharp/nixpkgs/commits/deactivation branch, which don't have much to do with each other aside from eliminating various activation scripts. Should I pile everything that's ready into one new PR, or try to group them vaguely logically?

@jameysharp jameysharp closed this Aug 3, 2019
@peterhoeg
Copy link
Member

I think these commits are still worth merging as general cleanups
I had a look at those 4 and they should definitely go in. Any chance you could wrap them up in a separate PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants