Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatic UID allocation #3600

Merged
merged 49 commits into from
Nov 29, 2022
Merged

Automatic UID allocation #3600

merged 49 commits into from
Nov 29, 2022

Conversation

edolstra
Copy link
Member

@edolstra edolstra commented May 20, 2020

This PR does the following:

  • It adds an experimental feature and option auto-uid-allocation which provides an alternative to having a nixbld group of pre-created build users. When enabled, Nix allocates UIDs/GIDs in the range 872415232+ on Linux and 56930 on macOS.

  • It adds an experimental feature cgroups that causes builds to be executed in a cgroup. This allows getting some statistics from a build (such as CPU time) and in the future may allow setting resource limits. But it mainly exists because the uid-range feature requires it.

  • It adds a system feature uid-range that causes a build to be executed as root in a UID namespace with 65,536 UIDs available. This allows things like systemd-nspawn and NixOS containers to run inside a Nix build.

@domenkozar
Copy link
Member

A few questions/comments:

  1. Does it work on macOS?
  2. Why does it need to be experimental feature?
  3. Missing documentation for the nix.conf option

@edolstra

This comment was marked as outdated.

@7c6f434c

This comment was marked as outdated.

@edolstra

This comment was marked as outdated.

@7c6f434c
Copy link
Member

I think I have heard of people trying to run systemd on NixOS using only cgroup2; and also, of course, I wonder what will be eventually needed to run lighter-weight NixOS tests on non-NixOS Linux. I guess creating a none,name=systemd cgroup hierarchy could be just mentioned in the error message. Is there a simple test I should run on my non-systemd Nixpkgs-kernel system to see if systemd-nspawn indeed works in such setups?

@edolstra

This comment was marked as outdated.

@7c6f434c
Copy link
Member

auto-allocate-uids needs both to be enabled and added to experimental features; there is a message saying auto-allocate-uids = true is needed, but a more generic message is actually printed.

Once things are configured: I do get «Container nixos exited successfully.», and the target path is built with reasonable output (and there is a running systemd inside the build but no apparent attempts at escaping happen).

Logs: firewall failing seems expected, system-getty.slice failing due to it is slightly surprising, the following might mean I am doing something wrong on the host system or maybe you also get these:

Failed to create symlink /sys/fs/cgroup/cpuacct: Read-only file system
Failed to create symlink /sys/fs/cgroup/cpu: Read-only file system
Failed to create symlink /sys/fs/cgroup/net_prio: Read-only file system
Failed to create symlink /sys/fs/cgroup/net_cls: Read-only file system

Overall nix-store -r time is <3s wall-clock time, which is just great, thanks for that feature.

@7c6f434c
Copy link
Member

Looked at the code and got an idea to test… auto-allocate-uids requires systemd cgroup hierarchy to exist even if systemd-cgroup feature is not enabled.

@edolstra
Copy link
Member Author

auto-allocate-uids requires systemd cgroup hierarchy to exist even if systemd-cgroup feature is not enabled.

Yeah that's currently true, didn't think about that. In principle however it could use any existing hierarchy since it's only used for tracking processes.

@7c6f434c
Copy link
Member

… or even mount «nix» hierarchy if systemd one is not mounted, I guess…

I am OK with mounting systemd hierarchy on boot, I guess a slightly more detailed message is enough. I just have no cgroup hierarchies mounted by default so it was very cheap for me to check this combination of options.

@edolstra
Copy link
Member Author

firewall failing seems expected

Actually that succeeds for me. Which makes me realize a big problem with this approach: whether certain networking features (firewall, NAT, ...) work depend on what kernel modules are loaded in the host system. I don't think there is any way to restrict access...

@7c6f434c
Copy link
Member

Erm. Does that require structured required features (and, in this instance, assertions about host kernel modules) for a proper-ish solution? I guess whoever cares could write a very minimal sinit VM to demonstrate whether some dependency is not listed…

@edolstra
Copy link
Member Author

Or maybe a seccomp filter could be used to restrict access to undeclared features.

@7c6f434c
Copy link
Member

Seccomp filter sounds like something that requires ahead-of-time enumeration of all possible things that could go wrong

@edolstra
Copy link
Member Author

I just realized that the situation isn't that bad. Or rather, it was already bad and this doesn't make it worse. It was already possible to create network devices, firewall tables etc. depending on the host kernel configuration. For example:

with import <nixpkgs> {};

runCommand "foo"
  {
    buildInputs = [ pkgs.utillinux pkgs.iproute pkgs.iptables ];
  }
  ''
    unshare -m -n -U -r -- bash -c "
      set -e
      mkdir -p foo/run foo/nix/store
      mount --rbind /nix/store foo/nix/store
      ip link add foo-h type veth peer name foo-c
      chroot foo iptables -t nat -A POSTROUTING -p tcp -s 192.168.1.2
      chroot foo iptables -t nat -L
    "
    mkdir $out
  ''

This will even cause kernel modules like veth and iptable_nat to be loaded if they're not already.

@7c6f434c
Copy link
Member

Ah, it might be that the host dependency got exposed by the fact that I did not enable module autoloading. I guess this impurity was «mitigated» by there being little incentive to do such things in public expressions.

@edolstra edolstra force-pushed the auto-uid-allocation branch from 90b4689 to e263fd4 Compare June 18, 2020 16:35
edolstra added 9 commits July 6, 2020 13:50
Rather than rely on a nixbld group, we now allocate UIDs/GIDs
dynamically starting at a configurable ID (872415232 by default).

Also, we allocate 2^18 UIDs and GIDs per build, and run the build as
root in its UID namespace. (This should not be the default since it
breaks some builds. We probably should enable this conditional on a
requiredSystemFeature.) The goal is to be able to run (NixOS)
containers in a build. However, this will also require some cgroup
initialisation.

The 2^18 UIDs/GIDs is intended to provide enough ID space to run
multiple containers per build, e.g. for distributed NixOS tests.
Also, run builds in a cgroup namespace (ensuring /proc/self/cgroup
doesn't leak information about the outside world) and mount /sys. This
enables running systemd-nspawn and thus NixOS containers in a Nix
build.
2^18 was overkill. The idea was to enable multiple containers to run
inside a build. However, those containers can use the same UID range -
we don't really care about perfect isolation between containers inside
a build.
"uid-range" provides 65536 UIDs to a build and runs the build as root
in its user namespace. "systemd-cgroup" allows the build to mount the
systemd cgroup controller (needed for running systemd-nspawn and NixOS
containers).

Also, add a configuration option "auto-allocate-uids" which is needed
to enable these features, and some experimental feature gates.

So to enable support for containers you need the following in
nix.conf:

  experimental-features = auto-allocate-uids systemd-cgroup
  auto-allocate-uids = true
  system-features = uid-range systemd-cgroup
Maybe this should be a separate system feature... /sys exposes a lot
of impure info about the host system.
doc/manual/src/release-notes/rl-next.md Outdated Show resolved Hide resolved
src/libstore/globals.hh Show resolved Hide resolved
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2022-11-14-nix-team-meeting-minutes-8/23452/1

@edolstra edolstra enabled auto-merge November 29, 2022 12:54
@edolstra edolstra merged commit fbc53e9 into master Nov 29, 2022
@edolstra edolstra deleted the auto-uid-allocation branch November 29, 2022 13:01
@fricklerhandwerk fricklerhandwerk mentioned this pull request Nov 29, 2022
sandbox uids. This must be done before any chownToBuilder()
calls. */
killSandbox(false);

/* Right platform? */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason this code is no longer at the top? It makes sense to me to check the drv itself before doing IO like checking the filesystem --- cheap checks first.

Comment on lines +2437 to +2440

/* FIXME: set proper permissions in restorePath() so
we don't have to do another traversal. */
canonicalisePathMetaData(actualPath, {}, inodesSeen);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just become needed?

@@ -580,10 +642,11 @@ void LocalDerivationGoal::startBuilder()

printMsg(lvlChatty, format("setting up chroot environment in '%1%'") % chrootRootDir);

if (mkdir(chrootRootDir.c_str(), 0750) == -1)
// FIXME: make this 0700
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This FIXME is fairly short so I doubt I would know how to interpret it a long time from know. Should we file an issue and link it here?

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nix-2-12-0-released/23780/4

@ncfavier
Copy link
Member

This is nice, but it breaks nix-top which gets build users from the nixbld group. Could we have a Nix command that lists the UIDs for the currently active builds? Or does this PR make nix-top obsolete?

@edolstra
Copy link
Member Author

@ncfavier Maybe you can create an issue for querying active builds? But maybe it's less important since in principle builds will show up in systemd-cgtop.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nix-team-report-2022-10-2023-03/27486/1

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nix-build-ate-my-ram/35752/2

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nix-on-macos-now-fails-because-i-set-auto-allocate-uids-true-like-an-idiot/40210/1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.