containers · cgwalters · Mar 25, 2024 · Mar 24, 2024
diff --git a/docs/src/SUMMARY.md b/docs/src/SUMMARY.md
@@ -6,6 +6,11 @@
 
 - [Installation](installation.md)
 
+# Building images
+
+- [Building images](building/guidance.md)
+- [Users, groups, SSH keys](building/users-and-groups.md)
+
 # Using bootc
 
 - [Upgrade and rollback](upgrades.md)

diff --git a/docs/src/building/guidance.md b/docs/src/building/guidance.md
@@ -0,0 +1,90 @@
+# Generic guidance for building images
+
+The bootc project intends to be as operating system and distribution independent as possible,
+similar to its related projects [podman](http://podman.io/) and [systemd](https://systemd.io/),
+etc.
+
+The recommendations for creating bootc-compatible images will in general need to
+be owned by the OS/distribution - in particular the ones who create the default
+bootc base image(s). However, some guidance is very generic to most Linux
+systems (and bootc only supports Linux).
+
+Let's however restate a base goal of this project:
+
+> The original Docker container model of using "layers" to model
+> applications has been extremely successful.  This project
+> aims to apply the same technique for bootable host systems - using
+> standard OCI/Docker containers as a transport and delivery format
+> for base operating system updates.
+
+Every tool and technique for creating application base images
+should apply to the host Linux OS as much as possible.
+
+## Installing software
+
+For package management tools like `apt`, `dnf`, `zypper` etc.
+(generically, `$pkgsystem`) it is very much expected that
+the pattern of
+
+`RUN $pkgsystem install somepackage && $pkgsystem clean all`
+
+type flow Just Works here - the same way as it does for
+"application" container images.  This pattern is really how
+Docker got started.
+
+There's not much special to this that doesn't also apply
+to application containers; but see below.
+
+## systemd units
+
+The model that is most popular in the Docker/OCI world
+is "microservice" style containers with the application as
+pid 1, isolating the applications from each other and
+from the host system - as opposed to "system containers"
+which run an init system like systemd, typically also
+SSH and often multiple logical "application" components
+as part of the same container.
+
+The bootc project generally expects systemd as pid 1,
+and if you embed software in your derived image, the
+default would then be that that software is initially
+launched via a systemd unit.
+
+```
+RUN dnf -y install postgresql
+```
+
+Such a package would typically also carry a systemd unit, and that
+service will be launched the same way as it would
+on a package-based system.
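+
+As a sketch of what this can look like for your own software: the unit file
+`myapp.service` below is hypothetical, is assumed to sit next to the build
+file, and is assumed to carry an `[Install]` section.
+
+```dockerfile
+# Ship the unit in /usr, which is owned by the operating system image
+COPY myapp.service /usr/lib/systemd/system/
+# Enable it at build time so that systemd starts it on boot
+RUN systemctl enable myapp.service
+```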
+
+## Users and groups
+
+Note that the above `postgresql` today will allocate a user;
+this leads to the topic of [users, groups and SSH keys](users-and-groups.md).
+
+## Configuration
+
+A key aspect of choosing a bootc-based operating system model
+is that *code* and *configuration* can be strictly "lifecycle bound"
+together in exactly the same way.
+
+(Today, that's by including the configuration into the base
+ container image; however a future enhancement for bootc
+ will also support dynamically-injected ConfigMaps, similar
+ to kubelet)
+
+You can add configuration files to the same places they're
+expected by typical package systems on Debian/Fedora/Arch
+etc. - in `/usr` (preferred where possible)
+or `/etc`.  systemd has long advocated and supported
+a model where `/usr` (e.g. `/usr/lib/systemd/system`)
+contains content owned by the operating system image.
+
+`/etc` is machine-local state.  However, per [filesystem.md](../filesystem.md)
+it's important to note that the underlying OSTree
+system performs a 3-way merge of `/etc`, so changes you
+make in the container image to e.g. `/etc/postgresql.conf`
+will be applied on update, assuming it is not modified
+locally.
+
diff --git a/docs/src/building/users-and-groups.md b/docs/src/building/users-and-groups.md
@@ -0,0 +1,194 @@
+
+# Users and groups
+
+This is one of the more complex topics. Generally speaking, bootc has nothing to
+do directly with configuring users or groups; it is a generic OS
+update/configuration mechanism. (There is currently just one small exception in
+that `bootc install` has a special case `--root-ssh-authorized-keys` argument,
+but it's very much optional).
+
+## Generic base images
+
+Commonly OS/distribution base images will be generic, i.e.
+without any configuration.  It is *very strongly recommended*
+to avoid hardcoded passwords and ssh keys with publicly-available
+private keys (as Vagrant does) in generic images.
+
+### Injecting SSH keys via systemd credentials
+
+The systemd project has documentation for [credentials](https://systemd.io/CREDENTIALS/)
+which can be used in some environments to inject a root
+password or SSH authorized_keys.  For many cases, this
+is a best practice.
+
+At the time of this writing this relies on SMBIOS, which
+is mainly configurable in local virtualization environments
+(such as qemu).
+
+### Injecting users and SSH keys via cloud-init, etc.
+
+Many IaaS and virtualization systems are oriented towards a "metadata server"
+(see e.g. [AWS instance metadata](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html))
+whose data is commonly processed by software such as [cloud-init](https://cloud-init.io/)
+or [Ignition](https://github.com/coreos/ignition) or equivalent.
+
+The base image you're using may include such software, or you
+can install it in your own derived images.
+
+In this model, SSH configuration is managed outside of the bootable
+image.  See e.g. [GCP oslogin](https://cloud.google.com/compute/docs/oslogin/)
+for an example of this where operating system identities are linked
+to the underlying Google accounts.
+
+### Adding users and credentials via custom logic (container or unit)
+
+Of course, systems like `cloud-init` have no special status; you
+can inject any logic you want to manage credentials via
+e.g. a systemd unit (which may launch a container image)
+that manages things however you prefer.  Commonly,
+this would be a custom network-hosted source.  For example,
+[FreeIPA](https://www.freeipa.org/page/Main_Page).
+
+Another example in a Kubernetes-oriented infrastructure would
+be a container image that fetches desired authentication
+credentials from a [CRD](https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/)
+hosted in the API server.  (To do things like this
+it's suggested to reuse the kubelet credentials.)
+
+### Adding users and credentials statically in the container build
+
+Relative to package-oriented systems, a new ability is to inject
+users and credentials as part of a derived build:
+
+```dockerfile
+RUN useradd someuser
+```
+
+However, it is important to understand some issues with the default
+`shadow-utils` implementation of `useradd`:
+
+First, typically user/group IDs are allocated dynamically, and this can result in "drift" (see below).
+
+#### User and group home directories and `/var`
+
+For systems configured with persistent `/home` → `/var/home`, any changes to `/var` made
+in the container image after initial installation *will not be applied on subsequent updates*.  If for example you inject `/var/home/someuser/.ssh/authorized_keys`
+into a container build, existing systems will *not* get the updated authorized keys file.
+
+#### Using DynamicUser=yes for systemd units
+
+For "system" users it's strongly recommended to use systemd [DynamicUser=yes](https://www.freedesktop.org/software/systemd/man/latest/systemd.exec.html#DynamicUser=) where
+possible.
+
+This is significantly better than the pattern of allocating users/groups
+at "package install time" (e.g. [Fedora package user/group guidelines](https://docs.fedoraproject.org/en-US/packaging-guidelines/UsersAndGroups/)) because
+it avoids potential UID/GID drift (see below).
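+
+As a minimal sketch of this pattern, a derived image could ship its own unit
+that sets `DynamicUser=yes`. The unit name `myapp.service`, the binary path
+`/usr/bin/myapp` and the state directory name are hypothetical, and the heredoc
+form of `COPY` assumes a build tool that supports Dockerfile heredocs.
+
+```dockerfile
+# Hypothetical unit: systemd allocates a transient UID/GID when the service
+# starts, and StateDirectory= gives it a systemd-managed directory for state.
+COPY <<EOF /usr/lib/systemd/system/myapp.service
+[Unit]
+Description=Example service running as a dynamic user
+
+[Service]
+ExecStart=/usr/bin/myapp
+DynamicUser=yes
+StateDirectory=myapp
+
+[Install]
+WantedBy=multi-user.target
+EOF
+RUN systemctl enable myapp.service
+```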
+
+#### Using systemd-sysusers
+
+See [systemd-sysusers](https://www.freedesktop.org/software/systemd/man/latest/systemd-sysusers.html).  For example in your derived build:
+
+```
+COPY mycustom-user.conf /usr/lib/sysusers.d/
+```
+
+A key aspect of how this works is that `sysusers` will make changes
+to the traditional `/etc/passwd` file as necessary on boot.  If
+`/etc` is persistent, this can avoid uid/gid drift (but
+in the general case it does mean that uid/gid allocation can
+depend on how a specific machine was upgraded over time).
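+
+For illustration, the `mycustom-user.conf` file above could also be generated
+inline. This is a sketch only: the user name `myapp`, its description and its
+home directory are hypothetical, and the heredoc form of `COPY` assumes a
+build tool that supports Dockerfile heredocs.
+
+```dockerfile
+# sysusers.d fields: type, name, ID ("-" lets systemd pick one), description, home directory
+COPY <<EOF /usr/lib/sysusers.d/mycustom-user.conf
+u myapp - "Example application user" /var/lib/myapp
+EOF
+```
+
+Replacing the `-` with a fixed numeric ID is one way to pin the allocation
+rather than leaving it to each machine.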
+
+#### Using systemd JSON user records
+
+See [JSON user records](https://systemd.io/USER_RECORD/).  Unlike `sysusers`,
+the canonical state for these lives in `/usr` - if a subsequent
+image drops a user record, then it will also vanish
+from the system.
+
+#### nss-altfiles
+
+The [nss-altfiles](https://github.com/aperezdc/nss-altfiles) project
+(long) predates systemd JSON user records.  It aims to help split
+"system" users into `/usr/lib/passwd` and `/usr/lib/group`.  It's
+very important to understand that this aligns with the way
+the OSTree project handles the "3 way merge" for `/etc` as it
+relates to `/etc/passwd`.  Currently, if the `/etc/passwd` file is
+modified in any way on the local system, then subsequent changes
+to `/etc/passwd` in the container image *will not be applied*.
+
+Some base images may have `nss-altfiles` enabled by default;
+this is currently the case for base images built by
+[rpm-ostree](https://github.com/coreos/rpm-ostree).
+
+Commonly, base images will have some "system" users pre-allocated
+and managed via this file, again to avoid uid/gid drift.
+
+In a derived container build, you can also append users
+to `/usr/lib/passwd` for example.  (At the time of this
+writing there is no command line to do so though).
+
+Typically it is preferable to use `sysusers.d`
+or `DynamicUser=yes`.
+
+### Machine-local state for users
+
+At this point, it is important to understand the [filesystem](../filesystem.md)
+layout - the default is up to the base image.
+
+The default Linux concept of a user has data stored in both `/etc` (`/etc/passwd`, `/etc/shadow` and groups)
+and `/home`.  The choice for how these work is up to the base image, but
+a common default for generic base images is to have both be machine-local persistent state.
+In this model `/home` would be a symlink to `/var/home`.
+
+But it is also valid to default to having e.g. `/home` be a `tmpfs`
+to ensure user data is cleaned up across reboots (and this pairs particularly
+well with a transient `/etc` as well).
+
+#### Injecting users and SSH keys at system provisioning time
+
+For base images where `/etc` and `/var` are configured to persist by default, it
+will then be generally supported to inject users via "installers" such
+as [Anaconda](https://github.com/rhinstaller/anaconda/) (interactively or
+via kickstart) or any others.
+
+Typically generic installers such as this are designed for "one time bootstrap"
+and after that the configuration becomes mutable machine-local state
+that can be changed "day 2" via some other mechanism.
+
+The simple case is a user with a password - typically the installer helps
+set the initial password, but to change it there is a different in-system
+tool (such as `passwd` or a GUI as part of [Cockpit](https://cockpit-project.org/), GNOME/KDE/etc).
+
+It is intended that these flows work equivalently in a bootc-compatible
+system, to support users directly installing "generic" base images, without
+requiring changes to the tools above.
+
+### UID/GID drift
+
+Ultimately the `/etc/passwd` and similar files are a mapping
+between names and numeric identifiers.  A problem arises
+when this mapping is dynamic and mixed with "stateless"
+container image builds.
+
+For example today the CentOS Stream 9 `postgresql` package
+allocates a [static uid of `26`](https://gitlab.com/redhat/centos-stream/rpms/postgresql/-/blob/a03cf81d4b9a77d9150a78949269ae52a0027b54/postgresql.spec#L847).
+
+This means that
+```
+RUN dnf -y install postgresql
+```
+
+will always result in a change to `/etc/passwd` that allocates uid 26
+and data in `/var/lib/pgsql` will always be owned by that UID.
+
+However in contrast, the cockpit project allocates
+[a floating cockpit-ws user](https://gitlab.com/redhat/centos-stream/rpms/cockpit/-/blob/1909236ad28c7d93238b8b3b806ecf9c4feb7e46/cockpit.spec#L506).
+
+This means that, without additional work, each container image build
+may result in the uid changing (due to RPM installation ordering
+or other reasons).
+
+This can be a problem if that user maintains persistent state.
+Such cases are best handled by being converted to use `sysusers.d`
+(see [Fedora change](https://fedoraproject.org/wiki/Changes/Adopting_sysusers.d_format)) - or again even better, using `DynamicUser=yes` (see above).
+