Add support for creating a composefs from a directory #36

Merged
merged 5 commits into from
Nov 20, 2024
53 changes: 53 additions & 0 deletions doc/oci.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
# How to create a composefs from an OCI image
Review comment (Collaborator): This is a really useful document!


This document is incomplete. It only serves to document some decisions we've
taken about how to resolve ambiguous situations.

# Data precision

We currently create a composefs image using the granularity of data that
typically appears in OCI tarballs:
- atime and ctime are not present (these are actually not physically present
in the erofs inode structure at all, either the compact or extended forms)
- mtime is set to the mtime in seconds; the sub-seconds value is simply
truncated (ie: we always round down). erofs has an nsec field, but
sub-second precision is not normally present in OCI tarballs: the usual
tar header only has timestamps in seconds, and extended headers are not
usually added for this purpose.
- we take great care to faithfully represent hardlinks: even though the
produced filesystem is read-only and we have data de-duplication via the
objects store, we make sure that hardlinks result in an actual shared inode
as visible via the `st_ino` and `st_nlink` fields on the mounted filesystem.
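
As an illustration of the hardlink point, here is a minimal sketch of how a filesystem scan could map source inodes to shared output inodes. `HardlinkTracker` and `intern` are hypothetical names, not taken from the composefs code; the sketch assumes the scanner has already read `st_dev`, `st_ino` and `st_nlink` for each file.

```rust
use std::collections::HashMap;

/// Identity of an inode on the source filesystem: (st_dev, st_ino).
type InodeKey = (u64, u64);

/// Hypothetical helper: hands out one output inode id per source inode.
#[derive(Default)]
struct HardlinkTracker {
    seen: HashMap<InodeKey, usize>,
    next_id: usize,
}

impl HardlinkTracker {
    /// Returns `(id, is_new)`. Two paths sharing (dev, ino) get the same id.
    /// Files with `nlink == 1` cannot have another name, so they are given
    /// a fresh id and are not remembered.
    fn intern(&mut self, dev: u64, ino: u64, nlink: u64) -> (usize, bool) {
        if nlink > 1 {
            if let Some(&id) = self.seen.get(&(dev, ino)) {
                return (id, false);
            }
        }
        let id = self.next_id;
        self.next_id += 1;
        if nlink > 1 {
            self.seen.insert((dev, ino), id);
        }
        (id, true)
    }
}
```

A writer built on this would emit one inode per distinct id and set its link count to the number of paths that resolved to that id.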

We apply these precision restrictions also when creating images by scanning the
filesystem. For example: even if we get more-accurate timestamp information,
we'll truncate it to whole seconds (rounding down).
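
The round-down rule for mtime can be sketched as follows. `mtime_seconds` is a hypothetical helper, not a function from the composefs crate; it assumes the timestamp arrives as a `std::time::SystemTime` and that "round down" means toward negative infinity, which matters only for timestamps before the Unix epoch.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Whole seconds since the Unix epoch, always rounded down.
fn mtime_seconds(t: SystemTime) -> i64 {
    match t.duration_since(UNIX_EPOCH) {
        // At or after the epoch: dropping the sub-second part rounds down.
        Ok(d) => d.as_secs() as i64,
        // Before the epoch: `e.duration()` is the positive distance to the
        // epoch, so any fractional part pushes the result one second lower.
        Err(e) => {
            let d = e.duration();
            let secs = d.as_secs() as i64;
            if d.subsec_nanos() > 0 {
                -(secs + 1)
            } else {
                -secs
            }
        }
    }
}
```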

# Merging directories

This is done according to the OCI spec, with an additional clarification: in
case a directory entry is present in multiple layers, we use the tar metadata
from the most-derived layer to determine the attributes (owner, permissions,
mtime) for the directory.
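
A minimal sketch of the "most-derived layer wins" rule for directory attributes. `DirMeta` and `merge_dir_metadata` are hypothetical names; the sketch assumes layers are supplied base first and most-derived last, and it deliberately ignores whiteouts and non-directory entries.

```rust
use std::collections::BTreeMap;

/// Attributes taken from a directory's tar header (hypothetical type).
#[derive(Clone, Debug, PartialEq)]
struct DirMeta {
    uid: u32,
    gid: u32,
    mode: u32,
    mtime: i64,
}

/// `layers` is ordered base first, most-derived last. Each layer lists the
/// directory entries it contains as (path, metadata) pairs.
fn merge_dir_metadata(layers: &[Vec<(String, DirMeta)>]) -> BTreeMap<String, DirMeta> {
    let mut merged = BTreeMap::new();
    for layer in layers {
        for (path, meta) in layer {
            // A later (more-derived) layer overwrites what earlier ones set.
            merged.insert(path.clone(), meta.clone());
        }
    }
    merged
}
```

If `/etc` appears in both a base layer and a derived layer, the merged map holds the derived layer's owner, permissions and mtime for `/etc`.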

# The root inode

The root inode (/) is a difficult case because it doesn't always appear in the
layer tarballs. We need to make some arbitrary decisions about the metadata.

Here's what we do:

- if any layer tarball contains an entry for '/' then we'd like to use it.
The code for this doesn't exist yet, but it seems reasonable as a principle.
In case the `/` entry were to appear in multiple layers, we'd use the
most-derived layer in which it is present (as per the logic in the previous
section).
- otherwise:
  - we assume that the root directory is owned by root:root and has `a+rx`
    permissions (ie: `0555`). This matches the behaviour of podman. Note in
    particular: podman uses `0555`, not `0755`: the root directory is not
    (nominally) writable by the root user.
  - the mtime of the root directory is taken to be equal to that of the most
    recent file in the entire system, that is: the highest numerical value of
    any mtime on any inode. The rationale is that this is usually a very good
    proxy for "when was the (most-derived) container image created".
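
The root inode rules can be sketched as a single fallback function. `RootMeta` and `root_metadata` are hypothetical names. The `explicit` argument stands for a `/` entry found in a layer (a case the document says is not implemented yet), and falling back to an mtime of 0 for an image with no inodes is an assumption of this sketch, not something the document specifies.

```rust
/// Metadata for the root directory (hypothetical type).
#[derive(Clone, Debug, PartialEq)]
struct RootMeta {
    uid: u32,
    gid: u32,
    mode: u32,
    mtime: i64,
}

/// `explicit` is the `/` entry from the most-derived layer that has one,
/// if any. `all_mtimes` holds the mtime (in seconds) of every inode.
fn root_metadata(explicit: Option<RootMeta>, all_mtimes: &[i64]) -> RootMeta {
    explicit.unwrap_or_else(|| RootMeta {
        uid: 0,      // root
        gid: 0,      // root
        mode: 0o555, // a+rx, matching podman
        // Newest mtime of any inode; 0 for an empty image is an assumption.
        mtime: all_mtimes.iter().copied().max().unwrap_or(0),
    })
}
```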
5 changes: 5 additions & 0 deletions examples/unified/.gitignore
@@ -0,0 +1,5 @@
/cfsctl
/extra/usr/lib/dracut/modules.d/37composefs/composefs-pivot-sysroot
/fix-verity.efi
/image.qcow2
/tmp/
48 changes: 48 additions & 0 deletions examples/unified/Containerfile
@@ -0,0 +1,48 @@
# Need 6.12 kernel from rawhide
FROM fedora:rawhide AS base
COPY extra /
COPY cfsctl /usr/bin
RUN --mount=type=cache,target=/var/cache/libdnf5 <<EOF
set -eux

# we should install kernel-modules here, but can't
# because it'll pull in the entire kernel with it
# it seems to work fine for now....
dnf --setopt keepcache=1 install -y \
composefs \
dosfstools \
policycoreutils-python-utils \
selinux-policy-targeted \
skopeo \
strace \
systemd \
util-linux
systemctl enable systemd-networkd
semanage permissive -a systemd_gpt_generator_t # for volatile-root workaround
passwd -d root
mkdir /sysroot
EOF

FROM base AS kernel
RUN --mount=type=bind,from=base,target=/mnt/base <<EOF
set -eux

mkdir -p /tmp/sysroot/composefs
COMPOSEFS_FSVERITY="$(cfsctl --repo /tmp/sysroot create-image /mnt/base)"

mkdir -p /etc/kernel /etc/dracut.conf.d
echo "composefs=${COMPOSEFS_FSVERITY} rw" > /etc/kernel/cmdline
EOF
RUN --mount=type=cache,target=/var/cache/libdnf5 <<EOF
# systemd-boot-unsigned: ditto
# btrfs-progs: dracut wants to include this in the initramfs
# ukify: dracut doesn't want to take our cmdline args?
dnf --setopt keepcache=1 install -y kernel btrfs-progs systemd-boot-unsigned systemd-ukify
EOF

# This could (better?) be done from cfsctl...
FROM base AS bootable
COPY --from=kernel /boot /composefs-meta/boot
# RUN rm -rf /composefs-meta
# RUN commands touch /run unfortunately
COPY empty /.wh.composefs-meta
35 changes: 35 additions & 0 deletions examples/unified/build
@@ -0,0 +1,35 @@
#!/bin/sh

set -eux

cd "${0%/*}"

cargo build --release

cp ../../target/release/cfsctl .
cp ../../target/release/composefs-pivot-sysroot extra/usr/lib/dracut/modules.d/37composefs/
CFSCTL='./cfsctl --repo tmp/sysroot/composefs'

rm -rf tmp
mkdir -p tmp/sysroot/composefs tmp/sysroot/var

# mkdir tmp/internal-sysroot # for debugging
# podman build -v $(pwd)/tmp/internal-sysroot:/tmp/sysroot:z,U --iidfile=tmp/iid "$@" .
#
podman build --iidfile=tmp/iid "$@" .

IMAGE_ID="$(sed s/sha256:// tmp/iid)"
podman save --format oci-archive -o tmp/final.tar "${IMAGE_ID}"
${CFSCTL} oci pull oci-archive:tmp/final.tar
IMAGE_FSVERITY="$(${CFSCTL} oci create-image "${IMAGE_ID}")"

mkdir -p tmp/efi/loader
echo 'timeout 3' > tmp/efi/loader/loader.conf
mkdir -p tmp/efi/EFI/BOOT tmp/efi/EFI/systemd
cp /usr/lib/systemd/boot/efi/systemd-bootx64.efi tmp/efi/EFI/systemd
cp /usr/lib/systemd/boot/efi/systemd-bootx64.efi tmp/efi/EFI/BOOT/BOOTX64.EFI
${CFSCTL} oci prepare-boot "${IMAGE_ID}" tmp/efi

fakeroot ./make-image
qemu-img convert -f raw tmp/image.raw -O qcow2 image.qcow2
./fix-verity image.qcow2 # https://github.com/tytso/e2fsprogs/issues/201
Empty file added examples/unified/empty
1 change: 1 addition & 0 deletions examples/unified/extra/etc/resolv.conf
@@ -0,0 +1,6 @@
# we want to make sure the virtio disk drivers get included
hostonly=no
Review comment (Collaborator): This type of stuff is also in the fedora-bootc base image.


# we need to force these in via the initramfs because we don't have modules in
# the base image
force_drivers+=" virtio_net vfat "
@@ -0,0 +1,34 @@
# Copyright (C) 2013 Colin Walters <walters@verbum.org>
Review comment (Collaborator): I think this can be dropped, I am not sure I'd consider it "derived enough" honestly.

Reply (Collaborator, Author): ...and it's already been copied from the other two copies of it already kicking around in the examples/ directory...

#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
Review comment (Collaborator): Related to the above: for consistency we should probably use the overall repo license. But I guess this is all a demo that may end up in a separate repo anyways.

# License as published by the Free Software Foundation; either
# version 2 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library. If not, see <https://www.gnu.org/licenses/>.

[Unit]
DefaultDependencies=no
ConditionKernelCommandLine=composefs
ConditionPathExists=/etc/initrd-release
After=sysroot.mount
Requires=sysroot.mount
Before=initrd-root-fs.target
Before=initrd-switch-root.target

OnFailure=emergency.target
OnFailureJobMode=isolate

[Service]
Type=oneshot
ExecStart=/usr/bin/composefs-pivot-sysroot
StandardInput=null
StandardOutput=journal
StandardError=journal+console
RemainAfterExit=yes
@@ -0,0 +1,20 @@
#!/usr/bin/bash

check() {
return 0
}

depends() {
return 0
}

install() {
inst \
"${moddir}/composefs-pivot-sysroot" /bin/composefs-pivot-sysroot
inst \
"${moddir}/composefs-pivot-sysroot.service" \
"${systemdsystemunitdir}/composefs-pivot-sysroot.service"

$SYSTEMCTL -q --root "${initdir}" add-wants \
'initrd-root-fs.target' 'composefs-pivot-sysroot.service'
}
@@ -0,0 +1,2 @@
layout = uki
uki_generator = ukify
@@ -0,0 +1,9 @@
[Match]
Type=ether

[Link]
RequiredForOnline=routable

[Network]
DHCP=yes

@@ -0,0 +1,6 @@
# Make sure we grow the right root filesystem

[Service]
ExecStart=
ExecStart=/usr/lib/systemd/systemd-growfs /sysroot

59 changes: 59 additions & 0 deletions examples/unified/fix-verity
@@ -0,0 +1,59 @@
#!/bin/sh

# workaround for https://github.com/tytso/e2fsprogs/issues/201

set -eux

# We use a custom UKI with an initramfs containing a script that remounts
# /sysroot read-write and enables fs-verity on all of the objects in
# /composefs/objects.
#
# The first time we're run (or if we are modified) we (re-)generate the UKI.
# This is done inside of a container (for independence from the host OS).

image_file="$1"

if [ "$0" -nt fix-verity.efi ]; then
podman run --rm -i fedora > tmp/fix-verity.efi <<'EOF'
set -eux

cat > /tmp/fix-verity.sh <<'EOS'
mount -o remount,rw /sysroot
(
cd /sysroot/composefs/objects
echo >&2 'Enabling fsverity on composefs objects'
for i in */*; do
fsverity enable $i;
done
echo >&2 'done!'
)
umount /sysroot
sync
poweroff -ff
EOS

(
dnf --setopt keepcache=1 install -y \
kernel binutils systemd-boot-unsigned btrfs-progs fsverity-utils
dracut \
--uefi \
--no-hostonly \
--install 'sync fsverity' \
--include /tmp/fix-verity.sh /lib/dracut/hooks/pre-pivot/fix-verity.sh \
--kver "$(rpm -q kernel-core --qf '%{VERSION}-%{RELEASE}.%{ARCH}')" \
--kernel-cmdline="root=PARTLABEL=root-x86-64 console=ttyS0" \
/tmp/fix-verity.efi
) >&2

cat /tmp/fix-verity.efi
EOF
mv tmp/fix-verity.efi fix-verity.efi
fi

qemu-system-x86_64 \
-nographic \
-m 4096 \
-enable-kvm \
-bios /usr/share/edk2/ovmf/OVMF_CODE.fd \
-drive file="${image_file}",if=virtio,media=disk \
-kernel fix-verity.efi
19 changes: 19 additions & 0 deletions examples/unified/make-image
@@ -0,0 +1,19 @@
#!/bin/sh

set -eux

chown -R 0:0 tmp/sysroot
chcon -R system_u:object_r:usr_t:s0 tmp/sysroot/composefs
chcon system_u:object_r:var_t:s0 tmp/sysroot/var

> tmp/image.raw
SYSTEMD_REPART_MKFS_OPTIONS_EXT4='-O verity' \
systemd-repart \
--empty=require \
--size=auto \
--dry-run=no \
--no-pager \
--offline=yes \
--root=tmp \
--definitions=repart.d \
tmp/image.raw
6 changes: 6 additions & 0 deletions examples/unified/repart.d/01-esp.conf
@@ -0,0 +1,6 @@
[Partition]
Type=esp
Format=vfat
CopyFiles=/efi:/
SizeMinBytes=512M
SizeMaxBytes=512M
6 changes: 6 additions & 0 deletions examples/unified/repart.d/02-sysroot.conf
@@ -0,0 +1,6 @@
[Partition]
Type=root
Format=ext4
SizeMinBytes=10G
SizeMaxBytes=10G
CopyFiles=/sysroot:/
12 changes: 12 additions & 0 deletions examples/unified/run
@@ -0,0 +1,12 @@
#!/bin/sh

set -eux

cd "${0%/*}"

qemu-system-x86_64 \
-m 4096 \
-enable-kvm \
-bios /usr/share/edk2/ovmf/OVMF_CODE.fd \
-drive file=image.qcow2,if=virtio,cache=unsafe \
-nic user,model=virtio-net-pci
17 changes: 16 additions & 1 deletion src/bin/cfsctl.rs
@@ -73,7 +73,9 @@ enum Command {
/// Perform garbage collection
GC,
/// Imports a composefs image (unsafe!)
ImportImage { reference: String },
ImportImage {
reference: String,
},
/// Commands for dealing with OCI layers
Oci {
#[clap(subcommand)]
@@ -86,6 +88,12 @@ enum Command {
/// the mountpoint
mountpoint: String,
},
CreateImage {
path: PathBuf,
},
CreateDumpfile {
path: PathBuf,
},
}

fn main() -> Result<()> {
@@ -165,6 +173,13 @@ fn main() -> Result<()> {
oci::prepare_boot(&repo, name, None, &output)?;
}
},
Command::CreateImage { ref path } => {
let image_id = composefs::fs::create_image(path, Some(&repo))?;
println!("{}", hex::encode(image_id));
}
Command::CreateDumpfile { ref path } => {
composefs::fs::create_dumpfile(path)?;
}
Command::Mount { name, mountpoint } => {
repo.mount(&name, &mountpoint)?;
}