Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmdlib: don't use cache qcow for composes; use virtiofs #3720

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

dustymabe
Copy link
Member

Now that our OSBuild workflow is using the cache we saw at least one case where the pipeline was running out of space. Since we had a previous proposal [1] to just drop the cahce altogether anyway let's try to at least remove it from the runcompose functions to eliminate the use of it there anyway.

[1] #3615

If we ever want to compose over virtiofsd we'll need xattr support.
Now that our OSBuild workflow is using the cache we saw at least one
case where the pipeline was running out of space when composing the
extensions. Since we had a previous proposal [1] to just drop the
cache altogether anyway let's try to at least remove it from the
runcompose functions to eliminate the use of it there anyway.

[1] coreos#3615
@dustymabe
Copy link
Member Author

ok this worked (or rather didn't work because it was an invalid test) locally for me initially because I wasn't running in a VM (i.e. I was executing the privileged workflow). Setting FORCE_UNPRIVILEGED=1 I'm now testing this properly.

I added a commit to add xattr support for virtiofsd. I get farther now but then hit another error:

  zram-generator-1.1.2-8.fc39.x86_64 (fedora)
  zstd-1.5.5-4.fc39.x86_64 (fedora)
Input state hash: 03430d7c33a90d5213dbacd137402f272608196b50ff4172d80fb4313600ae84
error: cannot open Packages database in /proc/self/fd/25/usr/share/rpm
Skipping file /usr/bin/systemd-firstboot from checkout
Skipping file /usr/lib/systemd/system/systemd-firstboot.service from checkout
Skipping file /usr/lib/systemd/system/sysinit.target.wants/systemd-firstboot.service from checkout
Skipping file /usr/lib/systemd/system-generators/systemd-gpt-auto-generator from checkout
Skipping file /usr/etc/grub.d/08_fallback_counting from checkout
Skipping file /usr/etc/grub.d/10_reset_boot_success from checkout
Skipping file /usr/etc/grub.d/12_menu_auto_hide from checkout
Skipping file /usr/lib/systemd/ from checkout
Checking out packages...done
Checking out ostree layers...done
Running pre scripts...20 done
Running post scripts...done
error: While applying overrides for pkg shadow-utils: fchownat(usr/bin/chage): Operation not permitted
failed to execute cmd-build: exit status 1

@dustymabe
Copy link
Member Author

It failed the same way in CI here.

Copy link

openshift-ci bot commented Feb 6, 2024

@dustymabe: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/rhcos 100198b link true /test rhcos

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

dustymabe added a commit to dustymabe/coreos-assembler that referenced this pull request Feb 7, 2024
Since we are now using the cache qcow2 for OSBuild we need it to
be a little large to handle those duties as well as the tree compose
ones. I'd like to drop the cache, but hit some trouble there; see
coreos#3720.
dustymabe added a commit that referenced this pull request Feb 7, 2024
Since we are now using the cache qcow2 for OSBuild we need it to
be a little large to handle those duties as well as the tree compose
ones. I'd like to drop the cache, but hit some trouble there; see
#3720.
@openshift-merge-robot
Copy link

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jlebon
Copy link
Member

jlebon commented Feb 13, 2024

I think the issue there is that the actual compose of the rootfs happens over virtiofs also because rpm-ostree wants to colocate it with the cache repo to get hardlinks. One thing we could do is put the virtiofs mount r/w in cosa fetch mode, but in cosa build mode, mount it read-only. Then rpm-ostree could detect that and put the workdir in e.g. /var/tmp (and lose hardlinks, so it'd be slower but meh...).

@dustymabe
Copy link
Member Author

also because rpm-ostree wants to colocate it with the cache repo to get hardlinks

I'm not sure I understand this comment. Here I am modifying it to use runvm rather than runvm_with_cache, which means everything is happening over virtiofs IIUC. so the "cache repo" is also on virtiofs, right?

@jlebon
Copy link
Member

jlebon commented Feb 13, 2024

also because rpm-ostree wants to colocate it with the cache repo to get hardlinks

I'm not sure I understand this comment. Here I am modifying it to use runvm rather than runvm_with_cache, which means everything is happening over virtiofs IIUC. so the "cache repo" is also on virtiofs, right?

Right. The compose just happens wherever the pkgcache repo is. Before (status quo), that was on the cache qcow2. Now, that's over virtiofs.

@jlebon
Copy link
Member

jlebon commented Feb 13, 2024

So rpm-ostree does have support for e.g. applying filecaps at commit time, but it currently keys off of uid != 0 to know this, except that in the supermin VM we are root. And we do need to be root to e.g. do privileged stuff like mount namespaces. But it might work to just add a flag to force the commit modifier path even if uid == 0. E.g. we could try testing with

diff --git a/src/libpriv/rpmostree-core.cxx b/src/libpriv/rpmostree-core.cxx
index 9cc872b2..efb77107 100644
--- a/src/libpriv/rpmostree-core.cxx
+++ b/src/libpriv/rpmostree-core.cxx
@@ -3561,7 +3561,7 @@ apply_rpmfi_overrides (RpmOstreeContext *self, int tmprootfs_dfd, DnfPackage *pk
    *
    * TODO: For non-root `--unified-core` we need to do it as a commit modifier.
    */
-  if (getuid () != 0)
+  if (g_getenv ("RPMOSTREE_SKIP_RPMFI_OVERRIDES") || getuid () != 0)
     return TRUE; /* 🔚 Early return */
 
   g_auto (rpmfi) fi = NULL;

But there may be other things that break.

@jlebon
Copy link
Member

jlebon commented May 9, 2024

I think this would be good to pick up again if it's not a lot of work to get working. But long-term, I think it'll get obsoleted by the move to deriving from a shared base image instead.

@dustymabe
Copy link
Member Author

dustymabe commented Oct 25, 2024

I looked at this a bit earlier this week and I'll update here with some notes. In order to make it so that you can create files that appear to the VM as UID/GID 0 we need to do some UID/GID mapping (i.e. inside the VM you can't chown 0:0 cache/file.txt without this). So you can use unshare to map those for you. You can also use --uid-map from virtiofsd but I never got that working.

diff --git a/mantle/platform/qemu.go b/mantle/platform/qemu.go
index e0070acc3..ead654005 100644
--- a/mantle/platform/qemu.go
+++ b/mantle/platform/qemu.go
@@ -1657,7 +1657,10 @@ func (builder *QemuBuilder) VirtioJournal(config *conf.Conf, queryArguments stri
 
 // createVirtiofsCmd returns a new command instance configured to launch virtiofsd.
 func createVirtiofsCmd(directory, socketPath string) exec.Cmd {
-       args := []string{"--sandbox", "none", "--socket-path", socketPath, "--shared-dir", "."}
+       args := []string{"--mount", "--keep-caps", "--map-users=auto",
+               "--map-groups=auto", "--map-root-user",
+               "/usr/libexec/virtiofsd", "--sandbox", "none",
+               "--socket-path", socketPath, "--shared-dir", "."}
        // Work around https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/197
        if os.Getuid() == 0 {
                args = append(args, "--modcaps=-mknod:-setfcap")
@@ -1665,7 +1668,7 @@ func createVirtiofsCmd(directory, socketPath string) exec.Cmd {
        // We don't need seccomp filtering; we trust our workloads. This incidentally
        // works around issues like https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/200.
        args = append(args, "--seccomp=none")
-       cmd := exec.Command("/usr/libexec/virtiofsd", args...)
+       cmd := exec.Command("/usr/bin/unshare", args...)
        // This sets things up so that the `.` we passed in the arguments is the target directory
        cmd.Dir = directory
        // Quiet the daemon by default

Of course unshare won't work inside an openshift pod so that's a bit of a non-starter :(

To handle xattrs/SELinux we can enable them and map them to a user.virtiofs.* xattr on the host, which solves the problem of labels getting blocked because the host SELinux doesn't know about those types) and this works 99% for what we need:

diff --git a/mantle/platform/qemu.go b/mantle/platform/qemu.go
index e0070acc3..43bcba6ab 100644
--- a/mantle/platform/qemu.go
+++ b/mantle/platform/qemu.go
@@ -1665,6 +1665,12 @@ func createVirtiofsCmd(directory, socketPath string) exec.Cmd {
        // We don't need seccomp filtering; we trust our workloads. This incidentally
        // works around issues like https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/200.
        args = append(args, "--seccomp=none")
+       // OSTree composes want xattrs for SELinux. We need to map all
+       // xattrs inside the guest to `user.virtiofs.*` xattrs on the host.
+       // https://gitlab.com/virtio-fs/virtiofsd/-/blob/main/doc/xattr-mapping.md
+       args = append(args, "--xattr")
+       args = append(args, "--xattrmap=:map::user.virtiofs.:")
+       args = append(args, "--security-label")
        cmd := exec.Command("/usr/libexec/virtiofsd", args...)
        // This sets things up so that the `.` we passed in the arguments is the target directory
        cmd.Dir = directory

Where it doesn't work is (apparently) user.* xattrs aren't allowed to be created on symlinks. So it works for everything but symlinks, which we do care about.

Input state hash: 367f03eefb5f23d6046b246a84db267bd794dadfa991dcf4c113deb555252f8d
error: Installing packages: relabeling: Relabeling aardvark-dns-2:1.12.2-2.fc42.x86_64: Copy checkout of d8f741bf8ae477d9ce975343c1f85debbacf6be34f67bb6e38eaf8aa4498b4bb to 4588f6694ab43d5c991e30b31be6a23b72b2cc: symlinkat: Operation not permitted

There was an effort in the past years to lift this limitation on the user.* xattrs for symlinks but it hasn't been successful so far:

So I think we're at a bit of a dead-end unfortunately unless we gain CAP_SYS_ADMIN for virtiofsd and can write to trusted.* xattrs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants