cmdlib: don't use cache qcow for composes; use virtiofs #3720

Open
wants to merge 2 commits into
base: main

dustymabe
Member

Now that our OSBuild workflow is using the cache, we saw at least one case where the pipeline ran out of space. Since a previous proposal [1] already suggested dropping the cache altogether, let's at least remove it from the runcompose functions so it is no longer used there.

[1] #3615

If we ever want to compose over virtiofsd we'll need xattr support.
Now that our OSBuild workflow is using the cache we saw at least one
case where the pipeline was running out of space when composing the
extensions. Since we had a previous proposal [1] to just drop the
cache altogether anyway let's try to at least remove it from the
runcompose functions to eliminate the use of it there anyway.

[1] coreos#3615
@dustymabe
Member Author

OK, this initially appeared to work for me locally, but that was an invalid test: I wasn't running in a VM (i.e. I was executing the privileged workflow). With FORCE_UNPRIVILEGED=1 set, I'm now testing this properly.

I added a commit to add xattr support for virtiofsd. I get farther now but then hit another error:

  zram-generator-1.1.2-8.fc39.x86_64 (fedora)
  zstd-1.5.5-4.fc39.x86_64 (fedora)
Input state hash: 03430d7c33a90d5213dbacd137402f272608196b50ff4172d80fb4313600ae84
error: cannot open Packages database in /proc/self/fd/25/usr/share/rpm
Skipping file /usr/bin/systemd-firstboot from checkout
Skipping file /usr/lib/systemd/system/systemd-firstboot.service from checkout
Skipping file /usr/lib/systemd/system/sysinit.target.wants/systemd-firstboot.service from checkout
Skipping file /usr/lib/systemd/system-generators/systemd-gpt-auto-generator from checkout
Skipping file /usr/etc/grub.d/08_fallback_counting from checkout
Skipping file /usr/etc/grub.d/10_reset_boot_success from checkout
Skipping file /usr/etc/grub.d/12_menu_auto_hide from checkout
Skipping file /usr/lib/systemd/ from checkout
Checking out packages...done
Checking out ostree layers...done
Running pre scripts...20 done
Running post scripts...done
error: While applying overrides for pkg shadow-utils: fchownat(usr/bin/chage): Operation not permitted
failed to execute cmd-build: exit status 1

@dustymabe
Member Author

It failed the same way in CI here.


openshift-ci bot commented Feb 6, 2024

@dustymabe: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/rhcos | 100198b | link | true | /test rhcos |


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

dustymabe added a commit to dustymabe/coreos-assembler that referenced this pull request Feb 7, 2024
Since we are now using the cache qcow2 for OSBuild we need it to
be a little larger to handle those duties as well as the tree compose
ones. I'd like to drop the cache, but hit some trouble there; see
coreos#3720.
dustymabe added a commit that referenced this pull request Feb 7, 2024
Since we are now using the cache qcow2 for OSBuild we need it to
be a little larger to handle those duties as well as the tree compose
ones. I'd like to drop the cache, but hit some trouble there; see
#3720.
@openshift-merge-robot

PR needs rebase.


@jlebon
Member

jlebon commented Feb 13, 2024

I think the issue there is that the actual compose of the rootfs happens over virtiofs also because rpm-ostree wants to colocate it with the cache repo to get hardlinks. One thing we could do is put the virtiofs mount r/w in cosa fetch mode, but in cosa build mode, mount it read-only. Then rpm-ostree could detect that and put the workdir in e.g. /var/tmp (and lose hardlinks, so it'd be slower but meh...).
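To make the read-only idea above concrete, here is a minimal sketch of how a process could notice that the cache repo sits on a read-only mount and fall back to a separate workdir. This is only an illustration under stated assumptions: `is_readonly_mount` and `choose_workdir` are hypothetical helper names, not rpm-ostree or coreos-assembler functions, and whether rpm-ostree would detect the condition this way is not established in this thread. The only real API used is POSIX `statvfs()` with its `ST_RDONLY` flag.

```cpp
#include <sys/statvfs.h>

#include <cerrno>
#include <string>
#include <system_error>

// Hypothetical helper: reports whether `path` lives on a read-only mount.
// Throws std::system_error if statvfs() fails (e.g. the path does not exist).
bool is_readonly_mount(const std::string &path) {
  struct statvfs st;
  if (statvfs(path.c_str(), &st) != 0)
    throw std::system_error(errno, std::generic_category(), "statvfs " + path);
  return (st.f_flag & ST_RDONLY) != 0;
}

// Hypothetical policy: keep the workdir next to the cache repo when it is
// writable (so hardlinks stay possible), otherwise fall back to another
// directory such as /var/tmp (no hardlinks, so checkouts are slower).
std::string choose_workdir(const std::string &cache_repo, bool cache_readonly,
                           const std::string &fallback = "/var/tmp") {
  return cache_readonly ? fallback : cache_repo;
}
```

A caller would combine the two, e.g. `choose_workdir(repo, is_readonly_mount(repo))`. Hardlinks cannot cross filesystems, which is why the fallback path loses them.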

@dustymabe
Member Author

> also because rpm-ostree wants to colocate it with the cache repo to get hardlinks

I'm not sure I understand this comment. Here I am modifying it to use runvm rather than runvm_with_cache, which means everything is happening over virtiofs, IIUC. So the "cache repo" is also on virtiofs, right?

@jlebon
Member

jlebon commented Feb 13, 2024

> > also because rpm-ostree wants to colocate it with the cache repo to get hardlinks
>
> I'm not sure I understand this comment. Here I am modifying it to use runvm rather than runvm_with_cache, which means everything is happening over virtiofs, IIUC. So the "cache repo" is also on virtiofs, right?

Right. The compose just happens wherever the pkgcache repo is. Before (status quo), that was on the cache qcow2. Now, that's over virtiofs.

@jlebon
Member

jlebon commented Feb 13, 2024

So rpm-ostree does have support for e.g. applying filecaps at commit time, but it currently keys off of uid != 0 to know this, except that in the supermin VM we are root. And we do need to be root to e.g. do privileged stuff like mount namespaces. But it might work to just add a flag to force the commit modifier path even if uid == 0. E.g. we could try testing with

diff --git a/src/libpriv/rpmostree-core.cxx b/src/libpriv/rpmostree-core.cxx
index 9cc872b2..efb77107 100644
--- a/src/libpriv/rpmostree-core.cxx
+++ b/src/libpriv/rpmostree-core.cxx
@@ -3561,7 +3561,7 @@ apply_rpmfi_overrides (RpmOstreeContext *self, int tmprootfs_dfd, DnfPackage *pk
    *
    * TODO: For non-root `--unified-core` we need to do it as a commit modifier.
    */
-  if (getuid () != 0)
+  if (g_getenv ("RPMOSTREE_SKIP_RPMFI_OVERRIDES") || getuid () != 0)
     return TRUE; /* 🔚 Early return */
 
   g_auto (rpmfi) fi = NULL;

But there may be other things that break.
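The condition in the diff above can be isolated as a small predicate, which makes its truth table easy to check. This is a sketch only: `skip_rpmfi_overrides` and `skip_rpmfi_overrides_now` are hypothetical names, `RPMOSTREE_SKIP_RPMFI_OVERRIDES` is the variable proposed in the diff (not a confirmed rpm-ostree setting), and `std::getenv` stands in for `g_getenv` to avoid a GLib dependency.

```cpp
#include <sys/types.h>
#include <unistd.h>

#include <cstdlib>

// Pure form of the proposed check: skip the fchownat()-style overrides when
// the variable is set to anything (even an empty string), or when the caller
// is not root. Hypothetical name, for illustration only.
bool skip_rpmfi_overrides(uid_t uid, const char *env_value) {
  return env_value != nullptr || uid != 0;
}

// Same check against the real process state. The variable name is taken from
// the diff above and is an assumption, not an existing rpm-ostree knob.
bool skip_rpmfi_overrides_now() {
  return skip_rpmfi_overrides(getuid(),
                              std::getenv("RPMOSTREE_SKIP_RPMFI_OVERRIDES"));
}
```

With this shape, root inside the supermin VM (uid 0) would only take the early return when the variable is exported, while non-root callers keep today's behaviour.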

@jlebon
Member

jlebon commented May 9, 2024

I think this would be good to pick up again if it's not a lot of work to get working. But long-term, I think it'll get obsoleted by the move to deriving from a shared base image instead.
