osbuild: switch compression to off; workaround image corruption #3730

Merged
merged 7 commits into from
Feb 16, 2024


dustymabe

@dustymabe dustymabe commented Feb 15, 2024

This obsoletes #3729 but keeps many of the cleanups done there to make unmounting cleaner. The biggest change here is switching the cache.qcow2 to ext4 (see #3728 (comment)) and adding a sanity check in OSBuild to catch cases where the created image is inconsistent.

This also now updates OSBuild to v108 and drops many patches as a result.

cgwalters and others added 7 commits February 13, 2024 11:33
Raw format is fine to use on systems that have reflinks for example.
I've been investigating why a seemingly innocuous change
(changing compression on OSBuild generated qemu qcow2) would
cause disk images to not boot [1]. I think I have found the issue.

I first wanted to be 100% sure that the files were written
out over the virtiofs mount before the VM was shut down, so I decided
to add a `umount $workdir` to the process. But this failed with
a `umount: /srv/: target is busy.` error.

When the supermin VM gets run we `cd "${workdir}"` at the end of
supermin-init-prelude.sh. This has the effect of causing all
spawned processes (including PID1/init) to have a cwd of /srv/.

```
bash-5.2# lsof /srv
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF     NODE NAME
init        1 root  cwd    DIR   0,26     4096 10485829 /srv
kthreadd    2 root  cwd    DIR   0,26     4096 10485829 /srv
pool_work   3 root  cwd    DIR   0,26     4096 10485829 /srv
kworker/R   4 root  cwd    DIR   0,26     4096 10485829 /srv
...
...
```

This means it's unlikely that the virtiofs mount ever gets cleanly
unmounted. Let's rework things here so that the actual work gets spawned
in a subshell, which prevents `init` from having a cwd on the virtiofs mount.

We also add a `umount` of the cache qcow2 (if it exists) and of the virtiofs
mount to improve our chances of a clean unmount.

[1] coreos#3728
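
The subshell idea described above can be illustrated with a minimal sketch. This is not the actual change to supermin-init-prelude.sh: the `do_work` function is hypothetical, and a temporary directory stands in for the virtiofs mount. It only shows that a `cd` performed inside a subshell leaves the cwd of the parent shell (and so of anything it spawns later) untouched.

```shell
#!/bin/bash
# Hypothetical stand-in for the virtiofs mount point (/srv in the VM).
workdir="$(mktemp -d)"
start_dir="$PWD"

# The function body is wrapped in ( ... ), so it always runs in a subshell.
# The cd below only changes the cwd of that subshell, never of the caller.
do_work() (
    cd "$1" || exit 1
    # Record where the work actually ran.
    pwd > "$1/cwd.txt"
)

do_work "$workdir"

worker_cwd="$(cat "$workdir/cwd.txt")"
end_dir="$PWD"

echo "worker ran in:   $worker_cwd"
echo "parent still in: $end_dir"

# Nothing in the parent holds a cwd inside the directory any more, so it can
# be cleaned up (or, for a real mount, unmounted) without a "busy" error.
rm -rf "$workdir"
```

In the real VM the same pattern would presumably be followed by the `umount` calls mentioned above, once no process holds a cwd on the mount.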
Just in case this is what is causing issues with file consistency
when copying out of the supermin VM.
And drop all patches that have now been upstreamed. The only remaining
patches are one to enable s390x builds to work while we figure out [1]
and another that adds a log statement when cache eviction happens, which
I plan to upstream at some point.

[1] coreos/fedora-coreos-tracker#1667
This will compare the image that was just created to see if it
has any problems.
We think there might be some XFS reflink issues when we run
the OSBuild `org.osbuild.qemu` stage with `compression: false`. See
coreos#3728 (comment)
We previously did this in a different way (2a8d1e6) but then
had to revert it (39fdd61) because it caused images to not boot [1].
The root cause appears to have been the virtiofs mount not
being unmounted cleanly from the supermin VM. That is now
fixed, so let's switch back to not compressing, since we rely on
our outer compression [2].

[1] coreos#3728
[2] coreos/fedora-coreos-tracker#1653 (comment)

@ravanelli ravanelli left a comment


lgtm

@dustymabe dustymabe merged commit a78a123 into coreos:main Feb 16, 2024
5 checks passed
@dustymabe dustymabe deleted the dusty-osbuild branch February 16, 2024 14:42