sysroot: Rework /var handling to act like Docker `VOLUME /var` #3166

cgwalters · 2024-02-09T21:21:00Z

We've long struggled with semantics for `/var`. Our stance of "/var should start out empty and be managed by the OS" is a strict one that pushes things closer to the original systemd upstream ideal of "OS state is in /usr".

However...well, a few things. First, we had some legacy bits here which were always populating the deployment `/var`. I don't think we need that if systemd is in use, so detect if the tree has `usr/lib/tmpfiles.d`, and don't create that stuff at `ostree admin stateroot-init` time if so.
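As a rough illustration of that detection step (not the actual ostree C code), here is a minimal Python sketch. The function names and the `LEGACY_VAR_DIRS` list are hypothetical placeholders; the only detail taken from the description above is the check for `usr/lib/tmpfiles.d` in the tree.

```python
from pathlib import Path

# Hypothetical placeholder list; NOT the exact set of entries ostree created.
LEGACY_VAR_DIRS = ("tmp", "lib", "log")


def tree_has_tmpfiles(tree_root: Path) -> bool:
    """Return True if the tree ships a tmpfiles.d directory under usr/lib."""
    return (tree_root / "usr/lib/tmpfiles.d").is_dir()


def init_stateroot_var(stateroot_var: Path, tree_root: Path) -> list:
    """Create the stateroot var; pre-populate it only for trees without tmpfiles.d.

    Returns the names of the legacy directories that were created.
    """
    stateroot_var.mkdir(parents=True, exist_ok=True)
    if tree_has_tmpfiles(tree_root):
        # systemd-tmpfiles is assumed to handle /var at boot: leave it empty.
        return []
    created = []
    for name in LEGACY_VAR_DIRS:
        (stateroot_var / name).mkdir(exist_ok=True)
        created.append(name)
    return created
```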

Building on that then, we have the stateroot var starting out actually empty.

When we do a deployment, if the stateroot `var` is empty, make a copy (reflink if possible, of course) of the commit's `/var` into it.
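To make the "copy only if empty" rule concrete, here is a small Python sketch of the semantics. It is an illustration under assumptions, not ostree's implementation: `seed_var_if_empty` is a hypothetical name, and the plain `shutil.copytree` used here does not attempt reflinks or preserve ownership, xattrs, or SELinux labels.

```python
import shutil
from pathlib import Path


def seed_var_if_empty(commit_var: Path, stateroot_var: Path) -> bool:
    """Copy the commit's /var into the stateroot var only when the latter is empty.

    Returns True if content was copied, False if the stateroot var was left alone.
    """
    stateroot_var.mkdir(parents=True, exist_ok=True)
    if any(stateroot_var.iterdir()):
        # Existing state wins, as with a Docker volume that already has data.
        return False
    if not commit_var.is_dir():
        # Nothing to seed from.
        return False
    # symlinks=True copies symlinks as symlinks instead of following them.
    shutil.copytree(commit_var, stateroot_var, symlinks=True, dirs_exist_ok=True)
    return True
```

Calling the function a second time against the same stateroot is a no-op, which is the behavior that later comments in this thread describe for updates.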

This matches the semantics that Docker created with volumes, and this is sufficiently simple and easy to explain that I think it's closer to the right thing to do.

Crucially...it's just really handy to have some pre-existing directories in /var in container images, because Docker (and podman/kube/etc) don't run systemd and hence don't run tmpfiles.d on startup.

For example, I really hit on the fact that we need `/var/tmp` in our container images by default.

So there's still some overlap here with e.g. /usr/lib/tmpfiles.d/var.conf as shipped by systemd, but that's fine - they don't actually conflict per se.

openshift-ci · 2024-02-09T21:21:05Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

cgwalters · 2024-02-09T22:15:06Z

/test all

This is intended to pair with ostreedev/ostree#3166. If we detect a new enough ostree version, then by default don't remap content in `/var`, assuming that ostree itself will handle it. In order to unit test this (without depending on the ostree version that happens to be on the host), add an API to the importer which allows overriding the version.
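The version gate with a test override could look roughly like the following Python sketch. The real importer lives in ostree-rs-ext and is written in Rust; every name here is hypothetical. The `2024.3` threshold is taken from the title of the linked ostree-rs-ext change ("If ostree >= 2024.3, retain content in /var").

```python
from typing import Optional, Tuple

# Threshold from the linked ostree-rs-ext change title; the names are hypothetical.
MIN_OSTREE_FOR_VAR = (2024, 3)


def parse_version(version: str) -> Tuple[int, int]:
    """Parse a 'YEAR.RELEASE' version string such as '2024.3'."""
    parts = version.strip().split(".")
    return int(parts[0]), int(parts[1])


def should_remap_var(host_version: str, override: Optional[str] = None) -> bool:
    """Return True if the importer should still remap /var content itself.

    `override` stands in for the "override the version" API so that tests
    do not depend on whatever ostree version the host has installed.
    """
    effective = override if override is not None else host_version
    return parse_version(effective) < MIN_OSTREE_FOR_VAR
```

A unit test can then pass `override="2024.3"` or `override="2023.8"` to exercise both branches regardless of the host.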

cgwalters · 2024-02-09T22:32:54Z

xref ostreedev/ostree-rs-ext#602

cgwalters · 2024-02-09T23:53:17Z

OK yep, I've tested this more "end-to-end" in combination with ostreedev/ostree-rs-ext#602 and things work well.

jmarrero

lgtm

cgwalters · 2024-02-11T19:53:39Z

One thing I was thinking about here is how this interacts with people who want to do a separate /var mount. It still makes sense to support combining them.

Looking at the current anaconda logic (assuming tasks aren't being run in parallel), it looks like the deploy task combines the stateroot-init with a deployment, and that's when we copy the container `/var` to the stateroot `var`.

And at this point, there's no mount point there. It's only after that that the mount point task runs - this is basically reimplementing the logic from ostree-prepare-root.

So in order to properly get the commit/container-image /var content on a separate mountpoint, it looks like we need to do one of two things:

Move data out of stateroot `var`

Possibly easiest: When we init the mountpoint in the second step, move the data that `ostree admin deploy` wrote to the stateroot `/var` onto the final mount.

Split up phases more

Probably better: Split up the stateroot-init phase from the deploy, so the flow goes like this:

1. stateroot-init
2. mount /var from mount point if provided
3. deploy (which will write to the mount point)
4. bind mount stateroot var to target root
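The ordering problem can be shown with a toy in-memory model in Python. This is only a sketch of the reasoning above: `VarModel` is a hypothetical class, the sets stand in for directory contents, and the final bind-mount step is not modeled.

```python
class VarModel:
    """Toy model of a stateroot var that may be shadowed by a separate mount."""

    def __init__(self):
        self.underlying = set()  # entries written directly into the stateroot var
        self.mounted = None      # entries on a separate /var filesystem, once mounted

    def visible(self):
        """Return whichever directory a writer would currently see as var."""
        return self.underlying if self.mounted is None else self.mounted

    def mount_var(self):
        # Assumption: a freshly created separate /var filesystem starts empty.
        self.mounted = set()

    def deploy(self, commit_var):
        # Seed-if-empty semantics: only copy when the visible var has no content.
        target = self.visible()
        if not target:
            target.update(commit_var)


# Current order: deploy first, mount second. The seed lands under the mount.
current = VarModel()
current.deploy({"tmp", "lib"})
current.mount_var()

# Proposed order: mount first, deploy second. The seed lands on the mount.
proposed = VarModel()
proposed.mount_var()
proposed.deploy({"tmp", "lib"})
```

In the first case `current.visible()` is empty because the seeded content is hidden beneath the mount; in the second, `proposed.visible()` contains the seeded entries.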

travier · 2024-02-12T16:28:47Z

So if I understand this correctly, this behaves like the factory option but with the content kept in /var?

At installation time, it will copy /var from the container to the stateroot.

What happens on updates?

If we're moving away from the factory option, shouldn't we completely remove all traces of it to remove confusion?

cgwalters · 2024-02-12T16:33:34Z

> What happens on updates?

The default is that ostree will see that `/var` is non-empty, and not do anything with the `/var` content from the new commit/container.

However, note this aligns with "factory reset" semantics: one can do a flow of `rm -rf /var/*` and reboot (in practice this would likely need to be done from the initramfs or at least a special systemd target that e.g. kills journald and other processes that might be writing to `/var`) and get the updated content.

cgwalters · 2024-02-12T16:37:29Z

> If we're moving away from the factory option, shouldn't we completely remove all traces of it to remove confusion?

This is tricky. There are two problems being fixed here. The first is that I discovered belatedly that C+ isn't shipped in the systemd in c9s, so the /var handling doesn't work at all there today.

The second is that C+ has more complex semantics, and I think it's ultimately much simpler to just be able to talk about "this works like VOLUME /var".

In theory something else could have started to rely on ostree including the tmpfiles.d rule for /usr/share/factory/var here too.

But yes...perhaps we should just remove it. Yeah, I will do a PR.

jlebon · 2024-03-25T16:45:21Z

BTW, one thing I realized this doesn't handle is that anyone doing /var on a separate filesystem will not get the seed content behaviour. ISTM that instead of deployment time, this should be done at first boot time, `After=var.mount`, `Before=local-fs.target` or so?

jlebon · 2024-03-25T16:48:57Z

> BTW, one thing I realized this doesn't handle is that anyone doing /var on a separate filesystem will not get the seed content behaviour. ISTM that instead of deployment time, this should be done at first boot time, `After=var.mount`, `Before=local-fs.target` or so?

OK, split that into #3222.

openshift-ci bot added the do-not-merge/work-in-progress label Feb 9, 2024

cgwalters mentioned this pull request Feb 9, 2024

Multiple /var bugs ostreedev/ostree-rs-ext#598

Closed

cgwalters force-pushed the var-again branch 2 times, most recently from 428b823 to ab06743 Compare February 9, 2024 21:28

cgwalters mentioned this pull request Feb 9, 2024

store: If ostree >= 2024.3, retain content in /var ostreedev/ostree-rs-ext#602

Merged

cgwalters force-pushed the var-again branch from ab06743 to f81b9fa Compare February 9, 2024 22:46

cgwalters marked this pull request as ready for review February 9, 2024 23:52

openshift-ci bot removed the do-not-merge/work-in-progress label Feb 9, 2024

cgwalters enabled auto-merge February 9, 2024 23:53

jmarrero approved these changes Feb 10, 2024

cgwalters merged commit cb3c42e into ostreedev:main Feb 10, 2024
25 checks passed

This was referenced Feb 12, 2024

install to-filesystem: support separate /var for install containers/bootc#336

Open

Fix for /var content (ostreedev PR #569) doesn't seem to be in the latest centos-bootc-dev base image CentOS/centos-bootc#140

Closed

jlebon mentioned this pull request Mar 25, 2024

Seed /var behaviour and separate /var filesystems #3222

Open
