40ignition-ostree: add autosave-xfs transposefs unit #2320

jlebon · 2023-03-22T21:00:34Z

Add a new transposefs unit: autosave-xfs. This unit runs after
ignition-disks and ignition-ostree-growfs, but before the restore
transposefs unit.

If the XFS root was grown, it checks if the allocation group count
(agcount) is within a reasonable amount (128 is chosen here). If
it isn't, it saves the rootfs and reformats the filesystem. The
restore unit will then restore it as usual. In the case of in-place
reprovisioning like LUKS (i.e. where the partition table isn't modified
by the Ignition config), the rootfs is still saved only once.

Ideally, instead of adding a new transposefs unit, we would make it
part of the initial save unit. But at that point, there's no way to
tell whether we should autosave without gazing even more deeply into the
Ignition config. We also don't want to unconditionally save the rootfs
when we may not need it.

Closes: coreos/fedora-coreos-tracker#1183

jlebon · 2023-03-22T21:01:58Z

overlay.d/05core/usr/lib/dracut/modules.d/40ignition-ostree/ignition-ostree-transposefs.sh

+    # Semi-arbitrarily chosen: this is roughly ~64G currently (based on initial
+    # ag sizing at build time) which seems like a good rootfs size at which to
+    # discriminate between "throwaway/short-lived systems" and "long-running
+    # workload systems". It's not like XFS performance is way worse at 128.
+    if [ "$agcount" -lt 128 ]; then


@sandeen, ended up going with 128 as a threshold for this.
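
For readers following along, here is a rough sketch of the kind of check under discussion. It pulls `agcount` out of a line shaped like the first line of `xfs_info` output and compares it with the threshold. The sample line and its values, the `sed` expression, and the variable names are assumptions for illustration; the actual parsing in `ignition-ostree-transposefs.sh` is not shown in this thread and may differ.

```shell
#!/bin/bash
# Illustrative only: sample text in the shape of xfs_info's first line.
# The device name and the numbers are made up.
sample='meta-data=/dev/vda4   isize=512   agcount=160, agsize=655360 blks'

# Extract the integer following "agcount=" (hypothetical parsing approach).
agcount=$(printf '%s\n' "${sample}" | sed -n 's/.*agcount=\([0-9][0-9]*\).*/\1/p')

# Threshold taken from the hunk above.
threshold=128
if [ "${agcount}" -lt "${threshold}" ]; then
    decision="leave rootfs as-is"
else
    decision="autosave and reformat"
fi
echo "agcount=${agcount} threshold=${threshold}: ${decision}"
```

With the made-up sample, the count is at or above the threshold, so the sketch takes the autosave branch.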

cgwalters

Gave this a read, looks good.

cgwalters · 2023-03-24T20:21:56Z

overlay.d/05core/usr/lib/dracut/modules.d/40ignition-ostree/module-setup.sh

@@ -28,6 +28,8 @@ install() {
        systemd-sysusers \
        systemd-tmpfiles \
        sort \
+        xfs_info \
+        xfs_spaceman \


🧑‍🚀

jlebon · 2023-04-05T13:31:38Z

Rebased for conflicts now! With FCOS releases freshly out, I think we can get this in.

dustymabe

Mostly LGTM - some suggestions

tests/kola/disks/growfs

tests/kola/root-reprovision/autosave-xfs/test.sh

tests/kola/root-reprovision/luks/test.sh

tests/kola/root-reprovision/luks/data/luks-test.sh

overlay.d/05core/usr/lib/dracut/modules.d/40ignition-ostree/ignition-ostree-growfs.sh

overlay.d/05core/usr/lib/dracut/modules.d/40ignition-ostree/ignition-ostree-transposefs.sh

Prep for automatic XFS reprovisioning. The only way to know for sure whether the rootfs should be reprovisioned is analyzing it after the filesystem was grown. We could do calculations beforehand, but it'd get complex having to analyze the partition table. Anyway, the partition growing and e.g. LUKS container resizing need to happen before automatic reprovisioning and ignition-ostree-growfs already knows how to do that.

Add a new `ext.config.root-reprovision.autosave-xfs` test that checks that the logic kicks in on a large enough disk. Add a similar `ext.config.root-reprovision.luks.autosave-xfs` for the LUKS version of this. Sanity-check in other reprovisioning tests that autosave-xfs didn't kick in.

jlebon · 2023-04-06T20:55:36Z

Updated for comments!

dustymabe · 2023-04-06T21:03:35Z

overlay.d/05core/usr/lib/dracut/modules.d/40ignition-ostree/ignition-ostree-transposefs.sh

@@ -202,6 +244,23 @@ case "${1:-}" in
            mkdir "${saved_prep}"
        fi
        ;;
+    autosave-xfs)
+        should_autosave=$(should_autosave_rootfs)


So (by design) the script will now exit here if an unexpected error occurs (because of `set -euo pipefail`), and that is how we'll handle the cases we weren't handling before?

Yup, exactly. It's no longer part of an if-statement, so the errexit should be respected.
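
The errexit point can be sketched in isolation. The example below is an illustration under stated assumptions, not the code from this PR: `probe` is a hypothetical stand-in for a helper that returns a non-zero status, and each case runs in its own `bash -c` process so that one case cannot affect the other.

```shell
#!/bin/bash
# probe() is a hypothetical stand-in for a helper such as
# should_autosave_rootfs that returns a non-zero status.

# Case 1: the substitution sits inside an if condition. The failure
# status of probe is discarded by the test, so the script carries on.
rc_condition=0
out_condition=$(bash -c '
set -euo pipefail
probe() { return 3; }
if [ "$(probe)" = "1" ]; then echo "autosave"; fi
echo "continued"
') || rc_condition=$?

# Case 2: the substitution is a plain assignment. Its status becomes the
# status of the assignment, so errexit stops the script right there.
rc_assignment=0
out_assignment=$(bash -c '
set -euo pipefail
probe() { return 3; }
value=$(probe)
echo "continued: ${value}"
') || rc_assignment=$?

echo "condition:  rc=${rc_condition} out=${out_condition}"
echo "assignment: rc=${rc_assignment} out=${out_assignment}"
```

The first case prints `continued` and exits 0; the second exits with status 3 before reaching its `echo`.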

dustymabe · 2023-04-06T21:04:09Z

overlay.d/05core/usr/lib/dracut/modules.d/40ignition-ostree/ignition-ostree-transposefs.sh

@@ -202,6 +244,23 @@ case "${1:-}" in
            mkdir "${saved_prep}"
        fi
        ;;
+    autosave-xfs)
+        should_autosave=$(should_autosave_rootfs)
+        if [ "${should_autosave}" = "1" ]; then


Suggested change

-        if [ "${should_autosave}" = "1" ]; then
+        if [ "${should_autosave}" == "1" ]; then

It always throws me off that a single = works.

You don't need to change this, I'm just calling it out.
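
A small sketch of why both spellings behave the same here: inside `[ ]`, `=` is the string comparison that portable `sh` scripts rely on, and bash accepts `==` as a synonym. The variable value is made up, and the `==` form is run under `bash` explicitly so the result does not depend on which shell invokes the snippet.

```shell
#!/bin/bash
should_autosave="1"

# Single "=": string equality inside [ ].
if [ "${should_autosave}" = "1" ]; then
    single="match"
else
    single="no match"
fi

# Double "==": run under bash explicitly, which treats it as a synonym.
double=$(bash -c 'v="1"; if [ "$v" == "1" ]; then echo "match"; else echo "no match"; fi')

echo "single: ${single}"
echo "double: ${double}"
```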

dustymabe

LGTM

cgwalters · 2023-07-19T18:47:14Z

Downstream bug https://issues.redhat.com/browse/OCPBUGS-16157

This was added to force a reprovision of the root filesystem on a particular instance we were using for a RHCOS builder, but it's no longer needed. Now when a system has a disk > 100G the root filesystem will get reprovisioned automatically by the work done in coreos/fedora-coreos-config#2320

cgwalters · 2023-08-23T12:57:59Z

Further fallout from this in https://issues.redhat.com/browse/OCPBUGS-16724

The change in coreos#2320 has been very problematic for OpenShift because our default node configuration is *always* over the threshold, and that causes significant latency on instance provisioning. First, rework the reprovision threshold to operate in terms of disk size, which is much easier to explain and debug than allocation group count. (Which, to be clear, *is* the real problem, but disk size is a good enough proxy for this.) Then, bump the reprovision threshold to 1TiB. This is comfortably above the default OpenShift node root disk sizes, and returns us to the prior status quo.

The change in coreos#2320 has been very problematic for OpenShift because our default node configuration is *always* over the threshold, and that causes significant latency on instance provisioning. Experimentally bumping to 400 allocation groups, which is about 700GiB. This is comfortably above the default OpenShift node root disk sizes, and returns us to the prior status quo. While we're here, rework the logging a bit so that we *always* log the `agcount` for debugging purposes.

cgwalters · 2023-08-24T18:50:55Z

Followup PR in #2565

The change in coreos#2320 has been very problematic for OpenShift because our default node configuration is *always* over the threshold, and that causes significant latency on instance provisioning. Experimentally bumping to 400 allocation groups, which is about 700GiB. This is comfortably above the default OpenShift node root disk sizes, and returns us to the prior status quo. While we're here, rework the logging a bit so that we *always* log the `agcount` for debugging purposes.

Also:

- Only log to stdout for normal conditions
- Include the name of the systemd unit in the test description so we can cross-reference
- tests: Hoist the expected agcount of 4 to a common variable

The change in coreos#2320 has been very problematic for OpenShift because our default node configuration is *always* over the threshold, and that causes significant latency on instance provisioning. Experimentally bumping to 400 allocation groups, which is about 700GiB. This is comfortably above the default OpenShift node root disk sizes, and returns us to the prior status quo. While we're here, rework the logging a bit so that we *always* log the `agcount` for debugging purposes.

Also:

- Only log to stdout for normal conditions
- Include the name of the systemd unit in the test description so we can cross-reference
- tests: Hoist the expected agcount of 4 to a common variable

(cherry picked from commit 4faba4f)

jlebon mentioned this pull request Mar 22, 2023

CoreOS autoinstall creates huge number of XFS allocation groups coreos/fedora-coreos-tracker#1183

Closed

cgwalters previously approved these changes Mar 24, 2023

jlebon dismissed cgwalters’s stale review via fe72963 April 5, 2023 13:31

jlebon force-pushed the pr/xfs-autoreprovision branch from 7cacd47 to fe72963 Compare April 5, 2023 13:31

dustymabe reviewed Apr 5, 2023

jlebon added 7 commits April 6, 2023 16:52

40ignition-ostree: factor out zram-related functions

9b70797

No functional change. Prep for future patch.

40ignition-ostree: skip udev hack if Ignition did not reprovision rootfs

39f64dc

Currently, this code only executes via Ignition reprovisioning the rootfs, but we're about to add code to reprovision the rootfs outside of that path. In that case, we don't need to query the Ignition config.

tests/kola: add non-exclusive check for growfs

298a03e

We weren't checking anywhere in the non-reprovisioning case that we grow the root filesystem on first boot. Add a trivial test for this.

tests/kola: move LUKS checks to shared file

5a5edfe

Prep for adding another LUKS where we want the same checks.

jlebon force-pushed the pr/xfs-autoreprovision branch from fe72963 to 2a3c4aa Compare April 6, 2023 20:55

dustymabe reviewed Apr 6, 2023

dustymabe approved these changes Apr 6, 2023

jlebon enabled auto-merge (rebase) April 6, 2023 21:23

jlebon merged commit 1744c68 into coreos:testing-devel Apr 6, 2023

marmijo mentioned this pull request Apr 7, 2023

ext.config.root-reprovision.autosave-xfs and ext.config.root-reprovision.luks.autosave-xfs failing on aarch64 and ppc64le coreos/fedora-coreos-tracker#1458

Closed

jschintag mentioned this pull request Apr 18, 2023

s390x/Secure Execution: Firstboot fails during ignition-ostree-growfs.service openshift/os#1264

Closed

jlebon deleted the pr/xfs-autoreprovision branch April 23, 2023 23:29

dustymabe mentioned this pull request Jul 27, 2023

multi-arch-builders: drop coreos-ppc64le-builder-512e.bu config coreos/fedora-coreos-pipeline#898

Merged

dustymabe mentioned this pull request Aug 4, 2023

low memory systems with large disks fail to provision coreos/fedora-coreos-tracker#1535

Open

cgwalters mentioned this pull request Aug 24, 2023

transposefs: Only autosave-xfs for much larger filesystems #2565

Merged

jlebon mentioned this pull request Sep 8, 2023

blockdev: query physical sector size, not logical coreos/coreos-installer#1056

Closed

djuran mentioned this pull request Feb 5, 2024

OCPBUGS#10640: Added clarification point to disk partition BM doc openshift/openshift-docs#67707

Merged
