Prevent clobbering data partition when reprovisioning #1257

Open
bgilbert opened this issue Aug 2, 2023 · 0 comments
bgilbert commented Aug 2, 2023

When reprovisioning existing systems, we assume we can rewrite the blocks at the beginning of the disk without overwriting any user-created data partitions located immediately after the root partition. However, per coreos/fedora-coreos-tracker#1465, FCOS and RHCOS will eventually increase the size of the /boot partition and therefore the size of the install image. As a result, if an older system is reprovisioned with a newer, larger OS image, those data partitions will be clobbered.
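To make the failure mode concrete, here is a minimal sketch of the overlap condition, assuming the image is written starting at byte offset 0 of the disk. The `Partition` type and `clobbered_by_image` function are hypothetical names for illustration, not part of coreos-installer.

```rust
/// A user-created data partition on the existing disk.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct Partition {
    /// Partition number in the partition table.
    index: u32,
    /// Partition label, if any.
    label: String,
    /// Byte offset of the partition's first byte on the disk.
    start: u64,
}

/// Return the data partitions that would be overwritten by an image of
/// `image_size` bytes, assuming the image is written from offset 0.
/// A partition is hit if it starts before the end of the image.
fn clobbered_by_image(data_parts: &[Partition], image_size: u64) -> Vec<&Partition> {
    data_parts.iter().filter(|p| p.start < image_size).collect()
}
```

With an older image that ends exactly where the first data partition starts, the result is empty; once the image grows past that offset, the first data partition shows up in the list.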

We can't transparently fix disks that don't have enough space for the OS image, but we should try to prevent data loss. Historically we've intentionally ignored any partitions that aren't saved with --save-partindex or --save-partlabel, on the theory that the user knows what they're doing. However, we shouldn't provide footguns. Without changing our general policy, we could check the target disk for an existing CoreOS installation, and if present, add all data partitions to the saved-partition list. (We can't just save the first data partition because only the saved partition table entries are restored after an install failure.)
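The proposed policy could be sketched roughly as follows. This assumes detection has already produced a boolean and that the caller already knows which partition indexes count as data partitions; `effective_saved_indexes` and its parameters are hypothetical names, not existing coreos-installer code.

```rust
use std::collections::BTreeSet;

/// Compute the partition indexes to save before installing.
/// `user_saved` are the indexes the user asked for explicitly (for example
/// via --save-partindex). If an existing CoreOS installation was detected,
/// every data partition is added as well. Hypothetical helper.
fn effective_saved_indexes(
    user_saved: &[u32],
    data_partitions: &[u32],
    coreos_detected: bool,
) -> Vec<u32> {
    // A BTreeSet deduplicates and keeps the indexes sorted.
    let mut saved: BTreeSet<u32> = user_saved.iter().copied().collect();
    if coreos_detected {
        saved.extend(data_partitions.iter().copied());
    }
    saved.into_iter().collect()
}
```

When no CoreOS installation is detected, the result is just the user's explicit list, so the general policy is unchanged.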

Detecting CoreOS isn't trivial. The root filesystem may be encrypted. /boot isn't, but might be on a RAID volume. We'd need to:

  1. Detect the boot partition label, or boot-<n> in the case of a Butane-created RAID config. Not all s390x systems have partition tables that support these labels.
  2. If present, start the RAID, read-only and possibly degraded.
  3. Mount the filesystem, read-only, keeping in mind that the boot partition label may not be unique on the host system.
  4. Check the filesystem for some combination of properties indicating a CoreOS volume. We need to successfully detect every FCOS and RHCOS release.
  5. Unmount the filesystem and stop the RAID.
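Step 1 could look something like the sketch below, which only covers label matching. It assumes `<n>` is a non-empty run of decimal digits, which is an assumption about the RAID member naming rather than something confirmed here; `is_boot_label` and `boot_candidates` are hypothetical helpers.

```rust
/// Return true if `label` is `boot` or `boot-<n>`, where `<n>` is assumed
/// to be one or more ASCII digits. Hypothetical helper.
fn is_boot_label(label: &str) -> bool {
    if label == "boot" {
        return true;
    }
    match label.strip_prefix("boot-") {
        Some(rest) => !rest.is_empty() && rest.bytes().all(|b| b.is_ascii_digit()),
        None => false,
    }
}

/// Collect every matching label rather than stopping at the first one,
/// since the label may not be unique on the host system.
fn boot_candidates<'a>(labels: &[&'a str]) -> Vec<&'a str> {
    labels.iter().copied().filter(|l| is_boot_label(l)).collect()
}
```

Each candidate would still need to go through steps 2 to 5 before being treated as a CoreOS boot filesystem.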

In the overlap case, coreos-installer will download (if necessary) and write most of the image, notice the overlap before reaching the data partition, restore the saved partition table entries, and fail. We'll need to ensure the error message provides clear advice.
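As an illustration of that flow, here is a simplified sketch of a streaming copy that stops before crossing a byte limit, where the limit stands in for the start of the first saved partition. It is not the actual coreos-installer write path; `copy_with_limit`, the 4 KiB chunk size, and the error text are all assumptions.

```rust
use std::io::{self, Read, Write};

/// Copy `src` to `dst` in chunks, refusing to write any chunk that would
/// extend past `limit` bytes. On overlap, everything before the offending
/// chunk has already been written, and an error with advice is returned.
/// Hypothetical helper for illustration.
fn copy_with_limit<R: Read, W: Write>(src: &mut R, dst: &mut W, limit: u64) -> io::Result<u64> {
    let mut buf = [0u8; 4096];
    let mut written: u64 = 0;
    loop {
        let n = src.read(&mut buf)?;
        if n == 0 {
            return Ok(written);
        }
        if written + n as u64 > limit {
            return Err(io::Error::new(
                io::ErrorKind::Other,
                format!(
                    "image extends past offset {} and would overwrite a saved partition; \
                     the disk does not have enough space for this OS image",
                    limit
                ),
            ));
        }
        dst.write_all(&buf[..n])?;
        written += n as u64;
    }
}
```

On error, the caller would be responsible for restoring the saved partition table entries before exiting.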
