
metal: Support redundant bootable disks #581

Closed
cgwalters opened this issue Jul 23, 2020 · 11 comments · Fixed by coreos/fedora-coreos-config#794

@cgwalters (Member) commented Jul 23, 2020

We're about to gain support for many types of complex storage for the root filesystem, but this is distinct from the "root disk", which includes /boot and the ESP, for example.

We had a subthread around this starting around here: #94 (comment)

Basically my proposal is that we support "equivalent RAID 1" by teaching coreos-installer how to replicate the /boot partition and the ESP to multiple block devices. Making the root filesystem RAID is an orthogonal thing - for example, one might choose a higher RAID level for the OS and data, or one could use Stratis/btrfs/LVM for /. Another important variation here is LUKS-on-RAID (or equivalent) for the root device. To restate: how the root filesystem is set up can be distinct from how /boot and the ESP work.

I think it would work in this scenario to make /boot be MD-RAID - the bootloaders support that, and it would mean ostree kernel updates stay fully transparent.
We can't make the ESP be RAID - it would need to be manually synced.

@cgwalters changed the title from "Support redundant bootable disks" to "metal: Support redundant bootable disks" on Jul 23, 2020
@bgilbert (Contributor)

> Basically my proposal is that we support "equivalent RAID 1" by teaching coreos-installer how to replicate the /boot partition and the ESP to multiple block devices.

The initramfs is already handling copyout/copyin for the root filesystem; why not for /boot and the ESP? That way this functionality wouldn't be limited to the coreos-installer flow, and would hew closer to the "do everything via Ignition" principle.

@cgwalters (Member, Author)

I'm OK with implementing the work in the initramfs; though I think this case is basically never relevant for cloud platforms.

Are you also arguing (implicitly) that the user interface to this is via Ignition? I guess I'd thought of this as coreos-installer --redundant /dev/vdb /dev/vda or so. What would the Ignition look like if so? I don't think we can allow/support admins doing arbitrary things for /boot in the same way as for the root filesystem. Or would this have some high-level sugar in fcct that compiles down into the Ignition to support setting up our main supported case of RAID 1 for /boot, and then...hmm, generate a second ESP and a systemd unit to copy it?

@bgilbert (Contributor)

> I'm OK with implementing the work in the initramfs; though I think this case is basically never relevant for cloud platforms.

It may be relevant for image-based bare metal environments, like Packet or Ironic.

> Are you also arguing (implicitly) that the user interface to this is via Ignition?

Sure. Ignition-disks has always been a bit tricky to write configs for; it'll easily let you create disk layouts that won't boot. Writing a config requires knowing which operations are supported (erasing or moving the root filesystem, soon) and which ones aren't (moving the ESP, currently). This use case is just another example of that. To address it, we have happy-path documentation with examples, and potentially we also have FCCT.

The Ignition config would look like: create an ESP, BIOS boot partition, and /boot partition on a second disk, and an MD-RAID1 on /boot. We can recognize the first two by their type GUIDs, and for /boot, we could try using the systemd Extended Boot Loader Partition GUID or we could define our own GUID. We'd then handle the copy operation in initramfs glue, same as we do for root. That code can be special-cased to support only the things we want to support, since everything else will just fail and the user will fix their config. FCCT sugar makes sense here, since otherwise we're asking users to paste type GUIDs and partition sizes from docs into their Ignition config.

As to the on-disk layout: /boot on MD-RAID1 should work even without bootloader support. We can just use a 1.0 RAID superblock (which goes at the end of the partition). That also requires the bootloader never to write to the disk (so no grubenv), and currently also requires using the raid.options escape hatch in the Ignition spec. Bootloader support would be better, of course. The same trick might work for the ESP but that's problematic, since the firmware might decide to write to that filesystem. (For example, when applying a firmware update capsule.)
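
To make the shape of that concrete, here is a rough Butane/FCC-style sketch of such a config. This is an illustration only: the device path, partition labels, sizes, and array name are invented, and it is not necessarily the exact layout that ultimately shipped.

```yaml
# Illustrative sketch only; labels, sizes, and device paths are hypothetical.
variant: fcos
version: 1.2.0
storage:
  disks:
    - device: /dev/vdb            # hypothetical second disk
      wipe_table: true
      partitions:
        # BIOS boot partition, recognizable by its type GUID
        - label: bios-2
          size_mib: 1
          type_guid: 21686148-6449-6E6F-744E-656564454649
        # ESP, recognizable by its type GUID
        - label: esp-2
          size_mib: 127
          type_guid: C12A7328-F81F-11D2-BA4B-00A0C93EC93B
        # second /boot partition, one half of the mirror
        - label: boot-2
          size_mib: 384
  raid:
    # RAID1 for /boot; a 1.0 superblock lives at the end of each member,
    # so the bootloader can still read the member as a plain filesystem
    - name: md-boot
      level: raid1
      devices:
        - /dev/disk/by-partlabel/boot-1   # hypothetical label for the existing /boot partition
        - /dev/disk/by-partlabel/boot-2
      options:
        - --metadata=1.0
```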

@cgwalters (Member, Author)

> It may be relevant for image-based bare metal environments, like Packet or Ironic.

Hmm, right. Even though coreos-installer is "glorified dd", a lot of the stuff we've built up there is really useful (like auto-detecting 4Kn disks, validating signatures, etc.). Replicating even that small stuff in Packet/Ironic carries a nontrivial cost.

I've been (implicitly) arguing that Ironic should basically learn to delegate to coreos-installer and not replicate it. But that also carries a cost, because Ironic obviously wants to support installing non-CoreOS systems too.

Anyways, I'm fine with doing it in the initramfs.

@cgwalters (Member, Author)

I was thinking recently that https://github.com/coreos/bootupd/ could own the post-install synchronization aspect of this. I think we'd need to define some equivalent of a "RAID config": something like a JSON file written to each copy of the ESP, containing a UUID for itself plus the other members of the set; bootupd would then take care of mounting and synchronizing later updates.
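
As a sketch of that idea only: bootupd defines no such format here, so every field name below is hypothetical. It is shown in YAML so it can carry comments; the proposal above is for a JSON file of the same shape.

```yaml
# Hypothetical per-ESP "RAID config"; none of these field names are real bootupd schema.
esp-uuid: 6b2a6f1e-3c44-4b6e-9a44-0f0d5a9b2c11   # identity of this ESP copy
members:                                         # every ESP copy in the replica set
  - 6b2a6f1e-3c44-4b6e-9a44-0f0d5a9b2c11
  - 9d1f4c6a-8e02-47f3-b1d0-2a7e64c3f8d2
```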

@bgilbert (Contributor) commented Nov 3, 2020

Proposal posted in coreos/enhancements#3.

@bgilbert (Contributor) commented Dec 4, 2020

The code landed in coreos/fedora-coreos-config#718 and the sugar in coreos/butane#162. Closing this out.

@bgilbert closed this as completed on Dec 4, 2020
@dustymabe (Member)

The fix for this went into testing stream release 33.20201214.2.0. Please try out the new release and report issues.

@dustymabe added the status/pending-stable-release label and removed the status/pending-next-release and status/pending-testing-release labels on Dec 20, 2020
@bgilbert (Contributor)

We're planning to make some changes to the RAID functionality in coreos/fedora-coreos-config#794, coreos/butane#178, and coreos/coreos-assembler#1979.

Machines that were configured with a mirrored boot disk on 33.20201214.2.0 should continue to function on upgrade, but will work differently from mirrored boot disks deployed on newer releases. Ignition configs that specify boot device mirroring and are built with fcct 0.8.0 will not be compatible with future OS releases. Folks should feel free to test out the new release, but be aware that changes are coming.
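
For reference, a sketch of what the reworked mirroring sugar looks like in later Butane releases (the boot_device mechanism in the FCC 1.3.0 spec and up); the disk paths below are examples only:

```yaml
# Sketch of the later boot_device mirroring sugar; disk paths are examples.
variant: fcos
version: 1.3.0
boot_device:
  mirror:
    devices:
      - /dev/sda
      - /dev/sdb
```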

@dustymabe, I'm going to reset this issue's labels accordingly.

@dustymabe (Member)

> @dustymabe, I'm going to reset this issue's labels accordingly.

Sounds good. Thanks for the context!

@bgilbert added the status/pending-testing-release and status/pending-next-release labels on Dec 22, 2020
@dustymabe (Member)

The fix for this went into testing stream release 33.20210104.2.0. Please try out the new release and report issues.

@dustymabe removed the status/pending-testing-release and status/pending-next-release labels on Jan 7, 2021