
payload/rpmostree: Add support for bootupd #5298

Closed

Conversation

cgwalters
Contributor

The https://github.com/coreos/bootupd project was created to fill the gap in bootloader management for ostree-based systems.

When it was created, it was just integrated into Fedora CoreOS and derivatives; this left the Atomic Desktops (Silverblue etc.) as unfixed, and it was never used by RHEL for Edge.

This PR is aiming to circle back and close that gap. We detect if bootupd is in the target root; if it is, then we should act as if bootloader --location=none had been specified, and just run bootupd to perform the installation.

The other hacks we have around the grub config are no longer necessary in this mode.
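
In concrete terms, the detection boils down to something like the following minimal sketch (the helper name is illustrative; the real change wires this into the storage and payload modules discussed in the review below):

    import os

    def bootupd_manages_bootloader(sysroot):
        """Sketch: treat the deployed tree as shipping bootupd if bootupctl
        exists in the target root. In that case Anaconda should behave as if
        ``bootloader --location=none`` were given and let
        ``bootupctl backend install`` install the bootloader instead."""
        return os.path.exists(os.path.join(sysroot, "usr/bin/bootupctl"))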

@pep8speaks

pep8speaks commented Nov 2, 2023

Hello @cgwalters! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2023-11-02 21:03:24 UTC

@github-actions github-actions bot added the f40 label Nov 2, 2023
@cgwalters
Contributor Author

This is totally untested as of right now; I'm putting it up for early feedback. I'll need some help spinning up an Anaconda development environment (I did it long ago).

For reference, a goal here is that it should work to do e.g.
ostreecontainer --url quay.io/fedora/fedora-coreos:stable --no-signature-verification via kickstart.

The https://github.com/coreos/bootupd project was created to
fill the gap in bootloader management for ostree-based systems.

When it was created, it was just integrated into Fedora CoreOS
and derivatives; this left the Atomic Desktops (Silverblue etc.)
as unfixed, and it was never used by RHEL for Edge.

This PR is aiming to circle back and close that gap.  We
detect if bootupd is in the target root; if it is, then
we should act as if `bootloader --location=none` had been
specified, and just run bootupd to perform the installation.

The other hacks we have around the grub config are no longer
necessary in this mode.
"""
rc = execWithRedirect(
"bootupctl",
["backend", "install", "--auto", "--with-static-configs",
Contributor Author


It's worth noting another big difference here vs. how Anaconda works by default: we use a static grub config. This means we don't run grub2-mkconfig for every kernel update, which in turn means we don't run os-prober. And IMO os-prober is one of the worst chunks of code we ship that runs as root, scanning all your block devices to see if there's e.g. a BeOS or Windows partition every time you just want to update your kernel...

@cgwalters
Contributor Author

/build-image

@cgwalters
Contributor Author

Any feedback on this?

I see some unit tests started falling over; it may be slightly tricky to mock this. I can look, though I ran into other, unrelated problems running the unit tests locally (around locales) that I need to figure out.

Comment on lines +759 to +760
bootloader = STORAGE.get_proxy(BOOTLOADER)
bootloader.set_use_bootupd()
Contributor Author


I don't really understand the current Anaconda architecture, and I was wondering whether "cross-actor" dynamic calls like this actually work. Basically, today the bootloader state can only be fully computed once we've written the payload. Hopefully this can work.

Contributor

@VladimirSlavik VladimirSlavik Nov 10, 2023


That won't work. The place doing return [SomeTask()] is called before calling the run() that checks if there is bootupd installed. The whole installation task queue is gathered before starting the tasks.

Is there some way of detecting this from the payload before installation?

Contributor


@VladimirSlavik That's what I thought as well, but the installation queue still uses workarounds from the time when the payload wasn't fully migrated, so these tasks are actually collected on demand during the installation. That makes me sad, because it means that this will only work until we remove the workarounds.

On the other hand, we can hook the bootloader tasks up to the bootloader-mode change signal and update their state on change. Alternatively, we can wrap them in another task. It shouldn't be too difficult to support this use case.

Contributor Author


Is there some way of detecting this from the payload before installation?

Not today. We could add something to the container image metadata for it, and then have a flow that does:

  • fetch metadata (i.e. container image manifest)
  • set up kickstart state
  • perform installation

This topic intersects with #5197 a bit...

What I think would be really neat is generalized support for embedding kickstart fragments in the container image. This seems most viable if it's done in the metadata instead of the payload, because otherwise, in the general case, we'd need to buffer the whole payload into RAM before an install.

If we did that, then we could add e.g. inst.bootc=quay.io/examplecorp/someos:latest on the kernel command line, which could then entirely replace inst.ks; the ergonomic improvement there would be huge in the general case.

(But this general case again is for "using pre-generated os/distro ISO with custom payload", like Fedora today, but not like what we want for a custom Image Builder flow)

Contributor


Ah, actually, not having a kickstart option and state would be preferable. If there's any way to read this from the payload once it's "known", before being "installed", that would be best.

Contributor Author


Yep agree that's preferable! In this stubbed out code we detect this state by noticing that /usr/bin/bootupctl is in the target root...that's after "install" but should still be before the bootloader stage, right?

Failing that, I think we can add a metadata property (LABEL bootupd.enabled=true) or so in the container image manifest. But implementing a separate "discovery" phase would be some new code.

(Not difficult code, just need to fork off skopeo inspect <container image reference>)
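
For illustration, that hypothetical discovery step could be as small as the following sketch (the bootupd.enabled label name is just the example from above, not an agreed format; skopeo inspect prints JSON that includes a Labels mapping):

    import json
    import subprocess

    def image_enables_bootupd(image_ref):
        """Sketch of a discovery phase: read image labels without pulling the payload."""
        out = subprocess.run(
            ["skopeo", "inspect", f"docker://{image_ref}"],
            check=True, capture_output=True, text=True,
        ).stdout
        labels = json.loads(out).get("Labels") or {}
        # Hypothetical label; nothing like this is defined by bootupd today.
        return labels.get("bootupd.enabled") == "true"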

Contributor

@VladimirSlavik VladimirSlavik left a comment


Here's how to get it closer to working...

Comment on lines +453 to +462
    @property
    def use_bootupd(self):
        """Whether bootupd is enabled"""
        return self._use_bootupd

    def set_use_bootupd(self):
        """Install the bootloader using https://github.com/coreos/bootupd"""
        self._use_bootupd = True
        self.set_bootloader_mode(BootloaderMode.SKIPPED)

Contributor


This is the "implementation", in the bootloader "module", but it needs to be added to the "interface", too. In the file bootlader_interface.py, add the same thing, CamelCase-named, touching self.implementation.<name-from-module>. Also, it's a D-Bus property, so in the interface should be a property, not a function. Check bootloader_mode and BootloaderMode to see an example how this is done.

(The interface members are translated to D-Bus automagically)
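
For readers following along, a rough sketch of what that could look like, mirroring the existing BootloaderMode members (the exact decorators and the property-changed notification are assumptions; copy whatever bootloader_interface.py already does):

    # Members to add inside the existing BootloaderInterface class, assuming the
    # usual imports there, e.g.:
    #   from dasbus.server.property import emits_properties_changed
    #   from dasbus.typing import Bool

    @property
    def UseBootupd(self) -> Bool:
        """Whether the bootloader will be installed by bootupd."""
        return self.implementation.use_bootupd

    @emits_properties_changed
    def SetUseBootupd(self):
        """Let bootupd install the bootloader and skip the native installation."""
        self.implementation.set_use_bootupd()
        self.report_changed_property('UseBootupd')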

Comment on lines +501 to +502
bootloader = STORAGE.get_proxy(BOOTLOADER)
if not bootloader.use_bootupd:
Contributor


Here you're calling the interface, so CamelCase UseBootupd once added as explained above.
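
That is, roughly (sketch, using the property name suggested above):

    bootloader = STORAGE.get_proxy(BOOTLOADER)
    if not bootloader.UseBootupd:
        ...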

@@ -509,7 +513,7 @@ def _move_grub_config(self):
         os.rename(boot_grub2_cfg, target_grub_cfg)
         os.symlink('../loader/grub.cfg', boot_grub2_cfg)
 
-    def _set_kargs(self):
+    def _set_kargs(self, bootloader):
Contributor


If you add this argument, please add it to the docstring as "proxy to the bootloader". However, since it's just a Pythonified wrapper for D-Bus, it's equivalent to grabbing it anew as it was done below, so simply not making this change is an option too.

Contributor Author


At a procedural level please feel free to just force-push fixups. I haven't yet spun up an Anaconda dev/test environment.

@VladimirSlavik
Contributor

VladimirSlavik commented Nov 10, 2023

I think...

  • Broadly speaking, this is the right place to plug this in, in terms of the larger system. I mean, not extending the normal bootloader hierarchy, but doing a special ostree thing.
  • Tests can be left out for now, until the thing works.

@@ -456,6 +469,11 @@ def install_bootloader_with_tasks(self, payload_type, kernel_versions):
         :param kernel_versions: a list of kernel versions
         :return: a list of tasks
         """
+        if self._use_bootupd:
Contributor


The thing is that setting the bootloader mode to SKIPPED is enough to disable the tasks in the else branch (there are conditions in their run methods that will stop the tasks before any action). So the only effect of the use_bootupd flag on this module is running the bootupctl tool.

I am wondering if the bootupd code shouldn't stay in the RPM OSTree module and the bootupctl tool shouldn't be called from there. This functionality is specific to ostree installations, right? Or is there a possibility of expansion to other types of payloads in the future?

Contributor Author


This functionality is specific to ostree installations, right?

Today yes.

Or is there a possibility of expansion to other types of payloads in the future?

In theory it's possible, in practice it'd be a giant change and not really worth it.

@cgwalters
Contributor Author

cgwalters commented Nov 10, 2023

Thanks for all the review so far!

Thinking about this a bit more, since our initial primary goal here will be generating "self-installing" ISOs via Image Builder with the content embedded, perhaps it's far simpler to add a new flag like bootloader --backend=bootupd?

@VladimirSlavik
Contributor

VladimirSlavik commented Nov 10, 2023

If it satisfies the goal, then yes, that would probably be easier. Given how the sdboot stuff looks in kickstart, it would be bootloader --bootupd; the rest would probably be the same as that on the pykickstart side, and on our side some checking that it's only used with ostreecontainer.

@cgwalters
Contributor Author

bootloader --bootupd

This SGTM. Though to be clear, right now I'm juggling about 10 things related to this; I'm happy to assist/pair-program/meet on this, but probably can't drive it.

@poncovka
Contributor

poncovka commented Nov 21, 2023

Hi @cgwalters!

We discovered some issues with the current approach. The idea is to act the same way as bootloader --location=none; however, this doesn't affect only the bootloader installation and configuration, but also the selection/discovery of the bootloader devices, the scheduling of the bootloader partitions, the sanity checks that validate these partitions and devices, and the installation tasks that make these devices bootable (by setting a boot flag). Without this support, the installer is not able to make sure that the system will be bootable after the installation and to inform users about required actions. Instead, the installer will fail in the middle of the installation process, after modifying the user's disks. I don't think this was the intention, so I suggest just disabling our installation tasks for the configuration and installation of the bootloader, but keeping the remaining support in place.

I was testing the current workaround using https://centos.github.io/centos-boot/example.ks and virt-install:


# Download the example kickstart file.
curl https://centos.github.io/centos-boot/example.ks > example.ks

# Start the virtual machine with 1 disk.
virt-install \
  --connect qemu:///system \
  --name bootc-example \
  --memory 4096 \
  --vcpus 2 \
  --disk size=40 \
  --os-variant fedora38 \
  --location "https://dl.fedoraproject.org/pub/fedora/linux/releases/38/Everything/x86_64/os/" \
  --initrd-inject "$(pwd)/example.ks" \
  -x inst.ks=file:/example.ks \
  --transient \
  --destroy-on-exit \
  --wait

There are some notes from my investigation:

  1. If I use two disks (--disk size=20 --disk size=20), the installation will fail with the following error:

[screenshot of the installation error]

  2. If there are two or more disks, bootloader --location=none has an effect on the generated partitioning layout.

[screenshots comparing the generated partitioning layouts with and without bootloader --location=none]

  3. If I remove bootloader --location=none and the biosboot partition from the kickstart file, the installer will show a warning about the missing biosboot partition and won't continue with the installation. If I keep bootloader --location=none in place, there will be no warning and the installation will fail.

  4. I think that bootupctl might be called with the wrong arguments, but I am not able to verify it since I can't find any documentation for it. I would expect it to use our selected/detected stage1 device. @VladimirSlavik is already experimenting with it.

  5. It will probably be necessary to expand our support for bootloader devices and improve their scheduling, because the installer kind of ignores types that are not relevant for the current platform.

@cgwalters
Contributor Author

however, this doesn't affect only the bootloader installation and configuration, but also the selection/discovery of the bootloader devices, the scheduling of the bootloader partitions, the sanity checks that validate these partitions and devices, and the installation tasks that make these devices bootable (by setting a boot flag). Without this support, the installer is not able to make sure that the system will be bootable after the installation and to inform users about required actions. Instead, the installer will fail in the middle of the installation process, after modifying the user's disks. I don't think this was the intention, so I suggest just disabling our installation tasks for the configuration and installation of the bootloader, but keeping the remaining support in place.

Yes, this is fair. Hmm....so the primary problems here are around the BIOS/MBR handling?

Perhaps we can keep Anaconda doing that part...i.e. something like bootupctl backend install --bios-record-only or something that just does the basic tracking bootupd wants to do around the version of the bootloader at install time.

If I use two disks (--disk size=20 --disk size=20)

Hmm, what are the default Anaconda semantics of doing this with this kickstart? Is it trying to install an MBR on both disks? I'll look. I don't actually know the semantics of the existing clearpart/part verbs in this scenario; is it trying to mirror everything? Pick one disk randomly? How important is this scenario?

@cgwalters
Contributor Author

cgwalters commented Nov 21, 2023

OK I just tried tweaking the kickstart to do basically:

%packages
@core
%end

and dropping the bootupd stuff...i.e. just checking "what does anaconda do by default here", and the result seems quite bizarre to me:

Configuring storage
Creating disklabel on /dev/vdb
Creating biosboot on /dev/vdb2
Creating prepboot on /dev/vdb1
Creating disklabel on /dev/vda
Creating xfs on /dev/vda3
Creating ext4 on /dev/vda2
Creating efi on /dev/vda1

Are partitions just...randomly chosen here? Why put the bootloader bits on the second disk but the OS install on the first?

I am totally aware that this example kickstart does not explicitly request specific disks, and hence there's a bit of a GIGO effect here but...

For what we're targeting, honestly I am not sure we really need to support use cases where the primary OS root is on a separate drive from the bootloader.

@poncovka
Contributor

Hi @cgwalters! I will get to your questions. Just a quick update: @VladimirSlavik is currently working on an alternative minimal solution that he and I agreed on. The idea is to start with something small and simple that works in most cases and build on that.

  1. We will try to go ahead with the detection instead of a special kickstart option. Specifically, we will look for /usr/bin/bootupctl on the new system, but only on RPM OSTree installations (see the sketch after this list). If we ever need to tweak this, we can add the kickstart option.

  2. We will keep the bootloader support enabled with the exception of our bootloader installation tasks. Most of these tasks are already disabled on RPM OSTree installations except for InstallBootloaderTask.

    class InstallBootloaderTask(Task):

    This task does two things: it generates the final list of kernel arguments and installs the bootloader. This is the place where we write the bootloader configuration files and call bootloader-specific tools (grub2-mkconfig, grub2-install, efibootmgr, ...). I think it should be enough to disable the bootloader installation, but we should probably keep the generation of the kernel arguments, since the RPM OSTree payload uses it:

    set_kargs_args.extend(bootloader.GetArguments())

  3. The bootupctl tool will be called from the ConfigureBootloader task of the RPM OSTree payload module (at least for now). This can evolve in another direction in the future, but it is easier this way.

  4. We will try to call the bootupctl tool with our stage1 device. Since the bootloader support will be enabled, this should be a valid drive chosen for booting the system. There are several important aspects of this device. Users are able to specify it in the kickstart file via bootloader --boot-drive. Or they can let Anaconda choose a valid stage1 device automatically. This device also influences the scheduling of partitions (I will get to that in another comment). Without a valid stage1 device, the installer will refuse to continue with the installation.
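
As referenced in item 1, here is a rough sketch of how items 1-4 could fit together inside the RPM OSTree payload's ConfigureBootloader task. The helper name, the Drive property, the --device argument and the execWithRedirect import path are assumptions for illustration, not the final code:

    import os

    from pyanaconda.core.util import execWithRedirect
    from pyanaconda.modules.common.constants.objects import BOOTLOADER
    from pyanaconda.modules.common.constants.services import STORAGE

    def configure_bootloader_sketch(sysroot, set_kargs_args):
        """Illustrative only; not the final ConfigureBootloader task code."""
        bootloader = STORAGE.get_proxy(BOOTLOADER)

        # Item 2: the kernel-argument generation is kept in all cases.
        set_kargs_args.extend(bootloader.GetArguments())

        # Item 1: bootupd is considered present if bootupctl is in the new root.
        if not os.path.exists(os.path.join(sysroot, "usr/bin/bootupctl")):
            return

        # Items 3-4: call bootupctl with the stage1 boot drive chosen by the
        # bootloader module ("Drive" and "--device" are assumed here).
        rc = execWithRedirect(
            "bootupctl",
            ["backend", "install", "--auto", "--with-static-configs",
             "--device", bootloader.Drive, "/"],
            root=sysroot,
        )
        if rc:
            raise RuntimeError("bootupctl failed with status {}".format(rc))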

@cgwalters
Contributor Author

cgwalters commented Nov 22, 2023

The idea is to start with something small and simple that works in most cases and build on that.

SGTM!

We will try to go ahead with the detection instead of a special kickstart option. Specifically, we will look for /usr/bin/bootupctl on the new system, but only on RPM OSTree installations

Also agreed.

but we should probably keep the generation of the kernel arguments since the RPM OSTree payload uses it

Yes definitely!

@cgwalters
Contributor Author

BTW, when I was playing with this I realized that we also definitely don't want bootloader --disabled, because that also means that e.g. reqpart won't create an ESP, etc. And supporting reqpart is important, I'd say.

@VladimirSlavik VladimirSlavik mentioned this pull request Nov 23, 2023
poncovka added a commit to poncovka/blivet that referenced this pull request Nov 23, 2023
Anaconda needs to be able to create hybrid boot disks. For example:

  clearpart --all --initlabel --disklabel=gpt
  part prepboot  --size=4    --fstype=prepboot
  part biosboot  --size=1    --fstype=biosboot
  part /boot/efi --size=100  --fstype=efi
  part /boot     --size=1000 --fstype=ext4 --label=boot
  part /         --grow      --fstype xfs

However, this kickstart snippet is not working with two or more disks.
The bootloader-related partitions should be all created on the disk
the computer will boot from, but Blivet does that only for
platform-specific partitions. The rest of them are created on any disk with
enough space.

It looks like this can be easily fixed by setting the same weight
to all of these partitions regardless of the current platform.

See: rhinstaller/anaconda#5298
@poncovka
Contributor

poncovka commented Nov 23, 2023

@cgwalters About the problematic scheduling of bootloader partitions on multiple disks: long story short, I think we are able to fix that (storaged-project/blivet#1174). It doesn't solve all issues, but it should help with kickstart partitioning layouts like this one: https://github.com/CentOS/centos-bootc/blob/main/docs/example.ks.

[screenshot of the resulting partitioning layout]

About the reqpart and autopart commands, this is on us and these commands will not create hybrid boot disks unless we specifically add support for it.

@cgwalters
Contributor Author

The "hybrid boot" thing is probably a bit of a distraction. I forget why I ended up doing that in the kickstart; I don't think there was a good reason.

poncovka added a commit to poncovka/centos-bootc that referenced this pull request Nov 27, 2023
The installer supports bootupd now, so we can drop the workaround.
See: rhinstaller/anaconda#5298
poncovka added a commit to poncovka/centos-bootc that referenced this pull request Nov 27, 2023
The partitioning defined in the example kickstart file suggests that the
installer supports hybrid boot. That's misleading and not true. Let's use
the `reqpart` kickstart command to automatically create partitions required
by the detected platform instead of creating all of them for all platforms.

Note: The `reqpart` command doesn't work with `bootloader --location=none` or
`bootloader --disabled`, so this commit depends on the installer's support
for bootupd: rhinstaller/anaconda#5298
@poncovka
Contributor

The "hybrid boot" thing is probably a bit of a distraction. I forget why I ended up doing that in the kickstart; I don't think there was a good reason.

I think I know why. The reqpart kickstart command doesn't work with bootloader --location=none or bootloader --disabled. I tested a modified kickstart file with our bootupd support and it seems to do what we want:

text

# Basic partitioning
clearpart --all --initlabel --disklabel=gpt
part /boot --size=1000  --fstype=ext4 --label=boot
part / --grow --fstype xfs
reqpart

ostreecontainer --url quay.io/centos-bootc/fedora-bootc:eln	--no-signature-verification
# Or: quay.io/centos-bootc/centos-bootc-dev:stream9

firewall --disabled
services --enabled=sshd

# Only inject a SSH key for root
rootpw --iscrypted locked
# Add your example SSH key here!
#sshkey --username root "ssh-ed25519 <key> demo@example.com"
reboot

I have opened a draft with these changes, because we need to know what we are aiming for: CentOS/centos-bootc#72

@VladimirSlavik
Contributor

Our PR #5342 is merged, so this can be closed as a code contribution. Feel free to continue the discussion here if needed...
