grub2: Don't add menu entries if GRUB supports parsing BLS snippets #2044

Merged

Conversation

martinezjavier
Contributor

This is another attempt to avoid having duplicated menu entries caused by
GRUB having support to parse BLS snippets and the 15_ostree script adding
menu entries as well.

The previous attempt was in commit 985a141 ("grub2: Exit gracefully if
the configuration has BLS enabled") but that led to users not having menu
entries at all, due to having an old GRUB version that was not able to parse
the BLS snippets.

This happened because the GRUB bootloader is never updated in the ESP as
a part of the OSTree upgrade transaction.

The logic is similar to the previous commit: the 15_ostree script exits if
it is able to determine that the bootloader can parse the BLS snippets directly.

But this time it will not only check that a BLS configuration was enabled,
but also that a /boot/grub2/.grub2-blscfg-supported file exists. This file
has to be created by a component outside of OSTree that also takes care of
updating GRUB to a version that has proper BLS support.
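
The two-part check described above can be sketched roughly as follows. This is only an illustration, not the merged script: the function name `bls_handled_by_grub` and its two parameters are hypothetical, added so the logic can be tried against scratch files instead of the real /etc/default/grub and /boot/grub2 paths.

```sh
#!/bin/sh
# Hypothetical helper: succeed only if BOTH conditions hold.
#   $1: defaults file to source (normally /etc/default/grub)
#   $2: marker file path (normally /boot/grub2/.grub2-blscfg-supported)
bls_handled_by_grub() {
    (
        # Run in a subshell so sourcing the file does not leak variables
        # into the caller's environment.
        GRUB_ENABLE_BLSCFG=
        [ -r "$1" ] && . "$1"
        [ -f "$2" ] && [ "${GRUB_ENABLE_BLSCFG}" = "true" ]
    )
}

# Usage inside a grub.d script (sketch): skip emitting menu entries
# when GRUB will parse the BLS snippets itself.
# if bls_handled_by_grub /etc/default/grub /boot/grub2/.grub2-blscfg-supported; then
#     exit 0
# fi
```

Requiring the marker file in addition to the configuration flag is what distinguishes this attempt from commit 985a141: the flag alone says nothing about the GRUB binary that is actually installed.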

@openshift-ci-robot
Collaborator

Hi @martinezjavier. Thanks for your PR.

I'm waiting for an ostreedev member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Member

@jlebon jlebon left a comment

Awesome!

We should definitely test all this before merging by e.g. provisioning an old Silverblue and upgrading it to f31, then running grub-switch-to-blscfg. I can look into that. (Or did you already do that test?)

@@ -26,6 +26,15 @@ if ! test -d /ostree/repo; then
    exit 0
fi

# Gracefully exit if the grub2 configuration has BLS enabled,
# and the installed version has support for the blscfg module.
# Since there is no need to create menu entries for that case.
Member

cp ${grub_binary} ${grubdir} || exit 1

Eeek! You know that cp by default does open(... O_TRUNC) right? So if interrupted you'll get a half-written file...
From shell script the install binary is usually a better choice.
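
One common way to avoid the half-written-file problem is to write the new copy under a temporary name in the destination directory and then rename it over the target. The sketch below is an assumption about how that could look, not the fix that landed in the grub2 package; `replace_file`, the temporary-name scheme and the 0644 mode are all hypothetical. Also note that rename guarantees may be weaker on the FAT filesystem typically used for the ESP than on a POSIX filesystem.

```sh
#!/bin/sh
# Hypothetical helper: copy $1 to $2 without ever truncating $2 in place.
replace_file() {
    src=$1
    dst=$2
    tmp="${dst}.tmp.$$"
    # install creates a fresh file under the temporary name; the existing
    # destination is left untouched until the rename below.
    install -m 0644 "$src" "$tmp" || { rm -f "$tmp"; return 1; }
    # Rename within the same directory, so no data is copied at this point.
    mv -f "$tmp" "$dst"
}

# Usage (sketch, variable names taken from the quoted line):
# replace_file "${grub_binary}" "${grubdir}/grubx64.efi" || exit 1
```

If the copy is interrupted, only the temporary file is incomplete and the previous destination file is still whole.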

Contributor Author

Eeek! You know that cp by default does open(... O_TRUNC) right? So if interrupted you'll get a half-written file...

You are right. I'll fix that in the grub2 package.

@martinezjavier
Contributor Author

We should definitely test all this before merging by e.g. provisioning an old Silverblue and upgrading it to f31, then running grub-switch-to-blscfg. I can look into that. (Or did you already do that test?)

Yes, what I did to test was to get a Fedora Silverblue system, update it so that the GRUB binary in /usr/lib/ostree-boot/efi/EFI/fedora/grubx64.efi is newer than the one in /boot/efi/EFI/fedora/grubx64.efi, then execute ostree admin unlock to have a writable overlayfs mounted on /usr, then install the modified /usr/sbin/grub2-switch-to-blscfg and /etc/grub.d/15_ostree files and run:

$ grub2-switch-to-blscfg
$ grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg

and then verified that /boot/efi/EFI/fedora/grubx64.efi was updated with the EFI binary in /usr/lib/ostree-boot/efi/EFI/fedora/grubx64.efi, that the /etc/grub.d/15_ostree section in /boot/efi/EFI/fedora/grub.cfg was empty, and that after a reboot there were no longer duplicated entries.

But if you can do more extensive testing then that would be great. Thanks!

@jlebon
Member

jlebon commented Mar 31, 2020

Awesome, that definitely raises confidence levels! :)

I think the main metric I'm concerned with here is whether there's any possible way this can break booting for people. For example, is there in your mind any way that grub2-switch-to-blscfg succeeds and 15_ostree now exits early, but in fact e.g. the BLS entries aren't actually picked up by GRUB on boot?

Edit: so just while perusing https://fedoraproject.org/wiki/Changes/BootLoaderSpecByDefault, blsdir came up which could do that. I.e. one could redefine it to point somewhere other than /boot/loader/entries which is where OSTree lives and users would get no boot entries. I feel like that's a corner-case, though could grub2-switch-to-blscfg be able to detect that and refuse to drop the stamp file if there's a non-default blsdir grubenv defined?

@martinezjavier
Contributor Author

I feel like that's a corner-case, though could grub2-switch-to-blscfg be able to detect that and refuse to drop the stamp file if there's a non-default blsdir grubenv defined?

Yes, it's a corner case (the reason why that env var was introduced is because btrfs users may have the BLS snippets in a subvolume under /boot, and GRUB is only able to read the top level volume, so the path would be /subvolume_name/loader/entries).

But I agree to be extra cautious and only drop that file if the blsdir variable is not set. Still a user could set it afterwards, but I guess that's up to the user since it would only be set by grub2-mkconfig if the /boot filesystem is btrfs or zfs and /boot/loader/entries is in a subvolume (and that would be the correct thing to do or otherwise GRUB won't be able to find the BLS snippets).

The consequence of this would be that people using btrfs or zfs in /boot would have duplicated entries and will require manual intervention for them, but I guess that setup shouldn't be that common so is acceptable.
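
A rough sketch of that extra caution, assuming the GRUB environment is inspected through the `name=value` lines printed by `grub2-editenv - list`. The helper names `blsdir_is_set` and `maybe_drop_marker` are hypothetical and this is not the code that went into grub2-switch-to-blscfg; the helpers read the listing on stdin so they can be tried without touching a real grubenv.

```sh
#!/bin/sh
# Hypothetical helper: read a GRUB environment listing on stdin and
# succeed if a blsdir variable is present.
blsdir_is_set() {
    grep -q '^blsdir='
}

# Hypothetical helper: create the marker file $1 only when blsdir is unset.
# Returns non-zero (and creates nothing) when blsdir is set.
maybe_drop_marker() {
    if blsdir_is_set; then
        return 1
    fi
    touch "$1"
}

# Usage on a real system (sketch, marker path taken from the discussion):
# grub2-editenv - list | maybe_drop_marker /boot/grub2/.grub2-blscfg-supported
```

With this shape, systems that have blsdir set simply never get the marker, so 15_ostree keeps emitting its entries and the worst case is duplicated entries rather than none.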

is there in your mind any way that grub2-switch-to-blscfg succeeds and 15_ostree now exits early, but in fact e.g. the BLS entries aren't actually picked up by GRUB on boot?

I can't think of one besides the blsdir variable being set, as you mentioned. It may be that they aren't picked up due to a bug in the GRUB blscfg module, but the same could be true of a regression in parsing the menuentry commands.

@jlebon
Member

jlebon commented Apr 1, 2020

OK cool! Let me know when you've added that patch. As a last test before merging this, I'll run through the workflow on my own Silverblue laptop.

Probably what we'll want to do once this is merged and both ostree and grub2 are bumped in f31 is to update the common bugs entry and maybe send out an email on the devel mailing list or something.

At the same time as we talk about this to others, it's probably also worth raising awareness of bootloader=none since it skips the os-prober path entirely which is a waste of time for the great majority of users who don't dual-boot.

@martinezjavier
Contributor Author

OK cool! Let me know when you've added that patch. As a last test before merging this, I'll run through the workflow on my own Silverblue laptop.

I pushed the change you suggested to F31, F32 and Rawhide. I tested by doing the following on a Silverblue F32 VM:

# update grub2-switch-to-blscfg in an /usr overlay with the version in latest build

$ ostree admin unlock

$ rpm -qf /usr/sbin/grub2-switch-to-blscfg 
grub2-tools-2.04-9.fc32.x86_64

$ curl https://kojipkgs.fedoraproject.org//packages/grub2/2.04/12.fc32/x86_64/grub2-tools-2.04-12.fc32.x86_64.rpm -O

$ rpm2cpio grub2-tools-2.04-12.fc32.x86_64.rpm | cpio -idm

# check that there's no .grub2-blscfg-supported marker file

$ ls -a /boot/grub2/
.  ..  grubenv  themes

# check that the file isn't created if blsdir is set

$ grub2-editenv - set blsdir="foo/bar"

$ grub2-switch-to-blscfg

$ ls -a /boot/grub2/
.  ..  grubenv  themes

# check that it's created if blsdir isn't set

$ grub2-editenv - unset blsdir

$ grub2-switch-to-blscfg

$ ls -a /boot/grub2/
.  ..  .grub2-blscfg-supported  grubenv  themes

# re-generate the GRUB config file using the 15_ostree from this pull-request

$ curl https://raw.githubusercontent.com/ostreedev/ostree/786d9cceab27504944706060ea9754ce095aee37/src/boot/grub2/grub2-15_ostree > /etc/grub.d/15_ostree

$ grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg

# check that there are no menu entries created by the 15_ostree script

$ grep "menuentry .* {" /boot/efi/EFI/fedora/grub.cfg
menuentry 'System setup' $menuentry_id_option 'uefi-firmware' {
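
The final check above greps the whole file. A narrower variant, sketched below, looks only inside the section that grub2-mkconfig wraps in `### BEGIN ... ###` / `### END ... ###` markers for the 15_ostree script. The helper name `count_ostree_entries` is hypothetical and this was not part of the test run described here.

```sh
#!/bin/sh
# Hypothetical helper: print the number of menuentry lines found inside
# the 15_ostree section of the grub.cfg given as $1.
# Note: grep -c prints 0 but returns a non-zero status when nothing matches,
# so callers should compare the printed count rather than the exit status.
count_ostree_entries() {
    sed -n '/^### BEGIN .*\/15_ostree ###$/,/^### END .*\/15_ostree ###$/p' "$1" |
        grep -c '^[[:space:]]*menuentry '
}

# Usage (sketch):
# count_ostree_entries /boot/efi/EFI/fedora/grub.cfg
```

A count of zero means the 15_ostree script emitted no entries, regardless of what other grub.d scripts added.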

Probably what we'll want to do once this is merged and both ostree and grub2 are bumped in f31 is to update the common bugs entry and maybe send out an email on the devel mailing list or something

Yes, and for x86_64 with legacy BIOS we can also add a note that the users could do the following if they want to get rid of the duplicated menu entries:

$ block_device=$(lsblk -spnlo name $(grub2-probe --target=device /boot/grub2) | tail -n1)
$ grub2-install $block_device
$ touch /boot/grub2/.grub2-blscfg-supported
$ grub2-mkconfig -o /boot/grub2/grub.cfg

The reason why I didn't suggest making the grub2-switch-to-blscfg script execute the first three commands is that this would replace the GRUB that was installed in the gap between the end of the MBR and the start of the first partition, and also the bootstrap code area in the MBR.

And they may not want to do that if, for example, they want to use the GRUB from another distro. In this case they won't have duplicated entries, since the other distro's GRUB would not have support for parsing the BLS files, so they will only have the entries that were added by the 15_ostree script.

At the same time as we talk about this to others, it's probably also worth raising awareness of bootloader=none since it skips the os-prober path entirely which is a waste of time for the great majority of users who don't dual-boot.

Indeed.

@martinezjavier
Contributor Author

Yes, and for x86_64 with legacy BIOS we can also add a note that the users could do the following if

@jlebon and we could also add notes for ppc64le with OF and OPAL. I know that @sharkcz tests Silverblue on ppc64le so he may help testing the instructions.

I think that for OF the instructions are the same as for x86_64 with legacy BIOS, and for OPAL we just need to check if the Petitboot version on the machine is new enough to support parsing BLS snippets and drop the /boot/grub2/.grub2-blscfg-supported marker file in that case. The latter could even be done by the grub2-switch-to-blscfg script, now that I think about it.

@cgwalters
Member

/approve

# See: https://src.fedoraproject.org/rpms/grub2/c/7c2bab5e98d
. /etc/default/grub
if test -f /boot/grub2/.grub2-blscfg-supported && \
test ${GRUB_ENABLE_BLSCFG} = "true"; then
Member

@cgwalters cgwalters Apr 6, 2020

Suggested change
-   test ${GRUB_ENABLE_BLSCFG} = "true"; then
+   test "${GRUB_ENABLE_BLSCFG}" = "true"; then

to avoid a syntax error and crash if the value isn't set. (Yes, shell script is an awful programming language)
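
A small demonstration of the difference, as a sketch. The wrapper names `unquoted_status` and `quoted_status` are hypothetical; each prints the exit status of test, where 0 means equal, 1 means not equal, and a value greater than 1 means test itself reported an error.

```sh
#!/bin/sh
# Hypothetical wrapper: the original, unquoted form of the comparison.
unquoted_status() {
    # Intentionally unquoted to reproduce the problem (shellcheck SC2086).
    # stderr is silenced so only the status is shown.
    test ${GRUB_ENABLE_BLSCFG} = "true" 2>/dev/null
    echo $?
}

# Hypothetical wrapper: the suggested, quoted form.
quoted_status() {
    test "${GRUB_ENABLE_BLSCFG}" = "true"
    echo $?
}

unset GRUB_ENABLE_BLSCFG
unquoted_status   # the variable expands to nothing, so test only sees: = true
quoted_status     # test sees an empty string compared against "true"
```

With the variable unset, the unquoted form makes test complain about its arguments, while the quoted form just evaluates to false. Inside an `if` condition both end up on the false branch, but only the quoted form does so without an error message.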

Contributor Author

to avoid a syntax error and crash if the value isn't set. (Yes, shell script is an awful programming language)

Indeed, I usually run shellcheck to catch these but forgot this time. Pushed that change and also rebased the branch. Thanks a lot.

@cgwalters
Member

/lgtm

@openshift-ci-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, martinezjavier

@openshift-merge-robot openshift-merge-robot merged commit 4a57204 into ostreedev:master Apr 7, 2020
@jlebon
Member

jlebon commented Apr 7, 2020

Sorry for the delay in testing this myself too. Just did that now on my main FSB machine and it's working well!

One gotcha worth mentioning is that I initially also did bootloader=none in one shot with switching over to blscfg, but of course then the previous 15_ostree-emitted stale entries weren't actually being erased. So if one wants to do this, it has to be after at least one deployment & reboot.

Thanks a lot Javier!

@martinezjavier
Contributor Author

martinezjavier commented Apr 16, 2020

Sorry for the delay in testing this myself too. Just did that now on my main FSB machine and it's working well!

@jlebon Thanks a lot for testing, I'm glad that it's working well for you. I've also fixed the issue pointed out by @cgwalters about using cp.
