Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

grub2: Exit gracefully if the configuration has BLS enabled #1929

Conversation

martinezjavier
Copy link
Contributor

Since Fedora 30 grub2 has support to populate its menu entries from the
BootLoaderSpec fragments in /boot/loader/entries, so there's no need to
generate menu entries anymore using the /etc/grub.d/15_ostree script.

But since ostree doesn't update the bootloader, it may be that the grub2
installed is an old one that doesn't have BLS support.

For new installs, GRUB_ENABLE_BLSCFG=true is set in /etc/default/grub to
tell the /etc/grub.d/10_linux script if a blscfg command has to be added
to the generated grub2 config file.

So check if BLS is enabled in /etc/default/grub and only add the entries
if that's not the case. Otherwise the menu entries will be duplicated.

@rh-atomic-bot
Copy link

Can one of the admins verify this patch?
I understand the following commands:

  • bot, add author to whitelist
  • bot, test pull request
  • bot, test pull request once

@jlebon
Copy link
Member

jlebon commented Sep 26, 2019

bot, add author to whitelist

@jlebon
Copy link
Member

jlebon commented Sep 26, 2019

Can you add a link to https://bugzilla.redhat.com/show_bug.cgi?id=1751272#c27 in the commit message?

So there is one major caveat with this approach, which is that users upgrading from older versions have a spectacular chance of shooting themselves in the foot by setting GRUB_ENABLE_BLSCFG=true and ending up with no GRUB entries at all.

Which really, thinking on this more, is probably what people will try in an attempt to transition over to BLS-only. So I'm wondering if there's a way to catch this. E.g. is there an easy way for the script to check if the installed grub2 is new enough?

Since Fedora 30 grub2 has support to populate its menu entries from the
BootLoaderSpec fragments in /boot/loader/entries, so there's no need to
generate menu entries anymore using the /etc/grub.d/15_ostree script.

But since ostree doesn't update the bootloader, it may be that the grub2
installed is an old one that doesn't have BLS support.

For new installs, GRUB_ENABLE_BLSCFG=true is set in /etc/default/grub to
tell the /etc/grub.d/10_linux script if a blscfg command has to be added
to the generated grub2 config file.

So check if BLS is enabled in /etc/default/grub and only add the entries
if that's not the case. Otherwise the menu entries will be duplicated.

The approach has the drawback that if a user sets GRUB_ENABLE_BLSCFG=true
in /etc/default/grub without updating grub2, they will get an empty menu.
Since there won't be any entries created by the 30_ostree script and the
blscfg command won't work on the older grub2.

Unfortunately there is no way to know if the installed grub2 already has
BLS support or not.

Related: https://bugzilla.redhat.com/show_bug.cgi?id=1751272#c27
@martinezjavier
Copy link
Contributor Author

Can you add a link to https://bugzilla.redhat.com/show_bug.cgi?id=1751272#c27 in the commit message?

Done.

Which really, thinking on this more, is probably what people will try in an attempt to transition over to BLS-only. So I'm wondering if there's a way to catch this. E.g. is there an easy way for the script to check if the installed grub2 is new enough?

I also added the mentioned drawback in the commit message now. Unfortunately I don't think there's a way to know if the installed grub2 already has BLS support or not.

@cgwalters
Copy link
Member

E.g. is there an easy way for the script to check if the installed grub2 is new enough?

This is part of #1873 (comment)

I guess what may work on top of this is...for the EFI case at least, we could grab the checksums from the "Gold" Fedora releases that most people used to install, and compare vs that? Or...would timestamps work? For the BIOS case I guess we'd need to scrape the MBR and checksum that?

Dunno.

I guess I'd lean towards merging this as is and doing any additional work as a followup. We know a lot of people hit double entries; instinct says hopefully the risk of people turning on the flag is small.

@jlebon
Copy link
Member

jlebon commented Sep 26, 2019

Yeah, I'm leaning more towards merging this as well even with the caveat.

One idea I had: we could improve on this by doing something like the following:

  1. teach grub2-switch-to-blscfg to handle OSTree systems: updates the grub2 version, adds GRUB_ENABLE_BLSCFG=true, and drops a .grub2-blscfg-migrated somewhere.
  2. teach Anaconda to drop that file too.
  3. have 15_ostree exit gracefully only if GRUB_ENABLE_BLSCFG=true and the .grub2-blscfg-migrated is present.

I think we'll need (1) anyway if we want upgrading users to be able to switch to BLS, right?

But anyway, let's get this in for now and fix the common case. Tested working in an FSB31 VM.
@rh-atomic-bot r+ 94f60af

@rh-atomic-bot
Copy link

⚡ Test exempted: merge already tested.

@cgwalters
Copy link
Member

@cgwalters
Copy link
Member

Right...so following up on #1929 (comment) - we'd go back to having double entries though for people upgrading from older releases who haven't run that command.

But I think that's fine...if we document it in the wiki it's OK to start. Double entries is way less bad than none.

Does that script update the bootloader too?

@martinezjavier
Copy link
Contributor Author

Does that script update the bootloader too?

It doesn't because was thought for traditional Fedora where for EFI install the grub2 binary is updated (by the grub2-efi RPM copying the file in the ESP).

But we could make the script to copy the grub2 EFI binary in the ESP if /usr/lib/ostree-boot directory is present for example, so it's also updated for OSTree-based distros.

For legacy BIOS, the script currently only copies the blscfg module that's in /usr/lib/grub/i386-pc/blscfg.mod. Because updating the MBR and GRUB core.img was thought to be riskier (the script also copies the old grub.cfg as a grub.cfg.bk file so a user could load it with GRUB's configfile command if something goes wrong with the new BLS enabled grub.cfg).

Since that file should be also in the OSTree, there's nothing additional to be done for legacy BIOS.

@martinezjavier martinezjavier deleted the prevent-duplicated-entries branch October 7, 2019 12:01
@jlebon
Copy link
Member

jlebon commented Oct 7, 2019

So looking at the thread at https://discussion.fedoraproject.org/t/boot-entries-gone-after-upgrade/8026/22, it looks like people who used f30 install media with an older grub2 that can't read from the /boot/loader symlink will hit the "no boot entries" issue after updating to f31. And I'm guessing a lot more people are going to hit this once f31 hits GA. My suggestion now is to:

  1. revert this patch (probably just in dist-git for now?)
  2. document double boot entries (and possibly just a manual workaround by setting GRUB_ENABLE_BLSCFG=false)
  3. enhance grub2-switch-to-blscfg, OSTree, and Anaconda as mentioned above
  4. publicize that grub2-switch-to-blscfg on FSB now works

Should do at least do (1) and (2) before GA, and we could do (3) and (4) later. Does that make sense? Will cross-post in that thread too.

@martinezjavier
Copy link
Contributor Author

Should do at least do (1) and (2) before GA, and we could do (3) and (4) later. Does that make sense? Will cross-post in that thread too.

Agreed that we should do this ASAP, it's much better to have duplicated entries than no entries at all. I think is a good idea to only do the revert in dist-git for now.

@jlebon
Copy link
Member

jlebon commented Oct 7, 2019

@jlebon
Copy link
Member

jlebon commented Oct 7, 2019

jlebon added a commit to jlebon/ostree that referenced this pull request Oct 15, 2019
This reverts commit 985a141.

Reverting for now due to some users experiencing no boot entries after
upgrading. See for background:

ostreedev#1929
https://discussion.fedoraproject.org/t/boot-entries-gone-after-upgrade/8026
@jlebon jlebon mentioned this pull request Oct 24, 2019
@AdamWill
Copy link

So is the double-entry bug now resolved or still present? Need to know whether to keep the entry in the common bugs page with the release coming tomorrow. Thanks.

@martinezjavier
Copy link
Contributor Author

So is the double-entry bug now resolved or still present? Need to know whether to keep the entry in the common bugs page with the release coming tomorrow. Thanks.

Unfortunately the issue is still present so the common bug entry should be kept.

We wanted to fix it by only relying on the entries populated by the grub2 blscfg module and avoid ostree-grub2 to generate any entries, but then found that this lead to some users not having any entry at all. Since they were using an old grub2 with no BLS support or bugs and ostree doesn't update the bootloader as a part of an ostree update.

d4s pushed a commit to d4s/ostree that referenced this pull request Nov 4, 2019
Since Fedora 30 grub2 has support to populate its menu entries from the
BootLoaderSpec fragments in /boot/loader/entries, so there's no need to
generate menu entries anymore using the /etc/grub.d/15_ostree script.

But since ostree doesn't update the bootloader, it may be that the grub2
installed is an old one that doesn't have BLS support.

For new installs, GRUB_ENABLE_BLSCFG=true is set in /etc/default/grub to
tell the /etc/grub.d/10_linux script if a blscfg command has to be added
to the generated grub2 config file.

So check if BLS is enabled in /etc/default/grub and only add the entries
if that's not the case. Otherwise the menu entries will be duplicated.

The approach has the drawback that if a user sets GRUB_ENABLE_BLSCFG=true
in /etc/default/grub without updating grub2, they will get an empty menu.
Since there won't be any entries created by the 30_ostree script and the
blscfg command won't work on the older grub2.

Unfortunately there is no way to know if the installed grub2 already has
BLS support or not.

Related: https://bugzilla.redhat.com/show_bug.cgi?id=1751272#c27

Closes: ostreedev#1929
Approved by: jlebon
@bam80
Copy link

bam80 commented Mar 22, 2020

Just wonder is there any milestone we should keep the duplicated ostree-grub2 entries till?

@jlebon
Copy link
Member

jlebon commented Mar 23, 2020

@martinezjavier Have we implemented something like described in #1929 (comment) yet? (Specifically step 1).

@martinezjavier
Copy link
Contributor Author

@martinezjavier Have we implemented something like described in #1929 (comment) yet? (Specifically step 1).

@jlebon no, this fell through the cracks... sorry about that. But step 1 should be trivial so I can take a look to it, probably tomorrow.

@martinezjavier
Copy link
Contributor Author

@jlebon I did step 1 in the Fedora grub2 package (F32 and Rawhide) and also proposed PR #2044 for step 3. I can also take a look to do step 2 in Anaconda, but at least users should be able to workaround this by running grub2-switch-to-blscfg in F32 onwards.

@martinezjavier
Copy link
Contributor Author

Something that I forgot to mention is that only changed the grub2-switch-to-blscfg script to update the GRUB bootloader and drop the hidden .grub2-blscfg-supported file for EFI machines.

This is because updating GRUB for legacy BIOS machine is more error prone (e.g: users may be using the GRUB from another distro, don't want their MBR bootstrap code area to be overwritten, etc).

So I would prefer for legacy BIOS users to do this explicitly by running grub2-install and creating the /boot/grub2/.grub2-blscfg-supported file themselves.

@jlebon
Copy link
Member

jlebon commented Mar 26, 2020

Hmm, I wonder if there's some marker Anaconda leaves that we can use to figure out when the system was installed that we could use as a proxy for "GRUB is new enough". I don't think so, but worth asking. (I guess something analogous to the CoreOS aleph file).

@martinezjavier
Copy link
Contributor Author

Hmm, I wonder if there's some marker Anaconda leaves that we can use to figure out when the system was installed that we could use as a proxy for "GRUB is new enough". I don't think so, but worth asking. (I guess something analogous to the CoreOS aleph file).

AFAIK there isn't one, but I'll ask tomorrow to the Anaconda folks in there case there is.

@martinezjavier
Copy link
Contributor Author

AFAIK there isn't one, but I'll ask tomorrow to the Anaconda folks in there case there is.

@jlebon I talked with the Anaconda folks and they said that there isn't a marker but that the installation logs are kept in the installed system under /var/logs/anaconda, so in theory we should be able to figure out this by looking at what grub2 package version was installed.

The advantage would be that users that have a new enough GRUB won't need to execute grub2-switch-to-blscfg explicitly, although I don't really like the option of looking at Anaconda logs since I think that is leaking too much of Fedora installation specifics into the 15_ostree script.

So I prefer your suggestion of having an /boot/grub2/.grub2-blscfg-supported marker which is a generic interface for other distros to do the same (i.e: Endless OS also users ostree and their GRUB has the blscfg patches, so they could also just drop that file to denote that their bootloader supports parsing BLS snippets).

And also this wouldn't help for the legacy BIOS case either, because the place where GRUB is installed (the gap between the MBR and the start of the first partition) is shared between all distros in a multi OS setup.

So for example if a user first installs Silverblue and then another distro in a separate partition, the second distro will overwrite the Silverblue GRUB that was installed with its own GRUB that wouldn't have support to parse BLS snippets. So then the grub2 package version mentioned in the Anaconda logs is no longer relevant.

That's why I mentioned before that only for EFI the /boot/grub2/.grub2-blscfg-supported file is dropped, since in the case of EFI each OS has its own GRUB binary in the ESP, so the one in /boot/efi/EFI/fedora/ shouldn't be overwritten.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants