Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rework grub2 backend for new fedora grub model #717

Closed
cgwalters opened this issue Mar 3, 2017 · 31 comments
Closed

rework grub2 backend for new fedora grub model #717

cgwalters opened this issue Mar 3, 2017 · 31 comments

Comments

@cgwalters
Copy link
Member

There's some plans to change how grub2 works in Fedora to better align with BLS among other things.
Among other things this should allow us to finally fix the grub2 config updates aren't atomic on EFI problem.

TODO: link to plans

@vathpela also points out that we could try testing for /sys/firmware/efi instead of the EFI config path potentially

@cgwalters
Copy link
Member Author

See also #706

@cmurf
Copy link

cmurf commented Jul 18, 2017

Not knowing the basics of the plan yet, I'll stick my neck out:

  • I'm confident libostree can own the GRUB bls.mod code, near as I can no one is using it.

  • Also libostree can legitimately gut menu entries from grub.cfg so that it's a static configuration file that should not ever need changing. The comment giving dire warning can be more explanatory, so that curious/invasive users can follow the logic with a concise trail of breadcrumbs rather than trying to track down separate documentation.

  • And then GRUB + reworked bls.mod code, can discover and dynamically merge grub.cfg and bls snippets into a menu of options.

  • /etc/default/grub is annoying in that it's a configuration file GRUB proper doesn't use, instead it requires grub2-mkconfig to pick up those changes. Goofy. And in a libostree bls world it already is ignored. So consider that killed, but it needs some kind of replacement.

  • 'ostree admin instutil set-kargs' writes out arguments where? Seems obscure as a command and what files are modified. Why not have this command manage a bootloader.kargs file found along with the BLS snippets? That way the user can directly modify that file old school style, or use 'ostree admin' to manage it. All new BLS snippets generated will pick up the current state of that file. Old snippets remain unchanged. Users can safely edit or copy any of the snippets.

  • I don't really care where the bls snippets go but I'm inclined to believe they should be in the same location no matter the file system, firmware (UEFI or BIOS), or arch. Standardization is nice.

Now, GRUB + updated bls.mod + static grub.cfg + static bls snippets, can be discovered and dynamically generate the grub menu. And bootloader config changes can always be atomic.

@cgwalters
Copy link
Member Author

cgwalters commented Jul 18, 2017

Currently ostree is designed around /boot/loader, i.e. https://www.freedesktop.org/wiki/Specifications/BootLoaderSpec/. That's what instutil set-kargs writes.

@cmurf
Copy link

cmurf commented Jul 19, 2017

On journaled file systems, sync() only guarantees data and journal are committed to stable media, it does not guarantee the journal contents are flushed to fs metadata. This is relevant for bootloaders, none of which have the ability to read/replay the journal. So if the fs metadata is not yet updated, you can get boot failure in certain situations (write new, delete current, rename new to current).

Long version (with links for long long long versions trying to figure all of this out)
rhboot/grubby#25

Right now ostree is not susceptible because it requires /boot on a separate fs volume. But if that layout were to change (supporting /boot as a directory or Btrfs subvolume) it'd be susceptible but chances are it would not be unbootable as in the bug cited in the issue above, but rather the menu would only show the current deployment rather than the new deployment. As soon as the kernel mounts the file system, it would replay the journal and update fs metadata, so the next reboot would show the newly added BLS entry. But to be certain of avoiding this problem: guarantee umount or remount ro (which may not be in ostree's domain), or use fifreeze.

@cgwalters
Copy link
Member Author

Yep, this is tracked in #876

@cgwalters
Copy link
Member Author

See also http://marc.info/?l=linux-fsdevel&m=150179189820971&w=2

Basically I think the journaling problem may still require more sophistication.

@martinezjavier
Copy link
Contributor

There's some plans to change how grub2 works in Fedora to better align with BLS among other things.
Among other things this should allow us to finally fix the grub2 config updates aren't atomic on EFI problem.

@cgwalters the BLS support on grub2 was improved for Fedora 28. I believe this better aligns to what's needed by ostree. Basically you can have a static grub.cfg with no menu entries, and theblscfg command parses the BLS fragment to generate the boot menu entries.

This means that at least on non-EFI machines, grub2 should be able to use directly the BLS fragments managed by ostree. I've tried the following on a Fedora28 Atomic Workstation image (plus some fixes that are now in the grub2 Fedora package):

$ atomic host unlock
$ chmod -x /etc/grub.d/15_ostree
$ grub2-switch-to-blscfg

And on reboot, grub2 populates the boot options from the ostree BLS fragments in /boot/loader/entries. That's without the need to use neither the 15_ostree grub.d script nor the ostree admin instutil grub2-generate command.

The grub2-switch-to-blscfg is a script that's used to convert a normal grub2 configuration to BLS. This should only be called once (i.e: on grubby remove).

My understanding is that this should be safe w.r.t the journal not flushed issue, since libostree calls to the FIFREEZE and FITHAW ioctls on deploy.

Now, on the EFI case is a little more tricky. Since the BLS fragments are expected to be in the ESP, which is vfat so we can't do the /boot/loader symlink to the current boot version.

But grub2 uses a blsdir env var to override the default BLS entries directory to be scanned in the ESP. So what we think that may work, is to deploy the BLS fragments in the ESP for EFI, i.e:

/boot/efi/EFI/fedora/loader.1/entries/

And have two symlinks in /boot, i.e:

/boot/loader -> loader.1
/boot/loader.1 -> /boot/efi/EFI/fedora/loader.1

And on deploy, the blsdir var in /boot/efi/EFI/fedora/grubenv will be set to the current boot version entries dir, i.e:

blsdir=/EFI/fedora/loader.1/entries

So basically on EFI the blsdir is used as the indirection level for the current boot version, instead of the symbolic link. But I would like to know your opinion, since you may have a better idea on how this can be done.

@cgwalters
Copy link
Member Author

FWIW my ideal on this is actually that grub2*.rpm were removed from the host images entirely - instead, Anaconda would use its copy, write the data into the /boot partition, and ostree would never touch it. There's some nits on this like the fact that editions/spins can have their own themed splash screens etc.; I dunno if I really care much though. If we did care we could tweak Anaconda to look in the target root for such data.

@cgwalters
Copy link
Member Author

I should have said first, thanks a ton for looking at this and testing it! I'll see if I can play with this some, if we're going to productize it it'll have to work through Anaconda too. Which...is probably going to require some install-time option? How about adding bootloader --bls to kickstart?

So basically on EFI the blsdir is used as the indirection level for the current boot version,

But we still don't have atomic file replacement on VFAT right? That isn't a blocker per se, since it's the status quo today, but it kind of sucks to rework things and not solve this.

Did you have any thoughts on my suggestion to have a grub ⇔ kernel-installer protocol that adds layers of redundancy and avoids relying fully on the filesystem to synchronize data?

Maybe it's as simple as having grub read in order:

/boot/efi/EFI/fedora/grubenv.new
/boot/efi/EFI/fedora/grubenv

Then ostree does:

chdir(/boot/efi/EFI/fedora/)
unlink(grubenv.new)
f = open(grubenv.new, O_WRONLY)
write_new_grubenv(f)
fsync(f)
close(f)
unlink(grubenv)
rename(grubenv.new, grubenv)

?

@cmurf
Copy link

cmurf commented Apr 25, 2018 via email

@vathpela
Copy link

But we still don't have atomic file replacement on VFAT right? That isn't a blocker per se, since it's the status quo today, but it kind of sucks to rework things and not solve this.

I'm not seeing why not, at least if the file is 512B and the file size isn't changing.

@cmurf
Copy link

cmurf commented Apr 25, 2018 via email

@martinezjavier
Copy link
Contributor

martinezjavier commented Apr 26, 2018

FWIW my ideal on this is actually that grub2*.rpm were removed from the host images entirely - instead, Anaconda would use its copy, write the data into the /boot partition, and ostree would never touch it.

@cgwalters that would certainly be possible with BLS + a static grub2 config file, since there won't be need for any grub2 utility, script or configuration (besides grub.cfg and grubenv).

But do you mean not even having the grub2-efi-x64 package? I now noticed that ostree doesn't update grub2 like we do on the non-atomic Fedora version:

$ rpm -qa | grep grub2-efi-x64
grub2-efi-x64-2.02-26.fc28.x86_64

$ md5sum /boot/efi/EFI/fedora/grubx64.efi 
be1b3601da78bab1483690276c6e47f1  /boot/efi/EFI/fedora/grubx64.efi

$ atomic host upgrade && systemctl reboot

$ rpm -qa | grep grub2-efi-x64
grub2-efi-x64-2.02-34.fc28.x86_64

$ md5sum /boot/efi/EFI/fedora/grubx64.efi 
be1b3601da78bab1483690276c6e47f1  /boot/efi/EFI/fedora/grubx64.efi

@martinezjavier
Copy link
Contributor

martinezjavier commented Apr 26, 2018

I should have said first, thanks a ton for looking at this and testing it! I'll see if I can play with this some, if we're going to productize it it'll have to work through Anaconda too. Which...is probably going to require some install-time option? How about adding bootloader --bls to kickstart?

Yes, the plan is to do something like that.

About the grubenv change not being safe on vfat, @vathpela already commented on this. Please let us if you think that's sensible what I mentioned in #717 (comment), so I could give it a try and cook a patch for ostree.

@cgwalters
Copy link
Member Author

cgwalters commented Apr 26, 2018

But do you mean not even having the grub2-efi-x64 package?

Exactly; it'd be in Anaconda, not in the target tree/image.

I now noticed that ostree doesn't update grub2 like we do on the non-atomic Fedora version:

Yep, because I don't know how to make it transactional. See: coreos/rpm-ostree#969

There's also mailing list discussion about this, but the TL;DR is we today copy e.g. grubx64.efi into /usr/lib/ostree-boot/grubx64.efi, which is copied by Anaconda at install time. Then we never touch it thereafter.

In the model I'm proposing, grub2-efi-x64.rpm would live in Anaconda (inside the squashfs) and it'd be installed from there into the target's /boot. Now the $64k question - is there an rpmdb entry for it or not? I'd say there isn't, but maybe we drop a text file like /boot/efi/EFI/fedora/grub.version or something.

@cgwalters
Copy link
Member Author

I'm not seeing why not, at least if the file is 512B and the file size isn't changing.

I am not a lowlevel storage expert; I'd certainly believe that a small write is atomic today with linux+vfat. On this subject I found the sqlite docs useful. My worry (perhaps completely unfounded) is that some level in the stack would defeat this.

Anyways, I'm vacillating a bit here myself between whether to try to change+fix everything at once or polish what's landed so far in f28 grub.

@cgwalters
Copy link
Member Author

dynamically mount and unmount the ESP at /efi which keeps it safer

With #1503 we could move libostree to only touch /boot (and /boot/efi) at "deployment finalization" time (i.e. shutdown), and this would only be if we're changing the kernel, which would probably help here.

Having straced both grubby and grub-mkconfig, there is no fsync() or sync() at all,

Hi Chris - this has been rather extensively covered elsewhere, not sure we need to rehash it here.

@martinezjavier
Copy link
Contributor

Exactly; it'd be in Anaconda, not in the target tree/image.

Right, that's what I understood but just wanted to be sure.

Yep, because I don't know how to make it transactional. See: coreos/rpm-ostree#969

There's also mailing list discussion about this, but the TL;DR is we today copy e.g. grubx64.efi into /usr/lib/ostree-boot/grubx64.efi, which is copied by Anaconda at install time. Then we never touch it thereafter.

I see, then everything that's installed by packages in /boot gets translated to /usr/lib/ostree-boot by the the rpm-ostree unpacker. So that means shim, fwup, etc are never updated either.

You mentioned in one of the threads that for the server case, frequent re-provisioning is quite common which makes this less of an issue. But not everyone will do that and it's definitely an issue for the Workstation case (I for example only do a fresh install every few years when I change my laptop).

At least for EFI (which I guess is the most fragile case due the ESP using vfat), we could make the upgrade more robust. For example by installing the new shim as /boot/efi/EFI/fedora/shimx86-new.efi and add a Boot0002 variable that points to that EFI binary. And then set BootNext and only modify BootOrder if the system has booted correctly with BootCurrent == Boot0002.

And maybe shim could also be changed to check a digest for a grubx64-new.efi and fallback to grubx64.efi if the checksum doesn't match.

Probably @vathpela has better ideas, but what I'm trying to say is that I think we could find a way to make this work reliable.

@martinezjavier
Copy link
Contributor

Anyways, I'm vacillating a bit here myself between whether to try to change+fix everything at once or polish what's landed so far in f28 grub.

Right, I think these are orthogonal issues that could be tackled separately.

If anything, the BLS support should improve the current situation since it will avoid the grub2-mkconfig -o grub.cfg.new && cp grub.cfg grub.cfg.old && copy grub.cfg.new grub.cfg that happens on the ESP and would just write to a file with fixed size.

@cgwalters
Copy link
Member Author

Probably @vathpela has better ideas, but what I'm trying to say is that I think we could find a way to make this work reliable.

That'd be awesome. I know it adds a lot of complexity. In the big picture, none of this really matters for "classic" systems as nothing about it is transactional, so there isn't really pressure to fix the bootloader part. But since ostree makes /usr + bootloader configs transactional, it forces us to think about "the other stuff".

So that means shim, fwup, etc are never updated either.

Yep.

But not everyone will do that and it's definitely an issue for the Workstation case (I for example only do a fresh install every few years when I change my laptop).

Yeah. I definitely have a worry that we may run into an issue where we start installing new grub config files that old grub doesn't know how to parse.

The thing I wanted to avoid as much as possible is a UI screen that effectively says "Don't turn off the power to your computer now, or it may be toast". On a lot of classic package managed (apt/yum/etc) systems, they simply don't even say that, even though it's the case. If we don't make the bootloader installation transactional, then we need to have such as screen and think about when it's run.

@martinezjavier
Copy link
Contributor

Yeah. I definitely have a worry that we may run into an issue where we start installing new grub config files that old grub doesn't know how to parse.

Agreed, although I was referring more to having a first and second stage boot-loader that are many years old.

The thing I wanted to avoid as much as possible is a UI screen that effectively says "Don't turn off the power to your computer now, or it may be toast". On a lot of classic package managed (apt/yum/etc) systems, they simply don't even say that, even though it's the case. If we don't make the bootloader installation transactional, then we need to have such as screen and think about when it's run.

Yes, I understand the concern. As mentioned, I think it would be possible by using EFI variables and making shim smarter. And that's something that will be an improvement also for package managed distros since as you said shim or grub could contain a regression that leads to an unbootable system (although I don't remember having that issue and have been using Fedora for a long time).

But as said, I think that BLS and how to make upgrading the components in the boot chain more reliable are separate issues, and we shouldn't attempt to solve all the problems at once.

@martinezjavier
Copy link
Contributor

But as said, I think that BLS and how to make upgrading the components in the boot chain more reliable are separate issues, and we shouldn't attempt to solve all the problems at once.

I've filled #1649 to discuss the issue of the ESP never being updated by ostree.

Let's keep this only to discuss the grub2's BLS support and how it can be used from ostree.

@martinezjavier
Copy link
Contributor

Currently ostree is designed around /boot/loader, i.e. https://www.freedesktop.org/wiki/Specifications/BootLoaderSpec/. That's what instutil set-kargs writes.

From recent discussion in fedora-devel mailing list, the BLS snippets are going to be in /boot/loader/entries for both EFI and non-EFI systems. This means that the BLS snippets created by ostree could be used and it should automatically work. So by using grub2 BLS support there won't be any need to touch the grub.cfg on the ESP anymore.

The only remaining issue is that we want grub2 to sort the boot menu entries using the BLS snippets filenames, but ostree uses ostree-$ID-$VARIANT_ID-$index.conf instead of ostree-$ID-$VARIANT_ID-$version.conf. Which means that the boot menu entries will be sorted in the reverse order.

@cgwalters
Copy link
Member Author

The only remaining issue is that we want grub2 to sort the boot menu entries using the BLS snippets filenames, but ostree uses ostree-$ID-$VARIANT_ID-$index.conf instead of ostree-$ID-$VARIANT_ID-$version.conf. Which means that the boot menu entries will be sorted in the reverse order.

I'm OK changing this, but discussion also made it sound like we'd sort by version if it was available?

@martinezjavier
Copy link
Contributor

@cgwalters currently yes, but that's only true for grub2. Petitboot for example only uses the BLS filename to sort the entries, so it won't work on ppc64le.

So the discussion is if we want to do the same for grub2 to be consistent with what the other bootloaders do.

I see that in most places _ostree_sysroot_read_boot_loader_configs() is used to get the BLS files and this function already returns them sorted by the version field. The only place where the index trailing number is used AFAICT is in the ostree-grub-generator script that lists the BLS files to populate the grub config file.

So the actual change should just be the following (modulo fixing the tests to take this into account):

diff --git a/src/boot/grub2/ostree-grub-generator b/src/boot/grub2/ostree-grub-generator
index 97ef4d69fbc..5b7ea1ab933 100644
--- a/src/boot/grub2/ostree-grub-generator
+++ b/src/boot/grub2/ostree-grub-generator
@@ -66,7 +66,7 @@ populate_menu()
     else
         boot_prefix="${OSTREE_BOOT_PARTITION}"
     fi
-    for config in $(ls $entries_path/*.conf); do
+    for config in $(ls -v -r $entries_path/*.conf); do
         read_config ${config}
         menu="${menu}menuentry '${title}' {\n"
         menu="${menu}\t linux ${boot_prefix}${linux} ${options}\n"
diff --git a/src/libostree/ostree-sysroot-deploy.c b/src/libostree/ostree-sysroot-deploy.c
index 08bf4c0d0a1..17e25fef736 100644
--- a/src/libostree/ostree-sysroot-deploy.c
+++ b/src/libostree/ostree-sysroot-deploy.c
@@ -1629,7 +1629,7 @@ install_deployment_kernel (OstreeSysroot   *sysroot,
   g_autofree char *bootcsumdir = g_strdup_printf ("ostree/%s-%s", osname, bootcsum);
   g_autofree char *bootconfdir = g_strdup_printf ("loader.%d/entries", new_bootversion);
   g_autofree char *bootconf_name = g_strdup_printf ("ostree-%s-%d.conf", osname,
-                                   ostree_deployment_get_index (deployment));
+                                   n_deployments - ostree_deployment_get_index (deployment));

Now, this will only work if there are deployments for a single OS, but if you do:

$ ostree admin deploy --os=testos testos:testos/buildmaster/x86_64-runtime
$ ostree admin deploy --os=otheros testos:testos/buildmaster/x86_64-runtime

Then you will a ostree-testos-1.conf and ostree-otheros-2.conf, so these won't be sorted correctly. One option is to have the version number before the os name, i.e: ostree-1-testos.conf and ostree-2-otheros.conf.

@martinezjavier
Copy link
Contributor

BTW, when looking at this I found a bug in ostree-grub-generator, it's uses ls to list the BLS snippets and populate the grub.cfg but ls by default sorts alphabetically. So this will fail if you have more than 10 deployments, i.e:

$ ls -1 /boot/loader/entries/
ostree-fedora-workstation-0.conf
ostree-fedora-workstation-10.conf
ostree-fedora-workstation-1.conf
...

So instead it should use the -v option to sort by version:

$ ls -1 -v /boot/loader/entries/
ostree-fedora-workstation-0.conf
ostree-fedora-workstation-1.conf
...
ostree-fedora-workstation-10.conf

I'll propose a PR to fix that.

@cgwalters
Copy link
Member Author

One option is to have the version number before the os name, i.e: ostree-1-testos.conf and ostree-2-otheros.conf

That makes sense to me!

@martinezjavier
Copy link
Contributor

That makes sense to me!

Great, I'll propose that change then. Thanks a lot for your feedback!

@martinezjavier
Copy link
Contributor

@cgwalters I've created PR #1654 that changes the BLS filenames as discussed.

@vtolstov
Copy link

vtolstov commented Jun 30, 2019

does this issue relevant now after merging BLS support to ostree? (as i'm read before it already present in ostree now)

@cgwalters
Copy link
Member Author

Closing this in favor of #1951 - most of the other things here ended up being implemented for Fedora CoreOS.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants