Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

35coreos-ignition: skip reboot if changed kargs match current boot #1409

Open
wants to merge 6 commits into
base: testing-devel
Choose a base branch
from
Open
Original file line number Diff line number Diff line change
@@ -1,34 +1,65 @@
#!/bin/bash
set -euo pipefail

# First check to see if the requested state is satisfied by the current boot
/usr/bin/rdcore kargs --current --create-if-changed /run/coreos-kargs-thisboot-differ "$@"

# There are a few cases we need to handle here. To illustrate this
# we'll use scenarios below where:
#
# - "booted": The kernel arguments from the currently booting system.
# - "ignition": The kernel arguments in the Ignition configuration.
# - "bls": The kernel arguments currently baked into the disk
# image BLS configs.
#
# The scenarios are:
#
# A.
# - Scenario:
# - booted: ""
# - ignition: "foobar"
# - bls: ""
# - Action: -> Update BLS configs, perform reboot
# B.
# - Scenario:
# - booted: "foobar"
# - ignition: "foobar"
# - bls: ""
# - Action: -> Update BLS configs, skip reboot
# C.
# - Scenario:
# - booted: ""
# - ignition: "foobar"
# - bls: "foobar"
# - Action: -> Skip update of BLS configs (they match already), perform reboot
#
# The logic here boils down to:
# if "ignition" != "booted"; then needreboot=1; fi
# if "ignition" != "bls"; then updatebls(); fi

# NOTE: we write info messages to kmsg here because stdout gets swallowed
# by Ignition if there is no failure. This forces the info into the
# journal, but sometimes the journal will miss these messages because
# of ratelimiting. We've decided to accept this limitation rather than
# add the systemd-cat or logger utlities to the initramfs.

# If the desired state isn't reflected by the current boot we'll need to reboot.
/usr/bin/rdcore kargs --current --create-if-changed /run/coreos-kargs-reboot "$@"
if [ -e /run/coreos-kargs-reboot ]; then
msg="Desired kernel arguments don't match current boot. Requesting reboot."
echo "$msg" > /dev/kmsg
Comment on lines +47 to +48
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: feels like we can collapse these here and elsewhere now that we're only outputting to /dev/kmsg?

fi

if is-live-image; then
# If we're in a live system and the kargs don't match then we must error.
Copy link
Member

@jlebon jlebon Mar 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: tab

if [ -e /run/coreos-kargs-thisboot-differ ]; then
if [ -e /run/coreos-kargs-reboot ]; then
# Since we exit with error here the stderr will get shown by Ignition
echo "Need to modify kernel arguments, but cannot affect live system." >&2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this go to /dev/kmsg for consistency?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. In the case there is an error Ignition will bubble that up the user so we need to write to stderr. We only need to write to kmsg when the messages are informational (i.e. Ignition swallows any I/O when there isn't an error).

exit 1
fi
else
# Update the BLS configs if they need to be updated.
/usr/bin/rdcore kargs --boot-device /dev/disk/by-label/boot --create-if-changed /run/coreos-kargs-changed "$@"
# If the bootloader was changed and the kernel arguments don't match this boot
# then we must reboot. If they do match this boot then we can skip the reboot.
#
# Note we write info messages to kmsg here because stdout gets swallowed
# by Ignition if there is no failure. This forces the info into the journal,
# but sometimes the journal will miss these messages because of ratelimiting.
# We've decided to accept this limitation rather than add the systemd-cat or
# logger utlities to the initramfs.
if [ -e /run/coreos-kargs-changed ]; then
if [ -e /run/coreos-kargs-thisboot-differ ]; then
msg="Kernel arguments were changed. Requesting reboot."
echo "$msg" > /dev/kmsg
touch /run/coreos-kargs-reboot
else
msg="Kernel arguments were changed, but they match this boot. Skipping reboot."
echo "$msg" > /dev/kmsg
fi
msg="Kernel arguments in BLS config were updated."
echo "$msg" > /dev/kmsg
fi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, there is also the case where the kargs in the BLS config already match the Ignition config, but neither match the current boot. Before and after this patch, we don't reboot, but we likely should now that we care about firstboot kargs in the diskful path.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added another commit for this case. Untested right now. If the strategy looks good I'll try to add an ext test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not 100% sure how to add an ext test for this. Since we don't have any of the logs from the first boot it's hard to get that information. Any ideas?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we put a karg in appendFirstbootKernelArgs and in the Ignition config's shouldNotExist section and then verify that the karg isn't present in /proc/cmdline in the test? We don't have logs from first boot, but we can deduce that it worked since it was present on first boot.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I think the problem is that the implementation for appendFirstbootKernelArgs (the file that exists in /boot) doesn't get cleaned up until Ignition fully runs. So any karg we put there will be present.

We could add another reboot and test then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Offhand, I think one way to fix this is:

  1. store the fetched Ignition config in /boot
  2. store any NM keyfiles generated from kargs in /boot
  3. empty the stamp file but leave it there; it's still only deleted by firstboot complete

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like a lot of effort and maybe not worth the risk.

We could prevent the bootloop by telling ourselves that we ran once before either through a stamp file or by adding another entry to the BLS config (coreos.karg.ran), but that wouldn't prevent the karg booted versus expected mismatch. It would only prevent the bootloop, though then we could error if we wanted to.

Interested to see what @bgilbert thinks here. We could just drop 35coreos-ignition: handle the case where booted kargs don't match desired too if it's not worth it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I think caching the Ignition config in /boot is something we should probably do anyway. It's more efficient of course, but also because then config servers like the MCS can enforce a "one fetch only" rule as a way to protect secrets.

From there, adding the NM keyfiles and emptying the stamp file doesn't seem like much work. The underlying mechanism for getting NM keyfiles from /boot is already in place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not seeing a use case for adding a karg to firstboot-kargs and also Ignition shouldNotExist: that would imply the user wants to remove a default karg except on the first boot, and I don't think we have any default kargs where that's obviously reasonable. And in general, I don't think we need to be especially robust to invalid user configuration. If we deterministically boot-loop in this case, the user will notice and can fix their config.

If we think caching the config is worthwhile, I'm okay with pursuing it, but I don't think it needs to be a blocker here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An easy way this can happen I think is users relying on the auto-forwarding kargs at install time, and then wanting a different configuration at first boot time (e.g. different location, or different NIC used for install vs run). But agreed we don't need to block on this.

fi