RFE: Allow manual update checks and reboots #498

Open
kelvinfan001 opened this issue Mar 10, 2021 · 22 comments

@kelvinfan001
Member

kelvinfan001 commented Mar 10, 2021

Feature Request

Desired Feature

Allow for manual, safe, Zincati-driven update checks and reboots.
Currently, in rpm-ostree (v2021.2 and later), if a user runs rpm-ostree upgrade while an update driver (e.g. Zincati) "owns" updates on that machine, rpm-ostree correctly refuses by default and points the user to the update driver's (Zincati's) documentation, implying that the upgrade should be performed through the update driver instead. However, there is actually no convenient, user-friendly way to trigger an immediate upgrade through Zincati either.
Stemming from the conversation in coreos/rpm-ostree#2566 (comment), it would be nice if an admin could manually get Zincati to immediately check for an update and possibly reboot into it.
A possible use case for this feature: an admin knows that a new update containing a bug fix or feature is available and wants it immediately, but Zincati has not yet updated to that release because it does not check for updates frequently enough, because of reboot strategy constraints, or because of phased rollouts (and wariness). Current ways admins could get around this:

  1. Reconfigure Zincati (e.g. lower the rollout wariness and use the immediate reboot strategy), then restart Zincati.
  2. Use the more direct rpm-ostree upgrade --bypass-driver.

Option 1 does not seem very user-friendly, and option 2 is potentially unsafe because rpm-ostree has no knowledge of update graphs, barrier releases, reboot scheduling windows, etc.

Example Usage

check-update

Include a command for telling Zincati to check for updates immediately. This should probably temporarily set Zincati's rollout wariness to 0.0, as a hint to Cincinnati that it should respond with the latest possible release.

$ zincatictl check-update
No new updates.

$ zincatictl check-update
Release ... found and deployed. Use `zincatictl finalize-update` to unlock staged deployment and reboot into it.

finalize-update

Also include a finalize-update command to override the reboot strategy, unlock the staged deployment, and reboot immediately.

If strategy allows reboot now, machine will reboot:

$ zincatictl finalize-update

If strategy does not allow reboot:

$ zincatictl finalize-update
Update strategy does not allow for reboot. Use `--force` to force an update finalization.

Force a reboot, overriding the reboot strategy:

$ zincatictl finalize-update --force

Note: making --force an explicit flag (as opposed to forcing by default) is useful because, after an update is staged, Zincati only rechecks for permission to reboot every DEFAULT_REFRESH_PERIOD_SECS. Without --force, finalize-update should get Zincati to check for permission immediately instead of waiting for the next periodic check.
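
The three finalize-update cases above can be sketched as a small decision function. This is only an illustration of the proposal, not Zincati code: zincatictl does not exist yet, and `finalize_update`, `Outcome`, and the `strategy_allows_reboot` parameter are hypothetical names chosen for this sketch.

```python
from enum import Enum
from typing import Tuple

# Message taken from the proposed CLI output above.
REFUSAL = (
    "Update strategy does not allow for reboot. "
    "Use `--force` to force an update finalization."
)


class Outcome(Enum):
    REBOOT = "reboot"
    REFUSED = "refused"


def finalize_update(strategy_allows_reboot: bool, force: bool = False) -> Tuple[Outcome, str]:
    """Decide what a hypothetical `zincatictl finalize-update` would do.

    `strategy_allows_reboot` stands in for an immediate permission check
    against the configured update strategy (assumed, not a real API).
    """
    if strategy_allows_reboot:
        # Strategy permits a reboot right now: finalize and reboot.
        return Outcome.REBOOT, "Finalizing staged deployment and rebooting."
    if force:
        # Strategy says no, but the admin explicitly overrode it.
        return Outcome.REBOOT, "Overriding update strategy; finalizing and rebooting."
    # Strategy says no and no override was requested: refuse and explain.
    return Outcome.REFUSED, REFUSAL
```

Keeping the override behind an explicit flag means the default invocation never bypasses the strategy by accident.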

Other Information

Relevant rpm-ostree PRs and issues:

@kelvinfan001
Member Author

/cc @jlebon @cgwalters to check if this makes sense and is compatible with rpm-ostree's proper usage (not sure if this functionality should be exposed to the user through rpm-ostree instead).

@jlebon
Member

jlebon commented Mar 31, 2021

The RFE itself makes a lot of sense to me! Before adding a CLI, I'd definitely lean towards keeping the focus on integration with rpm-ostree instead to keep the UX simple. E.g. rpm-ostree already knows how to present available updates, diffs, etc...

So implementation-wise, we should probably brainstorm on what the API between rpm-ostree and update drivers should be. Maybe it's UNIX sockets, or D-Bus, etc... E.g. for D-Bus, we could standardize on a well-known bus name that we expect update drivers to acquire and then you'd have a GetUpdate method which rpm-ostree upgrade would call out to to check for updates and get the OSTree commit to upgrade to if so.
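
To make the shape of such a contract concrete, here is a rough sketch modelled in plain Python rather than real D-Bus. Everything in it is an assumption for illustration: the `UpdateDriver` protocol, the `get_update` method (standing in for a possible GetUpdate), `FakeDriver`, and the messages returned by `upgrade`. Neither rpm-ostree nor any update driver defines these today.

```python
from typing import Optional, Protocol


class UpdateDriver(Protocol):
    """Hypothetical contract an update driver would expose to rpm-ostree."""

    def get_update(self) -> Optional[str]:
        """Return the OSTree commit to upgrade to, or None if there is no update."""
        ...


class FakeDriver:
    """Stand-in driver for the sketch; a real one would consult its update graph."""

    def __init__(self, next_commit: Optional[str]) -> None:
        self._next_commit = next_commit

    def get_update(self) -> Optional[str]:
        return self._next_commit


def upgrade(driver: Optional[UpdateDriver]) -> str:
    """Sketch of what `rpm-ostree upgrade` could do when a driver is registered."""
    if driver is None:
        # No driver owns updates: rpm-ostree checks for updates on its own.
        return "no update driver registered; checking for updates directly"
    commit = driver.get_update()
    if commit is None:
        return "No new updates."
    return f"deploying commit {commit}"
```

The driver stays the only component that knows about update graphs and barriers; rpm-ostree just asks it which commit is acceptable.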

@kelvinfan001
Member Author

kelvinfan001 commented Apr 8, 2021

I agree that it's more user-friendly to just have rpm-ostree communicate with the update driver, instead of referring the user to the update driver and making them learn another CLI.

for D-Bus, we could standardize on a well-known bus name that we expect update drivers to acquire and then you'd have a GetUpdate method which rpm-ostree upgrade would call out to to check for updates and get the OSTree commit to upgrade to if so.

#514 added a POC D-Bus interface to Zincati (the bus name is org.coreos.zincati for now). There's also a WIP PR here that has two methods, CheckUpdate and FinalizeUpdate, but I think we should first come up with a set of APIs that update drivers for rpm-ostree should have.

@kelvinfan001
Member Author

kelvinfan001 commented Apr 8, 2021

Like @jlebon mentioned, we should at least have a GetUpdate method, where the update driver returns whether there is a possible version to "legally" update to.
One thing to consider about this option: for drivers like Zincati that may have update "strategies", can we safely assume that a user who calls rpm-ostree upgrade always wishes to ignore any restrictions imposed by the update driver's strategy? If not, then perhaps we should have a method that tells the update driver to "check update and try to reboot into it" in a single call, so that it simply fails if the driver's strategy disallows an immediate reboot. Or maybe have another method along the lines of "check if reboot is allowed by the update driver", but that would assume that all rpm-ostree update drivers have an update strategy or something else that prevents spontaneous reboots; I'm not sure if this is always the case.
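
A minimal sketch of the single-call option, with hypothetical names throughout (`check_and_reboot`, `StrategyDisallowsReboot`, and both parameters are assumptions for illustration, not part of Zincati or rpm-ostree):

```python
from typing import Optional


class StrategyDisallowsReboot(Exception):
    """An update is available, but the driver's strategy forbids rebooting now."""


def check_and_reboot(available_release: Optional[str], reboot_allowed: bool) -> Optional[str]:
    """Model "check update and try to reboot into it" as one call.

    `available_release` stands in for the driver's update check and
    `reboot_allowed` for its strategy decision (both assumed inputs).
    Returns the release that would be rebooted into, or None if there is
    no update; raises if the strategy blocks the reboot.
    """
    if available_release is None:
        return None
    if not reboot_allowed:
        raise StrategyDisallowsReboot(
            f"release {available_release} is available, "
            "but the update strategy does not allow a reboot now"
        )
    return available_release
```

With this shape the caller never has to ask about strategies at all, so drivers without a strategy concept can simply always allow the reboot.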

@kelvinfan001 added the "jira" label (for syncing to Jira) on Apr 12, 2021
@jlebon
Member

jlebon commented Apr 16, 2021

Random comment on this, I think a common use case for the "manual update" path will be to have Zincati actually be "almost disabled" (systemd unit enabled, but config files neutering automatic updates) so that all updates are done manually by a sysadmin. (See discussions in https://discussion.fedoraproject.org/t/28946.)

@kai-uwe-rommel

@jlebon thanks for mentioning this. We have a couple of FCOS instances but not enough to warrant a zincati infrastructure. On the other hand, we must control the reboot windows and would rather use a generic automation tool via ssh (e.g. Ansible) for upgrading the instances. For that, such a CLI is needed.

@lucab
Contributor

lucab commented Apr 16, 2021

@kai-uwe-rommel correct me if I misread your use case, but I think you'd be properly served by doing #245 and letting Ansible own that file.

@kai-uwe-rommel

@lucab yes, maybe. But that feature does not exist yet, right?
At the moment I'm looking for a way to upgrade the outdated instances through the barrier release now.
But of course, a long-term solution for further updates is needed as well.

@lucab
Contributor

lucab commented Apr 16, 2021

@kai-uwe-rommel correct. The specific context was, going forward, how to better serve use cases/flows like yours, which are really a mix of automation plus human control.

@kelvinfan001
Member Author

Another use case for this feature could be for situations such as #554 (comment) where a user wants an immediate reboot into a staged update.

@kelvinfan001
Member Author

Some concerns raised regarding this feature: #554 (comment)

@dghubble
Member

dghubble commented Aug 4, 2021

I like the idea that Zincati offer some level of CLI interaction / introspection.

Initiating actions like check-update to check for an update now (was an update found? is it already being downloaded? progress?) would be useful. finalize to reboot now would be useful when testing the fleetlock protocol implementations (maybe --force overrides strategy as the OP suggests). And status commands to show where in the finite state machine Zincati thinks it is. Zero touch is somewhat opaque at the moment.

@lucab
Contributor

lucab commented Aug 5, 2021

status commands to show where in the finite state machine Zincati thinks it is

This one specifically is now exposed to systemd:

$ systemctl show -p StatusText zincati.service
StatusText=periodically polling for updates (last checked Thu 2021-08-05 10:48:44 UTC)

Although it is meant as a human/debugging helper, not as a machine API.

@dustymabe
Member

I think we bumped into this overall topic again in https://discussion.fedoraproject.org/t/unable-to-upgrade-to-35-20211029-3-0/34925/4?u=dustymabe.

Basically if we're going to break the user expectation that reboot will land them in the new deployment we need to feed them something else. "Don't use $that, use $this", works better than "Don't use $that, just wait".

@lucab
Contributor

lucab commented Dec 3, 2021

Uhm, for that specific post the user did configure a periodic window and at the same time expected an update/reboot to happen (immediately) outside of that timeframe. They then got a reboot pending due to the update strategy, wondered why the update was still kept staged, and rebooted manually.

So yes a finalize-update --force could have somehow worked here, but it's also an extreme point of conflicting/confused expectations from the user (which can't really be fixed by code).

@mitchellmaler

Is there any update on this to allow bypassing the set update window for the staged upgrade?

@lucab
Contributor

lucab commented May 4, 2022

@mitchellmaler no, otherwise it would have been noted here or in an associated PR.

@napaalm

napaalm commented Apr 13, 2024

@lucab any update on this issue?

@napaalm

napaalm commented Apr 21, 2024

Whoops, maybe I pinged the wrong person: @kelvinfan001 @jlebon @dustymabe @dghubble is there anyone working on this?

@jlebon
Member

jlebon commented Apr 22, 2024

Not currently, but I'm hoping that we'll be able to address this UX gap in a larger rework of Zincati.

Related: containers/bootc#337 (comment)
