Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OTA-2135: Aktualizr's config and recipe to auto reboot just after update #486

Merged
merged 3 commits into from
Feb 25, 2019

Conversation

mike-sul
Copy link
Collaborator

No description provided.

Signed-off-by: Mike Sul <mykhaylo.sul@innoteka.com>
@lbonn
Copy link
Contributor

lbonn commented Feb 20, 2019

It looks good I think but it will need to have an update of the aktualizr recipe's git commit before merging.

So I will do a bit of manual testing first, then we can merge advancedtelematic/aktualizr#1098 when we'll clear all objections. And finally, this PR should be updated with the new (master) revision.

Mike Sul and others added 2 commits February 21, 2019 15:02
@pattivacek
Copy link
Collaborator

Not a big deal, but in general I prefer rebasing instead of merging. I'll run oe-selftest soon and give the feature a manual test, but it looks fine!

@mike-sul
Copy link
Collaborator Author

yes, agree on rebasing, just screwed/forgot in this case.

Copy link
Collaborator

@pattivacek pattivacek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passes oe-selftest, but I still want to manually test it.

@pattivacek
Copy link
Collaborator

Manual test worked as expected. However, I'm seeing two issues in the log that are confusing:

  1. I'm seeing messages like Target Target(qemux86-64-9ed7ff8877a57946bdb7cfca5a61752f1776028b5e99cac2cd4754a60df4f7b6 ecu_identifiers: (f8f326570855e0e5cd32bbba3a6f565bddb830ee880b17e4a6d4ec4f1c7be61f) length:0 hashes: (Hash: 9ED7FF8877A57946BDB7CFCA5A61752F1776028B5E99CAC2CD4754A60DF4F7B6, )) has unknown ECU ID when I get the got InstallTargetComplete event message, and yet the installation appears to succeed. Any idea what's going on?
  2. When auto-reboot is disabled, after got AllInstallsComplete event, I see got PutManifestComplete event and then a repeated could not put manifest. That's pretty unhelpful. We shouldn't even be trying to put the manifest until we've rebooted, and we should perhaps log that that's what is necessary.

And when we print these got xyz event messages, we should probably indicate if the action was actually successful.

We can merge this PR as is if you'd like, but we should fix these logging issues before updating the other release branches of meta-updater.

@lbonn
Copy link
Contributor

lbonn commented Feb 25, 2019

Those two are preexisting problems:

  1. came in Complain when tries to install target with unknown ECU ID aktualizr#1076: the target update for the primary shows up in the loop looking for secondary updates and is flagged as an error
  2. is my doing when I masked manifest sends before reboot.

Agree that we should review these but I would merge this PR now as I don't think they are newly introduced. I also think (would need to check to be sure) they already show up in the release branches.

@pattivacek
Copy link
Collaborator

@lbonn I'll defer to you on if you'd like to merge it, then. I'm fine either way. I did ask @mike-sul to look into these two issues in your absence, though, so I appreciate your insight on where they came from!

@lbonn lbonn merged commit 63b03d7 into master Feb 25, 2019
@lbonn lbonn deleted the feat/OTA-2135/auto-reboot branch February 25, 2019 10:23
@agners
Copy link
Contributor

agners commented Feb 26, 2019

Probably a bit late, but another elegant solution is to use a systemd path unit:
toradex/meta-toradex-torizon@3696923#diff-ed1c9fe3586c62a26812d226e5695ea9

The reboot is instant since systemd uses inotify...

@lbonn
Copy link
Contributor

lbonn commented Feb 26, 2019

Oh cool, that's interesting!

However we persist some state after creating this file, so wouldn't it potentially race with the reboot process?

@agners
Copy link
Contributor

agners commented Feb 26, 2019

Uh, yes, that definitely races... I guess in practise there is a slight delay since systemd activates the service with the same name (in the above example ostree-pending-reboot.service), which then calls systemctl again with --force reboot, which kills all running processes and reboots immediately (I assume SIGKILL?). But that might happen before aktualizr manages to persist all state.

But why is aktualizr persisting state after creating that file? That seems racy no matter what reboot mechanism is watching that file...

@lbonn
Copy link
Contributor

lbonn commented Feb 27, 2019

It was originally not intended to be watched by an external process but only checked at aktualizr startup:

  1. the sentinel file is created on a volatile filesystem
  2. a flag is set in persistent storage that indicates aktualizr has to wait for a reboot, so that it can finalize an installation in this case

So, when we read the flag at launch: if the file is here, aktualizr was just restarted. If it's not here, the system has most likely been rebooted.

Maybe we named this file poorly but I think the order is correct for our use case.

However, we could use another file to trigger the reboot through systemd instead of having this platform-specific logic in the C++ app.

@agners
Copy link
Contributor

agners commented Feb 27, 2019

Hm, I see... Yeah I guess ideally that needs to be documented, that it's internal state...

I like the idea of having a file to signal that reboot is required. This allows e.g. a UI process to monitor for it too, and e.g. print a user visible hint that a reboot is required (at earliest convince...). With systemd path unit its actually quite easy to achieve automatic reboot, however I can imagine that other init systems that might be not that straight forward.

I see the new reboot implementation directly calls the reboot syscall, which bypasses the init system completely. I am not sure if that is very safe to use in practise since data might get lost. I feel typically we want the init system to be notified. There are quite some standard commands to do that, e.g. executing init 6 or reboot work on systemd as well as on sysv systems. Maybe we should introduce a reboot command setting and call this command?

Btw, what I also realized when using OSTree, after the update is applied, changes to /etc can/do get lost! So a system needs to prevent any reconfiguration (which needs changes to /etc) after applying an update.

@agners
Copy link
Contributor

agners commented Feb 27, 2019

Just looked a bit closer into aktualizr, and see now that the SotaUpdateClient class actually sets the state of the update to InstalledVersionUpdateMode::kPending after OstreeManager::install completes, which creates said need_reboot file. So yes, this definitely seems not safe.

However, that said, from what I understand aktualizr tries to do updates such that in any point in time a power cut can happen, the system should recover from any such state...?

@lbonn
Copy link
Contributor

lbonn commented Feb 27, 2019

Yes, if a power cut occurs it should either run the old or the new version.

The danger here would not be bricking the device but wrongly reporting state changes to the backend, which should be an error that can be recovered from.

@ricardosalveti
Copy link
Contributor

Hm, I see... Yeah I guess ideally that needs to be documented, that it's internal state...

+1

I like the idea of having a file to signal that reboot is required. This allows e.g. a UI process to monitor for it too, and e.g. print a user visible hint that a reboot is required (at earliest convince...). With systemd path unit its actually quite easy to achieve automatic reboot, however I can imagine that other init systems that might be not that straight forward.

I see the new reboot implementation directly calls the reboot syscall, which bypasses the init system completely. I am not sure if that is very safe to use in practise since data might get lost. I feel typically we want the init system to be notified. There are quite some standard commands to do that, e.g. executing init 6 or reboot work on systemd as well as on sysv systems. Maybe we should introduce a reboot command setting and call this command?

I personally prefer the reboot to be managed by the init system, as then the user can easily add other hooks that can be executed during the shutdown process. Having in systemd also allows the user to set up a systemd timer, and only reboot at an specific time (e.g. early morning).

Btw, what I also realized when using OSTree, after the update is applied, changes to /etc can/do get lost! So a system needs to prevent any reconfiguration (which needs changes to /etc) after applying an update.

This can probably be improved by having a systemd job that performs another /etc merge during the shutdown process.

@ricardosalveti
Copy link
Contributor

This can probably be improved by having a systemd job that performs another /etc merge during the shutdown process.

Relevant upstream issues:

@lbonn
Copy link
Contributor

lbonn commented Feb 27, 2019

Thank you both for your feedback! I also agree it would fit better on the init system's side as it gives more control to the user.
And it would probably make sense to change the name of this sentinel file, to make its purpose less ambiguous.

To give a bit of context, we've added this reboot-trigger feature for our testing needs without really focusing on production use at the moment (also why it's disabled by default). But since you've raised the issue, we can probably take another go and polish it a bit.

@liuming50
Copy link
Contributor

Hi,

I have created a pull request to introduce "/sbin/init 6" as the way to gracefully reboot, please kindly help me review: advancedtelematic/aktualizr#1148

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants