Add ability to reboot the machine after workflow is finished #71

Closed
invidian opened this issue Apr 22, 2020 · 12 comments
Labels: kind/feature, priority/backlog

Comments

@invidian
Contributor

invidian commented Apr 22, 2020

For workflows that provision the OS, it would be nice if the workflow itself could reboot the machine once it is done, so that the machine boots into the target OS. The upper orchestration layer (e.g. a person monitoring the provisioning process, or some logic that uses IPMI) would then not need to care about that.

Things to consider:

  • a worker can be part of multiple workflows. Perhaps the reboot should only happen when all workflows have finished successfully.
  • perhaps the workflow could indicate that a reboot is needed after it finishes, e.g. by setting a reboot parameter to true.
  • the action or task can't trigger a reboot by itself, as this will shut down the worker and it won't be able to report that reboot task succeeded
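
The considerations above can be sketched as a worker-side check: the worker reports the final action state first, and reboots only if every workflow assigned to it has succeeded and at least one of them asked for a reboot. This is only an illustration of the idea. The `Workflow` class, its `reboot` flag, and `should_reboot` are hypothetical names and not part of Tinkerbell's API.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    """Hypothetical view of a workflow as seen by a single worker."""
    workflow_id: str
    state: str            # e.g. "STATE_RUNNING", "STATE_SUCCESS", "STATE_FAILED"
    reboot: bool = False  # hypothetical flag: reboot once the workflow is done


def should_reboot(workflows):
    """Return True only if all workflows succeeded and one requested a reboot."""
    if not workflows:
        return False
    all_succeeded = all(w.state == "STATE_SUCCESS" for w in workflows)
    return all_succeeded and any(w.reboot for w in workflows)
```

With this shape the reboot decision lives in the worker rather than in an action, so the last action state can be reported before the machine goes down.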
parauliya added the kind/feature label on Apr 30, 2020
thebsdbox added the priority/important-soon label on Jul 21, 2020
@rgl
Contributor

rgl commented May 25, 2021

it seems that we now have a documented way to do a reboot from an action at https://docs.tinkerbell.org/actions/action-architecture/#namespace:

When an action attempts to do these steps in a container in its own namespace, nothing will occur as PID 1 is usually the process in the action container. To allow the expected behaviour an action can use pid: host in its configuration, this will mean that the action processes will be amongst all of the processes on the host itself (including the "real" PID 1). With the action in the host process ID namespace both a reboot or kexec will be able to work as expected.

Is this issue about improving on that?
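
A rough sketch of what the quoted documentation describes, assuming an action that runs with pid: host and has the privileges needed to call reboot(2). The check `pid == 1` is a crude heuristic taken from the quoted text (in its own namespace the action process is usually PID 1). The function names and the injectable `reboot_fn` and `sync_fn` parameters are hypothetical and exist only so the logic can be exercised without rebooting anything.

```python
import ctypes
import os

LINUX_REBOOT_CMD_RESTART = 0x01234567  # value from <linux/reboot.h>


def _libc_reboot():
    """Call the C library's reboot() wrapper (Linux only)."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.reboot(LINUX_REBOOT_CMD_RESTART) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def reboot_from_action(pid=None, reboot_fn=_libc_reboot, sync_fn=os.sync):
    """Flush filesystem buffers, then reboot, unless we look like PID 1.

    Being PID 1 suggests the action runs in its own PID namespace, where
    the quoted documentation says a reboot attempt has no effect.
    """
    pid = os.getpid() if pid is None else pid
    if pid == 1:
        raise RuntimeError("action appears to be PID 1; run it with pid: host")
    sync_fn()
    reboot_fn()
```

Note that an action rebooting this way still goes down before the worker can report that the action succeeded.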

@thebsdbox
Contributor

This is fixed in tink-worker. This can probably be closed! 😀

@rgl
Contributor

rgl commented May 25, 2021

@thebsdbox, by fixed, you mean using an action with pid: host?

having a docs example on how to reboot from a workflow would also be really nice :-)

I found a reboot example at https://docs.tinkerbell.org/deploying-operating-systems/examples-win/#creating-a-reboot-action-dockerfile:

FROM busybox
ENTRYPOINT [ "touch", "/worker/reboot" ]

is that it? we just need to create a new file named /worker/reboot?

@rgl
Contributor

rgl commented May 26, 2021

Creating a file named /worker/reboot does not trigger a reboot from tink-worker:

[Screenshot: rpi-tinkerbell-vagrant_bios_worker, 2021-05-26 09:12:14]

Here's the workflow status:

+----------------------+--------------------------------------+
| FIELD NAME           | VALUES                               |
+----------------------+--------------------------------------+
| Workflow ID          | be378bb1-bdf9-11eb-9be0-0242ac120005 |
| Workflow Progress    | 100%                                 |
| Current Task         | hello-world                          |
| Current Action       | reboot                               |
| Current Worker       | 00000000-0000-4000-8000-080027000001 |
| Current Action State | STATE_SUCCESS                        |
+----------------------+--------------------------------------+
+--------------------------------------+-------------+-------------+----------------+---------------------------------+---------------+
| WORKER ID                            | TASK NAME   | ACTION NAME | EXECUTION TIME | MESSAGE                         | ACTION STATUS |
+--------------------------------------+-------------+-------------+----------------+---------------------------------+---------------+
| 00000000-0000-4000-8000-080027000001 | hello-world | reboot      |              0 | Started execution               | STATE_RUNNING |
| 00000000-0000-4000-8000-080027000001 | hello-world | reboot      |              0 | finished execution successfully | STATE_SUCCESS |
+--------------------------------------+-------------+-------------+----------------+---------------------------------+---------------+

@thebsdbox
Contributor

Ah, this needs Hook. Hook has the logic to watch for the reboot.
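
To illustrate the kind of watch described here, the following is a minimal polling sketch: it waits for a flag file such as /worker/reboot to appear and then invokes a reboot callback. It is not Hook's actual implementation. The function name, its parameters, and the choice of polling are all assumptions made for illustration.

```python
import os
import time


def watch_for_reboot(path, on_reboot, poll_interval=1.0, timeout=None, sleep=time.sleep):
    """Poll until `path` exists, then call `on_reboot`.

    Returns True if the flag file was seen and the callback was called,
    or False if `timeout` seconds passed without the file appearing.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            on_reboot()
            return True
        if deadline is not None and time.monotonic() >= deadline:
            return False
        sleep(poll_interval)
```

Because the watcher runs outside the action container, the action only has to create the file and exit, which leaves the worker free to report the action state before the reboot happens.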

tstromberg added the priority/important-longterm label and removed the priority/important-soon label on Aug 27, 2021
@displague
Member

displague commented Oct 28, 2021

Can we use sysrq-r from an action? https://hub.docker.com/r/mlafeldt/sysrq/ for example.

the action or task can't trigger a reboot by itself, as this will shut down the worker and it won't be able to report that reboot task succeeded

Does the action need to be Tinkerbell specific and act as the worker to signal success?
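
A minimal sketch of what a SysRq-based action might do, assuming the container is allowed to write to /proc/sysrq-trigger (which typically requires a privileged container). On Linux, writing `s` requests a sync, `u` remounts filesystems read-only, and `b` reboots immediately. The helper names and the `trigger_path` parameter are hypothetical, and the path is a parameter only so the code can be tried against an ordinary file.

```python
def sysrq(command, trigger_path="/proc/sysrq-trigger"):
    """Write a single SysRq command character to the trigger file."""
    if len(command) != 1:
        raise ValueError("SysRq commands are single characters")
    with open(trigger_path, "w") as trigger:
        trigger.write(command)


def sysrq_reboot(trigger_path="/proc/sysrq-trigger"):
    """Sync, remount read-only, then reboot immediately via SysRq."""
    for command in ("s", "u", "b"):
        sysrq(command, trigger_path)
```

Since `b` reboots at once, an action doing this has the limitation quoted above: the worker gets no chance to report that the action succeeded.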

crayzeigh added the priority/backlog label and removed the priority/important-longterm label on Mar 8, 2022
@double-p

double-p commented Apr 6, 2022

Built a docker image as per the example @rgl mentioned here already to no avail:

The "touch" is going nowhere and thus the rebootWatch() never fires.

A manual touch of "/run/worker/reboot" in the getty container works, so the watch is active. It just looks like the volume mapping is wrong? (/worker:/worker)

Edit: it works; the workflow was just hanging somehow. Recreated it and it works as advertised:
- build the docker image as in the Windows example
- tag and push it to the local registry
- add the action as in the same example

profi...reboot :)

@yeahdongcn

  - name: "reboot into Windows"
    image: reboot:latest
    timeout: 90
    volumes:
    - /worker:/worker

I encountered the same issue when rebooting into Windows: the action failed (STATE_FAILED). Is there anywhere I can look up the error message?

@yeahdongcn

  - name: "reboot into Windows"
    image: reboot:latest
    timeout: 90
    volumes:
    - /worker:/worker

I encountered the same issue when rebooting into Windows: the action failed (STATE_FAILED). Is there anywhere I can look up the error message?

It turns out the document is incorrect. I just sent out a PR to fix it.

chrisdoherty4 self-assigned this on May 8, 2023
@chrisdoherty4
Member

We intend to draw up a proposal for embedding restart capabilities into workflows so we don't need to rely on actions. This will complement the desire to see workflows consistently transition to an end state, which currently doesn't happen if the restart beats the restart action's status update.

@chrisdoherty4
Member

tinkerbell/roadmap#29 will see this come to fruition.

@jacobweinstock
Member

While tinkerbell/roadmap#29 will add built-in capabilities for rebooting, https://github.com/jacobweinstock/waitdaemon can achieve this from an action and still allow the Workflow to report success.

I'm going to close this. If https://github.com/jacobweinstock/waitdaemon is not an acceptable solution please watch tinkerbell/roadmap#29.
