This repository has been archived by the owner on Feb 24, 2020. It is now read-only.

stage1: rework systemd service structure #1407

Merged
merged 1 commit into rkt:master on Sep 23, 2015

Conversation

iaguis
Member

@iaguis iaguis commented Sep 15, 2015

This changes the way rkt starts apps in a pod.

The default.target has a Wants and After dependency on each app, making
sure they all start. Each app has a Wants dependency on an associated
reaper service that deals with writing the app's exit status. Each
reaper service has a Wants and After dependency on a shutdown service
that simply shuts down the pod.

The reaper services and the shutdown service all start at the beginning
but do nothing and remain after exit (with the RemainAfterExit flag). By
using the StopWhenUnneeded flag, whenever they stop being referenced,
they'll do the actual work via the ExecStop command.

This means that when an app service is stopped, its associated reaper
will run and will write its exit status to /rkt/status/${app} without
having to wait until the pod exits. When all apps' services stop, their
associated reaper services will also stop and will cease referencing the
shutdown service. This will cause the pod to exit.

A Conflicts dependency was also added between each reaper service and
the halt and poweroff targets (these are triggered when the pod is
stopped from the outside). When either target is activated, all the
reaper services are stopped, causing the exit statuses to be saved and
the pod to finish as described in the previous paragraph.

(diagram: rkt-systemd-nuclear)
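
For illustration, a minimal sketch of what a reaper unit and the shutdown unit could look like under this scheme. The unit names, paths and the exact status-writing command are assumptions for the example, not the literal units generated by this patch; each app unit (e.g. app1.service) is assumed to carry Wants=reaper-app1.service:

   # reaper-app1.service (illustrative)
   [Unit]
   Description=app1 reaper
   Wants=shutdown.service
   After=shutdown.service
   Conflicts=halt.target poweroff.target
   StopWhenUnneeded=yes

   [Service]
   Type=oneshot
   RemainAfterExit=yes
   # Starting is a no-op; the real work happens when the unit is stopped.
   ExecStart=/usr/bin/true
   # On stop, record the app's exit status.
   ExecStop=/bin/sh -c "systemctl show --property ExecMainStatus app1.service | cut -d= -f2 > /rkt/status/app1"

   # shutdown.service (illustrative)
   [Unit]
   Description=Pod shutdown
   StopWhenUnneeded=yes

   [Service]
   Type=oneshot
   RemainAfterExit=yes
   ExecStart=/usr/bin/true
   # On stop, i.e. once no reaper references it anymore, power the pod off.
   ExecStop=/usr/bin/systemctl --force poweroff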

@iaguis
Member Author

iaguis commented Sep 15, 2015

After this PR, when an app finishes, its exit status will be written but it will not bring down the whole pod. I'm not sure yet how to do that with this new scheme, and we should decide how to expose it to the user (if we want to).

Comments welcome :)

@vcaputo
Contributor

vcaputo commented Sep 15, 2015

This looks good to me.

@yifan-gu
Contributor

they are triggered when the pod is stopped from the outside

How can I trigger them?

@iaguis
Member Author

iaguis commented Sep 15, 2015

How can I trigger them?

Starting rkt from a systemd unit file and stopping it, machinectl poweroff $MACHINE or sending SIGTERM to the rkt (systemd-nspawn) process.

@yifan-gu
Contributor

@iaguis Thanks, tried SIGTERM, didn't work. Turns out the pid returned by rkt status is the systemd process, not the nspawn one.

@iaguis
Member Author

iaguis commented Sep 15, 2015

@yifan-gu you can send SIGRTMIN+3 or SIGRTMIN+4 to systemd:

SIGRTMIN+3
   Halts the machine, starts the halt.target unit. This is mostly equivalent
   to systemctl start halt.target.

SIGRTMIN+4
   Powers off the machine, starts the poweroff.target unit. This is mostly
   equivalent to systemctl start poweroff.target.

http://www.freedesktop.org/software/systemd/man/systemd.html
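
For example, using the pid that rkt status reports (which, as noted above, is the pod's systemd rather than systemd-nspawn), something like kill -s SIGRTMIN+4 <pid> should activate poweroff.target inside the pod.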

@yifan-gu
Contributor

@iaguis Nice, it works! LGTM

@jonboulle
Contributor

I'm not sure how to do that yet with this new scheme and we should decide how to expose it to the user (if we want to).

I would like to have some plan here. No ideas for how we might implement it?

As far as how it's exposed to the user, I imagine it'll be part of the spec in the pod manifest, perhaps with a corresponding rkt run flag.

@jonboulle
Contributor

BTW, does this fix #1365 ?

@iaguis
Member Author

iaguis commented Sep 15, 2015

I would like to have some plan here. No ideas for how we might implement it?

I'll try to find a way to implement it tomorrow.

BTW, does this fix #1365 ?

I can't reproduce it anymore with this rework.

@yifan-gu
Contributor

@jonboulle As for the spec, we can just make something like Kubernetes' pod lifecycle.
But I noticed that even with Restart=always, the shutdown service will still be triggered.

@jonboulle
Contributor

@iaguis awesome diagram btw, would love to add a similar one to the documentation :-)

@jonboulle jonboulle added this to the v0.9.0 milestone Sep 16, 2015
@iaguis
Member Author

iaguis commented Sep 16, 2015

BTW, does this fix #1365 ?

I can't reproduce it anymore with this rework.

Actually I can still reproduce it 😞. I don't get it using src-v225 but I do if I use host-v225. This is very weird.

@steveej
Contributor

steveej commented Sep 16, 2015

ACK that this doesn't fix host flavor with systemd v226. As discussed on IRC, the message

busybox.service: Cannot add dependency job, ignoring: Operation refused, unit may not be isolated.

is confusing because isolate is not used by the rkt services anymore. It's unclear what dependency is pulled in and why.

@iaguis iaguis force-pushed the dont-fear-the-reaper branch 3 times, most recently from 00134c4 to 0f81603 on September 16, 2015 at 14:55
@iaguis
Member Author

iaguis commented Sep 16, 2015

Updated:

TODO:

  • Copy systemd-journald and its units in a better way (if we want to make it more generic instead of listing the files manually)
  • Add documentation
  • Find a proper name for the flag and add it to run-prepared

@yifan-gu
Contributor

@iaguis @jonboulle Thought a little bit about how we can implement Kubernetes' restart policy by using a systemd service to run a rkt pod (a sketch of the Always case follows these lists):

  • RestartPolicy = Never: Set [Service].Restart=no, and invoke rkt run with normal flags
  • RestartPolicy = Always: Set [Service].Restart=always, and invoke rkt run with --nuclear-exit. (maybe call it --exit-on-any?)
  • RestartPolicy = OnFailure: Set [Service].Restart=on-failure, and invoke rkt run with something like --exit-on-failure.

This PR solves the first two options. For the last one, we would need to:

  • Add a flag, and add OnFailure to the app's service file.
  • Instead of execing the stage1, wait for the stage1, then check each app's exit code; if any is non-zero, return non-zero.
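
A minimal sketch of the Always case, under the assumption that the flag ends up being called --exit-on-any (a proposal above, not an existing option):

   # my-pod.service (illustrative host-side unit)
   [Unit]
   Description=Run a rkt pod, restarting it whenever it exits

   [Service]
   Restart=always
   # --exit-on-any is the name proposed above; the final flag name is still open.
   ExecStart=/usr/bin/rkt run --exit-on-any example.com/app1 example.com/app2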

@yifan-gu
Contributor

Other things we need to consider:
Each app can have a health prober that periodically probes the app's health and kills the app if the probe fails. Not sure how we can handle that.

(FWIW, for now in rkt/kubernetes, systemd's [Service].Restart is not being used. There is a loop that periodically checks the pods' state and restarts them according to the restart policy, which I feel is suboptimal.)

@yifan-gu
Contributor

Thought more about the prober stuff. I think it can be solved by just killing the whole pod when any app fails the probing. But we would need a way for rkt to return a non-zero exit code. (Currently this can be achieved by killing the rkt process.)

@iaguis
Member Author

iaguis commented Sep 23, 2015

TestPodManifest #10 is failing. It tests a pod with two apps: one writes a value to a file and the other reads this value after 10 seconds. The problem is that the writer app finishes, which shuts down the pod, so the reader app never actually reads the file.

I did some tests with current master and it seems the pod exits when a single app fails or when all apps have finished. If an app exits with success, the pod doesn't exit.

To preserve the current behavior we should add the mentioned line, but with OnFailure instead of ExecStop.
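
One way to express that in unit terms, as an illustrative fragment rather than the exact change: systemd's OnFailure= activates the listed units only when a unit enters a failed state, so each app unit could pull in the pod shutdown on failure.

   # app1.service (illustrative fragment)
   [Unit]
   OnFailure=halt.target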

@iaguis iaguis force-pushed the dont-fear-the-reaper branch 2 times, most recently from f2ab380 to dab8f6f on September 23, 2015 at 09:37
@iaguis
Member Author

iaguis commented Sep 23, 2015

Updated and green :)

@steveej
Contributor

steveej commented Sep 23, 2015

Unimportant notes: I just tried to rethink this so that the shutdown.service wouldn't be needed, but it would probably be really ugly, since this additional service makes it easier to understand what's happening.

@steveej
Contributor

steveej commented Sep 23, 2015

Unimportant notes: I just tried to rethink this so that the shutdown.service wouldn't be needed, but it would probably be really ugly. The additional service makes it easy to grasp the point, so
--> LGTM!

unitsPath := filepath.Join(common.Stage1RootfsPath(p.Root), unitsDir)
file, err := os.OpenFile(filepath.Join(unitsPath, "default.target"), os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0644)
if err != nil {
return fmt.Errorf("failed to create service unit file: %v", err)
Member

Given that the caller also builds up an error message, this would print something like:

Failed to write default.target: failed to create service unit file: Permission denied

Can it be made shorter? Also, it's not a service unit file but a target.

@alban
Member

alban commented Sep 23, 2015

A couple of small things, but it looks good otherwise :)

Do you want to add the documentation with the diagram in a separate PR or in this one?

stage1: rework systemd service structure

(commit message; it restates the PR description above and adds:)

For now we preserve the current rkt lifecycle by shutting down the pod
when the first app exits with failure.
@iaguis
Member Author

iaguis commented Sep 23, 2015

Updated.

Do you want to add the documentation with the diagram in a separate PR or in this one?

Let's discuss the documentation in #1459

@alban
Member

alban commented Sep 23, 2015

LGTM 👍

alban added a commit that referenced this pull request Sep 23, 2015
stage1: rework systemd service structure
@alban alban merged commit 01374f5 into rkt:master Sep 23, 2015
@jonboulle
Contributor

please file a follow-up re: nuclear option?

@iaguis
Member Author

iaguis commented Sep 23, 2015

#1461
