stage1: rework systemd service structure #1407
Conversation
After this PR, when an app finishes, its exit status will be written but it will not bring down the whole pod. I'm not sure how to do that yet with this new scheme and we should decide how to expose it to the user (if we want to). Comments welcome :)
Force-pushed from 4f3173d to 0a8a587
This looks good to me.
How can I trigger them?
Starting rkt from a systemd unit file and stopping it.
@iaguis Thanks, tried sigterm, didn't work. Turns out the pid returned by
@yifan-gu you can send
http://www.freedesktop.org/software/systemd/man/systemd.html
@iaguis Nice, it works! LGTM
I would like to have some plan here. No ideas for how we might implement it? As far as how it's exposed to the user, I imagine it'll be part of the spec in the pod manifest, perhaps with a corresponding rkt run flag.
BTW, does this fix #1365?
I'll try to find a way to implement it tomorrow.
I can't reproduce it anymore with this rework.
@jonboulle As for the spec, we can just make something like Kubernetes' pod lifecycle.
@iaguis awesome diagram btw, would love to add a similar one to the documentation :-)
Actually I can still reproduce it 😞. I don't get it using src-v225 but I do if I use host-v225. This is very weird. |
ACK that this doesn't fix the host flavor with systemd v226. As discussed on IRC, the message
is confusing because isolate is not used by the rkt services anymore. It's unclear what dependency is pulled in and why.
Force-pushed from 00134c4 to 0f81603
Updated:
TODO:
@iaguis @jonboulle Thought a little bit about how we can implement Kubernetes' restart policy by using a systemd service to run a rkt pod:
This PR solves the first two options. For the last one, we would need to:
Other things we need to consider: (FWIW, for now in rkt/kubernetes, systemd's [Service].Restart is not being used. There is a loop that periodically checks the pods' state and restarts them according to the restart policy, which I feel is suboptimal.)
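(For illustration only: a minimal sketch of what driving the restart policy through systemd's [Service].Restart could look like, assuming the pod is launched from a host-level unit. The unit name, image name, and paths below are hypothetical, not something this PR adds.)

# mypod.service (hypothetical host-level unit wrapping a rkt pod)
[Unit]
Description=Example rkt pod with a Kubernetes-style restart policy

[Service]
ExecStart=/usr/bin/rkt run example.com/myapp
# Restart=on-failure roughly maps to RestartPolicy: OnFailure;
# Restart=always to Always, and the default Restart=no to Never.
Restart=on-failure
RestartSec=5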
Thought more on the prober stuff. I think it can be solved by just killing the whole pod when any app fails the probing, but we would need a way for rkt to return a non-zero exit code. (Currently this can be achieved by killing the rkt process.)
TestPodManifest: I did some tests with current master and it seems the pod exits when a single app fails or when all apps are finished. If an app exits with Success, the pod doesn't exit. To preserve the current behavior we should add the mentioned line, but with OnFailure instead of ExecStop.
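(For illustration only: one way the OnFailure idea could look, consistent with the unit structure described in the commit message further down. Unit and path names are illustrative; this is a sketch, not the exact change that was merged.)

# app-foo.service (sketch of an app's unit)
[Unit]
# Instead of an ExecStop hook, pull in the pod's halt path only when the
# app enters a failed state, so a failing app still brings the pod down
# and the reapers get a chance to record the exit statuses.
OnFailure=halt.target

[Service]
ExecStart=/path/to/foo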
Force-pushed from f2ab380 to dab8f6f
Updated and green :)
Unimportant note: I just tried to rethink this so that the shutdown.service wouldn't be needed, but it would probably be really ugly. Besides, the additional service makes it easier to understand what's happening.
unitsPath := filepath.Join(common.Stage1RootfsPath(p.Root), unitsDir)
file, err := os.OpenFile(filepath.Join(unitsPath, "default.target"), os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0644)
if err != nil {
	return fmt.Errorf("failed to create service unit file: %v", err)
}
Given that the caller also builds up an error message, this would print something like:
Failed to write default.target: failed to create service unit file: Permission denied
Can it be made shorter? Also, it's not a service unit file but a target.
A couple of small things, but it looks good otherwise :) Do you want to add the documentation with the diagram in a separate PR or in this one?
This changes the way rkt starts apps in a pod.

The default.target has a Wants and After dependency on each app, making sure they all start. Each app has a Wants dependency on an associated reaper service that deals with writing the app's exit status. Each reaper service has a Wants and After dependency on a shutdown service that simply shuts down the pod.

The reaper services and the shutdown service all start at the beginning but do nothing and remain after exit (with the RemainAfterExit flag). By using the StopWhenUnneeded flag, whenever they stop being referenced, they'll do the actual work via the ExecStop command.

This means that when an app's service is stopped, its associated reaper will run and write its exit status to /rkt/status/${app} without having to wait until the pod exits. When all apps' services stop, their associated reaper services will also stop and will cease referencing the shutdown service. This will cause the pod to exit.

A Conflicts dependency was also added between each reaper service and the halt and poweroff targets (they are triggered when the pod is stopped from the outside). This will activate all the reaper services when one of the targets is activated, causing the exit statuses to be saved and the pod to finish as described in the previous paragraph.

For now we preserve the current rkt lifecycle by shutting down the pod when the first app exits with failure.
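(For readers less familiar with the systemd options involved, here is a rough sketch of the unit relationships described above. The file names, the /bin/true and /reaper.sh placeholders, and the poweroff command are illustrative; the real units rkt generates may differ.)

# default.target: pulls in every app
[Unit]
Wants=app-foo.service
After=app-foo.service

# app-foo.service: the app itself, which pulls in its reaper
[Unit]
Wants=reaper-foo.service
[Service]
ExecStart=/path/to/foo

# reaper-foo.service: idle until app-foo.service stops referencing it,
# then records the app's exit status via ExecStop
[Unit]
StopWhenUnneeded=yes
Wants=shutdown.service
After=shutdown.service
Conflicts=halt.target
Conflicts=poweroff.target
[Service]
RemainAfterExit=yes
ExecStart=/bin/true
ExecStop=/reaper.sh foo

# shutdown.service: idle until no reaper references it anymore,
# then powers the pod off via ExecStop
[Unit]
StopWhenUnneeded=yes
[Service]
RemainAfterExit=yes
ExecStart=/bin/true
ExecStop=/usr/bin/systemctl --force poweroff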
Force-pushed from dab8f6f to 1a2bcb6
Updated.
Let's discuss the documentation in #1459
LGTM 👍
stage1: rework systemd service structure
please file a follow-up re: nuclear option?
This changes the way rkt starts apps in a pod.
The default.target has a Wants and After dependency on each app, making
sure they all start. Each app has a Wants dependency on an associated
reaper service that deals with writing the app's exit status. Each
reaper service has a Wants and After dependency on a shutdown service
that simply shuts down the pod.
The reaper services and the shutdown service all start at the beginning
but do nothing and remain after exit (with the RemainAfterExit flag). By
using the StopWhenUnneeded flag, whenever they stop being referenced,
they'll do the actual work via the ExecStop command.
This means that when an app service is stopped, its associated reaper
will run and will write its exit status to /rkt/status/${app} without
having to wait until the pod exits. When all apps' services stop, their
associated reaper services will also stop and will cease referencing the
shutdown service. This will cause the pod to exit.
A Conflicts dependency was also added between each reaper service and
the halt and poweroff targets (they are triggered when the pod is
stopped from the outside). This will activate all the reaper services
when one of the targets is activated, causing the exit statuses to be
saved and the pod to finish as described in the previous paragraph.