
Add timeout while waiting for StartTransientUnit completion signal #1754

Merged 1 commit into opencontainers:master on Mar 8, 2018

Conversation

vikaschoudhary16 (Contributor)

This PR adds a one-second timeout for waiting on the channel for the StartTransientUnit completion signal.

This channel was introduced in #1683 to avoid a rare race between systemd and runc where systemd could delete the pids cgroup created by runc.

If, for some reason, the signal is not received, as in the case here, runc will hang forever. Adding a timeout helps it recover in such a case.

In the timeout handling we have two options: either return an error or continue. Since the original purpose of introducing the channel was to avoid the race mentioned above, a one-second duration should be enough to ensure that, and continuing seems the better of the two options.

/cc @derekwaynecarr @sjenning
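
At its core, the change replaces an unconditional channel receive with a select that also fires on a timer. A minimal, self-contained sketch of that pattern (illustrative names and messages, not the exact runc code):

```go
package main

import (
	"fmt"
	"time"
)

// waitForCompletion blocks until a completion signal arrives on statusChan,
// or gives up after timeout so the caller can continue instead of hanging.
func waitForCompletion(statusChan <-chan string, timeout time.Duration) {
	select {
	case status := <-statusChan:
		fmt.Printf("StartTransientUnit completed: %q\n", status)
	case <-time.After(timeout):
		fmt.Println("no completion signal within timeout; continuing")
	}
}

func main() {
	statusChan := make(chan string, 1)
	// Simulate the failure mode from the linked issue: the signal never arrives.
	waitForCompletion(statusChan, time.Second)
}
```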

The changed lines under review:

```diff
-<-statusChan
+select {
+case <-statusChan:
+case <-time.After(time.Second):
```
cyphar (Member)

Should have some sort of warning at least. I would also recommend investigating why we don't get the status from systemd. In the issue you said:

Not sure how dbus works from within the containerized env or how we might fix this.

I'm not sure what containerised environment you're using, but dbus should be running as a daemon inside whatever container you're spawning runc inside as it normally would on your host. Can you reproduce the issue using runc on the host?

vikaschoudhary16 (Contributor, Author)

@cyphar Added the warning.

what containerised environment ...

It occurred in containerized OpenShift. I am trying to reproduce it on my local machine and will update further on that.
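
The "added warning" follow-up amounts to logging in the timeout branch instead of giving up silently. A small sketch of that shape, assuming logrus for logging (an assumption here; the exact message is illustrative):

```go
package main

import (
	"time"

	"github.com/sirupsen/logrus"
)

func main() {
	statusChan := make(chan string, 1)
	select {
	case <-statusChan:
		// Completion signal received; nothing to report.
	case <-time.After(time.Second):
		// Warn instead of silently moving on, as requested in review.
		logrus.Warn("timed out waiting for StartTransientUnit completion signal from dbus; continuing")
	}
}
```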

Contributor

I believe the way this has worked in the past is that the origin node container runs as privileged, which implies a shared IPC and PID namespace with the host such that the containerized node can get dbus signals from the host dbus.

Commit: …om dbus

Signed-off-by: vikaschoudhary16 <choudharyvikas16@gmail.com>
sjenning (Contributor) commented Mar 7, 2018

@cyphar we are trying to run down the root cause, but in the meantime, I think we should merge this. Even if we get the dbus issue figured out, I'm not sure if we should consider dbus reliable. If, for whatever reason, it drops the message or systemd fails to send the signal we expect, we are still hung. Said another way, this probably should have never been an unconditional wait on the channel with no timeout.

Do you agree?

mrunalp (Contributor) commented Mar 7, 2018

LGTM

Approved with PullApprove

sjenning (Contributor) commented Mar 7, 2018

@hqhq could we get lgtm on this since you approved the previous PR to which the fix applies?

hqhq (Contributor) commented Mar 8, 2018

LGTM

Approved with PullApprove

hqhq merged commit 9facb87 into opencontainers:master on Mar 8, 2018
cyphar (Member) commented Mar 8, 2018

@sjenning

If you don't want to worry about what systemd does then you can just not use the systemd cgroup code (the whole thing is questionable because by design it works around systemd's lack of support of things we need for runc -- not to mention we've had nothing but problems from it historically because systemd loves to mess with cgroups you tell it about through TransientUnits).

IMHO if there's a DBus reliability issue within containers -- which shouldn't be the case and if it is that's a pretty major problem to sweep under the rug -- then continuing to use the systemd cgroup code and ignoring the problem makes little more sense than just using the cgroupfs code. But you're free to do whatever you like. The obvious problem is that now that the problem is "fixed" it's less likely someone will be bothered enough to debug what the actual issue was.

derekwaynecarr (Contributor)
I agree this should have had a timeout from its inception; I would like to understand the root cause.

openshift-merge-robot added a commit to openshift/origin that referenced this pull request Mar 8, 2018
Automatic merge from submit-queue.

[3.9] UPSTREAM: opencontainers/runc: 1754: Add timeout while waiting for StartTransientUnit completion signal

master PR #18876

opencontainers/runc#1754

xref https://bugzilla.redhat.com/show_bug.cgi?id=1548358

Hold until upstream merge.

@derekwaynecarr @vikaschoudhary16
openshift-merge-robot added a commit to openshift/origin that referenced this pull request Mar 8, 2018
Automatic merge from submit-queue (batch tested with PRs 18778, 18709, 18876, 18897, 18652).

UPSTREAM: opencontainers/runc: 1754: Add timeout while waiting for StartTransientUnit completion signal

opencontainers/runc#1754

xref https://bugzilla.redhat.com/show_bug.cgi?id=1548358

Hold until upstream merge.

@derekwaynecarr @vikaschoudhary16
filbranden added a commit to filbranden/runc that referenced this pull request Mar 31, 2018
The channel was introduced in opencontainers#1683 to work around a race condition. However, the error check after StartTransientUnit ignores the error for an already existing unit, and in that case there will be no notification from DBus (so waiting on the channel would make it hang).

Later, PR opencontainers#1754 added a timeout, which worked around the issue, but we can fix this correctly by only waiting on the channel when there is no error. Fix the code to do so.

The timeout handling was kept, since there might be other cases where this situation occurs (https://bugzilla.redhat.com/show_bug.cgi?id=1548358 mentions calling this code from inside a container; it's unclear whether an existing container was in use, so it's not certain whether this change would have fixed that bug as well).
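
In code terms, the fix described above enters the wait only when the StartTransientUnit call returned no error, while still tolerating an "already exists" error. A hedged sketch of that control flow, using a stand-in startTransientUnit function and a hypothetical sentinel error rather than go-systemd's real D-Bus API:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errUnitExists is a hypothetical stand-in for the "unit already exists"
// D-Bus error; the real code matches it by the systemd error name.
var errUnitExists = errors.New("org.freedesktop.systemd1.UnitExists")

// startTransientUnit is a stand-in for the systemd D-Bus call. It reports
// completion on statusChan only when it returns no error.
func startTransientUnit(statusChan chan<- string) error {
	go func() { statusChan <- "done" }()
	return nil
}

func main() {
	statusChan := make(chan string, 1)
	if err := startTransientUnit(statusChan); err == nil {
		// Only wait for the completion signal when the call succeeded;
		// on error (including "already exists") systemd sends nothing.
		select {
		case status := <-statusChan:
			fmt.Println("unit started:", status)
		case <-time.After(time.Second):
			fmt.Println("timed out waiting for completion signal; continuing")
		}
	} else if !errors.Is(err, errUnitExists) {
		fmt.Println("StartTransientUnit failed:", err)
	}
}
```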
filbranden added a commit to filbranden/runc that referenced this pull request Apr 14, 2018
This way, if a timeout happens and we decide to stop blocking on the operation, the writer will not block when it tries to report the result of the operation.

This should address Issue opencontainers#1780 and is a follow-up to PR opencontainers#1683, PR opencontainers#1754 and PR opencontainers#1772.
filbranden added a commit to filbranden/kubernetes that referenced this pull request Apr 24, 2018
PR opencontainers/runc#1754 works around an issue in manager.Apply(-1) that
makes Kubelet startup hang when using systemd cgroup driver (by adding a
timeout) and further PR opencontainers/runc#1772 fixes that bug by
checking the proper error status before waiting on the channel.

PR opencontainers/runc#1776 checks whether Delegate works in slices,
which keeps libcontainer systemd cgroup driver working on systemd v237+.

PR opencontainers/runc#1781 makes the channel buffered, so if we time
out waiting on the channel, the updater will not block trying to write to it
since there are no longer any consumers.
k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this pull request Apr 25, 2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Update libcontainer to include PRs with fixes to systemd cgroup driver

**What this PR does / why we need it**:

PR opencontainers/runc#1754 works around an issue in manager.Apply(-1) that makes Kubelet startup hang when using systemd cgroup driver (by adding a timeout) and further PR opencontainers/runc#1772 fixes that bug by checking the proper error status before waiting on the channel.
    
PR opencontainers/runc#1776 checks whether Delegate works in slices, which keeps libcontainer systemd cgroup driver working on systemd v237+.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #61474

**Special notes for your reviewer**:
/assign @derekwaynecarr
cc @vikaschoudhary16 @sjenning @adelton @mrunalp 

**Release note**:

```release-note
NONE
```