Sporadic runc exec failures #1884

Closed

liron-l opened this issue Sep 2, 2018 · 10 comments

liron-l commented Sep 2, 2018

As of runc 1.0.0-rc5+dev we've started noticing an increased rate of sporadic errors during k8s liveness probes.
When the probe fails, it produces the following error:

false Error:  OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused "process_linux.go:138: adding pid 3101 to cgroups caused \"failed to write 3101 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/docker/1d7b29b96dfd7bc97c6a2d6cbff82b00509cdcc4dbf2ac72ef5dd2bef9db7067/cgroup.procs: invalid argument\"": unknown

This might be related to #1326 and moby/moby#31230, but without a root cause or a resolution we can't be sure.
When experimenting with the runc code, it appears that the error disappears if we retry cgroups.EnterPid on failure (which might indicate this is a transient/race issue).
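
For illustration, here is a minimal sketch of the kind of retry we experimented with. This is not runc's actual code: the function name, retry count, delay, and cgroup path below are placeholders.

package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
	"syscall"
	"time"
)

// retryEnterCgroup writes pid to the given cgroup.procs file, retrying a few
// times when the kernel rejects the write with EINVAL (the transient failure
// we are seeing). Retry count and delay are arbitrary.
func retryEnterCgroup(cgroupProcs string, pid int) error {
	var err error
	for attempt := 0; attempt < 5; attempt++ {
		err = os.WriteFile(cgroupProcs, []byte(strconv.Itoa(pid)), 0644)
		if err == nil {
			return nil
		}
		if !errors.Is(err, syscall.EINVAL) {
			return err // a different failure; don't retry
		}
		time.Sleep(10 * time.Millisecond)
	}
	return fmt.Errorf("failed to write %d to %s after retries: %w", pid, cgroupProcs, err)
}

func main() {
	// Placeholder path; in practice this would be the container's cgroup.procs.
	err := retryEnterCgroup("/sys/fs/cgroup/cpu,cpuacct/cgroup.procs", os.Getpid())
	fmt.Println("result:", err)
}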

We've found that it's easier to reproduce this issue if we run perf together with an excessive docker exec load (we tried this on Ubuntu Xenial and other OSes).

  1. Run perf
perf trace --no-syscalls --event 'sched:*'
  2. Run the following code (similar to how k8s is invoking the liveness tests)
package main

import (
	"fmt"
	"strings"
	"time"

	"github.com/fsouza/go-dockerclient"
)

func main() {
	client, err := docker.NewClient("unix:///var/run/docker.sock")
	if err != nil {
		panic(err)
	}
	// Periodic heartbeat so we can tell the loop is still alive.
	go func() {
		for {
			fmt.Println("Running", time.Now().String())
			time.Sleep(time.Hour)
		}
	}()
	const container = "test-container"
	// Remove any leftover container from a previous run.
	client.RemoveContainer(docker.RemoveContainerOptions{
		ID:    container,
		Force: true,
	})
	_, err = client.CreateContainer(docker.CreateContainerOptions{
		Name: container,
		Config: &docker.Config{
			Image: "busybox",
			Cmd:   []string{"sleep", "10000"},
			Tty:   true,
		},
	})
	if err != nil {
		panic(err)
	}
	if err := client.StartContainer(container, nil); err != nil {
		panic(err)
	}
	var tty bool

	// Hammer the container with execs, alternating tty on/off, until we hit
	// the "invalid argument" cgroup error.
	for {
		tty = !tty
		exec, err := client.CreateExec(docker.CreateExecOptions{AttachStdout: true, AttachStderr: true, Tty: tty, Container: container, Cmd: []string{"echo", "A"}})
		if err != nil {
			panic(err)
		}
		m := logger{}
		err = client.StartExec(exec.ID, docker.StartExecOptions{Tty: tty, OutputStream: m, ErrorStream: m})
		if err != nil && strings.Contains(err.Error(), "invalid") {
			fmt.Println(err.Error())
			panic(err)
		} else if err != nil {
			fmt.Println("err", err, time.Now())
		}
	}
}

// logger prints any exec output that mentions "cgroup", which is how the
// failure surfaces.
type logger struct{}

func (m logger) Write(p []byte) (n int, err error) {
	if strings.Contains(string(p), "cgroup") {
		fmt.Println(time.Now().String(), "Error:", string(p))
	}
	return len(p), nil
}

Runc version

runc version 1.0.0-rc5+dev
commit: 69663f0bd4b60df09991c08812a60108003fa340-dirty
spec: 1.0.0

liron-l commented Sep 6, 2018

Hi guys,

We think the problem is related to the fact that runc is written in Go.

Specifically, we tracked the issue to the following condition in the cpu_cgroup_can_attach kernel function:

		/*
		 * Avoid calling sched_move_task() before wake_up_new_task()
		 * has happened. This would lead to problems with PELT, due to
		 * move wanting to detach+attach while we're not attached yet.
		 */
		if (task->state == TASK_NEW)
			ret = -EINVAL;

This check was added in kernel version v4.8.

What we see from the kernel state is that the rejected pid belongs to a thread whose parent is the pid that runc is trying to add to cgroup.procs.
This probably means that it's a Go thread scheduled by the Go runtime.

Here is a full example of the repro and kernel state:

adding pid 18413 to cgroups caused \"failed to write 18413 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/docker/b906c3bcd1d0cbe999b9ba48bb07fc6cedbb05cc5a0fb5b15645c3f5660ef181/cgroup.procs: invalid argument\""

And the task states in the kernel are:

PPID    PID     TID     Description
-       18412   18412   docker runc init (process that is spawned by runc)
18412   18413   18413   docker runc init (process that is added to the cgroup)
18413   18414   18413   thread that causes adding pid 18413 to the cgroup to fail

@hqhq @crosbymichael:
There are a few possible mitigations for this issue, e.g.:

  1. Check the threads' state (from /proc) prior to adding the runc-init pid to the cgroup (not a perfect fix, but it eliminates the problem with high probability; a rough sketch follows below).
  2. Write a small shim in C that will be added to the cgroup.
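
For illustration, a rough sketch of option 1. The function below is hypothetical, not part of runc, and it is only a heuristic, since as far as we can tell TASK_NEW is not reported as a distinct state character in /proc.

package main

import (
	"fmt"
	"os"
	"strings"
)

// threadStates reads the state character of every thread of pid from
// /proc/<pid>/task/<tid>/stat. The idea would be to inspect the runc-init
// threads before writing the pid to cgroup.procs and delay the write while
// the runtime is still spawning threads.
func threadStates(pid int) (map[string]byte, error) {
	taskDir := fmt.Sprintf("/proc/%d/task", pid)
	tids, err := os.ReadDir(taskDir)
	if err != nil {
		return nil, err
	}
	states := make(map[string]byte)
	for _, tid := range tids {
		data, err := os.ReadFile(fmt.Sprintf("%s/%s/stat", taskDir, tid.Name()))
		if err != nil {
			continue // the thread may already have exited
		}
		// stat format is "pid (comm) state ..."; the state char follows ')'.
		if i := strings.LastIndexByte(string(data), ')'); i >= 0 && i+2 < len(data) {
			states[tid.Name()] = data[i+2]
		}
	}
	return states, nil
}

func main() {
	states, err := threadStates(os.Getpid())
	if err != nil {
		panic(err)
	}
	for tid, state := range states {
		fmt.Printf("tid %s state %c\n", tid, state)
	}
}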

We would be happy to own this fix, just let us know if you have other ideas in mind.


cyphar commented Sep 10, 2018

Awesome. Another day, another reason why using Go was a mistake. 😉

In all seriousness, yeah this is a pretty bad issue. I think the easiest solution would be to just carry #1184 -- which includes a patch where the cgroup attachment code is done while runc init is still inside the C shim (which is necessary so that we can unshare(CLONE_NEWCGROUP) after the attach). This should solve the Go runtime thread problem.


liron-l commented Sep 11, 2018

Thanks @cyphar!
Is #1184 still active?
Will it help if we continue the remaining work or extract the synchronization mechanism to a separate PR?


cyphar commented Sep 11, 2018

It isn't active; we'd need to carry it. To be honest, the patch was mostly done, it just needed some review -- I had quite a few concerns in the early review process, but I think the patch just stagnated. I can try to carry it this week and see if it solves the issue for you.


liron-l commented Sep 27, 2018

Hi @cyphar, any updates on the fix? Can we assist in any way?


cyphar commented Sep 28, 2018

Sorry, I've been working on some kernel patches related to container runtime security. I will work on carrying the PR next week.

@crosbymichael

I'm going to look into this today and see if we can get a smaller diff to fix the issues you are seeing.


jaeco commented Oct 17, 2018

Hello, I have users who are experiencing pain from this issue as well. Do we have any updates on a fix?

@crosbymichael

@jaeco I just submitted a rebased PR to fix this issue. We were able to reproduce it and I can confirm the fix resolves this issue.


jaeco commented Oct 22, 2018

> @jaeco I just submitted a rebased PR to fix this issue. We were able to reproduce it and I can confirm the fix resolves this issue.

Thank you so much!!

dims added a commit to dims/image-builder that referenced this issue Oct 14, 2019
Fix for opencontainers/runc#1884
is in opencontainers/runc#1916

- problem described in 1884 is independent of the Kubernetes version
- happens with kernel >= v4.8
- fixed in runc v1.0.0-rc6 and above (opencontainers/runc@9a3a8a5)
- which got pulled into containerd v1.3.0-beta.0 and above (containerd/containerd@97dd5df#diff-4061fcef378a6d912e14e2ce162a1995)