[ci] make unittest and make integration broken on local machines #3955

Closed
cyphar opened this issue Aug 2, 2023 · 11 comments

@cyphar
Member

cyphar commented Aug 2, 2023

Description

It seems that some aspect of the cgroup setup for integration tests was broken for make integration and make unittest:

not ok 11 runc create (limits + cgrouppath + permission on the cgroup dir) succeeds
# (from function `check_cgroup_value' in file tests/integration/helpers.bash, line 267,
#  in test file tests/integration/cgroups.bats, line 56)
#   `check_cgroup_value "cgroup.controllers" "$(cat /sys/fs/cgroup/cgroup.controllers)"' failed
# runc spec (status=0):
#
# runc run -d --console-socket /tmp/bats-run-Q1ppq1/runc.cCqphr/tty/sock test_cgroups_permissions (status=0):
#
# current cpuset cpu pids !? cpuset cpu io memory hugetlb pids rdma misc
ok 12 runc exec (limits + cgrouppath + permission on the cgroup dir) succeeds
not ok 13 runc exec (cgroup v2 + init process in non-root cgroup) succeeds
# (in test file tests/integration/cgroups.bats, line 86)
#   `[[ ${lines[0]} == *"memory"* ]]' failed
# runc spec (status=0):
#
# runc run -d --console-socket /tmp/bats-run-Q1ppq1/runc.ZGbeOZ/tty/sock test_cgroups_group (status=0):
#
# runc exec test_cgroups_group cat /sys/fs/cgroup/cgroup.controllers (status=0):
# cpuset cpu pids
ok 14 runc run (cgroup v1 + unified resources should fail) # skip test requires cgroups_v1
not ok 15 runc run (blkio weight)
# (in test file tests/integration/cgroups.bats, line 142)
#   `[ "$status" -eq 0 ]' failed
# runc spec (status=0):
#
# runc run -d --console-socket /tmp/bats-run-Q1ppq1/runc.oXmFME/tty/sock test_cgroups_unified (status=1):
# time="2023-08-02T01:35:37Z" level=warning msg="unable to get oom kill count" error="openat2 /sys/fs/cgroup/runc-cgroups-integration-test/test-cgroup-22074/memory.events: no such file or directory"
# time="2023-08-02T01:35:37Z" level=error msg="runc run failed: unable to start container process: unable to apply cgroup configuration: cannot enter cgroupv2 \"/sys/fs/cgroup/runc-cgroups-integration-test\" with domain controllers -- it is in an invalid state"
# rmdir: failed to remove '/sys/fs/cgroup//runc-cgroups-integration-test': No such file or directory

(Most of the tests fail.)

Steps to reproduce the issue

  1. make unittest or make integration

Describe the results you received and expected

Tests should succeed on main, as per CI. They fail, as above.

What version of runc are you using?

main

Host OS information

NAME="openSUSE Tumbleweed"
# VERSION="20230731"
ID="opensuse-tumbleweed"
ID_LIKE="opensuse suse"
VERSION_ID="20230731"
PRETTY_NAME="openSUSE Tumbleweed"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:opensuse:tumbleweed:20230731"
BUG_REPORT_URL="https://bugzilla.opensuse.org"
SUPPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://www.opensuse.org"
DOCUMENTATION_URL="https://en.opensuse.org/Portal:Tumbleweed"
LOGO="distributor-logo-Tumbleweed"

Host kernel information

Linux senku 6.3.9-1-default #1 SMP PREEMPT_DYNAMIC Thu Jun 22 03:53:43 UTC 2023 (0df701d) x86_64 x86_64 x86_64 GNU/Linux

@cyphar cyphar added the area/ci label Aug 2, 2023
@kolyshkin
Contributor

This is probably because you're not running docker/podman as root. Not all cgroup controllers are available for docker/podman this way.

Something like this (taken from Vagrantfile.fedora) may help:

# Delegate cgroup v2 controllers to rootless user via --systemd-cgroup
mkdir -p /etc/systemd/system/user@.service.d
cat > /etc/systemd/system/user@.service.d/delegate.conf << EOF
[Service]
# default: Delegate=pids memory
# NOTE: delegation of cpuset requires systemd >= 244 (Fedora >= 32, Ubuntu >= 20.04).
Delegate=yes
EOF
systemctl daemon-reload

But maybe it's (also?) something else. Will look tomorrow.

@cyphar
Member Author

cyphar commented Aug 2, 2023

My dockerd is definitely running as root, and we have Delegate=yes in the docker.service setup for openSUSE.

@kolyshkin
Contributor

Reproduced locally (very different setup from reporter's -- Fedora, Podman, sudo):

$ sudo make shell
....
root@38bc99e50653:/go/src/github.com/opencontainers/runc# cat /sys/fs/cgroup/cgroup.controllers 
cpuset cpu io memory pids
root@38bc99e50653:/go/src/github.com/opencontainers/runc# cat /sys/fs/cgroup/cgroup.subtree_control
root@38bc99e50653:/go/src/github.com/opencontainers/runc# echo +cpuset > /sys/fs/cgroup/cgroup.subtree_control
root@38bc99e50653:/go/src/github.com/opencontainers/runc# cat /sys/fs/cgroup/cgroup.subtree_control
cpuset
root@38bc99e50653:/go/src/github.com/opencontainers/runc# echo +cpu > /sys/fs/cgroup/cgroup.subtree_control
root@38bc99e50653:/go/src/github.com/opencontainers/runc# echo +io > /sys/fs/cgroup/cgroup.subtree_control
bash: echo: write error: Operation not supported
root@38bc99e50653:/go/src/github.com/opencontainers/runc# echo +memory > /sys/fs/cgroup/cgroup.subtree_control
bash: echo: write error: Operation not supported
root@38bc99e50653:/go/src/github.com/opencontainers/runc# cat /sys/fs/cgroup/cgroup.subtree_control
cpuset cpu
root@38bc99e50653:/go/src/github.com/opencontainers/runc# echo +pids > /sys/fs/cgroup/cgroup.subtree_control
root@38bc99e50653:/go/src/github.com/opencontainers/runc# cat /sys/fs/cgroup/cgroup.subtree_control
cpuset cpu pids

So, in the container we're not allowed to delegate some controllers. This most probably has to do with what systemd writes to cgroup.subtree_control.

Moreover, systemd does not know about some controllers, so it does not allow them even when Delegate=yes is set. The following is on the host:

$ cat /sys/fs/cgroup/cgroup.controllers 
cpuset cpu io memory hugetlb pids rdma misc
$ cat /sys/fs/cgroup/cgroup.subtree_control 
cpuset cpu io memory hugetlb pids

That is not a problem per se, as long as the dockerd/podman cgroup has cgroup.subtree_control contents identical to those of cgroup.controllers. The way to check is to find the dockerd PID, look up its cgroup via cat /proc/$PID/cgroup, and then check that cgroup's cgroup.subtree_control.

For me, I get:

$ pidof podman
1407576

$ cat /proc/1407576/cgroup 
0::/user.slice/user-1000.slice/user@1000.service/app.slice/vte-spawn-2e0ee5be-4af4-41fe-81b8-8a82675e4472.scope

$ cat /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/vte-spawn-2e0ee5be-4af4-41fe-81b8-8a82675e4472.scope/cgroup.controllers 
cpuset cpu io memory pids

$ cat /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/vte-spawn-2e0ee5be-4af4-41fe-81b8-8a82675e4472.scope/cgroup.subtree_control 
cpu

$
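The manual check above can be scripted. This is a minimal sketch, assuming a cgroup v2 (unified) hierarchy; the function name missing_subtree_controllers is made up for illustration and is not part of runc or systemd:

```shell
# Print the controllers that are available in a cgroup (cgroup.controllers)
# but are not enabled for its children (cgroup.subtree_control).
# $1: cgroup directory, e.g. /sys/fs/cgroup/system.slice/docker.service
# NOTE: hypothetical helper, written only to illustrate the check.
missing_subtree_controllers() {
    local dir="$1" available enabled ctrl missing=""
    available="$(cat "$dir/cgroup.controllers")" || return 1
    enabled="$(cat "$dir/cgroup.subtree_control")" || return 1
    for ctrl in $available; do
        # Pad with spaces so that "cpu" does not match inside "cpuset".
        case " $enabled " in
            *" $ctrl "*) ;;
            *) missing="${missing:+$missing }$ctrl" ;;
        esac
    done
    echo "$missing"
}
```

For example, missing_subtree_controllers /sys/fs/cgroup/system.slice/docker.service would list the controllers that the dockerd cgroup does not pass down to its children; an empty line means the two files agree.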

Also, in my case, systemctl --user show vte-spawn-2e0ee5be-4af4-41fe-81b8-8a82675e4472.scope shows Delegate=no.

To fix that, I had to add this file:

$ cat /etc/systemd/user/vte-spawn-.scope.d/delegate.conf
[Scope]
Delegate=yes

and do

$ systemctl --user daemon-reload

After that, in a new shell:

[kir@kir-rhat ~]$ cat /proc/self/cgroup 
0::/user.slice/user-1000.slice/user@1000.service/app.slice/vte-spawn-1ad84904-ba4e-4866-9797-d450995c1aa9.scope
[kir@kir-rhat ~]$ systemctl --user show vte-spawn-1ad84904-ba4e-4866-9797-d450995c1aa9.scope | grep Dele
Delegate=yes
DelegateControllers=cpu cpuacct cpuset io blkio memory devices pids bpf-firewall bpf-devices bpf-foreign bpf-socket-bind bpf-restrict-network-interfaces
[kir@kir-rhat ~]$ cat /sys/fs/cgroup//user.slice/user-1000.slice/user@1000.service/app.slice/vte-spawn-1ad84904-ba4e-4866-9797-d450995c1aa9.scope/cgroup.controllers 
cpuset cpu io memory pids
[kir@kir-rhat ~]$ cat /sys/fs/cgroup//user.slice/user-1000.slice/user@1000.service/app.slice/vte-spawn-1ad84904-ba4e-4866-9797-d450995c1aa9.scope/cgroup.subtree_control 
[kir@kir-rhat ~]$ # ^^^ Still empty :(
[kir@kir-rhat ~]$ cat /sys/fs/cgroup//user.slice/user-1000.slice/user@1000.service/app.slice/cgroup.subtree_control 
cpuset cpu io memory pids
[kir@kir-rhat ~]$ # ^^^ Parent one is good though

and I still can't delegate memory controller for some reason:

[kir@kir-rhat runc]$ sudo make shell
...
root@a2ab8418acb4:/go/src/github.com/opencontainers/runc# cat /sys/fs/cgroup/cgroup.controllers 
cpuset cpu io memory hugetlb pids
root@a2ab8418acb4:/go/src/github.com/opencontainers/runc# cat /sys/fs/cgroup/cgroup.subtree_control  
root@a2ab8418acb4:/go/src/github.com/opencontainers/runc# echo +cpu > /sys/fs/cgroup/cgroup.subtree_control 
root@a2ab8418acb4:/go/src/github.com/opencontainers/runc# echo +memory > /sys/fs/cgroup/cgroup.subtree_control 
bash: echo: write error: Operation not supported
root@a2ab8418acb4:/go/src/github.com/opencontainers/runc# 

@cyphar
Member Author

cyphar commented Aug 2, 2023

I'm confused though -- in my case the container is being spawned with --privileged with a root daemon configured with Delegate=yes (and runc sets Delegate=yes for container cgroups as well AFAIK). I don't use rootless docker.

% cat /proc/$(pgrep dockerd)/cgroup
0::/system.slice/docker.service
% systemctl show docker.service | grep Delegate
Delegate=yes
DelegateControllers=cpu cpuacct cpuset io blkio memory devices pids bpf-firewall bpf-devices bpf-foreign bpf-socket-bind bpf-restrict-network-interfaces
% cat /sys/fs/cgroup/system.slice/docker.service/cgroup.controllers
cpuset cpu io memory hugetlb pids rdma misc
% cat /sys/fs/cgroup/system.slice/docker.service/cgroup.subtree_control
%

(That's not a typo -- there is nothing in subtree_control.)

Why is cgroup.subtree_control not including everything? Is this a systemd bug?

The container's scope is similarly configured:

% cat /proc/$pid1/cgroup
0::/system.slice/docker-72f09c7c55f7d9a80baca78f8a08875745ca023246547f2863f4d0722dc3dca6.scope
% sudo systemctl show docker-72f09c7c55f7d9a80baca78f8a08875745ca023246547f2863f4d0722dc3dca6.scope | grep Delegate
Delegate=yes
DelegateControllers=cpu cpuacct cpuset io blkio memory devices pids bpf-firewall bpf-devices bpf-foreign bpf-socket-bind bpf-restrict-network-interfaces
% cat /sys/fs/cgroup/system.slice/docker-72f09c7c55f7d9a80baca78f8a08875745ca023246547f2863f4d0722dc3dca6.scope/cgroup.controllers
cpuset cpu io memory hugetlb pids rdma misc
% cat /sys/fs/cgroup/system.slice/docker-72f09c7c55f7d9a80baca78f8a08875745ca023246547f2863f4d0722dc3dca6.scope/cgroup.subtree_control
%

@cyphar
Member Author

cyphar commented Aug 2, 2023

and I still can't delegate memory controller for some reason:

This is a cgroupfs restriction. Cgroups that cannot be converted to threaded mode cannot have their subtree delegated if there are processes in the cgroup:

static int cgroup_vet_subtree_control_enable(struct cgroup *cgrp, u16 enable)
{
	u16 domain_enable = enable & ~cgrp_dfl_threaded_ss_mask;

	/* if nothing is getting enabled, nothing to worry about */
	if (!enable)
		return 0;

	/* can @cgrp host any resources? */
	if (!cgroup_is_valid_domain(cgrp->dom_cgrp))
		return -EOPNOTSUPP;

	/* mixables don't care */
	if (cgroup_is_mixable(cgrp))
		return 0;

	if (domain_enable) {
		/* can't enable domain controllers inside a thread subtree */
		if (cgroup_is_thread_root(cgrp) || cgroup_is_threaded(cgrp))
			return -EOPNOTSUPP;
	} else {
		/*
		 * Threaded controllers can handle internal competitions
		 * and are always allowed inside a (prospective) thread
		 * subtree.
		 */
		if (cgroup_can_be_thread_root(cgrp) || cgroup_is_threaded(cgrp))
			return 0;
	}

	/*
	 * Controllers can't be enabled for a cgroup with tasks to avoid
	 * child cgroups competing against tasks.
	 */
	if (cgroup_has_tasks(cgrp))
		return -EBUSY;

	return 0;
}

Basically, you can't add to the subtree set once the cgroup has processes except in some special cases.
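A quick way to tell whether a given cgroup is likely to hit this restriction is to check whether it has member processes. A minimal sketch follows; cgroup_has_procs is a hypothetical helper name, and the special cases handled by the kernel code above (the real root cgroup, threaded controllers) are deliberately not modelled:

```shell
# Return 0 if the cgroup at $1 has member processes, 1 if it has none,
# and 2 if cgroup.procs cannot be read.
# NOTE: hypothetical helper. The file is read rather than tested with
# "[ -s ... ]" because cgroupfs pseudo-files may report a size of zero.
cgroup_has_procs() {
    local procs
    procs="$(cat "$1/cgroup.procs")" || return 2
    [ -n "$procs" ]
}
```

If cgroup_has_procs /sys/fs/cgroup succeeds inside the test container, then writes such as echo +memory > /sys/fs/cgroup/cgroup.subtree_control can be expected to fail until the processes are moved elsewhere.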

@kolyshkin
Contributor

Basically, you can't add to the subtree set once the cgroup has processes except in some special cases.

Yes, figured that one out already. The workaround would be to start container init process in a sub-cgroup, and then change the top-level cgroup's cgroup.subtree_control.

I think we should do something like what is done in the kind tool here: kubernetes-sigs/kind@3c9c318.

Here's what I ended up with: #3960.
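The workaround described above (move everything into a sub-cgroup, then enable controllers on the parent) might look roughly like the following. This is only an illustrative sketch under the assumption of a writable cgroup v2 tree; it is not the code from #3960 or from kind, and the function name delegate_via_subcgroup and the default child name "init" are made up:

```shell
# Move every process out of the cgroup at $1 into a child cgroup named $2
# (hypothetical default: "init"), then try to enable each available
# controller for the children of $1.
# Returns 1 if setup fails or if any controller could not be enabled.
delegate_via_subcgroup() {
    local root="$1" child="${2:-init}" pids pid ctrl rc=0
    mkdir -p "$root/$child" || return 1
    # Take a snapshot first, so the list is not read while it changes.
    pids="$(cat "$root/cgroup.procs")" || return 1
    for pid in $pids; do
        # cgroupfs accepts one PID per write. A process may have exited
        # since the snapshot was taken, so failures here are ignored.
        echo "$pid" > "$root/$child/cgroup.procs" 2>/dev/null || true
    done
    # Enable controllers one at a time, so that a single unsupported
    # controller does not prevent the others from being enabled.
    for ctrl in $(cat "$root/cgroup.controllers"); do
        if ! echo "+$ctrl" > "$root/cgroup.subtree_control"; then
            echo "could not enable controller: $ctrl" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

Inside the test container this would presumably be called as delegate_via_subcgroup /sys/fs/cgroup before any tests run. Writing the controllers one by one is a design choice for easier diagnostics; a single write with all controllers is all-or-nothing.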

@kolyshkin
Contributor

As a side note, I think we need to add CI jobs that run make integration and make unittest (currently in CI we only run make localintegration and make localunittest, so we do not test that test-in-docker works).

@cyphar
Member Author

cyphar commented Aug 3, 2023

I think we used to use Docker in CI and then switched it to be local after we split out the test runs into a proper matrix.

@kolyshkin
Contributor

OK, #3960 is ready and (together with the just-merged #3954) fixes this issue (on my laptop, that is).

@kolyshkin
Contributor

I think we used to use Docker in CI and then switched it to be local after we split out the test runs into a proper matrix.

One thing with testing inside Docker is, unless we can run systemd inside that testing container, we do not and can't test systemd-related functionality (systemd cgroup driver).

Having said that, we can add jobs to CI to make sure make integration unittest works via docker.

@cyphar
Member Author

cyphar commented Aug 5, 2023

#3960 fixed the issue. We should make a separate PR to add make integration unittest to CI.

@cyphar cyphar closed this as completed Aug 5, 2023