Add support for CRIU's 'ignore' manage-cgroup mode #3447
Conversation
No, I don't think this is related.
Not sure. Do you have any ideas how to create a container with a cgroup and check that it has another one after restore? I have seen a couple of cgroup functions in the bats files, but if you have a quick idea what would work here, we can add it.
I am asking because (as I said in #2924 (comment)) I suspect that criu restores the container into the same cgroup, while systemd removes this cgroup (as the container was gone). That is just my understanding, I might be wrong.
Ah, so I tried that before (7d763aa) but for some reason I haven't finished it (maybe it did not work for me). I will retry.
Ah, right. Then this could indeed help.
There is also this discussion where the kernel might have a problem with correct cgroup deletion: checkpoint-restore/criu#1774 (comment). But with your explanation it sounds like a correctly working …
@adrianreber I think that the setManageCgroupsMode call is missing from restore.go, and so the option is ignored upon restore. Can you add it here?
Force-pushed from 4cb2488 to 822eadf.
Good point, seems like the parameter never had any effect on restore until now. Updated.
Well, let's do it, since without such tests we do not know if the fix really works or not. Changing the cgroup path in the container config is pretty simple -- you call set_cgroups_path. Here is a draft for the test (you can append this to tests/integration/checkpoint.bats):

```
@test "checkpoint and restore into a different cgroup" {
local pid name=test-diff-cgroup
set_cgroups_path
runc run -d --console-socket "$CONSOLE_SOCKET" --pid-file pid "$name"
[ "$status" -eq 0 ]
testcontainer "$name" running
# Check the cgroup.
pid="$(cat pid)"
echo "OCI_CGROUPS_PATH=$OCI_CGROUPS_PATH"
cat "/proc/$pid/cgroup"
runc checkpoint --manage-cgroups-mode ignore --work-path ./work-dir "$name"
grep -B 5 Error ./work-dir/dump.log || true
[ "$status" -eq 0 ]
testcontainer "$name" checkpointed
# Modify the cgroup path.
set_cgroups_path
runc restore --manage-cgroups-mode ignore --pid-file pid -d \
--work-path ./work-dir --console-socket "$CONSOLE_SOCKET" "$name"
grep -B 5 Error ./work-dir/restore.log || true
[ "$status" -eq 0 ]
testcontainer "$name" running
# Check the cgroup name has changed.
pid="$(cat pid)"
echo "OCI_CGROUPS_PATH=$OCI_CGROUPS_PATH"
cat "/proc/$pid/cgroup"
exit 1
}
```

This test deliberately fails so you can see all the debug output. Here is how to run it, and here's what it shows on my machine:

```
$ make
...
$ sudo bats -f different tests/integration/checkpoint.bats
✗ checkpoint and restore into a different cgroup
(in test file tests/integration/checkpoint.bats, line 441)
`exit 1' failed
runc spec (status=0):
runc run -d --console-socket /tmp/bats-run-OlQVIP/runc.urXhFf/tty/sock --pid-file pid test-diff-cgroup (status=0):
runc state test-diff-cgroup (status=0):
{
"ociVersion": "1.0.2-dev",
"id": "test-diff-cgroup",
"pid": 181707,
"status": "running",
"bundle": "/tmp/bats-run-OlQVIP/runc.urXhFf/bundle",
"rootfs": "/tmp/bats-run-OlQVIP/runc.urXhFf/bundle/rootfs",
"created": "2022-05-12T02:31:20.689876979Z",
"owner": ""
}
OCI_CGROUPS_PATH=/runc-cgroups-integration-test/test-cgroup-1873
0::/runc-cgroups-integration-test/test-cgroup-1873
runc checkpoint --manage-cgroups-mode ignore --work-path ./work-dir test-diff-cgroup (status=0):
runc state test-diff-cgroup (status=1):
time="2022-05-11T19:31:20-07:00" level=error msg="container does not exist"
runc restore --manage-cgroups-mode ignore --pid-file pid -d --work-path ./work-dir --console-socket /tmp/bats-run-OlQVIP/runc.urXhFf/tty/sock test-diff-cgroup (status=0):
runc state test-diff-cgroup (status=0):
{
"ociVersion": "1.0.2-dev",
"id": "test-diff-cgroup",
"pid": 181780,
"status": "running",
"bundle": "/tmp/bats-run-OlQVIP/runc.urXhFf/bundle",
"rootfs": "/tmp/bats-run-OlQVIP/runc.urXhFf/bundle/rootfs",
"created": "2022-05-12T02:31:21.053776806Z",
"owner": ""
}
OCI_CGROUPS_PATH=/runc-cgroups-integration-test/test-cgroup-21203
0::/runc-cgroups-integration-test/test-cgroup-1873
rmdir: failed to remove '/sys/fs/cgroup//runc-cgroups-integration-test': Device or resource busy
1 test, 1 failure
```

So, from what I see, it restores the container into the old cgroup (the one ending in 1873). This is probably happening because I am using criu 3.16.1:

```
[kir@kir-rhat runc]$ rpm -qf $(which criu)
criu-3.16.1-2.fc35.x86_64
```

Let's try with 3.17 (I am still on Fedora 35 so have to compile it). Well, this is the same with 3.17. I guess some more changes are needed in runc, so it can put the container into the proper cgroup itself (not via criu). I have some code somewhere that did it... will take a look next week.
@adrianreber my old code is at https://github.com/kolyshkin/runc/commits/restore-cgroup (last 4 commits), maybe you will find something that's missing from this PR. In particular, the changes in …
Force-pushed from 822eadf to 0ffacf8.
@kolyshkin Thanks for the test example. I used it and it works correctly for me. Let's see what CI thinks about it.
Fedora CI failure seems to be related, can you PTAL @adrianreber?
I see the following error in CI:
With my added test the rootless test seems to fail. Not sure why. I do not see similar problems locally. Strange.
This means there's a process or a container left after a previous test run. IOW some container was not removed/killed. You can add some debug to see what is going on, and run the CI as much as you want.
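If it helps, a minimal debug sketch along those lines (the cgroup paths below are placeholders based on the test output in this thread) could be added to the failing test or its teardown:

```
# Any containers runc still knows about?
runc list || true
# Any leftover test cgroups from a previous run?
find /sys/fs/cgroup -maxdepth 2 -type d -name 'test-cgroup-*' 2>/dev/null || true
# Which processes are still inside them?
cat /sys/fs/cgroup/runc-cgroups-integration-test/test-cgroup-*/cgroup.procs 2>/dev/null || true
```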
Force-pushed from 0ffacf8 to d0818de.
@adrianreber the test still fails, and it seems there's a reason for that. I wrote a small test (you can append this to the end of the checkpoint.bats file):

```
@test "my" {
set_cgroups_path
runc run -d --console-socket "$CONSOLE_SOCKET" test_busybox
[ "$status" -eq 0 ]
testcontainer test_busybox running
echo "0 CGROUP_PATH=$CGROUP_PATH"
ls -ld "$CGROUP_PATH" || true
# checkpoint the running container
runc checkpoint --work-path ./work-dir --manage-cgroups-mode ignore test_busybox
grep -B 5 Error ./work-dir/dump.log || true
[ "$status" -eq 0 ]
# after checkpoint busybox is no longer running
testcontainer test_busybox checkpointed
echo "1 CGROUP_PATH=$CGROUP_PATH"
ls -ld "$CGROUP_PATH" || true
orig_cgroup_path=$CGROUP_PATH
# restore from checkpoint
set_cgroups_path
runc "$@" restore -d --work-path ./work-dir --manage-cgroups-mode ignore --console-socket "$CONSOLE_SOCKET" test_busybox
grep -B 5 Error ./work-dir/restore.log || true
[ "$status" -eq 0 ]
# busybox should be back up and running
testcontainer test_busybox running
echo "2 CGROUP_PATH=$CGROUP_PATH"
ls -ld "$CGROUP_PATH" || true
echo "3 orig_cgroup_path=$orig_cgroup_path"
ls -ld "$orig_cgroup_path" || true
exit 1 # this is for the test to always fail and show logs
}
```

Next, make sure everything is compiled:

```
$ make all
go build -trimpath "-buildmode=pie" -tags "seccomp urfave_cli_no_docs" -ldflags "-X main.gitCommit=v1.1.0-254-g0e8b1e7e-dirty -X main.version=1.1.0+dev " -o runc .
go build -trimpath "-buildmode=pie" -tags "seccomp urfave_cli_no_docs" -ldflags "-X main.gitCommit=v1.1.0-254-g0e8b1e7e-dirty -X main.version=1.1.0+dev " -o contrib/cmd/recvtty/recvtty ./contrib/cmd/recvtty
go build -trimpath "-buildmode=pie" -tags "seccomp urfave_cli_no_docs" -ldflags "-X main.gitCommit=v1.1.0-254-g0e8b1e7e-dirty -X main.version=1.1.0+dev " -o contrib/cmd/sd-helper/sd-helper ./contrib/cmd/sd-helper
go build -trimpath "-buildmode=pie" -tags "seccomp urfave_cli_no_docs" -ldflags "-X main.gitCommit=v1.1.0-254-g0e8b1e7e-dirty -X main.version=1.1.0+dev " -o contrib/cmd/seccompagent/seccompagent ./contrib/cmd/seccompagent
```

Finally, run the above test:

```
$ sudo bats -f "my" tests/integration/checkpoint.bats
checkpoint.bats
✗ my
(in test file tests/integration/checkpoint.bats, line 481)
`exit 1' failed
runc spec (status=0):
runc run -d --console-socket /tmp/bats-run-vahX7f/runc.3kYwVb/tty/sock test_busybox (status=0):
runc state test_busybox (status=0):
{
"ociVersion": "1.0.2-dev",
"id": "test_busybox",
"pid": 1194856,
"status": "running",
"bundle": "/tmp/bats-run-vahX7f/runc.3kYwVb/bundle",
"rootfs": "/tmp/bats-run-vahX7f/runc.3kYwVb/bundle/rootfs",
"created": "2022-07-30T01:38:56.151243561Z",
"owner": ""
}
0 CGROUP_PATH=/sys/fs/cgroup/runc-cgroups-integration-test/test-cgroup-17881
drwxr-xr-x. 2 root root 0 Jul 29 18:38 /sys/fs/cgroup/runc-cgroups-integration-test/test-cgroup-17881
runc checkpoint --work-path ./work-dir --manage-cgroups-mode ignore test_busybox (status=0):
runc state test_busybox (status=1):
time="2022-07-29T18:38:56-07:00" level=error msg="container does not exist"
1 CGROUP_PATH=/sys/fs/cgroup/runc-cgroups-integration-test/test-cgroup-17881
ls: cannot access '/sys/fs/cgroup/runc-cgroups-integration-test/test-cgroup-17881': No such file or directory
runc restore -d --work-path ./work-dir --manage-cgroups-mode ignore --console-socket /tmp/bats-run-vahX7f/runc.3kYwVb/tty/sock test_busybox (status=0):
runc state test_busybox (status=0):
{
"ociVersion": "1.0.2-dev",
"id": "test_busybox",
"pid": 1194928,
"status": "running",
"bundle": "/tmp/bats-run-vahX7f/runc.3kYwVb/bundle",
"rootfs": "/tmp/bats-run-vahX7f/runc.3kYwVb/bundle/rootfs",
"created": "2022-07-30T01:38:56.551376328Z",
"owner": ""
}
2 CGROUP_PATH=/sys/fs/cgroup/runc-cgroups-integration-test/test-cgroup-9654
drwxr-xr-x. 2 root root 0 Jul 29 18:38 /sys/fs/cgroup/runc-cgroups-integration-test/test-cgroup-9654
3 orig_cgroup_path=/sys/fs/cgroup/runc-cgroups-integration-test/test-cgroup-17881
drwxr-xr-x. 2 root root 0 Jul 29 18:38 /sys/fs/cgroup/runc-cgroups-integration-test/test-cgroup-17881
rmdir: failed to remove '/sys/fs/cgroup//runc-cgroups-integration-test': Device or resource busy
1 test, 1 failure
```

As you can see, the old/original cgroup path (the one that ends with 17881) still exists after restore. I am not sure why, but at least you have an easy repro.
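One way to confirm where the restored processes actually ended up (a sketch; $orig_cgroup_path and $CGROUP_PATH are the variables from the test above, and $pid is assumed to hold the restored container's init pid, e.g. taken from runc state):

```
# Which cgroup does the restored init process report itself in?
cat "/proc/$pid/cgroup"
# Are there still processes in the old cgroup (the one ending in 17881)?
cat "$orig_cgroup_path/cgroup.procs"
# And in the new one?
cat "$CGROUP_PATH/cgroup.procs"
```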
echo "OCI_CGROUPS_PATH=$OCI_CGROUPS_PATH" | ||
cat "/proc/$pid/cgroup" | ||
new_cgroup="$OCI_CGROUPS_PATH" | ||
[ "$original_cgroup" != "$new_cgroup" ] |
Ughm, so here you basically check that set_cgroups_path changed the value of OCI_CGROUPS_PATH. Well yes, it did. What should have been done is checking that:
- the new cgroup path exists
- the old cgroup path does not exist
- the contents of /proc/$pid/cgroup differ from the old one

(a rough sketch of these checks follows below)

Here's a high-level overview of what is going on in a test from the prev. comment: … Apparently, what happens is …
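A minimal sketch of those three checks in bats/bash (assuming the test saves the original values in old_cgroup_path and old_proc_cgroup before calling set_cgroups_path again; these variable names are placeholders):

```
# The new cgroup path exists and the old one is gone.
[ -d "$CGROUP_PATH" ]
! [ -d "$old_cgroup_path" ]

# The container's own view of its cgroup has changed.
pid="$(cat pid)"
[ "$(cat "/proc/$pid/cgroup")" != "$old_proc_cgroup" ]
```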
@kolyshkin thanks for looking so closely at this. I probably will not have time to look into this for the next couple of weeks, but I will try to see why it fails then.
@adrianreber I have finally figured it out in #3546.
Superseded by #3546.
This adds an already existing CRIU cgroup mode, 'ignore', to be selectable from runc.
This is a fix for errors seen with Podman containers that have been restored multiple times into new cgroups. CRIU fails to checkpoint a container that has already been restored once into a cgroup named differently than the original container's.
CRIU keeps information about the original cgroup, and during restore into a differently named cgroup it still tries to restore into the original cgroup. This leads to a process with wrong cgroup settings, and a second checkpoint of that container fails.
With 'ignore', CRIU will ignore any cgroups and leave the cgroup setting to runc.
Signed-off-by: Adrian Reber <areber@redhat.com>
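For completeness, the new mode as exercised in the tests above -- a usage sketch, with the work dir and container name as placeholders:

```
runc checkpoint --manage-cgroups-mode ignore --work-path ./work-dir mycontainer
runc restore --manage-cgroups-mode ignore --work-path ./work-dir -d \
	--console-socket "$CONSOLE_SOCKET" mycontainer
```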