
Add support for CRIU's 'ignore' manage-cgroup mode #3447

Conversation

adrianreber
Contributor

This makes an existing CRIU cgroup-management mode, 'ignore', selectable from runc.

This fixes errors seen with Podman containers that have been restored multiple times into new cgroups: CRIU fails to checkpoint a container that has already been restored once into a cgroup named differently from the original one.

CRIU keeps information about the original cgroup, and when restoring into a differently named cgroup it still tries to restore into the original one. This leaves the process with the wrong cgroup settings, and a second checkpoint of that container fails.

With 'ignore', CRIU ignores cgroups entirely and leaves all cgroup handling to runc.
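
For context, the new value is passed like the existing modes on both operations. A minimal usage sketch (the container name and image path below are only illustrative):

# Checkpoint without letting CRIU touch cgroups:
runc checkpoint --manage-cgroups-mode ignore --image-path ./image mycontainer

# On restore, runc (not CRIU) creates and configures the cgroup, so the
# container can be placed into a cgroup named differently from the original:
runc restore --manage-cgroups-mode ignore --image-path ./image mycontainer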

@kolyshkin
Contributor

  1. Do you think it will help the likes of [centos7] not ok 15 checkpoint --lazy-pages and restore #2760 and [fedora] not ok 16 checkpoint --lazy-pages and restore #2924?
  2. Do we need a test case for this?

@adrianreber
Contributor Author

  1. Do you think it will help the likes of [centos7] not ok 15 checkpoint --lazy-pages and restore #2760 and [fedora] not ok 16 checkpoint --lazy-pages and restore #2924?

No, I don't think this is related.

2. Do we need a test case for this?

Not sure. Do you have any idea how to create a container with a cgroup and check that it ends up in a different one after restore? I have seen a couple of cgroup functions in the bats files, but if you have a quick idea of what would work here, we can add it.

@kolyshkin
Contributor

kolyshkin commented Apr 6, 2022

  1. Do you think it will help the likes of [centos7] not ok 15 checkpoint --lazy-pages and restore #2760 and [fedora] not ok 16 checkpoint --lazy-pages and restore #2924?

No, I don't think this is related.

I am asking because (as I said in #2924 (comment)) I suspect that criu restores the container into the same cgroup, while systemd removes this cgroup (as the container was gone).

With --manage-cgroups-mode ignore CRIU will do nothing to restore cgroups, and runc will correctly create a proper cgroup (and set all the parameters), so the above issues should be gone.

That is just my understanding, I might be wrong.

@kolyshkin
Contributor

Ah, so I tried that before (7d763aa), but for some reason I never finished it (maybe it did not work for me).

I will retry.

@adrianreber
Contributor Author

I am asking because (as I said in #2924 (comment)) I suspect that criu restores the container into the same cgroup, while systemd removes this cgroup (as the container was gone).

Ah, right. Then this could indeed help.

With --manage-cgroups-mode ignore CRIU will do nothing to restore cgroups, and runc will correctly create a proper cgroup (and set all the parameters), so the above issues should be gone.

There is also this discussion where the kernel might have a problem with correct cgroup deletion: checkpoint-restore/criu#1774 (comment)

But with your explanation it sounds like a correctly working 'ignore' mode could help you.

Contributor

@kolyshkin kolyshkin left a comment


@adrianreber I think that setManageCgroupsMode call is missing from restore.go, and so the option is ignored upon restore. Can you add it here?

@adrianreber adrianreber force-pushed the 2022-04-03-add-cgroup-ignore-mode branch from 4cb2488 to 822eadf on May 5, 2022
@adrianreber
Contributor Author

@adrianreber I think that setManageCgroupsMode call is missing from restore.go, and so the option is ignored upon restore. Can you add it here?

Good point, seems like the parameter never had any effect on restore until now. Updated.

@adrianreber adrianreber requested a review from kolyshkin May 5, 2022 06:10
@kolyshkin
Contributor

Do you have any idea how to create a container with a cgroup and check that it ends up in a different one after restore? I have seen a couple of cgroup functions in the bats files, but if you have a quick idea of what would work here, we can add it.

Well, let's do it, since without such a test we do not know whether the fix really works.

Changing the cgroup path in the container config is pretty simple -- you call set_cgroups_path and it randomizes the path (well, just the suffix).
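
For readers unfamiliar with the helper, set_cgroups_path does roughly the following (a simplified sketch, not the exact helpers.bash code):

set_cgroups_path() {
        # Pick a random suffix so every call produces a fresh cgroup path...
        OCI_CGROUPS_PATH="/runc-cgroups-integration-test/test-cgroup-$RANDOM"
        # ...and write it into the container's config.json (update_config is
        # another helper from the integration test suite).
        update_config '.linux.cgroupsPath |= "'"$OCI_CGROUPS_PATH"'"'
}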

Here is a draft for the test (you can append this to tests/integration/checkpoint.bats):

@test "checkpoint and restore into a different cgroup" {
        local pid name=test-diff-cgroup

        set_cgroups_path
        runc run -d --console-socket "$CONSOLE_SOCKET" --pid-file pid "$name"
        [ "$status" -eq 0 ]

        testcontainer "$name" running
        # Check the cgroup.
        pid="$(cat pid)"
        echo "OCI_CGROUPS_PATH=$OCI_CGROUPS_PATH"
        cat "/proc/$pid/cgroup"

        runc checkpoint --manage-cgroups-mode ignore --work-path ./work-dir "$name"
        grep -B 5 Error ./work-dir/dump.log || true
        [ "$status" -eq 0 ]

        testcontainer "$name" checkpointed

        # Modify the cgroup path.
        set_cgroups_path
        runc restore --manage-cgroups-mode ignore --pid-file pid -d \
                --work-path ./work-dir --console-socket "$CONSOLE_SOCKET" "$name"
        grep -B 5 Error ./work-dir/restore.log || true
        [ "$status" -eq 0 ]

        testcontainer "$name" running

        # Check the cgroup name has changed.
        pid="$(cat pid)"
        echo "OCI_CGROUPS_PATH=$OCI_CGROUPS_PATH"
        cat "/proc/$pid/cgroup"
        exit 1
}

This test deliberately fails so you can see all the debug output.

Here is how to run it, and here's what it shows on my machine:

$ make
...
$ sudo bats -f different tests/integration/checkpoint.bats 
 ✗ checkpoint and restore into a different cgroup
   (in test file tests/integration/checkpoint.bats, line 441)
     `exit 1' failed
   runc spec (status=0):
   
   runc run -d --console-socket /tmp/bats-run-OlQVIP/runc.urXhFf/tty/sock --pid-file pid test-diff-cgroup (status=0):
   
   runc state test-diff-cgroup (status=0):
   {
     "ociVersion": "1.0.2-dev",
     "id": "test-diff-cgroup",
     "pid": 181707,
     "status": "running",
     "bundle": "/tmp/bats-run-OlQVIP/runc.urXhFf/bundle",
     "rootfs": "/tmp/bats-run-OlQVIP/runc.urXhFf/bundle/rootfs",
     "created": "2022-05-12T02:31:20.689876979Z",
     "owner": ""
   }
   OCI_CGROUPS_PATH=/runc-cgroups-integration-test/test-cgroup-1873
   0::/runc-cgroups-integration-test/test-cgroup-1873
   runc checkpoint --manage-cgroups-mode ignore --work-path ./work-dir test-diff-cgroup (status=0):
   
   runc state test-diff-cgroup (status=1):
   time="2022-05-11T19:31:20-07:00" level=error msg="container does not exist"
   runc restore --manage-cgroups-mode ignore --pid-file pid -d --work-path ./work-dir --console-socket /tmp/bats-run-OlQVIP/runc.urXhFf/tty/sock test-diff-cgroup (status=0):
   
   runc state test-diff-cgroup (status=0):
   {
     "ociVersion": "1.0.2-dev",
     "id": "test-diff-cgroup",
     "pid": 181780,
     "status": "running",
     "bundle": "/tmp/bats-run-OlQVIP/runc.urXhFf/bundle",
     "rootfs": "/tmp/bats-run-OlQVIP/runc.urXhFf/bundle/rootfs",
     "created": "2022-05-12T02:31:21.053776806Z",
     "owner": ""
   }
   OCI_CGROUPS_PATH=/runc-cgroups-integration-test/test-cgroup-21203
   0::/runc-cgroups-integration-test/test-cgroup-1873
   rmdir: failed to remove '/sys/fs/cgroup//runc-cgroups-integration-test': Device or resource busy

1 test, 1 failure

So, from what I see, it restores the container into the old cgroup (the one ending in 1873), even though the config is changed before the restore to a cgroupsPath ending in 21203.

This is probably happening because I am using criu 3.16.1:

[kir@kir-rhat runc]$ rpm -qf $(which criu)
criu-3.16.1-2.fc35.x86_64

Let's try with 3.17 (I am still on Fedora 35, so I have to compile it).

Well, it is the same with 3.17. I guess some more changes are needed in runc so that it can put the container into the proper cgroup itself (not via criu). I have some code somewhere that did that... will take a look next week.

@kolyshkin
Contributor

@adrianreber my old code is at https://github.com/kolyshkin/runc/commits/restore-cgroup (last 4 commits); maybe you will find something there that's missing from this PR. In particular, the changes in libcontainer/container_linux.go are interesting.

@adrianreber adrianreber force-pushed the 2022-04-03-add-cgroup-ignore-mode branch from 822eadf to 0ffacf8 on May 24, 2022
@adrianreber
Contributor Author

@kolyshkin Thanks for the test example. I used it and it works correctly for me. Let's see what CI thinks about it.

@kolyshkin kolyshkin added this to the 1.2.0 milestone Jun 9, 2022
@kolyshkin
Contributor

The fedora CI failure seems to be related; can you PTAL, @adrianreber?

@adrianreber
Contributor Author

The fedora CI failure seems to be related; can you PTAL, @adrianreber?

I see the following error in CI:

+ disable_cgroup
+ '[' -d /sys/fs/cgroup//runc-cgroups-integration-test ']'
+ rmdir /sys/fs/cgroup//runc-cgroups-integration-test
rmdir: failed to remove '/sys/fs/cgroup//runc-cgroups-integration-test': Device or resource busy
make: *** [Makefile:131: localrootlessintegration] Error 1
make: Leaving directory '/vagrant'
Connection to 192.168.121.237 closed.

With my added test the rootless test seems to fail. Not sure why. I do not see similar problems locally. Strange.

@kolyshkin
Contributor

rmdir: failed to remove '/sys/fs/cgroup//runc-cgroups-integration-test': Device or resource busy

This means there's a process or a container left after a previous test run. IOW some container was not removed/killed.

You can add some debug to see what is going on, and run the CI as much as you want.
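
For example, something along these lines (a sketch to drop into the failing test or its teardown) shows what is still holding the cgroup tree:

# Show any processes still attached anywhere under the test cgroup root.
find /sys/fs/cgroup/runc-cgroups-integration-test -name cgroup.procs \
        -exec grep -H . {} + || true
# List leftover cgroup directories.
find /sys/fs/cgroup/runc-cgroups-integration-test -type d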

@kolyshkin kolyshkin force-pushed the 2022-04-03-add-cgroup-ignore-mode branch from 0ffacf8 to d0818de on July 28, 2022
@kolyshkin kolyshkin mentioned this pull request Jul 29, 2022
@kolyshkin
Contributor

@adrianreber the test still fails, and it seems there's a reason for that.

I wrote a small test (you can append it to the end of the checkpoint.bats file):

@test "my" {
        set_cgroups_path
        runc run -d --console-socket "$CONSOLE_SOCKET" test_busybox
        [ "$status" -eq 0 ]
        testcontainer test_busybox running
        echo "0 CGROUP_PATH=$CGROUP_PATH"
        ls -ld "$CGROUP_PATH" || true

        # checkpoint the running container
        runc checkpoint --work-path ./work-dir --manage-cgroups-mode ignore test_busybox
        grep -B 5 Error ./work-dir/dump.log || true
        [ "$status" -eq 0 ]
        # after checkpoint busybox is no longer running
        testcontainer test_busybox checkpointed

        echo "1 CGROUP_PATH=$CGROUP_PATH"
        ls -ld "$CGROUP_PATH" || true
        orig_cgroup_path=$CGROUP_PATH

        # restore from checkpoint
        set_cgroups_path
        runc "$@" restore -d --work-path ./work-dir --manage-cgroups-mode ignore --console-socket "$CONSOLE_SOCKET" test_busybox
        grep -B 5 Error ./work-dir/restore.log || true
        [ "$status" -eq 0 ]
        # busybox should be back up and running
        testcontainer test_busybox running

        echo "2 CGROUP_PATH=$CGROUP_PATH"
        ls -ld "$CGROUP_PATH" || true

        echo "3 orig_cgroup_path=$orig_cgroup_path"
        ls -ld "$orig_cgroup_path" || true
        exit 1 # this is for the test to always fails and show logs
}

Next, make sure everything is compiled:

$ make all
go build -trimpath "-buildmode=pie"  -tags "seccomp urfave_cli_no_docs" -ldflags "-X main.gitCommit=v1.1.0-254-g0e8b1e7e-dirty -X main.version=1.1.0+dev " -o runc .
go build -trimpath "-buildmode=pie"  -tags "seccomp urfave_cli_no_docs" -ldflags "-X main.gitCommit=v1.1.0-254-g0e8b1e7e-dirty -X main.version=1.1.0+dev " -o contrib/cmd/recvtty/recvtty ./contrib/cmd/recvtty
go build -trimpath "-buildmode=pie"  -tags "seccomp urfave_cli_no_docs" -ldflags "-X main.gitCommit=v1.1.0-254-g0e8b1e7e-dirty -X main.version=1.1.0+dev " -o contrib/cmd/sd-helper/sd-helper ./contrib/cmd/sd-helper
go build -trimpath "-buildmode=pie"  -tags "seccomp urfave_cli_no_docs" -ldflags "-X main.gitCommit=v1.1.0-254-g0e8b1e7e-dirty -X main.version=1.1.0+dev " -o contrib/cmd/seccompagent/seccompagent ./contrib/cmd/seccompagent

Finally, run the above test:

$  sudo bats -f "my" tests/integration/checkpoint.bats
checkpoint.bats
 ✗ my
   (in test file tests/integration/checkpoint.bats, line 481)
     `exit 1' failed
   runc spec (status=0):
   
   runc run -d --console-socket /tmp/bats-run-vahX7f/runc.3kYwVb/tty/sock test_busybox (status=0):
   
   runc state test_busybox (status=0):
   {
     "ociVersion": "1.0.2-dev",
     "id": "test_busybox",
     "pid": 1194856,
     "status": "running",
     "bundle": "/tmp/bats-run-vahX7f/runc.3kYwVb/bundle",
     "rootfs": "/tmp/bats-run-vahX7f/runc.3kYwVb/bundle/rootfs",
     "created": "2022-07-30T01:38:56.151243561Z",
     "owner": ""
   }
   0 CGROUP_PATH=/sys/fs/cgroup/runc-cgroups-integration-test/test-cgroup-17881
   drwxr-xr-x. 2 root root 0 Jul 29 18:38 /sys/fs/cgroup/runc-cgroups-integration-test/test-cgroup-17881
   runc checkpoint --work-path ./work-dir --manage-cgroups-mode ignore test_busybox (status=0):
   
   runc state test_busybox (status=1):
   time="2022-07-29T18:38:56-07:00" level=error msg="container does not exist"
   1 CGROUP_PATH=/sys/fs/cgroup/runc-cgroups-integration-test/test-cgroup-17881
   ls: cannot access '/sys/fs/cgroup/runc-cgroups-integration-test/test-cgroup-17881': No such file or directory
   runc restore -d --work-path ./work-dir --manage-cgroups-mode ignore --console-socket /tmp/bats-run-vahX7f/runc.3kYwVb/tty/sock test_busybox (status=0):
   
   runc state test_busybox (status=0):
   {
     "ociVersion": "1.0.2-dev",
     "id": "test_busybox",
     "pid": 1194928,
     "status": "running",
     "bundle": "/tmp/bats-run-vahX7f/runc.3kYwVb/bundle",
     "rootfs": "/tmp/bats-run-vahX7f/runc.3kYwVb/bundle/rootfs",
     "created": "2022-07-30T01:38:56.551376328Z",
     "owner": ""
   }
   2 CGROUP_PATH=/sys/fs/cgroup/runc-cgroups-integration-test/test-cgroup-9654
   drwxr-xr-x. 2 root root 0 Jul 29 18:38 /sys/fs/cgroup/runc-cgroups-integration-test/test-cgroup-9654
   3 orig_cgroup_path=/sys/fs/cgroup/runc-cgroups-integration-test/test-cgroup-17881
   drwxr-xr-x. 2 root root 0 Jul 29 18:38 /sys/fs/cgroup/runc-cgroups-integration-test/test-cgroup-17881
   rmdir: failed to remove '/sys/fs/cgroup//runc-cgroups-integration-test': Device or resource busy

1 test, 1 failure

As you can see, the old/original cgroup path (the one that ends with 17881) is non-existent after checkpoint, but it reappears upon restore.

I am not sure why, but at least you now have an easy repro.
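
If it helps with debugging, the crit tool that ships with CRIU can show what cgroup information was recorded at checkpoint time (a sketch; the image directory depends on the --image-path used, ./checkpoint being the default):

# Inspect the cgroup data stored in the checkpoint image.
crit show ./checkpoint/cgroup.img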

echo "OCI_CGROUPS_PATH=$OCI_CGROUPS_PATH"
cat "/proc/$pid/cgroup"
new_cgroup="$OCI_CGROUPS_PATH"
[ "$original_cgroup" != "$new_cgroup" ]
Contributor


Ughm, so here you basically check that set_cgroups_path changed the value of OCI_CGROUPS_PATH.

Well yes it did.

What it should check instead is that (see the sketch after this list):

  • the new cgroup path exists
  • the old cgroup path does not exist
  • the contents of /proc/$pid/cgroup differs from the old one
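
A rough sketch of those checks, reusing the variable names from the snippet above and assuming a cgroup v2 layout (with $original_cgroup holding the path saved before set_cgroups_path was called again):

# The new cgroup exists and the old one is gone.
[ -d "/sys/fs/cgroup$OCI_CGROUPS_PATH" ]
[ ! -d "/sys/fs/cgroup$original_cgroup" ]
# The restored process is in the new cgroup, not the old one
# ($pid is the restored container's init PID, e.g. read from --pid-file).
grep -q "$OCI_CGROUPS_PATH" "/proc/$pid/cgroup"
! grep -q "$original_cgroup" "/proc/$pid/cgroup"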

@kolyshkin
Contributor

Here's a high-level overview of what is going on in the test from the previous comment:

  1. We set the cgroups path and create a container. The container is running and the cgroup path exists.
  2. We checkpoint the container with --manage-cgroups-mode ignore. The container is stopped and the cgroup path no longer exists.
  3. We change the cgroupsPath in the container's config.json.
  4. We restore the container.

Apparently, what happens is

  • runc creates the new cgroup as per config.json
  • the container is nevertheless restored into the old cgroup :-\ (a quick way to confirm this is sketched below)
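
One quick way to confirm this from a shell (outside bats, run in the bundle directory; names are taken from the test above) is to compare where the process actually is with what the updated config asks for:

pid="$(runc state test_busybox | jq .pid)"
cat "/proc/$pid/cgroup"                  # the cgroup the process actually ended up in
jq -r .linux.cgroupsPath config.json     # the cgroup path the new config requests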

@adrianreber
Contributor Author

@kolyshkin thanks for looking so closely at this. I probably will not have time to look into this for the next couple of weeks, but I will try to see why it fails then.

@kolyshkin
Contributor

@adrianreber I have finally figured it out in #3546.

@kolyshkin
Contributor

Superseded by #3546.

@kolyshkin kolyshkin closed this Nov 2, 2022