
Add support for CRIU's 'ignore' manage-cgroup mode #3447

Conversation

adrianreber
Contributor

This makes an existing CRIU cgroup-management mode, 'ignore', selectable from runc.

This fixes errors seen with Podman containers that have been restored multiple times into new cgroups: CRIU fails to checkpoint a container that has already been restored once into a cgroup named differently from the original one.

CRIU keeps information about the original cgroup, and when restoring into a differently named cgroup it still tries to restore into the original one. This leaves the process with the wrong cgroup settings, and a second checkpoint of that container fails.

With 'ignore', CRIU ignores cgroups entirely and leaves all cgroup handling to runc.
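
For context, the new value is passed like the existing modes on both operations. A minimal usage sketch (the container name and image path below are only illustrative):

# Checkpoint without letting CRIU touch cgroups:
runc checkpoint --manage-cgroups-mode ignore --image-path ./image mycontainer

# On restore, runc (not CRIU) creates and configures the cgroup, so the
# container can be placed into a cgroup named differently from the original:
runc restore --manage-cgroups-mode ignore --image-path ./image mycontainer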

@kolyshkin
Contributor

  1. Do you think it will help the likes of [centos7] not ok 15 checkpoint --lazy-pages and restore #2760 and [fedora] not ok 16 checkpoint --lazy-pages and restore #2924?
  2. Do we need a test case for this?

@adrianreber
Contributor Author

  1. Do you think it will help the likes of [centos7] not ok 15 checkpoint --lazy-pages and restore #2760 and [fedora] not ok 16 checkpoint --lazy-pages and restore #2924?

No, I don't think this is related.

2. Do we need a test case for this?

Not sure. Do you have any idea how to create a container with a cgroup and check that it ends up in a different one after restore? I have seen a couple of cgroup functions in the bats files, but if you have a quick idea of what would work here, we can add it.

@kolyshkin
Contributor

kolyshkin commented Apr 6, 2022

  1. Do you think it will help the likes of [centos7] not ok 15 checkpoint --lazy-pages and restore #2760 and [fedora] not ok 16 checkpoint --lazy-pages and restore #2924?

No, I don't think this is related.

I am asking because (as I said in #2924 (comment)) I suspect that criu restores the container into the same cgroup, while systemd removes this cgroup (as the container was gone).

With --manage-cgroups-mode ignore CRIU will do nothing to restore cgroups, and runc will correctly create a proper cgroup (and set all the parameters), so the above issues should be gone.

That is just my understanding, I might be wrong.

@kolyshkin
Contributor

Ah, so I tried that before (7d763aa), but for some reason I never finished it (maybe it did not work for me).

I will retry.

@adrianreber
Contributor Author

I am asking because (as I said in #2924 (comment)) I suspect that criu restores the container into the same cgroup, while systemd removes this cgroup (as the container was gone).

Ah, right. Then this could indeed help.

With --manage-cgroups-mode ignore CRIU will do nothing to restore cgroups, and runc will correctly create a proper cgroup (and set all the parameters), so the above issues should be gone.

There is also this discussion where the kernel might have a problem with correct cgroup deletion: checkpoint-restore/criu#1774 (comment)

But with your explanation it sounds like a correctly working 'ignore' mode could help you.

Contributor

@kolyshkin kolyshkin left a comment


@adrianreber I think that setManageCgroupsMode call is missing from restore.go, and so the option is ignored upon restore. Can you add it here?

@adrianreber adrianreber force-pushed the 2022-04-03-add-cgroup-ignore-mode branch from 4cb2488 to 822eadf on May 5, 2022
@adrianreber
Contributor Author

@adrianreber I think that setManageCgroupsMode call is missing from restore.go, and so the option is ignored upon restore. Can you add it here?

Good point, seems like the parameter never had any effect on restore until now. Updated.

@adrianreber adrianreber requested a review from kolyshkin May 5, 2022 06:10
@kolyshkin
Contributor

Do you have any idea how to create a container with a cgroup and check that it ends up in a different one after restore? I have seen a couple of cgroup functions in the bats files, but if you have a quick idea of what would work here, we can add it.

Well, let's do it, since without such a test we do not know whether the fix really works.

Changing the cgroup path in the container config is pretty simple -- you call set_cgroups_path and it randomizes the path (well, just the suffix).
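
For readers unfamiliar with the helper, set_cgroups_path does roughly the following (a simplified sketch, not the exact helpers.bash code):

set_cgroups_path() {
        # Pick a random suffix so every call produces a fresh cgroup path...
        OCI_CGROUPS_PATH="/runc-cgroups-integration-test/test-cgroup-$RANDOM"
        # ...and write it into the container's config.json (update_config is
        # another helper from the integration test suite).
        update_config '.linux.cgroupsPath |= "'"$OCI_CGROUPS_PATH"'"'
}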

Here is a draft for the test (you can append this to tests/integration/checkpoint.bats):

@test "checkpoint and restore into a different cgroup" {
        local pid name=test-diff-cgroup

        set_cgroups_path
        runc run -d --console-socket "$CONSOLE_SOCKET" --pid-file pid "$name"
        [ "$status" -eq 0 ]

        testcontainer "$name" running
        # Check the cgroup.
        pid="$(cat pid)"
        echo "OCI_CGROUPS_PATH=$OCI_CGROUPS_PATH"
        cat "/proc/$pid/cgroup"

        runc checkpoint --manage-cgroups-mode ignore --work-path ./work-dir "$name"
        grep -B 5 Error ./work-dir/dump.log || true
        [ "$status" -eq 0 ]

        testcontainer "$name" checkpointed

        # Modify the cgroup path.
        set_cgroups_path
        runc restore --manage-cgroups-mode ignore --pid-file pid -d \
                --work-path ./work-dir --console-socket "$CONSOLE_SOCKET" "$name"
        grep -B 5 Error ./work-dir/restore.log || true
        [ "$status" -eq 0 ]

        testcontainer "$name" running

        # Check the cgroup name has changed.
        pid="$(cat pid)"
        echo "OCI_CGROUPS_PATH=$OCI_CGROUPS_PATH"
        cat "/proc/$pid/cgroup"
        exit 1
}

This test deliberately fails so you can see all the debug output.

Here is how to run it, and here's what it shows on my machine:

$ make
...
$ sudo bats -f different tests/integration/checkpoint.bats 
 ✗ checkpoint and restore into a different cgroup
   (in test file tests/integration/checkpoint.bats, line 441)
     `exit 1' failed
   runc spec (status=0):
   
   runc run -d --console-socket /tmp/bats-run-OlQVIP/runc.urXhFf/tty/sock --pid-file pid test-diff-cgroup (status=0):
   
   runc state test-diff-cgroup (status=0):
   {
     "ociVersion": "1.0.2-dev",
     "id": "test-diff-cgroup",
     "pid": 181707,
     "status": "running",
     "bundle": "/tmp/bats-run-OlQVIP/runc.urXhFf/bundle",
     "rootfs": "/tmp/bats-run-OlQVIP/runc.urXhFf/bundle/rootfs",
     "created": "2022-05-12T02:31:20.689876979Z",
     "owner": ""
   }
   OCI_CGROUPS_PATH=/runc-cgroups-integration-test/test-cgroup-1873
   0::/runc-cgroups-integration-test/test-cgroup-1873
   runc checkpoint --manage-cgroups-mode ignore --work-path ./work-dir test-diff-cgroup (status=0):
   
   runc state test-diff-cgroup (status=1):
   time="2022-05-11T19:31:20-07:00" level=error msg="container does not exist"
   runc restore --manage-cgroups-mode ignore --pid-file pid -d --work-path ./work-dir --console-socket /tmp/bats-run-OlQVIP/runc.urXhFf/tty/sock test-diff-cgroup (status=0):
   
   runc state test-diff-cgroup (status=0):
   {
     "ociVersion": "1.0.2-dev",
     "id": "test-diff-cgroup",
     "pid": 181780,
     "status": "running",
     "bundle": "/tmp/bats-run-OlQVIP/runc.urXhFf/bundle",
     "rootfs": "/tmp/bats-run-OlQVIP/runc.urXhFf/bundle/rootfs",
     "created": "2022-05-12T02:31:21.053776806Z",
     "owner": ""
   }
   OCI_CGROUPS_PATH=/runc-cgroups-integration-test/test-cgroup-21203
   0::/runc-cgroups-integration-test/test-cgroup-1873
   rmdir: failed to remove '/sys/fs/cgroup//runc-cgroups-integration-test': Device or resource busy

1 test, 1 failure

So, from what I see, it restores the container into the old cgroup (the one ending in 1873), even though the config is changed before the restore to a cgroupsPath ending in 21203.

This is probably happening because I am using criu 3.16.1:

[kir@kir-rhat runc]$ rpm -qf $(which criu)
criu-3.16.1-2.fc35.x86_64

Let's try with 3.17 (I am still on Fedora 35, so I have to compile it).

Well, it is the same with 3.17. I guess some more changes are needed in runc so that it can put the container into the proper cgroup itself (not via criu). I have some code somewhere that did that... will take a look next week.

@kolyshkin
Contributor

@adrianreber my old code is at https://github.com/kolyshkin/runc/commits/restore-cgroup (last 4 commits); maybe you will find something there that's missing from this PR. In particular, the changes in libcontainer/container_linux.go are interesting.

@adrianreber adrianreber force-pushed the 2022-04-03-add-cgroup-ignore-mode branch from 822eadf to 0ffacf8 on May 24, 2022
@adrianreber
Contributor Author

@kolyshkin Thanks for the test example. I used it and it works correctly for me. Let's see what CI thinks about it.

@kolyshkin kolyshkin added this to the 1.2.0 milestone Jun 9, 2022
@kolyshkin
Contributor

The fedora CI failure seems to be related; can you PTAL, @adrianreber?

@adrianreber
Contributor Author

The fedora CI failure seems to be related; can you PTAL, @adrianreber?

I see the following error in CI:

+ disable_cgroup
+ '[' -d /sys/fs/cgroup//runc-cgroups-integration-test ']'
+ rmdir /sys/fs/cgroup//runc-cgroups-integration-test
rmdir: failed to remove '/sys/fs/cgroup//runc-cgroups-integration-test': Device or resource busy
make: *** [Makefile:131: localrootlessintegration] Error 1
make: Leaving directory '/vagrant'
Connection to 192.168.121.237 closed.

With my added test the rootless test seems to fail. Not sure why. I do not see similar problems locally. Strange.

@kolyshkin
Contributor

rmdir: failed to remove '/sys/fs/cgroup//runc-cgroups-integration-test': Device or resource busy

This means there's a process or a container left after a previous test run. IOW some container was not removed/killed.

You can add some debug to see what is going on, and run the CI as much as you want.
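
For example, something along these lines (a sketch to drop into the failing test or its teardown) shows what is still holding the cgroup tree:

# Show any processes still attached anywhere under the test cgroup root.
find /sys/fs/cgroup/runc-cgroups-integration-test -name cgroup.procs \
        -exec grep -H . {} + || true
# List leftover cgroup directories.
find /sys/fs/cgroup/runc-cgroups-integration-test -type d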

@kolyshkin kolyshkin force-pushed the 2022-04-03-add-cgroup-ignore-mode branch from 0ffacf8 to d0818de on July 28, 2022
@kolyshkin kolyshkin mentioned this pull request Jul 29, 2022
@kolyshkin
Contributor

@adrianreber the test still fails, and it seems there's a reason for that.

I wrote a small test (you can append it to the end of the checkpoint.bats file):

@test "my" {
        set_cgroups_path
        runc run -d --console-socket "$CONSOLE_SOCKET" test_busybox
        [ "$status" -eq 0 ]
        testcontainer test_busybox running
        echo "0 CGROUP_PATH=$CGROUP_PATH"
        ls -ld "$CGROUP_PATH" || true

        # checkpoint the running container
        runc checkpoint --work-path ./work-dir --manage-cgroups-mode ignore test_busybox
        grep -B 5 Error ./work-dir/dump.log || true
        [ "$status" -eq 0 ]
        # after checkpoint busybox is no longer running
        testcontainer test_busybox checkpointed

        echo "1 CGROUP_PATH=$CGROUP_PATH"
        ls -ld "$CGROUP_PATH" || true
        orig_cgroup_path=$CGROUP_PATH

        # restore from checkpoint
        set_cgroups_path
        runc "$@" restore -d --work-path ./work-dir --manage-cgroups-mode ignore --console-socket "$CONSOLE_SOCKET" test_busybox
        grep -B 5 Error ./work-dir/restore.log || true
        [ "$status" -eq 0 ]
        # busybox should be back up and running
        testcontainer test_busybox running

        echo "2 CGROUP_PATH=$CGROUP_PATH"
        ls -ld "$CGROUP_PATH" || true

        echo "3 orig_cgroup_path=$orig_cgroup_path"
        ls -ld "$orig_cgroup_path" || true
        exit 1 # this is for the test to always fails and show logs
}

Next, make sure everything is compiled:

$ make all
go build -trimpath "-buildmode=pie"  -tags "seccomp urfave_cli_no_docs" -ldflags "-X main.gitCommit=v1.1.0-254-g0e8b1e7e-dirty -X main.version=1.1.0+dev " -o runc .
go build -trimpath "-buildmode=pie"  -tags "seccomp urfave_cli_no_docs" -ldflags "-X main.gitCommit=v1.1.0-254-g0e8b1e7e-dirty -X main.version=1.1.0+dev " -o contrib/cmd/recvtty/recvtty ./contrib/cmd/recvtty
go build -trimpath "-buildmode=pie"  -tags "seccomp urfave_cli_no_docs" -ldflags "-X main.gitCommit=v1.1.0-254-g0e8b1e7e-dirty -X main.version=1.1.0+dev " -o contrib/cmd/sd-helper/sd-helper ./contrib/cmd/sd-helper
go build -trimpath "-buildmode=pie"  -tags "seccomp urfave_cli_no_docs" -ldflags "-X main.gitCommit=v1.1.0-254-g0e8b1e7e-dirty -X main.version=1.1.0+dev " -o contrib/cmd/seccompagent/seccompagent ./contrib/cmd/seccompagent

Finally, run the above test:

$  sudo bats -f "my" tests/integration/checkpoint.bats
checkpoint.bats
 ✗ my
   (in test file tests/integration/checkpoint.bats, line 481)
     `exit 1' failed
   runc spec (status=0):
   
   runc run -d --console-socket /tmp/bats-run-vahX7f/runc.3kYwVb/tty/sock test_busybox (status=0):
   
   runc state test_busybox (status=0):
   {
     "ociVersion": "1.0.2-dev",
     "id": "test_busybox",
     "pid": 1194856,
     "status": "running",
     "bundle": "/tmp/bats-run-vahX7f/runc.3kYwVb/bundle",
     "rootfs": "/tmp/bats-run-vahX7f/runc.3kYwVb/bundle/rootfs",
     "created": "2022-07-30T01:38:56.151243561Z",
     "owner": ""
   }
   0 CGROUP_PATH=/sys/fs/cgroup/runc-cgroups-integration-test/test-cgroup-17881
   drwxr-xr-x. 2 root root 0 Jul 29 18:38 /sys/fs/cgroup/runc-cgroups-integration-test/test-cgroup-17881
   runc checkpoint --work-path ./work-dir --manage-cgroups-mode ignore test_busybox (status=0):
   
   runc state test_busybox (status=1):
   time="2022-07-29T18:38:56-07:00" level=error msg="container does not exist"
   1 CGROUP_PATH=/sys/fs/cgroup/runc-cgroups-integration-test/test-cgroup-17881
   ls: cannot access '/sys/fs/cgroup/runc-cgroups-integration-test/test-cgroup-17881': No such file or directory
   runc restore -d --work-path ./work-dir --manage-cgroups-mode ignore --console-socket /tmp/bats-run-vahX7f/runc.3kYwVb/tty/sock test_busybox (status=0):
   
   runc state test_busybox (status=0):
   {
     "ociVersion": "1.0.2-dev",
     "id": "test_busybox",
     "pid": 1194928,
     "status": "running",
     "bundle": "/tmp/bats-run-vahX7f/runc.3kYwVb/bundle",
     "rootfs": "/tmp/bats-run-vahX7f/runc.3kYwVb/bundle/rootfs",
     "created": "2022-07-30T01:38:56.551376328Z",
     "owner": ""
   }
   2 CGROUP_PATH=/sys/fs/cgroup/runc-cgroups-integration-test/test-cgroup-9654
   drwxr-xr-x. 2 root root 0 Jul 29 18:38 /sys/fs/cgroup/runc-cgroups-integration-test/test-cgroup-9654
   3 orig_cgroup_path=/sys/fs/cgroup/runc-cgroups-integration-test/test-cgroup-17881
   drwxr-xr-x. 2 root root 0 Jul 29 18:38 /sys/fs/cgroup/runc-cgroups-integration-test/test-cgroup-17881
   rmdir: failed to remove '/sys/fs/cgroup//runc-cgroups-integration-test': Device or resource busy

1 test, 1 failure

As you can see, the old/original cgroup path (the one that ends with 17881) is non-existent after checkpoint, but it reappears upon restore.

I am not sure why, but at least you now have an easy repro.
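
If it helps with debugging, the crit tool that ships with CRIU can show what cgroup information was recorded at checkpoint time (a sketch; the image directory depends on the --image-path used, ./checkpoint being the default):

# Inspect the cgroup data stored in the checkpoint image.
crit show ./checkpoint/cgroup.img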

echo "OCI_CGROUPS_PATH=$OCI_CGROUPS_PATH"
cat "/proc/$pid/cgroup"
new_cgroup="$OCI_CGROUPS_PATH"
[ "$original_cgroup" != "$new_cgroup" ]
Contributor


Ughm, so here you basically check that set_cgroups_path changed the value of OCI_CGROUPS_PATH.

Well yes it did.

What it should check instead is that (see the sketch after this list):

  • the new cgroup path exists
  • the old cgroup path does not exist
  • the contents of /proc/$pid/cgroup differs from the old one
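
A rough sketch of those checks, reusing the variable names from the snippet above and assuming a cgroup v2 layout (with $original_cgroup holding the path saved before set_cgroups_path was called again):

# The new cgroup exists and the old one is gone.
[ -d "/sys/fs/cgroup$OCI_CGROUPS_PATH" ]
[ ! -d "/sys/fs/cgroup$original_cgroup" ]
# The restored process is in the new cgroup, not the old one
# ($pid is the restored container's init PID, e.g. read from --pid-file).
grep -q "$OCI_CGROUPS_PATH" "/proc/$pid/cgroup"
! grep -q "$original_cgroup" "/proc/$pid/cgroup"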

@kolyshkin
Contributor

Here's a high-level overview of what is going on in the test from the previous comment:

  1. We set the cgroups path and create a container. The container is running and the cgroup path exists.
  2. We checkpoint the container with --manage-cgroups-mode ignore. The container is stopped and the cgroup path no longer exists.
  3. We change the cgroupsPath in the container's config.json.
  4. We restore the container.

Apparently, what happens is

  • runc creates the new cgroup as per config.json
  • the container is nevertheless restored into the old cgroup :-\ (a quick way to confirm this is sketched below)
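
One quick way to confirm this from a shell (outside bats, run in the bundle directory; names are taken from the test above) is to compare where the process actually is with what the updated config asks for:

pid="$(runc state test_busybox | jq .pid)"
cat "/proc/$pid/cgroup"                  # the cgroup the process actually ended up in
jq -r .linux.cgroupsPath config.json     # the cgroup path the new config requests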

@adrianreber
Contributor Author

@kolyshkin thanks for looking so closely at this. I probably will not have time to look into this for the next couple of weeks, but I will try to see why it fails then.

@kolyshkin
Contributor

@adrianreber I have finally figured it out in #3546.

@kolyshkin
Contributor

Superseded by #3546.

@kolyshkin kolyshkin closed this Nov 2, 2022