[1.1] runc delete: call systemd's reset-failed #3932

Merged: 5 commits, Jul 12, 2023


kolyshkin
Contributor

This is a backport of #3888 to the release-1.1 branch. The original description follows.


runc delete is supposed to remove all of the container's artefacts. When the systemd cgroup driver is used and the systemd unit has failed (e.g. it was OOM-killed), systemd won't remove the unit (that is, unless the "CollectMode=inactive-or-failed" property is set).

Call reset-failed from manager.Destroy so the failed unit will be removed during "runc delete".

This fixes Issue A from #3780 (which, in its original form, can only be reproduced with RHEL/CentOS 9 systemd versions older than 252.14, i.e. before redhat-plumbers/systemd-rhel9#149 was added). A test case that works with any recent systemd version is also added.
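The flow described above (stop the unit, then ask systemd to forget a failed one) can be sketched roughly as follows. This is not the code from this PR: `unitConn`, `fakeSystemd` and `destroy` are hypothetical names, and the in-memory fake only assumes that a failed unit stays loaded until reset-failed is called, so that the sketch runs without systemd.

```go
package main

import (
	"errors"
	"fmt"
)

// unitConn is a hypothetical stand-in for a systemd D-Bus connection.
type unitConn interface {
	StopUnit(name string) error
	ResetFailedUnit(name string) error
}

// fakeSystemd keeps unit states in memory ("active" or "failed").
type fakeSystemd struct {
	units map[string]string
}

func (f *fakeSystemd) StopUnit(name string) error {
	state, ok := f.units[name]
	if !ok {
		return errors.New("unit " + name + " not loaded")
	}
	if state == "active" {
		// Assumption: a cleanly stopped transient unit is collected right away.
		delete(f.units, name)
	}
	// Assumption: a failed unit stays loaded until reset-failed is called.
	return nil
}

func (f *fakeSystemd) ResetFailedUnit(name string) error {
	if f.units[name] != "failed" {
		return errors.New("unit " + name + " is not in failed state")
	}
	delete(f.units, name)
	return nil
}

// destroy stops the unit and then calls reset-failed as a cleanup step.
// The reset-failed error is ignored on purpose: most units are not failed.
func destroy(c unitConn, name string) error {
	stopErr := c.StopUnit(name)
	_ = c.ResetFailedUnit(name)
	return stopErr
}

func main() {
	sd := &fakeSystemd{units: map[string]string{"runc-test.scope": "failed"}}
	err := destroy(sd, "runc-test.scope")
	_, stillLoaded := sd.units["runc-test.scope"]
	fmt.Println("destroy error:", err, "still loaded:", stillLoaded)
}
```

Without the `ResetFailedUnit` call in `destroy`, the failed unit in this fake would remain in the map, which mirrors the leftover unit this PR is about.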

There is no such thing as linux.resources.memorySwap (the mem+swap is
set as linux.resources.memory.swap).

As it is not used in this test anyway, remove it.

Fixes: 4929c05
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit dacb3aa)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

Sometimes we call resetFailedUnit as a cleanup measure, and we don't
care if it fails or not. So, move error reporting to its callers, and
ignore error in cases we don't really expect it to succeed.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit 91b4cd2)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
runc delete is supposed to remove all the container's artefacts.
In case systemd cgroup driver is used, and the systemd unit has failed
(e.g. oom-killed), systemd won't remove the unit (that is, unless the
"CollectMode: inactive-or-failed" property is set).

Call reset-failed from manager.Destroy so the failed unit will be
removed during "runc delete".

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit 43564a7)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit 58a811f)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

The passing run (with the fix) looks like this:

----
delete.bats
 ✓ runc delete removes failed systemd unit [4556]
   runc spec (status=0):

   runc run -d --console-socket /tmp/bats-run-B08vu1/runc.lbQwU5/tty/sock test-failed-unit (status=0):

   Warning: The unit file, source configuration file or drop-ins of runc-cgroups-integration-test-12869.scope changed on disk. Run 'systemctl daemon-reload' to reload units.
   × runc-cgroups-integration-test-12869.scope - libcontainer container integration-test-12869
        Loaded: loaded (/run/systemd/transient/runc-cgroups-integration-test-12869.scope; transient)
     Transient: yes
       Drop-In: /run/systemd/transient/runc-cgroups-integration-test-12869.scope.d
                └─50-DevicePolicy.conf, 50-DeviceAllow.conf
        Active: failed (Result: timeout) since Tue 2023-06-13 14:41:38 PDT; 751ms ago
      Duration: 2.144s
           CPU: 8ms

   Jun 13 14:41:34 kir-rhat systemd[1]: Started runc-cgroups-integration-test-12869.scope - libcontainer container integration-test-12869.
   Jun 13 14:41:37 kir-rhat systemd[1]: runc-cgroups-integration-test-12869.scope: Scope reached runtime time limit. Stopping.
   Jun 13 14:41:38 kir-rhat systemd[1]: runc-cgroups-integration-test-12869.scope: Stopping timed out. Killing.
   Jun 13 14:41:38 kir-rhat systemd[1]: runc-cgroups-integration-test-12869.scope: Killing process 1107438 (sleep) with signal SIGKILL.
   Jun 13 14:41:38 kir-rhat systemd[1]: runc-cgroups-integration-test-12869.scope: Failed with result 'timeout'.
   runc delete test-failed-unit (status=0):

   Unit runc-cgroups-integration-test-12869.scope could not be found.
----

Before the fix, the test was failing like this:

----
delete.bats
 ✗ runc delete removes failed systemd unit
   (in test file tests/integration/delete.bats, line 194)
     `run -4 systemctl status "$SD_UNIT_NAME"' failed, expected exit code 4, got 3
  ....
----

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit ad040b1)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
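The failing run above hinges on the exit status of `systemctl status`: the test expects 4 and got 3 before the fix. The small sketch below spells out what those codes mean according to the exit status table in the systemctl man page. The function names are hypothetical and only the codes relevant to this test are covered.

```go
package main

import "fmt"

// describeStatusExit explains a `systemctl status` exit code.
// Only the codes relevant to the test above are spelled out.
func describeStatusExit(code int) string {
	switch code {
	case 0:
		return "unit is active"
	case 3:
		return "unit is not active (still loaded, e.g. failed)"
	case 4:
		return "no such unit"
	default:
		return fmt.Sprintf("other exit code %d", code)
	}
}

// unitRemoved reports whether the exit code means systemd no longer
// knows the unit, which is what the test expects after `runc delete`.
func unitRemoved(code int) bool {
	return code == 4
}

func main() {
	for _, code := range []int{3, 4} {
		fmt.Printf("%d: %s (removed=%v)\n", code, describeStatusExit(code), unitRemoved(code))
	}
}
```

Read this way, "expected exit code 4, got 3" means the failed unit was still loaded after `runc delete`, while the passing run ends with "Unit ... could not be found".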
kolyshkin added this to the 1.1.8 milestone on Jul 8, 2023
kolyshkin added the area/systemd and backport/1.1-pr (A backport to 1.1.x release.) labels on Jul 8, 2023