
executor: stop joining executor to container cgroup #6839

Merged
merged 3 commits into master Dec 13, 2019
Conversation

@notnoop (Contributor) commented Dec 11, 2019

Stop joining the libcontainer executor process to the newly created task
container cgroup, to ensure that the cgroups are fully destroyed on
shutdown, and to make the executor consistent with other plugin processes.

Previously, the executor process was added to the container cgroup so that
its resource usage was aggregated along with the user processes in our
metric aggregation.

However, adding the executor process to the container cgroup adds some
complications without much benefit:

First, it complicates cleanup. We must ensure that the executor is removed
from the container cgroup on shutdown. Indeed, we had a bug where we missed
removing it from the systemd cgroup: the executor joins the cgroups in
`containerState.CgroupPaths` on launch, which includes systemd, but cleans
up using `cgroups.GetAllSubsystems`, which doesn't include it (see the
sketch at the end of this description).

Second, it may have adverse side effects. When a user process is CPU-bound
or uses too much memory, the executor should keep functioning without risk
of being killed by the OOM killer or being throttled along with it.

Third, it is inconsistent with other drivers and plugins. Logmon and
DockerLogger processes aren't in the task cgroups; neither are containerd
processes, even though containerd is the closest equivalent to the executor
in responsibility.

Fourth, in my experience, when the executor process moves cgroups while it's
running, the cgroup accounting is odd. The cgroup `memory.usage_in_bytes`
doesn't seem to capture the full memory usage of the executor process and
becomes a red herring when investigating memory issues.

For all the reasons above, I opted to keep the executor in the Nomad agent
cgroup; we can revisit this when we have a better story for plugin process
cgroup management.

Fixes #6823.

I've added a test that captures the problem above; it's failing in https://circleci.com/gh/hashicorp/nomad/25824
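
To make the cleanup mismatch concrete, here is a rough Go sketch (not code from this PR) of the asymmetry. It uses the real libcontainer functions `cgroups.WriteCgroupProc` and `cgroups.GetAllSubsystems`, but the hard-coded paths and the overall shape are illustrative assumptions, not the actual Nomad call sites:

```go
package main

import (
	"fmt"
	"os"

	"github.com/opencontainers/runc/libcontainer/cgroups"
)

func main() {
	// On launch, the old executor joined every cgroup recorded in
	// containerState.CgroupPaths. When the systemd cgroup driver is in use,
	// that map includes a "systemd" entry (paths below are illustrative).
	cgroupPaths := map[string]string{
		"memory":  "/sys/fs/cgroup/memory/nomad/example-task",
		"systemd": "/sys/fs/cgroup/systemd/nomad/example-task",
	}
	for _, path := range cgroupPaths {
		_ = cgroups.WriteCgroupProc(path, os.Getpid()) // join the cgroup
	}

	// On shutdown, cleanup walked cgroups.GetAllSubsystems(), which reads
	// /proc/cgroups and therefore never lists "systemd". The executor's pid
	// stayed behind in the systemd cgroup, so that cgroup couldn't be removed.
	subsystems, err := cgroups.GetAllSubsystems()
	if err != nil {
		panic(err)
	}
	fmt.Println(subsystems) // e.g. [cpuset cpu cpuacct memory ...] -- no "systemd"
}
```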

Mahmood Ali added 3 commits December 11, 2019 11:12
@notnoop self-assigned this Dec 11, 2019
@notnoop added this to Triaged in Nomad - Community Issues Triage via automation Dec 11, 2019
@notnoop moved this from Triaged to In Review in Nomad - Community Issues Triage Dec 11, 2019
@notnoop added this to the 0.10.3 milestone Dec 11, 2019
@schmichael (Member) left a comment
Looks great.

I think the executor being in the cgroups was vestigial from before we used libcontainer, when the executor had to enter the cgroups before fork/exec'ing the task process. Since we're leaving that double forking up to libcontainer, I think you've outlined compelling reasons to remove the executor from the cgroups.

Could you add a test that asserts the executor process is not cgrouped? The added test only appears to assert the behavior of the task's cgroups.

Otherwise LGTM.
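
A minimal sketch of the kind of assertion being asked for, assuming a testify-style test; the helper name and the `taskCgroupName` parameter are hypothetical and not part of this PR:

```go
import (
	"os"
	"strings"
	"testing"

	"github.com/stretchr/testify/require"
)

// requireExecutorNotInTaskCgroup reads the executor's own cgroup memberships
// from /proc/self/cgroup and asserts that none of them point at the task's
// cgroup. taskCgroupName is whatever path component identifies the task.
func requireExecutorNotInTaskCgroup(t *testing.T, taskCgroupName string) {
	data, err := os.ReadFile("/proc/self/cgroup")
	require.NoError(t, err)

	// Each line has the form "hierarchy-ID:controller-list:cgroup-path".
	for _, line := range strings.Split(strings.TrimSpace(string(data)), "\n") {
		require.NotContains(t, line, taskCgroupName,
			"executor process should not be inside the task cgroup")
	}
}
```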

@notnoop merged commit 93694f8 into master Dec 13, 2019
Nomad - Community Issues Triage automation moved this from In Review to Done Dec 13, 2019
@notnoop deleted the b-cgroup-cleanup branch December 13, 2019 14:05
@notnoop (Contributor, Author) commented Dec 13, 2019

> Could you add a test that asserts the executor process is not cgrouped? The added test only appears to assert the behavior of the task's cgroups.

I've toyed with this but ultimately didn't like any of the approaches, and we can follow up with a test after the PR is merged. The issue is that to test for the negative (the lack of a change), we must start a long-running task, sleep for enough time, and then check the executor's own cgroup membership; otherwise we risk the test succeeding because of timing effects rather than because the cgroup didn't move. Also, for the test to fail, a developer would need to explicitly move the executor into the task cgroup with a series of method calls, rather than accidentally or implicitly. As such, I believe adding such a test would slow the test suite without protecting us against future regressions. Open to suggestions? I suspect a comment or a general exec driver design doc would suffice.

@schmichael (Member)
Ah, that does sound tricky @notnoop. I think I've written a test before that has a task do `touch /tmp/$ALLOCID && sleep infinity` and then polls for /tmp/$ALLOCID to exist. At that point you can peek at the executor process's cgroups, knowing that with the old code it would already have been in the task cgroups by the time the task process was running. Unfortunately tricky, but that form of test seems nice for asserting any number of conditions about the runtime properties of tasks.
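
For reference, a rough sketch of that sentinel-file pattern, assuming testify plus hypothetical pieces (`startTask`, and the `allocID`/`executorPID` values supplied by the test harness), purely to illustrate the sequencing:

```go
import (
	"fmt"
	"os"
	"path/filepath"
	"testing"
	"time"

	"github.com/stretchr/testify/require"
)

// sketchTaskRunningAssertion: have the task touch a file and sleep, poll for
// the file, then inspect the executor's cgroups once the task is known to run.
func sketchTaskRunningAssertion(t *testing.T, allocID string, executorPID int) {
	sentinel := filepath.Join(os.TempDir(), allocID)

	// startTask is a hypothetical helper that runs the command as the task.
	startTask(t, "/bin/sh", "-c", fmt.Sprintf("touch %s && sleep 1000000", sentinel))

	// Poll until the sentinel exists, proving the task process is running.
	require.Eventually(t, func() bool {
		_, err := os.Stat(sentinel)
		return err == nil
	}, 10*time.Second, 100*time.Millisecond)

	// With the old code the executor would already be inside the task cgroup
	// by this point; after this change it should not be.
	cgroupData, err := os.ReadFile(fmt.Sprintf("/proc/%d/cgroup", executorPID))
	require.NoError(t, err)
	require.NotContains(t, string(cgroupData), allocID)
}
```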

@github-actions (bot)
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions bot locked as resolved and limited conversation to collaborators Jan 23, 2023
Development

Successfully merging this pull request may close these issues.

Nomad exec driver leaks cgroups, causing host system running out of memory