Prow cluster resource leak #1988

Closed
2 of 3 tasks
howardjohn opened this issue Oct 24, 2019 · 22 comments · Fixed by istio/tools#624

@howardjohn
Member

howardjohn commented Oct 24, 2019

Background: kubernetes-sigs/kind#759

When we started using kind, all of a sudden our tests started flaking often. It turned out this was due to a resource leak caused by not cleaning up kind clusters at all. I am not 100% sure what was leaking, but it was something related to the hostPath mounts of /lib/modules or /sys/fs/cgroup. Once we added cleanup configuration, things went back to normal.

The best reference we had was the "resting CPU" rate of the nodes. In the original case, after a 2-week period it was between 30% and 90% with no tests running.

Now, months later, the same problem has returned 🙁

[screenshot 2019-24-10_09-39-14: node CPU usage]

This time it is much more subtle though: only 20% max usage after 2 weeks.

My suspicion is that we still have some leaks from kind.

Action items:

  • Investigate stronger cleanup, as recommended by @BenTheElder (see the sketch after this list)

The docker in docker runner / wrapper script we use in test-infra / prow.k8s.io also terminates all containers in an exit handler, amongst other things, redundantly to the cluster deletion we do in the kind specific scripts.

  • Problem seemed to start on October 1st. Were there any changes around then?

  • Can we identify that we have a leak and clean up proactively as a periodic job?
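
A minimal sketch of the exit-handler style of cleanup referenced in the first item, assuming a bash wrapper around the job command (the real prow.k8s.io wrapper does more; the function name here is illustrative):

# Terminate and remove anything docker-in-docker left behind, redundantly to
# whatever cluster deletion the job's own scripts do.
cleanup_dind() {
  docker ps -q | xargs -r docker kill || true
  docker ps -aq | xargs -r docker rm -f || true
}
trap cleanup_dind EXIT

# Run the job command itself ("$@" rather than exec, so the EXIT trap still fires).
"$@"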

@howardjohn howardjohn self-assigned this Oct 24, 2019
@BenTheElder

You should be able to check the cgroups on a host node to see if there's actually something leaking; there shouldn't be. I've done fairly extensive testing of that with kind itself locally, but it's possible something else is up in CI.

xref: kubernetes-sigs/kind#421 (note that the "memory cgroups leak" is actually just a kernel bug in flushing them after deletion; we've also seen this just running containers without any nesting etc.)

@howardjohn
Member Author

Likely root cause: 01e1bf6, which adds kind to a new repo. This repo does have the trap cleanup stuff, but for some reason it's not running. When I run locally it cleans up, but in CI I don't see the cleanup occurring: https://prow.istio.io/view/gcs/istio-prow/pr-logs/pull/istio_installer/457/base_installer/96

@BenTheElder

is this trap in the image or in the repo? where are the image sources? 👀

@howardjohn
Member Author

@BenTheElder

BenTheElder commented Oct 24, 2019

yes, that should just be "$@", exec will replace the current process.
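
As a generic illustration (not the actual entrypoint): with exec the shell process is replaced, so an EXIT trap set in that shell never runs and cleanup is skipped.

# Broken: exec replaces the shell, so the EXIT trap below never fires.
trap cleanup EXIT
exec "$@"

# Fixed: run the command as a child process; the trap runs when the script exits.
trap cleanup EXIT
"$@"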

@BenTheElder

https://github.com/istio/installer/blob/fe4079d536f2d1b94eb4a7d377527a102734189b/bin/with-kind.sh#L26 should also be kind export logs --name istio-testing "${ARTIFACTS}/kind" || true to ensure that we continue to cleanup if export fails for some reason.
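
In context, the relevant piece of the cleanup would look something like this (a sketch based on the linked with-kind.sh, not a verbatim copy):

cleanup() {
  # Export logs first, but don't let a failed export abort the rest of cleanup.
  kind export logs --name istio-testing "${ARTIFACTS}/kind" || true
  # Always delete the cluster, even if the log export failed.
  kind delete cluster --name istio-testing || true
}
trap cleanup EXIT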

it's separately concerning that we see leaks without this, but that should mitigate it.

@howardjohn
Member Author

On a node that looks good:

$ cat /proc/cgroups | tr '\t' ',' | column -t -s,
#subsys_name  hierarchy  num_cgroups  enabled
cpuset        9          209          1
cpu           6          375          1
cpuacct       6          375          1
blkio         7          375          1
memory        12         1195         1
devices       4          368          1
freezer       11         209          1
net_cls       10         209          1
perf_event    3          209          1
net_prio      10         209          1
hugetlb       8          209          1
pids          5          375          1
rdma          2          1            1

On a node that looks bad:

$ cat /proc/cgroups | tr '\t' ',' | column -t -s,
#subsys_name  hierarchy  num_cgroups  enabled
cpuset        8          1026         1
cpu           2          1529         1
cpuacct       2          1529         1
blkio         5          1529         1
memory        11         3398         1
devices       3          1522         1
freezer       10         1026         1
net_cls       6          1026         1
perf_event    4          1026         1
net_prio      6          1026         1
hugetlb       9          1026         1
pids          12         1529         1
rdma          7          1            1

@howardjohn
Member Author

On my local machine, there is one folder in /sys/fs/cgroup/pids/docker per kind instance. On the bad node there are something like 8, and on the good node there are also 2. So it seems like we have stuff staying around there by not cleaning up docker properly.

(note: I really have no clue what I am doing here, not familiar with docker much so just poking around)
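
For reference, a quick way to count those per-container cgroup directories on a node (assuming cgroup v1 with the docker cgroup parent, as on these nodes):

# Count per-container cgroup directories under the docker hierarchy.
$ ls -d /sys/fs/cgroup/pids/docker/*/ 2>/dev/null | wc -l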

@BenTheElder

On my local machine, there is one folder in /sys/fs/cgroup/pids/docker per kind instance.

yes -- there should be one per docker-in-docker container.

On the bad node there are something like 8, and on the good node there are also 2. So it seems like we have stuff staying around there by not cleaning up docker properly.

So the "good node" is not expected to have any running?

So it seems like we have stuff staying around there by not cleaning up docker properly.

Admittedly I've focused on kind under normal circumstances, in prow.k8s.io we only support docker in docker when used with the common wrapper scripts that have always aggressively cleaned up in exit handlers.

It's possible we're hitting this too, though; previously I looked for signs of this and we weren't (probably because we do actually clean up).

I'm going to see about simulating this on a clean GKE cluster and monitor the node...

We should also specifically check the running processes; from the host's side we should be able to see the process tree fine. Leaking cgroups shouldn't actually cost us anything significant; that part in and of itself is relatively fine. Leaking processes running in those cgroups does, though.
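
One way to check from the host (commands are illustrative; the container ID in the path is a placeholder):

# Show the full process tree on the node; leaked test processes, if any, should
# be visible here, since a cgroup alone is cheap but live processes are not.
$ ps -ef --forest

# For a leftover cgroup, list the PIDs still attached to it; an empty file means
# only the cgroup itself leaked, not any processes.
$ cat /sys/fs/cgroup/pids/docker/<container-id>/cgroup.procs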

istio-testing pushed a commit to istio/tools that referenced this issue Oct 25, 2019
* Clean up docker after job shutdown

For istio/test-infra#1988

* Use other docker clean method
@howardjohn
Member Author

@clarketm rebooted all the nodes, and things look back to a healthy state for now at least:
[screenshot 2019-25-10_09-10-37: node CPU usage]

What I am looking at here is not the spikes, since those just mean tests are running constantly, but the flat lines when no tests are running; there we should see minimal CPU usage, and we do now.

So the "good node" is not expected to have any running?

It could have a couple if there was an active test running. "Good" here meant it was using low CPU, so it could have leaked just once or something possibly.

@BenTheElder

ACK.
Will look into options to further mitigate. I think for now the best thing is to ensure that for any dind we delete all containers; AFAIK that has worked fine on the Kubernetes Prow, and it seems like it's hopefully working here.
Will note this in the kind issue as well.

@istio-policy-bot istio-policy-bot added the lifecycle/needs-triage Indicates a new PR or issue needs to be triaged label Oct 30, 2019
@howardjohn
Member Author

This was fixed and we have some more safeguards here. The one thing we don't have is proactive monitoring, but hopefully the safeguards make that unimportant.

@istio-policy-bot istio-policy-bot removed the lifecycle/needs-triage Indicates a new PR or issue needs to be triaged label Nov 24, 2019
@howardjohn
Member Author

This is back 🙁
[screenshot 2019-04-12_16-06-22: node CPU usage]

Started on Dec 2nd. There were no commits around that time.

@howardjohn
Member Author

[screenshot 2019-04-12_17-37-38: node CPU usage]

New guess: it didn't start Dec 2nd; there was just no activity until then because of Thanksgiving. It starting to leak on Monday is just a result of more jobs starting Monday.

A potential suspect, from 11/26: istio/istio#19156. Based on the graph above, it reasonably could have started on 11/26. It would be nice to run ls /sys/fs/cgroup on every node and look at how much stuff is there and from when, to get a better timeline, but I am not sure of a feasible way to do this.
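
One rough way to approximate that timeline, assuming the leftover cgroup directories keep a meaningful mtime (this would have to be run on each node, e.g. over SSH or from a privileged pod):

# List leftover per-container cgroup directories, newest first, with timestamps.
$ ls -lt --time-style=long-iso /sys/fs/cgroup/pids/docker | head -n 20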

@howardjohn howardjohn reopened this Dec 5, 2019
@istio-policy-bot istio-policy-bot added the lifecycle/needs-triage Indicates a new PR or issue needs to be triaged label Dec 5, 2019
howardjohn added a commit to howardjohn/istio that referenced this issue Dec 5, 2019
I suspect this is the root cause of
istio/test-infra#1988

For now, we will revert this and see if things improve
@BenTheElder

BenTheElder commented Dec 5, 2019

Not sure how buildx works under the hood yet, but I highly recommend having a single top-level entrypoint wrapper that cleans up everything when doing docker in docker.

Skimming buildx now: It looks like buildx does create containers.
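
A quick hedged check: with the docker-container driver, buildx runs BuildKit in a long-lived container that shows up in docker ps (the name filter below assumes the conventional buildx_buildkit_* naming):

# Any docker-container builders should be visible as running containers.
$ docker ps --filter name=buildx_buildkit
# The default "docker" driver instead builds inside the daemon, with no extra container.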

@howardjohn
Member Author

We do have a wrapper script (inspired by the first time this happened), but maybe it is too primitive: https://github.com/istio/tools/pull/471/files.

My understanding is that buildx has two modes: one where you have a docker container that runs all the builds, and one where you just build them normally(?), not in a container. When I run buildx locally and watch docker ps, nothing ever shows up. I'll try to look more into what's actually going on here.

Why do we actually need to mount the cgroup hostPath? I just did it because k8s did it 🙂. It seems like if we could remove that we could prevent this problem completely, but I am not sure if that is feasible.

istio-testing pushed a commit to istio/istio that referenced this issue Dec 5, 2019
I suspect this is the root cause of
istio/test-infra#1988

For now, we will revert this and see if things improve
@BenTheElder

So the cgroups mount actually predates kind and comes from the previous prototype. IIRC there were stability issues without this but it's been a while.

Might be worth a shot, but in general privileged workloads need to clean up after themselves.

We've not had any issues as far as I can tell by just enforcing all docker in docker go through the single wrapper.

As for builds, AIUI any docker build with a RUN step will need intermediate containers.

@howardjohn
Member Author

My attempt to fix this did not work:
[screenshot 2019-10-12_09-00-49: node CPU usage]

@howardjohn
Member Author

Didn't mean to close this. We are now doing a full docker system prune -a and it doesn't seem to fully fix it. It's pretty hard to debug this without a way to correlate the additional cgroups left around to a pod (or more specifically, to what job was running).

@istio-policy-bot istio-policy-bot removed the lifecycle/needs-triage Indicates a new PR or issue needs to be triaged label Dec 17, 2019
@BenTheElder

BenTheElder commented Dec 18, 2019 via email

howardjohn added a commit to howardjohn/tools that referenced this issue Dec 18, 2019
For istio/test-infra#1988

We should kill running docker containers at the end of the job. Pruning
doesn't stop running containers.
@howardjohn howardjohn reopened this Dec 18, 2019
@istio-policy-bot istio-policy-bot added the lifecycle/needs-triage Indicates a new PR or issue needs to be triaged label Dec 18, 2019
istio-testing pushed a commit to istio/tools that referenced this issue Dec 18, 2019
For istio/test-infra#1988

We should kill running docker containers at the end of the job. Pruning
doesn't stop running containers.
@howardjohn
Member Author

Good point. I added another step to stop all running containers as well; will see how that works. Interestingly, we also turned on autoscaling on our cluster, which "fixes" this issue by killing nodes somewhat frequently.
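
Concretely, the end-of-job cleanup now amounts to something like this (a sketch of the ordering, not the exact istio/tools change):

# Stop anything still running first: prune alone skips running containers.
docker ps -q | xargs -r docker kill || true
# Then remove stopped containers, networks, unused images, and build cache.
docker system prune -a -f || true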

I made a change to make the cgroup mount read-only, and tests seemed to work fine... if we make it read-only, it seems like we cannot possibly have this issue, since we can no longer write to any hostPath, so everything should be cleaned up properly? Does this seem like a horrible idea for any reason?
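
If the mount does become read-only, one sanity check from inside the test container is to confirm the cgroup mounts actually carry the ro flag (a hedged check, not something the CI currently does):

# Inside the test container: the cgroup mounts should list "ro" rather than "rw".
$ grep cgroup /proc/mounts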

@howardjohn
Member Author

All good since #2243

[screenshot 2019-26-12_11-26-40: node CPU usage]

@istio-policy-bot istio-policy-bot removed the lifecycle/needs-triage Indicates a new PR or issue needs to be triaged label Dec 30, 2019