Prow cluster resource leak #1988
You should be able to check the cgroups on a host node to see if there's actually something leaking; there shouldn't be. I've done fairly extensive testing of that with kind itself locally, but it's possible something else is up in CI. xref: kubernetes-sigs/kind#421 (note that the "memory cgroups leak" is actually just a kernel bug in flushing them after deletion; we've also seen this just running containers without any nesting, etc.) |
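For anyone repeating that check, here is a rough sketch (not from the thread itself); it assumes cgroup v1 with the memory controller mounted at /sys/fs/cgroup/memory, which may differ per node image and cgroup driver:

```bash
#!/usr/bin/env bash
# Sketch: inspect host cgroups for leftovers from docker-in-docker / kind.
# Paths are assumptions based on a typical cgroup v1 node layout.

echo "total memory cgroups: $(find /sys/fs/cgroup/memory -type d | wc -l)"

# Container-shaped leftovers usually show up under docker/kubepods scopes.
find /sys/fs/cgroup/memory -maxdepth 3 -type d \( -name 'docker*' -o -name 'kubepods*' \)
```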
Likely root cause: 01e1bf6, which adds kind to a new repo. This repo does have the |
is this trap in the image or in the repo? where are the image sources? 👀 |
I think the trap doesn't get called because of |
yes, that should just be |
https://github.com/istio/installer/blob/fe4079d536f2d1b94eb4a7d377527a102734189b/bin/with-kind.sh#L26 should also be. It's separately concerning that we see leaks without this, but that should mitigate it. |
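For context, a minimal sketch of the kind of exit-trap cleanup being discussed here; the cluster name, structure, and wrapped command are assumptions, not the actual with-kind.sh contents:

```bash
#!/usr/bin/env bash
# Sketch of a kind wrapper with an exit trap. Not the real with-kind.sh;
# the cluster name and overall shape are assumptions.
set -euo pipefail

CLUSTER_NAME="${CLUSTER_NAME:-ci}"

cleanup() {
  # Always delete the kind cluster, even if the wrapped command failed.
  kind delete cluster --name "${CLUSTER_NAME}" || true
}
trap cleanup EXIT

kind create cluster --name "${CLUSTER_NAME}"

# Run whatever command the job passed in.
"$@"
```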
On a node that looks good:
On a node that looks bad:
|
On my local machine, there is one folder in (note: I really have no clue what I am doing here; I'm not very familiar with docker, so I'm just poking around) |
yes -- there should be one per docker-in-docker container.
So the "good node" is not expected to have any running?
Admittedly I've focused on kind under normal circumstances; in prow.k8s.io we only support docker in docker when used with the common wrapper scripts, which have always aggressively cleaned up in exit handlers. It's possible we're hitting this too, though. Previously I looked for signs of this and we weren't seeing any (probably because we do actually clean up). I'm going to see about simulating this on a clean GKE cluster and monitoring the node... We should also specifically check the running processes; from the host's side we should be able to see the process tree fine. Leaking cgroups shouldn't actually cost us anything significant; that part in and of itself is relatively fine. Leaking processes running in those cgroups does, though. |
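A hedged sketch of that host-side process check; the process-name patterns are assumptions about what a leaked docker-in-docker daemon would look like:

```bash
#!/usr/bin/env bash
# Sketch: from the host, look for docker-in-docker processes that outlived
# their pod. The match patterns below are assumptions, not known leak signatures.

# Full tree first, so orphaned daemons are easy to spot by parentage.
ps -eo pid,ppid,etime,comm --forest

# Narrow down to the usual suspects (ignoring the grep process itself).
ps -eo pid,ppid,etime,args | grep -E 'dockerd|containerd|kind' | grep -v grep
```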
* Clean up docker after job shutdown For istio/test-infra#1988 * Use other docker clean method
@clarketm rebooted all the nodes, and things look back to a healthy state for now at least. What I am looking at here is not the spikes, since those just mean tests are running constantly, but the flat lines when no tests are running: we should see minimal CPU usage there, and we do now
It could have a couple if there was an active test running. "Good" here meant it was using low CPU, so it could have leaked just once or something possibly. |
ACK. |
This was fixed and we have some more safeguards here. The one thing we don't have is proactive monitoring, but hopefully the safeguards make that not important |
New guess: it didn't start Dec 2nd; there was just no activity until then because of Thanksgiving. Its starting to leak on Monday is just a result of more jobs starting Monday. A potential suspect, from 11/26: istio/istio#19156. Based on the graph above it reasonably could have started on 11/26. It would be nice to run |
I suspect this is the root cause of istio/test-infra#1988 For now, we will revert this and see if things improve
Not sure how buildx works under the hood yet, but I highly recommend having a single top-level entrypoint wrapper that cleans up everything when doing docker in docker. Skimming buildx now: it looks like buildx does create containers. |
We do have a wrapper script (inspired by the first time this happened), but maybe it is too primitive: https://github.com/istio/tools/pull/471/files. My understanding is that buildx has two modes: one where you have a docker container that runs all the builds, and one where you just build them normally(?) not in a container. When I run buildx locally and watch... Why do we actually need to mount the cgroup hostPath? I just did it because k8s did it 🙂. It seems like if we could remove that we could prevent this problem completely, but I'm not sure if that is feasible. |
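Not the actual wrapper, but a sketch of how one might inspect and tear down buildx builder state under the docker-container driver; the builder name and the container naming convention are assumptions:

```bash
#!/usr/bin/env bash
# Sketch: see what buildx leaves running and tear it down after a job.
# Assumes the docker-container driver, which runs BuildKit in a container.

# Builders known to the local docker CLI.
docker buildx ls

# By naming convention (an assumption), the BuildKit containers are prefixed
# with buildx_buildkit_, so filter on that.
docker ps --filter name=buildx_buildkit

# Remove a builder (and its container) once the job is done; "mybuilder" is
# a hypothetical name.
docker buildx rm mybuilder || true
```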
So the cgroups mount actually predates kind and comes from the previous prototype. IIRC there were stability issues without it, but it's been a while. Might be worth a shot, but in general privileged workloads need to clean up after themselves. We've not had any issues, as far as I can tell, by just enforcing that all docker in docker goes through the single wrapper. As for builds, AIUI any docker building with a |
Didn't mean to close this. We are now doing a full docker system prune -a and it doesn't seem to fully fix it. It's pretty hard to debug this without a way to correlate the additional cgroups left around to a pod (or more specifically, what job was running). |
docker system prune does not stop running containers. Are we also stopping containers? Stopping and removing containers is the key issue; the other resources are generally namespaced anyhow.
|
For istio/test-infra#1988 We should kill running docker containers at the end of the job. Pruning doesn't stop running containers.
Good point. I added another step to stop all running containers as well. Will see how that works. Interestingly, we also turned on autoscaling on our cluster, which "fixes" this issue by killing nodes somewhat frequently. I made a change to make the cgroup mount read-only, and tests seemed to work fine... if we make that read-only, it seems like we cannot possibly have this issue, since we can no longer write to any hostPath, so everything should be cleaned up properly? Does this seem like a horrible idea for any reason? |
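A minimal sketch of that extra cleanup step, assuming the job's exit handler can reach the in-pod docker daemon; it force-removes containers before pruning, since prune alone skips running ones:

```bash
#!/usr/bin/env bash
# Sketch of an end-of-job docker cleanup step. `docker system prune` leaves
# running containers alone, so force-remove everything first.

# Force-remove all containers, running or stopped (no-op when none exist).
docker ps -aq | xargs -r docker rm -f

# Then reclaim images, networks, volumes, and build cache.
docker system prune -af --volumes
```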
All good since #2243 |
Background: kubernetes-sigs/kind#759
When we started using kind, we found that all of a sudden our tests started flaking often. It turns out this was due to a resource leak from not cleaning up kind clusters at all. I am not 100% sure what was leaking, but it was something related to the hostPath mounts of /lib/modules or /sys/fs/cgroup. Once we added cleanup configuration, things went back to normal. The best reference we had was the "resting CPU" rate of the nodes. In the original case, after a 2-week period it was between 30% and 90% with no tests running.
Now, months later, the same problem has returned 🙁
This time it is much more subtle though, with only 20% max usage after 2 weeks.
My suspicion is that we still have some leaks from kind.
Action items:
- Problem seemed to start on October 1st. Were there any changes around then?
- Can we identify that we have a leak and clean up proactively as a periodic job? (a rough sketch of such a check follows below)
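A rough sketch of what such a periodic check could look like; the cgroup path and the threshold are assumptions, not tuned values:

```bash
#!/usr/bin/env bash
# Sketch of a periodic leak probe that a cron-style job could run on each node.
# Assumes cgroup v1; THRESHOLD is an arbitrary placeholder, not a tuned value.

THRESHOLD=2000
count="$(find /sys/fs/cgroup/memory -type d | wc -l)"
echo "memory cgroups on $(hostname): ${count}"

if [ "${count}" -gt "${THRESHOLD}" ]; then
  echo "possible cgroup leak: ${count} > ${THRESHOLD}" >&2
  exit 1
fi
```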