possibly leaking memory cgroups #421

Closed · swachter opened this issue Mar 29, 2019 · 19 comments
Labels: lifecycle/active, priority/important-soon

swachter commented Mar 29, 2019

I repeatedly created and deleted kind clusters and watched the number of cgroups reported in /proc/cgroups (see the attached file). Here is a short summary; the columns show the initial number of cgroups and the counts after each successive cluster creation and deletion:

subsys_name   initial  created  deleted  created  deleted  created  deleted
cpuset        1        32       2        32       2        26       2
cpu           35       83       36       83       36       77       36
cpuacct       35       83       36       83       36       77       36
blkio         35       83       36       83       36       77       36
memory        62       126      109      168      124      175      138
devices       35       83       36       83       36       77       36
freezer       1        32       2        32       2        26       2
net_cls       1        32       2        32       2        26       2
perf_event    1        32       2        32       2        26       2
net_prio      3        32       2        32       2        26       2
hugetlb       1        32       2        32       2        26       2
pids          40       88       41       88       41       82       41
rdma          1        1        1        1        1        1        1

After each creation/deletion cycle the number of memory cgroups increases, whereas the counts for the other cgroup subsystems stay the same. I am not sure whether this is a kind-specific problem or whether it is related to the underlying Docker (client and server are version 18.06.1-ce).

cgroups.txt
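For anyone who wants to reproduce the measurement, a loop along these lines works (a rough sketch only: it assumes the kind CLI is on PATH, uses the default cluster name, and the three cycles are arbitrary):

```bash
#!/usr/bin/env bash
# Snapshot the per-subsystem cgroup counts from /proc/cgroups
# around repeated kind cluster create/delete cycles.
set -euo pipefail

snapshot() {
  # Columns in /proc/cgroups: subsys_name hierarchy num_cgroups enabled.
  # Print only the subsystem name and its cgroup count, skipping the header line.
  awk 'NR > 1 { print $1, $3 }' /proc/cgroups
}

echo "== initial =="; snapshot
for i in 1 2 3; do
  kind create cluster
  echo "== after create #$i =="; snapshot
  kind delete cluster
  echo "== after delete #$i =="; snapshot
done
```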

@BenTheElder (Member)

Actually leaking them is not ideal but not overly surprising; this is definitely the most likely resource to leak. The limited upside is that they shouldn't persist across a reboot.

Can we get more system info? Kernel version?

@BenTheElder (Member)

There have been kernel issues around this in the past: https://bugzilla.kernel.org/show_bug.cgi?id=12464

BenTheElder self-assigned this Mar 29, 2019
BenTheElder added the triage/needs-information and priority/important-soon labels Mar 29, 2019

swachter commented Apr 1, 2019

The tests were done in a VirtualBox VM set up by Vagrant (config.vm.box = "ubuntu/bionic64"). uname -a outputs:

Linux ubuntu-bionic 4.15.0-46-generic #49-Ubuntu SMP Wed Feb 6 09:33:07 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

On our CI environment we also seem to have suffered from leaking cgroups. On that environment uname -a yields:

Linux gke-usu-manage-saas-cont-runner-pool2-647369da-0qlw 4.14.91+ #1 SMP Wed Jan 23 21:34:58 PST 2019 x86_64 Intel(R) Xeon(R) CPU @ 2.30GHz GenuineIntel GNU/Linux

aojea commented Apr 1, 2019

This is suspiciously similar to moby/moby#29638

swachter commented Apr 1, 2019

Another similar issue is google/cadvisor#1581. I used a script mentioned in that ticket (inotify_watchers.sh) to output the installed watchers. It seems that after kind delete cluster all watchers are removed. Therefore I think the increasing number of cgroups is the root cause, because cAdvisor tries to install a proportional number of watchers.

The attached file watchers.txt shows the installed watchers after repeated cluster creations and deletions.
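For reference, a script in the spirit of that inotify_watchers.sh (not a verbatim copy, just an approximation) walks /proc/*/fdinfo and counts the inotify watch entries per process; it generally needs to run as root to see every process:

```bash
#!/usr/bin/env bash
# Approximate count of inotify watches per process: each "inotify wd:..."
# line in a /proc/<pid>/fdinfo/<fd> file corresponds to one watch.
for f in /proc/[0-9]*/fdinfo/*; do
  n=$(grep -c '^inotify' "$f" 2>/dev/null)
  [ "${n:-0}" -gt 0 ] || continue
  pid=${f#/proc/}; pid=${pid%%/*}
  comm=$(cat "/proc/$pid/comm" 2>/dev/null)
  echo "$n $pid $comm"
done |
  awk '{ watches[$2 " " $3] += $1 } END { for (k in watches) print watches[k], k }' |
  sort -rn
```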

@neolit123 (Member)

this also happens using kubeadm directly:
kubeadm init ... && kubeadm reset

docker: 18.06.3 (cg driver = systemd)
kernel: 4.13.0-41-generic

but the amount of leakage is fairly low.
apart from testing a different version of docker and the linux kernel, i don't think there is much we can do here.

swachter commented Apr 1, 2019

> but the amount of leakage is fairly low.

The amount of leakage increases if some components are installed in the cluster on each create/delete cycle. The original measurements were taken on newly created, empty clusters. With a few components being installed on each cycle, the number of leaked memory cgroups increased to ~30 per cycle.

aojea commented Apr 1, 2019

> The amount of leakage increases if some components are installed in the cluster on each create/delete cycle

If the new components are new containers, and assuming we are hitting one of the linked Docker cgroup-leak bugs, that makes sense: the more containers, the more leakage.

@BenTheElder (Member)

has anyone checked moby/moby#29638 (comment) yet?

I will look into this soon but haven't gotten to it yet.

@neolit123 (Member)

> The amount of leakage increases if some components are installed in the cluster on each create/delete cycle. The original measurements were taken on newly created, empty clusters. With a few components being installed on each cycle, the number of leaked memory cgroups increased to ~30 per cycle.

with the kernel's cap of 65535 (USHRT_MAX) memory cgroups it will take a while to hit the limit.
but i can see this being a problem in a persistently running setup without reboots.

BenTheElder added the lifecycle/active label Apr 1, 2019

BenTheElder commented Apr 1, 2019

On my machine, watching lscgroup | grep -c memory while creating/deleting clusters so far suggests that we're not leaking any; perhaps moby/moby#29638 (comment) is correct?

@mlaventure writes:

> @BenHall Every directory under the mount point (including the mount point) is considered to be a cgroup. So to get the actual number of cgroups from the FS you would have to run: find /sys/fs/cgroup/memory -type d | wc -l and that should match the number found in /proc/cgroups

It turns out that this is not always the case. I corresponded with a Linux cgroups maintainer (Michal Hocko) recently, who said:

> Please note that memcgs are completely removed after the last memory accounted to them disappears. And that happens lazily on the memory pressure. So it is quite possible that this happens much later than the actual rmdir on the memcg.

So, it's not uncommon for the num_cgroups value in /proc/cgroups to differ from what you might see in lscgroup.
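A quick way to put the two numbers side by side (assuming a cgroup v1 host with the memory controller mounted at /sys/fs/cgroup/memory):

```bash
# Filesystem view: every directory under the memory controller mount is a cgroup.
find /sys/fs/cgroup/memory -type d | wc -l
# Same view via libcgroup, if lscgroup happens to be installed.
lscgroup | grep -c memory
# Kernel counter; this also includes memcgs that were rmdir'ed but not yet reclaimed.
awk '$1 == "memory" { print $3 }' /proc/cgroups
```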

Will investigate more later, off to a meeting 😅

Edit: I am on a newer 4.19.20<snip> kernel though FWIW (with Docker 18.09.3)

@BenTheElder (Member)

looking back at #412, is it possible we're hitting inotify limits instead?

I've yet to find anything in your logs that wasn't watch related

Mar 27 18:22:20 kind-control-plane kubelet[952]: E0327 18:22:20.645229     952 raw.go:146] Failed to watch directory "/sys/fs/cgroup/blkio/docker/ebd0b4c8f8840ef15d77d256089b3c79bdfe85ab8152559f5abd5ee5b67c4463/system.slice": inotify_add_watch /sys/fs/cgroup/blkio/docker/ebd0b4c8f8840ef15d77d256089b3c79bdfe85ab8152559f5abd5ee5b67c4463/system.slice: no space left on device

BenTheElder changed the title from "leaking memory cgroups" to "possibly leaking memory cgroups" Apr 1, 2019

BenTheElder commented Apr 1, 2019

The inotify scripts are not working quite right for me.

It's also entirely possible that we don't leak watches or groups at all, and that something else is just using up the limit when you have many clusters, etc. The default inotify limits are rather low on many setups (it looks like Ubuntu's default is 8192).

EDIT: it's possible to check with cat /proc/sys/fs/inotify/max_user_watches

Every failure in that log appears to be related to setting up an inotify watch.

Based on some testing with lscgroup and on moby/moby#29638 (comment) I don't think we're seeing real cgroup "leaks".
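If the watch limit does turn out to be the bottleneck, the usual host-side workaround is to raise it with sysctl (the value and the drop-in file name below are just examples):

```bash
# Current per-user limit (8192 is a common Ubuntu default).
cat /proc/sys/fs/inotify/max_user_watches

# Raise it for the running system.
sudo sysctl fs.inotify.max_user_watches=524288

# Persist across reboots via a sysctl drop-in, then reload.
echo 'fs.inotify.max_user_watches=524288' | sudo tee /etc/sysctl.d/99-inotify.conf
sudo sysctl --system
```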

BenTheElder commented Apr 1, 2019

I'm also seeing that the memory cgroup count in /proc/cgroups does seem to stay higher for some time (unlike lscgroup), but it drops later, and I can force that early with sync; sudo bash -c 'echo 1 > /proc/sys/vm/drop_caches'
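A before/after check of that lazy reclaim looks roughly like this (root required; writing 1 to drop_caches only drops clean page cache, which is what lets the dead memcgs finally be destroyed):

```bash
# Memory cgroup count as the kernel sees it, before forcing reclaim.
awk '$1 == "memory" { print "before:", $3 }' /proc/cgroups

# Flush dirty data, then drop clean page cache so lazily-freed memcgs can go away.
sync
sudo sh -c 'echo 1 > /proc/sys/vm/drop_caches'

# ...and the count afterwards.
awk '$1 == "memory" { print "after: ", $3 }' /proc/cgroups
```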

swachter commented Apr 2, 2019

> EDIT: it's possible to check with cat /proc/sys/fs/inotify/max_user_watches

On the GKE node where the problem occurred, max_user_watches is 8192. (On that node the GitLab CI runner places jobs/pods that in turn use kind in integration tests.)

I remember that on this node cat /proc/cgroups showed some numbers above 1000. Unfortunately, I did not save the output. Restarting that node helped. Now the output is:

#subsys_name    hierarchy       num_cgroups     enabled
cpuset  3       17      1
cpu     5       125     1
cpuacct 5       125     1
blkio   4       125     1
memory  8       229     1
devices 7       117     1
freezer 10      17      1
net_cls 2       17      1
perf_event      6       17      1
net_prio        2       17      1
hugetlb 11      17      1
pids    12      125     1
rdma    9       1       1

It seems to me that these numbers increase over time. I will check them regularly.

Could it be that the cAdvisor that comes with kind tries to install watches on too many cgroups, in particular on cgroups that do not "belong" to the kind cluster?

swachter commented Apr 2, 2019

> Based on some testing with lscgroup and on moby/moby#29638 (comment) I don't think we're seeing real cgroup "leaks".

I can confirm that repeated cluster creation / deletion does NOT leak cgroups. The output of lscgroup | grep -c memory is the same after each creation / deletion cycle.

Thank you for looking into this. I think the issue has nothing to do with kind and this issue can be closed.

However, I wonder how the issue can be tracked down further. Unfortunately the lscgroup utility is not installed on GKE nodes and /proc/cgroups is unreliable. Is there a way to monitor cgroups on GKE nodes?

@BenTheElder (Member)

presumably GKE nodes -> COS? (they could be ubuntu, which is a different story)

On COS I believe the expectation is that everything runs as containers; you can probably docker run an image with the right tools in it and fiddle with the mounts.
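Something along these lines should work (a sketch only; the image and the read-only cgroup mount are illustrative, and the exact paths depend on the node's cgroup layout):

```bash
# Run a throwaway container with the host's cgroup filesystem mounted read-only
# and count the memory cgroups from inside it.
docker run --rm \
  -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
  ubuntu:18.04 \
  bash -c 'find /sys/fs/cgroup/memory -type d | wc -l'
```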

@BenTheElder (Member)

going to close this for now since it seems to not be cgroups leaking.

we might need to figure out a good pattern to increase inotify watches (possibly a daemonset?), but that's a separate issue.

stg-0 added a commit to stg-0/kind that referenced this issue Feb 22, 2024