
Possible netlink leak on 3.29.1 #9603

Open

imbstack opened this issue Dec 14, 2024 · 0 comments
We recently updated calico to 3.29.1 on one of our staging clusters and found that after a few hours there was a clear upward trend in the number of file descriptors held by calico-node pods.

[screenshot: graph of calico-node pod file descriptor counts trending steadily upward]

Checking on a running instance after a couple of days, we found that the calico-node -felix process had nearly 6000 file descriptors according to lsof, almost all of which looked like the following:

# lsof -p 1517383 | tail
calico-no 1517383 root 5778u  netlink                 0t0  689245146 ROUTE
calico-no 1517383 root 5779u  netlink                 0t0  688913733 ROUTE
calico-no 1517383 root 5780u  netlink                 0t0  689385180 ROUTE
calico-no 1517383 root 5781u  netlink                 0t0  689391746 ROUTE
calico-no 1517383 root 5782u  netlink                 0t0  689402663 ROUTE
calico-no 1517383 root 5783u  netlink                 0t0  689407738 ROUTE
calico-no 1517383 root 5784u  netlink                 0t0  689296536 ROUTE
calico-no 1517383 root 5785u  netlink                 0t0  689301559 ROUTE
calico-no 1517383 root 5790u  netlink                 0t0  689395559 ROUTE
calico-no 1517383 root 5791u  netlink                 0t0  689400833 ROUTE
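For reference, this is roughly how I have been sampling the count (a sketch, not exactly what our monitoring runs; it assumes the felix process matches `calico-node -felix` and that `netlink` appears in lsof's TYPE column, as in the output above):

# Sample the felix process's netlink fd count once a minute;
# on a healthy node this should stay flat
PID=$(pgrep -f 'calico-node -felix')
while true; do
  printf '%s %s\n' "$(date -Is)" "$(lsof -p "$PID" 2>/dev/null | awk '$5 == "netlink"' | wc -l)"
  sleep 60
done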

Deleting that pod dropped the fds, although the new pod is starting the trend all over again.

Let me know if there is any other debugging data I can provide.
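In case it's useful, here is how I cross-checked the count without lsof, by matching the process's socket inodes against the kernel's netlink table (a sketch; it assumes the /proc/net/netlink layout where the second field is the protocol, 0 being NETLINK_ROUTE, and the tenth field is the inode):

# Netlink ROUTE socket inodes held by the felix process
ls -l /proc/"$PID"/fd 2>/dev/null | grep -o 'socket:\[[0-9]*\]' | grep -o '[0-9]*' | sort > /tmp/felix-inodes
awk 'NR > 1 && $2 == 0 { print $10 }' /proc/net/netlink | sort > /tmp/route-inodes
comm -12 /tmp/felix-inodes /tmp/route-inodes | wc -l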

Expected Behavior

A relatively steady state of file descriptors for a calico-node pod.

Current Behavior

A steady increase in open file descriptors.

Possible Solution

Steps to Reproduce (for bugs)

  1. Deploy Calico 3.29.1; as far as I can tell, no special configuration is needed.

Context

This is ok for now in our staging environment, but we are worried about going to production this way. It is entirely possible this is due to some weird config on our side, but nothing is jumping out at me so far.

Your Environment

  • Calico version: 3.29.1
  • Calico dataplane (iptables, windows etc.): iptables
  • Orchestrator version (e.g. kubernetes, mesos, rkt): kubernetes
  • Operating System and version: Linux ip-10-213-23-129 6.8.0-1018-aws #19~22.04.1-Ubuntu SMP Wed Oct 9 16:48:22 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
  • Link to your project (optional):