
Kube-proxy facing locking timeout in large clusters during load test with services enabled #48107

Closed
shyamjvs opened this issue Jun 26, 2017 · 66 comments
Labels
area/kube-proxy kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability.

Comments

@shyamjvs
Member

Follows from discussion in #48052

We noticed this while performing a load test on 4000-node clusters with services enabled. The iptables restore step in the proxier fails with:

E0625 09:03:14.873338       5 proxier.go:1574] Failed to execute iptables-restore: failed to acquire old iptables lock: timed out waiting for the condition

The reason is quite likely the "huge" size of the iptables rule set (tens of MB), since we run 30 pods per node and each pod is part of exactly one service
=> 30 * 4000 = 120k service endpoints (and these updates happen on all 4000 nodes)
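For reference, the "old" iptables lock in this error is the abstract unix domain socket that pre-1.6 iptables binaries bind in order to serialize rule updates; kube-proxy retries the acquisition until a timeout and then surfaces the error above. Below is a minimal Go sketch of that pattern, not the actual proxier code; the @xtables socket name, retry cadence, and 5s timeout are assumptions for illustration:

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// grabOldIptablesLock emulates how the pre-1.6 iptables lock works: whoever
// holds a listener on the abstract unix socket owns the lock, and closing
// the listener releases it. We retry until the deadline and then give up
// with the same kind of error seen in the kube-proxy log above.
func grabOldIptablesLock(timeout time.Duration) (*net.UnixListener, error) {
	addr := &net.UnixAddr{Name: "@xtables", Net: "unix"} // abstract-namespace socket
	deadline := time.Now().Add(timeout)
	for {
		l, err := net.ListenUnix("unix", addr)
		if err == nil {
			return l, nil // lock acquired; caller must Close() it to release
		}
		if time.Now().After(deadline) {
			return nil, fmt.Errorf("failed to acquire old iptables lock: timed out waiting for the condition")
		}
		time.Sleep(200 * time.Millisecond) // someone else holds the socket; retry
	}
}

func main() {
	lock, err := grabOldIptablesLock(5 * time.Second)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer lock.Close()
	// ... run iptables-restore while holding the lock ...
	fmt.Println("lock held, safe to run iptables-restore")
}
```

With a ruleset this large, the restore itself is slow, so any other lock holder (docker, a CNI plugin, another sync round) can plausibly push acquisition past the timeout.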

cc @kubernetes/sig-network-misc @kubernetes/sig-scalability-misc @danwinship @wojtek-t

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. labels Jun 26, 2017
@freehan
Contributor

freehan commented Jun 27, 2017

I suspect other components are holding the lock and failing to release it.
Is kube-proxy able to perform a successful iptables-restore after the first occurrence?

@shyamjvs
Member Author

shyamjvs commented Jun 30, 2017

Linking issue #47344 for tracking.

@shyamjvs
Member Author

shyamjvs commented Jul 3, 2017

cc @wojtek-t

@freehan
Contributor

freehan commented Jul 5, 2017

Are the nodes running on CVM or COS? It looks like they are running on CVM, right?

@freehan
Contributor

freehan commented Jul 5, 2017

Is there any way we can execute lsof on a node in the bad state?

@freehan freehan closed this as completed Jul 5, 2017
@freehan freehan reopened this Jul 5, 2017
@freehan
Contributor

freehan commented Jul 5, 2017

It looks like the iptables util cannot open a specific Linux domain socket. I suspect kube-proxy failed to close it during one run. Not sure why. I will send a PR to expose the error soon.

@shyamjvs
Member Author

shyamjvs commented Jul 5, 2017

@freehan The nodes were running on GCI (which I guess is the same as COS). Ref: https://github.com/kubernetes/test-infra/blob/master/jobs/ci-kubernetes-e2e-gce-enormous-cluster.env#L33

k8s-github-robot pushed a commit that referenced this issue Jul 8, 2017
Automatic merge from submit-queue (batch tested with PRs 47234, 48410, 48514, 48529, 48348)

expose error lock release failure from iptables util

ref: #48107
@cmluciano

May be related to #45385.

I tried to find the dashboard where the enormous node test + services-enabled is. Can anyone provide a pointer to the testgrid runs?

@shyamjvs
Member Author

The job for the GCE 5k-node performance test - https://k8s-testgrid.appspot.com/google-gce-scale#gce-scale-performance
It's not automated yet, but I'm manually triggering the job almost every day (fixing some bugs each time) until the test goes green.
And regarding services, we reduced the number of services to half of the original (which was 16400) without changing the total number of pods involved in services (ref #48908), just to experiment.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 1, 2018
@bowei
Member

bowei commented Jan 4, 2018

@shyamjvs does this remain an issue?

@wojtek-t
Member

wojtek-t commented Jan 5, 2018

I think that it actually still is.
But I will let @shyamjvs answer this (he is on vacation now and will be back on Monday).

@wojtek-t
Member

wojtek-t commented Jan 5, 2018

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Jan 5, 2018
@spiffxp
Member

spiffxp commented Jan 6, 2018

/remove-lifecycle frozen

@k8s-ci-robot k8s-ci-robot removed the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Jan 6, 2018
@spiffxp
Member

spiffxp commented Jan 6, 2018

whoops, I meant
/remove-lifecycle stale
/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 6, 2018
@wojtek-t
Member

wojtek-t commented Jul 2, 2018

@danwinship - given all the above, I think the only open question that I still have is:
"what is grabbing the iptables lock in our tests and keeping it for 2s+, so that kube-proxy is not able to grab it during some sync rounds?"
I've looked into the kubelet logs and didn't see any traces, so it doesn't seem to be the kubelet. Could this be something in the OS?
@danwinship - thoughts?

@danwinship
Contributor

Docker manipulates iptables internally in some cases, so that might be it. Your network plugin might also be modifying iptables as part of pod setup.

@shyamjvs
Member Author

I've sent the above PR to experiment with IPVS in our scalability tests (as it seems GA now). The above problem with iptables might just become obsolete if we end up making that switch later.

@thockin thockin added the triage/unresolved Indicates an issue that can not or will not be resolved. label Mar 8, 2019
@athenabot

@shyamjvs
If this issue has been triaged, please comment /remove-triage unresolved.

If you aren't able to handle this issue, consider unassigning yourself and/or adding the help-wanted label.

🤖 I am a bot run by vllry. 👩‍🔬

@freehan
Contributor

freehan commented May 16, 2019

reopen if needed

@freehan freehan closed this as completed May 16, 2019
@wojtek-t
Member

Hmm - it still seems to be an issue - it was actually the main reason why we had to revert: #77541

@wojtek-t wojtek-t reopened this May 17, 2019
@athenabot

@shyamjvs
If this issue has been triaged, please comment /remove-triage unresolved.

If you aren't able to handle this issue, consider unassigning yourself and/or adding the help-wanted label.

🤖 I am a bot run by vllry. 👩‍🔬

1 similar comment

@shyamjvs shyamjvs removed their assignment Jul 11, 2019
@dcbw
Member

dcbw commented Jul 11, 2019

@squeed will add a metric for failed iptables calls

@wojtek-t what kernel versions do you have?

@dcbw
Member

dcbw commented Jul 11, 2019

/remove-triage unresolved

@k8s-ci-robot k8s-ci-robot removed the triage/unresolved Indicates an issue that can not or will not be resolved. label Jul 11, 2019
@squeed
Contributor

squeed commented Jul 11, 2019

Filed #80061 for metrics enhancements.
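For anyone following along, the kind of counter being discussed would look roughly like this with client_golang. This is a sketch only; the metric name, help text, and helper function are made up here, not what eventually landed via #80061:

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// iptablesRestoreFailuresTotal counts iptables-restore calls that returned an
// error (including lock-acquisition timeouts). Name and help text are
// illustrative only.
var iptablesRestoreFailuresTotal = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "kubeproxy_iptables_restore_failures_total",
	Help: "Cumulative number of iptables-restore calls that failed, e.g. because the iptables lock could not be acquired in time.",
})

func init() {
	prometheus.MustRegister(iptablesRestoreFailuresTotal)
}

// RecordIptablesRestoreFailure would be called from the proxier wherever
// iptables-restore returns an error, so lock contention shows up on
// dashboards instead of only in the logs.
func RecordIptablesRestoreFailure() {
	iptablesRestoreFailuresTotal.Inc()
}
```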

@wojtek-t
Member

@wojtek-t what kernel versions do you have?

4.14 (I can't remember exactly).
Do you have any specific feature in mind that helps here and was added in a particular kernel version?

@vanyans

vanyans commented Jan 21, 2020

Hello,

I am seeing this issue in my proxy pods:
E0117 08:12:44.604344 1 proxier.go:1402] Failed to execute iptables-restore: failed to acquire new iptables lock: timed out waiting for the condition
kube-proxy v1.15.4

Any solutions on this? Let me know if you need more info
Thank you!

kube-proxy-fail-log.txt

@aojea
Member

aojea commented Jan 22, 2020

are you using the CNI portmap plugin?

@vanyans

vanyans commented Jan 23, 2020

Hi. We are using NSX-T CNI.

@aojea
Member

aojea commented Jan 23, 2020

I think there can be multiple reasons; I can share what I did to find a problem with the portmap plugin holding the lock.

I just patched the kubelet (22665fc) to log the PID of the process, then I monitored kube-proxy to trigger a script when it finds the error message

Failed to execute iptables-restore: failed to acquire new iptables lock: timed out waiting for the condition

The script dumps all the processes so you can check "who" was holding the lock.

Hope this can be useful.
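A related, patch-free way to narrow this down when the lock is the flock-based /run/xtables.lock: processes holding (or waiting on) the flock keep the lock file open, so scanning /proc for open fds pointing at it right when kube-proxy logs the timeout lists the candidates. A rough Go sketch; the path and helper names are illustrative, and this won't catch users of the older abstract @xtables socket:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// findLockFileUsers lists processes that currently have lockPath open.
// The flock holder keeps the file descriptor open for as long as it holds
// the lock (and so does anyone blocked waiting on it), so this narrows the
// search down to a handful of candidates.
func findLockFileUsers(lockPath string) ([]string, error) {
	fds, err := filepath.Glob("/proc/[0-9]*/fd/*")
	if err != nil {
		return nil, err
	}
	var users []string
	for _, fd := range fds {
		target, err := os.Readlink(fd)
		if err != nil {
			continue // process exited or permission denied; skip
		}
		if target != lockPath {
			continue
		}
		pid := filepath.Base(filepath.Dir(filepath.Dir(fd))) // /proc/<pid>/fd/<n>
		comm, _ := os.ReadFile(filepath.Join("/proc", pid, "comm"))
		users = append(users, fmt.Sprintf("pid %s (%s)", pid, strings.TrimSpace(string(comm))))
	}
	return users, nil
}

func main() {
	// Run as root on the node right when kube-proxy logs the timeout.
	users, err := findLockFileUsers("/run/xtables.lock")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, u := range users {
		fmt.Println(u)
	}
}
```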

@vanyans

vanyans commented Jan 23, 2020

Thank you! I will try this.
When you say portmap plugin, are you referring to this one: https://github.com/containernetworking/plugins/tree/master/plugins/meta/portmap

@aojea
Member

aojea commented Jan 23, 2020

Yeah, that one.
This is the issue explaining the problem:
containernetworking/plugins#418

@dcbw
Member

dcbw commented Feb 2, 2023

I think we can close this; multiple performance improvements in the kernel iptables/nftables code, improvements in iptables-nft, and the move to default to the IPVS proxy make iptables lock contention much less of an issue. If we do still see this, we can re-open.

/close

@k8s-ci-robot
Contributor

@dcbw: Closing this issue.

In response to this:

I think we can close this; multiple performance improvements in the kernel iptables/nftables code, improvements in iptables-nft, and the move to default to the IPVS proxy make iptables lock contention much less of an issue. If we do still see this, we can re-open.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
