
Kube-proxy facing locking timeout in large clusters during load test with services enabled #48107

Closed
shyamjvs opened this issue Jun 26, 2017 · 66 comments
Labels
area/kube-proxy kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability.

Comments

@shyamjvs
Member

Follows from discussion in #48052

We noticed this while performing a load test on 4000-node clusters with services enabled. The iptables restore step in the proxier fails with:

E0625 09:03:14.873338       5 proxier.go:1574] Failed to execute iptables-restore: failed to acquire old iptables lock: timed out waiting for the condition

The reason is quite likely the "huge" size of the iptables rule set (tens of MB), since we run 30 pods per node and each pod is part of exactly one service
=> 30 * 4000 = 120k service endpoints (and these updates happen on all 4000 nodes)
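For reference, the "old" iptables lock in this error is the abstract unix domain socket that pre-1.6 iptables binaries bind in order to serialize rule updates; kube-proxy retries the acquisition until a timeout and then surfaces the error above. Below is a minimal Go sketch of that pattern, not the actual proxier code; the @xtables socket name, retry cadence, and 5s timeout are assumptions for illustration:

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// grabOldIptablesLock emulates how the pre-1.6 iptables lock works: whoever
// holds a listener on the abstract unix socket owns the lock, and closing
// the listener releases it. We retry until the deadline and then give up
// with the same kind of error seen in the kube-proxy log above.
func grabOldIptablesLock(timeout time.Duration) (*net.UnixListener, error) {
	addr := &net.UnixAddr{Name: "@xtables", Net: "unix"} // abstract-namespace socket
	deadline := time.Now().Add(timeout)
	for {
		l, err := net.ListenUnix("unix", addr)
		if err == nil {
			return l, nil // lock acquired; caller must Close() it to release
		}
		if time.Now().After(deadline) {
			return nil, fmt.Errorf("failed to acquire old iptables lock: timed out waiting for the condition")
		}
		time.Sleep(200 * time.Millisecond) // someone else holds the socket; retry
	}
}

func main() {
	lock, err := grabOldIptablesLock(5 * time.Second)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer lock.Close()
	// ... run iptables-restore while holding the lock ...
	fmt.Println("lock held, safe to run iptables-restore")
}
```

With a ruleset this large, the restore itself is slow, so any other lock holder (docker, a CNI plugin, another sync round) can plausibly push acquisition past the timeout.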

cc @kubernetes/sig-network-misc @kubernetes/sig-scalability-misc @danwinship @wojtek-t

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. labels Jun 26, 2017
@freehan
Contributor

freehan commented Jun 27, 2017

I suspect other components are holding the lock and failing to release it.
Is kube-proxy able to perform a successful iptables-restore after the first occurrence?

@shyamjvs
Member Author

shyamjvs commented Jun 30, 2017

Linking issue #47344 for tracking.

@shyamjvs
Member Author

shyamjvs commented Jul 3, 2017

cc @wojtek-t

@freehan
Contributor

freehan commented Jul 5, 2017

Are the nodes running on CVM or COS? It looks like they are running on CVM, right?

@freehan
Contributor

freehan commented Jul 5, 2017

Is there any way we can execute lsof on a node in the bad state?

@freehan freehan closed this as completed Jul 5, 2017
@freehan freehan reopened this Jul 5, 2017
@freehan
Contributor

freehan commented Jul 5, 2017

It looks like the iptables util cannot open a specific Linux domain socket. I suspect kube-proxy failed to close it during one run. Not sure why. I will send a PR to expose the error soon.

@shyamjvs
Member Author

shyamjvs commented Jul 5, 2017

@freehan The nodes were running on GCI (which I guess is the same as COS). Ref: https://github.com/kubernetes/test-infra/blob/master/jobs/ci-kubernetes-e2e-gce-enormous-cluster.env#L33

k8s-github-robot pushed a commit that referenced this issue Jul 8, 2017
Automatic merge from submit-queue (batch tested with PRs 47234, 48410, 48514, 48529, 48348)

expose error lock release failure from iptables util

ref: #48107
@cmluciano

May be related to #45385.

I tried to find the dashboard where the enormous node test + services-enabled is. Can anyone provide a pointer to the testgrid runs?

@shyamjvs
Member Author

The job for the GCE 5k-node performance test - https://k8s-testgrid.appspot.com/google-gce-scale#gce-scale-performance
It's not automated yet, but I'm manually triggering the job almost every day (fixing some bugs each time) until the test goes green.
And regarding services, we reduced the number of services to half of the original (which was 16400) without changing the total number of pods involved in services (ref #48908), just to experiment.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 1, 2018
@bowei
Member

bowei commented Jan 4, 2018

@shyamjvs does this remain an issue?

@wojtek-t
Member

wojtek-t commented Jan 5, 2018

I think that it actually still is.
But I will let @shyamjvs answer this (he is on vacation now and will be back on Monday).

@wojtek-t
Member

wojtek-t commented Jan 5, 2018

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Jan 5, 2018
@spiffxp
Member

spiffxp commented Jan 6, 2018

/remove-lifecycle frozen

@k8s-ci-robot k8s-ci-robot removed the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Jan 6, 2018
@spiffxp
Member

spiffxp commented Jan 6, 2018

whoops, I meant
/remove-lifecycle stale
/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 6, 2018
@wojtek-t
Member

wojtek-t commented Jul 2, 2018

@danwinship - given all the above, I think the only open question that I still have is:
"what is grabbing the iptables lock in our tests and keeping it for 2s+, so that kube-proxy is not able to grab it during some sync rounds?"
I've looked into the kubelet logs and didn't see any traces, so it doesn't seem to be the kubelet. Could this be something in the OS?
@danwinship - thoughts?

@danwinship
Contributor

Docker manipulates iptables internally in some cases, so that might be it. Your network plugin might also be modifying iptables as part of pod setup.

@shyamjvs
Member Author

I've sent the above PR to experiment with IPVS in our scalability tests (as it seems GA now). The above problem with iptables might just become obsolete if we end up making that switch later.

@thockin thockin added the triage/unresolved Indicates an issue that can not or will not be resolved. label Mar 8, 2019
@athenabot

@shyamjvs
If this issue has been triaged, please comment /remove-triage unresolved.

If you aren't able to handle this issue, consider unassigning yourself and/or adding the help-wanted label.

🤖 I am a bot run by vllry. 👩‍🔬

@freehan
Contributor

freehan commented May 16, 2019

reopen if needed

@freehan freehan closed this as completed May 16, 2019
@wojtek-t
Member

Hmm - it still seems to be an issue - it was actually the main reason why we had to revert: #77541

@wojtek-t wojtek-t reopened this May 17, 2019
@athenabot

@shyamjvs
If this issue has been triaged, please comment /remove-triage unresolved.

If you aren't able to handle this issue, consider unassigning yourself and/or adding the help-wanted label.

🤖 I am a bot run by vllry. 👩‍🔬

1 similar comment

@shyamjvs shyamjvs removed their assignment Jul 11, 2019
@dcbw
Member

dcbw commented Jul 11, 2019

@squeed will add a metric for failed iptables calls

@wojtek-t what kernel versions do you have?

@dcbw
Member

dcbw commented Jul 11, 2019

/remove-triage unresolved

@k8s-ci-robot k8s-ci-robot removed the triage/unresolved Indicates an issue that can not or will not be resolved. label Jul 11, 2019
@squeed
Contributor

squeed commented Jul 11, 2019

Filed #80061 for metrics enhancements.
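For anyone following along, the kind of counter being discussed would look roughly like this with client_golang. This is a sketch only; the metric name, help text, and helper function are made up here, not what eventually landed via #80061:

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// iptablesRestoreFailuresTotal counts iptables-restore calls that returned an
// error (including lock-acquisition timeouts). Name and help text are
// illustrative only.
var iptablesRestoreFailuresTotal = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "kubeproxy_iptables_restore_failures_total",
	Help: "Cumulative number of iptables-restore calls that failed, e.g. because the iptables lock could not be acquired in time.",
})

func init() {
	prometheus.MustRegister(iptablesRestoreFailuresTotal)
}

// RecordIptablesRestoreFailure would be called from the proxier wherever
// iptables-restore returns an error, so lock contention shows up on
// dashboards instead of only in the logs.
func RecordIptablesRestoreFailure() {
	iptablesRestoreFailuresTotal.Inc()
}
```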

@wojtek-t
Member

@wojtek-t what kernel versions do you have?

4.14 (I can't remember exactly).
Do you have any specific feature in mind that helps here and was added in a particular kernel version?

@vanyans

vanyans commented Jan 21, 2020

Hello,

I am seeing this issue in my proxy pods:
E0117 08:12:44.604344 1 proxier.go:1402] Failed to execute iptables-restore: failed to acquire new iptables lock: timed out waiting for the condition
kube-proxy v1.15.4

Any solutions on this? Let me know if you need more info
Thank you!

kube-proxy-fail-log.txt

@aojea
Member

aojea commented Jan 22, 2020

are you using the CNI portmap plugin?

@vanyans

vanyans commented Jan 23, 2020

Hi. We are using NSX-T CNI.

@aojea
Member

aojea commented Jan 23, 2020

I think there can be multiple reasons; I can share what I did to find a problem with the portmap plugin holding the lock.

I just patched the kubelet (22665fc) to log the PID of the process, then I monitored kube-proxy to trigger a script when it finds the error message

Failed to execute iptables-restore: failed to acquire new iptables lock: timed out waiting for the condition

The script dumps all the processes so you can check "who" was holding the lock.

Hope this can be useful.
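A related, patch-free way to narrow this down when the lock is the flock-based /run/xtables.lock: processes holding (or waiting on) the flock keep the lock file open, so scanning /proc for open fds pointing at it right when kube-proxy logs the timeout lists the candidates. A rough Go sketch; the path and helper names are illustrative, and this won't catch users of the older abstract @xtables socket:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// findLockFileUsers lists processes that currently have lockPath open.
// The flock holder keeps the file descriptor open for as long as it holds
// the lock (and so does anyone blocked waiting on it), so this narrows the
// search down to a handful of candidates.
func findLockFileUsers(lockPath string) ([]string, error) {
	fds, err := filepath.Glob("/proc/[0-9]*/fd/*")
	if err != nil {
		return nil, err
	}
	var users []string
	for _, fd := range fds {
		target, err := os.Readlink(fd)
		if err != nil {
			continue // process exited or permission denied; skip
		}
		if target != lockPath {
			continue
		}
		pid := filepath.Base(filepath.Dir(filepath.Dir(fd))) // /proc/<pid>/fd/<n>
		comm, _ := os.ReadFile(filepath.Join("/proc", pid, "comm"))
		users = append(users, fmt.Sprintf("pid %s (%s)", pid, strings.TrimSpace(string(comm))))
	}
	return users, nil
}

func main() {
	// Run as root on the node right when kube-proxy logs the timeout.
	users, err := findLockFileUsers("/run/xtables.lock")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, u := range users {
		fmt.Println(u)
	}
}
```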

@vanyans

vanyans commented Jan 23, 2020

Thank you! I will try this.
When you say portmap plugin, are you referring to this one: https://github.com/containernetworking/plugins/tree/master/plugins/meta/portmap

@aojea
Member

aojea commented Jan 23, 2020

Yeah, that one.
This is the issue explaining the problem:
containernetworking/plugins#418

@dcbw
Member

dcbw commented Feb 2, 2023

I think we can close this; multiple performance improvements in the kernel iptables/nftables code, improvements in iptables-nft, and the move to default to the IPVS proxy make iptables lock contention much less of an issue. If we do still see this, we can re-open.

/close

@k8s-ci-robot
Contributor

@dcbw: Closing this issue.

In response to this:

I think we can close this; multiple performance improvements in the kernel iptables/nftables code, improvements in iptables-nft, and the move to default to the IPVS proxy make iptables lock contention much less of an issue. If we do still see this, we can re-open.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
