-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix MaxWorkerOpenFiles calculation on high cores nodes #7107
Conversation
Hi @nanorobocop. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/ok-to-test |
/retest |
error log:
|
I guess that's some flaky test. It failed for me couple of times out of few dozens. |
/retest |
Hm, seems other PRs have same problem https://prow.k8s.io/?repo=kubernetes%2Fingress-nginx&job=pull-ingress-nginx-test |
Tests passed after I disabled parallel test execution 26e267a. |
EDIT man i should've looked at it before posting 🙃 |
Hi @cmluciano, @rikatz, could you please help to review this PR? |
/assign @rikatz |
26e267a
to
7a7d1ee
Compare
7a7d1ee
to
df46d83
Compare
/lgtm If you want this one also to be available in v0.X releases (the stable ones), please cherry pick this PR to branch release-v1beta1 Thanks!! |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: justinmchase, nanorobocop, rikatz The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
* Fix MaxWorkerOpenFiles calculation on high cores nodes * Add e2e test for rlimit_nofile * Fix doc for max-worker-open-files
* Drop v1beta1 from ingress nginx (#7156) * Drop v1beta1 from ingress nginx Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Fix intorstr logic in controller Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * fixing admission Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * more intorstr fixing * correct template rendering Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Fix e2e tests for v1 api Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Fix gofmt errors * This is finally working...almost there... Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Re-add removed validation of AdmissionReview * Prepare for v1.0.0-alpha.1 release Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Update changelog and matrix table for v1.0.0-alpha.1 (#7274) Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * add docs for syslog feature (#7219) * Fix link to e2e-tests.md in developer-guide (#7201) * Use ENV expansion for namespace in args (#7146) Update the DaemonSet namespace references to use the `POD_NAMESPACE` environment variable in the same way that the Deployment does. * chart: using Helm builtin capabilities check (#7190) Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com> * Update proper default value for HTTP2MaxConcurrentStreams in Docs (#6944) It should be 128 as documented in https://github.com/kubernetes/ingress-nginx/blob/master/internal/ingress/controller/config/config.go#L780 * Fix MaxWorkerOpenFiles calculation on high cores nodes (#7107) * Fix MaxWorkerOpenFiles calculation on high cores nodes * Add e2e test for rlimit_nofile * Fix doc for max-worker-open-files * ingress/tcp: add additional error logging on failed (#7208) * Add file containing stable release (#7313) * Handle named (non-numeric) ports correctly (#7311) Signed-off-by: Carlos Panato <ctadeu@gmail.com> * Updated v1beta1 to v1 as its deprecated (#7308) * remove mercurial from build (#7031) * Retry to download maxmind DB if it fails (#7242) * Retry to download maxmind DB if it fails. Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com> * Add retries count arg, move retry logic into DownloadGeoLite2DB function Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com> * Reorder parameters in DownloadGeoLite2DB Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com> * Remove hardcoded value Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com> * Release v1.0.0-alpha.1 * Add changelog for v1.0.0-alpha.2 * controller: ignore non-service backends (#7332) * controller: ignore non-service backends Signed-off-by: Carlos Panato <ctadeu@gmail.com> * update per feedback Signed-off-by: Carlos Panato <ctadeu@gmail.com> * fix: allow scope/tcp/udp configmap namespace to altered (#7161) * Lower webhook timeout for digital ocean (#7319) * Lower webhook timeout for digital ocean * Set Digital Ocean value controller.admissionWebhooks.timeoutSeconds to 29 * update OWNERS and aliases files (#7365) (#7366) Signed-off-by: Carlos Panato <ctadeu@gmail.com> * Downgrade Lua modules for s390x (#7355) Downgrade Lua modules to last known working version. * Fix IngressClass logic for newer releases (#7341) * Fix IngressClass logic for newer releases Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Change e2e tests for the new IngressClass presence * Fix chart and admission tests Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Fix helm chart test Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Fix reviews * Remove ingressclass code from admission * update tag to v1.0.0-beta.1 * update readme and changelog for v1.0.0-beta.1 * Release v1.0.0-beta.1 - helm and manifests (#7422) * Change the order of annotation just to trigger a new helm release (#7425) * [cherry-pick] Add dev-v1 branch into helm releaser (#7428) * Add dev-v1 branch into helm releaser (#7424) * chore: add link for artifacthub.io/prerelease annotations Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com> Co-authored-by: Ricardo Katz <rikatz@users.noreply.github.com> * k8s job ci pipeline for dev-v1 br v1.22.0 (#7453) * k8s job ci pipeline for dev-v1 br v1.22.0 Signed-off-by: Neha Lohia <nehapithadiya444@gmail.com> * k8s job ci pipeline for dev-v1 br v1.21.2 Signed-off-by: Neha Lohia <nehapithadiya444@gmail.com> * remove v1.21.1 version Signed-off-by: Neha Lohia <nehapithadiya444@gmail.com> * Add controller.watchIngressWithoutClass config option (#7459) Signed-off-by: Akshit Grover <akshit.grover2016@gmail.com> * Release new helm chart with certgen fixed (#7478) * Update go version, modules and remove ioutil * Release new helm chart with certgen fixed * changed appversion, chartversion, TAG, image (#7490) * Fix CI conflict * Fix CI conflict * Fix build.sh from rebase process * Fix controller_test post rebase Co-authored-by: Tianhao Guo <rggth09@gmail.com> Co-authored-by: Ray <61553+rctay@users.noreply.github.com> Co-authored-by: Bill Cassidy <cassid4@gmail.com> Co-authored-by: Jintao Zhang <tao12345666333@163.com> Co-authored-by: Sathish Ramani <rsathishx87@gmail.com> Co-authored-by: Mansur Marvanov <nanorobocop@gmail.com> Co-authored-by: Matt1360 <568198+Matt1360@users.noreply.github.com> Co-authored-by: Carlos Tadeu Panato Junior <ctadeu@gmail.com> Co-authored-by: Kundan Kumar <kundan.kumar@india.nec.com> Co-authored-by: Tom Hayward <thayward@infoblox.com> Co-authored-by: Sergey Shakuto <sshakuto@infoblox.com> Co-authored-by: Tore <tore.lonoy@gmail.com> Co-authored-by: Bouke Versteegh <info@boukeversteegh.nl> Co-authored-by: Shahid <shahid@us.ibm.com> Co-authored-by: James Strong <strong.james.e@gmail.com> Co-authored-by: Long Wu Yuan <longwuyuan@gmail.com> Co-authored-by: Jintao Zhang <zhangjintao9020@gmail.com> Co-authored-by: Neha Lohia <nehapithadiya444@gmail.com> Co-authored-by: Akshit Grover <akshit.grover2016@gmail.com>
* Drop v1beta1 from ingress nginx (kubernetes#7156) * Drop v1beta1 from ingress nginx Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Fix intorstr logic in controller Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * fixing admission Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * more intorstr fixing * correct template rendering Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Fix e2e tests for v1 api Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Fix gofmt errors * This is finally working...almost there... Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Re-add removed validation of AdmissionReview * Prepare for v1.0.0-alpha.1 release Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Update changelog and matrix table for v1.0.0-alpha.1 (kubernetes#7274) Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * add docs for syslog feature (kubernetes#7219) * Fix link to e2e-tests.md in developer-guide (kubernetes#7201) * Use ENV expansion for namespace in args (kubernetes#7146) Update the DaemonSet namespace references to use the `POD_NAMESPACE` environment variable in the same way that the Deployment does. * chart: using Helm builtin capabilities check (kubernetes#7190) Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com> * Update proper default value for HTTP2MaxConcurrentStreams in Docs (kubernetes#6944) It should be 128 as documented in https://github.com/kubernetes/ingress-nginx/blob/master/internal/ingress/controller/config/config.go#L780 * Fix MaxWorkerOpenFiles calculation on high cores nodes (kubernetes#7107) * Fix MaxWorkerOpenFiles calculation on high cores nodes * Add e2e test for rlimit_nofile * Fix doc for max-worker-open-files * ingress/tcp: add additional error logging on failed (kubernetes#7208) * Add file containing stable release (kubernetes#7313) * Handle named (non-numeric) ports correctly (kubernetes#7311) Signed-off-by: Carlos Panato <ctadeu@gmail.com> * Updated v1beta1 to v1 as its deprecated (kubernetes#7308) * remove mercurial from build (kubernetes#7031) * Retry to download maxmind DB if it fails (kubernetes#7242) * Retry to download maxmind DB if it fails. Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com> * Add retries count arg, move retry logic into DownloadGeoLite2DB function Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com> * Reorder parameters in DownloadGeoLite2DB Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com> * Remove hardcoded value Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com> * Release v1.0.0-alpha.1 * Add changelog for v1.0.0-alpha.2 * controller: ignore non-service backends (kubernetes#7332) * controller: ignore non-service backends Signed-off-by: Carlos Panato <ctadeu@gmail.com> * update per feedback Signed-off-by: Carlos Panato <ctadeu@gmail.com> * fix: allow scope/tcp/udp configmap namespace to altered (kubernetes#7161) * Lower webhook timeout for digital ocean (kubernetes#7319) * Lower webhook timeout for digital ocean * Set Digital Ocean value controller.admissionWebhooks.timeoutSeconds to 29 * update OWNERS and aliases files (kubernetes#7365) (kubernetes#7366) Signed-off-by: Carlos Panato <ctadeu@gmail.com> * Downgrade Lua modules for s390x (kubernetes#7355) Downgrade Lua modules to last known working version. * Fix IngressClass logic for newer releases (kubernetes#7341) * Fix IngressClass logic for newer releases Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Change e2e tests for the new IngressClass presence * Fix chart and admission tests Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Fix helm chart test Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Fix reviews * Remove ingressclass code from admission * update tag to v1.0.0-beta.1 * update readme and changelog for v1.0.0-beta.1 * Release v1.0.0-beta.1 - helm and manifests (kubernetes#7422) * Change the order of annotation just to trigger a new helm release (kubernetes#7425) * [cherry-pick] Add dev-v1 branch into helm releaser (kubernetes#7428) * Add dev-v1 branch into helm releaser (kubernetes#7424) * chore: add link for artifacthub.io/prerelease annotations Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com> Co-authored-by: Ricardo Katz <rikatz@users.noreply.github.com> * k8s job ci pipeline for dev-v1 br v1.22.0 (kubernetes#7453) * k8s job ci pipeline for dev-v1 br v1.22.0 Signed-off-by: Neha Lohia <nehapithadiya444@gmail.com> * k8s job ci pipeline for dev-v1 br v1.21.2 Signed-off-by: Neha Lohia <nehapithadiya444@gmail.com> * remove v1.21.1 version Signed-off-by: Neha Lohia <nehapithadiya444@gmail.com> * Add controller.watchIngressWithoutClass config option (kubernetes#7459) Signed-off-by: Akshit Grover <akshit.grover2016@gmail.com> * Release new helm chart with certgen fixed (kubernetes#7478) * Update go version, modules and remove ioutil * Release new helm chart with certgen fixed * changed appversion, chartversion, TAG, image (kubernetes#7490) * Fix CI conflict * Fix CI conflict * Fix build.sh from rebase process * Fix controller_test post rebase Co-authored-by: Tianhao Guo <rggth09@gmail.com> Co-authored-by: Ray <61553+rctay@users.noreply.github.com> Co-authored-by: Bill Cassidy <cassid4@gmail.com> Co-authored-by: Jintao Zhang <tao12345666333@163.com> Co-authored-by: Sathish Ramani <rsathishx87@gmail.com> Co-authored-by: Mansur Marvanov <nanorobocop@gmail.com> Co-authored-by: Matt1360 <568198+Matt1360@users.noreply.github.com> Co-authored-by: Carlos Tadeu Panato Junior <ctadeu@gmail.com> Co-authored-by: Kundan Kumar <kundan.kumar@india.nec.com> Co-authored-by: Tom Hayward <thayward@infoblox.com> Co-authored-by: Sergey Shakuto <sshakuto@infoblox.com> Co-authored-by: Tore <tore.lonoy@gmail.com> Co-authored-by: Bouke Versteegh <info@boukeversteegh.nl> Co-authored-by: Shahid <shahid@us.ibm.com> Co-authored-by: James Strong <strong.james.e@gmail.com> Co-authored-by: Long Wu Yuan <longwuyuan@gmail.com> Co-authored-by: Jintao Zhang <zhangjintao9020@gmail.com> Co-authored-by: Neha Lohia <nehapithadiya444@gmail.com> Co-authored-by: Akshit Grover <akshit.grover2016@gmail.com>
Re-open of my stale PR #5627.
Added tests and fixed doc.
What this PR does / why we need it:
Default MaxWorkerOpenFiles (
max-worker-open-files
) value becomes too low on high spec nodes with multiple of cores.As workaround it always possible to provide exact value through configMap: https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/#max-worker-open-files
Currently max opened files is calculated based on
RLIMIT_NOFILE
divided on amount of working processes.ingress-nginx/internal/ingress/controller/nginx.go
Lines 539 to 544 in 07b70f6
Working processes is calculated based on amount of cores.
ingress-nginx/internal/ingress/controller/config/config.go
Line 751 in bef2efc
Our example:
As a result, we have
65536 / 40 - 1024 = 614.4
. This is less than1024
, so1024
is actually used.This value is too small for single worker on heavy loaded server with thousands total connections.
Since RLIMIT_NOFILE is already a value per single process, there's no need divide by worker_processes.
Types of changes
Which issue/s this PR fixes
maxOpenFiles
as 64000 #2055 - similar problemHow Has This Been Tested?
E2e test
Checklist: