Generated /tmp/nginx-cfg File Invalid When Creating Many Ingress Objects Simultaneously #6245
Comments
This is somehow related to Helm. I had the same issue with Helm 3.1.* and had to go back to Helm 2; there should be no problem with the latest version of Helm 2. You could also try the latest version of Helm 3, maybe they fixed something.
Note that in my environment, where I normally see this problem, I do not use Helm. However, using the Helm method seemed to produce the same error much more readily. Perhaps Helm applies everything in parallel, so it's easier to reproduce than applying a file with kubectl? Not sure. I can try to replicate it with kubectl and grab one of the tmp files in that case as well.
I too seem to be having the same issue, and I am also using Helm with only one custom annotation. My helm install seems to have tried to create this Ingress three times in quick succession, each of them failing: client.go:108: [debug] creating 1 resource(s)
I'm also seeing this when deploying ~20 ingresses at once with Helm. After a few retries with exactly the same YAML output it eventually all goes through. It seems the nginx ingress controller does not handle adding lots of applications at once very well at all. We might have to disable the webhook until this issue is resolved. We had also already increased the webhook timeout in our configuration, as that was previously throwing errors in this scenario, but now we get this instead.
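For reference, the webhook timeout mentioned above can be raised by patching the ValidatingWebhookConfiguration (Kubernetes caps timeoutSeconds at 30). This is only a sketch: the resource name below is an assumption and may differ per installation.

# Hypothetical example: the name "ingress-nginx-admission" is an assumption;
# list the real one with: kubectl get validatingwebhookconfigurations
kubectl patch validatingwebhookconfiguration ingress-nginx-admission \
  --type=json \
  -p='[{"op": "replace", "path": "/webhooks/0/timeoutSeconds", "value": 30}]'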
We are also affected by this issue; we see a lot of messages like the following:
and we also see a lot of terminated ingress connections. After removing the webhook, everything is back to normal.
We are running into similar issues here. The webhook fails when dealing with many simultaneous ingresses (150+).
The error happens at random times and appears to "end" the file in a different location every time. What is consistent is that, with multiple server definition blocks, it finishes all of the server blocks but fails to fully render the server block containing the ingresses. That block will generally have a few valid routes with their generated internal auth proxies; both seem to lack any data and simply reproduce the template used to generate all location/auth blocks, i.e.
I'll make sure to dump the output next time it occurs (it's not trivial to reinstate the webhook, hence the pseudocode).
Hi @AWSmith0216 @bjethwan @GerryWilko @MattJeanes @elmariofredo @benny-bp, can you confirm whether this issue still exists when using newer versions of ingress-nginx?
@iamNoah1 we have just disabled the webhook entirely. I would be happy to turn it back on if a fix has been made for this issue, but judging by the fact that it's still open with no contributors working on it, I would say it is likely still an issue.
@MattJeanes can you confirm that the issue still exists with a newer version of ingress-nginx?
Seeing this error on:
One of the controller pods stops reloading the config; log:
I copied nginx.conf from that pod to inspect it manually; it ends with:
If we manually kill the pod, a new one starts up just fine.
Yes, it's a race condition on ngx_merge contexts. It still happens on 0.48.1, mostly with a large nginx.conf. I produced this one to isolate testing to single scenarios. You should be able to reproduce this easily by pushing, let's say, ~1000 changes to an ingress in a row.
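As an illustration only (not the reporter's script), that kind of churn can be approximated by patching an existing Ingress in a tight loop, so the controller has to re-render and validate nginx.conf on every change; the ingress name here is a placeholder.

# Sketch under assumptions: "my-app" is a hypothetical Ingress in the current
# namespace; changing proxy-body-size alters the generated config each time.
for i in $(seq 1 1000); do
  kubectl annotate ingress my-app \
    nginx.ingress.kubernetes.io/proxy-body-size="${i}m" --overwrite
done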
/triage accepted
Even with a much smaller number of ingresses the issue still hits occasionally. We have clusters with under 20 ingresses most of the time, but we still occasionally hit the error below. @strongjz this really needs to be looked at and fixed ASAP.
I just ran into the same issue. Sometimes this works, but in many cases it fails:
/priority important-soon
WORKAROUND: remove the validating webhook
This was shared with me by a good friend, and now I'm able to deploy. I hope it works for you too.
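For anyone applying this workaround, here is a hedged sketch of two common ways to do it; the webhook resource name and the Helm chart value are assumptions and may differ per installation.

# Delete the validating webhook configuration directly
# (check the actual name with: kubectl get validatingwebhookconfigurations).
kubectl delete validatingwebhookconfiguration ingress-nginx-admission

# Or, if the controller is installed via the official Helm chart, keep the
# admission webhook disabled on install/upgrade (assumed chart value name):
helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
  --set controller.admissionWebhooks.enabled=false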
* fix: fix thread synchronization issue #6245 (#7800)
* Add option to sanitize annotation inputs (#7874)
* Add option to sanitize annotation inputs
* Fix e2e tests after string sanitization
* Add proxy_pass and serviceaccount as denied values
* Trim spaces from badword items (#7921)
* Fix tests from cherrypick

Co-authored-by: Jens Reimann <ctron@dentrassi.de>
I faced a similar issue to the one covered in this topic. During my research I noticed that the /tmp directory was full of *cfg files on the affected NGINX instances. After checking load metrics, I figured out that the k8s nginx_controller pods were running out of memory (not evicted or crashed, but partially throwing 5xx errors). Increasing RAM solved the original issue. P.S. The validation webhook is not used in my setup.
NGINX Ingress controller version: 0.35.0
Kubernetes version (kubectl version): v1.18.4
Environment: Bare Metal Kubernetes on CentOS 7.6
Kernel (uname -a): 3.10.0-957.1.3.el7.x86_64
What happened:
We have a K8S manifest that includes 100+ Ingress objects. On rare occasions the application of this manifest will fail with an error such as:
2020/08/07 05:41:40 [emerg] 2702#2702: "client_max_body_size" directive is not allowed here in /tmp/nginx-cfg320626064:4
nginx: [emerg] "client_max_body_size" directive is not allowed here in /tmp/nginx-cfg320626064:4
nginx: configuration file /tmp/nginx-cfg320626064 test failed
What you expected to happen:
The manifest application to succeed, as it does the vast majority of the time.
How to reproduce it:
I found issue 5096, which includes good reproduction steps. Basically create a helm template directory with two files. The structure and contents should look like:
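The two files themselves are not reproduced here; purely as a hypothetical sketch (chart name, ingress count, hosts, annotation, and service are all assumptions, not the files from issue 5096), a chart that stamps out many near-identical Ingress objects in one release could be created like this:

mkdir -p helm/templates

cat > helm/Chart.yaml <<'EOF'
apiVersion: v2
name: testapp
version: 0.1.0
EOF

# Renders 50 Ingress objects so that many are created at once. This uses the
# networking.k8s.io/v1 Ingress API; the 1.18 cluster in this report would
# need networking.k8s.io/v1beta1 instead.
cat > helm/templates/ingresses.yaml <<'EOF'
{{- range $i := until 50 }}
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: testapp-{{ $i }}
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "8m"
spec:
  ingressClassName: nginx
  rules:
    - host: testapp-{{ $i }}.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: testapp
                port:
                  number: 80
{{- end }}
EOF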
You can then run:
while :; do helm upgrade --install testapp helm/; sleep 1; helm delete testapp || true; done
It should fail pretty frequently; in my case about 25% of the time.
For one of my failures when it was complaining about directive 'xy_set_header', I grabbed the bad tmp config file from the controller. In it I found:
Obviously the xy_set_header is not correct, but this does not appear to be under user control. It seems that ingress-nginx is generating the config file incorrectly when a lot of ingress objects are being created at once.
/kind bug