Generated /tmp/nginx-cfg File Invalid When Creating Many Ingress Objects Simultaneously #6245

Closed
AWSmith0216 opened this issue Sep 29, 2020 · 18 comments · Fixed by #7800
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/backlog Higher priority than priority/awaiting-more-evidence. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Comments

@AWSmith0216

NGINX Ingress controller version: 0.35.0

Kubernetes version (use kubectl version): v1.18.4

Environment: Bare Metal Kubernetes on CentOS 7.6

  • Cloud provider or hardware configuration: Bare metal Kubernetes
  • OS (e.g. from /etc/os-release): CentOS 7.6
  • Kernel (e.g. uname -a): 3.10.0-957.1.3.el7.x86_64
  • Install tools:
  • Others: Helm v3.1.2

What happened:

We have a K8S manifest that includes 100+ Ingress objects. On rare occasions, applying this manifest fails with an error such as:

2020/08/07 05:41:40 [emerg] 2702#2702: "client_max_body_size" directive is not allowed here in /tmp/nginx-cfg320626064:4
nginx: [emerg] "client_max_body_size" directive is not allowed here in /tmp/nginx-cfg320626064:4
nginx: configuration file /tmp/nginx-cfg320626064 test failed

What you expected to happen:

The manifest application to succeed, as it does the vast majority of the time.

How to reproduce it:
I found issue #5096, which includes good reproduction steps. Basically, create a Helm chart directory with two files. The structure and contents should look like:

# tree helm
helm
|-- Chart.yaml
`-- templates
    `-- ingress.yaml

1 directory, 2 files

# cat helm/Chart.yaml
name: testapp1
version: 0.1.0

# cat helm/templates/ingress.yaml
{{- range $i, $e := until 50 }}
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: test-ingress-{{$i}}
  labels:
    app: test
    release: test
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
    nginx.ingress.kubernetes.io/rewrite-target: /$1
    nginx.ingress.kubernetes.io/proxy-body-size: "150m"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "7200"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "7200"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "7200"
spec:
  tls:
    - hosts:
      - test.cluster.domain.com
      secretName: ssl
  rules:
  - host: test.cluster.domain.com
    http:
      paths:
      - path: /bp-test/api-{{$i}}
        backend:
          serviceName: test-svc-{{$i}}
          servicePort: 443
---
{{end}}

You can then run:

while :; do helm upgrade --install testapp helm/; sleep 1; helm delete testapp || true; done

It should fail pretty frequently; in my case about 25% of the time.

For one of my failures when it was complaining about directive 'xy_set_header', I grabbed the bad tmp config file from the controller. In it I found:

client_max_body_size                    2048m;
xy_set_header X-Forwarded-For           $remote_addr;
proxy_set_header X-Forwarded-Proto      $pass_access_scheme;

Obviously xy_set_header is not a valid directive, but this does not appear to be under user control. It seems that ingress-nginx generates the config file incorrectly when many Ingress objects are being created at once.
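To illustrate the class of bug that can mangle a directive name like this, here is a minimal, hypothetical Go sketch (not the controller's actual code): two goroutines render directives into one shared, unsynchronized buffer, and their writes interleave. Running it with go run -race flags the data race.

package main

import (
	"bytes"
	"fmt"
	"strings"
	"sync"
)

func main() {
	// Shared, unsynchronized buffer; concurrent writers race on it.
	var shared bytes.Buffer
	var wg sync.WaitGroup

	render := func(directive string) {
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			// Each line is meant to be written atomically, but the two
			// writes below can interleave with the other goroutine's.
			shared.WriteString(directive)
			shared.WriteString(";\n")
		}
	}

	wg.Add(2)
	go render("proxy_set_header X-Forwarded-For $remote_addr")
	go render("client_max_body_size 2048m")
	wg.Wait()

	// Rough check: flag any line that does not start with one of the
	// directives we actually wrote.
	for _, line := range strings.Split(shared.String(), "\n") {
		if line != "" && !strings.HasPrefix(line, "proxy_") && !strings.HasPrefix(line, "client_") {
			fmt.Println("corrupted line:", line)
		}
	}
}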

/kind bug

@AWSmith0216 AWSmith0216 added the kind/bug Categorizes issue or PR as related to a bug. label Sep 29, 2020
@egsysoev

This is somehow related to Helm. I had the same issue with Helm 3.1.* and had to go back to Helm 2; there should be no problem with the latest version of Helm 2. You could also try the latest version of Helm 3; maybe they fixed something.

@AWSmith0216
Author

Note that in the environment where I normally see this problem I do not use Helm. However, the Helm method reproduced the same error much more easily. Perhaps Helm applies all the objects in parallel, so it's easier to reproduce than applying a file with kubectl? Not sure. I can try to replicate with kubectl (see the sketch below) and grab one of the tmp files in that case as well.
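A hedged sketch of a kubectl-only repro, splitting the rendered manifest into one file per Ingress and applying them all in parallel (file names and paths are illustrative):

# render the same chart, then split on document separators
helm template testapp helm/ > /tmp/ingresses.yaml
csplit --quiet --prefix=/tmp/ing- /tmp/ingresses.yaml '/^---$/' '{*}'
# apply every Ingress concurrently to mimic Helm's parallelism
for f in /tmp/ing-*; do
  kubectl apply -f "$f" &
done
wait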

@bjethwan

nginx ingress controller : 0.30.0

@GerryWilko

GerryWilko commented Jan 11, 2021

I too seem to be having the same issue. I am also using Helm. My Helm install seems to have tried to create this Ingress 3 times in quick succession.

The only custom annotation I am using is nginx.ingress.kubernetes.io/proxy-body-size: 20m. If I create the resources manually then subsequent Helm upgrades seem to work fine. I am using v0.35.0.

Each of them failing:

client.go:108: [debug] creating 1 resource(s)
client.go:108: [debug] creating 7 resource(s)
Release "wintrix-develop" does not exist. Installing it now.
Error: admission webhook "validate.nginx.ingress.kubernetes.io" denied the request:

Error: exit status 1
2021/01/11 11:56:19 [emerg] 359#359: "proxy_set_header" directive is not allowed here in /tmp/nginx-cfg330571149:2795
nginx: [emerg] "proxy_set_header" directive is not allowed here in /tmp/nginx-cfg330571149:2795
nginx: configuration file /tmp/nginx-cfg330571149 test failed


helm.go:84: [debug] admission webhook "validate.nginx.ingress.kubernetes.io" denied the request:

Error: exit status 1
2021/01/11 11:56:19 [emerg] 359#359: "proxy_set_header" directive is not allowed here in /tmp/nginx-cfg330571149:2795
nginx: [emerg] "proxy_set_header" directive is not allowed here in /tmp/nginx-cfg330571149:2795
nginx: configuration file /tmp/nginx-cfg330571149 test failed


##[error]history.go:52: [debug] getting history for release wintrix-develop
install.go:159: [debug] Original chart version: "10603"
install.go:176: [debug] CHART PATH: /home/buildadmin/.cache/helm/repository/wintrix-10603.tgz

client.go:108: [debug] creating 1 resource(s)
client.go:108: [debug] creating 7 resource(s)
Error: admission webhook "validate.nginx.ingress.kubernetes.io" denied the request:

Error: exit status 1
2021/01/11 11:56:19 [emerg] 359#359: "proxy_set_header" directive is not allowed here in /tmp/nginx-cfg330571149:2795
nginx: [emerg] "proxy_set_header" directive is not allowed here in /tmp/nginx-cfg330571149:2795
nginx: configuration file /tmp/nginx-cfg330571149 test failed


helm.go:84: [debug] admission webhook "validate.nginx.ingress.kubernetes.io" denied the request:

Error: exit status 1
2021/01/11 11:56:19 [emerg] 359#359: "proxy_set_header" directive is not allowed here in /tmp/nginx-cfg330571149:2795
nginx: [emerg] "proxy_set_header" directive is not allowed here in /tmp/nginx-cfg330571149:2795
nginx: configuration file /tmp/nginx-cfg330571149 test failed


@MattJeanes

I'm also seeing this when deploying ~20 ingresses at once with Helm; after a few retries with exactly the same YAML output it eventually all goes through. It seems the NGINX ingress controller does not handle adding lots of ingresses at once very well at all. We might have to disable the webhook until this issue is resolved. We had already increased the webhook timeout in our configuration (see the patch sketch below), as that was previously throwing errors in this scenario, but now we get this instead.
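For reference, bumping the webhook timeout can be done by patching the ValidatingWebhookConfiguration directly; a sketch, assuming the default name ingress-nginx-admission created by the Helm chart (30s is the API maximum):

kubectl patch validatingwebhookconfiguration ingress-nginx-admission \
  --type='json' \
  -p='[{"op": "replace", "path": "/webhooks/0/timeoutSeconds", "value": 30}]'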

@elmariofredo
Contributor

We are also affected by this issue. We see a lot of messages like

nginx: [emerg] unexpected end of file, expecting "}" in /etc/nginx/nginx.conf:166057

and also a lot of terminated ingress connections; after removing the webhook, everything is back to normal.

@benny-bp

benny-bp commented May 10, 2021

We are running into similar issues here. The webhook fails when dealing with many simultaneous ingresses (150+):

nginx: [emerg] unexpected end of file, expecting "}" in /etc/nginx/nginx.conf:8860

nginx: [emerg] unexpected end of file, expecting "}" in /etc/nginx/nginx.conf:1022

The error happens at random times and appears to "end" the file in a different location every time.

What is consistent is that, when there are multiple server definition blocks, it finishes all of the other server blocks but fails to fully render the server block containing these ingresses.

That block will generally have a few valid routes with their generated internal auth proxies, but at some point it will generate an abnormal auth proxy location and an abnormal location. Both seem to lack any data and simply reflect the template used to generate all location/auth blocks.

For example:

location ~ /_somename-auth-randomhash12345abcde1234abcd123-Prefix {
  internal;
  proxy_pass http://upstream;
}

location ~ /_somename-auth-hash-Prefix {
  auth_request /somename;
  proxy_pass http://upstream;
}

## breaks here ##

location ~ /_-auth-234abcd123-Prefix {
  internal;
  proxy_pass http://upstream;
}

location ~ / {
  proxy_pass http://upstream;
}

I'll make sure to dump the output next time it occurs (it's not trivial to reinstate the webhook, hence the pseudocode)

@iamNoah1
Contributor

Hi @AWSmith0216 @bjethwan @GerryWilko @MattJeanes @elmariofredo @benny-bp can you confirm that this issue still exists when using newer versions of ingress-nginx?

@MattJeanes

@iamNoah1 we have just disabled the webhook entirely. I would be happy to turn it back on if a fix has been made for this issue, but judging by the fact that it's still open with no contributor activity, I would say it is likely still an issue.

@iamNoah1
Contributor

@MattJeanes can you confirm that the issue still exists with a newer version of ingress-nginx?

@semenovroman

Seeing this error on

app.kubernetes.io/version: 0.44.0
helm.sh/chart: ingress-nginx-3.24.0
image: k8s.gcr.io/ingress-nginx/controller:v0.44.0@sha256:c84fb8549d7c24c62e36531acf1fd9d8d7d494cd8508429d1159b7becd4b1889
bash-5.1$ nginx -v
nginx version: nginx/1.19.6

One of the controller pods stops reloading the config. Log:

[emerg] 24354#24354: unexpected end of file, expecting ";" or "}" in /etc/nginx/nginx.conf:141702
nginx: [emerg] unexpected end of file, expecting ";" or "}" in /etc/nginx/nginx.conf:141702
E0624 20:20:02.050612       6 queue.go:130] "requeuing" err="exit status 1
[emerg] 24354#24354: unexpected end of file, expecting \";\" or \"}\" in /etc/nginx/nginx.conf:141702
nginx: [emerg] unexpected end of file, expecting \";\" or \"}\" in /etc/nginx/nginx.conf:141702" key="$REDACTED"
[emerg] 24354#24354: unexpected end of file, expecting ";" or "}" in /etc/nginx/nginx.conf:141702
nginx: [emerg] unexpected end of file, expecting ";" or "}" in /etc/nginx/nginx.conf:141702

I copied nginx.conf from that pod to inspect it manually; it ends with:

            proxy_set_header Proxy                  "";

            # Custom headers to proxied server


            proxy_connect_timeout                   5s;
            proxy_send_timeout                      3600s;
            proxy_read_timeout                      3600s;

            proxy_buffering                         off;
            proxy_buffer_size                       4k;
            proxy_buffers                           4 4k;

            proxy_max_temp_

If we manually kill the pod, a new one starts up just fine.
We started seeing this behavior on different clusters after enabling the validating admission webhook (the one that comes with the Helm chart). We changed the pod labels and left it running in case we can provide any additional information; a debugging sketch follows below.
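A hedged sketch for checking a stuck pod and pulling the broken config out for inspection (assuming the default namespace and Deployment names from the Helm chart):

# re-run the config test inside the controller container
kubectl exec -n ingress-nginx deploy/ingress-nginx-controller -- nginx -t
# copy the rendered config out for offline inspection
kubectl cp ingress-nginx/<pod-name>:/etc/nginx/nginx.conf ./nginx.conf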

@fblgit
Contributor

fblgit commented Aug 21, 2021

Yes, a race condition on ngx_merge contexts. It still happens on 0.48.1, mostly with a large nginx.conf.
@iamNoah1 I can confirm this still exists, but it only happens with a large nginx.conf. The thing is that people who have that many ingresses unfortunately always have to disable the admission controller because it is too slow: nginx -t on a ~165MB nginx.conf with 2000 ingresses takes ~20s between test/merge/read/render, etc.

I produced this one to isolate testing to single scenarios:
#7514

You should be able to reproduce this easily by pushing, say, ~1000 changes to an Ingress in a row; a loop like the sketch below works.
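A hypothetical repro loop, repeatedly patching an annotation on an existing Ingress to force config regenerations (the Ingress name is illustrative):

for i in $(seq 1 1000); do
  kubectl annotate ingress example-ingress --overwrite \
    nginx.ingress.kubernetes.io/proxy-body-size="${i}m"
done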

nginx: [emerg] unexpected "e" in /etc/nginx/nginx.conf:945
E0820 19:51:13.691486       7 queue.go:130] "requeuing" err="exit status 1\n2021/08/20 19:51:13 [warn] 23792#23792: the \"http2_max_field_size\" directive is obsolete, use the \"large_client_header_buffers\" directive instead in /etc/nginx/nginx.conf:143\nnginx: [warn] the \"http2_max_field_size\" directive is obsolete, use the \"large_client_header_buffers\" directive instead in /etc/nginx/nginx.conf:143\n2021/08/20 19:51:13 [warn] 23792#23792: the \"http2_max_header_size\" directive is obsolete, use the \"large_client_header_buffers\" directive instead in /etc/nginx/nginx.conf:144\nnginx: [warn] the \"http2_max_header_size\" directive is obsolete, use the \"large_client_header_buffers\" directive instead in /etc/nginx/nginx.conf:144\n2021/08/20 19:51:13 [warn] 23792#23792: the \"http2_max_requests\" directive is obsolete, use the \"keepalive_requests\" directive instead in /etc/nginx/nginx.conf:145\nnginx: [warn] the \"http2_max_requests\" directive is obsolete, use the \"keepalive_requests\" directive instead in /etc/nginx/nginx.conf:145\n2021/08/20 19:51:13 [emerg] 23792#23792: unexpected \"e\" in /etc/nginx/nginx.conf:945\nnginx: [emerg] unexpected \"e\" in /etc/nginx/nginx.conf:945\n" key="test/example-ingress-531"
I0820 19:51:13.691547       7 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress", Name:"ingress-ingress-nginx-controller-664c88c54f-2b9cn", UID:"fe3d7c4d-40c3-48d9-9341-e3ac38b1c7b7", APIVersion:"v1", ResourceVersion:"11555", FieldPath:""}): type: 'Warning' reason: 'RELOAD' Error reloading NGINX: exit status 1
2021/08/20 19:51:13 [warn] 23792#23792: the "http2_max_field_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /etc/nginx/nginx.conf:143
nginx: [warn] the "http2_max_field_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /etc/nginx/nginx.conf:143
2021/08/20 19:51:13 [warn] 23792#23792: the "http2_max_header_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /etc/nginx/nginx.conf:144
nginx: [warn] the "http2_max_header_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /etc/nginx/nginx.conf:144
2021/08/20 19:51:13 [warn] 23792#23792: the "http2_max_requests" directive is obsolete, use the "keepalive_requests" directive instead in /etc/nginx/nginx.conf:145
nginx: [warn] the "http2_max_requests" directive is obsolete, use the "keepalive_requests" directive instead in /etc/nginx/nginx.conf:145
2021/08/20 19:51:13 [emerg] 23792#23792: unexpected "e" in /etc/nginx/nginx.conf:945
nginx: [emerg] unexpected "e" in /etc/nginx/nginx.conf:945
E0820 19:15:36.444483       7 controller.go:160] Unexpected failure reloading the backend:
exit status 1
2021/08/20 19:15:36 [warn] 28671#28671: the "http2_max_field_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /etc/nginx/nginx.conf:185
nginx: [warn] the "http2_max_field_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /etc/nginx/nginx.conf:185
2021/08/20 19:15:36 [warn] 28671#28671: the "http2_max_header_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /etc/nginx/nginx.conf:186
nginx: [warn] the "http2_max_header_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /etc/nginx/nginx.conf:186
2021/08/20 19:15:36 [warn] 28671#28671: the "http2_max_requests" directive is obsolete, use the "keepalive_requests" directive instead in /etc/nginx/nginx.conf:187
nginx: [warn] the "http2_max_requests" directive is obsolete, use the "keepalive_requests" directive instead in /etc/nginx/nginx.conf:187
2021/08/20 19:15:36 [emerg] 28671#28671: unknown directive "test"" in /etc/nginx/nginx.conf:23795
nginx: [emerg] unknown directive "test"" in /etc/nginx/nginx.conf:23795

@strongjz
Member

/triage accepted
/priority backlog

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/backlog Higher priority than priority/awaiting-more-evidence. labels Sep 14, 2021
@alex-vmw

Even with a much smaller number of ingresses the issue still hits occasionally. We have clusters with under 20 ingresses most of the time, but we still occasionally hit the error below. @strongjz this really needs to be looked at and fixed ASAP.

nginx: [emerg] unexpected end of file, expecting ";" or "}" in /etc/nginx/nginx.conf:21609

@ctron
Contributor

ctron commented Oct 8, 2021

I just ran into the same issue. Sometimes this works, but in many cases it fails:

 Error: admission webhook "validate.nginx.ingress.kubernetes.io" denied the request: 
-------------------------------------------------------------------------------
Error: exit status 1
2021/10/08 12:54:54 [warn] 107#107: the "http2_max_field_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /tmp/nginx-cfg2025906398:185
nginx: [warn] the "http2_max_field_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /tmp/nginx-cfg2025906398:185
2021/10/08 12:54:54 [warn] 107#107: the "http2_max_header_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /tmp/nginx-cfg2025906398:186
nginx: [warn] the "http2_max_header_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /tmp/nginx-cfg2025906398:186
2021/10/08 12:54:54 [warn] 107#107: the "http2_max_requests" directive is obsolete, use the "keepalive_requests" directive instead in /tmp/nginx-cfg2025906398:187
nginx: [warn] the "http2_max_requests" directive is obsolete, use the "keepalive_requests" directive instead in /tmp/nginx-cfg2025906398:187
2021/10/08 12:54:54 [emerg] 107#107: unexpected end of file, expecting ";" or "}" in /tmp/nginx-cfg2025906398:858
nginx: [emerg] unexpected end of file, expecting ";" or "}" in /tmp/nginx-cfg2025906398:858
nginx: configuration file /tmp/nginx-cfg2025906398 test failed

@strongjz
Member

/priority important-soon

@k8s-ci-robot k8s-ci-robot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Oct 12, 2021
@rikatz rikatz added this to the v1.1.0 milestone Oct 12, 2021
ctron added a commit to ctron/ingress-nginx that referenced this issue Oct 13, 2021
@FelixDiazScope

WORKAROUND: remove the Validating Web hook

  1. list the webhooks
    kubectl get ValidatingWebhookConfiguration

  2. review the properties
    kubectl describe ValidatingWebhookConfiguration <>

  3. delete the webhook
    kubectl delete -A ValidatingWebhookConfiguration <>

This was shared with me by a good friend, and now I'm able to deploy. I hope it works for you too; a concrete example follows below.
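With the default Helm chart install, the concrete commands would typically look like this (the webhook configuration name ingress-nginx-admission is an assumption; check the output of the first command in your cluster):

kubectl get validatingwebhookconfiguration
kubectl describe validatingwebhookconfiguration ingress-nginx-admission
kubectl delete validatingwebhookconfiguration ingress-nginx-admission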

rikatz pushed a commit to rikatz/ingress-nginx that referenced this issue Nov 16, 2021
k8s-ci-robot pushed a commit that referenced this issue Nov 16, 2021
* fix: fix thread synchronization issue #6245 (#7800)

* Add option to sanitize annotation inputs (#7874)

* Add option to sanitize annotation inputs

* Fix e2e tests after string sanitization

* Add proxy_pass and serviceaccount as denied values

* Trim spaces from badword items (#7921)

* Fix tests from cherrypick

Co-authored-by: Jens Reimann <ctron@dentrassi.de>
@berezinsn

WORKAROUND: remove the Validating Web hook

  1. list the webhooks
    kubectl get ValidatingWebhookConfiguration
  2. review the properties
    kubectl describe ValidatingWebhookConfiguration <>
  3. delete the webhook
    kubectl delete -A ValidatingWebhookConfiguration <>

This was shared with me by a good friend, and now I'm able to deploy. I hope it works for you too.

I faced a similar issue to the one covered in this topic, with similar logs regarding the obsolete directives.

During the investigation I noticed that the /tmp directory was full of *cfg files on the affected NGINX instances. I suppose these accumulate when an NGINX config RELOAD fails.

After checking load metrics, I figured out that the k8s nginx_controller pods were running out of memory (not evicted or crashed, but partially throwing 5xx errors).

Increasing RAM helped to solve the original issue; a sketch of the resource bump follows below.

P.S. The validation webhook is not used in my setup.
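For anyone wanting to try the same mitigation, a hedged sketch of raising the controller's memory via the chart's values file (the numbers are illustrative; size them from your own metrics):

controller:
  resources:
    requests:
      memory: 512Mi
    limits:
      memory: 1Gi

Then apply with something like: helm upgrade ingress-nginx ingress-nginx/ingress-nginx -f values.yaml (release and repo names are assumptions).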

rchshld pushed a commit to joomcode/ingress-nginx that referenced this issue May 19, 2023