
Inconsistent ingress secret synchronization across nodes #2068

Closed
frnckdlprt opened this issue Feb 12, 2018 · 2 comments · Fixed by #2069

Comments

@frnckdlprt

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/.): potential bug

What keywords did you search in NGINX Ingress controller issues before filing this one? (If you have found any duplicates, you should instead reply there.): default backend 404, sync, secrets, ingress, tls


Is this a BUG REPORT or FEATURE REQUEST? (choose one):

NGINX Ingress controller version:
v0.10.2

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.4", GitCommit:"9befc2b8928a9426501d3bf62f72849d5cbcd5a3", GitTreeState:"clean", BuildDate:"2017-11-20T05:28:34Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.4", GitCommit:"9befc2b8928a9426501d3bf62f72849d5cbcd5a3", GitTreeState:"clean", BuildDate:"2017-11-20T05:17:43Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: 9 worker node cluster
  • OS (e.g. from /etc/os-release): Ubuntu 16.04.3
  • Kernel (e.g. uname -a):
  • Install tools: Ansible, Helm
  • Others:

What happened:

We intermittently get "default backend 404" responses, depending on which of the 9 nodes handles the request and when. These errors become much rarer if we give a deployment time to "settle in". On the nodes that respond with 404, the ingress controller log typically shows the "adding secret ... to the local store" message much later than on the other nodes for the same deployment.

While troubleshooting this, we came across this line, which looks suspicious:
https://github.com/kubernetes/ingress-nginx/blob/nginx-0.10.2/internal/ingress/controller/store/backend_ssl.go#L199
It used to be a "continue" rather than a "return", and it seems odd that a single failure there should make the loop give up on all remaining ingresses.
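To illustrate why the "return" vs "continue" distinction matters, here is a hypothetical, heavily simplified sync loop in Go. The ingress type, fetchSecret, and the two sync functions are invented for illustration and are not the code in backend_ssl.go; the sketch assumes only that the real loop iterates over ingresses and fetches each one's TLS secret.

```go
package main

import (
	"errors"
	"fmt"
)

// ingress is a hypothetical, simplified stand-in for an Ingress that
// references a TLS secret; it is NOT the controller's real type.
type ingress struct {
	name   string
	secret string
}

// errSecretNotFound simulates a secret that is not (yet) available.
var errSecretNotFound = errors.New("secret not found")

// fetchSecret is a hypothetical lookup that fails for secrets listed in missing.
func fetchSecret(name string, missing map[string]bool) error {
	if missing[name] {
		return errSecretNotFound
	}
	return nil
}

// syncWithReturn aborts on the first failure: every ingress after the
// failing one is skipped until the next full sync pass.
func syncWithReturn(ings []ingress, missing map[string]bool) []string {
	var synced []string
	for _, ing := range ings {
		if err := fetchSecret(ing.secret, missing); err != nil {
			return synced
		}
		synced = append(synced, ing.secret)
	}
	return synced
}

// syncWithContinue skips only the failing ingress and keeps processing the rest.
func syncWithContinue(ings []ingress, missing map[string]bool) []string {
	var synced []string
	for _, ing := range ings {
		if err := fetchSecret(ing.secret, missing); err != nil {
			continue
		}
		synced = append(synced, ing.secret)
	}
	return synced
}

func main() {
	ings := []ingress{
		{"a", "ns/secret-a"},
		{"b", "ns/broken"},
		{"c", "ns/mytlssecret"},
	}
	missing := map[string]bool{"ns/broken": true}
	fmt.Println(syncWithReturn(ings, missing))   // [ns/secret-a]
	fmt.Println(syncWithContinue(ings, missing)) // [ns/secret-a ns/mytlssecret]
}
```

With "return", one missing secret (ns/broken here) leaves every later ingress, including the one that needs mytlssecret, unsynced until the next pass; with "continue", only the failing ingress is skipped.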

What you expected to happen:

No 404s on any worker node once the application starts responding through at least one node (or at most a few seconds later).

How to reproduce it (as minimally and precisely as possible):

Deploy ingresses with TLS secrets and note when each ingress controller pod logs that the secret was added to its local store.
Below is one example of the timings, where one node lags behind the others by about 40 min and another by about 90 min:

ingress-nginx-ingress-controller-4rrlj.log
219612:I0209 03:14:53.185798       7 backend_ssl.go:68] adding secret mynamespace/mytlssecret to the local store

ingress-nginx-ingress-controller-7pl55.log
218480:I0209 03:10:54.994316       7 backend_ssl.go:68] adding secret mynamespace/mytlssecret to the local store

ingress-nginx-ingress-controller-6j5pp.log
227555:I0209 03:51:59.353796       7 backend_ssl.go:68] adding secret mynamespace/mytlssecret to the local store

ingress-nginx-ingress-controller-bbgr2.log
234364:I0209 04:39:41.330655       7 backend_ssl.go:68] adding secret mynamespace/mytlssecret to the local store

ingress-nginx-ingress-controller-q27kj.log
219909:I0209 03:08:15.045453       7 backend_ssl.go:68] adding secret mynamespace/mytlssecret to the local store

ingress-nginx-ingress-controller-clt27.log
231931:I0209 03:05:43.346984       7 backend_ssl.go:68] adding secret mynamespace/mytlssecret to the local store

ingress-nginx-ingress-controller-pcwf9.log
262087:I0209 03:06:12.650154       7 backend_ssl.go:68] adding secret mynamespace/mytlssecret to the local store

ingress-nginx-ingress-controller-vlvkd.log
204711:I0209 03:13:46.820651       7 backend_ssl.go:68] adding secret mynamespace/mytlssecret to the local store

ingress-nginx-ingress-controller-vzg8r.log
217407:I0209 03:01:32.340037       7 backend_ssl.go:68] adding secret mynamespace/mytlssecret to the local store

Anything else we need to know:

@aledbf
Member

aledbf commented Feb 12, 2018

@frnckdlprt thank you for the report. Please try quay.io/aledbf/nginx-ingress-controller:0.324, which contains #2069.

@frnckdlprt
Author

@aledbf thank you very much for the quick response. Two test runs completed without any 404s, and every node logged "adding secret ... to the local store" within 5 s of the others. More testing to come, but this looks good. Thanks again.
