Ingress Controller randomly referencing deleted services causing 503's #2302

Closed

jeffutter opened this issue Apr 6, 2018 · 4 comments

@jeffutter

NGINX Ingress controller version:

Kubernetes version:

Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.0", GitCommit:"fc32d2f3698e36b93322a3465f63a14e9f0eaead", GitTreeState:"clean", BuildDate:"2018-03-27T00:13:02Z", GoVersion:"go1.9.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.6", GitCommit:"6260bb08c46c31eea6cb538b34a9ceb3e406689c", GitTreeState:"clean", BuildDate:"2017-12-21T06:23:29Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: AWS
  • Install tools: Kops 1.8

I think our setup is important for understanding the issue. We are using nginx-ingress to do blue/green deployments; a simplified sketch of our master-ingress follows the lists below. Here are the components involved:

Deployments:
• web-a-blue
• web-b-blue
• web-a-green
• web-b-green

Services:
• web-a-blue-svc
• web-b-blue-svc
• web-a-green-svc
• web-b-green-svc

Ingress:
• master-ingress (serves (a|b).ourdomain.com)
• web-a-blue-ingress (serves a.blue.ourdomain.com)
• web-b-blue-ingress (serves b.blue.ourdomain.com)
• web-a-green-ingress (serves a.green.ourdomain.com)
• web-b-green-ingress (serves b.green.ourdomain.com)
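
For context, here is a simplified sketch of what master-ingress looks like while blue is live. The paths, servicePort, and annotation are illustrative, not our exact manifest:

```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: master-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
  - host: a.ourdomain.com
    http:
      paths:
      - path: /
        backend:
          serviceName: web-a-blue-svc
          servicePort: 80          # illustrative port
  - host: b.ourdomain.com
    http:
      paths:
      - path: /
        backend:
          serviceName: web-b-blue-svc
          servicePort: 80
```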

We do blue/green as follows:

• Deploy web-*-blue deployments and web-*-blue-svc
• Update master-ingress to point to web-*-blue-svc

When cutting over:
• Deploy web-*-green deployments and web-*-green-svc
• Update master-ingress to point to web-*-green-svc (a fragment of this edit is sketched after this list)
• Wait a bit
• Delete web-*-blue deployments and web-*-blue-svc
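
The cut-over itself is just an edit to the backends in master-ingress, roughly like this (fragment, illustrative port):

```yaml
# rule for a.ourdomain.com in master-ingress, before the cut-over:
      - path: /
        backend:
          serviceName: web-a-blue-svc
          servicePort: 80
# ... and after the cut-over:
      - path: /
        backend:
          serviceName: web-a-green-svc
          servicePort: 80
```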

What happened:

I am occasionally seeing requests returned as 503 errors. In the logs from the ingress controller, I see some strangeness around the 503s.

Here are the logs from the ingress controller, collected over 24 hours after the last switch from blue -> green:

ingress-controller.log 2018-04-05T17:56:06.000Z I0405 17:56:06.964185       6 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"master-ingress", UID:"5738a999-23e8-11e8-ad22-1232ed09fa60", APIVersion:"extensions", ResourceVersion:"5535845", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress default/master-ingress
ingress-controller.log 2018-04-05T17:56:06.000Z I0405 17:56:06.964275       6 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"master-ingress", UID:"5738a999-23e8-11e8-ad22-1232ed09fa60", APIVersion:"extensions", ResourceVersion:"4217503", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress default/master-ingress
ingress-controller.log 2018-04-05T17:56:06.000Z I0405 17:56:06.964435       6 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"web-b-green-ingress", UID:"3d652a60-3846-11e8-981d-12f0cfed4292", APIVersion:"extensions", ResourceVersion:"6000972", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress default/web-b-green-ingress
ingress-controller.log 2018-04-05T17:56:06.000Z W0405 17:56:06.964503       6 controller.go:668] error obtaining service endpoints: error getting service default/web-b-blue-svc from the cache: service default/web-b-blue-svc was not found
ingress-controller.log 2018-04-05T17:56:06.000Z W0405 17:56:06.964526       6 controller.go:668] error obtaining service endpoints: error getting service default/web-a-blue-svc from the cache: service default/web-a-blue-svc was not found
ingress-controller.log 2018-04-05T17:56:06.000Z I0405 17:56:06.965081       6 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"web-a-green-ingress", UID:"3d6d2fe9-3846-11e8-981d-12f0cfed4292", APIVersion:"extensions", ResourceVersion:"6000980", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress default/web-a-green-ingress
ingress-controller.log 2018-04-05T17:56:06.000Z I0405 17:56:06.965376       6 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"web-a-green-ingress", UID:"3d6d2fe9-3846-11e8-981d-12f0cfed4292", APIVersion:"extensions", ResourceVersion:"6001068", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress default/web-a-green-ingress
ingress-controller.log 2018-04-05T17:56:06.000Z I0405 17:56:06.965422       6 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"web-a-green-ingress", UID:"3d6d2fe9-3846-11e8-981d-12f0cfed4292", APIVersion:"extensions", ResourceVersion:"6001057", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress default/web-a-green-ingress
ingress-controller.log 2018-04-05T17:56:06.000Z I0405 17:56:06.965685       6 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"web-a-green-ingress", UID:"3d6d2fe9-3846-11e8-981d-12f0cfed4292", APIVersion:"extensions", ResourceVersion:"6001177", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress default/web-a-green-ingress
ingress-controller.log 2018-04-05T17:56:06.000Z I0405 17:56:06.965757       6 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"web-a-green-ingress", UID:"3d6d2fe9-3846-11e8-981d-12f0cfed4292", APIVersion:"extensions", ResourceVersion:"6001076", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress default/web-a-green-ingress
ingress-controller.log 2018-04-05T17:56:06.000Z I0405 17:56:06.965926       6 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"web-a-green-ingress", UID:"3d6d2fe9-3846-11e8-981d-12f0cfed4292", APIVersion:"extensions", ResourceVersion:"6001184", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress default/web-a-green-ingress
ingress-controller.log 2018-04-05T17:56:06.000Z I0405 17:56:06.966058       6 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"web-a-green-ingress", UID:"3d6d2fe9-3846-11e8-981d-12f0cfed4292", APIVersion:"extensions", ResourceVersion:"6001191", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress default/web-a-green-ingress
ingress-controller.log 2018-04-05T17:56:06.000Z I0405 17:56:06.966177       6 controller.go:171] backend reload required
ingress-controller.log 2018-04-05T17:56:06.000Z I0405 17:56:06.966523       6 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"master-ingress", UID:"5738a999-23e8-11e8-ad22-1232ed09fa60", APIVersion:"extensions", ResourceVersion:"6006538", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress default/master-ingress
ingress-controller.log 2018-04-05T17:56:06.000Z I0405 17:56:06.966548       6 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"web-b-green-ingress", UID:"3d652a60-3846-11e8-981d-12f0cfed4292", APIVersion:"extensions", ResourceVersion:"6001194", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress default/web-b-green-ingress
ingress-controller.log 2018-04-05T17:56:06.000Z I0405 17:56:06.966323       6 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"web-a-green-ingress", UID:"3d6d2fe9-3846-11e8-981d-12f0cfed4292", APIVersion:"extensions", ResourceVersion:"6001192", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress default/web-a-green-ingress

It seems like the nginx-ingress is retaining some reference to web-*-blue-svc even though that resource has been destroyed. Dumping the nginx.conf doesn't show any reference to blue at all, although I may not be dumping it at the correct time to catch it.

I'm also not sure whether the 503s happen because nginx is actually trying to send requests to the backends for the web-*-blue-svcs, or because nginx is restarting and hasn't fully come back up yet.

What you expected to happen: After cutting over which services master-ingress points to, I wouldn't expect nginx-ingress to reference the web-*-blue services at all, and I would not expect it to return 503s.

How to reproduce it (as minimally and precisely as possible): We can reproduce it on our infrastructure by deploying the deployments and services, switching over in the ingress, and deleting the old deployments and services.

Anything else we need to know: Please let me know if there is any more information I can gather to help debug.

@aledbf
Member

aledbf commented Apr 6, 2018

@jeffutter the behavior you see is expected.

We do blue/green as follows:

• Deploy web-*-blue deployments and web-*-blue-svc
• Update master-ingress to point to web-*-blue-svc

When cutting over:
• Deploy web-*-green deployments and web-*-green-svc
• Update master-ingress to point to web-*-green-svc
• Wait a bit
• Delete web-*-blue deployments and web-*-blue-svc

This is what happens behind the scenes:

  • new deployment and service web-*-blue-svc

    • ingress controller detects the changes:
      • creates a new nginx configuration:
        • adds an upstream for the service web-*-blue-svc
        • changes the upstream for the master-ingress Ingress
      • reloads nginx
  • new deployment and service web-*-green-svc

    • ingress controller detects the changes:
      • creates a new nginx configuration:
        • adds an upstream for the service web-*-green-svc
        • removes the upstream for the service web-*-blue-svc
        • changes the upstream for the master-ingress Ingress
      • reloads nginx

The issue here is the change in the upstreams: you are using a different one each time you switch from blue to green. What you should do instead is keep the same service and change its selector. That way only the upstream servers change in nginx, not the set of upstreams.
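
A rough sketch of that approach, with made-up label keys and ports (adapt to your manifests):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-a-svc          # one long-lived service that master-ingress always points to
spec:
  selector:
    app: web-a             # hypothetical label keys
    track: blue            # flip this value to "green" at cut-over
  ports:
  - port: 80
    targetPort: 8080       # assumed container port
```

With this, the upstream name in nginx (derived from the service name) never changes between deploys; only the endpoints behind it do.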

@jeffutter
Author

@aledbf Thanks for the quick response.

I would maybe expect a little blip right at the time of the cut-over from nginx restarting. However, I'm seeing these errors more than 24 hours after cutting over. I don't think that should be expected?

Also, ingress-nginx doesn't call services directly; it gets the addresses of the pods from the service, right? (Maybe I misread that somewhere.) If that is the case, though, the behavior would still be similar if I kept the service, no? It wouldn't have to add/remove the upstreams, but it would still reconfigure them, requiring a restart?

@aledbf
Member

aledbf commented Apr 6, 2018

However, I'm seeing these errors more than 24 hours after cutting over. I don't think that should be expected?

No

Also, ingress-nginx doesn't call services directly, it gets the addresses of the pods from the service, right?

Yes, but the name of the upstream (the nginx section) is derived from the k8s service name.

If that is the case though, the behavior would still be similar if I kept the service, no?

No, not if you don't use websockets or keepalive connections. If that's the case, this feature (dynamic backend reconfiguration) avoids nginx reloads.
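
For reference, dynamic backend reconfiguration is opt-in in the 0.14.x line; a sketch of the relevant part of the controller Deployment, assuming the flag name and image tag from that release (verify against the docs for your version):

```yaml
# fragment of the nginx-ingress-controller Deployment spec
      containers:
      - name: nginx-ingress-controller
        image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.14.0
        args:
        - /nginx-ingress-controller
        - --configmap=$(POD_NAMESPACE)/nginx-configuration
        - --enable-dynamic-configuration   # opt-in in this release line; assumed flag, check your version
```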

@aledbf
Member

aledbf commented May 3, 2018

Closing. Please update to 0.14.0 and reopen if the issue persists.

@aledbf aledbf closed this as completed May 3, 2018