Ingress Controller randomly referencing deleted services causing 503's #2302

Closed

jeffutter opened this issue Apr 6, 2018 · 4 comments

@jeffutter

NGINX Ingress controller version:

Kubernetes version:

Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.0", GitCommit:"fc32d2f3698e36b93322a3465f63a14e9f0eaead", GitTreeState:"clean", BuildDate:"2018-03-27T00:13:02Z", GoVersion:"go1.9.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.6", GitCommit:"6260bb08c46c31eea6cb538b34a9ceb3e406689c", GitTreeState:"clean", BuildDate:"2017-12-21T06:23:29Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: AWS
  • Install tools: Kops 1.8

I think our setup is important for understanding the issue. We are using nginx-ingress to do blue/green deployments; a simplified sketch of our master-ingress follows the lists below. Here are the components involved:

Deployments:
• web-a-blue
• web-b-blue
• web-a-green
• web-b-green

Services:
• web-a-blue-svc
• web-b-blue-svc
• web-a-green-svc
• web-b-green-svc

Ingress:
• master-ingress (serves (a|b).ourdomain.com)
• web-a-blue-ingress (serves a.blue.ourdomain.com)
• web-b-blue-ingress (serves b.blue.ourdomain.com)
• web-a-green-ingress (serves a.green.ourdomain.com)
• web-b-green-ingress (serves b.green.ourdomain.com)
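
For context, here is a simplified sketch of what master-ingress looks like while blue is live. The paths, servicePort, and annotation are illustrative, not our exact manifest:

```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: master-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
  - host: a.ourdomain.com
    http:
      paths:
      - path: /
        backend:
          serviceName: web-a-blue-svc
          servicePort: 80          # illustrative port
  - host: b.ourdomain.com
    http:
      paths:
      - path: /
        backend:
          serviceName: web-b-blue-svc
          servicePort: 80
```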

We do blue/green as follows:

• Deploy web-*-blue deployments and web-*-blue-svc
• Update master-ingress to point to web-*-blue-svc

When cutting over:
• Deploy web-*-green deployments and web-*-green-svc
• Update master-ingress to point to web-*-green-svc (a fragment of this edit is sketched after this list)
• Wait a bit
• Delete web-*-blue deployments and web-*-blue-svc
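
The cut-over itself is just an edit to the backends in master-ingress, roughly like this (fragment, illustrative port):

```yaml
# rule for a.ourdomain.com in master-ingress, before the cut-over:
      - path: /
        backend:
          serviceName: web-a-blue-svc
          servicePort: 80
# ... and after the cut-over:
      - path: /
        backend:
          serviceName: web-a-green-svc
          servicePort: 80
```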

What happened:

I am occasionally seeing requests returned as 503 errors. In the logs from the ingress controller, I see some strangeness around the 503s.

Here are the logs from the ingress controller, collected over 24 hours after the last switch from blue -> green:

ingress-controller.log 2018-04-05T17:56:06.000Z I0405 17:56:06.964185       6 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"master-ingress", UID:"5738a999-23e8-11e8-ad22-1232ed09fa60", APIVersion:"extensions", ResourceVersion:"5535845", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress default/master-ingress
ingress-controller.log 2018-04-05T17:56:06.000Z I0405 17:56:06.964275       6 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"master-ingress", UID:"5738a999-23e8-11e8-ad22-1232ed09fa60", APIVersion:"extensions", ResourceVersion:"4217503", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress default/master-ingress
ingress-controller.log 2018-04-05T17:56:06.000Z I0405 17:56:06.964435       6 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"web-b-green-ingress", UID:"3d652a60-3846-11e8-981d-12f0cfed4292", APIVersion:"extensions", ResourceVersion:"6000972", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress default/web-b-green-ingress
ingress-controller.log 2018-04-05T17:56:06.000Z W0405 17:56:06.964503       6 controller.go:668] error obtaining service endpoints: error getting service default/web-b-blue-svc from the cache: service default/web-b-blue-svc was not found
ingress-controller.log 2018-04-05T17:56:06.000Z W0405 17:56:06.964526       6 controller.go:668] error obtaining service endpoints: error getting service default/web-a-blue-svc from the cache: service default/web-a-blue-svc was not found
ingress-controller.log 2018-04-05T17:56:06.000Z I0405 17:56:06.965081       6 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"web-a-green-ingress", UID:"3d6d2fe9-3846-11e8-981d-12f0cfed4292", APIVersion:"extensions", ResourceVersion:"6000980", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress default/web-a-green-ingress
ingress-controller.log 2018-04-05T17:56:06.000Z I0405 17:56:06.965376       6 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"web-a-green-ingress", UID:"3d6d2fe9-3846-11e8-981d-12f0cfed4292", APIVersion:"extensions", ResourceVersion:"6001068", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress default/web-a-green-ingress
ingress-controller.log 2018-04-05T17:56:06.000Z I0405 17:56:06.965422       6 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"web-a-green-ingress", UID:"3d6d2fe9-3846-11e8-981d-12f0cfed4292", APIVersion:"extensions", ResourceVersion:"6001057", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress default/web-a-green-ingress
ingress-controller.log 2018-04-05T17:56:06.000Z I0405 17:56:06.965685       6 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"web-a-green-ingress", UID:"3d6d2fe9-3846-11e8-981d-12f0cfed4292", APIVersion:"extensions", ResourceVersion:"6001177", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress default/web-a-green-ingress
ingress-controller.log 2018-04-05T17:56:06.000Z I0405 17:56:06.965757       6 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"web-a-green-ingress", UID:"3d6d2fe9-3846-11e8-981d-12f0cfed4292", APIVersion:"extensions", ResourceVersion:"6001076", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress default/web-a-green-ingress
ingress-controller.log 2018-04-05T17:56:06.000Z I0405 17:56:06.965926       6 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"web-a-green-ingress", UID:"3d6d2fe9-3846-11e8-981d-12f0cfed4292", APIVersion:"extensions", ResourceVersion:"6001184", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress default/web-a-green-ingress
ingress-controller.log 2018-04-05T17:56:06.000Z I0405 17:56:06.966058       6 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"web-a-green-ingress", UID:"3d6d2fe9-3846-11e8-981d-12f0cfed4292", APIVersion:"extensions", ResourceVersion:"6001191", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress default/web-a-green-ingress
ingress-controller.log 2018-04-05T17:56:06.000Z I0405 17:56:06.966177       6 controller.go:171] backend reload required
ingress-controller.log 2018-04-05T17:56:06.000Z I0405 17:56:06.966523       6 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"master-ingress", UID:"5738a999-23e8-11e8-ad22-1232ed09fa60", APIVersion:"extensions", ResourceVersion:"6006538", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress default/master-ingress
ingress-controller.log 2018-04-05T17:56:06.000Z I0405 17:56:06.966548       6 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"web-b-green-ingress", UID:"3d652a60-3846-11e8-981d-12f0cfed4292", APIVersion:"extensions", ResourceVersion:"6001194", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress default/web-b-green-ingress
ingress-controller.log 2018-04-05T17:56:06.000Z I0405 17:56:06.966323       6 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"web-a-green-ingress", UID:"3d6d2fe9-3846-11e8-981d-12f0cfed4292", APIVersion:"extensions", ResourceVersion:"6001192", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress default/web-a-green-ingress

It seems like the nginx-ingress is retaining some reference to web-*-blue-svc even though that resource has been destroyed. Dumping the nginx.conf doesn't show any reference to blue at all, although I may not be dumping it at the correct time to catch it.

I'm also not sure whether the 503s happen because nginx is actually trying to send requests to the backends for the web-*-blue-svcs, or because nginx is restarting and hasn't fully come back up yet.

What you expected to happen: After cutting over which services master-ingress points to, I wouldn't expect nginx-ingress to reference the web-*-blue services at all, and I would not expect it to return 503s.

How to reproduce it (as minimally and precisely as possible): We can reproduce it on our infrastructure by deploying the deployments and services, switching over in the ingress, and deleting the old deployments and services.

Anything else we need to know: Please let me know if there is any more information I can gather to help debug.

@aledbf
Member

aledbf commented Apr 6, 2018

@jeffutter the behavior you see is expected.

We do blue/green as follows:

• Deploy web-*-blue deployments and web-*-blue-svc
• Update master-ingress to point to web-*-blue-svc

When cutting over:
• Deploy web-*-green deployments and web-*-green-svc
• Update master-ingress to point to web-*-green-svc
• Wait a bit
• Delete web-*-blue deployments and web-*-blue-svc

This is what happens behind the scenes:

  • new deployment and service web-*-blue-svc

    • ingress controller detects the changes:
      • creates a new nginx configuration:
        • adds an upstream for the service web-*-blue-svc
        • changes the upstream for the master-ingress Ingress
      • reloads nginx
  • new deployment and service web-*-green-svc

    • ingress controller detects the changes:
      • creates a new nginx configuration:
        • adds an upstream for the service web-*-green-svc
        • removes the upstream for the service web-*-blue-svc
        • changes the upstream for the master-ingress Ingress
      • reloads nginx

The issue here is the change in the upstreams: you are using a different one each time you switch from blue to green. What you should do instead is keep the same service and change its selector. That way only the upstream servers change in nginx, not the set of upstreams.
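
A rough sketch of that approach, with made-up label keys and ports (adapt to your manifests):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-a-svc          # one long-lived service that master-ingress always points to
spec:
  selector:
    app: web-a             # hypothetical label keys
    track: blue            # flip this value to "green" at cut-over
  ports:
  - port: 80
    targetPort: 8080       # assumed container port
```

With this, the upstream name in nginx (derived from the service name) never changes between deploys; only the endpoints behind it do.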

@jeffutter
Author

@aledbf Thanks for the quick response.

I would maybe expect a little blip right at the time of the cut-over from nginx restarting. However, I'm seeing these errors more than 24 hours after cutting over. I don't think that should be expected?

Also, ingress-nginx doesn't call services directly; it gets the addresses of the pods from the service, right? (Maybe I misread that somewhere.) If that is the case, though, the behavior would still be similar if I kept the service, no? It wouldn't have to add/remove the upstreams, but it would still reconfigure them, requiring a restart?

@aledbf
Member

aledbf commented Apr 6, 2018

However, I'm seeing these errors more than 24 hours after cutting over. I don't think that should be expected?

No

Also, ingress-nginx doesn't call services directly, it gets the addresses of the pods from the service, right?

Yes, but the name of the upstream (the nginx section) is derived from the k8s service name.

If that is the case though, the behavior would still be similar if I kept the service, no?

No, not if you don't use websockets or keepalive connections. If that's the case, this feature (dynamic backend reconfiguration) avoids nginx reloads.
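
For reference, dynamic backend reconfiguration is opt-in in the 0.14.x line; a sketch of the relevant part of the controller Deployment, assuming the flag name and image tag from that release (verify against the docs for your version):

```yaml
# fragment of the nginx-ingress-controller Deployment spec
      containers:
      - name: nginx-ingress-controller
        image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.14.0
        args:
        - /nginx-ingress-controller
        - --configmap=$(POD_NAMESPACE)/nginx-configuration
        - --enable-dynamic-configuration   # opt-in in this release line; assumed flag, check your version
```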

@aledbf
Member

aledbf commented May 3, 2018

Closing. Please update to 0.14.0 and reopen if the issue persists.

@aledbf aledbf closed this as completed May 3, 2018