This repository has been archived by the owner on Apr 17, 2019. It is now read-only.

[nginx ingress controller] Requests lost while re-deploying #1086

Closed

micheleorsi opened this issue May 27, 2016 · 3 comments

Comments

micheleorsi commented May 27, 2016

We are trying to define a configuration where no requests are lost while re-deploying the nginx-ingress-controller.

So we have this configuration:

  • F5 in front of our bare-metal nodes
  • a Deployment resource that we apply with this configuration:
  replicas: 1
  minReadySeconds: 15
  strategy:
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 100
...

      containers:
      - image: <our-image-forked-from-the-gcr.io/google_containers/nginx-ingress-controller:0.62-with-some-logs>
        name: nginx-ingress-ctrl
        imagePullPolicy: Always
        livenessProbe:
          httpGet:
            path: /healthz
            port: 10249
            scheme: HTTP
          initialDelaySeconds: 20
          timeoutSeconds: 5
        readinessProbe:
          exec:
            command:
              - cat
              - /tmp/ready
          initialDelaySeconds: 15
          timeoutSeconds: 5
          periodSeconds: 5
          successThreshold: 2
          failureThreshold: 1
        lifecycle:
          preStop:
            exec:
              # SIGTERM triggers a quick exit; gracefully terminate instead:
              # drop the readiness marker, then wait so traffic can drain
              # before the pod is stopped ("nginx -s quit" was also tried
              # here, see below)
              command: ["/bin/sh", "-c", "rm /tmp/ready && sleep 10"]
          postStart:
            exec:
              command:
                - "touch"
                - "/tmp/ready"

So the idea is to be really strict on the readiness probe, so that as soon as preStop is called, the pod is taken out of load balancing.

The readinessProbe check works: once the pod receives the preStop hook, I can see that it goes to "Ready: false".
But the problem is that the nginx ingress controller pod keeps handling requests, so when Kubernetes shuts it down, a number of requests fail (to be precise, the error is: Request 'standard' failed: java.io.IOException: Remotely closed).
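
As a sanity check (the pod and Service names below are placeholders, and this assumes the controller pods sit behind a Service), the readiness transition and the endpoint removal can be verified with:

  # The Ready condition should flip to "False" once /tmp/ready is removed
  kubectl get pod <nginx-ingress-pod> \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'

  # ...and the pod IP should then disappear from the Service endpoints
  kubectl get endpoints <nginx-ingress-service> -o wide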

This is some logging:

I0527 11:39:38.648945       1 main.go:186] Received SIGTERM, shutting down
I0527 11:39:38.648990       1 controller.go:905] updating 2 Ingress rule/s
I0527 11:39:38.651683       1 controller.go:918] Updating loadbalancer service2-production/service2-ingress. Removing IP 10.100.0.104
I0527 11:39:38.716403       1 event.go:216] Event(api.ObjectReference{Kind:"Ingress", Namespace:"service2-production", Name:"service2-ingress", UID:"c93eeab0-21cf-11e6-b056-8a5128ea7197", APIVersion:"extensions", ResourceVersion:"9347650", FieldPath:""}): type: 'Normal' reason: 'UPDATE' service2-production/service2-ingress
I0527 11:39:38.716485       1 event.go:216] Event(api.ObjectReference{Kind:"Ingress", Namespace:"service2-production", Name:"service2-ingress", UID:"c93eeab0-21cf-11e6-b056-8a5128ea7197", APIVersion:"extensions", ResourceVersion:"9347419", FieldPath:""}): type: 'Normal' reason: 'DELETE' ip: 10.100.0.104
I0527 11:39:38.718317       1 controller.go:918] Updating loadbalancer service1-production/service1-ingress. Removing IP 10.100.0.104
I0527 11:39:38.719995       1 controller.go:379] not contained: [{10.100.0.105 } {10.100.0.103 } {10.100.0.102 } {10.100.0.1 } {10.100.0.107 }]
I0527 11:39:38.720038       1 controller.go:380] Updating loadbalancer service2-production/service2-ingress with IP 10.100.0.104
I0527 11:39:38.804214       1 event.go:216] Event(api.ObjectReference{Kind:"Ingress", Namespace:"service1-production", Name:"service1-ingress", UID:"d09f3143-20d4-11e6-b056-8a5128ea7197", APIVersion:"extensions", ResourceVersion:"9347653", FieldPath:""}): type: 'Normal' reason: 'UPDATE' service1-production/service1-ingress
I0527 11:39:38.804687       1 controller.go:890] shutting down controller queues
I0527 11:39:38.804722       1 controller.go:951] shutting down NGINX loadbalancer controller
I0527 11:39:38.804738       1 main.go:144] Handled quit, awaiting pod deletion
I0527 11:39:38.804756       1 event.go:216] Event(api.ObjectReference{Kind:"Ingress", Namespace:"service1-production", Name:"service1-ingress", UID:"d09f3143-20d4-11e6-b056-8a5128ea7197", APIVersion:"extensions", ResourceVersion:"9347422", FieldPath:""}): type: 'Normal' reason: 'DELETE' ip: 10.100.0.104
I0527 11:39:38.820874       1 event.go:216] Event(api.ObjectReference{Kind:"Ingress", Namespace:"service2-production", Name:"service2-ingress", UID:"c93eeab0-21cf-11e6-b056-8a5128ea7197", APIVersion:"extensions", ResourceVersion:"9347650", FieldPath:""}): type: 'Normal' reason: 'CREATE' ip: 10.100.0.104
I0527 11:39:38.822353       1 controller.go:379] not contained: [{10.100.0.105 } {10.100.0.103 } {10.100.0.102 } {10.100.0.1 } {10.100.0.107 }]
I0527 11:39:38.822374       1 controller.go:380] Updating loadbalancer service1-production/service1-ingress with IP 10.100.0.104
I0527 11:39:38.916628       1 event.go:216] Event(api.ObjectReference{Kind:"Ingress", Namespace:"service1-production", Name:"service1-ingress", UID:"d09f3143-20d4-11e6-b056-8a5128ea7197", APIVersion:"extensions", ResourceVersion:"9347653", FieldPath:""}): type: 'Normal' reason: 'CREATE' ip: 10.100.0.104

The strange thing (in my opinion) is that it re-creates the IP right after deleting it.

The Ingress doesn't reflect the expected state: it still has a lot of IPs assigned to that specific rule.

I also tried putting nginx -s quit in the preStop, but that is even worse: the pod continues to handle requests and then nginx disappears.
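
For reference, a sketch of that variant with the quit deferred until after the readiness marker is removed and the load balancer has had time to react (the sleep length is only a guess, and whether this avoids the dropped connections is untested):

        lifecycle:
          preStop:
            exec:
              # remove the readiness marker, wait for traffic to drain, then
              # ask nginx to finish in-flight requests and exit gracefully
              command: ["/bin/sh", "-c", "rm /tmp/ready && sleep 30 && nginx -s quit"]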

aledbf (Contributor) commented May 27, 2016

> it re-creates the IP right after deleting it

@micheleorsi this bug was fixed in #1057 (no new image published yet)

> failed: java.io.IOException: Remotely closed

This might not be an error. The NGINX configuration enables keep-alive connections, so that could be the cause; you can disable keep-alive using a custom configuration.
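
If it helps, a minimal sketch of such a custom configuration, assuming the controller reads a keep-alive key from the ConfigMap referenced by its --nginx-configmap flag (the name and namespace below are only examples):

apiVersion: v1
kind: ConfigMap
metadata:
  # must match the ConfigMap passed to the controller via --nginx-configmap
  name: nginx-load-balancer-conf
  namespace: kube-system
data:
  # a keepalive_timeout of 0 makes NGINX close client connections after each
  # request instead of keeping them open
  keep-alive: "0"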

micheleorsi (Author) commented

Ah ok, I will try again to rebuild the image on our registry!

Thanks so much @aledbf!

micheleorsi (Author) commented

I checked, and with #1057 the IPs are indeed removed from the Ingress.
Now the problem seems to be on the F5 side, which continues to see the port as open and keeps redirecting traffic.

I will try working on the keep-alive settings.
