This repository has been archived by the owner on Apr 17, 2019. It is now read-only.

[nginx ingress controller] Requests lost while re-deploying #1086

Closed

micheleorsi opened this issue May 27, 2016 · 3 comments

Comments

micheleorsi commented May 27, 2016

We are trying to define a configuration where no requests are lost while re-deploying the nginx-ingress-controller.

So we have this configuration:

  • F5 in front of our bare-metal nodes
  • a Deployment resource that we apply with this configuration:
  replicas: 1
  minReadySeconds: 15
  strategy:
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 100
...

      containers:
      - image: <our-image-forked-from-the-gcr.io/google_containers/nginx-ingress-controller:0.62-with-some-logs>
        name: nginx-ingress-ctrl
        imagePullPolicy: Always
        livenessProbe:
          httpGet:
            path: /healthz
            port: 10249
            scheme: HTTP
          initialDelaySeconds: 20
          timeoutSeconds: 5
        readinessProbe:
          exec:
            command:
              - cat
              - /tmp/ready
          initialDelaySeconds: 15
          timeoutSeconds: 5
          periodSeconds: 5
          successThreshold: 2
          failureThreshold: 1
        lifecycle:
          preStop:
            exec:
              # SIGTERM triggers a quick exit; gracefully terminate instead:
              # drop the readiness marker, then wait so traffic can drain
              # before the pod is stopped ("nginx -s quit" was also tried
              # here, see below)
              command: ["/bin/sh", "-c", "rm /tmp/ready && sleep 10"]
          postStart:
            exec:
              command:
                - "touch"
                - "/tmp/ready"

So the idea is to be really strict on the readiness probe, so that as soon as preStop is called, the pod is taken out of load balancing.

The readinessProbe check works: once the pod receives the preStop hook, I can see that it goes to "Ready: false".
But the problem is that the nginx ingress controller pod keeps handling requests, so when Kubernetes shuts it down, a number of requests fail (to be precise, the error is: Request 'standard' failed: java.io.IOException: Remotely closed).
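
As a sanity check (the pod and Service names below are placeholders, and this assumes the controller pods sit behind a Service), the readiness transition and the endpoint removal can be verified with:

  # The Ready condition should flip to "False" once /tmp/ready is removed
  kubectl get pod <nginx-ingress-pod> \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'

  # ...and the pod IP should then disappear from the Service endpoints
  kubectl get endpoints <nginx-ingress-service> -o wide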

This is some logging:

I0527 11:39:38.648945       1 main.go:186] Received SIGTERM, shutting down
I0527 11:39:38.648990       1 controller.go:905] updating 2 Ingress rule/s
I0527 11:39:38.651683       1 controller.go:918] Updating loadbalancer service2-production/service2-ingress. Removing IP 10.100.0.104
I0527 11:39:38.716403       1 event.go:216] Event(api.ObjectReference{Kind:"Ingress", Namespace:"service2-production", Name:"service2-ingress", UID:"c93eeab0-21cf-11e6-b056-8a5128ea7197", APIVersion:"extensions", ResourceVersion:"9347650", FieldPath:""}): type: 'Normal' reason: 'UPDATE' service2-production/service2-ingress
I0527 11:39:38.716485       1 event.go:216] Event(api.ObjectReference{Kind:"Ingress", Namespace:"service2-production", Name:"service2-ingress", UID:"c93eeab0-21cf-11e6-b056-8a5128ea7197", APIVersion:"extensions", ResourceVersion:"9347419", FieldPath:""}): type: 'Normal' reason: 'DELETE' ip: 10.100.0.104
I0527 11:39:38.718317       1 controller.go:918] Updating loadbalancer service1-production/service1-ingress. Removing IP 10.100.0.104
I0527 11:39:38.719995       1 controller.go:379] not contained: [{10.100.0.105 } {10.100.0.103 } {10.100.0.102 } {10.100.0.1 } {10.100.0.107 }]
I0527 11:39:38.720038       1 controller.go:380] Updating loadbalancer service2-production/service2-ingress with IP 10.100.0.104
I0527 11:39:38.804214       1 event.go:216] Event(api.ObjectReference{Kind:"Ingress", Namespace:"service1-production", Name:"service1-ingress", UID:"d09f3143-20d4-11e6-b056-8a5128ea7197", APIVersion:"extensions", ResourceVersion:"9347653", FieldPath:""}): type: 'Normal' reason: 'UPDATE' service1-production/service1-ingress
I0527 11:39:38.804687       1 controller.go:890] shutting down controller queues
I0527 11:39:38.804722       1 controller.go:951] shutting down NGINX loadbalancer controller
I0527 11:39:38.804738       1 main.go:144] Handled quit, awaiting pod deletion
I0527 11:39:38.804756       1 event.go:216] Event(api.ObjectReference{Kind:"Ingress", Namespace:"service1-production", Name:"service1-ingress", UID:"d09f3143-20d4-11e6-b056-8a5128ea7197", APIVersion:"extensions", ResourceVersion:"9347422", FieldPath:""}): type: 'Normal' reason: 'DELETE' ip: 10.100.0.104
I0527 11:39:38.820874       1 event.go:216] Event(api.ObjectReference{Kind:"Ingress", Namespace:"service2-production", Name:"service2-ingress", UID:"c93eeab0-21cf-11e6-b056-8a5128ea7197", APIVersion:"extensions", ResourceVersion:"9347650", FieldPath:""}): type: 'Normal' reason: 'CREATE' ip: 10.100.0.104
I0527 11:39:38.822353       1 controller.go:379] not contained: [{10.100.0.105 } {10.100.0.103 } {10.100.0.102 } {10.100.0.1 } {10.100.0.107 }]
I0527 11:39:38.822374       1 controller.go:380] Updating loadbalancer service1-production/service1-ingress with IP 10.100.0.104
I0527 11:39:38.916628       1 event.go:216] Event(api.ObjectReference{Kind:"Ingress", Namespace:"service1-production", Name:"service1-ingress", UID:"d09f3143-20d4-11e6-b056-8a5128ea7197", APIVersion:"extensions", ResourceVersion:"9347653", FieldPath:""}): type: 'Normal' reason: 'CREATE' ip: 10.100.0.104

The strange thing (in my opinion) is that it re-creates the IP right after deleting it.

The Ingress doesn't reflect the expected state: it still has a lot of IPs assigned to that specific rule.

I also tried putting nginx -s quit in the preStop, but that is even worse: the pod continues to handle requests and then nginx disappears.
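
For reference, a sketch of that variant with the quit deferred until after the readiness marker is removed and the load balancer has had time to react (the sleep length is only a guess, and whether this avoids the dropped connections is untested):

        lifecycle:
          preStop:
            exec:
              # remove the readiness marker, wait for traffic to drain, then
              # ask nginx to finish in-flight requests and exit gracefully
              command: ["/bin/sh", "-c", "rm /tmp/ready && sleep 30 && nginx -s quit"]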

aledbf (Contributor) commented May 27, 2016

> it re-creates the IP right after deleting it

@micheleorsi this bug was fixed in #1057 (no new image published yet)

> failed: java.io.IOException: Remotely closed

This might not be an error. The NGINX configuration enables keep-alive connections, so that could be the cause; you can disable keep-alive using a custom configuration.
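
If it helps, a minimal sketch of such a custom configuration, assuming the controller reads a keep-alive key from the ConfigMap referenced by its --nginx-configmap flag (the name and namespace below are only examples):

apiVersion: v1
kind: ConfigMap
metadata:
  # must match the ConfigMap passed to the controller via --nginx-configmap
  name: nginx-load-balancer-conf
  namespace: kube-system
data:
  # a keepalive_timeout of 0 makes NGINX close client connections after each
  # request instead of keeping them open
  keep-alive: "0"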

micheleorsi (Author) commented

Ah ok, I will try again to rebuild the image on our registry!

Thanks so much @aledbf!

micheleorsi (Author) commented

I checked, and with #1057 the IPs are indeed removed from the Ingress.
Now the problem seems to be on the F5 side, which continues to see the port as open and keeps redirecting traffic.

I will try working on the keep-alive settings.
