Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

502 and "Connection refused while connecting to upstream" scaling pods down (or deleting pods) #3639

Closed
max-rocket-internet opened this issue Jan 9, 2019 · 12 comments

Comments

@max-rocket-internet
Copy link

Is this a BUG REPORT or FEATURE REQUEST? (choose one):

BUG REPORT

NGINX Ingress controller version:

0.21.0 / git-b65b85cd9

Kubernetes version (use kubectl version):

1.11

Environment:

AWS EKS

Installed from chart version 1.0.1 as daemonset.

Chart config:

controller:
  ingressClass: "nginx-public"
  publishService:
    enabled: true
    pathOverride: default/ingress02-nginx-ingress-controller
  kind: DaemonSet
  service:
    targetPorts:
      https: 80
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-ssl-cert: xxx
      service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "https"
      service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "http"
      external-dns.alpha.kubernetes.io/hostname: xxx.xxx.xxx-xxxx.com
      prometheus.io/scrape: 'true'
      prometheus.io/port: '10254'
  config:
    proxy-next-upstream: error timeout http_502
    proxy-next-upstream-tries: "3"
    use-geoip: "true"
    enable-vts-status: "true"
    log-format-escape-json: "true"
    log-format-upstream: '{
      "@timestamp": "$time_iso8601",
      "body_bytes_sent": "$body_bytes_sent",
      "client_ip": "$remote_addr"
      ...
      "user_agent": "$http_user_agent"
    }'
  stats:
    enabled: true
  metrics:
    enabled: true
    service:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "10254"

What happened:

When scaling down or deleting pods we see a few 502 errors returned to the client and this in the controller logs:

2019/01/09 12:02:19 [error] 64#64: *13620 connect() failed (111: Connection refused) while connecting to upstream, client: 66.66.66.15, server: xx-01.xx.xx-xx.com, request: "GET /xxx/xxx/xxx/alive HTTP/1.1", upstream: "http://10.0.27.39:80/xxx/xxx/xxx/alive", host: "xxx-01.xx.xxx-xxx.com"

What you expected to happen:

No error visible for client.

How to reproduce it (as minimally and precisely as possible):

Run HTTP GET requests continuously. Delete some pods or downscale deployment.

Anything else we need to know:

Note I have set proxy-next-upstream: error timeout http_502 as shown above.

The deployment behind the ingress is also nginx and it sometimes shows error 499 which indicates that the ingress-controller is terminating the connections before completion?

The log from the controller shows this:

status:                   502	
upstream_status: 502, 502, 502	
  1. Even if the upstream pod is shutting down, and returns a 502, then shouldn't the ingress-controller retry with another pod?
  2. Shouldn't k8s stop routing any more requests to this pod if it's shutting down?
  3. Is there a small window here that is unavoidable?
@aledbf
Copy link
Member

aledbf commented Jan 9, 2019

Even if the upstream pod is shutting down, and returns a 502, then shouldn't the ingress-controller retry with another pod?

No. The default value is to not do that. Please check https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/#proxy-next-upstream

The default value is error timeout. You can change that setting
proxy_next_upstream: "error timeout http_502"  in the configuration configmap.

@max-rocket-internet
Copy link
Author

@aledbf I did that already, see my configmap above 🙂

@aledbf
Copy link
Member

aledbf commented Jan 9, 2019

I did that already, see my configmap above slightly_smiling_face

Sorry about that.

  • How many replicas do you have when this happens?
  • Are you configuring RollingUpdateStrategy to avoid removing old replicas while the new ones are not yet ready?
  • Are you using probes in the deployment of your application?
  • Please post the complete log line from the ingress controller pod

@max-rocket-internet
Copy link
Author

max-rocket-internet commented Jan 9, 2019

All good!

  1. I go from 10 replicas to 5. I also just deleted 2 pods and also saw the same issue.
  2. Yes but I don't think this makes a difference in this scenario as there is always pods available to serve requests.
  3. Like this? This is from the pod
{ "@timestamp": "2019-01-09T12:02:21+00:00", "body_bytes_sent": "2", "client_ip": "1.2.3.4", "geoip_city": "", "geoip_country_code": "GB", "geoip_region_name": "", "http_cf_ray": "", "http_referer": "", "http_user_agent": "ApacheBench/2.3", "http_x_forwarded_for": "1.2.3.4, 54.3.4.5, 1.2.3.4", "http_x_forwarded_proto": "https", "ingress_name": "app1", "namespace": "default", "proxy_add_x_forwarded_for": "1.2.3.4, 54.3.4.5, 1.2.3.4", "proxy_protocol_addr": "", "proxy_upstream_name": "default-app1-80", "remote_address": "10.0.24.131", "remote_user": "", "request_host": "xxxx-01.xx.xxxx.com", "request_id": "6a99d7249713e81694b2208b81f1b66b", "request_length" : 594, "request_method": "GET", "request_path": "/app/monitoring/alive", "request_time": 0.003, "request": "GET /app/monitoring/alive HTTP/1.1", "response_time": "0.003", "service_name": "app1", "status": "200", "time_local": "09/Jan/2019:12:02:21 +0000", "upstream_addr": "10.0.26.32:80", "upstream_response_length": "2", "upstream_response_time": "0.004", "upstream_status": "200", "uri": "/app/monitoring/alive", "user_agent": "ApacheBench/2.3" }
2019/01/09 12:02:22 [error] 64#64: *13620 connect() failed (111: Connection refused) while connecting to upstream, client: 1.2.3.4, server: xxxx-01.xx.xxxx.com, request: "GET /app/monitoring/alive HTTP/1.1", upstream: "http://10.0.25.162:80/app/monitoring/alive", host: "xxxx-01.xx.xxxx.com"
{ "@timestamp": "2019-01-09T12:02:22+00:00", "body_bytes_sent": "2", "client_ip": "1.2.3.4", "geoip_city": "", "geoip_country_code": "GB", "geoip_region_name": "", "http_cf_ray": "", "http_referer": "", "http_user_agent": "ApacheBench/2.3", "http_x_forwarded_for": "1.2.3.4, 54.3.4.5, 1.2.3.4", "http_x_forwarded_proto": "https", "ingress_name": "app1", "namespace": "default", "proxy_add_x_forwarded_for": "1.2.3.4, 54.3.4.5, 1.2.3.4", "proxy_protocol_addr": "", "proxy_upstream_name": "default-app1-80", "remote_address": "10.0.24.131", "remote_user": "", "request_host": "xxxx-01.xx.xxxx.com", "request_id": "0f1659cbe9224562561e476d457208f7", "request_length" : 594, "request_method": "GET", "request_path": "/app/monitoring/alive", "request_time": 0.002, "request": "GET /app/monitoring/alive HTTP/1.1", "response_time": "0.002", "service_name": "app1", "status": "200", "time_local": "09/Jan/2019:12:02:22 +0000", "upstream_addr": "10.0.25.162:80, 10.0.24.174:80", "upstream_response_length": "0, 2", "upstream_response_time": "0.004, 0.000", "upstream_status": "502, 200", "uri": "/app/monitoring/alive", "user_agent": "ApacheBench/2.3" }
[09/Jan/2019:12:02:22 +0000]TCP200000.000
2019/01/09 12:02:22 [error] 64#64: *13620 connect() failed (111: Connection refused) while connecting to upstream, client: 1.2.3.4, server: xxxx-01.xx.xxxx.com, request: "GET /app/monitoring/alive HTTP/1.1", upstream: "http://10.0.27.29:80/app/monitoring/alive", host: "xxxx-01.xx.xxxx.com"
{ "@timestamp": "2019-01-09T12:02:22+00:00", "body_bytes_sent": "2", "client_ip": "1.2.3.4", "geoip_city": "", "geoip_country_code": "GB", "geoip_region_name": "", "http_cf_ray": "", "http_referer": "", "http_user_agent": "ApacheBench/2.3", "http_x_forwarded_for": "1.2.3.4, 54.3.4.5, 1.2.3.4", "http_x_forwarded_proto": "https", "ingress_name": "app1", "namespace": "default", "proxy_add_x_forwarded_for": "1.2.3.4, 54.3.4.5, 1.2.3.4", "proxy_protocol_addr": "", "proxy_upstream_name": "default-app1-80", "remote_address": "10.0.24.131", "remote_user": "", "request_host": "xxxx-01.xx.xxxx.com", "request_id": "d4e68d20ba3e30cfd844dd4bba71c8ce", "request_length" : 594, "request_method": "GET", "request_path": "/app/monitoring/alive", "request_time": 0.003, "request": "GET /app/monitoring/alive HTTP/1.1", "response_time": "0.003", "service_name": "app1", "status": "200", "time_local": "09/Jan/2019:12:02:22 +0000", "upstream_addr": "10.0.27.29:80, 10.0.24.236:80", "upstream_response_length": "0, 2", "upstream_response_time": "0.000, 0.004", "upstream_status": "502, 200", "uri": "/app/monitoring/alive", "user_agent": "ApacheBench/2.3" }

@aledbf
Copy link
Member

aledbf commented Jan 9, 2019

#64: *13620 connect() failed (111: Connection refused) while connecting to upstream

That means the pod is not ready to receive connections. This should not happen if the app is using probes

"upstream_addr": "10.0.25.162:80, 10.0.24.174:80", "upstream_response_length": "0, 2", "upstream_response_time": "0.004, 0.000", "upstream_status": "502, 200",

The configuration is working, 10.0.25.162:80 returned 502 (internally) and then 10.0.24.174:80 returned 200. You should see 200 as response.

I also just deleted 2 pods and also saw the same issue.

You see 502 in apachebench or just in the logs?

@max-rocket-internet
Copy link
Author

That means the pod is not ready to receive connections. This should not happen if the app is using probes

We are using probes. Both Liveness and Readiness. And they work.

The configuration is working, 10.0.25.162:80 returned 502 (internally) and then 10.0.24.174:80 returned 200. You should see 200 as response.

Ahh I see what you mean! It looks like this, where it tried and failed with 10.0.25.162 and succeeded with 10.0.24.174:

"upstream_addr": "10.0.25.162:80, 10.0.24.174:80"
"upstream_status": "502, 200"

That's great to see it working but we still see messages like this from the controller also:

upstream_addr:	 10.0.27.39:80, 10.0.25.162:80, 10.0.26.117:80	
upstream_status:	 502, 502, 502	

So I think the controller is doing the best it can.

You see 502 in apachebench or just in the logs?

Yes, we see it in AB.

OK at this point I'll close this issue. Maybe I'll open a new issue about the 499 as I think that's different.

But do you know how our pod could return 502 when it should have been prevented from serving requests by k8s? There is 0 502s before the pod deletion is started. Is there a small window here that is unavoidable?

@aledbf
Copy link
Member

aledbf commented Jan 9, 2019

There is 0 502s before the pod deletion is started. Is there a small window here that is unavoidable?

From your description, you went from 10 to 5 pods but the ingress controller only retries 3 times (you can change that). If the retries are sent to the killed pods, there is a small window of time (less than a second) where the ingress controller is not up to date.

Edit: you can test this removing only one pod. You should not see 502

@ElvinEfendi
Copy link
Member

This can also be the case when your app does not handle SIGTERM properly, in other words if it exits immediately. I'd suggest trying with following configuration for your deployment:

...
      containers:
        ...
        lifecycle:
          preStop:
            exec:
              command: ["sleep", "5"]
...

@max-rocket-internet
Copy link
Author

From your description, you went from 10 to 5 pods but the ingress controller only retries 3 times

OK yes, that makes sense because in theory, the ingress controller could try 3 of the 5 stopping pods.

you can test this removing only one pod. You should not see 502

Tested, everything makes sense now!

in other words if it exits immediately

Yes but I think the problem here is that there will always be a small window between the pod containers getting SIGTERM and the ingress-controller updating it's upstreams. If your app takes 5 seconds to stop, you won't see it. But for our app, a request might only last 50ms. And it shuts own correctly, i.e. finishes any remaining requests and then exits. So in theory, the app can gracefully shutdown and exit before the ingress-controller is up to date.

This can also be the case when your app does not handle SIGTERM properly, in other words if it exits immediately

True and for nginx (not the controller but part of our application pod), apparently /usr/sbin/nginx -s quit is required.

Anyway, in summary, I was able to avoid all 502s when scaling down by adding this to our application nginx container:

        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 3 && /usr/sbin/nginx -s quit"]

And this to our app container:

        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 3"]

Thanks for taking the time @aledbf and @ElvinEfendi

@matti
Copy link

matti commented Jan 18, 2019

@max-rocket-internet thank you for posting this, this solved a major mystery that I've been after 3 years!

How did you end up with sleep 3 - I think it takes longer for ingress-nginx to reload?

@ElvinEfendi
Copy link
Member

How did you end up with sleep 3 - I think it takes longer for ingress-nginx to reload?

@matti as long as you're using recent ingress-nginx versions reload does not matter. You want to sleep <the time it takes for ingress-nginx to get update from k8s API about terminating pods> + 1s (

local BACKENDS_SYNC_INTERVAL = 1
) to sync new endpoints

@felixscheinost
Copy link

I'm concerned about why this issue has been closed. The current behavior of ingress-nginx doesn't align with what I would expect from a properly functioning ingress controller. Using the preStop hook to artificially delay container termination feels like a workaround rather than a solution, yet it seems to be the default suggestion whenever this issue arises.

Many users, including myself, have only discovered this problem after experiencing sporadic 503 errors during rolling updates (e.g., issues #8731, #7330, #6273). Isn't one of the main reasons for choosing Kubernetes its HA features?

The explanation provided by @max-rocket-internet makes complete sense: If ingress-nginx polls the Kubernetes API every second, then for pods that terminate in less than a second, ingress-nginx may still have IPs in its list of endpoints that have already shut down, especially when there's minimal load and no in-flight requests.

Kubernetes ClusterIP services seem to handle this situation correctly. Why can't ingress-nginx do the same?

Alternatively, the docs should come with a prominent warning advising users that their pods mustn't shutdown too fast.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants