Nginx timeouts when proxy is injected #2000

Closed
vic3lord opened this issue Dec 19, 2018 · 13 comments
Comments

@vic3lord
Contributor

Bug Report

I have an Nginx service serving static files, plus a few locations that use proxy_pass, and those proxied locations fail with timeouts.

What is the issue?

Lots of timeouts on an Nginx service

How can it be reproduced?

Logs, error output, etc

# Linkerd-proxy: 
ERR! proxy={server=in listen=0.0.0.0:4143 remote=10.128.0.24:63489} linkerd2_proxy::proxy::http::router service error: an IO error occurred: Connection reset by peer (os error 104)

# nginx:

2018/12/19 18:35:22 [error] 9#9: *172617 upstream timed out (110: Operation timed out) while connecting to upstream, client: 127.0.0.1, server: _, request: "GET /cookie/bundle.js HTTP/1.1", upstream:

linkerd check output

kubernetes-api: can initialize the client..................................[ok]
kubernetes-api: can query the Kubernetes API...............................[ok]
kubernetes-api: is running the minimum Kubernetes API version..............[ok]
linkerd-api: control plane namespace exists................................[ok]
linkerd-api: control plane pods are ready..................................[ok]
linkerd-api: can initialize the client.....................................[ok]
linkerd-api: can query the control plane API...............................[ok]
linkerd-api[kubernetes]: control plane can talk to Kubernetes..............[ok]
linkerd-api[prometheus]: control plane can talk to Prometheus..............[ok]
linkerd-api: no invalid service profiles...................................[ok]
linkerd-version: can determine the latest version..........................[ok]
linkerd-version: cli is up-to-date.........................................[ok]
linkerd-version: control plane is up-to-date...............................[ok]

Status check results are [ok]

Environment

  • Kubernetes Version: 1.11.2
  • Cluster Environment: GKE
  • Host OS: COS
  • Linkerd version: 2.1

Possible solution

Additional context

@klingerf
Member

@vic3lord Thanks for opening this. This sounds like a duplicate of #1537. Can you check out the remediation steps mentioned in that issue to see if it fixes your setup?

@vic3lord
Contributor Author

vic3lord commented Dec 20, 2018

@klingerf thanks for the quick response. I looked at the error logs in #1537 and they are not the same as mine. Also, I don't use nginx-ingress in front of this service; the ingress is a GLBC ingress and the service itself is nginx.

EDIT: P.S. The fixes there aren't applicable, since this isn't an ingress.

@klingerf
Member

@vic3lord Ah, ok, apologies for misreading it. It would be really helpful if you could provide a Kubernetes config that reproduces the issue that you're seeing when it's injected with the linkerd proxy. For instance, it could be a modified version of one of our test yaml files that includes an nginx frontend that serves static assets and uses proxy_pass. That will make it a lot easier for us to track down what's going on.

@vic3lord
Contributor Author

vic3lord commented Dec 21, 2018

of course!

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cdn
  namespace: default
  labels:
    app: cdn
spec:
  replicas: 3
  revisionHistoryLimit: 1
  selector:
    matchLabels:
      app: cdn
  template:
    metadata:
      labels:
        app: cdn
    spec:
      containers:
        - name: cdn
          image: nginx:alpine
          volumeMounts:
            - name: vhost
              mountPath: /etc/nginx/nginx.conf
              subPath: nginx.conf
          ports:
            - name: http
              containerPort: 80
          readinessProbe:
            httpGet:
              path: /healthz
              port: http
          livenessProbe:
            httpGet:
              path: /healthz
              port: http
            initialDelaySeconds: 60
          resources:
            limits:
              cpu: 1
              memory: 512Mi
      volumes:
        - name: vhost
          configMap:
            name: cdn
---
apiVersion: v1
kind: Service
metadata:
  name: cdn
  namespace: default
  labels:
    app: cdn
spec:
  type: NodePort
  selector:
    app: cdn
  ports:
    - name: http
      port: 80
      targetPort: 80
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: cdn
  namespace: default
data:
  nginx.conf: |+
    user  nginx;
    worker_processes  1;

    error_log  /var/log/nginx/error.log warn;
    pid        /var/run/nginx.pid;

    events {
      worker_connections  1024;
    }

    http {
      include       /etc/nginx/mime.types;

      # add extra types support
      types {
        font/ttf                      ttf;
        font/opentype                 otf;
        font/woff                     woff;
        font/woff2                    woff2;
      }

      default_type  application/octet-stream;

      log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
      '$status $body_bytes_sent "$http_referer" '
      '"$http_user_agent" "$http_x_forwarded_for"';

      access_log off;
      sendfile on;
      tcp_nopush on;
      keepalive_timeout  65;

      map $sent_http_content_type $expires {
        default                    off;
        text/html                  1h;
        text/css                   max;
        application/javascript     1h;
        ~image/                    max;
        ~font/                     max;
      }

      server {
        listen 80;
        server_name  _;

        gzip on;
        gzip_vary on;
        gzip_proxied any;
        gzip_types "*";

        location = /healthz {
          access_log off;
          return 200 "OK";
        }

        if ($request_method !~ "OPTIONS|GET|HEAD") {
          return 405;
        }

        location / {
          access_log off;
          return 200 "OK";
        }

        location /js/ {
          add_header Cache-Control "public,s-maxage=120,max-age=300";
          proxy_pass http://sdk.default.svc.cluster.local/js/;
        }

        location /js/assets/ {
          expires $expires;
          add_header Cache-Control "public";
          proxy_pass http://sdk.default.svc.cluster.local/js/assets/;
        }

        location /fonts/ {
          expires $expires;
          add_header Cache-Control "public";
          add_header Access-Control-Allow-Origin "*";
          proxy_pass http://fonts.default.svc.cluster.local/;
        }

        location /cookie/ {
          expires $expires;
          proxy_pass http://cookie-iframe.default.svc.cluster.local/;
        }

        location /img/ {
          expires $expires;
          add_header Cache-Control "public";
          proxy_pass http://imageflow.default.svc.cluster.local:3000/img/;
        }
    }

@klingerf
Member

@vic3lord Thanks! That config doesn't apply in my env. The nginx pods exit with:

2018/12/21 18:42:50 [emerg] 1#1: unexpected end of file, expecting "}" in /etc/nginx/nginx.conf:94
nginx: [emerg] unexpected end of file, expecting "}" in /etc/nginx/nginx.conf:94

But I came up with a working nginx config that uses proxy_pass, and I can't replicate the timeout issue that you're seeing. Here's what I did:

  1. Install the linkerd control plane

    linkerd install | kubectl apply -f -
  2. Inject and install the "hello" backend

    linkerd inject hello.yml | kubectl apply -f -
    hello.yml
    ---
    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: hello
    spec:
      replicas: 3
      template:
        metadata:
          labels:
            app: hello
        spec:
          containers:
          - name: service
            image: buoyantio/helloworld:0.1.6
            args:
            - "-addr=:7777"
            - "-text=Hello"
            ports:
            - name: http
              containerPort: 7777
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: hello
    spec:
      selector:
        app: hello
      clusterIP: None
      ports:
      - name: http
        port: 7777
  3. Inject and install nginx

    linkerd inject nginx.yml | kubectl apply -f -
    nginx.yml
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: cdn
      namespace: default
      labels:
        app: cdn
    spec:
      replicas: 3
      revisionHistoryLimit: 1
      selector:
        matchLabels:
          app: cdn
      template:
        metadata:
          labels:
            app: cdn
        spec:
          containers:
            - name: cdn
              image: nginx:alpine
              volumeMounts:
                - name: vhost
                  mountPath: /etc/nginx/nginx.conf
                  subPath: nginx.conf
              ports:
                - name: http
                  containerPort: 80
              readinessProbe:
                httpGet:
                  path: /healthz
                  port: http
              livenessProbe:
                httpGet:
                  path: /healthz
                  port: http
                initialDelaySeconds: 60
              resources:
                limits:
                  cpu: 1
                  memory: 512Mi
          volumes:
            - name: vhost
              configMap:
                name: cdn
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: cdn
      namespace: default
      labels:
        app: cdn
    spec:
      type: NodePort
      selector:
        app: cdn
      ports:
        - name: http
          port: 80
          targetPort: 80
    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cdn
      namespace: default
    data:
      nginx.conf: |+
        user  nginx;
        worker_processes  1;
    
        error_log  /var/log/nginx/error.log warn;
        pid        /var/run/nginx.pid;
    
        events {
          worker_connections  1024;
        }
    
        http {
          include       /etc/nginx/mime.types;
    
          # add extra types support
          types {
            font/ttf                      ttf;
            font/opentype                 otf;
            font/woff                     woff;
            font/woff2                    woff2;
          }
    
          default_type  application/octet-stream;
    
          log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
          '$status $body_bytes_sent "$http_referer" '
          '"$http_user_agent" "$http_x_forwarded_for"';
    
          access_log off;
          sendfile on;
          tcp_nopush on;
          keepalive_timeout  65;
    
          map $sent_http_content_type $expires {
            default                    off;
            text/html                  1h;
            text/css                   max;
            application/javascript     1h;
            ~image/                    max;
            ~font/                     max;
          }
    
          server {
            listen 80;
            server_name  _;
    
            gzip on;
            gzip_vary on;
            gzip_proxied any;
            gzip_types "*";
    
            location = /healthz {
              access_log off;
              return 200 "OK";
            }
    
            if ($request_method !~ "OPTIONS|GET|HEAD") {
              return 405;
            }
    
            location / {
              access_log off;
              return 200 "OK";
            }
    
            location /hello/ {
              expires $expires;
              add_header Cache-Control "public";
              proxy_pass http://hello.default.svc.cluster.local:7777/;
            }
          }
        }
  4. Port-forward to nginx

    kubectl port-forward svc/cdn 8080:80
  5. Curl nginx

    $ curl localhost:8080
    OK
  6. Curl the hello service by way of nginx

    $ curl localhost:8080/hello/
    Hello!

And that all works for me. Can you try it in your environment?

@vic3lord
Contributor Author

Hi @klingerf I'm sorry I just missed a brace when copying, this service is running in production for the past 254 days with 1k rps without linkerd

The problem with the timeouts is that they are not consistent and come after running a few hours, I saw someone posted another issue with something similar about a memory leak in the proxy container I think it's related #2012

It happens only under high traffic that's why I couldnt replicate it on staging. I've been running linkerd in stage for the past month and only after verifying that everything works I moved to production, that's when I found all sorts of issues...
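
To reproduce this under load against the repro setup above, something like the following sketch could work. It assumes the `hey` load generator is installed (any HTTP load tool would do) and that the `kubectl port-forward` from step 4 is still running; the duration and concurrency are purely illustrative.

    # Sustained load against the port-forwarded nginx repro above.
    # `hey` is an assumption; any HTTP load generator works. Values are illustrative.
    hey -z 30m -c 200 http://localhost:8080/hello/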

Thanks again for your help, LMK if you need anything else on my end.

@chandanpasunoori

@klingerf I have seen behavior similar to what @vic3lord described: a few hours after injecting linkerd2 into nginx, the proxy's memory was around 3+ and CPU was at 100%; it crashed multiple times and then finally stopped working.

We removed linkerd2 from nginx as a temporary fix. Any solution for this issue would help us.

@klingerf
Member

klingerf commented Jan 3, 2019

@vic3lord @etsrepo Thanks for the additional details. I didn't realize from reading the initial description of this issue that the timeouts only happen after a few hours of high traffic. I agree that this sounds similar to the reports in #2012.

@klingerf
Member

The fix for #2012 was shipped with the edge-19.1.1 release. @vic3lord, @etsrepo, can you try upgrading the linkerd proxies in your nginx setups to see if that fixes this issue?
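
A sketch of one way to roll the proxies forward, assuming the newer CLI is installed locally and using the `cdn` deployment name from the config earlier in this thread (re-injecting through the newer CLI re-renders the proxy sidecar at that version):

    # Re-render the existing deployment through the newer CLI so the pods
    # pick up the updated proxy on the next rollout. The `cdn` name comes
    # from the config earlier in this thread; adjust for your own workloads.
    kubectl get deploy/cdn -o yaml | linkerd inject - | kubectl apply -f -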

@vic3lord
Contributor Author

I injected it into a few services. I'll monitor closely for the next few days and close the issue if everything is fine.
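
One way to keep an eye on the injected proxies in the meantime (a sketch; `kubectl top` needs metrics-server, and the `app=cdn` label comes from the config above):

    # Per-container CPU/memory for the injected pods (requires metrics-server).
    kubectl top pod -l app=cdn --containers
    # Tail the sidecar's logs for errors; `linkerd-proxy` is the injected container name.
    kubectl logs -l app=cdn -c linkerd-proxy --tail=100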

@chandanpasunoori

@klingerf I won't be able to test the nginx deployment; we had removed Linkerd from it as a temporary fix.
I will update if we can inject it again.

@vic3lord
Contributor Author

Hi @klingerf,
I've had the proxy injected into a few nginx services for a few days now. Everything seems to work fine except for one service that is not an "internal" service: it serves users through our GCE ingress and gets more traffic than the others.

This is the error I get from nginx after injecting:

1024 worker_connections are not enough

And this from the linkerd-proxy container of the same pod:

WARN admin={bg=resolver} linkerd2_proxy::control::destination::background::destination_set Destination.Get stream errored for NameAddr { name: DnsName(DNSName("sdk.default.svc.cluster.local")), port: 80 }: Grpc(Status { code: Unknown, error_message: "", binary_error_details: b"" })
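
For the worker_connections error specifically, one common mitigation is to raise nginx's connection and file-descriptor limits; a minimal sketch of the relevant nginx.conf lines, with purely illustrative values (this does not address the Destination warning above):

    # nginx.conf excerpt: raise the per-worker connection cap and the matching
    # file-descriptor limit. Values are illustrative, not a recommendation.
    worker_rlimit_nofile  8192;

    events {
      worker_connections  4096;
    }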

@klingerf
Member

@vic3lord Thanks for checking it out and reporting back! The new issue that you're seeing is described in #2118. Please watch that issue for a fix, and I'll close this one out in the meantime.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 18, 2021