Nginx timeouts when proxy is injected #2000

Closed
vic3lord opened this issue Dec 19, 2018 · 13 comments
Comments

@vic3lord
Contributor

Bug Report

I have an Nginx service serving static files, plus a few locations that use proxy_pass, and those proxied locations fail with timeouts.

What is the issue?

Lots of timeouts on an Nginx service

How can it be reproduced?

Logs, error output, etc

# Linkerd-proxy: 
ERR! proxy={server=in listen=0.0.0.0:4143 remote=10.128.0.24:63489} linkerd2_proxy::proxy::http::router service error: an IO error occurred: Connection reset by peer (os error 104)

# nginx:

2018/12/19 18:35:22 [error] 9#9: *172617 upstream timed out (110: Operation timed out) while connecting to upstream, client: 127.0.0.1, server: _, request: "GET /cookie/bundle.js HTTP/1.1", upstream:

linkerd check output

kubernetes-api: can initialize the client..................................[ok]
kubernetes-api: can query the Kubernetes API...............................[ok]
kubernetes-api: is running the minimum Kubernetes API version..............[ok]
linkerd-api: control plane namespace exists................................[ok]
linkerd-api: control plane pods are ready..................................[ok]
linkerd-api: can initialize the client.....................................[ok]
linkerd-api: can query the control plane API...............................[ok]
linkerd-api[kubernetes]: control plane can talk to Kubernetes..............[ok]
linkerd-api[prometheus]: control plane can talk to Prometheus..............[ok]
linkerd-api: no invalid service profiles...................................[ok]
linkerd-version: can determine the latest version..........................[ok]
linkerd-version: cli is up-to-date.........................................[ok]
linkerd-version: control plane is up-to-date...............................[ok]

Status check results are [ok]

Environment

  • Kubernetes Version: 1.11.2
  • Cluster Environment: GKE
  • Host OS: COS
  • Linkerd version: 2.1

Possible solution

Additional context

@klingerf
Member

@vic3lord Thanks for opening this. This sounds like a duplicate of #1537. Can you check out the remediation steps mentioned in that issue to see if it fixes your setup?

@vic3lord
Contributor Author

vic3lord commented Dec 20, 2018

@klingerf thanks for the quick response. I looked at the error logs in #1537 and they are not the same as mine. Also, I don't use nginx-ingress in front of this service; the ingress is a GLBC ingress and the service itself is nginx.

EDIT: P.S. The fixes there aren't applicable, since this isn't an ingress.

@klingerf
Member

@vic3lord Ah, ok, apologies for misreading it. It would be really helpful if you could provide a Kubernetes config that reproduces the issue that you're seeing when it's injected with the linkerd proxy. For instance, it could be a modified version of one of our test yaml files that includes an nginx frontend that serves static assets and uses proxy_pass. That will make it a lot easier for us to track down what's going on.

@vic3lord
Contributor Author

vic3lord commented Dec 21, 2018

of course!

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cdn
  namespace: default
  labels:
    app: cdn
spec:
  replicas: 3
  revisionHistoryLimit: 1
  selector:
    matchLabels:
      app: cdn
  template:
    metadata:
      labels:
        app: cdn
    spec:
      containers:
        - name: cdn
          image: nginx:alpine
          volumeMounts:
            - name: vhost
              mountPath: /etc/nginx/nginx.conf
              subPath: nginx.conf
          ports:
            - name: http
              containerPort: 80
          readinessProbe:
            httpGet:
              path: /healthz
              port: http
          livenessProbe:
            httpGet:
              path: /healthz
              port: http
            initialDelaySeconds: 60
          resources:
            limits:
              cpu: 1
              memory: 512Mi
      volumes:
        - name: vhost
          configMap:
            name: cdn
---
apiVersion: v1
kind: Service
metadata:
  name: cdn
  namespace: default
  labels:
    app: cdn
spec:
  type: NodePort
  selector:
    app: cdn
  ports:
    - name: http
      port: 80
      targetPort: 80
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: cdn
  namespace: default
data:
  nginx.conf: |+
    user  nginx;
    worker_processes  1;

    error_log  /var/log/nginx/error.log warn;
    pid        /var/run/nginx.pid;

    events {
      worker_connections  1024;
    }

    http {
      include       /etc/nginx/mime.types;

      # add extra types support
      types {
        font/ttf                      ttf;
        font/opentype                 otf;
        font/woff                     woff;
        font/woff2                    woff2;
      }

      default_type  application/octet-stream;

      log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
      '$status $body_bytes_sent "$http_referer" '
      '"$http_user_agent" "$http_x_forwarded_for"';

      access_log off;
      sendfile on;
      tcp_nopush on;
      keepalive_timeout  65;

      map $sent_http_content_type $expires {
        default                    off;
        text/html                  1h;
        text/css                   max;
        application/javascript     1h;
        ~image/                    max;
        ~font/                     max;
      }

      server {
        listen 80;
        server_name  _;

        gzip on;
        gzip_vary on;
        gzip_proxied any;
        gzip_types "*";

        location = /healthz {
          access_log off;
          return 200 "OK";
        }

        if ($request_method !~ "OPTIONS|GET|HEAD") {
          return 405;
        }

        location / {
          access_log off;
          return 200 "OK";
        }

        location /js/ {
          add_header Cache-Control "public,s-maxage=120,max-age=300";
          proxy_pass http://sdk.default.svc.cluster.local/js/;
        }

        location /js/assets/ {
          expires $expires;
          add_header Cache-Control "public";
          proxy_pass http://sdk.default.svc.cluster.local/js/assets/;
        }

        location /fonts/ {
          expires $expires;
          add_header Cache-Control "public";
          add_header Access-Control-Allow-Origin "*";
          proxy_pass http://fonts.default.svc.cluster.local/;
        }

        location /cookie/ {
          expires $expires;
          proxy_pass http://cookie-iframe.default.svc.cluster.local/;
        }

        location /img/ {
          expires $expires;
          add_header Cache-Control "public";
          proxy_pass http://imageflow.default.svc.cluster.local:3000/img/;
        }
    }

@klingerf
Member

@vic3lord Thanks! That config doesn't apply in my env. The nginx pods exit with:

2018/12/21 18:42:50 [emerg] 1#1: unexpected end of file, expecting "}" in /etc/nginx/nginx.conf:94
nginx: [emerg] unexpected end of file, expecting "}" in /etc/nginx/nginx.conf:94

But I came up with a working nginx config that uses proxy_pass, and I can't replicate the timeout issue that you're seeing. Here's what I did:

  1. Install the linkerd control plane

    linkerd install | kubectl apply -f -
  2. Inject and install the "hello" backend

    linkerd inject hello.yml | kubectl apply -f -
    hello.yml
    ---
    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: hello
    spec:
      replicas: 3
      template:
        metadata:
          labels:
            app: hello
        spec:
          containers:
          - name: service
            image: buoyantio/helloworld:0.1.6
            args:
            - "-addr=:7777"
            - "-text=Hello"
            ports:
            - name: http
              containerPort: 7777
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: hello
    spec:
      selector:
        app: hello
      clusterIP: None
      ports:
      - name: http
        port: 7777
  3. Inject and install nginx

    linkerd inject nginx.yml | kubectl apply -f -
    nginx.yml
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: cdn
      namespace: default
      labels:
        app: cdn
    spec:
      replicas: 3
      revisionHistoryLimit: 1
      selector:
        matchLabels:
          app: cdn
      template:
        metadata:
          labels:
            app: cdn
        spec:
          containers:
            - name: cdn
              image: nginx:alpine
              volumeMounts:
                - name: vhost
                  mountPath: /etc/nginx/nginx.conf
                  subPath: nginx.conf
              ports:
                - name: http
                  containerPort: 80
              readinessProbe:
                httpGet:
                  path: /healthz
                  port: http
              livenessProbe:
                httpGet:
                  path: /healthz
                  port: http
                initialDelaySeconds: 60
              resources:
                limits:
                  cpu: 1
                  memory: 512Mi
          volumes:
            - name: vhost
              configMap:
                name: cdn
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: cdn
      namespace: default
      labels:
        app: cdn
    spec:
      type: NodePort
      selector:
        app: cdn
      ports:
        - name: http
          port: 80
          targetPort: 80
    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cdn
      namespace: default
    data:
      nginx.conf: |+
        user  nginx;
        worker_processes  1;
    
        error_log  /var/log/nginx/error.log warn;
        pid        /var/run/nginx.pid;
    
        events {
          worker_connections  1024;
        }
    
        http {
          include       /etc/nginx/mime.types;
    
          # add extra types support
          types {
            font/ttf                      ttf;
            font/opentype                 otf;
            font/woff                     woff;
            font/woff2                    woff2;
          }
    
          default_type  application/octet-stream;
    
          log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
          '$status $body_bytes_sent "$http_referer" '
          '"$http_user_agent" "$http_x_forwarded_for"';
    
          access_log off;
          sendfile on;
          tcp_nopush on;
          keepalive_timeout  65;
    
          map $sent_http_content_type $expires {
            default                    off;
            text/html                  1h;
            text/css                   max;
            application/javascript     1h;
            ~image/                    max;
            ~font/                     max;
          }
    
          server {
            listen 80;
            server_name  _;
    
            gzip on;
            gzip_vary on;
            gzip_proxied any;
            gzip_types "*";
    
            location = /healthz {
              access_log off;
              return 200 "OK";
            }
    
            if ($request_method !~ "OPTIONS|GET|HEAD") {
              return 405;
            }
    
            location / {
              access_log off;
              return 200 "OK";
            }
    
            location /hello/ {
              expires $expires;
              add_header Cache-Control "public";
              proxy_pass http://hello.default.svc.cluster.local:7777/;
            }
          }
        }
  4. Port-forward to nginx

    kubectl port-forward svc/cdn 8080:80
  5. Curl nginx

    $ curl localhost:8080
    OK
  6. Curl the hello service by way of nginx

    $ curl localhost:8080/hello/
    Hello!

And that all works for me. Can you try it in your environment?

@vic3lord
Contributor Author

Hi @klingerf I'm sorry I just missed a brace when copying, this service is running in production for the past 254 days with 1k rps without linkerd

The problem with the timeouts is that they are not consistent and come after running a few hours, I saw someone posted another issue with something similar about a memory leak in the proxy container I think it's related #2012

It happens only under high traffic that's why I couldnt replicate it on staging. I've been running linkerd in stage for the past month and only after verifying that everything works I moved to production, that's when I found all sorts of issues...
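
To reproduce this under load against the repro setup above, something like the following sketch could work. It assumes the `hey` load generator is installed (any HTTP load tool would do) and that the `kubectl port-forward` from step 4 is still running; the duration and concurrency are purely illustrative.

    # Sustained load against the port-forwarded nginx repro above.
    # `hey` is an assumption; any HTTP load generator works. Values are illustrative.
    hey -z 30m -c 200 http://localhost:8080/hello/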

Thanks again for your help, LMK if you need anything else on my end.

@chandanpasunoori

@klingerf I have seen behavior similar to what @vic3lord described: a few hours after injecting linkerd2 into nginx, the proxy's memory was around 3+ and CPU was at 100%; it crashed multiple times and then finally stopped working.

We removed linkerd2 from nginx as a temporary fix. Any solution for this issue would help us.

@klingerf
Member

klingerf commented Jan 3, 2019

@vic3lord @etsrepo Thanks for the additional details. I didn't realize from reading the initial description of this issue that the timeouts only happen after a few hours of high traffic. I agree that this sounds similar to the reports in #2012.

@klingerf
Member

The fix for #2012 was shipped with the edge-19.1.1 release. @vic3lord, @etsrepo, can you try upgrading the linkerd proxies in your nginx setups to see if that fixes this issue?
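
A sketch of one way to roll the proxies forward, assuming the newer CLI is installed locally and using the `cdn` deployment name from the config earlier in this thread (re-injecting through the newer CLI re-renders the proxy sidecar at that version):

    # Re-render the existing deployment through the newer CLI so the pods
    # pick up the updated proxy on the next rollout. The `cdn` name comes
    # from the config earlier in this thread; adjust for your own workloads.
    kubectl get deploy/cdn -o yaml | linkerd inject - | kubectl apply -f -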

@vic3lord
Contributor Author

I injected it into a few services. I'll monitor closely for the next few days and close the issue if everything is fine.
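
One way to keep an eye on the injected proxies in the meantime (a sketch; `kubectl top` needs metrics-server, and the `app=cdn` label comes from the config above):

    # Per-container CPU/memory for the injected pods (requires metrics-server).
    kubectl top pod -l app=cdn --containers
    # Tail the sidecar's logs for errors; `linkerd-proxy` is the injected container name.
    kubectl logs -l app=cdn -c linkerd-proxy --tail=100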

@chandanpasunoori

@klingerf I won't be able to test the nginx deployment; we had removed Linkerd from it as a temporary fix.
I will update if we can inject it again.

@vic3lord
Contributor Author

Hi @klingerf,
I've had the proxy injected into a few nginx services for a few days now. Everything seems to work fine except for one service that is not an "internal" service: it serves users through our GCE ingress and gets more traffic than the others.

This is the error I get from nginx after injecting:

1024 worker_connections are not enough

And this from the linkerd-proxy container of the same pod:

WARN admin={bg=resolver} linkerd2_proxy::control::destination::background::destination_set Destination.Get stream errored for NameAddr { name: DnsName(DNSName("sdk.default.svc.cluster.local")), port: 80 }: Grpc(Status { code: Unknown, error_message: "", binary_error_details: b"" })
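
For the worker_connections error specifically, one common mitigation is to raise nginx's connection and file-descriptor limits; a minimal sketch of the relevant nginx.conf lines, with purely illustrative values (this does not address the Destination warning above):

    # nginx.conf excerpt: raise the per-worker connection cap and the matching
    # file-descriptor limit. Values are illustrative, not a recommendation.
    worker_rlimit_nofile  8192;

    events {
      worker_connections  4096;
    }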

@klingerf
Member

@vic3lord Thanks for checking it out and reporting back! The new issue that you're seeing is described in #2118. Please watch that issue for a fix, and I'll close this one out in the meantime.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 18, 2021