Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: support TCP keepalive for downstream sockets #3634

Closed
ikatson opened this issue Jun 14, 2018 · 12 comments
Closed

Feature request: support TCP keepalive for downstream sockets #3634

ikatson opened this issue Jun 14, 2018 · 12 comments
Labels
enhancement Feature requests. Not bugs or questions. help wanted Needs help!

Comments

@ikatson
Copy link
Contributor

ikatson commented Jun 14, 2018

We use AWS NLB in front of envoy, and it has a 350s idle timeout.
https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout

If the client makes a HTTP/1.1 keep-alive connection to that NLB, sends a request and then sends another one > 350s later, the client will get a connection reset.

We expose that to the internet, so we don't control the clients. However, it looks like envoy's listener could set SO_KEEPALIVE on accepted sockets, so that we can then tune the kernel to send those keepalives more frequently than 350 seconds, and the connection will keep working even if it's idle.

Edit: looks like on linux you can set keepalive parameters like TCP_KEEPCNT, TCP_KEEPIDLE, TCP_KEEPINTVL, in which case you won't need to alter the kernel settings, if envoy supports configuring these.

@mattklein123 mattklein123 added the question Questions that are neither investigations, bugs, nor enhancements label Jun 14, 2018
@mattklein123
Copy link
Member

@ggreenway I had thought we already implemented this for listeners, but it looks like only upstream clusters right now. Is that right?

@ggreenway
Copy link
Contributor

It's on my TODO list to implement for listeners. Someone else implemented it for upstreams already. The needed work is to create the config, and then pipe it through to the existing code for setting the correct sockopt.

@ggreenway ggreenway added enhancement Feature requests. Not bugs or questions. and removed question Questions that are neither investigations, bugs, nor enhancements labels Jun 14, 2018
@ikatson
Copy link
Contributor Author

ikatson commented Jun 19, 2018

Just FYI, we hard-coded some setsockopt() calls into our envoy fork, and it solved the NLB connection reset problem.

@lwhile
Copy link

lwhile commented Jul 16, 2018

@ggreenway @mattklein123

Does the feature included in the version v1.7.0?

@mattklein123 mattklein123 added the help wanted Needs help! label Jul 16, 2018
@ggreenway
Copy link
Contributor

@lwhile To my knowledge this hasn't been implemented yet. However, generic socket option support was added. On a Listener, you can set socket_options. You'll have to figure out the kernel socket option numbers and exact values, but I think it will work. See https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/lds.proto.

@ikatson
Copy link
Contributor Author

ikatson commented Jul 16, 2018

@ggreenway are the socket options set only on listening socket or also on accept'ed ones? Cause what's needed here is the latter

@ggreenway
Copy link
Contributor

I don't know off the top of my head.

@skyrocknroll
Copy link

We are bitten by this.

@m1o1
Copy link

m1o1 commented Jan 17, 2019

I think we ran into this too while running Istio on AKS:
kubernetes/client-go#527

@lychung83
Copy link

lychung83 commented Apr 13, 2019

Setting socket option for listener actually works. Assuming unix-like environment, configuration looks like follows.

listeners {
    socket_options {
    # SOL_SOCKET = 1
    # SO_KEEPALIVE = 9
        level: 1
        name: 9
        int_value: 1
        state: STATE_LISTENING
    }
    # IPPROTO_TCP = 6
    # TCP_KEEPIDLE = 4
    # TCP_KEEPINTVL = 5
    socket_options {
        level: 6
        name: 4
        int_value: 60
        state: STATE_LISTENING
    }
    socket_options {
        level: 6
        name: 5
        int_value: 60
        state: STATE_LISTENING
    }
}

For an example setting listener in as a static resource:

static_resources:
  listeners:
  - name: listener_0
    socket_options:
      - level: 1
        name: 9
        int_value: 1
        state: STATE_LISTENING

    # SOL_SOCKET = 1
    # SO_KEEPALIVE = 9
      - level: 1
        name: 9
        int_value: 1
        state: STATE_LISTENING

    # IPPROTO_TCP = 6
    # TCP_KEEPIDLE = 4
    # TCP_KEEPINTVL = 5
      - level: 6
        name: 4
        int_value: 60
        state: STATE_LISTENING
      - level: 6
        name: 5
        int_value: 60
        state: STATE_LISTENING

    address:
      socket_address:
        protocol: TCP
        address: 0.0.0.0
        port_value: 10000

    ...

@codervinod
Copy link

Will this also send Keep-Alive headers in http response so that http1.1 clients can negotiate timeout with server? Example app:
image

@mattklein123
Copy link
Member

This is possible using socket options as described.

mvladev pushed a commit to mvladev/gardener that referenced this issue Nov 4, 2020
Some LoadBalancers do not set KEEPALIVE when they open a TCP connection
to the Istio Ingress Gateway. For long living connections it can cause
silent timeouts.
Therefore envoy must be configured to send KEEPALIVE to downstream (LB).

See envoyproxy/envoy#3634
mvladev pushed a commit to mvladev/gardener that referenced this issue Nov 4, 2020
Some LoadBalancers do not set KEEPALIVE when they open a TCP connection
to the Istio Ingress Gateway. For long living connections it can cause
silent timeouts.
Therefore envoy must be configured to send KEEPALIVE to downstream (LB).

See envoyproxy/envoy#3634
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Feature requests. Not bugs or questions. help wanted Needs help!
Projects
None yet
Development

No branches or pull requests

8 participants