Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable TCP keepalive on listener sockets. #2638

Merged
merged 2 commits into from
Jul 17, 2020
Merged

Enable TCP keepalive on listener sockets. #2638

merged 2 commits into from
Jul 17, 2020

Conversation

erwbgy
Copy link
Contributor

@erwbgy erwbgy commented Jun 29, 2020

Fixes #2633
Signed-off-by: Keith Burdis keith.burdis@gs.com

@codecov
Copy link

codecov bot commented Jun 29, 2020

Codecov Report

Merging #2638 into master will increase coverage by 0.11%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2638      +/-   ##
==========================================
+ Coverage   76.77%   76.88%   +0.11%     
==========================================
  Files          71       72       +1     
  Lines        5674     5702      +28     
==========================================
+ Hits         4356     4384      +28     
  Misses       1227     1227              
  Partials       91       91              
Impacted Files Coverage Δ
internal/envoy/listener.go 97.73% <100.00%> (+<0.01%) ⬆️
internal/envoy/stats.go 100.00% <100.00%> (ø)
internal/envoy/tcp_keepalive.go 100.00% <100.00%> (ø)

Copy link
Contributor

@jpeach jpeach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very reasonable to me (barring a couple of style nits). I'll leave it open for other maintainers to review, however.

I'm not too concerned about compatibility, since all the socket options are documented as being from Linux 2.2 or 2.4 days. Hopefully no-one is running kernels that old. The values seem sensible to me.

I would like to see this documented, though I'll cop to there not currently being an obvious place to put this documentation. Let's at least file an issue to remember that we should add docs.

internal/envoy/tcp_keepalive.go Outdated Show resolved Hide resolved
internal/envoy/tcp_keepalive.go Outdated Show resolved Hide resolved
@erwbgy
Copy link
Contributor Author

erwbgy commented Jul 4, 2020

@jpeach @youngnick I saw some discussion on Slack about putting this behind a feature flag because these socket options don't work on Windows. Another way would be to check runtime.GOOS and only set the socket options on Linux, which should be a simple update to this pull request. Thoughts on that?

@erwbgy
Copy link
Contributor Author

erwbgy commented Jul 4, 2020

@pims Did you get a chance to test if this works on MacOS?

@pims
Copy link
Contributor

pims commented Jul 4, 2020

@erwbgy I should be able to test this weekend. I’ll update the PR.

I think build flags might also be a good way to gate this without a dedicated contour flag.

@pims
Copy link
Contributor

pims commented Jul 4, 2020

Hmm, so this doesn't build on MacOS:

internal/envoy/tcp_keepalive.go:40:17: undefined: syscall.TCP_KEEPIDLE
internal/envoy/tcp_keepalive.go:48:17: undefined: syscall.TCP_KEEPINTVL
internal/envoy/tcp_keepalive.go:58:17: undefined: syscall.TCP_KEEPCNT

if we rename tcp_keepalive.go to tcp_keepalive_linux.go and add the //+build linux directive
as well as create a tcp_keepalive.go with the following:

package envoy

import (
	envoy_api_v2_core "github.com/envoyproxy/go-control-plane/envoy/api/v2/core"
)

func TCPKeepaliveSocketOptions() []*envoy_api_v2_core.SocketOption {
    // should we return nil instead?
    return []*envoy_api_v2_core.SocketOption{}
}

it builds correctly. I still need to test it though.

@erwbgy
Copy link
Contributor Author

erwbgy commented Jul 4, 2020

Thanks @pims, learnt something new today as I hadn't had the need for build constraints before.

@jpeach
Copy link
Contributor

jpeach commented Jul 5, 2020

Hmm, so this doesn't build on MacOS:

internal/envoy/tcp_keepalive.go:40:17: undefined: syscall.TCP_KEEPIDLE
internal/envoy/tcp_keepalive.go:48:17: undefined: syscall.TCP_KEEPINTVL
internal/envoy/tcp_keepalive.go:58:17: undefined: syscall.TCP_KEEPCNT

if we rename tcp_keepalive.go to tcp_keepalive_linux.go and add the //+build linux directive
as well as create a tcp_keepalive.go with the following:

package envoy

import (
	envoy_api_v2_core "github.com/envoyproxy/go-control-plane/envoy/api/v2/core"
)

func TCPKeepaliveSocketOptions() []*envoy_api_v2_core.SocketOption {
    // should we return nil instead?
    return []*envoy_api_v2_core.SocketOption{}
}

it builds correctly. I still need to test it though.

Sorry, I should have picked up this issue during review :(

What we should do is carry our own constants for the socket options. If we conditionally compile the whole of the keep alive options file, then Envoy would get programmed differently depending on what platform Contour is running on, which is not the result that we want.

We could use +build to define the socket option number, but since we will need to hard-code them for non-linux platforms, I'm not sure there is much benefit in that. Basically what we need is a set of constants per target Envoy platform.

Copy link
Contributor

@jpeach jpeach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in main thread, Contour should program Envoy the same way, irrespective of the platform Contour was built on.

@pims
Copy link
Contributor

pims commented Jul 5, 2020

🤷 syscall.TCP_KEEPINTVL is available on // +build aix freebsd linux netbsd which I'd guess would cover 99.9% of use cases of Contour/k8s + Envoy.
My main concern was the ability to build/run contour on MacOS for development purposes.

@pims
Copy link
Contributor

pims commented Jul 5, 2020

#define	TCP_KEEPINTVL		0x101	/* interval between keepalives */
#define	TCP_KEEPCNT		0x102	/* number of keepalives before close */

from https://opensource.apple.com/source/xnu/xnu-2422.115.4/bsd/netinet/tcp.h

Couldn't find references to TCP_KEEPIDLE on MacOS but I could have missed it.

Edit:

	TCP_KEEPALIVE                     = 0x10
	TCP_KEEPCNT                       = 0x102
	TCP_KEEPINTVL                     = 0x101

from https://github.com/golang/go/blob/50bd1c4d4eb4fac8ddeb5f063c099daccfb71b26/src/syscall/zerrors_darwin_arm64.go

@erwbgy
Copy link
Contributor Author

erwbgy commented Jul 5, 2020

Looks like we can get these values from https://github.com/felixge/tcpkeepalive/.

@erwbgy
Copy link
Contributor Author

erwbgy commented Jul 5, 2020

Added constants for the TCP keepalive options, defaulting to the syscall values for Linux and BSD with MacOS (Darwin) being different. Apparently MacOS calls the TCP_KEEPIDLE socket option TCP_KEEPALIVE.

The GOOS values not covered are aix, android, illumos, js, plan9, solaris, windows. Do we need to be concerned about those?

@jpeach
Copy link
Contributor

jpeach commented Jul 5, 2020

Added constants for the TCP keepalive options, defaulting to the syscall values for Linux and BSD with MacOS (Darwin) being different. Apparently MacOS calls the TCP_KEEPIDLE socket option TCP_KEEPALIVE.

Remember that the socket options are being used to program Envoy, so we need to send the right options for the platform that Envoy is running on, not the ones for the platform that Contour is running on. There's no in-band way to detect the Envoy platform AFAIK, so the only option is to assume that it is running on Linux (which would be true in the majority of cases).

@erwbgy
Copy link
Contributor Author

erwbgy commented Jul 6, 2020

Is it not valid to assume that Contour and Envoy would run on the same platform?

Do we need to support any platforms other than Linux? For development or runtime or both?

@jpeach
Copy link
Contributor

jpeach commented Jul 7, 2020

Is it not valid to assume that Contour and Envoy would run on the same platform?

Not always. The main case is developer usage, where you may run a working copy of Contour on macOS and Envoy in a kind cluster.

Do we need to support any platforms other than Linux? For development or runtime or both?

For Contour, we should support macOS as a developer convenience. I don't know of any use case for supporting Envoy on non-Linux.

@erwbgy
Copy link
Contributor Author

erwbgy commented Jul 8, 2020

Thanks @jpeach I think I've got it now. We only support Envoy on Linux so always configure Linux TCP keep-alive socket options regardless of the platform that Contour is running on.

Copy link
Member

@stevesloka stevesloka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lgtm with just one thought on docs & configuring.

internal/envoy/tcp_keepalive.go Show resolved Hide resolved
Copy link
Contributor

@jpeach jpeach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please hold until I get a chance to do a final review pass. Thanks.

Copy link
Contributor

@jpeach jpeach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good; we just need to rethink the docs.

Please add the keepalive options to the Listener returned by envoy.StatsListener().

Can you please rebase and squash the commits. Add a commit message that discusses the important aspects of the change for the benefit of future maintainers.

site/_resources/envoy.md Outdated Show resolved Hide resolved
@erwbgy erwbgy changed the base branch from master to gh-pages July 15, 2020 15:54
@erwbgy erwbgy changed the base branch from gh-pages to master July 15, 2020 15:55
@erwbgy
Copy link
Contributor Author

erwbgy commented Jul 15, 2020

Rebased and squashed the commits with a single commit message.

…stener sockets.

With these changes Contour configures TCP keepalives for Envoy listener sockets so that half-open connections with clients (downstream in Envoy speak) are flushed which helps Envoy work better with networks that drop long running idle TCP connections.  Once a TCP connection has been idle for 45 seconds a keepalive packet is sent every 5 seconds and if 9 packets receive no response then the connection will be terminated by Envoy.

Envoy is only supported on Linux so Contour always configures Linux TCP keep-alive socket options regardless of the platform that Contour is running on.

Fixes #2633

Signed-off-by: Keith Burdis <keith.burdis@gs.com>
@jpeach jpeach added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Jul 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note Denotes a PR that will be considered when it comes time to generate release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable TCP keepalive on listener sockets
4 participants