Add eBPF connection tracking, without dependencies on kernel headers #2070

Closed

iaguis
Contributor

@iaguis iaguis commented Dec 7, 2016

Based on work from Lorenzo, updated by Iago, Alban, and Alessandro

This PR adds connection tracking using eBPF. This feature is not enabled by default.
For now, you can enable it by launching scope with the following command:

sudo ./scope launch --probe.ebpf.connections=true

Scope Probe falls back on the old /proc parsing if eBPF is not working
(e.g. the kernel is too old, or no pre-compiled eBPF program exists for the current kernel).
With this patch, Scope is notified of every connection event without relying
on parsing /proc/$pid/net/tcp{,6} and /proc/$pid/fd/*, which
improves performance.

We vendor https://github.com/kinvolk/gobpf-elf-loader in Scope to load and
receive information from the ebpf programs. The Scope build fetches the pre-compiled
ebpf programs from https://hub.docker.com/r/kinvolk/tcptracer-bpf/
(see https://github.com/kinvolk/tcptracer-bpf)

At the moment, the eBPF programs are pre-compiled for:

  • Arch 4.8.11-1
  • fedora-24
  • debian-testing
  • coreos

The ebpf programs use kprobes on the following kernel functions:

  • tcp_v4_connect
  • tcp_v6_connect
  • inet_csk_accept
  • tcp_close

They generate "connect", "accept" and "close" events containing the
connection tuple as well as the pid and netns.
Note: the IPv6 events are not hooked up in Scope.

probe/endpoint/ebpf.go maintains the list of connections. Similarly to
conntrack, it also keeps the dead connections for one iteration in order
to report short-lived connections.
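
The "keep dead connections for one iteration" idea can be sketched as below. This is a simplified illustration under assumed names (`tracker`, `tuple`, `report`), not the contents of probe/endpoint/ebpf.go.

```go
package main

import (
	"fmt"
	"sync"
)

// tuple is a simplified stand-in for the real connection tuple.
type tuple struct {
	src, dst string
}

// tracker is a hypothetical connection table fed by open and close events.
type tracker struct {
	mu     sync.Mutex
	open   map[tuple]struct{}
	closed []tuple // connections closed since the last report
}

func newTracker() *tracker {
	return &tracker{open: map[tuple]struct{}{}}
}

func (t *tracker) handleOpen(c tuple) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.open[c] = struct{}{}
}

func (t *tracker) handleClose(c tuple) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if _, ok := t.open[c]; ok {
		delete(t.open, c)
		// Remember the connection so the next report still includes it.
		t.closed = append(t.closed, c)
	}
}

// report returns the open connections plus those closed since the previous
// call, then forgets the closed ones. A connection that opened and closed
// between two reports is therefore reported exactly once.
func (t *tracker) report() []tuple {
	t.mu.Lock()
	defer t.mu.Unlock()
	out := make([]tuple, 0, len(t.open)+len(t.closed))
	for c := range t.open {
		out = append(out, c)
	}
	out = append(out, t.closed...)
	t.closed = nil
	return out
}

func main() {
	tr := newTracker()
	c := tuple{src: "10.0.0.1:40000", dst: "10.0.0.2:80"}
	tr.handleOpen(c)
	tr.handleClose(c)
	fmt.Println(len(tr.report())) // 1: the short-lived connection is still reported
	fmt.Println(len(tr.report())) // 0: it is gone one iteration later
}
```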

The code for parsing /proc/$pid/net/tcp{,6} and /proc/$pid/fd/* is still
there and is still used at start-up, because eBPF only delivers events
and not the initial state. However, the /proc parsing for the initial
state is now done in the foreground instead of the background, via newForegroundReader().
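
For readers unfamiliar with the /proc/net/tcp format, the sketch below decodes one IPv4 address field such as `0100007F:1F90`. It is an illustration only, not Scope's parser; `parseProcAddr` is a hypothetical helper, and it assumes a little-endian host, where the kernel prints the address bytes in reverse order.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"net"
	"strconv"
	"strings"
)

// parseProcAddr decodes an IPv4 "address:port" field from /proc/net/tcp.
// Assumption: little-endian host, so the four address bytes are reversed.
func parseProcAddr(field string) (net.IP, uint16, error) {
	parts := strings.Split(field, ":")
	if len(parts) != 2 {
		return nil, 0, fmt.Errorf("malformed address field %q", field)
	}
	raw, err := hex.DecodeString(parts[0])
	if err != nil {
		return nil, 0, err
	}
	if len(raw) != 4 {
		return nil, 0, fmt.Errorf("expected 4 address bytes, got %d", len(raw))
	}
	ip := net.IPv4(raw[3], raw[2], raw[1], raw[0])
	// The port is printed as plain hexadecimal.
	port, err := strconv.ParseUint(parts[1], 16, 16)
	if err != nil {
		return nil, 0, err
	}
	return ip, uint16(port), nil
}

func main() {
	ip, port, err := parseProcAddr("0100007F:1F90")
	if err != nil {
		panic(err)
	}
	fmt.Println(ip, port) // 127.0.0.1 8080
}
```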

NAT resolution on connections from eBPF works in the same way as it did on connections
from /proc: by using conntrack. One of the two conntrack instances was removed since
eBPF detects short-lived connections.

The Scope Docker image size comparison:

  • weaveworks/scope in current master: 22 MB (compressed), 68 MB
    (uncompressed)
  • weaveworks/scope with this patchset: 24 MB (compressed), 70 MB
    (uncompressed)

Fixes #1168 (walking /proc to obtain connections is very expensive)
Fixes #1260 (Short-lived connections not tracked for containers in
shared networking namespaces)
Fixes #1962 (Port ebpf tracker to Go)
Fixes #1961 (Remove runtime kernel header dependency from ebpf tracker)

Supersedes #2024

@iaguis iaguis force-pushed the alessandro/conn-perf-ebpf branch from 801dcfc to a84a81e Compare December 8, 2016 12:23
@rade
Member

rade commented Dec 8, 2016

> Scope fetches the pre-compiled ebpf program from https://hub.docker.com/r/kinvolk/tcptracer-bpf/
> (see https://github.com/kinvolk/tcptracer-bpf)

Perhaps include all the binaries in the scope container image? Yes, it adds bloat, but dynamically downloading binaries (on every launch?) is gross.

@alban
Contributor

alban commented Dec 8, 2016

> Perhaps include all the binaries in the scope container image? Yes, it adds bloat, but dynamically downloading binaries (on every launch?) is gross.

@rade It is already in the Scope Docker image. It is done at build-time (implemented in Scope's Makefile).

At the moment, the size of the compiled ebpf programs is very small, I think <50KB uncompressed.
https://hub.docker.com/r/kinvolk/tcptracer-bpf/tags/

@rade
Member

rade commented Dec 8, 2016

Ah, so when you wrote "Scope fetches" you didn't actually mean "Scope" but "The scope build". Also you wrote "Scope fetches the pre-compiled ebpf program" (note the singular). Taking into account your comments, I believe that sentence should have read "The scope build fetches the pre-compiled ebpf programs." Correct?

@alban
Contributor

alban commented Dec 8, 2016

@rade correct. Sorry for the imprecise wording. I or @iaguis will edit the PR text.

@alban
Contributor

alban commented Dec 8, 2016

I closed the two previous iterations of this work, in favor of this PR. For the record, this was:

@schu schu mentioned this pull request Dec 12, 2016
@schu
Contributor

schu commented Dec 12, 2016

Two graphs comparing the performance between master and this branch (by running our test dialer, #2082):

With up to 4K connections (40 dialer containers with 100 connections each):

[Graph: scope-probe-time-4k-conn]

With a pool of 100 containers, each holding a random number of connections between 1 and 20:

[Graph: scope-probe-time-100-containers-with-up-to-20-conn]

iaguis and others added 9 commits December 21, 2016 09:55
The variable EBPF_IMAGE in the Makefile determines where to fetch the
pre-compiled ebpf programs. They are added in the Scope Docker image in
/usr/libexec/scope/. This is so far less than 50KB.

Then, at run-time, Scope will pick the correct ebpf program for the
currently running kernel. This is determined by:
- /etc/os-release bind-mounted from the host gives the distribution ID.
- uname finds the architecture and the kernel version.
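
The selection step could look roughly like the sketch below. Only the inputs (the distribution ID from /etc/os-release, plus architecture and kernel version from uname) come from the commit message; the helper names and, in particular, the file naming scheme in `programPath` are assumptions made for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"path/filepath"
	"strings"
)

// distroID extracts the value of the ID= line from os-release content.
func distroID(osRelease string) (string, error) {
	sc := bufio.NewScanner(strings.NewReader(osRelease))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "ID=") {
			// Values may be quoted, e.g. ID="fedora".
			return strings.Trim(strings.TrimPrefix(line, "ID="), `"'`), nil
		}
	}
	if err := sc.Err(); err != nil {
		return "", err
	}
	return "", fmt.Errorf("no ID= line found")
}

// programPath builds the path of the pre-compiled object to load.
// The "ebpf-<distro>-<arch>-<kernel>.o" naming is hypothetical.
func programPath(dir, distro, arch, kernel string) string {
	return filepath.Join(dir, fmt.Sprintf("ebpf-%s-%s-%s.o", distro, arch, kernel))
}

func main() {
	// In practice the content would be read from the bind-mounted
	// /etc/os-release, and arch/kernel would come from uname.
	content := "NAME=Fedora\nVERSION_ID=24\nID=fedora\n"
	id, err := distroID(content)
	if err != nil {
		panic(err)
	}
	fmt.Println(programPath("/usr/libexec/scope", id, "x86_64", "4.8.11"))
}
```
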
This patch adds a debug message to show the perf event details.
We now use CGO, which requires an ARM compiler and the CC environment
variable to be set up.

Also, the version of Go shipped on Debian lacks some standard
library packages for ARM, and when it tries to install them it fails
because it doesn't have permission to write to /usr/lib.

Use -pkgdir if we build for ARM so packages are installed in $HOME.
The cyclomatic complexity was too high and the CI didn't pass.
To handle non-linux systems.
@alepuccetti alepuccetti force-pushed the alessandro/conn-perf-ebpf branch from a84a81e to c8396aa Compare December 21, 2016 08:56
Alessandro Puccetti and others added 8 commits December 21, 2016 10:15
This goroutine is unnecessary because the `run()` function uses a goroutine itself.
Goroutines are asynchronous, and they can introduce a race between connections
detected during the first pass and events from eBPF.
This prevents a race between conntrack information and ebpf.

When using eBPF, we do a first pass on `/proc` and with conntrack
to get the initial state, then eBPF takes over. To do so we "feed"
this information to eBPF so it can pair active connections with future close events.
It could happen that a connection is closed after the data from
`/proc` and conntrack is populated and before eBPF takes over.
In this case, we lose the close event, and we will have a "ghost" process
that will never disappear from the report.
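
One common way to close this kind of window is to start collecting events before taking the snapshot, buffer them, and replay the buffer on top of the snapshot. The sketch below illustrates that general technique with hypothetical names; it is not necessarily how this commit resolves the race, and it ignores edge cases such as tuple reuse.

```go
package main

import "fmt"

// event is a simplified eBPF event: a connection identifier and whether
// the event is a close (true) or an open (false).
type event struct {
	conn   string
	closed bool
}

// reconcile applies events that were buffered while the /proc and conntrack
// snapshot was being taken. A connection that closed during the snapshot is
// removed instead of lingering as a "ghost" entry.
func reconcile(snapshot []string, buffered []event) map[string]bool {
	open := make(map[string]bool, len(snapshot))
	for _, c := range snapshot {
		open[c] = true
	}
	for _, ev := range buffered {
		if ev.closed {
			delete(open, ev.conn)
		} else {
			open[ev.conn] = true
		}
	}
	return open
}

func main() {
	snapshot := []string{"a", "b"}
	buffered := []event{{conn: "a", closed: true}, {conn: "c"}}
	fmt.Println(len(reconcile(snapshot, buffered))) // 2: "b" and "c" remain
}
```
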
@alepuccetti alepuccetti force-pushed the alessandro/conn-perf-ebpf branch from f16ea55 to 5f82411 Compare December 22, 2016 12:49
IMAGE_TAG=$(shell ./tools/image-tag)
EBPF_IMAGE=kinvolk/tcptracer-bpf:semaphore-master-475be63

This comment was marked as abuse.

@iaguis
Contributor Author

iaguis commented Jan 17, 2017

Superseded by #2135

@iaguis iaguis closed this Jan 17, 2017
@alepuccetti alepuccetti deleted the alessandro/conn-perf-ebpf branch January 17, 2017 15:24