Add eBPF connection tracking without dependencies on kernel headers #2135

Merged


iaguis
Contributor

@iaguis iaguis commented Jan 17, 2017

Based on work from Lorenzo, updated by Iago, Alban, Alessandro and
Michael.

This PR adds connection tracking using eBPF. This feature is not enabled by default.
For now, you can enable it by launching scope with the following command:

sudo ./scope launch --probe.ebpf.connections=true

This patch lets Scope get notified of every connection event without
parsing /proc/$pid/net/tcp{,6} and /proc/$pid/fd/*, which improves
performance.

We vendor https://github.com/iovisor/gobpf in Scope to load the
pre-compiled ebpf program and https://github.com/kinvolk/tcptracer-bpf
to guess the offsets of the structures we need in the kernel. In this
way we don't need a different pre-compiled ebpf object file per kernel.
The Scope build fetches the pre-compiled ebpf program from
https://hub.docker.com/r/kinvolk/tcptracer-bpf/ (see
https://github.com/kinvolk/tcptracer-bpf). To update to a new version
you can modify the EBPF_IMAGE variable in Makefile.

The ebpf program uses kprobes/kretprobes on the following kernel functions:

  • tcp_v4_connect
  • tcp_v6_connect
  • tcp_set_state
  • inet_csk_accept
  • tcp_close

It generates "connect", "accept" and "close" events containing the
connection tuple as well as the pid and netns.
Note: the IPv6 events are not supported in Scope and thus not passed on.

probe/endpoint/ebpf.go maintains the list of connections. Similarly to
conntrack, it also keeps the dead connections for one iteration in order
to report short-lived connections.

The code for parsing /proc/$pid/net/tcp{,6} and /proc/$pid/fd/* is still
there and still used at start-up because eBPF only brings us the events
and not the initial state. However, the /proc parsing for the initial
state is now done in foreground instead of background, via
newForegroundReader().

NAT resolution on connections from eBPF works in the same way as it did
on connections from /proc: by using conntrack. One of the two conntrack
instances is only started to get the initial state and then it is
stopped since eBPF detects short-lived connections.

The Scope Docker image size comparison:

  • weaveworks/scope in current master: 22 MB (compressed), 68 MB
    (uncompressed)
  • weaveworks/scope with this patchset: 23 MB (compressed), 69 MB
    (uncompressed)

Fixes #1168 (walking /proc to obtain connections is very expensive)
Fixes #1260 (Short-lived connections not tracked for containers in
shared networking namespaces)
Fixes #1962 (Port ebpf tracker to Go)
Fixes #1961 (Remove runtime kernel header dependency from ebpf tracker)

This PR is independent of kernel version and configuration, so it supersedes #2070, which needed a different object file per kernel and kernel config.

Depends on #2068

TODO:

@pidster pidster assigned iaguis and unassigned iaguis Jan 18, 2017
@iaguis iaguis force-pushed the iaguis/conn-perf-ebpf-guess branch 2 times, most recently from dd77777 to 020d967 Compare January 19, 2017 12:15
Makefile Outdated
$(SCOPE_EXPORT): $(SCOPE_EXE) $(DOCKER_DISTRIB) docker/weave $(RUNSVINIT) docker/Dockerfile docker/demo.json docker/run-app docker/run-probe docker/entrypoint.sh
docker/ebpf.tgz: Makefile
$(SUDO) docker pull $(EBPF_IMAGE)
CONTAINER_ID=$(shell $(SUDO) docker run -d $(EBPF_IMAGE) /bin/false 2>/dev/null || true); $(SUDO) docker export -o docker/ebpf.tgz $${CONTAINER_ID}


Makefile Outdated
NO_CROSS_COMP=unset GOOS GOARCH
GO_HOST=$(NO_CROSS_COMP); $(GO)
WITH_GO_HOST_ENV=$(NO_CROSS_COMP); $(GO_ENV)
GO_ENV_ARM=$(GO_ENV) CC=/usr/bin/arm-linux-gnueabihf-gcc


Makefile Outdated
endif

ifeq ($(GOARCH),arm)
GO=env $(GO_ENV_ARM) go


Makefile Outdated
ifeq ($(GOARCH),arm)
GO=env $(GO_ENV_ARM) go
# The version of go shipped on debian doesn't have some standard library
# packages for arm and when it tries to install them it fails because it


circle.yml Outdated
@@ -41,7 +41,7 @@ test:
parallel: true
- cd $SRCDIR; make RM= client-lint static:
parallel: true
- cd $SRCDIR; rm -f prog/scope; if [ "$CIRCLE_NODE_INDEX" = "0" ]; then GOARCH=arm make GO_BUILD_INSTALL_DEPS= RM= prog/scope; else GOOS=darwin make GO_BUILD_INSTALL_DEPS= RM= prog/scope; fi:
- cd $SRCDIR; rm -f prog/scope; if [ "$CIRCLE_NODE_INDEX" = "0" ]; then GOARCH=arm GOOS=linux make GO_BUILD_INSTALL_DEPS= RM= prog/scope; else GOOS=darwin GOOS=linux make GO_BUILD_INSTALL_DEPS= RM= prog/scope; fi:


@@ -6,6 +6,7 @@ RUN echo "http://dl-cdn.alpinelinux.org/alpine/edge/community" >>/etc/apk/reposi
apk add --update bash runit conntrack-tools iproute2 util-linux curl && \
rm -rf /var/cache/apk/*
ADD ./docker.tgz /
ADD ./ebpf.tgz /usr/libexec/scope/


@@ -463,14 +463,14 @@ func (c *conntrackWalker) handleFlow(f flow, forceAdd bool) {

// walkFlows calls f with all active flows and flows that have come and gone
// since the last call to walkFlows
-func (c *conntrackWalker) walkFlows(f func(flow)) {
+func (c *conntrackWalker) walkFlows(f func(flow, bool)) {


@alban
Contributor

alban commented Jan 19, 2017

We have tested (manually) on Debian, Ubuntu, Fedora, Arch, CoreOS (beta) and Amazon Linux.

Note that Amazon Linux does not have /sys/kernel/debug mounted by default. So it should be mounted manually before starting the Docker daemon:

$ sudo yum install docker
$ sudo mount -t debugfs debugfs /sys/kernel/debug
$ sudo /etc/init.d/docker start
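
A probe could check for this condition up front rather than failing later. The sketch below is only an illustration and is not part of this PR: the helper name `debugfsMounted` is hypothetical, and it assumes the mount table is available in the usual /proc/mounts format (device, mount point, filesystem type, options).

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// debugfsMounted scans mount-table text in /proc/mounts format and reports
// whether a debugfs filesystem is mounted at /sys/kernel/debug.
func debugfsMounted(mounts string) bool {
	sc := bufio.NewScanner(strings.NewReader(mounts))
	for sc.Scan() {
		// Fields: device, mount point, fs type, options, dump, pass.
		f := strings.Fields(sc.Text())
		if len(f) >= 3 && f[1] == "/sys/kernel/debug" && f[2] == "debugfs" {
			return true
		}
	}
	return false
}

func main() {
	data, err := os.ReadFile("/proc/mounts")
	if err != nil {
		fmt.Println("cannot read /proc/mounts:", err)
		return
	}
	if debugfsMounted(string(data)) {
		fmt.Println("debugfs is mounted")
	} else {
		fmt.Println("debugfs is not mounted; try: mount -t debugfs debugfs /sys/kernel/debug")
	}
}
```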

We have not tested on GCE yet.

@alepuccetti

I tried on Google Cloud with the gci image, but it does not work because the bpf() system call is not supported. Error message from strace:
bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_PERF_EVENT_ARRAY, key_size=4, value_size=4, max_entries=1024}, 48) = -1 ENOSYS (Function not implemented)

Here is the information about the gci virtual machine image:

# cat /etc/os-release

BUILD_ID=8872.47.0
NAME="Container-VM Image"
GOOGLE_CRASH_ID=Lakitu
VERSION_ID=55
BUG_REPORT_URL=https://crbug.com/new
PRETTY_NAME="Google Container-VM Image"
VERSION=55
GOOGLE_METRICS_PRODUCT_ID=26
HOME_URL="https://cloud.google.com/compute/docs/containers/vm-image/"
ID=gci

@2opremio
Contributor

Thanks @alepuccetti. Would you mind creating a ticket like https://code.google.com/p/google-compute-engine/issues/detail?id=499 to ask them to enable eBPF? It might be worth checking https://cloud.google.com/container-optimized-os/docs/release-notes first, in case it is already done on the dev channel.

@2opremio
Contributor

2opremio commented Jan 20, 2017

So it should be mounted manually before starting the Docker daemon

@alban Is it worth asking them to mount it by default?

@alban
Contributor

alban commented Jan 20, 2017

@alban Is it worth asking them to mount it by default?

OK, I asked in this thread: https://forums.aws.amazon.com/thread.jspa?messageID=762683

var ebpfTracker *EbpfTracker

// nilTracker is a tracker that does nothing, and it implements the eventTracker interface.
// It is returned when the useEbpfConn flag is false.



func newEbpfTracker(useEbpfConn bool) eventTracker {
if !useEbpfConn {
return &nilTracker{}


@@ -0,0 +1,7 @@
//+build linux


@@ -0,0 +1,256 @@
package endpoint


return &nilTracker{}
}
err = bpfPerfEvent.Load()
if err != nil {


return &nilTracker{}
}

bpfPerfEvent.EnableKprobes()



// tcpEvent should be in sync with the struct in the ebpf maps.
type tcpEvent struct {
// Timestamp must be the first field, the sorting depends on it


sport := event.SPort
dport := event.DPort

if typ.String() == "close" || typ.String() == "unknown" {



type eventType uint32

// These constants should be in sync with the equivalent definitions in the ebpf program.


}

func (t *EbpfTracker) run() {
if err := offsetguess.Guess(t.reader); err != nil {


}
}()

pmIPv4, err := bpflib.InitPerfMap(t.reader, "tcp_event_ipv4", channel)


}

func (t *EbpfTracker) stop() {
// TODO: stop the go routine in run()


@@ -38,6 +38,28 @@ func newBackgroundReader(walker process.Walker) *backgroundReader {
return br
}

func newForegroundReader(walker process.Walker) *backgroundReader {


@2opremio
Contributor

2opremio commented Mar 2, 2017

Also, regarding https://circleci.com/gh/kinvolk/scope/741 , please don't output full reports in tests since they fill the output.

@2opremio
Contributor

2opremio commented Mar 2, 2017

I have tried to test this branch locally and I get panics: https://gist.github.com/2opremio/8aece6a2fba726fabc6b944ab2d9ec92

@schu schu force-pushed the iaguis/conn-perf-ebpf-guess branch from 0fc6511 to 3cf87dc Compare March 3, 2017 14:01
@schu
Contributor

schu commented Mar 3, 2017

Rebased on master.

return err
}

releaseParts := releaseRegex.FindStringSubmatch(release)
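
Only one line of this function is visible in the review snippet, so the following is a guess at the general shape of such a parser rather than the code under review. The regular expression and the helper name `parseRelease` are assumptions; the sketch extracts the leading major.minor[.patch] numbers from a kernel release string such as the one reported by `uname -r`.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// releaseRegex is an assumed pattern matching the leading
// "major.minor[.patch]" of a kernel release such as "4.4.0-59-generic".
var releaseRegex = regexp.MustCompile(`^(\d+)\.(\d+)(?:\.(\d+))?`)

// parseRelease returns the numeric version components of a kernel release.
// A missing patch component is reported as 0.
func parseRelease(release string) (int, int, int, error) {
	parts := releaseRegex.FindStringSubmatch(release)
	if parts == nil {
		return 0, 0, 0, fmt.Errorf("cannot parse kernel release %q", release)
	}
	nums := make([]int, 3)
	for i := range nums {
		if parts[i+1] == "" {
			continue // optional patch component not present
		}
		n, err := strconv.Atoi(parts[i+1])
		if err != nil {
			return 0, 0, 0, err
		}
		nums[i] = n
	}
	return nums[0], nums[1], nums[2], nil
}

func main() {
	major, minor, patch, err := parseRelease("4.4.0-59-generic")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("kernel %d.%d.%d\n", major, minor, patch)
}
```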


@@ -0,0 +1,14 @@
FROM alpine:3.5


@2opremio
Contributor

2opremio commented Mar 6, 2017

A quick and rough check (top -c) of the CPU consumption in an empty system shows that, when ebpf is enabled, the probe consumes 2x the CPU.

How can you explain this?

With ebpf:

(screenshot of top -c output, 2017-03-06 15:04:45)

Without ebpf:

(screenshot of top -c output, 2017-03-06 15:11:17)

If anything, I would expect the opposite ...

@2opremio
Contributor

2opremio commented Mar 6, 2017

Note that the app CPU consumption varies a bit but the probe consumption is fairly static: ~7% when enabling ebpf and ~3% without it.

@2opremio
Contributor

2opremio commented Mar 6, 2017

I am seeing a lot of unmatched close event errors, just like @ekimekim at #2135 (comment)

In this case, they are DNS requests and requests to weave net (which is not running):

<probe> ERRO: 2017/03/06 14:19:12.590043 EbpfTracker error: unmatched close event: 127.0.0.1:35732-127.0.0.1:6784 pid=3022 netns=4026531957
<probe> ERRO: 2017/03/06 14:19:17.608381 EbpfTracker error: unmatched close event: 172.17.0.1:42202-172.17.0.1:53 pid=3022 netns=4026531957
<probe> ERRO: 2017/03/06 14:19:27.586822 EbpfTracker error: unmatched close event: 172.17.0.1:42206-172.17.0.1:53 pid=3022 netns=4026531957
<probe> ERRO: 2017/03/06 14:19:37.589443 EbpfTracker error: unmatched close event: 172.17.0.1:42212-172.17.0.1:53 pid=3022 netns=4026531957
<probe> ERRO: 2017/03/06 14:19:47.592937 EbpfTracker error: unmatched close event: 172.17.0.1:42216-172.17.0.1:53 pid=3022 netns=4026531957
<probe> ERRO: 2017/03/06 14:19:57.597814 EbpfTracker error: unmatched close event: 172.17.0.1:42222-172.17.0.1:53 pid=3022 netns=4026531957
<probe> ERRO: 2017/03/06 14:20:07.597878 EbpfTracker error: unmatched close event: 172.17.0.1:42226-172.17.0.1:53 pid=3022 netns=4026531957
<probe> ERRO: 2017/03/06 14:20:12.597237 EbpfTracker error: unmatched close event: 127.0.0.1:35764-127.0.0.1:6784 pid=3022 netns=4026531957
<probe> ERRO: 2017/03/06 14:20:17.598236 EbpfTracker error: unmatched close event: 172.17.0.1:42234-172.17.0.1:53 pid=3022 netns=4026531957
<probe> ERRO: 2017/03/06 14:20:27.598730 EbpfTracker error: unmatched close event: 172.17.0.1:42238-172.17.0.1:53 pid=3022 netns=4026531957
<probe> ERRO: 2017/03/06 14:20:37.603551 EbpfTracker error: unmatched close event: 172.17.0.1:42244-172.17.0.1:53 pid=3022 netns=4026531957
<probe> ERRO: 2017/03/06 14:20:47.606111 EbpfTracker error: unmatched close event: 172.17.0.1:42248-172.17.0.1:53 pid=3022 netns=4026531957
<probe> ERRO: 2017/03/06 14:20:57.608987 EbpfTracker error: unmatched close event: 172.17.0.1:42254-172.17.0.1:53 pid=3022 netns=4026531957
<probe> ERRO: 2017/03/06 14:21:07.608991 EbpfTracker error: unmatched close event: 172.17.0.1:42258-172.17.0.1:53 pid=3022 netns=4026531957
<probe> ERRO: 2017/03/06 14:21:12.591241 EbpfTracker error: unmatched close event: 127.0.0.1:35796-127.0.0.1:6784 pid=3022 netns=4026531957
<probe> ERRO: 2017/03/06 14:21:17.614570 EbpfTracker error: unmatched close event: 172.17.0.1:42266-172.17.0.1:53 pid=3022 netns=4026531957
<probe> ERRO: 2017/03/06 14:21:27.618499 EbpfTracker error: unmatched close event: 172.17.0.1:42270-172.17.0.1:53 pid=3022 netns=4026531957
<probe> ERRO: 2017/03/06 14:21:37.624944 EbpfTracker error: unmatched close event: 172.17.0.1:42276-172.17.0.1:53 pid=3022 netns=4026531957

I doubt this is #2253; these look like plain requests to ports that nobody is listening on. In fact, if I try to connect to such a port:

ubuntu@ubuntu-yakkety:~$ netcat localhost 9999

I get this in the logs:

<probe> ERRO: 2017/03/06 14:25:36.558455 EbpfTracker error: unmatched close event: 127.0.0.1:59104-127.0.0.1:9999 pid=5632 netns=4026531957

Please fix this.

@iaguis
Contributor Author

iaguis commented Mar 6, 2017

I am seeing a lot of unmatched close event errors, just like @ekimekim at #2135 (comment)

This is tracked in weaveworks/tcptracer-bpf#19

@2opremio
Contributor

2opremio commented Mar 6, 2017

This is tracked in weaveworks/tcptracer-bpf#19

Thanks. Should we silence those errors in the meantime? (they are not really errors after all).

@schu schu force-pushed the iaguis/conn-perf-ebpf-guess branch from 3cf87dc to a545d59 Compare March 6, 2017 15:03
@iaguis
Contributor Author

iaguis commented Mar 6, 2017

Thanks. Should we silence those errors in the meantime? (they are not really errors after all).

I think we should do that in the meantime, yes.

@alban
Contributor

alban commented Mar 8, 2017

A quick and rough check (top -c) of the CPU consumption in an empty system shows that, when ebpf is enabled, the probe consumes 2x the CPU.

How can you explain this?

@2opremio The connection scanner used to feed the ebpf tracker with the initial state was not stopped correctly. I added commit 915eb69 which fixes the perf issue. @schu tested that it fixes the problem: top -c and cpustat show similar results when there are no connections.

@2opremio
Contributor

2opremio commented Mar 8, 2017

Good job, I will test this branch again before the day ends.

iaguis added 5 commits March 8, 2017 22:11
Based on work from Lorenzo, updated by Iago, Alban, Alessandro and
Michael.

This PR adds connection tracking using eBPF. This feature is not enabled by default.
For now, you can enable it by launching scope with the following command:

```
sudo ./scope launch --probe.ebpf.connections=true
```

This patch lets Scope get notified of every connection event without
parsing /proc/$pid/net/tcp{,6} and /proc/$pid/fd/*, which improves
performance.

We vendor https://github.com/iovisor/gobpf in Scope to load the
pre-compiled ebpf program and https://github.com/weaveworks/tcptracer-bpf
to guess the offsets of the structures we need in the kernel. In this
way we don't need a different pre-compiled ebpf object file per kernel.
The pre-compiled ebpf program is included in the vendoring of
tcptracer-bpf.

The ebpf program uses kprobes/kretprobes on the following kernel functions:
- tcp_v4_connect
- tcp_v6_connect
- tcp_set_state
- inet_csk_accept
- tcp_close

It generates "connect", "accept" and "close" events containing the
connection tuple as well as the pid and netns.
Note: the IPv6 events are not supported in Scope and thus not passed on.

probe/endpoint/ebpf.go maintains the list of connections. Similarly to
conntrack, it also keeps the dead connections for one iteration in order
to report short-lived connections.

The code for parsing /proc/$pid/net/tcp{,6} and /proc/$pid/fd/* is still
there and still used at start-up because eBPF only brings us the events
and not the initial state. However, the /proc parsing for the initial
state is now done in foreground instead of background, via
newForegroundReader().

NAT resolution on connections from eBPF works in the same way as it did
on connections from /proc: by using conntrack. One of the two conntrack
instances is only started to get the initial state and then it is
stopped since eBPF detects short-lived connections.

The Scope Docker image size comparison:
- weaveworks/scope in current master:  22 MB (compressed),  68 MB
  (uncompressed)
- weaveworks/scope with this patchset: 23 MB (compressed), 69 MB
  (uncompressed)

Fixes weaveworks#1168 (walking /proc to obtain connections is very expensive)

Fixes weaveworks#1260 (Short-lived connections not tracked for containers in
shared networking namespaces)

Fixes weaveworks#1962 (Port ebpf tracker to Go)

Fixes weaveworks#1961 (Remove runtime kernel header dependency from ebpf tracker)

There's no obvious reason why those tests can only be run on
us-central-1, so remove the check.

It was added with 1577b90
@alban alban force-pushed the iaguis/conn-perf-ebpf-guess branch 2 times, most recently from 915eb69 to 6d55a34 Compare March 8, 2017 21:14
@alban
Contributor

alban commented Mar 8, 2017

PR updated:

Let's see if CircleCI is happy.
