This repository has been archived by the owner on Apr 20, 2021. It is now read-only.

Commit

Add eBPF connection tracking without dependencies on kernel headers
Based on work from Lorenzo, updated by Iago, Alban, Alessandro and
Michael.

This PR adds connection tracking using eBPF. This feature is not enabled by default.
For now, you can enable it by launching scope with the following command:

```
sudo ./scope launch --probe.ebpf.connections=true
```

The Scope probe falls back on the old /proc parsing if eBPF is not
working (e.g. because the kernel is too old). This patch lets Scope get
notified of every connection event without parsing
/proc/$pid/net/tcp{,6} and /proc/$pid/fd/*, which improves
performance.
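A minimal sketch of the fallback logic, for illustration only: connectionSource, loadEBPF and chooseSource are hypothetical names, and the kernel check is simulated with a boolean. In this commit the corresponding role is played by newEbpfTracker(), which returns a nilTracker (whose hasDied() reports true) when the BPF object cannot be found or loaded.

```go
package main

import (
	"errors"
	"fmt"
)

// connectionSource is a hypothetical interface standing in for the two ways
// the probe can learn about TCP connections.
type connectionSource interface {
	Name() string
}

type ebpfSource struct{}

func (ebpfSource) Name() string { return "ebpf" }

type procSource struct{}

func (procSource) Name() string { return "proc" }

// loadEBPF is a placeholder for finding, loading and attaching the BPF
// object; here it simply fails when the kernel is flagged as unsupported.
func loadEBPF(kernelSupportsEBPF bool) (connectionSource, error) {
	if !kernelSupportsEBPF {
		return nil, errors.New("eBPF not supported by this kernel")
	}
	return ebpfSource{}, nil
}

// chooseSource prefers eBPF when it is requested and loads successfully,
// and otherwise falls back to walking /proc.
func chooseSource(useEBPF, kernelSupportsEBPF bool) connectionSource {
	if useEBPF {
		src, err := loadEBPF(kernelSupportsEBPF)
		if err == nil {
			return src
		}
		fmt.Printf("eBPF unavailable (%v), falling back to /proc\n", err)
	}
	return procSource{}
}

func main() {
	fmt.Println(chooseSource(true, true).Name())  // eBPF requested, kernel OK
	fmt.Println(chooseSource(true, false).Name()) // eBPF requested, kernel too old
	fmt.Println(chooseSource(false, true).Name()) // eBPF not requested
}
```

Keeping the /proc path as the default means a probe started with eBPF disabled, or on an unsupported kernel, behaves exactly as before.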

We vendor https://github.com/iovisor/gobpf in Scope to load the
pre-compiled eBPF program, and https://github.com/kinvolk/tcptracer-bpf
to guess the offsets of the kernel structures we need. This way we
don't need a different pre-compiled eBPF object file per kernel.
Scope fetches the pre-compiled eBPF program from
https://hub.docker.com/r/kinvolk/tcptracer-bpf/ (see
https://github.com/kinvolk/tcptracer-bpf). To update to a newer version
of the eBPF code, change the EBPF_IMAGE variable in the Makefile.

The eBPF program uses kprobes and kretprobes, mainly on the following kernel functions:
- tcp_v4_connect
- tcp_v6_connect
- inet_csk_accept
- tcp_close

It generates "connect", "accept" and "close" events containing the
connection tuple as well as the PID and network namespace.
Note: IPv6 events are not consumed by Scope (only the tcp_event_ipv4
perf map is read).

probe/endpoint/ebpf.go maintains the list of connections. As with
conntrack, it keeps dead connections for one iteration of
walkConnections() so that short-lived connections are still reported.

The code for parsing /proc/$pid/net/tcp{,6} and /proc/$pid/fd/* is still
there and is still used at start-up, because eBPF only delivers events,
not the initial state. However, the /proc parsing for the initial state
is now done in the foreground instead of in the background, via
newForegroundReader().

NAT resolution on connections from eBPF works the same way as it did on
connections from /proc: by using conntrack. Since eBPF detects
short-lived connections, one of the two conntrack instances is no longer
kept running: it is started, then stopped once the initial state has
been read.
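To illustrate why conntrack is enough for NAT resolution: conntrack records both the original and the reply direction of each flow, and with DNAT the reply comes from the rewritten backend rather than from the original destination. The types and realDestination below are hypothetical and simplified, not the code in probe/endpoint; the addresses are made up.

```go
package main

import "fmt"

// endpoint and flow are simplified, hypothetical versions of what conntrack
// reports: every flow has an "original" and a "reply" direction.
type endpoint struct {
	ip   string
	port uint16
}

type flow struct {
	origSrc, origDst   endpoint
	replySrc, replyDst endpoint
}

// realDestination returns where the connection actually goes. Without DNAT
// the reply comes from the original destination; with DNAT it comes from
// the rewritten backend, so the reply source reveals the real target.
func realDestination(f flow) (endpoint, bool) {
	if f.replySrc != f.origDst {
		return f.replySrc, true
	}
	return f.origDst, false
}

func main() {
	// Client 10.0.0.5 connects to a published port 192.168.1.2:8080, which
	// is DNATed to a container at 172.17.0.3:80.
	natted := flow{
		origSrc:  endpoint{"10.0.0.5", 51000},
		origDst:  endpoint{"192.168.1.2", 8080},
		replySrc: endpoint{"172.17.0.3", 80},
		replyDst: endpoint{"10.0.0.5", 51000},
	}
	dst, translated := realDestination(natted)
	fmt.Println(dst, translated)
}
```

SNAT can be detected symmetrically by comparing the reply destination with the original source.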

The Scope Docker image size comparison:
- weaveworks/scope in current master: 22 MB (compressed), 68 MB
  (uncompressed)
- weaveworks/scope with this patchset: 23 MB (compressed), 69 MB
  (uncompressed)

Fixes weaveworks#1168 (walking /proc to obtain connections is very expensive)

Fixes weaveworks#1260 (Short-lived connections not tracked for containers in
shared networking namespaces)
Lorenzo Manacorda authored and iaguis committed Jan 17, 2017
1 parent e19fc7e commit 544f5c7
Showing 19 changed files with 517 additions and 80 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -43,6 +43,7 @@ scope.tar
prog/scope
docker/scope
docker/docker.tgz
docker/ebpf.tgz
docker/weave
docker/runsvinit
extras/fixprobe/fixprobe
31 changes: 26 additions & 5 deletions Makefile
@@ -23,14 +23,31 @@ RM=--rm
RUN_FLAGS=-ti
BUILD_IN_CONTAINER=true
GO_ENV=GOGC=off
GO=env $(GO_ENV) go
NO_CROSS_COMP=unset GOOS GOARCH
GO_HOST=$(NO_CROSS_COMP); $(GO)
WITH_GO_HOST_ENV=$(NO_CROSS_COMP); $(GO_ENV)
GO_ENV_ARM=$(GO_ENV) CC=/usr/bin/arm-linux-gnueabihf-gcc
GO_BUILD_INSTALL_DEPS=-i
GO_BUILD_TAGS='netgo unsafe'
GO_BUILD_FLAGS=$(GO_BUILD_INSTALL_DEPS) -ldflags "-extldflags \"-static\" -X main.version=$(SCOPE_VERSION) -s -w" -tags $(GO_BUILD_TAGS)

ifeq ($(GOOS),linux)
GO_ENV+=CGO_ENABLED=1
endif

ifeq ($(GOARCH),arm)
GO=env $(GO_ENV_ARM) go
# The version of go shipped on debian doesn't have some standard library
# packages for arm and when it tries to install them it fails because it
# doesn't have permission to write to /usr/lib
# Use -pkgdir if we build for arm so packages are installed in $HOME
GO_BUILD_FLAGS+=-pkgdir ~
else
GO=env $(GO_ENV) go
endif

NO_CROSS_COMP=unset GOOS GOARCH
GO_HOST=$(NO_CROSS_COMP); env $(GO_ENV) go
WITH_GO_HOST_ENV=$(NO_CROSS_COMP); $(GO_ENV)
IMAGE_TAG=$(shell ./tools/image-tag)
EBPF_IMAGE=kinvolk/tcptracer-bpf:master-769adde

all: $(SCOPE_EXPORT)

@@ -41,7 +58,11 @@ docker/weave:
curl -L git.io/weave -o docker/weave
chmod u+x docker/weave

$(SCOPE_EXPORT): $(SCOPE_EXE) $(DOCKER_DISTRIB) docker/weave $(RUNSVINIT) docker/Dockerfile docker/demo.json docker/run-app docker/run-probe docker/entrypoint.sh
docker/ebpf.tgz: Makefile
$(SUDO) docker pull $(EBPF_IMAGE)
CONTAINER_ID=$(shell $(SUDO) docker run -d $(EBPF_IMAGE) /bin/false 2>/dev/null || true); $(SUDO) docker export -o docker/ebpf.tgz $${CONTAINER_ID}

$(SCOPE_EXPORT): $(SCOPE_EXE) $(DOCKER_DISTRIB) docker/weave $(RUNSVINIT) docker/Dockerfile docker/demo.json docker/run-app docker/run-probe docker/entrypoint.sh docker/ebpf.tgz
cp $(SCOPE_EXE) $(RUNSVINIT) docker/
cp $(DOCKER_DISTRIB) docker/docker.tgz
$(SUDO) docker build -t $(SCOPE_IMAGE) docker/
6 changes: 4 additions & 2 deletions backend/Dockerfile
@@ -1,6 +1,8 @@
FROM golang:1.7.1
FROM ubuntu:yakkety
ENV GOPATH /go
ENV PATH /go/bin:/usr/lib/go-1.7/bin:/usr/bin:/bin:/usr/sbin:/sbin
RUN apt-get update && \
apt-get install -y libpcap-dev python-requests time file shellcheck && \
apt-get install -y libpcap-dev python-requests time file shellcheck golang-1.7 git gcc-arm-linux-gnueabihf && \
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
RUN go clean -i net && \
go install -tags netgo std && \
2 changes: 1 addition & 1 deletion circle.yml
@@ -41,7 +41,7 @@ test:
parallel: true
- cd $SRCDIR; make RM= client-lint static:
parallel: true
- cd $SRCDIR; rm -f prog/scope; if [ "$CIRCLE_NODE_INDEX" = "0" ]; then GOARCH=arm make GO_BUILD_INSTALL_DEPS= RM= prog/scope; else GOOS=darwin make GO_BUILD_INSTALL_DEPS= RM= prog/scope; fi:
- cd $SRCDIR; rm -f prog/scope; if [ "$CIRCLE_NODE_INDEX" = "0" ]; then GOARCH=arm GOOS=linux make GO_BUILD_INSTALL_DEPS= RM= prog/scope; else GOOS=darwin GOOS=linux make GO_BUILD_INSTALL_DEPS= RM= prog/scope; fi:
parallel: true
- cd $SRCDIR; rm -f prog/scope; make RM=:
parallel: true
1 change: 1 addition & 0 deletions docker/Dockerfile
@@ -6,6 +6,7 @@ RUN echo "http://dl-cdn.alpinelinux.org/alpine/edge/community" >>/etc/apk/reposi
apk add --update bash runit conntrack-tools iproute2 util-linux curl && \
rm -rf /var/cache/apk/*
ADD ./docker.tgz /
ADD ./ebpf.tgz /usr/libexec/scope/
ADD ./demo.json /
ADD ./weave /usr/bin/
COPY ./scope ./runsvinit ./entrypoint.sh /home/weave/
12 changes: 6 additions & 6 deletions probe/endpoint/conntrack.go
@@ -63,14 +63,14 @@ type conntrack struct {
// flowWalker is something that maintains flows, and provides an accessor
// method to walk them.
type flowWalker interface {
walkFlows(f func(flow))
walkFlows(f func(flow, bool))
stop()
}

type nilFlowWalker struct{}

func (n nilFlowWalker) stop() {}
func (n nilFlowWalker) walkFlows(f func(flow)) {}
func (n nilFlowWalker) stop() {}
func (n nilFlowWalker) walkFlows(f func(flow, bool)) {}

// conntrackWalker uses the conntrack command to track network connections and
// implement flowWalker.
@@ -427,14 +427,14 @@ func (c *conntrackWalker) handleFlow(f flow, forceAdd bool) {

// walkFlows calls f with all active flows and flows that have come and gone
// since the last call to walkFlows
func (c *conntrackWalker) walkFlows(f func(flow)) {
func (c *conntrackWalker) walkFlows(f func(flow, bool)) {
c.Lock()
defer c.Unlock()
for _, flow := range c.activeFlows {
f(flow)
f(flow, true)
}
for _, flow := range c.bufferedFlows {
f(flow)
f(flow, false)
}
c.bufferedFlows = c.bufferedFlows[:0]
}
256 changes: 256 additions & 0 deletions probe/endpoint/ebpf.go
@@ -0,0 +1,256 @@
package endpoint

import (
"bytes"
"encoding/binary"
"net"
"strconv"
"sync"

log "github.com/Sirupsen/logrus"
bpflib "github.com/iovisor/gobpf/elf"
"github.com/kinvolk/tcptracer-bpf/pkg/byteorder"
"github.com/kinvolk/tcptracer-bpf/pkg/offsetguess"
)

type eventType uint32

// These constants should be in sync with the equivalent definitions in the ebpf program.
const (
_ eventType = iota
EventConnect
EventAccept
EventClose
)

func (e eventType) String() string {
switch e {
case EventConnect:
return "connect"
case EventAccept:
return "accept"
case EventClose:
return "close"
default:
return "unknown"
}
}

// tcpEvent should be in sync with the struct in the ebpf maps.
type tcpEvent struct {
// Timestamp must be the first field, the sorting depends on it
Timestamp uint64

CPU uint64
Type uint32
Pid uint32
Comm [16]byte
SAddr uint32
DAddr uint32
SPort uint16
DPort uint16
NetNS uint32
}

// An ebpfConnection represents a TCP connection
type ebpfConnection struct {
tuple fourTuple
networkNamespace string
incoming bool
pid int
}

type eventTracker interface {
handleConnection(eventType string, tuple fourTuple, pid int, networkNamespace string)
hasDied() bool
run()
walkConnections(f func(ebpfConnection))
initialize()
isInitialized() bool
stop()
}

var ebpfTracker *EbpfTracker

// nilTracker is a tracker that does nothing, and it implements the eventTracker interface.
// It is returned when the useEbpfConn flag is false.
type nilTracker struct{}

func (n nilTracker) handleConnection(_ string, _ fourTuple, _ int, _ string) {}
func (n nilTracker) hasDied() bool { return true }
func (n nilTracker) run() {}
func (n nilTracker) walkConnections(f func(ebpfConnection)) {}
func (n nilTracker) initialize() {}
func (n nilTracker) isInitialized() bool { return false }
func (n nilTracker) stop() {}

// EbpfTracker contains the sets of open and closed TCP connections.
// Closed connections are kept in the `closedConnections` slice for one iteration of `walkConnections`.
type EbpfTracker struct {
sync.Mutex
reader *bpflib.Module
initialized bool
dead bool

openConnections map[string]ebpfConnection
closedConnections []ebpfConnection
}

func newEbpfTracker(useEbpfConn bool) eventTracker {
if !useEbpfConn {
return &nilTracker{}
}

bpfObjectFile, err := findBpfObjectFile()
if err != nil {
log.Errorf("Cannot find BPF object file: %v", err)
return &nilTracker{}
}

bpfPerfEvent := bpflib.NewModule(bpfObjectFile)
if bpfPerfEvent == nil {
return &nilTracker{}
}
err = bpfPerfEvent.Load()
if err != nil {
log.Errorf("Error loading BPF program: %v", err)
return &nilTracker{}
}

bpfPerfEvent.EnableKprobes()

tracker := &EbpfTracker{
openConnections: map[string]ebpfConnection{},
reader: bpfPerfEvent,
}
tracker.run()

ebpfTracker = tracker
return tracker
}

func (t *EbpfTracker) handleConnection(eventType string, tuple fourTuple, pid int, networkNamespace string) {
t.Lock()
defer t.Unlock()
log.Debugf("handleConnection(%v, [%v:%v --> %v:%v], pid=%v, netNS=%v)",
eventType, tuple.fromAddr, tuple.fromPort, tuple.toAddr, tuple.toPort, pid, networkNamespace)

switch eventType {
case "connect":
conn := ebpfConnection{
incoming: false,
tuple: tuple,
pid: pid,
networkNamespace: networkNamespace,
}
t.openConnections[tuple.String()] = conn
case "accept":
conn := ebpfConnection{
incoming: true,
tuple: tuple,
pid: pid,
networkNamespace: networkNamespace,
}
t.openConnections[tuple.String()] = conn
case "close":
if deadConn, ok := t.openConnections[tuple.String()]; ok {
delete(t.openConnections, tuple.String())
t.closedConnections = append(t.closedConnections, deadConn)
} else {
log.Errorf("EbpfTracker error: unmatched close event: %s pid=%d netns=%s", tuple.String(), pid, networkNamespace)
}
}
}

func tcpEventCallback(event tcpEvent) {
var alive bool
typ := eventType(event.Type)
pid := event.Pid & 0xffffffff

saddrbuf := make([]byte, 4)
daddrbuf := make([]byte, 4)

byteorder.Host.PutUint32(saddrbuf, uint32(event.SAddr))
byteorder.Host.PutUint32(daddrbuf, uint32(event.DAddr))

sIP := net.IPv4(saddrbuf[0], saddrbuf[1], saddrbuf[2], saddrbuf[3])
dIP := net.IPv4(daddrbuf[0], daddrbuf[1], daddrbuf[2], daddrbuf[3])

sport := event.SPort
dport := event.DPort

if typ.String() == "close" || typ.String() == "unknown" {
alive = true
} else {
alive = false
}
tuple := fourTuple{sIP.String(), dIP.String(), uint16(sport), uint16(dport), alive}

log.Debugf("tcpEventCallback(%v, [%v:%v --> %v:%v], pid=%v, netNS=%v, cpu=%v, ts=%v)",
typ.String(), tuple.fromAddr, tuple.fromPort, tuple.toAddr, tuple.toPort, pid, event.NetNS, event.CPU, event.Timestamp)
ebpfTracker.handleConnection(typ.String(), tuple, int(pid), strconv.FormatUint(uint64(event.NetNS), 10))
}

// walkConnections calls f with all open connections and connections that have come and gone
// since the last call to walkConnections
func (t *EbpfTracker) walkConnections(f func(ebpfConnection)) {
t.Lock()
defer t.Unlock()

for _, connection := range t.openConnections {
f(connection)
}
for _, connection := range t.closedConnections {
f(connection)
}
t.closedConnections = t.closedConnections[:0]
}

func (t *EbpfTracker) run() {
if err := offsetguess.Guess(t.reader); err != nil {
log.Errorf("%v\n", err)
return
}

channel := make(chan []byte)

go func() {
var event tcpEvent
for {
data := <-channel
err := binary.Read(bytes.NewBuffer(data), byteorder.Host, &event)
if err != nil {
log.Errorf("Failed to decode received data: %s\n", err)
continue
}
tcpEventCallback(event)
}
}()

pmIPv4, err := bpflib.InitPerfMap(t.reader, "tcp_event_ipv4", channel)
if err != nil {
log.Errorf("%v\n", err)
return
}

pmIPv4.PollStart()
}

func (t *EbpfTracker) hasDied() bool {
t.Lock()
defer t.Unlock()

return t.dead
}

func (t *EbpfTracker) initialize() {
t.initialized = true
}

func (t *EbpfTracker) isInitialized() bool {
return t.initialized
}

func (t *EbpfTracker) stop() {
// TODO: stop the go routine in run()
}
7 changes: 7 additions & 0 deletions probe/endpoint/ebpf_linux.go
@@ -0,0 +1,7 @@
//+build linux

package endpoint

func findBpfObjectFile() (string, error) {
return "/usr/libexec/scope/ebpf/ebpf.o", nil
}
9 changes: 9 additions & 0 deletions probe/endpoint/ebpf_unsupported.go
@@ -0,0 +1,9 @@
//+build !linux

package endpoint

import "fmt"

func findBpfObjectFile() (string, error) {
return "", fmt.Errorf("not supported")
}

5 comments on commit 544f5c7

alban commented on 544f5c7 (Jan 17, 2017):

s/ther/their

iaguis commented on 544f5c7 (Jan 17, 2017):

More like s/ther/the/

alban commented on 544f5c7 (Jan 17, 2017):

> Scope fetches the pre-compiled ebpf program from

In addition to that explanation, I would like to mention which variable in the Makefile should be modified to update to a newer version of the ebpf code.

alban commented on 544f5c7 (Jan 17, 2017):

> The ebpf program uses kprobes on the following kernel functions:

The ebpf program uses kprobes mainly on the following kernel functions:

Either add "mainly" or mention tcp_set_state, and kretprobes.

alban commented on 544f5c7 (Jan 17, 2017):

> One of the two conntrack instances was removed since eBPF detects short-lived connections.

Can you clarify "removed"? It is still started, but stopped after getting the initial state.
