This repository has been archived by the owner on Apr 20, 2021. It is now read-only.
forked from weaveworks/scope
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add eBPF connection tracking without dependencies on kernel headers
Based on work from Lorenzo, updated by Iago, Alban, Alessandro and Michael.

This PR adds connection tracking using eBPF. This feature is not enabled by default. For now, you can enable it by launching scope with the following command:

```
sudo ./scope launch --probe.ebpf.connections=true
```

Scope Probe also falls back on the old /proc parsing if eBPF is not working (e.g. the kernel is too old).

This patch allows scope to get notified of every connection event, without relying on the parsing of /proc/$pid/net/tcp{,6} and /proc/$pid/fd/*, and therefore improves performance.

We vendor https://github.com/iovisor/gobpf in Scope to load the pre-compiled ebpf program and https://github.com/kinvolk/tcptracer-bpf to guess the offsets of the structures we need in the kernel. In this way we don't need a different pre-compiled ebpf object file per kernel.

Scope fetches the pre-compiled ebpf program from https://hub.docker.com/r/kinvolk/tcptracer-bpf/ (see https://github.com/kinvolk/tcptracer-bpf).

The ebpf program uses kprobes on the following kernel functions:
- tcp_v4_connect
- tcp_v6_connect
- inet_csk_accept
- tcp_close

It generates "connect", "accept" and "close" events containing the connection tuple but also pid and netns. Note: the IPv6 events are not plugged in Scope.

probe/endpoint/ebpf.go maintains the list of connections. Similarly to conntrack, it also keeps the dead connections for one iteration in order to report short-lived connections.

The code for parsing /proc/$pid/net/tcp{,6} and /proc/$pid/fd/* is still there and still used at start-up because eBPF only brings us the events and not the initial state. However, the /proc parsing for the initial state is now done in foreground instead of background, via newForegroundReader().

NAT resolution on connections from eBPF works in the same way as it did on connections from /proc: by using conntrack. One of the two conntrack instances was removed since eBPF detects short-lived connections.
The Scope Docker image size comparison:
- weaveworks/scope in current master: 22 MB (compressed), 68 MB (uncompressed)
- weaveworks/scope with this patchset: 23 MB (compressed), 69 MB (uncompressed)

Fixes weaveworks#1168 (walking /proc to obtain connections is very expensive)
Fixes weaveworks#1260 (Short-lived connections not tracked for containers in shared networking namespaces)
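The bookkeeping described in the commit message (open connections plus closed ones that are kept for a single reporting pass) can be tried out in isolation. The following is only a minimal sketch under simplifying assumptions: `conn`, `tracker`, `handle`, `walk` and `count` are hypothetical stand-ins, and a plain string key replaces the real `fourTuple`.

```go
package main

import (
	"fmt"
	"sync"
)

// conn is a simplified, hypothetical stand-in for ebpfConnection.
type conn struct {
	key      string // stand-in for the four-tuple
	incoming bool
	pid      int
}

// tracker keeps the open connections plus those closed since the last walk.
type tracker struct {
	mu     sync.Mutex
	open   map[string]conn
	closed []conn
}

func newTracker() *tracker {
	return &tracker{open: map[string]conn{}}
}

// handle applies one "connect", "accept" or "close" event.
func (t *tracker) handle(event, key string, pid int) {
	t.mu.Lock()
	defer t.mu.Unlock()
	switch event {
	case "connect", "accept":
		t.open[key] = conn{key: key, incoming: event == "accept", pid: pid}
	case "close":
		if c, ok := t.open[key]; ok {
			delete(t.open, key)
			t.closed = append(t.closed, c)
		}
	}
}

// walk reports every open connection and every connection closed since the
// previous walk, then forgets the closed ones.
func (t *tracker) walk(f func(conn)) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for _, c := range t.open {
		f(c)
	}
	for _, c := range t.closed {
		f(c)
	}
	t.closed = t.closed[:0]
}

// count returns how many connections a single walk reports.
func count(t *tracker) int {
	n := 0
	t.walk(func(_ conn) { n++ })
	return n
}

func main() {
	tr := newTracker()
	tr.handle("connect", "10.0.0.1:40000-10.0.0.2:80", 42)
	tr.handle("close", "10.0.0.1:40000-10.0.0.2:80", 42)
	fmt.Println(count(tr)) // prints 1: the short-lived connection is still reported
	fmt.Println(count(tr)) // prints 0: it is gone on the next pass
}
```

A connection that opens and closes between two reporting passes is therefore still seen exactly once, which is the behaviour weaveworks#1260 asks for.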
Showing 19 changed files with 517 additions and 80 deletions.
@@ -0,0 +1,256 @@
package endpoint

import (
	"bytes"
	"encoding/binary"
	"net"
	"strconv"
	"sync"

	log "github.com/Sirupsen/logrus"
	bpflib "github.com/iovisor/gobpf/elf"
	"github.com/kinvolk/tcptracer-bpf/pkg/byteorder"
	"github.com/kinvolk/tcptracer-bpf/pkg/offsetguess"
)

type eventType uint32

// These constants should be in sync with the equivalent definitions in the ebpf program.
const (
	_ eventType = iota
	EventConnect
	EventAccept
	EventClose
)

func (e eventType) String() string {
	switch e {
	case EventConnect:
		return "connect"
	case EventAccept:
		return "accept"
	case EventClose:
		return "close"
	default:
		return "unknown"
	}
}

// tcpEvent should be in sync with the struct in the ebpf maps.
type tcpEvent struct {
	// Timestamp must be the first field, the sorting depends on it
	Timestamp uint64

	CPU   uint64
	Type  uint32
	Pid   uint32
	Comm  [16]byte
	SAddr uint32
	DAddr uint32
	SPort uint16
	DPort uint16
	NetNS uint32
}

// An ebpfConnection represents a TCP connection
type ebpfConnection struct {
	tuple            fourTuple
	networkNamespace string
	incoming         bool
	pid              int
}

type eventTracker interface {
	handleConnection(eventType string, tuple fourTuple, pid int, networkNamespace string)
	hasDied() bool
	run()
	walkConnections(f func(ebpfConnection))
	initialize()
	isInitialized() bool
	stop()
}

var ebpfTracker *EbpfTracker

// nilTracker is a tracker that does nothing, and it implements the eventTracker interface.
// It is returned when the useEbpfConn flag is false.
type nilTracker struct{}

func (n nilTracker) handleConnection(_ string, _ fourTuple, _ int, _ string) {}
func (n nilTracker) hasDied() bool { return true }
func (n nilTracker) run() {}
func (n nilTracker) walkConnections(f func(ebpfConnection)) {}
func (n nilTracker) initialize() {}
func (n nilTracker) isInitialized() bool { return false }
func (n nilTracker) stop() {}

// EbpfTracker contains the sets of open and closed TCP connections.
// Closed connections are kept in the `closedConnections` slice for one iteration of `walkConnections`.
type EbpfTracker struct {
	sync.Mutex
	reader      *bpflib.Module
	initialized bool
	dead        bool

	openConnections   map[string]ebpfConnection
	closedConnections []ebpfConnection
}

func newEbpfTracker(useEbpfConn bool) eventTracker {
	if !useEbpfConn {
		return &nilTracker{}
	}

	bpfObjectFile, err := findBpfObjectFile()
	if err != nil {
		log.Errorf("Cannot find BPF object file: %v", err)
		return &nilTracker{}
	}

	bpfPerfEvent := bpflib.NewModule(bpfObjectFile)
	if bpfPerfEvent == nil {
		return &nilTracker{}
	}
	err = bpfPerfEvent.Load()
	if err != nil {
		log.Errorf("Error loading BPF program: %v", err)
		return &nilTracker{}
	}

	bpfPerfEvent.EnableKprobes()

	tracker := &EbpfTracker{
		openConnections: map[string]ebpfConnection{},
		reader:          bpfPerfEvent,
	}
	// Publish the tracker before run() starts polling: tcpEventCallback
	// dereferences the package-level ebpfTracker for every event, so an
	// event arriving before this assignment would hit a nil pointer.
	ebpfTracker = tracker
	tracker.run()

	return tracker
}

func (t *EbpfTracker) handleConnection(eventType string, tuple fourTuple, pid int, networkNamespace string) {
	t.Lock()
	defer t.Unlock()
	log.Debugf("handleConnection(%v, [%v:%v --> %v:%v], pid=%v, netNS=%v)",
		eventType, tuple.fromAddr, tuple.fromPort, tuple.toAddr, tuple.toPort, pid, networkNamespace)

	switch eventType {
	case "connect":
		conn := ebpfConnection{
			incoming:         false,
			tuple:            tuple,
			pid:              pid,
			networkNamespace: networkNamespace,
		}
		t.openConnections[tuple.String()] = conn
	case "accept":
		conn := ebpfConnection{
			incoming:         true,
			tuple:            tuple,
			pid:              pid,
			networkNamespace: networkNamespace,
		}
		t.openConnections[tuple.String()] = conn
	case "close":
		if deadConn, ok := t.openConnections[tuple.String()]; ok {
			delete(t.openConnections, tuple.String())
			t.closedConnections = append(t.closedConnections, deadConn)
		} else {
			log.Errorf("EbpfTracker error: unmatched close event: %s pid=%d netns=%s", tuple.String(), pid, networkNamespace)
		}
	}
}

func tcpEventCallback(event tcpEvent) {
	var alive bool
	typ := eventType(event.Type)
	pid := event.Pid & 0xffffffff

	saddrbuf := make([]byte, 4)
	daddrbuf := make([]byte, 4)

	byteorder.Host.PutUint32(saddrbuf, uint32(event.SAddr))
	byteorder.Host.PutUint32(daddrbuf, uint32(event.DAddr))

	sIP := net.IPv4(saddrbuf[0], saddrbuf[1], saddrbuf[2], saddrbuf[3])
	dIP := net.IPv4(daddrbuf[0], daddrbuf[1], daddrbuf[2], daddrbuf[3])

	sport := event.SPort
	dport := event.DPort

	if typ.String() == "close" || typ.String() == "unknown" {
		alive = true
	} else {
		alive = false
	}
	tuple := fourTuple{sIP.String(), dIP.String(), uint16(sport), uint16(dport), alive}

	log.Debugf("tcpEventCallback(%v, [%v:%v --> %v:%v], pid=%v, netNS=%v, cpu=%v, ts=%v)",
		typ.String(), tuple.fromAddr, tuple.fromPort, tuple.toAddr, tuple.toPort, pid, event.NetNS, event.CPU, event.Timestamp)
	ebpfTracker.handleConnection(typ.String(), tuple, int(pid), strconv.FormatUint(uint64(event.NetNS), 10))
}

// walkConnections calls f with all open connections and connections that have come and gone
// since the last call to walkConnections
func (t *EbpfTracker) walkConnections(f func(ebpfConnection)) {
	t.Lock()
	defer t.Unlock()

	for _, connection := range t.openConnections {
		f(connection)
	}
	for _, connection := range t.closedConnections {
		f(connection)
	}
	t.closedConnections = t.closedConnections[:0]
}

func (t *EbpfTracker) run() {
	if err := offsetguess.Guess(t.reader); err != nil {
		log.Errorf("%v\n", err)
		return
	}

	channel := make(chan []byte)

	go func() {
		var event tcpEvent
		for {
			data := <-channel
			err := binary.Read(bytes.NewBuffer(data), byteorder.Host, &event)
			if err != nil {
				log.Errorf("Failed to decode received data: %s\n", err)
				continue
			}
			tcpEventCallback(event)
		}
	}()

	pmIPv4, err := bpflib.InitPerfMap(t.reader, "tcp_event_ipv4", channel)
	if err != nil {
		log.Errorf("%v\n", err)
		return
	}

	pmIPv4.PollStart()
}

func (t *EbpfTracker) hasDied() bool {
	t.Lock()
	defer t.Unlock()

	return t.dead
}

func (t *EbpfTracker) initialize() {
	t.initialized = true
}

func (t *EbpfTracker) isInitialized() bool {
	return t.initialized
}

func (t *EbpfTracker) stop() {
	// TODO: stop the go routine in run()
}
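The `stop()` method above is left as a TODO, and the decoding loop in `run()` has no exit path. One possible shape for a stoppable loop is sketched below. This is an illustration under stated assumptions, not what the commit does: `event`, `decode`, `ipv4` and `consume` are hypothetical names, the `done` channel is one of several ways to signal shutdown, and `binary.LittleEndian` stands in for `byteorder.Host` so that the example needs only the standard library.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"net"
)

// event mirrors the field layout of tcpEvent in the diff above.
type event struct {
	Timestamp uint64
	CPU       uint64
	Type      uint32
	Pid       uint32
	Comm      [16]byte
	SAddr     uint32
	DAddr     uint32
	SPort     uint16
	DPort     uint16
	NetNS     uint32
}

// order is an assumption: the original uses byteorder.Host from tcptracer-bpf.
var order = binary.LittleEndian

// decode turns one raw record into an event.
func decode(data []byte) (event, error) {
	var e event
	err := binary.Read(bytes.NewReader(data), order, &e)
	return e, err
}

// ipv4 converts an address stored as a uint32 into a net.IP, using the same
// byte order as decode.
func ipv4(addr uint32) net.IP {
	b := make([]byte, 4)
	order.PutUint32(b, addr)
	return net.IPv4(b[0], b[1], b[2], b[3])
}

// consume decodes records from in and passes them to handle until done is closed.
func consume(in <-chan []byte, done <-chan struct{}, handle func(event)) {
	for {
		select {
		case <-done:
			return
		case data := <-in:
			e, err := decode(data)
			if err != nil {
				continue // the original logs the error and skips the record
			}
			handle(e)
		}
	}
}

func main() {
	in := make(chan []byte)
	done := make(chan struct{})
	out := make(chan event)
	go consume(in, done, func(e event) { out <- e })

	sample := event{Type: 1, Pid: 42, SAddr: 0x0100007f, DAddr: 0x0200007f, SPort: 40000, DPort: 80}
	var buf bytes.Buffer
	if err := binary.Write(&buf, order, sample); err != nil {
		panic(err)
	}
	in <- buf.Bytes()
	e := <-out
	fmt.Println(e.Pid, ipv4(e.SAddr), ipv4(e.DAddr), e.SPort, e.DPort)

	close(done) // what a stop() method could do to end the goroutine
}
```

With a loop of this shape, `stop()` would only need to close the `done` channel; stopping the perf map polling itself would still have to be handled through gobpf.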
@@ -0,0 +1,7 @@
//+build linux

package endpoint

func findBpfObjectFile() (string, error) {
	return "/usr/libexec/scope/ebpf/ebpf.o", nil
}
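The Linux variant above returns a fixed path with a nil error, so a missing object file would only surface later when the module is loaded. If an earlier and more specific error were wanted, a check along the following lines could be used. This is a hypothetical variant, not part of the commit: `findObjectFile` and `defaultObjectPath` are illustrative names, and only the path itself comes from the diff.

```go
package main

import (
	"fmt"
	"os"
)

// defaultObjectPath is the path hard-coded in the diff above.
const defaultObjectPath = "/usr/libexec/scope/ebpf/ebpf.o"

// findObjectFile is a hypothetical variant of findBpfObjectFile that checks
// that the object file exists and is not a directory before returning it.
func findObjectFile(path string) (string, error) {
	info, err := os.Stat(path)
	if err != nil {
		return "", fmt.Errorf("cannot find BPF object file: %v", err)
	}
	if info.IsDir() {
		return "", fmt.Errorf("%s is a directory, not a BPF object file", path)
	}
	return path, nil
}

func main() {
	if path, err := findObjectFile(defaultObjectPath); err != nil {
		// In Scope this is the point where the probe would use /proc parsing instead.
		fmt.Println("eBPF unavailable:", err)
	} else {
		fmt.Println("using", path)
	}
}
```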
@@ -0,0 +1,9 @@
//+build !linux

package endpoint

import "fmt"

func findBpfObjectFile() (string, error) {
	return "", fmt.Errorf("not supported")
}
544f5c7
s/ther/their
544f5c7
More like
s/ther/the/
544f5c7
In addition to that explanation, I would like to mention which variable in the Makefile should be modified to update to a newer version of the ebpf code.
544f5c7
The ebpf program uses kprobes mainly on the following kernel functions:
Either add "mainly" or mention tcp_set_state, and kretprobes.
544f5c7
Can you clarify the "removed"? It is still started, but stopped after getting the initial state