Run-time detection of kernel headers location for containers (follow up of #397) #743

Closed
alban opened this issue Oct 10, 2016 · 4 comments

@alban
Contributor

alban commented Oct 10, 2016

I am trying to run bcc from within a Docker container. The Dockerfile has the following:

FROM zlim/bcc
RUN apt-get update -y && apt-get install -y bash runit conntrack iproute2 \
    util-linux curl python bcc-tools python-bcc libbcc

And then, I run the Docker container with options like:

docker run --rm --privileged -v /var/run:/var/run \
    -v /lib/modules:/lib/modules --net=host --pid=host ...

bcc does not always find the file ./include/linux/kconfig.h; whether it does depends on the host distribution. According to #397 (comment), the headers can be in the source directory or in the build directory, depending on the distribution, and the compilation option -DBCC_KERNEL_HAS_SOURCE_DIR=1 distinguishes between the two cases.

But when I build the Docker image, I don't yet know whether the host distribution will split the source and build directories.

Could bcc check for separate source and build directories at run-time instead of build-time? In my example, the bcc package in the "zlim/bcc" image, which is based on Ubuntu 16.04, might not have been built with the correct compilation option for the host where the Docker image runs. This fails on CoreOS, which has /lib/modules/4.6.3-coreos/{build,source}.
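
To make the request concrete, here is a minimal sketch of what a run-time check could look like. It is written in Python purely for illustration and is not bcc's implementation. It assumes that /lib/modules/<release>/build exists whenever headers are installed, and that a split layout shows up as a "source" directory resolving to a different location than "build". The helper names `kernel_header_dirs` and `find_kconfig` are hypothetical.

```python
import os
import platform


def kernel_header_dirs(release=None, modules_root="/lib/modules"):
    """Return the kernel header directories to search, decided at run time.

    Hypothetical helper, not part of bcc. The result lists the source
    directory first (when it is a distinct directory) followed by the
    build directory, or an empty list if no build directory is present.
    """
    release = release or platform.release()
    base = os.path.join(modules_root, release)
    build = os.path.join(base, "build")
    source = os.path.join(base, "source")

    if not os.path.isdir(build):
        return []

    # A split layout has a "source" directory that resolves to a
    # different location than "build".
    split = os.path.isdir(source) and (
        os.path.realpath(source) != os.path.realpath(build)
    )
    return [source, build] if split else [build]


def find_kconfig(dirs):
    """Return the first include/linux/kconfig.h found in dirs, or None."""
    for d in dirs:
        candidate = os.path.join(d, "include", "linux", "kconfig.h")
        if os.path.isfile(candidate):
            return candidate
    return None
```

Inside the container from the example above, `find_kconfig(kernel_header_dirs())` would search the bind-mounted /lib/modules of the host, so the decision is made against the host's layout and not the one the image was built on.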

/cc @iaguis @asymmetric

@krnowak
Contributor

krnowak commented Oct 26, 2016

There is code for that here; it just needs testing: https://github.com/kinvolk/bcc/tree/krnowak/kernel-dirs-at-runtime

alban pushed a commit to kinvolk-archives/scope that referenced this issue Oct 31, 2016
Introduces connection tracking via eBPF. This allows scope to get
notified of every connection event, without relying on the parsing of
/proc/$pid/net/tcp{,6} and /proc/$pid/fd/*, and therefore improve
performance.

The eBPF program is in a Python script using bcc: docker/tcpv4tracer.py.
It is contributed upstream via iovisor/bcc#762.
It uses kprobes on the following kernel functions:
- tcp_v4_connect
- inet_csk_accept
- tcp_close

It generates "connect", "accept" and "close" events containing the
connection tuple but also the pid and the netns.

The Python script is piped into the Scope Probe and
probe/endpoint/ebpf.go maintains the list of connections. Similarly to
conntrack, we keep the dead connections for one iteration in order to
report the short-lived connections.
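
The "keep dead connections for one iteration" idea can be sketched as follows. This is an illustrative Python model, not the Go code in probe/endpoint/ebpf.go. The `Event` shape and the `ConnectionTracker` class are hypothetical; the text above only states that events carry the connection tuple, the pid and the netns.

```python
from collections import namedtuple

# Hypothetical event shape: kind is "connect", "accept" or "close".
Event = namedtuple("Event", "kind tuple pid netns")


class ConnectionTracker:
    def __init__(self):
        self.open = {}  # connection tuple -> (pid, netns)
        self.dead = {}  # connections closed since the last report

    def handle(self, event):
        if event.kind in ("connect", "accept"):
            self.open[event.tuple] = (event.pid, event.netns)
        elif event.kind == "close":
            # Move the connection to the dead set instead of dropping it,
            # so that the next report still includes it.
            info = self.open.pop(event.tuple, (event.pid, event.netns))
            self.dead[event.tuple] = info

    def report(self):
        """Return open and recently closed connections, then forget the dead."""
        snapshot = dict(self.open)
        snapshot.update(self.dead)
        self.dead = {}
        return snapshot
```

A connection that opens and closes between two reports appears in exactly one report and is gone from the next one.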

The code for parsing /proc/$pid/net/tcp{,6} and /proc/$pid/fd/* is still
there and still used at start-up because eBPF only brings us the events
and not the initial state. However, the /proc parsing for the initial
state is now done in foreground instead of background, via
newForegroundReader().
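
For readers unfamiliar with the /proc format, here is a rough Python sketch of reading the initial state from a /proc/$pid/net/tcp file. It is not the Scope reader, which is written in Go. It assumes the IPv4 table on a little-endian host, and the function name `parse_proc_net_tcp` is hypothetical.

```python
import socket
import struct


def parse_proc_net_tcp(text):
    """Yield (local_ip, local_port, remote_ip, remote_port, state) tuples.

    Assumes the IPv4 /proc/net/tcp format on a little-endian host, where
    each address is 8 hex digits of the IP followed by ':' and 4 hex
    digits of the port.
    """
    def decode(addr):
        ip_hex, port_hex = addr.split(":")
        ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
        return ip, int(port_hex, 16)

    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        local_ip, local_port = decode(fields[1])
        remote_ip, remote_port = decode(fields[2])
        yield local_ip, local_port, remote_ip, remote_port, int(fields[3], 16)
```

The file only gives a snapshot of the sockets that exist at read time, which is why it is used for the initial state while eBPF supplies the subsequent events.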

Scope Probe also falls back on the old /proc parsing if eBPF is not
working (e.g. the kernel is too old, or kernel headers are missing).
There is also a flag "probe.ebpf.connections" that disables eBPF if set
to false.

NAT resolution on connections from eBPF works in the same way as it did
on connections from /proc: by using conntrack.

The Scope Docker image is bigger because we need a few more packages
for bcc:
- weaveworks/scope in current master:  22 MB
- weaveworks/scope with this patch:   147 MB

Limitations:
- [ ] Does not support IPv6
- [ ] Sets `procspied: true` on connections coming from eBPF
- [ ] Size of the Docker images: 6 times bigger
- [ ] Requirement on kernel headers
- [ ] Location of kernel headers: iovisor/bcc#743

Fixes weaveworks#1168 (walking /proc to
obtain connections is very expensive)

Fixes weaveworks#1260 (Short-lived
connections not tracked for containers in shared networking namespaces)
alban referenced this issue in kinvolk-archives/scope-http-statistics Nov 1, 2016
This patch adds an example of how to run the plugin on a Kubernetes cluster.
@schu

schu commented Nov 15, 2016

> There is the code for that here, just needs testing - https://github.com/kinvolk/bcc/tree/krnowak/kernel-dirs-at-runtime

Tested manually on Debian testing, works for me 👍

@4ast
Member

4ast commented Nov 15, 2016

@krnowak interesting idea! Could you send a PR, so we can test it broadly? cc @drzaeus77 @dark

alban pushed a commit to kinvolk-archives/scope that referenced this issue Nov 16, 2016
alban pushed a commit to kinvolk-archives/scope that referenced this issue Nov 16, 2016
alban pushed a commit to kinvolk-archives/scope that referenced this issue Nov 21, 2016
alban pushed a commit to kinvolk-archives/scope that referenced this issue Nov 22, 2016
alban pushed a commit to kinvolk-archives/scope that referenced this issue Nov 22, 2016
Based on work from Lorenzo, updated by Iago and Alban

This is the second attempt to add eBPF connection tracking. The first
one was via weaveworks#1967 by forking a
python script using bcc. This one is done in Golang directly thanks to
[gobpf](https://github.com/iovisor/gobpf).

This is not enabled by default. For now, it should be enabled manually
with:
```
sudo ./scope launch --probe.ebpf.connections=true
```
Scope Probe also falls back on the old /proc parsing if eBPF is not
working (e.g. the kernel is too old, or kernel headers are missing).

This allows scope to get notified of every connection event, without
relying on the parsing of /proc/$pid/net/tcp{,6} and /proc/$pid/fd/*,
and therefore improve performance.

The eBPF program is in probe/endpoint/ebpf.go. It was discussed in bcc
via iovisor/bcc#762.
It is using kprobes on the following kernel functions:
- tcp_v4_connect
- inet_csk_accept
- tcp_close

It generates "connect", "accept" and "close" events containing the
connection tuple but also the pid and the netns.

probe/endpoint/ebpf.go maintains the list of connections. Similarly to
conntrack, we keep the dead connections for one iteration in order to
report the short-lived connections.

The code for parsing /proc/$pid/net/tcp{,6} and /proc/$pid/fd/* is still
there and still used at start-up because eBPF only brings us the events
and not the initial state. However, the /proc parsing for the initial
state is now done in foreground instead of background, via
newForegroundReader().

NAT resolution on connections from eBPF works in the same way as it did
on connections from /proc: by using conntrack. One of the two conntrack
instances was removed since eBPF is able to get short-lived connections.

The Scope Docker image is bigger because we need a few more packages
for bcc:
- weaveworks/scope in current master: 22 MB (compressed), 71 MB
  (uncompressed)
- weaveworks/scope with this patch:   97 MB (compressed), 363 MB
  (uncompressed)

But @iaguis has ongoing work to reduce the size of the image.

Limitations:
- [ ] Does not support IPv6
- [ ] Sets `procspied: true` on connections coming from eBPF
- [ ] Size of the Docker images
- [ ] Requirement on kernel headers for now
- [ ] Location of kernel headers: iovisor/bcc#743

Fixes weaveworks#1168 (walking /proc to obtain connections is very expensive)

Fixes weaveworks#1260 (Short-lived connections not tracked for containers in
shared networking namespaces)
alban pushed a commit to kinvolk-archives/scope that referenced this issue Nov 22, 2016
Based on work from Lorenzo, updated by Iago and Alban

This is the second attempt to add eBPF connection tracking. The first
one was via weaveworks#1967 by forking a
python script using bcc. This one is done in Golang directly thanks to
[gobpf](https://github.com/iovisor/gobpf).

This is not enabled by default. For now, it should be enabled manually
with:
```
sudo ./scope launch --probe.ebpf.connections=true
```
Scope Probe also falls back on the old /proc parsing if eBPF is not
working (e.g. the kernel is too old, or kernel headers are missing).

This allows scope to get notified of every connection event, without
relying on the parsing of /proc/$pid/net/tcp{,6} and /proc/$pid/fd/*,
and therefore improve performance.

The eBPF program is in probe/endpoint/ebpf.go. It was discussed in bcc
via iovisor/bcc#762.
It is using kprobes on the following kernel functions:
- tcp_v4_connect
- inet_csk_accept
- tcp_close

It generates "connect", "accept" and "close" events containing the
connection tuple but also the pid and the netns.

probe/endpoint/ebpf.go maintains the list of connections. Similarly to
conntrack, we keep the dead connections for one iteration in order to
report the short-lived connections.

The code for parsing /proc/$pid/net/tcp{,6} and /proc/$pid/fd/* is still
there and still used at start-up because eBPF only brings us the events
and not the initial state. However, the /proc parsing for the initial
state is now done in foreground instead of background, via
newForegroundReader().

NAT resolution on connections from eBPF works in the same way as it did
on connections from /proc: by using conntrack. One of the two conntrack
instances was removed since eBPF is able to get short-lived connections.

The Scope Docker image is bigger because we need a few more packages
for bcc:
- weaveworks/scope in current master:  22 MB (compressed),  71 MB
  (uncompressed)
- weaveworks/scope with this patchset: 83 MB (compressed), 223 MB
  (uncompressed)

But @iaguis has ongoing work to reduce the size of the image.

Limitations:
- [ ] Does not support IPv6
- [ ] Sets `procspied: true` on connections coming from eBPF
- [ ] Size of the Docker images
- [ ] Requirement on kernel headers for now
- [ ] Location of kernel headers: iovisor/bcc#743

Fixes weaveworks#1168 (walking /proc to obtain connections is very expensive)

Fixes weaveworks#1260 (Short-lived connections not tracked for containers in
shared networking namespaces)
alban pushed a commit to kinvolk-archives/scope that referenced this issue Nov 22, 2016
@schu

schu commented Nov 29, 2016

> Could you send a PR, so we can test it broadly?

We opened a PR: #823

drzaeus77 added a commit that referenced this issue Nov 29, 2016
alepuccetti pushed a commit to kinvolk-archives/scope that referenced this issue Dec 7, 2016
alepuccetti pushed a commit to kinvolk-archives/scope that referenced this issue Dec 7, 2016
alepuccetti pushed a commit to kinvolk-archives/scope that referenced this issue Dec 7, 2016
iaguis pushed a commit to kinvolk-archives/scope that referenced this issue Dec 7, 2016