Add eBPF connection tracking #1967
Conversation
Introduces connection tracking via eBPF. This allows Scope to be notified of every connection event without relying on parsing /proc/$pid/net/tcp{,6} and /proc/$pid/fd/*, thereby improving performance.

The eBPF program is in a python script using bcc: docker/tcpv4tracer.py. It is contributed upstream via iovisor/bcc#762. It uses kprobes on the following kernel functions:

- tcp_v4_connect
- inet_csk_accept
- tcp_close

It generates "connect", "accept" and "close" events containing the connection tuple as well as the pid and the netns.

The python script is piped into the Scope Probe, and probe/endpoint/ebpf.go maintains the list of connections. As with conntrack, we keep dead connections for one iteration in order to report short-lived connections.

The code for parsing /proc/$pid/net/tcp{,6} and /proc/$pid/fd/* is still there and still used at start-up, because eBPF only brings us the events, not the initial state. However, the /proc parsing for the initial state is now done in the foreground instead of the background, via newForegroundReader().

Scope Probe also falls back on the old /proc parsing if eBPF is not working (e.g. a kernel that is too old, or missing kernel headers). There is also a flag, "probe.ebpf.connections", which disables eBPF when set to false.

NAT resolution on connections from eBPF works the same way as it did for connections from /proc: by using conntrack.

The Scope Docker image is bigger because we need a few more packages for bcc:

- weaveworks/scope in current master: 22 MB
- weaveworks/scope with this patch: 147 MB

Limitations:

- [ ] Does not support IPv6
- [ ] Sets `procspied: true` on connections coming from eBPF
- [ ] Size of the Docker image: 6 times bigger
- [ ] Requirement on kernel headers
- [ ] Location of kernel headers: iovisor/bcc#743

Fixes weaveworks#1168 (walking /proc to obtain connections is very expensive)
Fixes weaveworks#1260 (short-lived connections not tracked for containers in shared networking namespaces)
@alban Is the loopback connection tracking problem seen in the last demo fixed in this PR?
Limitation:
WIP fix in https://github.com/kinvolk/bcc/commits/iaguis/tcp_tracer_established
Are there any other limitations apart from refused connections and the loopback connections? Can you document what testing has been done?
I could reproduce the issue with the loopback connection with the Scope master branch. In fact, I reproduce the issue both when connecting to the 127.0.0.1 address and when connecting to the eth0 address. I can see the two "nc" processes. When I look at /proc, I see they are connected, but Scope does not show it. Reproducer:
So it does not seem to be a regression from this branch.
The other limitations are listed in the description of the PR; beyond those, I don't know of any others.
Works for me:
Not tested:
Does not work:
Ah, I see. Could you please create a separate issue for it?
Done in #1972
To compare the performance between the master branch and the ebpf branch, I used the following script, dialer.sh. It reads the CPU time spent by the scope-probe process in kernel space and userspace from /proc. It takes two measurements: one over the first 10 seconds and one over the next 10 seconds. I only consider the second measurement, so that we don't count the initialization phase. The script makes sure there are 1000 TCP connections established to an nginx container.
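The CPU accounting the script relies on comes from /proc/<pid>/stat: utime and stime, fields 14 and 15 in proc(5), measured in clock ticks. A minimal Go sketch of that parsing (independent of the actual dialer.sh, which is a shell script):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cpuTicks extracts utime and stime (in clock ticks) from the contents of
// /proc/<pid>/stat. We split after the last ')' because the comm field
// (field 2) may itself contain spaces or parentheses.
func cpuTicks(stat string) (utime, stime uint64, err error) {
	i := strings.LastIndexByte(stat, ')')
	if i < 0 {
		return 0, 0, fmt.Errorf("malformed stat line")
	}
	fields := strings.Fields(stat[i+1:])
	// fields[0] is field 3 (state), so utime (field 14) is fields[11]
	// and stime (field 15) is fields[12].
	if len(fields) < 13 {
		return 0, 0, fmt.Errorf("too few fields in stat line")
	}
	if utime, err = strconv.ParseUint(fields[11], 10, 64); err != nil {
		return 0, 0, err
	}
	if stime, err = strconv.ParseUint(fields[12], 10, 64); err != nil {
		return 0, 0, err
	}
	return utime, stime, nil
}

func main() {
	// Synthetic stat line: utime=777, stime=333.
	u, s, err := cpuTicks("42 (scope-probe) S 1 1 1 0 -1 4194304 100 0 0 0 777 333 0 0 20 0 9 0 100 0 0")
	if err != nil {
		panic(err)
	}
	fmt.Println(u, s)
}
```

Sampling these values twice, 10 seconds apart, and subtracting gives the CPU time consumed in that window; dividing by the clock tick rate (usually 100 Hz, see sysconf(_SC_CLK_TCK)) converts ticks to seconds.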
It is not as expected, unfortunately. To investigate what's going on, we ran `go tool pprof`. pprof does not work with Golang 1.7, so we need to revert to Golang 1.6. Then I can edit dialer.sh to run Scope for longer and enable probe.http.listen with:
Then, we can run commands like:
See images. As expected, we see that spyLoop takes most of the time in master, but no longer in golang-ebpf. However, there is a regression in the GC-related functions in golang-ebpf: runtime.gcDrain, runtime.bgsweep and runtime.scanobject take most of the time. Looking at the code, I don't see much difference in the way objects are allocated and released between master and golang-ebpf, so I'm still investigating.
@alban It would help to get:
Side note: We should try to reduce the allocations (caching? pools?) in:
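One standard way to reduce such allocations in Go is to reuse objects through a sync.Pool instead of allocating fresh ones on every report iteration. This is a generic sketch of the technique, not Scope's actual code:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers; Get returns a pooled buffer when one
// is available, easing pressure on the garbage collector in hot paths.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// render formats a node ID using a pooled buffer. "render" and the format
// are hypothetical; only the pooling pattern itself is the point here.
func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer bufPool.Put(buf) // return the buffer for reuse
	buf.Reset()            // pooled buffers keep their old contents
	fmt.Fprintf(buf, "node:%s", name)
	return buf.String()
}

func main() {
	fmt.Println(render("scope-probe"))
}
```

Note that the pool may be emptied by any GC cycle, so it only helps for objects that are requested and returned at a high rate, which is exactly the per-report allocation pattern being discussed.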
No, with the proc connector branch, ReadDirNames() is called only once at startup to get the initial state. Note that I am testing without the proc connector code, since it is independent of the ebpf branch.
Flamegraphs

It does not seem possible to upload flamegraph SVGs on GitHub, imgur or similar, and converting the flamegraphs to PNG loses information. I have the flamegraphs
Object allocation profile

Blocking profiles
On master, with my test of 1000 connections, the reporter gets 1000 connections from conntrack but only 66 connections from /proc (checked with @iaguis' patch kinvolk-archives@ec98c48). So the test with 1000 connections is not a good comparison between master and the ebpf branch, because they don't handle the same number of connections. I tried to change the rate-limiting parameters with this patch on master. It becomes slower, but the reporter adds all the connections from /proc (instead of only 66), so it seems more correct.
Changing the timings in this way might not be what we want to test either since it makes master run the /proc parsing more often... @2opremio any idea of what would be a better test? |
Why is it a bad comparison? IMO it's a good overall black-box comparison.
They do, it's 1000 in both. The internals shouldn't matter for absolute CPU metrics. If I am following your reasoning correctly, it seems the problem is most likely that we are duplicating work with conntrack and ebpf. Did you disable conntrack for everything except nat? |
It seems you didn't:
Based on work from Lorenzo, updated by Iago and Alban.

This is the second attempt to add eBPF connection tracking. The first one was via weaveworks#1967, by forking a python script using bcc. This one is done in Golang directly, thanks to [gobpf](https://github.com/iovisor/gobpf).

This is not enabled by default. For now, it should be enabled manually with:

```
sudo ./scope launch --probe.ebpf.connections=true
```

Scope Probe also falls back on the old /proc parsing if eBPF is not working (e.g. a kernel that is too old, or missing kernel headers).

This allows Scope to be notified of every connection event without relying on parsing /proc/$pid/net/tcp{,6} and /proc/$pid/fd/*, thereby improving performance.

The eBPF program is in probe/endpoint/ebpf.go. It was discussed in bcc via iovisor/bcc#762. It uses kprobes on the following kernel functions:

- tcp_v4_connect
- inet_csk_accept
- tcp_close

It generates "connect", "accept" and "close" events containing the connection tuple as well as the pid and the netns. probe/endpoint/ebpf.go maintains the list of connections. As with conntrack, we keep dead connections for one iteration in order to report short-lived connections.

The code for parsing /proc/$pid/net/tcp{,6} and /proc/$pid/fd/* is still there and still used at start-up, because eBPF only brings us the events, not the initial state. However, the /proc parsing for the initial state is now done in the foreground instead of the background, via newForegroundReader().

NAT resolution on connections from eBPF works the same way as it did for connections from /proc: by using conntrack. One of the two conntrack instances was removed, since eBPF is able to get short-lived connections.

The Scope Docker image is bigger because we need a few more packages for bcc:

- weaveworks/scope in current master: 22 MB (compressed), 71 MB (uncompressed)
- weaveworks/scope with this patch: 97 MB (compressed), 363 MB (uncompressed)

But @iaguis has ongoing work to reduce the size of the image.

Limitations:

- [ ] Does not support IPv6
- [ ] Sets `procspied: true` on connections coming from eBPF
- [ ] Size of the Docker images
- [ ] Requirement on kernel headers, for now
- [ ] Location of kernel headers: iovisor/bcc#743

Fixes weaveworks#1168 (walking /proc to obtain connections is very expensive)
Fixes weaveworks#1260 (short-lived connections not tracked for containers in shared networking namespaces)
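The events described above carry the connection tuple plus the pid and netns. A sketch of what such an event might look like on the Go side; the struct and field names here are assumptions for illustration, not the actual gobpf bindings in probe/endpoint/ebpf.go:

```go
package main

import "fmt"

// eventType mirrors the three kprobes: tcp_v4_connect, inet_csk_accept
// and tcp_close.
type eventType int

const (
	eventConnect eventType = iota
	eventAccept
	eventClose
)

// tcpEvent is an illustrative shape for one record emitted by the eBPF
// program: the connection tuple plus the pid and the netns inode.
type tcpEvent struct {
	Type         eventType
	Pid          uint32
	NetNS        uint64 // network namespace inode number
	SAddr, DAddr string
	SPort, DPort uint16
}

func (e tcpEvent) String() string {
	names := map[eventType]string{
		eventConnect: "connect",
		eventAccept:  "accept",
		eventClose:   "close",
	}
	return fmt.Sprintf("%s pid=%d netns=%d %s:%d->%s:%d",
		names[e.Type], e.Pid, e.NetNS, e.SAddr, e.SPort, e.DAddr, e.DPort)
}

func main() {
	e := tcpEvent{
		Type: eventAccept, Pid: 7, NetNS: 4026531957,
		SAddr: "10.0.0.2", SPort: 80, DAddr: "10.0.0.1", DPort: 12345,
	}
	fmt.Println(e)
}
```

Because the pid and netns arrive with each event, the probe no longer needs to correlate sockets with processes by walking /proc/$pid/fd/*, which is where most of the old cost came from.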
Based on work from Lorenzo, updated by Iago and Alban.

This is the second attempt to add eBPF connection tracking. The first one was via weaveworks#1967, by forking a python script using bcc. This one is done in Golang directly, thanks to [gobpf](https://github.com/iovisor/gobpf).

This is not enabled by default. For now, it should be enabled manually with:

```
sudo ./scope launch --probe.ebpf.connections=true
```

Scope Probe also falls back on the old /proc parsing if eBPF is not working (e.g. a kernel that is too old, or missing kernel headers).

This allows Scope to be notified of every connection event without relying on parsing /proc/$pid/net/tcp{,6} and /proc/$pid/fd/*, thereby improving performance.

The eBPF program is in probe/endpoint/ebpf.go. It was discussed in bcc via iovisor/bcc#762. It uses kprobes on the following kernel functions:

- tcp_v4_connect
- inet_csk_accept
- tcp_close

It generates "connect", "accept" and "close" events containing the connection tuple as well as the pid and the netns. probe/endpoint/ebpf.go maintains the list of connections. As with conntrack, we keep dead connections for one iteration in order to report short-lived connections.

The code for parsing /proc/$pid/net/tcp{,6} and /proc/$pid/fd/* is still there and still used at start-up, because eBPF only brings us the events, not the initial state. However, the /proc parsing for the initial state is now done in the foreground instead of the background, via newForegroundReader().

NAT resolution on connections from eBPF works the same way as it did for connections from /proc: by using conntrack. One of the two conntrack instances was removed, since eBPF is able to get short-lived connections.

The Scope Docker image is bigger because we need a few more packages for bcc:

- weaveworks/scope in current master: 22 MB (compressed), 71 MB (uncompressed)
- weaveworks/scope with this patchset: 83 MB (compressed), 223 MB (uncompressed)

But @iaguis has ongoing work to reduce the size of the image.

Limitations:

- [ ] Does not support IPv6
- [ ] Sets `procspied: true` on connections coming from eBPF
- [ ] Size of the Docker images
- [ ] Requirement on kernel headers, for now
- [ ] Location of kernel headers: iovisor/bcc#743

Fixes weaveworks#1168 (walking /proc to obtain connections is very expensive)
Fixes weaveworks#1260 (short-lived connections not tracked for containers in shared networking namespaces)
This is the code from @asymmetric's ebpf branch after rebasing on master and squashing it.