
tools: add tcptracer #762

Merged

Conversation

@iaguis (Contributor) commented Oct 18, 2016:

This allows tracking TCP connections by tracing TCP connects, accepts,
and closes; the kprobe/kretprobe pattern involved is sketched below.

It differs from existing tools like tcpconnect or tcpaccept in that:

  • It includes more information, such as the network namespace, plus
    source ports for connects and remote ports for accepts
  • It traces tcp_close(), making it possible to see when a connection ends
  • It only shows information about established connections
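
This is not the PR's actual source; it is a minimal bcc sketch, modeled on the existing tcpv4connect example, of the entry/return pattern the tool builds on: stash the struct sock pointer at entry to tcp_v4_connect(), then read the populated address fields once the call returns. The excerpt under review just below appears to be the failure branch of the same return probe.

```
#!/usr/bin/env python
# Sketch only: trace successful tcp_v4_connect() calls with a
# kprobe/kretprobe pair, the same pattern tcptracer uses.
from bcc import BPF

bpf_text = """
#include <uapi/linux/ptrace.h>
#include <net/sock.h>
#include <bcc/proto.h>

BPF_HASH(connectsock, u64, struct sock *);

int kprobe__tcp_v4_connect(struct pt_regs *ctx, struct sock *sk)
{
    u64 pid = bpf_get_current_pid_tgid();
    // Address fields are only guaranteed to be populated on return,
    // so just remember which socket this task is connecting.
    connectsock.update(&pid, &sk);
    return 0;
}

int kretprobe__tcp_v4_connect(struct pt_regs *ctx)
{
    u64 pid = bpf_get_current_pid_tgid();
    struct sock **skpp = connectsock.lookup(&pid);
    if (skpp == 0)
        return 0;              // missed the entry probe
    connectsock.delete(&pid);
    if (PT_REGS_RC(ctx) != 0)
        return 0;              // failed to send SYN packet

    struct sock *skp = *skpp;
    u32 saddr = skp->__sk_common.skc_rcv_saddr;
    u32 daddr = skp->__sk_common.skc_daddr;
    u16 dport = skp->__sk_common.skc_dport;
    bpf_trace_printk("connect %x -> %x:%d\\n", saddr, daddr, ntohs(dport));
    return 0;
}
"""

b = BPF(text=bpf_text)
print("Tracing tcp_v4_connect()... Ctrl-C to end.")
b.trace_print()
```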

// failed to send SYN packet, may not have populated
// socket __sk_common.{skc_rcv_saddr, ...}
connectsock.delete(&pid);
return 0;
A reviewer (Contributor) commented:

What can we do to catch the connection details when connect() returns EINPROGRESS on asynchronous (non-blocking) sockets?

The author (@iaguis) replied:

I tested with curl and it works as expected. During the connect() we get the connect event with all the information.
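
For reference, the asynchronous case is easy to exercise from user space: EINPROGRESS comes back at the socket layer, while the kernel's tcp_v4_connect() has already run, so the return probe still fires with a populated socket. A quick sketch (it assumes something is listening on 127.0.0.1:80):

```
# Hypothetical quick check, not from the PR: a non-blocking connect
# returns EINPROGRESS to the caller, but tcp_v4_connect() has already
# run in the kernel, so the tracer still sees the connect event.
import errno
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setblocking(False)
rc = s.connect_ex(("127.0.0.1", 80))   # returns immediately
print("connect_ex:", errno.errorcode.get(rc, rc))  # typically EINPROGRESS
s.close()
```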

@brendangregg (Member) commented:

Thanks; some quick comments:

  • This isn't an /example, it's a /tool. Please see CONTRIBUTING_SCRIPTS.md.
  • It uses 5 kernel dynamic tracepoints just for IPv4. This makes it especially brittle. tcpconnect uses 2, and tcpaccept uses 1, and they both cover IPv4 & IPv6.

This is the kind of tool I'd rather wait to attempt until we have TCP tracepoints -- speaking from experience. My tcpsnoop script from Solaris 10 used several dynamic kernel tracepoints; it broke at least 7 times, and I ended up with a lot of upset end users. Once we had static tracepoints, I was able to write such tools once and they never broke again. We need them on Linux. And, should they arrive, supporting IPv6 should be much easier -- this would become a "tcptracer" tool.

Although, Solaris is obviously not Linux. The real question here is how likely these are to change:

  • int kprobe__tcp_v4_connect(struct pt_regs *ctx, struct sock *sk)
  • int kprobe__tcp_close(struct pt_regs *ctx, struct sock *sk)
  • int kretprobe__tcp_v4_connect(struct pt_regs *ctx)
  • int kretprobe__tcp_close(struct pt_regs *ctx)
  • int kretprobe__inet_csk_accept(struct pt_regs *ctx)

We can add details like ports and namespaces to the existing tcpconnect/tcpaccept. And there's another ticket to implement a tcplife tool, for tcp_close() tracing.

@brendangregg (Member) commented:

The output is also over 80 chars, by default, when it doesn't need to be. That breaks a lot of things, and should only be done if it's unavoidable.

If I were writing it, I'd have made the TYPE column a single character "T" with "C" == connect, etc. NETNS only printed with a -n or -v option. Truncated COMM a little (12 chars) if need be. Etc.
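
A sketch of that layout, with hypothetical field names: one-character type column, COMM truncated to 12 characters, and NETNS appended only in verbose mode, which keeps the default line under 80 columns.

```
def format_event(ev, verbose=False):
    # "%-12.12s" truncates COMM to 12 chars; the default line is ~73 cols
    line = "%-2s %-6d %-12.12s %-2d %-16s %-16s %-6d %-6d" % (
        ev["type"], ev["pid"], ev["comm"], ev["ip"],
        ev["saddr"], ev["daddr"], ev["sport"], ev["dport"])
    if verbose:  # NETNS column only behind -v, as suggested
        line += " %-8d" % ev["netns"]
    return line

print(format_event({"type": "C", "pid": 1171, "comm": "sshd", "ip": 4,
                    "saddr": "100.66.11.247", "daddr": "100.72.22.55",
                    "sport": 22, "dport": 27692}))
```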

@4ast (Member) commented Oct 18, 2016:

Although, Solaris is obviously not Linux. The real question here is how likely these are to change:

int kprobe__tcp_v4_connect(struct pt_regs *ctx, struct sock *sk)
int kprobe__tcp_close(struct pt_regs *ctx, struct sock *sk)
int kretprobe__tcp_v4_connect(struct pt_regs *ctx)
int kretprobe__tcp_close(struct pt_regs *ctx)
int kretprobe__inet_csk_accept(struct pt_regs *ctx)

These funcs and their args are highly unlikely to change any time soon.
I think it's ok to go with the kprobe approach for now, since no one is actively proposing tracepoints for the TCP stack and it's not clear what kind of pushback there would be.

@brendangregg (Member) commented:

Thanks @4ast. Yes, I really want TCP tracepoints, and maybe I'll just have to propose them myself, but that's all in the future at the moment.

@iaguis So I'd check CONTRIBUTING_SCRIPTS.md, which will mean pep8 styling, an examples file, and a man page. Try to adjust the output width to keep it <80, which probably means putting the NETNS column in a verbose mode (-v). After all, if you run it with -n and have selected a NETNS, then you don't need to print it out by default.

I'd also consider changing -n to -N. -n is often used to match a process name, and that's a reasonable addition to this tool at some point ("tcpv4tracer java").
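
A sketch of the resulting option surface; it matches the -h output quoted later in this thread, with -N capitalized so -n stays free for a future process-name filter:

```
import argparse

parser = argparse.ArgumentParser(description="Trace TCP connections")
parser.add_argument("-p", "--pid", type=int,
                    help="trace this PID only")
parser.add_argument("-N", "--netns", type=int,
                    help="trace this Network Namespace only")
parser.add_argument("-v", "--verbose", action="store_true",
                    help="include Network Namespace in the output")
args = parser.parse_args()
```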

@brendangregg (Member) commented:

Also, why do you bother with the kretprobe of tcp_close()? Is something not valid on the kprobe?

@iaguis (Contributor, author) commented Oct 19, 2016:

Thanks everybody for the quick responses!

Also, why do you bother with the kretprobe of tcp_close()? Is something not valid on the kprobe?

You're right, the kprobe should be enough. Thanks!

I'm sorry I hadn't read CONTRIBUTING_SCRIPTS.md before submitting this; I'll rework this PR taking that and #762 (comment) into account.
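
To illustrate why the entry probe suffices (a minimal sketch, not the PR's code): tcp_close() receives the struct sock as its first argument and the connection tuple is still readable at entry, so no entry/return map is needed.

```
#!/usr/bin/env python
# Sketch only: tcp_close() can be traced with a single kprobe, since
# the socket's address fields are still valid at function entry.
from bcc import BPF

b = BPF(text="""
#include <uapi/linux/ptrace.h>
#include <net/sock.h>
#include <bcc/proto.h>

int kprobe__tcp_close(struct pt_regs *ctx, struct sock *sk)
{
    u32 saddr = sk->__sk_common.skc_rcv_saddr;
    u32 daddr = sk->__sk_common.skc_daddr;
    u16 dport = sk->__sk_common.skc_dport;
    bpf_trace_printk("close %x -> %x:%d\\n", saddr, daddr, ntohs(dport));
    return 0;
}
""")
print("Tracing tcp_close()... Ctrl-C to end.")
b.trace_print()
```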

alban pushed commits to kinvolk-archives/scope that referenced this pull request (Oct 31 to Nov 22, 2016):
Introduces connection tracking via eBPF. This allows scope to get
notified of every connection event, without relying on the parsing of
/proc/$pid/net/tcp{,6} and /proc/$pid/fd/*, and therefore improve
performance.

The eBPF program is in a python script using bcc: docker/tcpv4tracer.py.
It is contributed upstream via iovisor/bcc#762
It is using kprobes on the following kernel functions:
- tcp_v4_connect
- inet_csk_accept
- tcp_close

It generates "connect", "accept" and "close" events containing the
connection tuple but also the pid and the netns.

The python script is piped into the Scope Probe and
probe/endpoint/ebpf.go maintains the list of connections. Similarly to
conntrack, we keep the dead connections for one iteration in order to
report the short-lived connections.

The code for parsing /proc/$pid/net/tcp{,6} and /proc/$pid/fd/* is still
there and still used at start-up because eBPF only brings us the events
and not the initial state. However, the /proc parsing for the initial
state is now done in foreground instead of background, via
newForegroundReader().

Scope Probe also falls back on the old /proc parsing if eBPF is not
working (e.g. too old kernel, or missing kernel headers). There is also
a flag "probe.ebpf.connections" that could disable eBPF if set to false.

NAT resolution on connections from eBPF works in the same way as it did
on connections from /proc: by using conntrack.

The Scope Docker image is bigger because we need a few more packages
for bcc:
- weaveworks/scope in current master:  22 MB
- weaveworks/scope with this patch:   147 MB

Limitations:
- [ ] Does not support IPv6
- [ ] Sets `procspied: true` on connections coming from eBPF
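
The "piped into the Scope Probe" integration implies a line-oriented protocol. A sketch of a consumer follows; the field layout here is hypothetical, since the commit only says each event carries its type, the pid, the netns, and the connection tuple.

```
import subprocess

# Launch the tracer and parse its stdout line by line (layout hypothetical).
proc = subprocess.Popen(["python", "docker/tcpv4tracer.py", "-v"],
                        stdout=subprocess.PIPE, universal_newlines=True)
for line in proc.stdout:
    fields = line.split()
    # Expect: TYPE PID COMM IP SADDR DADDR SPORT DPORT NETNS
    if len(fields) != 9 or fields[0] not in ("connect", "accept", "close"):
        continue  # skip headers and anything unexpected
    ev, pid, comm, ipver, saddr, daddr, sport, dport, netns = fields
    print("event:", ev, pid, saddr, daddr, netns)
```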
alban, alepuccetti, and iaguis pushed commits to kinvolk-archives/scope that referenced this pull request (Nov 22, 2016 to Dec 7, 2016):
Based on work from Lorenzo, updated by Iago and Alban

This is the second attempt to add eBPF connection tracking. The first
one was via weaveworks#1967 by forking a
python script using bcc. This one is done in Golang directly thanks to
[gobpf](https://github.com/iovisor/gobpf).

This is not enabled by default. For now, it should be enabled manually
with:
```
sudo ./scope launch --probe.ebpf.connections=true
```
Scope Probe also falls back on the old /proc parsing if eBPF is not
working (e.g. too old kernel, or missing kernel headers).

This allows scope to get notified of every connection event, without
relying on the parsing of /proc/$pid/net/tcp{,6} and /proc/$pid/fd/*,
and therefore improve performance.

The eBPF program is in probe/endpoint/ebpf.go. It was discussed in bcc
via iovisor/bcc#762.
It is using kprobes on the following kernel functions:
- tcp_v4_connect
- inet_csk_accept
- tcp_close

It generates "connect", "accept" and "close" events containing the
connection tuple but also the pid and the netns.

probe/endpoint/ebpf.go maintains the list of connections. Similarly to
conntrack, we keep the dead connections for one iteration in order to
report the short-lived connections.

The code for parsing /proc/$pid/net/tcp{,6} and /proc/$pid/fd/* is still
there and still used at start-up because eBPF only brings us the events
and not the initial state. However, the /proc parsing for the initial
state is now done in foreground instead of background, via
newForegroundReader().

NAT resolution on connections from eBPF works in the same way as it did
on connections from /proc: by using conntrack. One of the two conntrack
instances was removed since eBPF is able to get short-lived connections.

The Scope Docker image is bigger because we need a few more packages
for bcc:
- weaveworks/scope in current master: 22 MB (compressed), 71 MB
  (uncompressed)
- weaveworks/scope with this patch:   97 MB (compressed), 363 MB
  (uncompressed)

But @iaguis has ongoing work to reduce the size of the image.

Limitations:
- [ ] Does not support IPv6
- [ ] Sets `procspied: true` on connections coming from eBPF
- [ ] Size of the Docker images
- [ ] Requirement on kernel headers for now
- [ ] Location of kernel headers: iovisor/bcc#743

Fixes weaveworks#1168 (walking /proc to obtain connections is very expensive)

Fixes weaveworks#1260 (Short-lived connections not tracked for containers in
shared networking namespaces)
@goldshtn (Collaborator) commented:

@iaguis This PR is still open -- is it ready for another review?

@iaguis (Contributor, author) commented Feb 14, 2017:

@iaguis This PR is still open -- is it ready for another review?

I'll try to rebase it and update by the end of the week.

@4ast (Member) commented Mar 9, 2017:

@iaguis waiting for the rebase and for @brendangregg's comments re: CONTRIBUTING_SCRIPTS.md to be addressed

@goldshtn (Collaborator) commented:

@iaguis Ping? :)

@iaguis force-pushed the iaguis/to_upstream/tcp_tracer branch from a0d1de7 to 857d40f on March 27, 2017
@iaguis changed the title from "[RFC] examples/tracing: add tcpv4tracer" to "tools: add tcptracer" on Mar 27, 2017
@iaguis (Contributor, author) commented Mar 27, 2017:

Rebased, addressed comments (I think I didn't forget any).

@iaguis force-pushed the iaguis/to_upstream/tcp_tracer branch from 857d40f to 1369dc5 on March 27, 2017
@goldshtn (Collaborator) commented:

[buildbot, test this please]

@goldshtn (Collaborator) commented:

Copying @brendangregg @4ast to take a look.

@brendangregg (Member) commented:

It's pretty long, at 649 lines, which would make it the 3rd longest tool in bcc (coming after the multi-tools trace and argdist). My longest tool is tcplife, at 360 lines, and my average is 194. Can the header include something like this?:

# You should generally try to avoid writing long scripts that measure multiple functions and
# walk multiple kernel structures, as they will be a burden to maintain as the kernel changes.
# The following code should be replaced, and simplified, when static TCP probes exist.

My hope is to discourage others from copying this and making it longer/worse. We need static TCP tracepoints, which should cut down a lot of these lines and make the script stable.

BTW:

# ./tcptracer.py -h
usage: tcptracer.py [-h] [-p PID] [-N NETNS] [-v]

Trace TCP connections

optional arguments:
  -h, --help            show this help message and exit
  -p PID, --pid PID     trace this PID only
  -N NETNS, --netns NETNS
                        trace this Network Namespace only
  -v, --verbose         include Network Namespace in the output

It doesn't do that much right now, either. When people start adding options, this tool will quickly become the biggest in bcc.

Other comments...

-v has a \n in the header:

# ./tcptracer.py -v
Tracing TCP established connections. Ctrl-C to end.
TYPE         PID    COMM             IP SADDR            DADDR            SPORT  DPORT 
NETNS   

And the T column:

# ./tcptracer.py
Tracing TCP established connections. Ctrl-C to end.
T  PID    COMM             IP SADDR            DADDR            SPORT  DPORT 
AC 1171   sshd             4  100.66.11.247    100.72.22.55     22     27692 
CL 19801  sshd             4  100.66.11.247    100.72.22.55     22     27692 

When I see two chars, I think flags (like the RWBS field from the block tracepoints). I'd consider changing it to a single character (which matches the header, "T"):

  • A: accept
  • C: connect
  • X: close

or, to be TCP-terminology pedantic:

  • A: active open (what at the socket layer would be a connect())
  • P: passive open (what at the socket layer would be an accept())
  • C: close

Maybe the A/C/X is better, as those terms are more commonly used.
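
A sketch of that encoding, with hypothetical names, mapping the tool's event kinds to the one-character TYPE column:

```
TYPE_CHAR = {"connect": "C", "accept": "A", "close": "X"}

def type_char(kind):
    return TYPE_CHAR.get(kind, "?")  # "?" flags an unexpected kind

assert type_char("close") == "X"
```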

@iaguis force-pushed the iaguis/to_upstream/tcp_tracer branch from 1369dc5 to c717376 on March 29, 2017
@iaguis (Contributor, author) commented Mar 29, 2017:

Thanks for the comments. Updated.

@goldshtn (Collaborator) commented:

[buildbot, ok to test]

X 23323 curl 6 [::1] [::1] 36226 80
X 23278 nc 6 [::1] [::1] 80 36226
A 15195 nc 4 10.202.210.1 10.202.109.12 8080 43904
X 15195 nc 4 10.202.210.1 10.202.109.12 8080 43904
A reviewer (Collaborator) commented:

Is the PID column header really misaligned now with the first digit of the pid?

The author (@iaguis) replied:

Oops. No, that was me being off by one when editing the file. Let me re-run the test scenario.

@iaguis force-pushed the iaguis/to_upstream/tcp_tracer branch from c717376 to 4848c4f on March 29, 2017
@goldshtn (Collaborator) commented:

[buildbot, ok to test]

@drzaeus77 (Collaborator) commented:

@goldshtn I updated one more setting on the buildbot; your [] commands should work next time.

@brendangregg (Member) commented:

LGTM

@brendangregg (Member) commented:

[buildbot, ok to test]

@goldshtn merged commit f37434b into iovisor:master on Mar 30, 2017