eBPF connection tracking does not support socket fd moving between processes #2253

alban · 2017-02-16T14:22:24Z

When a socket is shared by more than one process, e.g. via fork(), dupfd(), or sending the fd over a unix socket, the eBPF connection tracking code causes Scope to keep associating it with the original connecting process, even after that process is long dead. This produces an odd node with the connection that is empty except for the pid. In the worst case, it could attribute the connection to a totally different process that has since started using the old pid.

In contrast, the old procspy method would attribute it to an arbitrary process that is still alive, which isn't perfect but is better.
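For illustration, a /proc walk in the spirit of procspy can be sketched as follows. This is a minimal sketch, not Scope's actual code (which is written in Go): the function name `pids_holding_socket` and the `proc_root` parameter are hypothetical, and it assumes a Linux /proc where each socket fd is a symlink of the form `socket:[inode]`.

```python
import os
import re
import socket

# Socket fds in /proc/<pid>/fd are symlinks that look like "socket:[12345]".
SOCKET_LINK = re.compile(r"^socket:\[(\d+)\]$")


def pids_holding_socket(inode, proc_root="/proc"):
    """Return the sorted PIDs that currently hold an fd on this socket inode."""
    holders = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # fd was closed while we were scanning
            match = SOCKET_LINK.match(target)
            if match and int(match.group(1)) == inode:
                holders.append(int(entry))
                break  # one hit per process is enough
    return sorted(holders)


if __name__ == "__main__":
    sock = socket.socket()
    inode = os.fstat(sock.fileno()).st_ino
    print(inode, pids_holding_socket(inode))
    sock.close()
```

Because the scan only sees processes that exist at scan time, a dead original owner can never be returned, which is the property described above. When several processes hold the socket, choosing one of them is still arbitrary.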

One scenario where this affects the user experience is when sshd is socket-activated. On CoreOS with systemd socket-activated sshd, all incoming connections appeared to belong to pid 1 (systemd) instead of an sshd process.

It's difficult to solve in a generic way but for the socket activation case, maybe we could use the proc connector (see also #1931) to be notified of new processes and check if they inherited fds to reassign sockets to their "last" owner.

Do we have other examples of real applications using fork/dupfd/fd-passing to transfer the socket from one process to another?

/cc @ekimekim @2opremio

ekimekim · 2017-02-16T18:17:47Z

Two other examples of kinds of applications come to mind:

1. A server that accepts connections and hands them out to one or more child processes for processing. This should be relatively rare, since most programs of this type either do the accept() in the child (e.g. apache) or use threading, not fully-fledged child processes.
2. A program that performs hot reloading by spawning a child and passing all active connections to it. I was going to say that nginx and haproxy did this, but on closer inspection both of them do a soft restart by no longer accepting new connections but staying alive until old connections are finished. So I can't name an example of a program that does this either, but I wouldn't be surprised if one exists.

This second type is how I performed my test confirming this behaviour, but it was a synthetic test - I wrote a short Python script that had this behaviour just for testing.
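A hand-off of this kind might look roughly like the sketch below. It is not the original test script, which is not included in this thread. The names `hand_off` and `demo` are hypothetical, and it assumes Python 3.9+ on a Unix system (for `socket.send_fds` and `socket.recv_fds`). After the hand-off, only the child holds the accepted connection, while the accept() was performed by the parent.

```python
import os
import socket


def hand_off(conn):
    """Pass an accepted connection to a forked child; return the child's PID."""
    parent_chan, child_chan = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    pid = os.fork()
    if pid == 0:
        # Child: drop the copy inherited through fork() and use only the fd
        # that arrives over the Unix socket.
        try:
            parent_chan.close()
            conn.close()
            _, fds, _, _ = socket.recv_fds(child_chan, 1, 1)
            passed = socket.socket(fileno=fds[0])
            passed.sendall(b"handled by %d" % os.getpid())
            passed.close()
        finally:
            os._exit(0)  # never fall back into the parent's code path
    # Parent: send the fd, then stop holding the connection ourselves.
    child_chan.close()
    socket.send_fds(parent_chan, [b"x"], [conn.fileno()])
    conn.close()
    parent_chan.close()
    os.waitpid(pid, 0)
    return pid


def demo():
    """Accept one local connection in this process and let a child answer it."""
    listener = socket.create_server(("127.0.0.1", 0))
    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()
    child_pid = hand_off(conn)
    reply = client.recv(64)
    client.close()
    listener.close()
    return child_pid, reply


if __name__ == "__main__":
    print(demo())
```

A tracker that records the owner only at accept() or connect() time would keep attributing this connection to the parent, even though the reply is written by the child.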

In short, I can come up with a few cases where a user's program may wish to do something like this, but I can't name any commonly-used services which do. That is part of why I stressed that I didn't see this as an urgent issue.

It should also be noted that the old code fares no better than the eBPF code when both processes remain alive - in all my testing, it kept showing the endpoint connected to the original PID, though it's possible that was just coincidence and sometimes it may select the other PID instead.

As I see it, there are three fixes to be made here. In order of complexity:

1. (easy) Prevent the "empty node" from displaying by filtering out nodes which lack basic characteristics such as name at the renderer level. This would have the effect of hiding the offending connection instead of attributing it to a non-existent process. This is the only one I actually think is worth doing for the foreseeable future.
2. (hard) Track the connection's ownership as a set of processes using additional kprobes. We could then use a consistent result like "most recently added process that is still alive", which would solve the socket activation case.
3. (very hard) In addition to the above, change Scope's internal data structures to allow a connection to map to multiple processes, so that even in cases where two processes hold the connection for a long time (I don't know of any use case that does this), Scope will correctly show them both.
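The first two fixes can be sketched as follows. This is only an illustration under stated assumptions, not Scope's implementation: `visible_nodes`, `ConnectionOwners`, and `pid_alive` are hypothetical names, nodes are modelled as plain dicts, and the kprobe events that would feed `add_owner` are left out entirely. Note that a liveness check by PID alone cannot detect PID reuse.

```python
import os


def visible_nodes(nodes):
    """Fix 1 (sketch): hide nodes that lack a basic attribute such as a name."""
    return [node for node in nodes if node.get("name")]


def pid_alive(pid):
    """True if a process with this PID exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


class ConnectionOwners:
    """Fix 2 (sketch): track every process that has held a connection."""

    def __init__(self, is_alive=pid_alive):
        self._owners = {}  # connection key -> PIDs, oldest first
        self._is_alive = is_alive

    def add_owner(self, conn, pid):
        owners = self._owners.setdefault(conn, [])
        if pid in owners:
            owners.remove(pid)  # re-adding moves the PID to "most recent"
        owners.append(pid)

    def remove_owner(self, conn, pid):
        owners = self._owners.get(conn, [])
        if pid in owners:
            owners.remove(pid)

    def resolve(self, conn):
        """Return the most recently added owner that is still alive, or None."""
        for pid in reversed(self._owners.get(conn, [])):
            if self._is_alive(pid):
                return pid
        return None
```

In the socket activation case, systemd (pid 1) would be added first and the sshd process second, so `resolve` would return the sshd pid for as long as it lives, and fall back to pid 1 only afterwards.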

rade · 2017-06-25T11:59:22Z

I am pretty sure the apache example - multiple fork()ed processes that all do accept() - will trigger #1849.

alban mentioned this issue Feb 16, 2017: Add eBPF connection tracking without dependencies on kernel headers #2135 (Merged)

2opremio added this to the ebpf tracking milestone Mar 5, 2017

rade mentioned this issue Jun 26, 2017: different process association of existing vs new connections #2647 (Open)

rade modified the milestones: Backlog, ebpf tracking Jul 12, 2017
