When a socket is shared by more than one process, e.g. via fork(), dupfd(), or sending the fd over a Unix socket, this code causes Scope to keep associating it with the original connecting process, even after that process is long dead. This produces an odd node, empty except for its PID, that holds the connection. In the worst case, it could attribute the connection to a totally different process that has since started using the old PID.
In contrast, the old procspy method would attribute it to an arbitrary process that is still alive, which isn't perfect but is better.
One scenario where this affects the user experience is when sshd is socket-activated. On CoreOS with systemd socket-activated sshd, all incoming connections appeared to belong to pid 1 (systemd) instead of an ssh process.
It's difficult to solve in a generic way but for the socket activation case, maybe we could use the proc connector (see also #1931) to be notified of new processes and check if they inherited fds to reassign sockets to their "last" owner.
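To make the "check if they inherited fds" step concrete, here is a minimal sketch of finding which live processes currently hold a given socket. It does not use the proc connector itself; it only scans `/proc/<pid>/fd` on Linux for links of the form `socket:[inode]`. The function name `pids_holding_socket` and the `proc_root` parameter are made up for illustration and are not part of Scope.

```python
import os
import re

# On Linux, /proc/<pid>/fd/<n> is a symlink such as "socket:[12345]"
# when the fd refers to a socket; 12345 is the socket's inode number.
SOCKET_LINK = re.compile(r"^socket:\[(\d+)\]$")


def pids_holding_socket(inode, proc_root="/proc"):
    """Return the sorted PIDs that have an open fd for the given socket inode.

    Hypothetical helper: processes we lack permission to inspect, or that
    exit while we scan, are silently skipped.
    """
    holders = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process vanished or access denied
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # fd closed between listdir and readlink
            match = SOCKET_LINK.match(target)
            if match and int(match.group(1)) == inode:
                holders.append(int(entry))
                break  # one matching fd per process is enough
    return sorted(holders)
```

A new-process notification could trigger a scan like this for the sockets owned by the parent, though scanning every process is expensive and inherently racy, so this is only a starting point.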
Do we have other examples of real applications using fork/dupfd/fd-passing to transfer the socket from one process to another?
Two other examples of kinds of applications come to mind:
A server that accepts connections and hands them out to one or more child processes for processing. This should be relatively rare, since most programs of this type either do the accept() in the child (e.g. apache) or use threading, not fully-fledged child processes.
A program that performs hot reloading by spawning a child and passing all active connections to it. I was going to say that nginx and haproxy did this, but on closer inspection both of them do a soft restart by no longer accepting new connections but staying alive until old connections are finished. So I can't name an example of a program that does this either, but I wouldn't be surprised if one exists.
This second type is how I performed my test confirming this behaviour, but it was a synthetic test - I wrote a short python script that had this behaviour just for testing.
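For reference, a synthetic hand-off of this kind can be sketched roughly as follows. This is not the original test script; it is a minimal reconstruction under assumptions: a `socketpair()` stands in for the accepted connection, the function name `hand_off_connection` is invented, and `socket.send_fds` / `socket.recv_fds` require Python 3.9+ on a Unix system.

```python
import os
import socket


def hand_off_connection():
    """Pass a connected socket to a forked child over a Unix socket.

    Returns (child_pid, pid_reported_over_the_connection).
    """
    # conn stands in for an accepted connection; peer is the remote end.
    conn, peer = socket.socketpair()
    # Control channel used only to transfer the file descriptor.
    ctl_parent, ctl_child = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

    pid = os.fork()
    if pid == 0:
        # Child: drop everything inherited via fork so that the only
        # handle on the connection is the one received via fd passing.
        ctl_parent.close()
        conn.close()
        peer.close()
        _, fds, _, _ = socket.recv_fds(ctl_child, 1, 1)
        received = socket.socket(fileno=fds[0])
        received.sendall(str(os.getpid()).encode())
        received.close()
        os._exit(0)

    # Parent: send the fd, then give up its own copy of the connection.
    ctl_child.close()
    socket.send_fds(ctl_parent, [b"x"], [conn.fileno()])
    conn.close()
    ctl_parent.close()

    data = peer.recv(64)  # written by the child through the passed fd
    peer.close()
    os.waitpid(pid, 0)
    return pid, int(data)
```

After the hand-off the connection is served by a process that never called connect() or accept() on it, which is the situation in which the attribution described above goes stale.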
In short, I can come up with a few cases where a user's program may wish to do something like this, but I can't name any commonly-used services which do. That is part of why I stressed that I don't see this as an urgent issue.
It should also be noted that the old code fares no better than the eBPF code when both processes remain alive: in all my testing, it kept showing the endpoint connected to the original PID, though it's possible that was just coincidence and sometimes it may select the other PID instead.
As I see it, there are three fixes to be made here. In order of complexity:
(easy) Prevent the "empty node" from displaying by filtering out nodes which lack basic characteristics such as name at the renderer level. This would have the effect of hiding the offending connection instead of attributing it to a non-existent process. This is the only one I actually think is worth doing for the foreseeable future.
(hard) Track the connection's ownership as a set of processes using additional kprobes. We could then use a consistent result like "most recently added process that is still alive", which would solve the socket activation case.
(very hard) In addition to the above, change Scope's internal data structures to allow a connection to map to multiple processes, so that even in cases where two processes hold the connection for a long time (I don't know of any use case that does this), Scope will correctly show them both.
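The first two options can be sketched in a few lines. This is only an illustration of the logic, not Scope's code: nodes are modelled as plain dicts, and `visible_nodes`, `ConnectionOwners` and the `is_alive` callback are hypothetical names. In the real system the add/remove events would have to come from the additional kprobes mentioned above.

```python
def visible_nodes(nodes):
    """Option 1: hide nodes that lack a name (the "empty" PID-only nodes)."""
    return [node for node in nodes if node.get("name")]


class ConnectionOwners:
    """Option 2: track every process that has held a connection."""

    def __init__(self):
        self._pids = []  # oldest holder first, most recently added last

    def add(self, pid):
        # Re-adding a PID moves it to the "most recent" position.
        if pid in self._pids:
            self._pids.remove(pid)
        self._pids.append(pid)

    def remove(self, pid):
        if pid in self._pids:
            self._pids.remove(pid)

    def current_owner(self, is_alive):
        """Return the most recently added PID that is still alive, or None.

        is_alive is a caller-supplied predicate taking a PID.
        """
        for pid in reversed(self._pids):
            if is_alive(pid):
                return pid
        return None
```

For the socket activation case, systemd (pid 1) would be added first and the ssh process later, so `current_owner` would pick the ssh process for as long as it lives.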
rade added the chore, accuracy, and bug labels and removed the chore label on Feb 20, 2017.
This issue captures the discussion from #2135 (review), reproduced above.
/cc @ekimekim @2opremio