eBPF connection tracking does not support socket fd moving between processes #2253

Open
alban opened this issue Feb 16, 2017 · 2 comments
Labels
accuracy Incorrect information is being shown to the user; usually a bug bug Broken end user or developer functionality; not working as the developers intended it

alban (Contributor) commented Feb 16, 2017

Capturing #2135 (review):

When a socket is shared by more than one process, e.g. via fork(), dup(), or sending the fd over a unix socket, this code causes Scope to keep associating it with the original connecting process, even after that process is long dead. This produces an odd node, empty except for the pid, that holds the connection. In the worst case, Scope could attribute the connection to a totally different process that has since reused the old pid.

In contrast, the old procspy method would attribute it to an arbitrary process that is still alive, which isn't perfect but is better.
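The failure mode described in the review can be reproduced with a small script. The following is a minimal sketch, not the test used in #2135: the function name `connect_then_hand_off` is hypothetical, and it assumes a Unix-like system where `os.fork()` is available. One short-lived process calls connect(), forks, and exits; its child inherits the fd and is the only process that ever sends data on it.

```python
import os
import socket
import time


def connect_then_hand_off():
    """Open one TCP connection in a short-lived process and use it from its child.

    Returns (connector_pid, sender_pid): the PID that called connect() and the
    PID that actually sent data on the connection.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    connector_pid = os.fork()
    if connector_pid == 0:
        # Connector process: the PID a connect()-time probe would record.
        try:
            server.close()
            conn = socket.create_connection(("127.0.0.1", port))
            me = os.getpid()
            if os.fork() == 0:
                # Grandchild: inherits the fd. Wait until the connector has
                # exited (we get reparented), then use the connection.
                while os.getppid() == me:
                    time.sleep(0.01)
                conn.sendall(str(os.getpid()).encode())
                conn.close()
        finally:
            os._exit(0)  # never fall through into the caller's code

    peer, _ = server.accept()
    os.waitpid(connector_pid, 0)  # the connecting process is now dead
    data = b""
    while True:
        chunk = peer.recv(64)
        if not chunk:  # EOF: the last holder closed the socket
            break
        data += chunk
    peer.close()
    server.close()
    return connector_pid, int(data)
```

Calling `connect_then_hand_off()` returns two different PIDs: a tracker keyed on the first one would point at a process that no longer exists while the connection is still in use.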

One scenario where this affects the user experience is when sshd is socket-activated. On CoreOS with systemd socket-activated sshd, all incoming connections appeared to belong to pid 1 (systemd) instead of an sshd process.

This is difficult to solve in a generic way, but for the socket activation case, maybe we could use the proc connector (see also #1931) to be notified of new processes and check whether they inherited fds, so that sockets can be reassigned to their "last" owner.
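The "check if they inherited fds" step could look roughly like the sketch below. It is an illustration only and assumes Linux, where each entry in `/proc/<pid>/fd` is a symlink and sockets appear as `socket:[<inode>]`. The helper names `socket_inodes` and `holders_of` are hypothetical, and the proc connector subscription itself (a netlink socket) is not shown.

```python
import os
import re

# On Linux, readlink() on a socket fd under /proc/<pid>/fd yields "socket:[<inode>]".
SOCKET_LINK = re.compile(r"^socket:\[(\d+)\]$")


def socket_inodes(pid):
    """Return the set of socket inode numbers currently open in process `pid`."""
    inodes = set()
    fd_dir = f"/proc/{pid}/fd"
    try:
        names = os.listdir(fd_dir)
    except OSError:
        # Process exited, or we lack permission to inspect it.
        return inodes
    for name in names:
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # fd was closed between listdir() and readlink()
        match = SOCKET_LINK.match(target)
        if match:
            inodes.add(int(match.group(1)))
    return inodes


def holders_of(inode):
    """Return the sorted PIDs of all visible processes holding the socket `inode`."""
    pids = []
    for entry in os.listdir("/proc"):
        if entry.isdigit() and inode in socket_inodes(int(entry)):
            pids.append(int(entry))
    return sorted(pids)
```

When notified of a new process, a tracker could call `socket_inodes(new_pid)` and reassign any already-known socket it finds there. Scanning all of `/proc` as `holders_of` does is expensive, so it is better suited to an occasional consistency check.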

Do we have other examples of real applications using fork/dup/fd-passing to transfer a socket from one process to another?

/cc @ekimekim @2opremio

ekimekim (Contributor) commented Feb 16, 2017

Two other examples of kinds of applications come to mind:

  • A server that accepts connections and hands them out to one or more child processes for processing. This should be relatively rare, since most programs of this type either do the accept() in the child (e.g. Apache) or use threading, not fully-fledged child processes.
  • A program that performs hot reloading by spawning a child and passing all active connections to it. I was going to say that nginx and haproxy did this, but on closer inspection both of them do a soft restart: they stop accepting new connections but stay alive until old connections are finished. So I can't name an example of a program that does this either, but I wouldn't be surprised if one exists.
    This second type is how I performed my test confirming this behaviour, but it was a synthetic test: I wrote a short Python script that had this behaviour just for testing.
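For readers who want to try the second pattern, here is a minimal sketch of handing a live connection to a child over a unix socket. It is not the script used for the test above; the function name `hand_over_connection` and the payload are made up for illustration, and it assumes Python 3.9+ on a Unix-like system for `socket.send_fds` and `socket.recv_fds`.

```python
import os
import socket


def hand_over_connection(conn, payload=b"hello from the new owner"):
    """Fork a child, pass conn's fd to it over a unix socket, return the child's PID.

    After this returns, the calling process no longer holds the connection.
    """
    parent_chan, child_chan = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    pid = os.fork()
    if pid == 0:
        try:
            parent_chan.close()
            conn.close()  # drop the copy inherited via fork(); use only the passed fd
            _, fds, _, _ = socket.recv_fds(child_chan, 1, 1)
            with socket.socket(fileno=fds[0]) as passed:
                passed.sendall(payload)
        finally:
            os._exit(0)  # never fall through into the caller's code

    child_chan.close()
    # The fd travels as SCM_RIGHTS ancillary data next to a one-byte message.
    socket.send_fds(parent_chan, [b"x"], [conn.fileno()])
    parent_chan.close()
    conn.close()  # the original owner lets go; only the child holds it now
    return pid
```

A real hot reload would typically exec a new binary rather than only fork, which is why the sketch closes the inherited copy and relies on the passed fd alone.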

In short, I can come up with a few cases where a user's program may wish to do something like this, but I can't name any commonly-used services which do. That is part of why I stressed that I don't see this as an urgent issue.

It should also be noted that the old code fares no better than the eBPF code when both processes remain alive. In all my testing, it kept showing the endpoint connected to the original PID, though it's possible that was just coincidence and it may sometimes select the other PID instead.

As I see it, there are three fixes to be made here. In order of complexity:

  • (easy) Prevent the "empty node" from displaying by filtering out, at the renderer level, nodes which lack basic characteristics such as a name. This would have the effect of hiding the offending connection instead of attributing it to a non-existent process. This is the only one I actually think is worth doing for the foreseeable future.
  • (hard) Track the connection's ownership as a set of processes using additional kprobes. We could then use a consistent result like "most recently added process that is still alive", which would solve the socket activation case.
  • (very hard) In addition to the above, change Scope's internal data structures to allow a connection to map to multiple processes, so that even in cases where two processes hold the connection for a long time (I don't know of any use case that does this), Scope will correctly show them both.
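To make the first two options concrete, here is a rough sketch in Python (Scope itself is written in Go). None of these names exist in Scope: `drop_empty_nodes`, `ConnectionOwners`, and `pid_alive` are hypothetical, nodes are assumed to be plain dicts, and the kprobe events that would feed `add` and `remove` are out of scope here.

```python
import os


def pid_alive(pid):
    """Best-effort liveness check: signal 0 tests for existence without signalling."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def drop_empty_nodes(nodes):
    """(easy) Hide nodes that lack a name instead of rendering an empty node."""
    return [node for node in nodes if node.get("name")]


class ConnectionOwners:
    """(hard) Track every process holding a connection, in the order it was added."""

    def __init__(self, is_alive=pid_alive):
        self._owners = {}  # connection key -> list of PIDs, oldest first
        self._is_alive = is_alive

    def add(self, conn, pid):
        pids = self._owners.setdefault(conn, [])
        if pid in pids:
            pids.remove(pid)  # re-adding moves the PID to "most recent"
        pids.append(pid)

    def remove(self, conn, pid):
        pids = self._owners.get(conn, [])
        if pid in pids:
            pids.remove(pid)
        if not pids:
            self._owners.pop(conn, None)

    def current_owner(self, conn):
        """Return the most recently added PID that is still alive, or None."""
        for pid in reversed(self._owners.get(conn, [])):
            if self._is_alive(pid):
                return pid
        return None
```

In the socket activation case, systemd (pid 1) would be added first and the sshd process second, so `current_owner` would return the sshd PID for as long as that process lives. The liveness check is still subject to the pid reuse problem mentioned at the top of this issue.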

rade added the chore, accuracy, and bug labels and removed the chore label on Feb 20, 2017
2opremio added this to the ebpf tracking milestone on Mar 5, 2017
rade (Member) commented Jun 25, 2017

I am pretty sure the Apache example (multiple fork()ed processes that all call accept()) will trigger #1849.

4 participants