connection to dead process associated with different process #2638

Closed
rade opened this issue Jun 24, 2017 · 1 comment · Fixed by #2639
Labels: accuracy, bug

Comments

rade commented Jun 24, 2017

Running

$ scope launch --weave=false --probe.ebpf.connections=false
$ tcpserver -v 0 1122 echo foo

and then

$ nc localhost 1122

produces
[screenshot from 2017-06-24 08-36-45]
i.e. the connection is erroneously associated with the chrome process.

rade added the accuracy and bug labels Jun 24, 2017
rade commented Jun 24, 2017

During other occurrences of the problem, the destination endpoint got associated with other processes I have running, such as firefox or dropbox. In all cases these are long-running processes, so I don't think the problem is due to pid re-use.

tcpserver forks for new connections, executing the specified program. In the above example the new process terminates quickly, so the connection really shouldn't show up at all in scope since the process associated with the destination is long gone (though the source process remains alive).

When instead running with

$ tcpserver -v 0 1122 cat

which keeps the process running, the endpoint gets associated with the cat process:
[screenshot from 2017-06-24 09-37-19]
That kinda makes sense, but note that when running with eBPF connection tracking enabled...

$ scope launch --weave=false

the endpoint gets associated with the tcpserver process:
[screenshot from 2017-06-24 09-59-01]
This happens for both the echo foo and cat tests.

Here are some relevant details from the reports when the problem occurs...

For the first ~16s the connection doesn't show up. During that time the report contains

"xps;127.0.0.1;55794": {
  "adjacency": ["xps;127.0.0.1;1122"],
  "latest": {
    "conntracked": "true"
  },
  "topology": "endpoint"
},
"xps;127.0.0.1;1122": {
  "latest": {
    "conntracked": "true"
  },
  "topology": "endpoint"
},

i.e. the endpoints are conntracked and lack pids.
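
For reference, each of these excerpts is a node of roughly the following shape; this is a sketch inferred from the excerpts themselves, not scope's actual report types:

package main

import (
	"encoding/json"
	"fmt"
)

// Node approximates the endpoint entries shown above, keyed by strings
// like "host;address;port" (or "host-netnsid;address;port" once procspied).
type Node struct {
	Adjacency []string          `json:"adjacency,omitempty"` // outbound edges
	Latest    map[string]string `json:"latest"`              // "conntracked", "procspied", "pid", ...
	Topology  string            `json:"topology"`            // "endpoint" or "process"
}

func main() {
	raw := `{"adjacency":["xps;127.0.0.1;1122"],"latest":{"conntracked":"true"},"topology":"endpoint"}`
	var n Node
	if err := json.Unmarshal([]byte(raw), &n); err != nil {
		panic(err)
	}
	// A conntracked endpoint has no "pid" key in latest.
	fmt.Println(n.Topology, n.Latest["conntracked"], n.Latest["pid"] == "")
}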

Once the connection does show up, the report contains this:

"xps-4026531969;127.0.0.1;55794": {
  "adjacency": ["xps-4026531969;127.0.0.1;1122"],
  "latest": {
    "pid": "20692",
    "procspied": "true"
  },
  "topology": "endpoint"
},
"xps-4026531969;127.0.0.1;1122": {
  "latest": {
    "pid": "17057",
    "procspied": "true"
  },
  "topology": "endpoint"
},

So the endpoints are procspied now and have pids.

The connection disappears after another 70s, at which point the endpoints look like this:

"xps-4026531969;127.0.0.1;55794": {
  "adjacency": ["xps-4026531969;127.0.0.1;1122"],
  "latest": {
    "pid": "20692",
    "procspied": "true"
  },
  "topology": "endpoint"
},
"xps-4026531969;127.0.0.1;1122": {
  "latest": {
    "procspied": "true"
  },
  "topology": "endpoint"
},

i.e. the destination endpoint is no longer associated with a process.

Throughout, the process topology contains

"xps;20692": {
  "latest": {
    "name": "nc"
  },
  "topology": "process"
},
"xps;17057": {
  "latest": {
    "name": "/opt/google/chrome/chrome"
  },
  "topology": "process"
},

So, as we can see, while the connection was showing up in the UI, the destination endpoint was indeed mysteriously associated with the chrome process.

rade self-assigned this Jun 24, 2017
rade added a commit that referenced this issue Jun 24, 2017
ProcNet.Next does not allocate Connection structs, for efficiency.
Instead it always returns a *Connection pointing to the same instance.
As a result, any mutations by the caller to struct elements that
aren't actually set by ProcNet.Next, in particular Connection.Proc,
are carried across to subsequent calls.

This had hilarious consequences: connections referencing an inode
which we hadn't come across during proc walking would be associated
with the process corresponding to the last successfully looked up
inode.

The fix is to clear out the garbage left over from previous calls.

Fixes #2638.
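
To make the mechanism concrete, here is a minimal, self-contained Go sketch of the same class of bug, using hypothetical simplified types (ProcNet, Connection, Proc) rather than scope's actual ones:

package main

import "fmt"

// Proc identifies the process owning a connection (hypothetical shape).
type Proc struct {
	PID  int
	Name string
}

// Connection is the record yielded for each /proc/net/tcp row.
type Connection struct {
	LocalPort uint16
	Inode     uint64
	Proc      Proc // filled in by the caller, not by Next
}

// ProcNet iterates over connection rows, reusing a single Connection
// value to avoid allocating one per row.
type ProcNet struct {
	rows []Connection
	i    int
	conn Connection // the one reused instance
}

// Next returns a pointer to the same Connection on every call.
func (p *ProcNet) Next() *Connection {
	if p.i >= len(p.rows) {
		return nil
	}
	row := p.rows[p.i]
	p.i++
	p.conn.LocalPort = row.LocalPort
	p.conn.Inode = row.Inode
	// The bug: Next never touches p.conn.Proc, so the caller's write
	// from the previous row survives. The fix is to reset it here:
	// p.conn.Proc = Proc{}
	return &p.conn
}

func main() {
	// Only inode 1001 was seen during proc walking.
	inodeToProc := map[uint64]Proc{1001: {PID: 17057, Name: "chrome"}}
	pn := &ProcNet{rows: []Connection{
		{LocalPort: 443, Inode: 1001},  // resolvable inode
		{LocalPort: 1122, Inode: 2002}, // inode we never came across
	}}
	for c := pn.Next(); c != nil; c = pn.Next() {
		if proc, ok := inodeToProc[c.Inode]; ok {
			c.Proc = proc // caller mutates the shared struct
		}
		fmt.Printf("port %d -> %+v\n", c.LocalPort, c.Proc)
	}
	// Without the reset, port 1122 is reported as belonging to chrome,
	// i.e. the process from the last successfully looked-up inode.
}

Uncommenting the reset line makes the second row report an empty Proc, which is the behavior the fix restores.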
rade added a commit that referenced this issue Jun 24, 2017
rade added a commit that referenced this issue Jun 25, 2017
rade added a commit that referenced this issue Jun 26, 2017
ensure connections from /proc/net/tcp{,6} get the right pid

Fixes #2638.
2opremio pushed a commit that referenced this issue Jul 5, 2017