
Conntrack netlink #2130

Closed


josephglanville
Contributor

This switches the endpoint probe to use the netfilter netlink interface instead of spawning conntrack as a child process. It is something of a half-measure between where master is now and the eBPF solution.

Netlink code is based on https://github.com/typetypetype/conntrack but modified substantially.

I also took the opportunity to switch to net.IP, standardise on uint16 for port numbers, and clean up the Flow data structure.

It wasn't clear how to write unit tests for the netlink interface, other than perhaps dumping some packets to binary and testing the parsing logic on the raw bytes; I'm not sure how useful that would be.

Performance is much better on my machine in normal usage (2-3% less usage per core in absolute terms); however, under very high load I still see the socket buffer being overrun. I am unsure of the cause, as the parsing no longer takes significant CPU.
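
For context, here is a minimal sketch of subscribing to conntrack events over NETLINK_NETFILTER using only Go's syscall package. The group constants and the parsing stub are illustrative assumptions for this comment, not code from this branch:

package main

import (
    "fmt"
    "syscall"
)

// Legacy conntrack multicast group bitmasks (linux/netfilter/nfnetlink_compat.h).
const (
    nfNetlinkConntrackNew     = 0x1
    nfNetlinkConntrackUpdate  = 0x2
    nfNetlinkConntrackDestroy = 0x4
)

func main() {
    // Open a raw netlink socket speaking the netfilter protocol.
    fd, err := syscall.Socket(syscall.AF_NETLINK, syscall.SOCK_RAW, syscall.NETLINK_NETFILTER)
    if err != nil {
        panic(err)
    }
    defer syscall.Close(fd)

    // Bind to the conntrack event groups so the kernel multicasts
    // new/update/destroy flow events to this socket.
    sa := &syscall.SockaddrNetlink{
        Family: syscall.AF_NETLINK,
        Groups: nfNetlinkConntrackNew | nfNetlinkConntrackUpdate | nfNetlinkConntrackDestroy,
    }
    if err := syscall.Bind(fd, sa); err != nil {
        panic(err)
    }

    buf := make([]byte, 65536)
    for {
        n, _, err := syscall.Recvfrom(fd, buf, 0)
        if err != nil {
            panic(err)
        }
        msgs, err := syscall.ParseNetlinkMessage(buf[:n])
        if err != nil {
            continue
        }
        // Each message carries an nfgenmsg header followed by nested
        // conntrack attributes; real code would decode those here.
        fmt.Printf("received %d conntrack event(s)\n", len(msgs))
    }
}

A real implementation would then walk the nested netlink attributes (tuples, status, conntrack ID) to build the flow records the probe reports.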

@2opremio
Contributor

2opremio commented Jan 16, 2017

Thanks for this @josephglanville !

Fixes #1991 (as an alternative to #2095)

Can I assume that the 2/3% was against master?

@alban Would you mind testing this against #2118 ?

@josephglanville
Contributor Author

@2opremio Yes, that was against master. I couldn't get pprof to give me numbers for my function; I need to read up more on how to operate it.

@alban
Contributor

alban commented Jan 16, 2017

@alban Would you mind testing this against #2118 ?

Here are the results with the method described in #2118 (comment) and #2118 (comment) (4 clients connecting repeatedly to nginx).

I ran each test twice just to make sure the results were consistent, so there are two numbers for each test:

However, pprof alone is not a fair basis for comparison with #2118, because pprof does not count the time spent in the conntrack processes. So although this PR shows more CPU than #2118 under pprof, it might still be better to take it.

See the pprof graphs below. I only include the graphs for the first test.

(pprof graphs: conntrack, parsing)

@2opremio
Contributor

@josephglanville A couple of things:

@2opremio
Contributor

2opremio commented Jan 16, 2017

My concern is that supporting the NETLINK_NETFILTER protocol doesn't seem to improve performance by much compared to #2118, but it does add some complicated code.

@alban
Contributor

alban commented Jan 16, 2017

I tried a different method to compare the performance of this PR versus #2118. I still run 4 clients as described in #2118 (comment), but I use the cpuacct cgroup instead of pprof. This way it captures the CPU time spent by all the processes in the Scope Docker container (i.e. the separate conntrack processes are included in the case of #2118).

I used the following script:

#!/bin/bash

# Full ID of the running Scope container.
ID=$(sudo docker ps --no-trunc | grep weavescope | awk '{print $1}')
# Cumulative CPU usage (in nanoseconds) of the container's cgroup.
ACCT=/sys/fs/cgroup/cpu,cpuacct/system.slice/docker-${ID}.scope/cpuacct.usage

# Reset the counter, let the test run for 30s, then read the usage.
echo 0 | sudo tee $ACCT > /dev/null
sleep 30
cat $ACCT

I converted the nanoseconds into seconds and I got the following results. I ran each test twice, so there are 2 columns of numbers:

PR         test 1     test 2
this PR    12.726s    12.725s
#2118      12.575s    11.767s
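
(cpuacct.usage is a cumulative count in nanoseconds, so for example 12.726 s of CPU over the 30-second window is an average of roughly 0.42 cores.)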

There does not seem to be a meaningful performance difference between the two PRs.

@josephglanville
Contributor Author

I don't think upstream is maintained. If you don't want to carry this code in Scope, I can look at implementing it in Vish's library - it looks pretty decent, though it will need some extensions to support this.

As @alban pointed out, the CPU savings also include the conntrack binary, which actually uses a decent amount of CPU to process the event buffer.
Quantifying this is somewhat difficult, but eyeballing it: when the probe is consuming 30% CPU on my machine, conntrack is using 10%. That is a very significant saving: 10 points out of the combined 40, i.e. 25% of the overall CPU usage on master.

When quoting numbers before I didn't account for this: comparing this branch to master, not only does the probe use less CPU, but conntrack uses no CPU at all because it isn't running.

To summarise:
#2118 only speeds up the text processing, so the CPU used to track connections is the text processing plus conntrack's own CPU usage.
This changeset, by contrast, uses roughly the same number of cycles to do everything conntrack does plus the processing.

The real question here is how far away the eBPF branch is, because that is a more holistic approach to fixing probe performance from a data-collection standpoint.
If it's ready soon then I agree merging this doesn't make sense; however, if it's not ready for some time, then this achieves very real performance benefits over and above what #2118 can deliver.

@josephglanville
Contributor Author

@alban Fair enough, I didn't think of using cpuacct to measure it.
The performance difference seems immaterial to me, so the simpler option should be favoured.
Closing in favour of #2118.

@2opremio
Contributor

This is a very significant saving

Unfortunately @alban's measurements show otherwise :(

The real question here is how far away is the eBPF branch?

The ebpf branch still uses conntrack for NAT tracking. So we still need these improvements.

I don't think upstream is maintained, if you don't want to carry this code in Scope I can look at implementing it in Vish's library - it looks pretty decent, though will need some extensions to support this.

There is a high chance of merging this if it is incorporated into vish's library (even if it shows similar performance to the text parser).

Thanks a lot for the effort @josephglanville! Truly. It's just a matter of maintainability, nothing against your code.

@alban
Contributor

alban commented Jan 16, 2017

Quantifying this is somewhat difficult but eyeballing it: when probe is consuming 30% CPU on my
machine conntrack is using 10%. This is a very significant saving, accounting for 25% of overall CPU
usage (on master).

I am curious if you see one or two conntrack processes in the "top" list. "ps aux" shows the following two processes:

conntrack --buffer-size 212992 -E -o id -p tcp --any-nat
conntrack --buffer-size 212992 -E -o id -p tcp

In my test I saw the first conntrack with 0.7% in the "top" list and the second conntrack with 3.6%. Maybe your test scenario is different than mine.

The real question here is how far away is the eBPF branch?
Because that is a more holistic approach to fixing probe performance from a data collection standpoint.

@iaguis should do a PR soon (today/tomorrow?).
But the ebpf branch really only removes one of the two conntrack processes; the conntrack for NAT is still there. (Actually, the other conntrack is still executed to get the initial state, but it is stopped shortly afterwards.)

@josephglanville
Contributor Author

@alban I am using Apache Bench to generate many flows against a simple webserver. I see 2 conntrack processes, one using fairly little CPU and one using about 10%.

I am also using a much larger buffer size to prevent conntrack from erroring out with ENOBUFS.
Perhaps you should also set the buffer much larger; without that, it's likely that neither conntrack implementation keeps up in real time for the duration of the test,
i.e. --probe.conntrack.buffersize=16777216.

16MB might seem like a lot, but when handling very many connections it still overflows during bursts. Even so, my implementation can sustain around ~6000 flows established and torn down per second, i.e. approximately 18000 conntrack events/s.
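
To illustrate what I mean by overrun handling, here is a minimal sketch (hypothetical names, not the code in this branch) of an event-read loop that reacts to ENOBUFS by forcing a larger receive buffer with SO_RCVBUFFORCE, which ignores net.core.rmem_max but requires CAP_NET_ADMIN:

package main

import "syscall"

// readEvents drains conntrack events from an already-bound netfilter
// netlink socket; handle receives each batch of parsed messages.
func readEvents(fd int, handle func([]syscall.NetlinkMessage)) error {
    buf := make([]byte, 64*1024)
    for {
        n, _, err := syscall.Recvfrom(fd, buf, 0)
        if err == syscall.ENOBUFS {
            // The kernel dropped events because the receive buffer overflowed.
            // Force a 16MB buffer and carry on; a real implementation would
            // also resync its connection state after the drop.
            syscall.SetsockoptInt(fd, syscall.SOL_SOCKET, syscall.SO_RCVBUFFORCE, 16*1024*1024)
            continue
        }
        if err != nil {
            return err
        }
        msgs, err := syscall.ParseNetlinkMessage(buf[:n])
        if err != nil {
            return err
        }
        handle(msgs)
    }
}

Even with a forced 16MB buffer, a sustained burst can still outrun the reader, which matches the overruns I'm describing above.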

@bboreham
Collaborator

FYI I started to look at this again. I notice the upstream library https://github.com/typetypetype/conntrack/ shows some activity - it has had 9 PRs merged since December.

        return 0, nil, err
    }
    if bufferSize > 0 {
        if err := syscall.SetsockoptInt(s, syscall.SOL_SOCKET, syscall.SO_RCVBUFFORCE, bufferSize); err != nil {

