
Conntrack netlink #2130

Closed


josephglanville
Contributor

This switches the endpoint probe to use the netfilter netlink interface instead of spawning conntrack as a child process. It is something of a half-measure between where master is now and the eBPF solution.

Netlink code is based on https://github.com/typetypetype/conntrack but modified substantially.

I also took the opportunity to switch to net.IP, standardise on uint16 for port numbers, and clean up the Flow data structure.

It wasn't clear how to write unit tests for the netlink interface, other than perhaps dumping some packets to binary and testing the parsing logic on the raw bytes; I'm not sure how useful that would be.

Performance is much better on my machine in normal usage (2-3% less usage per core in absolute terms); however, under very high load I still see the socket buffer being overrun. I am unsure of the cause, as the parsing no longer takes significant CPU.
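
For context, here is a minimal sketch of subscribing to conntrack events over NETLINK_NETFILTER using only Go's syscall package. The group constants and the parsing stub are illustrative assumptions for this comment, not code from this branch:

package main

import (
    "fmt"
    "syscall"
)

// Legacy conntrack multicast group bitmasks (linux/netfilter/nfnetlink_compat.h).
const (
    nfNetlinkConntrackNew     = 0x1
    nfNetlinkConntrackUpdate  = 0x2
    nfNetlinkConntrackDestroy = 0x4
)

func main() {
    // Open a raw netlink socket speaking the netfilter protocol.
    fd, err := syscall.Socket(syscall.AF_NETLINK, syscall.SOCK_RAW, syscall.NETLINK_NETFILTER)
    if err != nil {
        panic(err)
    }
    defer syscall.Close(fd)

    // Bind to the conntrack event groups so the kernel multicasts
    // new/update/destroy flow events to this socket.
    sa := &syscall.SockaddrNetlink{
        Family: syscall.AF_NETLINK,
        Groups: nfNetlinkConntrackNew | nfNetlinkConntrackUpdate | nfNetlinkConntrackDestroy,
    }
    if err := syscall.Bind(fd, sa); err != nil {
        panic(err)
    }

    buf := make([]byte, 65536)
    for {
        n, _, err := syscall.Recvfrom(fd, buf, 0)
        if err != nil {
            panic(err)
        }
        msgs, err := syscall.ParseNetlinkMessage(buf[:n])
        if err != nil {
            continue
        }
        // Each message carries an nfgenmsg header followed by nested
        // conntrack attributes; real code would decode those here.
        fmt.Printf("received %d conntrack event(s)\n", len(msgs))
    }
}

A real implementation would then walk the nested netlink attributes (tuples, status, conntrack ID) to build the flow records the probe reports.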

@2opremio
Contributor

2opremio commented Jan 16, 2017

Thanks for this @josephglanville !

Fixes #1991 (as an alternative to #2095)

Can I assume that the 2/3% was against master?

@alban Would you mind testing this against #2118 ?

@josephglanville
Contributor Author

@2opremio Yes, that was against master. I couldn't get pprof to give me numbers for my function; I need to read up more on how to operate it.

@alban
Contributor

alban commented Jan 16, 2017

@alban Would you mind testing this against #2118 ?

Here are the results with the method described in #2118 (comment) and #2118 (comment) (4 clients connecting repeatedly to nginx).

I ran each test twice just to make sure the results were consistent, so there are two numbers for each test:

However, pprof alone is not a fair basis for comparison with #2118, because pprof does not count the time spent in the conntrack processes. So although this PR shows more CPU than #2118 under pprof, it might still be better to take it.

See the pprof graphs below. I only include the graphs for the first test.

(pprof graphs: conntrack, parsing)

@2opremio
Contributor

@josephglanville A couple of things:

@2opremio
Contributor

2opremio commented Jan 16, 2017

My concern is that supporting the NETLINK_NETFILTER protocol doesn't seem to improve performance by much compared to #2118, but it does add some complicated code.

@alban
Contributor

alban commented Jan 16, 2017

I tried a different method to compare the performance of this PR versus #2118. I still run 4 clients as described in #2118 (comment), but I use the cpuacct cgroup instead of pprof. This way it captures the CPU time spent by all the processes in the Scope Docker container (i.e. the separate conntrack processes are included in the case of #2118).

I used the following script:

#!/bin/bash

# Full ID of the running Scope container.
ID=$(sudo docker ps --no-trunc | grep weavescope | awk '{print $1}')
# Cumulative CPU usage (in nanoseconds) of the container's cgroup.
ACCT=/sys/fs/cgroup/cpu,cpuacct/system.slice/docker-${ID}.scope/cpuacct.usage

# Reset the counter, let the test run for 30s, then read the usage.
echo 0 | sudo tee $ACCT > /dev/null
sleep 30
cat $ACCT

I converted the nanoseconds into seconds and I got the following results. I ran each test twice, so there are 2 columns of numbers:

PR         test 1     test 2
this PR    12.726s    12.725s
#2118      12.575s    11.767s
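
(cpuacct.usage is a cumulative count in nanoseconds, so for example 12.726 s of CPU over the 30-second window is an average of roughly 0.42 cores.)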

There does not seem to be a meaningful performance difference between the two PRs.

@josephglanville
Contributor Author

I don't think upstream is maintained. If you don't want to carry this code in Scope, I can look at implementing it in Vish's library - it looks pretty decent, though it will need some extensions to support this.

As @alban pointed out, the CPU savings also include the conntrack binary, which actually uses a decent amount of CPU to process the event buffer.
Quantifying this is somewhat difficult, but eyeballing it: when the probe is consuming 30% CPU on my machine, conntrack is using 10%. That is a very significant saving: 10 points out of the combined 40, i.e. 25% of the overall CPU usage on master.

When quoting numbers before I didn't account for this: comparing this branch to master, not only does the probe use less CPU, but conntrack uses no CPU at all because it isn't running.

To summarise:
#2118 only speeds up the text processing, so the CPU used to track connections is the text processing plus conntrack's own CPU usage.
This changeset, by contrast, uses roughly the same number of cycles to do everything conntrack does plus the processing.

The real question here is how far away the eBPF branch is, because that is a more holistic approach to fixing probe performance from a data-collection standpoint.
If it's ready soon then I agree merging this doesn't make sense; however, if it's not ready for some time, then this achieves very real performance benefits over and above what #2118 can deliver.

@josephglanville
Contributor Author

@alban Fair enough, I didn't think of using cpuacct to measure it.
The performance difference seems immaterial to me, so the simpler option should be favoured.
Closing in favour of #2118.

@2opremio
Contributor

This is a very significant saving

Unfortunately @alban's measurements show otherwise :(

The real question here is how far away is the eBPF branch?

The ebpf branch still uses conntrack for NAT tracking. So we still need these improvements.

I don't think upstream is maintained, if you don't want to carry this code in Scope I can look at implementing it in Vish's library - it looks pretty decent, though will need some extensions to support this.

There is a high chance of merging this if it is incorporated into vish's library (even if it shows similar performance to the text parser).

Thanks a lot for the effort @josephglanville! Truly. It's just a matter of maintainability, nothing against your code.

@alban
Contributor

alban commented Jan 16, 2017

Quantifying this is somewhat difficult but eyeballing it: when probe is consuming 30% CPU on my
machine conntrack is using 10%. This is a very significant saving, accounting for 25% of overall CPU
usage (on master).

I am curious if you see one or two conntrack processes in the "top" list. "ps aux" shows the following two processes:

conntrack --buffer-size 212992 -E -o id -p tcp --any-nat
conntrack --buffer-size 212992 -E -o id -p tcp

In my test I saw the first conntrack with 0.7% in the "top" list and the second conntrack with 3.6%. Maybe your test scenario is different than mine.

The real question here is how far away is the eBPF branch?
Because that is a more holistic approach to fixing probe performance from a data collection standpoint.

@iaguis should do a PR soon (today/tomorrow?).
But the ebpf branch really only removes one of the two conntrack processes; the conntrack for NAT is still there. (Actually, the other conntrack is still executed to get the initial state, but it is stopped shortly afterwards.)

@josephglanville
Contributor Author

@alban I am using Apache Bench to generate many flows against a simple webserver. I see 2 conntrack processes, one using fairly little CPU and one using about 10%.

I am also using a much larger buffer size to prevent conntrack from erroring out with ENOBUFS.
Perhaps you should also set the buffer much larger; without that, it's likely that neither conntrack implementation keeps up in real time for the duration of the test,
i.e. --probe.conntrack.buffersize=16777216.

16MB might seem like a lot, but when handling very many connections it still overflows during bursts. Even so, my implementation can sustain around ~6000 flows established and torn down per second, i.e. approximately 18000 conntrack events/s.
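
To illustrate what I mean by overrun handling, here is a minimal sketch (hypothetical names, not the code in this branch) of an event-read loop that reacts to ENOBUFS by forcing a larger receive buffer with SO_RCVBUFFORCE, which ignores net.core.rmem_max but requires CAP_NET_ADMIN:

package main

import "syscall"

// readEvents drains conntrack events from an already-bound netfilter
// netlink socket; handle receives each batch of parsed messages.
func readEvents(fd int, handle func([]syscall.NetlinkMessage)) error {
    buf := make([]byte, 64*1024)
    for {
        n, _, err := syscall.Recvfrom(fd, buf, 0)
        if err == syscall.ENOBUFS {
            // The kernel dropped events because the receive buffer overflowed.
            // Force a 16MB buffer and carry on; a real implementation would
            // also resync its connection state after the drop.
            syscall.SetsockoptInt(fd, syscall.SOL_SOCKET, syscall.SO_RCVBUFFORCE, 16*1024*1024)
            continue
        }
        if err != nil {
            return err
        }
        msgs, err := syscall.ParseNetlinkMessage(buf[:n])
        if err != nil {
            return err
        }
        handle(msgs)
    }
}

Even with a forced 16MB buffer, a sustained burst can still outrun the reader, which matches the overruns I'm describing above.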

@bboreham
Collaborator

FYI I started to look at this again. I notice the upstream library https://github.com/typetypetype/conntrack/ shows some activity - it has had 9 PRs merged since December.

        return 0, nil, err
    }
    if bufferSize > 0 {
        if err := syscall.SetsockoptInt(s, syscall.SOL_SOCKET, syscall.SO_RCVBUFFORCE, bufferSize); err != nil {

