Conntrack netlink #2130
Conversation
Thanks for this @josephglanville! Fixes #1991 (as an alternative to #2095). Can I assume that the 2-3% was against master?
2b92127 to f689ba3
@2opremio Yes, that was against master. I couldn't get pprof to give me numbers for my function; I need to read up more on how to operate it.
Here are the results with the method described in #2118 (comment) and #2118 (comment) (4 clients connecting repetitively to nginx). I ran each test twice just to make sure the tests were consistent, so there are two numbers for each test:
However, using pprof alone is not a fair way to compare with #2118, because pprof does not count the time spent in the separate conntrack processes. So although this PR takes more CPU than #2118, it might still be better to take it. See the pprof graphs below; I only include the graphs for the first test.
@josephglanville A couple of things:
My concern is that supporting the …
I tried a different method to compare the performance of this PR versus #2118. I still run 4 clients as described in #2118 (comment), but I use the cpuacct cgroup instead of pprof. This way it should capture the CPU time spent by all the processes of the Scope Docker container (i.e. the separate conntrack processes are included in the case of #2118). I used the following script:

```bash
#!/bin/bash
ID=$(sudo docker ps --no-trunc|grep weavescope|awk '{print $1}')
ACCT=/sys/fs/cgroup/cpu,cpuacct/system.slice/docker-${ID}.scope/cpuacct.usage
echo 0 | sudo tee $ACCT > /dev/null
sleep 30
cat $ACCT
```

I converted the nanoseconds into seconds and got the following results. I ran each test twice, so there are 2 columns of numbers:
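(As a hypothetical example of the conversion, since `cpuacct.usage` reports cumulative CPU time in nanoseconds: a reading of 3,000,000,000 ns after the 30-second sleep would mean 3 CPU-seconds were consumed, i.e. roughly 10% of one core on average.)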
There does not seem to be a meaningful performance difference between the two PRs.
I don't think upstream is maintained. If you don't want to carry this code in Scope, I can look at implementing it in Vish's library; it looks pretty decent, though it will need some extensions to support this. As @alban pointed out, the CPU savings also include the conntrack binary, which actually uses a decent amount of CPU to process the event buffer. I didn't account for this when quoting numbers before: comparing this branch to master, not only does the probe use less CPU, but conntrack uses no CPU at all because it isn't running. To summarise: the real question here is how far away is the eBPF branch?
Unfortunately @alban's measurements show otherwise :(
The eBPF branch still uses conntrack for NAT tracking, so we still need these improvements.
There is a high chance of merging this if it is incorporated into Vish's library (even if it shows similar performance to the text parser). Thanks a lot for the effort @josephglanville! Truly. It's just a matter of maintainability, nothing against your code.
I am curious whether you see one or two conntrack processes in the `top` list. `ps aux` shows the following two processes:
In my test I saw the first conntrack at 0.7% in the `top` list and the second at 3.6%. Maybe your test scenario is different from mine.
@iaguis should do a PR soon (today/tomorrow?).
@alban I am using Apache Bench to generate many flows against a simple webserver. I see 2 conntrack processes, one using fairly little CPU and one using about 10%. I am also using a much larger buffer size to prevent conntrack from erroring out with ENOBUFS. 16MB might seem like a lot, but when handling very many connections it still overflows during bursts. My implementation can maintain around ~6000 flows established+torn down per second, i.e. approximately 18000 conntrack events/s.
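To make the failure mode concrete, here is a minimal sketch (not code from this PR; `readEvents` and the buffer size are illustrative) of a read loop on an already-bound conntrack netlink socket. When the kernel overflows the socket's receive buffer, the next read fails with ENOBUFS and the reader can only note the loss (or resynchronise by re-dumping the table) and carry on:

```go
package conntrack

import (
	"log"
	"syscall"
)

// readEvents drains events from an already-bound netlink socket fd.
// Hypothetical helper for illustration only.
func readEvents(fd int) error {
	buf := make([]byte, 64*1024)
	for {
		n, _, err := syscall.Recvfrom(fd, buf, 0)
		if err == syscall.ENOBUFS {
			// The kernel dropped events because the receive buffer overflowed;
			// a real consumer might re-dump the conntrack table here.
			log.Println("conntrack events lost (ENOBUFS)")
			continue
		}
		if err != nil {
			return err
		}
		msgs, err := syscall.ParseNetlinkMessage(buf[:n])
		if err != nil {
			return err
		}
		// Hand the messages to the conntrack decoder.
		log.Printf("received %d netlink messages", len(msgs))
	}
}
```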
FYI, I started to look at this again. I notice the upstream library https://github.com/typetypetype/conntrack/ shows some activity: it has had 9 PRs merged since December.
```go
		return 0, nil, err
	}
	if bufferSize > 0 {
		if err := syscall.SetsockoptInt(s, syscall.SOL_SOCKET, syscall.SO_RCVBUFFORCE, bufferSize); err != nil {
```
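For context, here is a hedged sketch of how a function around this hunk might look end to end. The names (`openConntrackSocket`, the group constants) and the two-value return are assumptions for illustration; the hunk's `return 0, nil, err` suggests the real function returns three values. `SO_RCVBUFFORCE` lets the buffer exceed `net.core.rmem_max` but requires CAP_NET_ADMIN, which a privileged probe container would normally have:

```go
package conntrack

import "syscall"

// Conntrack event group bitmask values from the kernel's nfnetlink_compat.h
// (assumed here because the syscall package does not export them).
const (
	nfNetlinkConntrackNew     = 0x1
	nfNetlinkConntrackUpdate  = 0x2
	nfNetlinkConntrackDestroy = 0x4
)

// openConntrackSocket opens a NETLINK_NETFILTER socket subscribed to
// conntrack events and, if bufferSize > 0, forces the receive buffer size.
func openConntrackSocket(bufferSize int) (int, error) {
	s, err := syscall.Socket(syscall.AF_NETLINK, syscall.SOCK_RAW, syscall.NETLINK_NETFILTER)
	if err != nil {
		return 0, err
	}
	if bufferSize > 0 {
		if err := syscall.SetsockoptInt(s, syscall.SOL_SOCKET, syscall.SO_RCVBUFFORCE, bufferSize); err != nil {
			syscall.Close(s)
			return 0, err
		}
	}
	addr := &syscall.SockaddrNetlink{
		Family: syscall.AF_NETLINK,
		Groups: nfNetlinkConntrackNew | nfNetlinkConntrackUpdate | nfNetlinkConntrackDestroy,
	}
	if err := syscall.Bind(s, addr); err != nil {
		syscall.Close(s)
		return 0, err
	}
	return s, nil
}
```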
This switches the endpoint probe to use the netfilter netlink interface instead of spawning conntrack as a child process. It is somewhat of a half measure between where master is now and the eBPF solution.
Netlink code is based on https://github.com/typetypetype/conntrack but modified substantially.
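To sketch how events are distinguished on this interface (illustrative constants and helper, not the PR's code): nfnetlink packs the subsystem into the high byte of the netlink message type and the per-subsystem message into the low byte, so conntrack creations/updates and deletions can be told apart before walking the nested attributes:

```go
package conntrack

import "syscall"

// Values from the kernel's nfnetlink.h / nfnetlink_conntrack.h (assumed,
// not exported by the syscall package).
const (
	nfnlSubsysCtnetlink = 1
	ipctnlMsgCtNew      = 0
	ipctnlMsgCtDelete   = 2
)

// classify is a hypothetical helper; the real parsing also decodes the
// nested attributes (tuples, status, NAT info) in msg.Data.
func classify(msg syscall.NetlinkMessage) string {
	if msg.Header.Type>>8 != nfnlSubsysCtnetlink {
		return "not a conntrack message"
	}
	switch msg.Header.Type & 0xff {
	case ipctnlMsgCtNew:
		return "flow created or updated"
	case ipctnlMsgCtDelete:
		return "flow destroyed"
	default:
		return "other conntrack message"
	}
}
```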
Also took the opportunity to switch to `net.IP`, standardise on `uint16` for port numbers, and clean up the `Flow` data structure. It wasn't clear how to write unit tests for the netlink interface, other than perhaps bin-dumping some packets and testing the parsing logic on raw bytes; I'm unsure of the usefulness of this.
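As a purely illustrative sketch of that direction (hypothetical names and fields, not the PR's actual types), the cleaned-up flow record might look roughly like:

```go
package conntrack

import "net"

// flowTuple illustrates the standardisation mentioned above:
// addresses as net.IP and ports as uint16.
type flowTuple struct {
	SrcIP   net.IP
	DstIP   net.IP
	SrcPort uint16
	DstPort uint16
	Proto   uint8
}

// flow pairs the original tuple with the reply tuple so NATed
// connections can be matched up.
type flow struct {
	Original flowTuple
	Reply    flowTuple
	Status   uint32 // conntrack status bits, e.g. whether the flow is NATed
}
```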
Performance is much better on my machine in normal usage (2-3%/core less usage in absolute terms); however, under very high load I still see the socket buffer getting overrun. I am unsure of the cause, as the parsing no longer takes significant CPU.