Multiple performance regressions #1940
One of the options to debug TCP latency issues is to use SO_TIMESTAMPING, e.g. generate a massive workload and trace a single testing TCP stream with the timestamps. BPF_PROG_TYPE_SOCK_OPS also allows tracing TCP issues with retransmissions and windows.
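For reference, a minimal sketch (not code from this issue; the function names are illustrative and error handling is omitted) of enabling software timestamps on a single traced client socket and reading the RX timestamp from the control message:

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/net_tstamp.h>   /* SOF_TIMESTAMPING_* flags */
#include <linux/errqueue.h>     /* struct scm_timestamping */

#ifndef SCM_TIMESTAMPING
#define SCM_TIMESTAMPING SO_TIMESTAMPING  /* fallback for older userspace headers */
#endif

/* Enable software RX/TX timestamps; call right after connect() on the test socket. */
static int
enable_sw_timestamps(int fd)
{
	unsigned int val = SOF_TIMESTAMPING_RX_SOFTWARE
			 | SOF_TIMESTAMPING_TX_SOFTWARE
			 | SOF_TIMESTAMPING_SOFTWARE;

	return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));
}

/* Read a response and print the kernel RX timestamp delivered as a cmsg. */
static ssize_t
recv_with_rx_tstamp(int fd, void *buf, size_t len)
{
	char cbuf[CMSG_SPACE(sizeof(struct scm_timestamping))];
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = cbuf,
		.msg_controllen = sizeof(cbuf),
	};
	ssize_t n = recvmsg(fd, &msg, 0);
	struct cmsghdr *c;

	for (c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
		if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_TIMESTAMPING) {
			struct scm_timestamping ts;

			memcpy(&ts, CMSG_DATA(c), sizeof(ts));
			/* ts.ts[0] - software timestamp, ts.ts[2] - NIC hardware timestamp */
			printf("rx sw tstamp: %lld.%09ld\n",
			       (long long)ts.ts[0].tv_sec, ts.ts[0].tv_nsec);
		}
	}
	return n;
}
```

TX timestamps are reported on the socket error queue, so they have to be drained separately with recvmsg(fd, &msg, MSG_ERRQUEUE); the full flow is described in Documentation/networking/timestamping.rst in the kernel tree.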
HTTP/2 benchmark
The benchmark uses the old website index page (~26KB).
The same workload against Nginx:
RPS, latencies, and TTFB are better for Tempesta FW, but not so significantly. Nginx shows better results on the standard Ubuntu kernel, so apparently there is a kernel regression.
We need to create a new Wiki page about HTTP/2 performance to report the current results and make them reproducible.
We should use the
I have the following results on our server.
For nginx:
For Tempesta:
10 bytes response
nginx:
Tempesta:
On branch MekhanikEvgenii/fix-socket-cpu-migration:
10 bytes keep-alive
nginx:
Tempesta:
On branch MekhanikEvgenii/fix-socket-cpu-migration:
10 KB response
nginx:
taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 -H "Connection: close" https://tempesta-tech.com
On branch MekhanikEvgenii/fix-socket-cpu-migration:
Testing environment: a 3-CPU VM for Tempesta and a 3-CPU VM for the client. 612 bytes content-length. 210 bytes of headers on nginx, 257 bytes of headers on Tempesta. Default minimal config with caching enabled. An interesting thing: when testing Tempesta I can't load the processor to more than 50%, no matter how many CPUs are used to generate traffic. For both cases (1 and 3 CPUs per client) I got at most 50% average CPU load. However, nginx has an average load of around 100% when 3 CPUs are used to generate traffic. Tempesta:
Nginx:
PING Flood Handling Benchmark
Related issue: #2117. Install python3.12 and golang. Refer to these source files:
sudo python3.12 h2_server.py
./ping_flood -address 192.168.2.1:443 -threads 1 -connections 10 -debug 1
Use:
sudo -E TFW_CFG_PATH=$HOME/tempesta-ping.conf ./scripts/tempesta.sh --restart
./ping_flood -address 192.168.2.1:443 -threads 1 -connections 10 -debug 1
Make sure that during the verification tests the user space servers run on a vanilla kernel (our patch still implies changes even with Tempesta undefined).
Test configuration
Benchmark results on an i9-12900HK CPU (0-11 are performance cores) for the current master d7294b8 (the same as release-0.7 as of Jul 13) vs Nginx 1.18.0 built with OpenSSL 3.0.2. Tempesta FW and Nginx are running in a VM with 2 vCPUs bound to performance cores 0 and 2. All the benchmark tools were run from the host machine. The VM uses multiqueue virtio-net.
Tempesta config:
Nginx config servicing /var/www/tempesta-tech.com with a 26KB index file or /var/www/html with a 10-byte index:
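The actual config is not preserved in this issue; purely as a hypothetical placeholder (ports, certificate paths, and worker settings are assumptions), an equivalent setup could look like:

```nginx
# Hypothetical sketch, not the config used in the benchmark.
worker_processes 2;                 # match the 2 vCPUs of the VM
events {
    worker_connections 65536;
}
http {
    access_log off;
    server {                        # ~26KB index file
        listen 443 ssl;
        ssl_certificate     /etc/nginx/tfw.crt;   # assumed paths
        ssl_certificate_key /etc/nginx/tfw.key;
        root /var/www/tempesta-tech.com;
    }
    server {                        # 10-byte index file
        listen 8443 ssl;
        ssl_certificate     /etc/nginx/tfw.crt;
        ssl_certificate_key /etc/nginx/tfw.key;
        root /var/www/html;
    }
}
```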
TLS regression
We saw the problem earlier on the migration from Linux kernel 4.14 to 5.10, see #1504 (comment). Also see the tail latency problem in #1434.
The reference could be the FOSDEM'21 demo (the results were pretty stable and we ran it on different machines):
OpenSSL 3 also integrated Bernstein elliptic curve multiplication, so it's expected to be faster than OpenSSL 1, which we tested previously. But there is still no reason why Tempesta TLS should be slower.
10 bytes cached response, no keep-alive
Nginx is only negligibly slower. I'd suppose that wrk uses abbreviated TLS handshakes, but I'm not sure and this should be verified.
10 bytes cached response, keep-alive
Nginx is almost 2 times faster than Tempesta.
26KB cached response, no keep-alive
Nginx is still faster, but not so dramatically. IIRC in today's demo we saw reversed results: Tempesta behaved better on the smaller file than on the larger one.
26KB cached response, keep-alive
Nginx shows about a 2.5 times better result.
Conclusions
TODO
We also need to test a large body workload (VOD), about 5-10MB.