Multiple performance regressions #1940
One of the options to debug TCP latency issues is to use SO_TIMESTAMPING, e.g. generate a massive workload and trace a single testing TCP stream with the timestamps. BPF_PROG_TYPE_SOCK_OPS also allows tracing TCP issues with retransmissions and windows.
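For reference, a minimal sketch (not code from this issue; the function names are illustrative and error handling is omitted) of enabling software timestamps on a single traced client socket and reading the RX timestamp from the control message:

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/net_tstamp.h>   /* SOF_TIMESTAMPING_* flags */
#include <linux/errqueue.h>     /* struct scm_timestamping */

#ifndef SCM_TIMESTAMPING
#define SCM_TIMESTAMPING SO_TIMESTAMPING  /* fallback for older userspace headers */
#endif

/* Enable software RX/TX timestamps; call right after connect() on the test socket. */
static int
enable_sw_timestamps(int fd)
{
	unsigned int val = SOF_TIMESTAMPING_RX_SOFTWARE
			 | SOF_TIMESTAMPING_TX_SOFTWARE
			 | SOF_TIMESTAMPING_SOFTWARE;

	return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));
}

/* Read a response and print the kernel RX timestamp delivered as a cmsg. */
static ssize_t
recv_with_rx_tstamp(int fd, void *buf, size_t len)
{
	char cbuf[CMSG_SPACE(sizeof(struct scm_timestamping))];
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = cbuf,
		.msg_controllen = sizeof(cbuf),
	};
	ssize_t n = recvmsg(fd, &msg, 0);
	struct cmsghdr *c;

	for (c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
		if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_TIMESTAMPING) {
			struct scm_timestamping ts;

			memcpy(&ts, CMSG_DATA(c), sizeof(ts));
			/* ts.ts[0] - software timestamp, ts.ts[2] - NIC hardware timestamp */
			printf("rx sw tstamp: %lld.%09ld\n",
			       (long long)ts.ts[0].tv_sec, ts.ts[0].tv_nsec);
		}
	}
	return n;
}
```

TX timestamps are reported on the socket error queue, so they have to be drained separately with recvmsg(fd, &msg, MSG_ERRQUEUE); the full flow is described in Documentation/networking/timestamping.rst in the kernel tree.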
HTTP/2 benchmark
The benchmark uses the old website index page (~26KB).
The same workload against Nginx:
RPS, latencies, and TTFB are better for Tempesta FW, but not so significantly. Nginx shows better results on the standard Ubuntu kernel, so apparently there is a kernel regression.
We need to create a new Wiki page about HTTP/2 performance to report the current results and make them reproducible.
We should use the
I have the following results on our server.
For nginx:
For Tempesta:
10 bytes response
nginx:
Tempesta:
On branch MekhanikEvgenii/fix-socket-cpu-migration:
10 bytes keep-alive
nginx:
Tempesta:
On branch MekhanikEvgenii/fix-socket-cpu-migration:
10 KB response
nginx:
taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 -H "Connection: close" https://tempesta-tech.com
On branch MekhanikEvgenii/fix-socket-cpu-migration:
Testing environment: a 3-CPU VM for Tempesta and a 3-CPU VM for the client. 612 bytes content-length. 210 bytes of headers on nginx, 257 bytes of headers on Tempesta. Default minimal config with caching enabled. An interesting thing: when testing Tempesta I can't load the processor to more than 50%, no matter how many CPUs are used to generate traffic. For both cases (1 and 3 CPUs per client) I got at most 50% average CPU load. However, nginx has an average load of around 100% when 3 CPUs are used to generate traffic. Tempesta:
Nginx:
PING Flood Handling Benchmark
Related issue: #2117. Install python3.12 and golang. Refer to these source files:
sudo python3.12 h2_server.py
./ping_flood -address 192.168.2.1:443 -threads 1 -connections 10 -debug 1
Use:
sudo -E TFW_CFG_PATH=$HOME/tempesta-ping.conf ./scripts/tempesta.sh --restart
./ping_flood -address 192.168.2.1:443 -threads 1 -connections 10 -debug 1
Make sure that during the verification tests the user space servers run on a vanilla kernel (our patch still implies changes even with Tempesta undefined).
Test configuration
Benchmark results on an i9-12900HK CPU (0-11 are performance cores) for the current master d7294b8 (the same as release-0.7 as of Jul 13) vs Nginx 1.18.0 built with OpenSSL 3.0.2. Tempesta FW and Nginx are running in a VM with 2 vCPUs bound to performance cores 0 and 2. All the benchmark tools were run from the host machine. The VM uses multiqueue virtio-net.
Tempesta config:
Nginx config servicing /var/www/tempesta-tech.com with a 26KB index file or /var/www/html with a 10-byte index:
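The actual config is not preserved in this issue; purely as a hypothetical placeholder (ports, certificate paths, and worker settings are assumptions), an equivalent setup could look like:

```nginx
# Hypothetical sketch, not the config used in the benchmark.
worker_processes 2;                 # match the 2 vCPUs of the VM
events {
    worker_connections 65536;
}
http {
    access_log off;
    server {                        # ~26KB index file
        listen 443 ssl;
        ssl_certificate     /etc/nginx/tfw.crt;   # assumed paths
        ssl_certificate_key /etc/nginx/tfw.key;
        root /var/www/tempesta-tech.com;
    }
    server {                        # 10-byte index file
        listen 8443 ssl;
        ssl_certificate     /etc/nginx/tfw.crt;
        ssl_certificate_key /etc/nginx/tfw.key;
        root /var/www/html;
    }
}
```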
TLS regression
We saw the problem earlier on the migration from Linux kernel 4.14 to 5.10, see #1504 (comment). Also see the tail latency problem in #1434.
The reference could be the FOSDEM'21 demo (the results were pretty stable and we ran it on different machines):
OpenSSL 3 also integrated Bernstein elliptic curve multiplication, so it's expected to be faster than OpenSSL 1, which we tested previously. But there is still no reason why Tempesta TLS should be slower.
10 bytes cached response, no keep-alive
Nginx is only negligibly slower. I'd suppose that wrk uses abbreviated TLS handshakes, but I'm not sure and this should be verified.
10 bytes cached response, keep-alive
Nginx is almost 2 times faster than Tempesta.
26KB cached response, no keep-alive
Nginx is still faster, but not so dramatically. IIRC in today's demo we saw reversed results: Tempesta behaved better on the smaller file than on the larger one.
26KB cached response, keep-alive
Nginx shows about a 2.5 times better result.
Conclusions
TODO
We also need to test a large body workload (VOD), about 5-10MB.