Huge tail latency in the TLS performance tests #1434
Actually, in all our tests using tls-perf we saw huge tail/max latency, sometimes even higher than for Nginx/OpenSSL, so there is definitely an issue.
I suppose the high tail latency is linked with the heavy cryptographic computations in softirq and how Linux executes softirqs. Note that the TTLS routines on the flame graph (collected in a VM) are called in two different contexts: from the do_IRQ exit path and from ksoftirqd.

Regarding the original issue with high tail latency, up to +3000%, I believe this isn't just about whether to process the crypto in ksoftirqd or in do_IRQ. I assume significant network packet drops are involved, e.g. we do some heavy work in softirq. The problem is linked with #1446, kTLS encryption in softirq.
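One way to check how much of this work gets pushed out of the hard-IRQ return path is to watch the CPU time of the ksoftirqd/N threads while tls-perf is running: if their kernel-mode time grows quickly, most of the softirq work (including the handshake crypto) is being deferred to ksoftirqd and competes with everything else on the run queue. Below is a minimal userspace sketch (not Tempesta code; file names and field offsets follow proc(5)) that prints these counters:

```c
/*
 * Minimal diagnostic sketch: print the CPU time of every ksoftirqd/N
 * thread. Run it before and after a tls-perf run and diff the values.
 */
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	DIR *proc = opendir("/proc");
	struct dirent *de;

	if (!proc) {
		perror("opendir /proc");
		return 1;
	}

	while ((de = readdir(proc))) {
		char path[288], comm[64], buf[1024], *p;
		unsigned long utime, stime;
		FILE *f;

		if (!isdigit((unsigned char)de->d_name[0]))
			continue;

		/* Only ksoftirqd/N kernel threads are interesting here. */
		snprintf(path, sizeof(path), "/proc/%s/comm", de->d_name);
		f = fopen(path, "r");
		if (!f || !fgets(comm, sizeof(comm), f)) {
			if (f)
				fclose(f);
			continue;
		}
		fclose(f);
		if (strncmp(comm, "ksoftirqd/", 10))
			continue;

		/* utime is field 14 and stime is field 15 of /proc/<pid>/stat. */
		snprintf(path, sizeof(path), "/proc/%s/stat", de->d_name);
		f = fopen(path, "r");
		if (!f || !fgets(buf, sizeof(buf), f)) {
			if (f)
				fclose(f);
			continue;
		}
		fclose(f);
		p = strrchr(buf, ')');
		if (!p)
			continue;
		if (sscanf(p + 2,
			   "%*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %lu %lu",
			   &utime, &stime) != 2)
			continue;

		comm[strcspn(comm, "\n")] = '\0';
		printf("%-14s pid %-6s utime %lu stime %lu (ticks)\n",
		       comm, de->d_name, utime, stime);
	}
	closedir(proc);
	return 0;
}
```

Diffing the stime values across a benchmark run gives a rough measure of how much softirq work was deferred to ksoftirqd instead of being handled on IRQ exit.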
Here are statistics for different numbers of concurrent client connections. Tested on Linux 4.14; Tempesta is at 2b61411. I ran:
So the actual number of concurrent connections was
Tempesta usually provides lower tail latencies at all connection counts except 100 and 250, and Nginx had huge tail latencies at connection counts above 500. During the benchmarking for Netdev I saw absolutely the opposite results at 1000 concurrent connections! Average performance here is also better than what I had in the tests for the article. Then I set up sysctl similarly to the values used in the "VM tests" from the article:
Statistically nothing has changed: Nginx has larger tails, not Tempesta... Still guessing how this could be. I ran the benchmark many times; sometimes large latency tails appeared for Tempesta, but not always, and Nginx still got larger ones.
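For comparing the tails between runs it helps to reduce each run to the same few numbers. Here is a small sketch that does this, assuming one latency sample in milliseconds per line on stdin (an assumed input format for illustration, not the actual tls-perf output); it prints approximate p50/p95/p99 and the maximum:

```c
/* Quantify the latency tail from raw per-handshake latency samples. */
#include <stdio.h>
#include <stdlib.h>

static int cmp(const void *a, const void *b)
{
	double x = *(const double *)a, y = *(const double *)b;
	return (x > y) - (x < y);
}

/* Approximate p-th percentile of a sorted array: index floor(p/100 * n). */
static double percentile(const double *v, size_t n, double p)
{
	size_t idx = (size_t)(p / 100.0 * n);
	return v[idx >= n ? n - 1 : idx];
}

int main(void)
{
	size_t n = 0, cap = 1024;
	double *v = malloc(cap * sizeof(*v)), x;

	while (v && scanf("%lf", &x) == 1) {
		if (n == cap)
			v = realloc(v, (cap *= 2) * sizeof(*v));
		if (!v)
			return 1;
		v[n++] = x;
	}
	if (!n)
		return 1;
	qsort(v, n, sizeof(*v), cmp);
	printf("p50 %.2f  p95 %.2f  p99 %.2f  max %.2f (ms)\n",
	       percentile(v, n, 50), percentile(v, n, 95),
	       percentile(v, n, 99), v[n - 1]);
	free(v);
	return 0;
}
```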
Tried to monitor packet drops via https://github.com/nhorman/dropwatch:
Don't bother with the UDP drops - they're not related to the issue.
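As a complementary check that filters out unrelated drop points, /proc/net/softnet_stat can be read directly: its second column counts packets dropped because the per-CPU backlog queue overflowed, and the third column (time_squeeze) counts NET_RX softirq runs that exhausted their budget or time slot. Both growing during a tls-perf run would support the hypothesis that heavy crypto in softirq starves packet processing. A minimal reader sketch (one hex row per CPU, layout as produced by the kernel's net-procfs code):

```c
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/net/softnet_stat", "r");
	unsigned int processed, dropped, squeezed;
	char line[512];
	int cpu = 0;

	if (!f) {
		perror("fopen /proc/net/softnet_stat");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		/* Columns: processed, dropped (backlog full), time_squeeze. */
		if (sscanf(line, "%x %x %x", &processed, &dropped, &squeezed) == 3)
			printf("CPU%-3d processed %u dropped %u time_squeeze %u\n",
			       cpu, processed, dropped, squeezed);
		cpu++;
	}
	fclose(f);
	return 0;
}
```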
Scope
While performance testing we bumped into unexpected latency gaps at a huge number of connections. It seems our processing on receive (in softirq) may add some inequity to the progress of parallel connections, whereas epoll mode seems to do extra balancing, so latencies for all connections are almost the same.
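For illustration only (this is neither Nginx's nor Tempesta's code), the sketch below shows the pattern meant by "extra balancing": an epoll_wait() loop visits every ready connection in each round and does a bounded unit of work per connection, so no single connection can run far ahead of the others, unlike a long uninterrupted softirq run that processes one connection's burst to completion. The port and buffer size are arbitrary.

```c
/* Minimal epoll echo loop: bounded work per ready connection per round. */
#include <netinet/in.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#define BUF 4096
#define MAX_EVENTS 64

int main(void)
{
	int lfd = socket(AF_INET, SOCK_STREAM, 0);
	int one = 1, epfd = epoll_create1(0);
	struct sockaddr_in addr = { .sin_family = AF_INET,
				    .sin_port = htons(8080),
				    .sin_addr.s_addr = htonl(INADDR_ANY) };
	struct epoll_event ev, events[MAX_EVENTS];

	if (lfd < 0 || epfd < 0) {
		perror("socket/epoll_create1");
		return 1;
	}
	setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) || listen(lfd, 128)) {
		perror("bind/listen");
		return 1;
	}
	ev.events = EPOLLIN;
	ev.data.fd = lfd;
	epoll_ctl(epfd, EPOLL_CTL_ADD, lfd, &ev);

	for (;;) {
		int n = epoll_wait(epfd, events, MAX_EVENTS, -1);

		/* One bounded unit of work per ready connection per round. */
		for (int i = 0; i < n; i++) {
			int fd = events[i].data.fd;

			if (fd == lfd) {
				int cfd = accept(lfd, NULL, NULL);
				if (cfd < 0)
					continue;
				ev.events = EPOLLIN;
				ev.data.fd = cfd;
				epoll_ctl(epfd, EPOLL_CTL_ADD, cfd, &ev);
			} else {
				char buf[BUF];
				ssize_t r = read(fd, buf, sizeof(buf));

				if (r <= 0) {
					epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
					close(fd);
				} else if (write(fd, buf, r) < 0) {
					perror("write");
				}
			}
		}
	}
}
```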