When hardware-assisted UDP GSO is present, the overhead of crypto becomes the biggest bottleneck of QUIC sender performance. In the case shown below, which does GSO of 40 packets * 1460 bytes, 45.18% of the time is spent in AEAD encryption (ptls_aead_encrypt) and 6.29% is spent in header protection (default_finalize_send_packet). These numbers (almost) represent the real cost, as 96% of the CPU time was spent in user+sys (which makes up 100% of this perf tree).
While the actual cost of doing crypto cannot be reduced, there is a certain amount of overhead within this 45.18% + 6.29% = 51.47%. The actual cost of AEAD (initialization, AAD processing, encrypt, finalize) is 2.73% + 0.78% + 33.31% + 3.15% = 39.97%. The actual cost of generating unpredictable bits for header protection is 1.18%. We can assume that a large fraction of the remaining 10.32% is API overhead.
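The breakdown above can be re-derived with a few lines of arithmetic. This is only a sketch for checking the numbers: the percentages are copied from the figures quoted above, the variable names are made up for illustration, and the final speedup figure assumes (optimistically) that the entire remainder could be eliminated.

```python
# Shares of total CPU time (percent), copied from the perf figures quoted above.
aead_total = 45.18   # everything under ptls_aead_encrypt
hp_total = 6.29      # everything under default_finalize_send_packet

# "Actual" crypto work inside those two subtrees.
aead_phases = {"init": 2.73, "aad": 0.78, "encrypt": 33.31, "finalize": 3.15}
hp_actual = 1.18     # generating the unpredictable bits for header protection

aead_actual = round(sum(aead_phases.values()), 2)
overhead = round(aead_total + hp_total - aead_actual - hp_actual, 2)

# Assumption: if ALL of the remainder were removed, the same work would take
# (100 - overhead)% of the CPU time. This is an upper bound, not a prediction.
speedup_bound = 1 / (1 - overhead / 100)

print(f"actual AEAD cost:   {aead_actual:.2f}%")
print(f"remaining overhead: {overhead:.2f}%")
print(f"upper bound on throughput gain: {(speedup_bound - 1) * 100:.1f}%")
```

Since only "a large fraction" of the remainder is assumed to be API overhead, the achievable gain would be somewhat below this bound.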
A sensible approach to fix this issue would be to add a function to OpenSSL that does everything at once (i.e. all of AEAD + header protection). We can hope to see a ~10% performance improvement by adding such a function.
Furthermore, we could take advantage of pipelining by running the initialization, AAD, and finalization phases of multiple packets in parallel. The reason these three phases show up as prominently as they do above, compared to the permutation of the payload, is that they are not parallelized, whereas the permutation of the payload is applied to multiple 16-byte blocks at once (notice the 6x suffix of the functions). Unlike the case of TLS, we can parallelize the three phases because we are generating multiple packets at once. Also, we could consider applying the encryption logic in the kernel to reduce the overhead of context switches.
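To put a number on how much the three non-parallelized phases weigh, the sketch below sums their shares from the breakdown quoted earlier in this issue. The variable names are illustrative only, and the result says nothing about how much of that time pipelining would actually recover.

```python
# Per-phase shares of total CPU time (percent), from the breakdown in this issue.
phases = {"init": 2.73, "aad": 0.78, "encrypt": 33.31, "finalize": 3.15}

# The three phases that run once per packet and are not parallelized today.
fixed = phases["init"] + phases["aad"] + phases["finalize"]
aead_actual = sum(phases.values())

# Fraction of the actual AEAD cost spent in those per-packet phases.
fixed_share = fixed / aead_actual

print(f"per-packet phases: {fixed:.2f}% of CPU time")
print(f"share of actual AEAD cost: {fixed_share * 100:.1f}%")
```

Under these figures, the per-packet phases account for roughly one sixth of the actual AEAD cost, which is the portion that pipelining across the packets of a GSO batch would target.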