Reducing crypto overhead #334

Closed
kazuho opened this issue Apr 30, 2020 · 1 comment · Fixed by #359


kazuho commented Apr 30, 2020

When hardware-assisted UDP GSO is present, crypto overhead becomes the biggest bottleneck of QUIC sender performance. In the case shown below, doing GSO of 40 packets * 1460 bytes, 45.18% of the time is spent in AEAD encryption (ptls_aead_encrypt) and 6.29% in header protection (default_finalize_send_packet). These numbers (almost) represent the real cost, as 96% of the CPU time was spent in user+sys (constituting 100% of this perf tree).

While the actual cost of doing crypto cannot be reduced, there is a certain amount of overhead within this 45.18% + 6.29%. The actual cost of AEAD (initialization, AAD processing, encryption, finalization) is 2.73% + 0.78% + 33.31% + 3.15% = 39.97%. The actual cost of generating unpredictable bits for header protection is 1.18%. We can assume that a large fraction of the remaining 10.32% is API overhead.
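For reference, the breakdown above can be reproduced from the quoted profile numbers (the variable names below are just labels for this calculation, not part of any API):

```python
# Total time attributed to crypto: AEAD encryption + header protection.
total_crypto = 45.18 + 6.29

# Actual AEAD work: initialization, AAD processing, encryption, finalization.
aead_actual = 2.73 + 0.78 + 33.31 + 3.15

# Actual header-protection work: generating the unpredictable mask bits.
hp_actual = 1.18

# Whatever remains is presumed to be API overhead.
api_overhead = total_crypto - aead_actual - hp_actual

print(round(aead_actual, 2))   # 39.97
print(round(api_overhead, 2))  # 10.32
```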

A sensible approach to fix this issue would be to add a function to OpenSSL that does everything at once (i.e., all of AEAD plus header protection). We can hope to see a ~10% performance improvement by adding such a function.
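To illustrate what such a fused call would cover, here is a toy sketch of the QUIC packet-protection flow. All names are hypothetical, and XOR stand-ins replace the real AES-GCM AEAD and AES-based header-protection mask; only the control flow matters (AEAD-encrypt the payload, sample the ciphertext, mask the first byte and packet number), not the cryptography:

```python
def toy_aead_encrypt(key: bytes, data: bytes) -> bytes:
    # Stand-in for the real AEAD/cipher (e.g. AES-GCM); XOR is NOT secure.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def protect_packet(aead_key: bytes, hp_key: bytes,
                   first_byte: int, pn_bytes: bytes, payload: bytes) -> bytes:
    # Step 1: AEAD-encrypt the payload (the bulk of the cost in the profile).
    ciphertext = toy_aead_encrypt(aead_key, payload)

    # Step 2: sample the ciphertext and derive the header-protection mask
    # from it (real QUIC samples 16 bytes and encrypts them with AES/ChaCha).
    sample = ciphertext[:4]
    mask = toy_aead_encrypt(hp_key, sample)

    # Step 3: mask the protected bits of the first byte and the packet number.
    protected_first = first_byte ^ (mask[0] & 0x1F)
    protected_pn = bytes(b ^ mask[1 + i] for i, b in enumerate(pn_bytes))

    return bytes([protected_first]) + protected_pn + ciphertext

pkt = protect_packet(b"k1", b"k2", 0x41, b"\x00\x01", b"hello quic")
```

A fused OpenSSL call would perform all three steps in one API invocation, avoiding the per-call setup/teardown that the remaining ~10% is attributed to.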

[Screenshot: perf profile (Screen Shot 2020-04-30 at 12.56.24 PM)]


kazuho commented Apr 30, 2020

A sensible approach to fix this issue would be to add a function to OpenSSL that does everything at once (i.e., all of AEAD plus header protection). We can hope to see a ~10% performance improvement by adding such a function.

Furthermore, we could take advantage of pipelining by running the initialization, AAD, and finalization phases of multiple packets in parallel. The reason these three phases account for as much time as shown above, relative to the permutation of the payload, is that they are not parallelized, while the payload permutation is applied to multiple 16-byte blocks at once (notice the 6x suffix of the functions). Unlike in the case of TLS, we can parallelize the three phases because we are generating multiple packets at once. We could also consider applying the encryption logic in the kernel to reduce the overhead of context switches.
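The pipelining idea amounts to restructuring the per-packet loop into per-phase loops over the whole GSO batch, so the short serial phases of independent packets can overlap. A minimal sketch of the reordering (hypothetical function names; real code would interleave multiple cipher contexts, much like the 6x-wide payload loops):

```python
PHASES = ("init", "aad", "encrypt", "finalize")

def protect_serial(packets):
    # Packet-major order: init, aad, encrypt, finalize run back-to-back for
    # each packet, so the short serial phases cannot overlap across packets.
    return [(phase, i) for i, _ in enumerate(packets) for phase in PHASES]

def protect_pipelined(packets):
    # Phase-major order: each phase is applied to every packet in the batch
    # before moving on, exposing the independent per-packet computations
    # for parallel (interleaved/SIMD) execution.
    return [(phase, i) for phase in PHASES for i, _ in enumerate(packets)]

batch = ["pkt0", "pkt1", "pkt2"]
```

Both orderings do the same work; only the schedule changes, which is what lets the fixed per-packet costs be amortized across the batch.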
