Crypto extensions and performance #1335
krizhanovsky added a commit to tempesta-tech/tempesta-test that referenced this issue Jan 1, 2021
the curve while the test suite already has the test for unsupported secp521, so just remove the test for secp384.
Updated benchmarks for ECDSA (performance core on i9-12900HK):
(Results are basically the same).
Scope
Following algorithms must be implemented or optimized in Tempesta TLS:
[TLS 1.3, TLS 1.3 #1031] Curve25519 (Montgomery, already in the kernel) as defined in RFC 7748 for ECDHE. (EdDSA certificates don't seem to be widespread enough yet, see Please support EdDSA certificates letsencrypt/boulder#3649.) See High-performance Implementation of Elliptic Curve Cryptography Using Vector Instructions by Armando Faz-Hernández.
ChaCha20_Poly1305 (already in the kernel, also required for QUIC, usually preferred by mobile devices) must be implemented. MbedTLS uses an additional callback stream_func in mbedtls_cipher_base_t, which is used by ChaCha20 and ARC4 only, but maybe we can find a better solution. This algorithm is also slower than AES (TLS 1.3 #1031 (comment)), so the first version can go without it.
TLS: further performance improvements and cleanups #1064 improves MPI, but doesn't take specific steps to improve RSA performance, so the algorithm must be optimized. See the rsaz code and the referenced papers in OpenSSL. Actually, RSA is asymmetric in cost: client computations are less expensive than the server's. RSA is not recommended in terms of performance and DDoS resistance, so it probably makes sense to abandon it, or at least to recommend that users avoid it, and not to spend much development effort on it.
TLS: further performance improvements and cleanups #1064 was focused on the SECP 256 elliptic curve, so SECP 384 should also be profiled and optimized. https://w3.lasca.ic.unicamp.br/media/publications/ST4-1.pdf addresses the curve-specific optimizations. SECP 384 should be removed.
The Intel Ice Lake CPU family no longer has the dramatic downclocking on AVX-512, so explore AVX-512 algorithms for SECP 256 (Hernandez proposed AVX2), Curve25519 (the kernel uses a plain MULX implementation, see Hernandez and Hisil) and RSA (also see for generic bigints).
The kernel AES-GCM optimizations. Use Karatsuba precomputations for AES in the same TLS connection (see below and TLS performance characterization on modern x86 CPUs). Also, the AVX-512 version of the VPCLMULQDQ instruction can be used for faster carry-less multiplication (though the current OpenSSL doesn't use it either).
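For reference, the X25519 function from RFC 7748 that the ECDHE item relies on can be sketched in a few lines. This is only an illustrative pure-Python model of the Montgomery ladder, not the kernel or Tempesta TLS code: the function names are made up for the example, and the branch-based swap is not constant-time, so it must not be used with real secrets.

```python
# Illustrative model of X25519 (RFC 7748). Names are hypothetical and the
# code is NOT constant-time; it only shows the shape of the ladder.
P = 2**255 - 19          # field prime
A24 = 121665             # (486662 - 2) / 4, as in RFC 7748


def decode_scalar(k: bytes) -> int:
    """Clamp a 32-byte private key as RFC 7748 requires."""
    b = bytearray(k)
    b[0] &= 248
    b[31] &= 127
    b[31] |= 64
    return int.from_bytes(b, "little")


def decode_u(u: bytes) -> int:
    """Decode a 32-byte u-coordinate, masking the unused top bit."""
    b = bytearray(u)
    b[31] &= 127
    return int.from_bytes(b, "little") % P


def x25519(k: bytes, u: bytes) -> bytes:
    """Scalar multiplication on the Montgomery curve, x-coordinate only."""
    scalar = decode_scalar(k)
    x1 = decode_u(u)
    x2, z2, x3, z3 = 1, 0, x1, 1
    swap = 0
    for t in reversed(range(255)):
        kt = (scalar >> t) & 1
        swap ^= kt
        if swap:  # a real implementation uses a constant-time cswap
            x2, x3 = x3, x2
            z2, z3 = z3, z2
        swap = kt
        a = (x2 + z2) % P
        aa = a * a % P
        b = (x2 - z2) % P
        bb = b * b % P
        e = (aa - bb) % P
        c = (x3 + z3) % P
        d = (x3 - z3) % P
        da = d * a % P
        cb = c * b % P
        x3 = pow(da + cb, 2, P)
        z3 = x1 * pow(da - cb, 2, P) % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3 = x3, x2
        z2, z3 = z3, z2
    # Affine x = x2 / z2, with the inverse computed via Fermat's little theorem
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


BASE_POINT = (9).to_bytes(32, "little")

if __name__ == "__main__":
    alice = bytes(range(32))
    bob = bytes(range(32, 64))
    shared_a = x25519(alice, x25519(bob, BASE_POINT))
    shared_b = x25519(bob, x25519(alice, BASE_POINT))
    print(shared_a == shared_b)
```

Both sides derive the same shared secret because scalar multiplication commutes. The per-bit loop body (a handful of field multiplications and squarings) is what the vectorized implementations referenced above optimize.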
Testing
tls/t for curve25519
test_tls_cert: certificates and handshakes for RSASSA_PSS and certificates for other EC (TTLS_PK_ECKEY vs TTLS_PK_ECKEY_DH and TTLS_PK_ECKEY_ECDSA)
Notes
Deprecation of SECP 384
SECP 384 is technically legacy, and x448 provides better performance (checked w/ OpenSSL):
It seems that OpenSSL doesn't optimize the curve at all, since even 521 has better performance. However, CA/B Forum Baseline Requirements section 6.1.5 requires certificates to be signed with either RSA or NIST curves of 256, 384 or 521 bits. Let's leave RSA for legacy usage and remove secp384 completely. Also note that ECDSA secp256 outperforms Ed25519 for signing, so we should keep secp256 to support EC certificates. ECDHE is faster for x25519:
AES-GCM precomputations for Karatsuba multiplication
The paper TLS performance characterization on modern x86 CPUs references two original Intel papers:
The header comments for the Linux implementation explicitly say that it was developed from these two papers. The first one mentions hash key precomputations: Htbl in OpenSSL crypto/modes/asm/ghash-x86_64.pl and HashKey* offsets in linux/arch/x86/crypto/aesni-intel_avx-x86_64.S, so these precomputations are used in both implementations. The second one proposes to precompute the carry-less multiplication of the Bh and Bl parts in Karatsuba multiplication. There is also the Intel paper Carry-Less Multiplication Instruction and its Usage for Computing the GCM Mode, which doesn't consider the precomputation optimizations.
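To make the Karatsuba precomputation concrete, here is a small Python model. It assumes one reading of the idea: for a hash key B that stays fixed for the whole connection, the combined operand Bh XOR Bl is stored once, so each 128-bit carry-less multiplication needs three 64-bit multiplications and no per-block work on the key side. The helper names (clmul64, HashKey, clmul128_karatsuba) are hypothetical, clmul64 merely stands in for what PCLMULQDQ computes, and the reduction modulo the GCM polynomial is deliberately left out.

```python
# Model of Karatsuba carry-less multiplication with a per-key precomputation.
# All names are illustrative; this is not the OpenSSL or Linux code.
MASK64 = (1 << 64) - 1


def clmul64(a: int, b: int) -> int:
    """Carry-less 64x64-bit multiply (a software stand-in for PCLMULQDQ)."""
    r = 0
    for i in range(64):
        if (b >> i) & 1:
            r ^= a << i
    return r


def clmul128_schoolbook(a: int, b: int) -> int:
    """Reference 128x128-bit carry-less multiply using four 64-bit products."""
    ah, al = a >> 64, a & MASK64
    bh, bl = b >> 64, b & MASK64
    cross = clmul64(ah, bl) ^ clmul64(al, bh)
    return (clmul64(ah, bh) << 128) ^ (cross << 64) ^ clmul64(al, bl)


class HashKey:
    """Fixed multiplicand B with its Karatsuba operand computed once."""

    def __init__(self, b: int):
        self.hi = b >> 64
        self.lo = b & MASK64
        self.hi_xor_lo = self.hi ^ self.lo  # precomputed Bh XOR Bl


def clmul128_karatsuba(a: int, key: HashKey) -> int:
    """128x128-bit carry-less multiply using three 64-bit products."""
    ah, al = a >> 64, a & MASK64
    hi = clmul64(ah, key.hi)
    lo = clmul64(al, key.lo)
    # (Ah ^ Al) * (Bh ^ Bl) = Ah*Bh ^ Ah*Bl ^ Al*Bh ^ Al*Bl, so removing
    # hi and lo leaves exactly the two cross terms.
    mid = clmul64(ah ^ al, key.hi_xor_lo) ^ hi ^ lo
    return (hi << 128) ^ (mid << 64) ^ lo


if __name__ == "__main__":
    key = HashKey(0x0123456789ABCDEFFEDCBA9876543210)
    block = 0xDEADBEEFCAFEBABE0011223344556677
    full_key = (key.hi << 64) | key.lo
    print(clmul128_karatsuba(block, key) == clmul128_schoolbook(block, full_key))
```

In a real GHASH implementation the same trick would apply to every stored power of the hash key, which is why the stored tables carry extra entries per key.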