Crypto extensions and performance #1335

krizhanovsky · 2019-08-11T23:02:18Z

Scope

The following algorithms must be implemented or optimized in Tempesta TLS:

[TLS 1.3 #1031] Curve25519 (Montgomery form, already in the kernel) as defined in RFC 7748 for ECDHE. (EdDSA certificates do not seem to be widespread enough yet, see "Please support EdDSA certificates" letsencrypt/boulder#3649.) See High-performance Implementation of Elliptic Curve Cryptography Using Vector Instructions by Armando Faz-Hernández.
ChaCha20_Poly1305 (already in the kernel, also required for QUIC, usually preferred by mobile devices) must be implemented. MbedTLS uses an additional callback, stream_func in mbedtls_cipher_base_t, which is used by ChaCha20 and ARC4 only, but maybe we can find a better solution. This algorithm is also slower than AES (see TLS 1.3 #1031 (comment)), so the first version can ship without it.
TLS: further performance improvements and cleanups #1064 improves MPI, but doesn't take specific steps to improve RSA performance, so the algorithm must be optimized. See the rsaz code and the referenced papers in OpenSSL. Note that RSA is asymmetric in cost: client computations are less expensive than the server's. RSA is not recommended in terms of performance and DDoS resistance, so it probably makes sense to abandon it, or at least to recommend that users avoid it and not to spend much development effort on it.
TLS: further performance improvements and cleanups #1064 was focused on the SECP 256 elliptic curve, so SECP 384 should also be profiled and optimized. https://w3.lasca.ic.unicamp.br/media/publications/ST4-1.pdf addresses the curve-specific optimizations. SECP 384 should be removed instead, see Deprecation of SECP 384 in the notes.
The Intel Ice Lake CPU family no longer has the dramatic downclocking on AVX-512, so explore algorithms for SECP 256 (Hernandez proposed AVX2), Curve25519 (the kernel uses a plain MULX implementation, see Hernandez and Hisil) and RSA for AVX-512 (also see for generic bigints).
The kernel AES-GCM optimizations. Use Karatsuba precomputations for AES within the same TLS connection (see the AES-GCM note below and TLS performance characterization on modern x86 CPUs). Also, the AVX-512 version of the VPCLMULQDQ instruction can be used for faster carry-less multiplication (the current OpenSSL doesn't use it either, though).
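
For the Curve25519 item above, a minimal pure-Python sketch of the X25519 function following the Montgomery ladder pseudocode in RFC 7748 may help as a reference when writing unit tests. It is only an illustration: it is not constant-time, it is unrelated to the kernel or Tempesta TLS code, and the function and constant names (x25519, BASE_POINT and the helpers) are chosen here for the example.

```python
# Illustrative X25519 (RFC 7748, section 5). NOT constant-time, NOT for production.
P = 2**255 - 19          # field prime
A24 = 121665             # (486662 - 2) / 4, as given in RFC 7748


def _decode_scalar(k: bytes) -> int:
    """Clamp the 32-byte scalar as required by RFC 7748."""
    b = bytearray(k)
    b[0] &= 248
    b[31] &= 127
    b[31] |= 64
    return int.from_bytes(b, "little")


def _decode_u(u: bytes) -> int:
    """Decode a little-endian u-coordinate, masking the unused top bit."""
    b = bytearray(u)
    b[31] &= 127
    return int.from_bytes(b, "little") % P


def x25519(k: bytes, u: bytes) -> bytes:
    scalar = _decode_scalar(k)
    x1 = _decode_u(u)
    x2, z2, x3, z3 = 1, 0, x1, 1
    swap = 0
    for t in range(254, -1, -1):
        bit = (scalar >> t) & 1
        swap ^= bit
        if swap:  # a real implementation uses a constant-time conditional swap
            x2, x3 = x3, x2
            z2, z3 = z3, z2
        swap = bit
        a = (x2 + z2) % P
        aa = a * a % P
        b = (x2 - z2) % P
        bb = b * b % P
        e = (aa - bb) % P
        c = (x3 + z3) % P
        d = (x3 - z3) % P
        da = d * a % P
        cb = c * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3 = x3, x2
        z2, z3 = z3, z2
    # Convert from projective (X : Z) to affine u = X / Z via Fermat inversion.
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


BASE_POINT = (9).to_bytes(32, "little")

# ECDHE usage: both sides derive the same shared secret.
alice_priv = bytes(range(32))          # demo value only, not a real key
bob_priv = bytes(range(32, 64))        # demo value only, not a real key
alice_pub = x25519(alice_priv, BASE_POINT)
bob_pub = x25519(bob_priv, BASE_POINT)
shared = x25519(alice_priv, bob_pub)
```

A sketch like this is mainly useful as an oracle for cross-checking an optimized implementation against the RFC 7748 and wycheproof test vectors.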

Testing

Unit tests in tls/t for curve25519
Adopt appropriate tests from wycheproof
test_tls_cert: certificates and handshakes for RSASSA_PSS and certificates for other EC (TTLS_PK_ECKEY vs TTLS_PK_ECKEY_DH and TTLS_PK_ECKEY_ECDSA)
Functional test for RSA ciphersuites with a certificate chain (see Assertion at tls/x509_crt.h:167 #1498).

Notes

Deprecation of SECP 384

SECP 384 is technically legacy, and X448 provides better performance (checked with OpenSSL):

$ openssl speed ecdsa
                              sign    verify    sign/s verify/s
 224 bits ecdsa (nistp224)   0.0001s   0.0001s  14928.8   6707.9
 256 bits ecdsa (nistp256)   0.0000s   0.0001s  35504.2  11838.0
 384 bits ecdsa (nistp384)   0.0011s   0.0009s    890.6   1079.1
 521 bits ecdsa (nistp521)   0.0004s   0.0007s   2770.6   1401.8

$ openssl speed eddsa
                              sign    verify    sign/s verify/s
 253 bits EdDSA (Ed25519)   0.0001s   0.0001s  19837.8   7459.3
 456 bits EdDSA (Ed448)   0.0004s   0.0007s   2657.7   1482.6

It seems that OpenSSL doesn't optimize the curve at all, since even the 521-bit curve performs better. However, CA/B Forum Baseline Requirements section 6.1.5 requires certificates to be signed with either RSA or the NIST curves of 256, 384 or 521 bits. Let's leave RSA for legacy usage and remove secp384 completely. Also note that ECDSA on secp256 outperforms Ed25519 for signing, so we should keep secp256 to support EC certificates. ECDHE is faster with X25519:

$ openssl speed ecdh
                              op      op/s
 224 bits ecdh (nistp224)   0.0001s  11621.8
 256 bits ecdh (nistp256)   0.0001s  16690.9
 384 bits ecdh (nistp384)   0.0011s    915.1
 521 bits ecdh (nistp521)   0.0004s   2265.4
 253 bits ecdh (X25519)   0.0000s  24055.1
 448 bits ecdh (X448)   0.0006s   1612.5
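
The comparisons drawn from these tables can be reproduced with a small script. The following is a rough sketch that assumes the `openssl speed` row layout shown above (bit size, algorithm, name in parentheses, per-operation times ending in `s`, then operations per second); the regular expression, the `parse_speed` helper and the `SAMPLE` text are made up for this example and are not part of OpenSSL.

```python
import re

# A few rows copied from the ecdh table above.
SAMPLE = """\
 256 bits ecdh (nistp256)   0.0001s  16690.9
 384 bits ecdh (nistp384)   0.0011s    915.1
 253 bits ecdh (X25519)   0.0000s  24055.1
"""

ROW = re.compile(r"^\s*\d+ bits \w+ \((?P<name>[^)]+)\)\s+(?P<rest>.+)$")


def parse_speed(text):
    """Map curve name -> list of ops-per-second columns (floats)."""
    rates = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # skip headers, prompts and blank lines
        cols = m.group("rest").split()
        # Per-operation times end in 's'; the remaining columns are rates.
        rates[m.group("name")] = [float(c) for c in cols if not c.endswith("s")]
    return rates


rates = parse_speed(SAMPLE)
ratio = rates["X25519"][0] / rates["nistp256"][0]
print(f"X25519 vs nistp256 ECDH throughput: {ratio:.2f}x")
```

The same helper works on the ecdsa and eddsa tables, where each row yields two rates (sign/s and verify/s).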

AES-GCM precomputations for Karatsuba multiplication

The paper TLS performance characterization on modern x86 CPUs references two original Intel papers:

The header comments of the Linux implementation explicitly say that it was developed following these two papers. The first one mentions hash key precomputations: Htbl in OpenSSL crypto/modes/asm/ghash-x86_64.pl and the HashKey* offsets in linux/arch/x86/crypto/aesni-intel_avx-x86_64.S, so these precomputations are used in both implementations. The second one proposes to precompute the carry-less multiplication of the Bh and Bl parts in Karatsuba multiplication. There is also the Intel paper Carry-Less Multiplication Instruction and its Usage for Computing the GCM Mode, which doesn't consider the precomputation optimizations.
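
To make the Karatsuba idea concrete, here is a small Python model of a 128-bit carry-less multiplication that uses three 64-bit multiplications instead of four. It is a sketch under assumptions: the per-key precomputation is modeled as storing the XOR of the high and low halves of the hash key, which is one reading of the Bh and Bl precomputation described above; the reduction modulo the GCM polynomial and the GCM bit ordering are left out; all function names are invented for the example.

```python
MASK64 = (1 << 64) - 1


def clmul(a: int, b: int) -> int:
    """Carry-less product in GF(2)[x]; models what PCLMULQDQ does for 64-bit inputs."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def precompute_key(h: int):
    """Split the 128-bit hash key and precompute Hh ^ Hl once per key."""
    hh, hl = h >> 64, h & MASK64
    return hh, hl, hh ^ hl


def karatsuba_clmul128(a: int, key) -> int:
    """128x128-bit carry-less product using three 64-bit multiplications."""
    hh, hl, hm = key
    ah, al = a >> 64, a & MASK64
    hi = clmul(ah, hh)                      # Ah * Hh
    lo = clmul(al, hl)                      # Al * Hl
    mid = clmul(ah ^ al, hm) ^ hi ^ lo      # Ah*Hl ^ Al*Hh via Karatsuba
    return (hi << 128) ^ (mid << 64) ^ lo


# The hash key is fixed for a TLS connection, so the key tuple is computed once
# and reused for every block.
key = precompute_key(0x66E94BD4EF8A2C3B884CFA59CA342B2E)
block = 0x0123456789ABCDEFFEDCBA9876543210
product = karatsuba_clmul128(block, key)
```

Since the key half XOR does not depend on the data, only the `ah ^ al` XOR remains on the per-block path, which is the saving the precomputation is after.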

krizhanovsky · 2024-06-30T21:20:09Z

Updated benchmarks for ECDSA (performance core on i9-12900HK):

$ openssl version
OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022)

$ taskset --cpu-list 2 openssl speed ecdsa
....
compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -ffile-prefix-map=/build/openssl-olCZw9/openssl-3.0.2=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_TLS_SECURITY_LEVEL=2 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
CPUINFO: OPENSSL_ia32cap=0x7ffaf3ffffebffff:0x98c007bc239ca7eb
                              sign    verify    sign/s verify/s
 160 bits ecdsa (secp160r1)   0.0001s   0.0001s  10476.5   9825.9
 192 bits ecdsa (nistp192)   0.0001s   0.0001s   8490.5   8223.9
 224 bits ecdsa (nistp224)   0.0000s   0.0001s  34613.6  16034.2
 256 bits ecdsa (nistp256)   0.0000s   0.0000s  63743.3  20365.9
 384 bits ecdsa (nistp384)   0.0005s   0.0004s   2097.6   2455.4
 521 bits ecdsa (nistp521)   0.0002s   0.0003s   5812.7   2880.5
 163 bits ecdsa (nistk163)   0.0001s   0.0002s   8635.0   4368.9
 233 bits ecdsa (nistk233)   0.0002s   0.0003s   6390.0   3231.3
 283 bits ecdsa (nistk283)   0.0003s   0.0006s   3569.2   1808.5
 409 bits ecdsa (nistk409)   0.0005s   0.0010s   2060.4   1051.3
 571 bits ecdsa (nistk571)   0.0011s   0.0021s    924.0    470.3
 163 bits ecdsa (nistb163)   0.0001s   0.0002s   8257.3   4171.4
 233 bits ecdsa (nistb233)   0.0002s   0.0003s   6005.8   3078.1
 283 bits ecdsa (nistb283)   0.0003s   0.0006s   3367.1   1706.5
 409 bits ecdsa (nistb409)   0.0005s   0.0010s   1946.0    989.7
 571 bits ecdsa (nistb571)   0.0012s   0.0023s    858.7    437.6
 256 bits ecdsa (brainpoolP256r1)   0.0002s   0.0002s   4942.6   4953.8
 256 bits ecdsa (brainpoolP256t1)   0.0002s   0.0002s   4939.3   5119.0
 384 bits ecdsa (brainpoolP384r1)   0.0005s   0.0004s   2080.8   2338.1
 384 bits ecdsa (brainpoolP384t1)   0.0005s   0.0004s   2113.9   2474.1
 512 bits ecdsa (brainpoolP512r1)   0.0008s   0.0007s   1243.8   1458.2
 512 bits ecdsa (brainpoolP512t1)   0.0008s   0.0006s   1267.2   1556.7

(Results are basically the same).

krizhanovsky added enhancement performance labels Aug 11, 2019

krizhanovsky added this to the 1.1 TBD (ML, QUIC, DoH etc.) milestone Aug 11, 2019

krizhanovsky mentioned this issue Nov 4, 2019

#1064: small MPI cleanups and improvements #1363

Merged

krizhanovsky modified the milestones: 1.1 TBD (ML, QUIC, DoH etc.), 0.8 TLS 1.3 & TDBv0.2 Nov 6, 2019

krizhanovsky mentioned this issue Jan 10, 2020

#1064: TLS performance imporovements #1375

Merged

krizhanovsky modified the milestones: 0.8 TLS 1.3 & TDBv0.2, 1.1 TBD (ML, QUIC, DoH etc.), 1.0 Stability - GA Jan 21, 2020

krizhanovsky modified the milestones: 1.0 Stability - GA, 0.8 TLS 1.3 & TDBv0.2 Apr 27, 2020

krizhanovsky added the TLS Tempesta TLS module and related issues label Apr 27, 2020

krizhanovsky added the crucial label May 17, 2020

krizhanovsky mentioned this issue May 30, 2020

TLS 1.3 #1031

Open

8 tasks

krizhanovsky self-assigned this Jul 7, 2020

krizhanovsky changed the title ~~TLS crypto extensions~~ Crypto extensions and performance Dec 28, 2020

krizhanovsky added a commit to tempesta-tech/tempesta-test that referenced this issue Jan 1, 2021

tempesta-tech/tempesta#1335 deprecates

203c3a7

the curve while the test suite already has the test for unsupported secp521, so just remove the test for secp384.

krizhanovsky added a commit that referenced this issue Jan 1, 2021

Remove secp384 (issue #1335) as unused one.

1a16214

Add comments for #1064.

krizhanovsky added a commit that referenced this issue Jan 1, 2021

Remove secp384 (issue #1335) as unused one.

3a8d8c5

Add comments for #1064.

This was referenced Jan 1, 2021

Remove test for secp384 tempesta-tech/tempesta-test#181

Merged

Performance optimizations for NIST p256 curve #1481

Merged

TLS: further performance improvements and cleanups #1064

Open

krizhanovsky modified the milestones: 0.8 - TBD, 1.1 - TLS 1.3 Jan 3, 2022

krizhanovsky modified the milestones: 1.xx - TBD, 1.x: TBD Oct 31, 2023
