AES-GCM AArch64: Store swapped Htable values #1403
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
@@ Coverage Diff @@
## main #1403 +/- ##
==========================================
+ Coverage 78.19% 78.20% +0.01%
==========================================
Files 571 571
Lines 95465 95465
Branches 13704 13705 +1
==========================================
+ Hits 74653 74663 +10
+ Misses 20201 20191 -10
Partials 611 611
Benchmarks on a Graviton 3 instance running 1,000 iterations for each benchmark; the last table shows results from a Graviton 2 instance. All values are in microseconds (lower is better), and the ratio is simply before/after, i.e. Main/PR (greater than 1 is better).
Graviton 3 AEAD-AES-256-GCM seal
num bytes or init | Main min | Main avg | Main max | PR min | PR avg | PR max | min ratio | avg ratio | max ratio |
---|---|---|---|---|---|---|---|---|---|
init | 0.09242 | 0.09401 | 0.09685 | 0.0943 | 0.09651 | 0.0994 | 0.98003 | 0.97413 | 0.97426 |
16 | 0.0904 | 0.0923 | 0.0998 | 0.09 | 0.0917 | 0.0993 | 1.0055 | 1.0061 | 1.0045 |
256 | 0.1327 | 0.1346 | 0.1402 | 0.1317 | 0.1338 | 0.1390 | 1.0077 | 1.0058 | 1.008 |
1350 | 0.3191 | 0.3211 | 0.3285 | 0.3112 | 0.3136 | 0.3198 | 1.0256 | 1.0237 | 1.0272 |
8192 | 1.3207 | 1.3251 | 1.337 | 1.278 | 1.2828 | 1.2908 | 1.0334 | 1.0330 | 1.0361 |
16384 | 2.5457 | 2.553 | 2.575 | 2.4582 | 2.4675 | 2.487 | 1.0356 | 1.0348 | 1.0352 |
Graviton 3 AEAD-AES-128-GCM seal
num bytes or init | Main min | Main avg | Main max | PR min | PR avg | PR max | min ratio | avg ratio | max ratio |
---|---|---|---|---|---|---|---|---|---|
init | 0.09558 | 0.09738 | 0.09997 | 0.09812 | 0.09981 | 0.10255 | 0.97414 | 0.97571 | 0.97487 |
16 | 0.0948 | 0.0967 | 0.1026 | 0.0939 | 0.0954 | 0.1019 | 1.0095 | 1.0139 | 1.0072 |
256 | 0.1425 | 0.1449 | 0.1507 | 0.1422 | 0.1442 | 0.1499 | 1.0019 | 1.0044 | 1.0057 |
1350 | 0.3632 | 0.3655 | 0.3724 | 0.3534 | 0.3560 | 0.3623 | 1.0277 | 1.0266 | 1.0278 |
8192 | 1.5352 | 1.5415 | 1.5601 | 1.4948 | 1.5018 | 1.51 | 1.0270 | 1.0264 | 1.0284 |
16384 | 2.9717 | 2.9817 | 3.0072 | 2.8912 | 2.9030 | 2.9 | 1.0278 | 1.027 | 1.0263 |
Graviton 2 AEAD-AES-128-GCM seal
num bytes or init | Main min | Main avg | Main max | PR min | PR avg | PR max | min ratio | avg ratio | max ratio |
---|---|---|---|---|---|---|---|---|---|
init | 0.1351 | 0.1371 | 0.1392 | 0.1370 | 0.1393 | 0.1423 | 0.9864 | 0.9841 | |
16 | 0.1267 | 0.1326 | 0.13 | 0.128 | 0.1333 | 0.1374 | 0.98 | 0.9951 | 0.9897 |
256 | 0.2212 | 0.2264 | 0.2315 | 0.2202 | 0.2256 | 0.2300 | 1.0046 | 1.0033 | 1.0064 |
1350 | 0.7260 | 0.7306 | 0.7359 | 0.7272 | 0.7321 | 0.739 | 0.9984 | 0.9979 | 0.9951 |
8192 | 3.6756 | 3.6830 | 3.7103 | 3.6793 | 3.6985 | 3.7166 | 0.99 | 0.995 | 0.998 |
16384 | 7.23 | 7.2479 | 7.28 | 7.24 | 7.2778 | 7.39 | 0.998 | 0.9958 | 0.9853 |
Overall, on Graviton 3 the init is slightly slower, but encryption is slightly faster for all sizes. Graviton 2 is also slower for init, with essentially no change in encryption time.
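As a quick sanity check of how the ratio columns are defined, the sketch below recomputes two average-time ratios from the Graviton 3 AEAD-AES-256-GCM table. It uses only the numbers printed above; recomputed values may differ from the table in the last digit, presumably because the published ratios were computed from unrounded timings.

```python
def speedup(main_us: float, pr_us: float) -> float:
    """Ratio as used in the tables: before / after, i.e. Main / PR.

    A value above 1 means the PR is faster; below 1 means it is slower.
    """
    return main_us / pr_us


# Average timings in microseconds, copied from the Graviton 3
# AEAD-AES-256-GCM seal table as (Main avg, PR avg).
averages = {
    "init": (0.09401, 0.09651),
    "8192": (1.3251, 1.2828),
}

for row, (main_avg, pr_avg) in averages.items():
    r = speedup(main_avg, pr_avg)
    verdict = "faster" if r > 1 else "slower"
    print(f"{row:>5}: avg ratio {r:.4f} ({verdict} with the PR)")
```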
Implementations of AES-GCM in AWS-LC may use an "H-Table" to precompute and cache common computations across multiple invocations of AES-GCM using the same key, thereby improving performance. The main example of such common precomputation is the computation of powers of the H-value used in the GHASH algorithm, which gives the H-Table its name. However, despite the name, the structure of the H-Table is opaque to the code invoking AES-GCM, and implementations are free to populate it with arbitrary data.
This freedom is already being leveraged: currently, the AArch64 implementation of AES-GCM stores not only powers of H in the HTable (H1-H8 in the code), but also their 'Karatsuba pre-processed' values, i.e. the EORs of their low and high halves. These naturally occur when using Karatsuba's algorithm to reduce a 128-bit polynomial multiplication over GF(2) to three 64-bit polynomial multiplications.
This commit changes the structure of the H-Table for AArch64 implementations of AES-GCM slightly to obtain a small performance gain: every time a power of H is loaded from the H-Table (H1-H8), the first operation applied to it in both aesv8-gcm-armv8.pl and aesv8-gcm-armv8-unroll8.pl is to swap its low and high halves via `ext arg.16b, arg.16b, arg.16b, #8`. Those swaps can be precomputed, and the H{1-8} values stored in swapped form in the HTable, thereby eliminating the swaps from the critical loop of AES-GCM.
This commit modifies the H-table precomputation ghash_init_v8 in the simplest way possible to introduce the desired swaps, bracketing store instructions for H-table values X with `vext.8 X, X, X, #8`. The resulting initialization code is slightly slower than the original one and will be simplified in the next commit.
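To make "powers of H" concrete, here is a minimal Python sketch (not AWS-LC code; all function names are hypothetical) of the GHASH field multiplication from NIST SP 800-38D, Algorithm 1, and of why caching H^1..H^8 helps: eight blocks can be absorbed with eight independent multiplications by precomputed powers instead of a serial chain of dependent ones. The AArch64 assembly uses PMULL-based carry-less multiplication rather than this bit-by-bit loop.

```python
import random

R = 0xE1 << 120  # GHASH reduction constant in the bit-reflected convention


def gf128_mul(x: int, y: int) -> int:
    """Multiply two GHASH field elements (NIST SP 800-38D, Algorithm 1).

    A 128-bit block is an int whose most significant bit is the block's
    first (leftmost) bit.
    """
    z, v = 0, y
    for i in range(128):
        if (x >> (127 - i)) & 1:
            z ^= v
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z


def h_powers(h: int, n: int = 8) -> list:
    """Return [H^1, ..., H^n]: the kind of values an H-table caches."""
    powers = [h]
    for _ in range(n - 1):
        powers.append(gf128_mul(powers[-1], h))
    return powers


def ghash_serial(h: int, blocks, y: int = 0) -> int:
    """Textbook GHASH update: Y <- (Y xor C_i) * H, one block at a time."""
    for c in blocks:
        y = gf128_mul(y ^ c, h)
    return y


def ghash_aggregated(powers, blocks, y: int = 0) -> int:
    """Same result when len(blocks) == len(powers), using cached powers:
    (Y xor C_1)*H^n xor C_2*H^(n-1) xor ... xor C_n*H."""
    n = len(blocks)
    assert n == len(powers)
    acc = 0
    for i, c in enumerate(blocks):
        if i == 0:
            c ^= y
        acc ^= gf128_mul(c, powers[n - 1 - i])
    return acc


rng = random.Random(2024)
h = rng.getrandbits(128)
blocks = [rng.getrandbits(128) for _ in range(8)]
print(ghash_serial(h, blocks) == ghash_aggregated(h_powers(h), blocks))
```

Only the multiplications by the precomputed powers depend on the H-table, which is what lets the table be computed once per key and reused across invocations.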
This commit simplifies the precomputation of the H-table by 'absorbing' the newly introduced `vext` swap instructions into the surrounding code. This brings the performance of the H-table initialization on par with the previous initialization routine.
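The bracketing step described in the first commit message can be modelled in a few lines of Python. This is a toy model under stated assumptions, not the real ghash_init_v8: `swap_halves` stands in for `vext.8 X, X, X, #8`, and `toy_next` is a hypothetical placeholder for however the next value is derived from X (it is deliberately not the GF(2^128) product). It illustrates that the second `vext` restores X so later computations are unaffected, and that the resulting table holds exactly the swapped versions of what the original routine stored; any simplified routine has to preserve that.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def swap_halves(x: int) -> int:
    """Model of `vext.8 X, X, X, #8`: exchange the 64-bit halves of X."""
    return ((x & MASK64) << 64) | (x >> 64)


def toy_next(x: int, h: int) -> int:
    """HYPOTHETICAL placeholder for deriving the next table value from X.
    Not symmetric under swapping, so a missing restore would show up."""
    return (x * 3 + h) & MASK128


def init_original(h: int, n: int = 8) -> list:
    """Original routine, modelled: store X as is."""
    table, x = [], h
    for _ in range(n):
        table.append(x)
        x = toy_next(x, h)
    return table


def init_bracketed(h: int, n: int = 8) -> list:
    """First commit, modelled: vext, store, vext (restore), keep computing."""
    table, x = [], h
    for _ in range(n):
        x = swap_halves(x)   # vext before the store
        table.append(x)      # store the swapped value into the H-table
        x = swap_halves(x)   # vext after the store: X is unswapped again
        x = toy_next(x, h)   # later code still sees the unswapped X
    return table


def init_missing_restore(h: int, n: int = 8) -> list:
    """Broken variant: forgets the second vext, so X stays swapped."""
    table, x = [], h
    for _ in range(n):
        x = swap_halves(x)
        table.append(x)
        x = toy_next(x, h)
    return table


h = 0x0123456789ABCDEFFEDCBA9876543210  # arbitrary example value
wanted = [swap_halves(v) for v in init_original(h)]
print(init_bracketed(h) == wanted, init_missing_restore(h) == wanted)
```

The model also suggests why the bracketed version is slower: it adds two `vext` instructions per stored value, which is the overhead the second commit removes.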
@nebeid @andrewhop Let me know if there is something I can do to facilitate the review.
@nebeid @andrewhop @dkostic Any update on this?
Thank you @hanno-becker for this change. Since you dove into the details of this implementation, I suggest adding comments at the beginning that explain what is calculated and where it is stored in the H-table, perhaps using an ASCII representation of the table.
@nebeid It is a good idea to better document what is being stored in the HTable. However, I don't think it is necessary for vetting this PR: the main point is that certain entries in the HTable are always swapped right after loading, so one may just as well store the swapped versions to begin with. This does not rely on knowing what is being stored.
### Description:
The AES-GCM programs are updated in the following two PRs: aws/aws-lc#1403 and aws/aws-lc#1639. Updating them in LNSym as well.
### Testing:
`make all` succeeds and conformance testing is successful.
### License:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Co-authored-by: Shilpi Goel <shigoel@gmail.com>
Implementations of AES-GCM in AWS-LC may use an "H-Table" to precompute and cache common computations across multiple invocations of AES-GCM using the same key, thereby improving performance.
The main example of such common precomputation is the computation of powers of the H-value used in the GHASH algorithm -- giving the H-Table its name. However, despite the name, the structure of the H-Table is opaque to the code invoking AES-GCM, and implementations are free to populate it with arbitrary data.
This freedom is already being leveraged: currently, the AArch64 implementation of AES-GCM stores not only powers of H in the HTable (H1-H8 in the code), but also their 'Karatsuba pre-processed' values, i.e. the EORs of their low and high halves. These naturally occur when using Karatsuba's algorithm to reduce a 128-bit polynomial multiplication over GF(2) to three 64-bit polynomial multiplications.
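A minimal Python sketch of the Karatsuba step described above, on plain polynomials over GF(2) (no GHASH reduction, no bit reflection; function names are hypothetical): a 128x128-bit carry-less product is assembled from three 64x64-bit products, and the factor `h_lo ^ h_hi` depends only on H, which is why it can be precomputed and cached next to H.

```python
import random

MASK64 = (1 << 64) - 1


def clmul(a: int, b: int) -> int:
    """Carry-less product: multiply a and b as polynomials over GF(2)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def karatsuba_pre(h: int) -> int:
    """'Karatsuba pre-processing' of a 128-bit value: EOR of its two halves."""
    return (h & MASK64) ^ (h >> 64)


def clmul128_karatsuba(x: int, h: int, h_pre: int) -> int:
    """128x128 -> 255-bit carry-less product using three 64x64 products.

    h_pre must equal karatsuba_pre(h); it depends only on h, so a table
    can store it alongside h instead of recomputing it for every block.
    """
    x_lo, x_hi = x & MASK64, x >> 64
    h_lo, h_hi = h & MASK64, h >> 64
    lo = clmul(x_lo, h_lo)                     # product 1: low halves
    hi = clmul(x_hi, h_hi)                     # product 2: high halves
    mid = clmul(x_lo ^ x_hi, h_pre) ^ lo ^ hi  # product 3: cross terms
    return (hi << 128) ^ (mid << 64) ^ lo


rng = random.Random(1)
x, h = rng.getrandbits(128), rng.getrandbits(128)
print(clmul128_karatsuba(x, h, karatsuba_pre(h)) == clmul(x, h))
```

The identity behind the middle term is (x_lo + x_hi)(h_lo + h_hi) = x_lo h_lo + x_hi h_hi + (x_lo h_hi + x_hi h_lo) over GF(2), so XORing away the two outer products leaves the cross terms.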
This PR changes the structure of the H-Table for AArch64 implementations of AES-GCM slightly to obtain a small performance gain:
It is observed that every time a power of H is loaded from the H-Table (H1-H8), the first operation applied to it in both aesv8-gcm-armv8.pl and aesv8-gcm-armv8-unroll8.pl is to swap its low and high halves via `ext arg.16b, arg.16b, arg.16b, #8`. Those swaps can be precomputed, and the H{1-8} values stored in swapped form in the HTable, thereby eliminating the swaps from the critical loop of AES-GCM. This gives a small performance gain for AES-GCM on Graviton3, at the cost of slightly slower one-off initialization. For Graviton2, the AES-GCM AArch64 assembly loads the H-table only once, outside of the critical loop; hence, there is no performance benefit.
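The layout change can be checked with a small Python model, assuming only what the description states: `swap_halves` models `ext arg.16b, arg.16b, arg.16b, #8` (an exchange of the two 64-bit halves), and the two loop functions are hypothetical stand-ins for the critical loop that count how many swaps each layout executes. Because the swap is its own inverse, storing swapped values once at init time hands the loop exactly the operands it previously computed after every load.

```python
import random

MASK64 = (1 << 64) - 1


def swap_halves(v: int) -> int:
    """Model of `ext v.16b, v.16b, v.16b, #8`: exchange the 64-bit halves."""
    return ((v & MASK64) << 64) | (v >> 64)


def loop_old(htable, iterations):
    """Old layout: H^i stored as is, swapped right after every load."""
    operands, swaps = [], 0
    for _ in range(iterations):
        for entry in htable:
            operands.append(swap_halves(entry))
            swaps += 1
    return operands, swaps


def loop_new(htable, iterations):
    """New layout: entries were swapped once at init, so no swap per load."""
    operands = []
    for _ in range(iterations):
        operands.extend(htable)
    return operands, 0


rng = random.Random(3)
powers = [rng.getrandbits(128) for _ in range(8)]  # stand-ins for H1..H8
old_table = powers
new_table = [swap_halves(v) for v in powers]       # 8 swaps, once, at init

old_ops, old_swaps = loop_old(old_table, iterations=1000)
new_ops, new_swaps = loop_new(new_table, iterations=1000)
print(old_ops == new_ops, old_swaps, new_swaps)
```

The model also makes the Graviton2 remark plausible: if the table is loaded only once outside the loop, the old layout costs just a handful of swaps in total, so there is little to save. Note that the EOR of the two halves is symmetric, so an entry's Karatsuba value is the same whether or not the entry itself is stored swapped.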
Testing:
- `ssl_test` and `crypto/crypto_test`
- `bssl speed`: TBD

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.