
crypto.sha2: Use intrinsics for SHA-256 on x86-64 and AArch64 #13272

Merged 5 commits on Oct 29, 2022

Commits on Oct 28, 2022

  1. crypto.sha2: Use intrinsics for SHA-256 on x86-64 and AArch64

    There's probably plenty of room to optimize these further in the
    future, but for the moment this gives ~3x improvement on Intel
    x86-64 processors, ~5x on AMD, and ~10x on M1 Macs.
    
    These extensions are fairly new; most processors released before 2020 do
    not support them.
    
    AVX-512 is a slightly older alternative that we could use on Intel
    for a much bigger performance bump, but it's been fused off on
    Intel's latest hybrid architectures and it relies on computing
    independent SHA hashes in parallel. In contrast, these SHA intrinsics
    provide the usual single-threaded, single-stream interface, and should
    continue working on new processors.
    
    AArch64 also has SHA-512 intrinsics that we could take advantage of in
    the future. (A sketch of the unchanged calling interface follows this
    list of commits.)
    topolarity committed Oct 28, 2022 (10edb6d)
  2. std.crypto: SHA-256 Properly gate comptime conditional

    This feature detection must be done at comptime so that we avoid
    generating invalid assembly for the target. (A minimal illustration
    follows this list of commits.)
    topolarity committed Oct 28, 2022 (ee241c4)
  3. std.crypto: Optimize SHA-256 intrinsics for AMD x86-64

    This gets us most of the way back to the performance I had when
    I was using the LLVM intrinsics:
      - Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz:
           190.67 MB/s (w/o intrinsics) -> 1285.08 MB/s
      - AMD EPYC 7763 (VM) @ 2.45 GHz:
           240.09 MB/s (w/o intrinsics) -> 1360.78 MB/s
      - Apple M1:
           216.96 MB/s (w/o intrinsics) -> 2133.69 MB/s
    
    Minor changes to this source can swing performance from 400 MB/s to
    1400 MB/s or... 20 MB/s, depending on how it interacts with the
    optimizer. I have a sneaking suspicion that despite LLVM inheriting
    GCC's extremely strict inline assembly semantics, its passes are
    rather skittish around inline assembly (and its instruction cost
    models almost certainly can assume nothing about it).
    topolarity committed Oct 28, 2022 (4c1f71e)
  4. std.crypto: Add isComptime guard around intrinsics

    Comptime code can't execute assembly, so we need some way to force
    comptime callers onto the generic path. This should be replaced with
    whatever is eventually implemented for ziglang#868. (The intent is
    sketched after this list of commits.)

    I am seeing that computing the hash at comptime gives an incorrect
    result in stage1 and crashes stage2, so presumably this never worked
    correctly. I will follow up on that soon.
    topolarity committed Oct 28, 2022 (f9fe548)
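
The sketches below are editorial illustrations, not code from this PR. First, the caller-facing interface referenced in commit 1: usage of std.crypto.hash.sha2.Sha256 is unchanged, and the same single-stream calls transparently pick up the intrinsics on CPUs that have the SHA extensions.

```zig
const std = @import("std");
const Sha256 = std.crypto.hash.sha2.Sha256;

pub fn main() void {
    // One-shot hashing.
    var digest: [Sha256.digest_length]u8 = undefined;
    Sha256.hash("The quick brown fox jumps over the lazy dog", &digest, .{});

    // The same data fed through the streaming, single-stream interface.
    var hasher = Sha256.init(.{});
    hasher.update("The quick brown fox ");
    hasher.update("jumps over the lazy dog");
    var digest2: [Sha256.digest_length]u8 = undefined;
    hasher.final(&digest2);

    std.debug.print("digests match: {}\n", .{std.mem.eql(u8, &digest, &digest2)});
}
```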
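
For commit 2, a toy example of why the gate has to be comptime-known (the function and the asm here are illustrative, not taken from sha2.zig): the guarded branch contains an instruction that only exists on x86, so it must never be emitted when compiling for other targets.

```zig
const std = @import("std");
const builtin = @import("builtin");

const is_x86_64 = builtin.cpu.arch == .x86_64;

fn spinHint() void {
    if (comptime is_x86_64) {
        // This instruction only exists on x86; the comptime condition makes
        // sure it is never emitted when compiling for other targets.
        asm volatile ("pause");
    }
    // Other targets: nothing to do in this toy example.
}

pub fn main() void {
    spinHint();
    std.debug.print("is_x86_64 = {}\n", .{is_x86_64});
}
```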
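
For commit 4, a sketch of what the guard is aiming for (again, not the commit's code): once the comptime fallback works as intended, hashing at comptime should take the generic path and agree with the runtime result.

```zig
const std = @import("std");
const Sha256 = std.crypto.hash.sha2.Sha256;

test "SHA-256 at comptime matches the runtime result" {
    // Comptime evaluation cannot execute inline assembly, so this relies on
    // the guard routing comptime callers to the generic implementation.
    const comptime_digest = comptime blk: {
        @setEvalBranchQuota(100_000);
        var d: [Sha256.digest_length]u8 = undefined;
        Sha256.hash("abc", &d, .{});
        break :blk d;
    };

    var runtime_digest: [Sha256.digest_length]u8 = undefined;
    Sha256.hash("abc", &runtime_digest, .{});

    try std.testing.expectEqualSlices(u8, &comptime_digest, &runtime_digest);
}
```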

Commits on Oct 29, 2022

  1. std.crypto: Use featureSetHas to gate intrinsics

    This also fixes a bug where the feature gating was not taking effect
    at comptime due to ziglang#6768. (The shape of the gate is sketched
    below.)
    topolarity committed Oct 29, 2022 (67fa326)
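
For commit 5, a rough sketch of the shape of a featureSetHas-based gate (illustrative; the names and structure are not copied from sha2.zig):

```zig
const std = @import("std");
const builtin = @import("builtin");

// Comptime-known on every target; false wherever the extensions are absent.
const has_sha_extensions = switch (builtin.cpu.arch) {
    .x86_64 => std.Target.x86.featureSetHas(builtin.cpu.features, .sha),
    .aarch64 => std.Target.aarch64.featureSetHas(builtin.cpu.features, .sha2),
    else => false,
};

pub fn main() void {
    std.debug.print("SHA extensions available on this target: {}\n", .{has_sha_extensions});
}
```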