Sha256 refactor #206

mratsim · 2022-09-18T22:28:21Z

This PR updates the SHA256 implementation. It:

- cleanly separates message scheduling from the hashing rounds. Message scheduling does not depend on previously hashed data and can be parallelized by computing 4 uint32 words at a time. For known padding blocks, such as those that arise in Merkle trees (see Accelerate Merkle tree hashing #205), the schedule can be precomputed for significant speedups.
- implements SSSE3 (introduced in 2006 with the Core 2 Duo) to parallelize message scheduling. This speeds up SHA256 by 30%.

  Note: the SIMD rotate instruction _mm_ror_epi32, which would allow a trivial translation of SHA256, was only added with AVX512F+AVX512VL. It would only help on Skylake-X, the only architecture with AVX512VL but no hardware SHA (Cannon Lake never shipped), and the speedup would be limited to replacing shifts+xors with rotates.
- implements hardware SHA acceleration
- reduces the size of the sha256 context and simplifies its update flow
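To illustrate the first point, here is a minimal Python sketch of a SHA256 whose message schedule is computed separately from the rounds. It is not the code from this PR: all names (`message_schedule`, `rounds`, `PAD_SCHEDULE_64`, `sha256_64bytes`) are hypothetical, and the 64-byte input size is an assumption chosen to mimic a Merkle tree node built from two 32-byte digests. For such inputs the second block is pure padding, so its schedule can be computed once and reused.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF

def _primes(n):
    """First n primes by trial division (plenty fast for n = 64)."""
    found, candidate = [], 2
    while len(found) < n:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found

def _icbrt(n):
    """Integer cube root by binary search."""
    lo, hi = 0, 1 << 36
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

PRIMES = _primes(64)
# FIPS 180-4 constants: first 32 fractional bits of the square roots
# (initial state) and cube roots (round constants) of the first primes.
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]
K = [_icbrt(p << 96) & MASK for p in PRIMES]

def rotr(x, n):
    """32-bit rotate right, emulated with two shifts and an OR."""
    return ((x >> n) | (x << (32 - n))) & MASK

def message_schedule(block):
    """Expand one 64-byte block into 64 words. Uses only the block, never the hash state."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    return w

def rounds(state, w):
    """Run the 64 compression rounds on `state` using an already expanded schedule `w`."""
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ ((~e) & g)
        t1 = (h + big_s1 + ch + K[t] + w[t]) & MASK
        big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(s + v) & MASK for s, v in zip(state, (a, b, c, d, e, f, g, h))]

# Padding block of any 64-byte message: 0x80, zeros, then the bit length (512).
PAD_BLOCK_64 = b"\x80" + b"\x00" * 55 + struct.pack(">Q", 512)
# Its schedule is a constant, so it is expanded once at import time.
PAD_SCHEDULE_64 = message_schedule(PAD_BLOCK_64)

def sha256_64bytes(data):
    """SHA256 of exactly 64 bytes, reusing the precomputed padding schedule."""
    if len(data) != 64:
        raise ValueError("this sketch only handles 64-byte inputs")
    state = rounds(H0, message_schedule(data))
    state = rounds(state, PAD_SCHEDULE_64)
    return struct.pack(">8I", *state)

# Cross-check against the standard library.
assert sha256_64bytes(bytes(64)) == hashlib.sha256(bytes(64)).digest()
```

Because `message_schedule` never reads the hash state, a SIMD version can expand several words per instruction, which is the part this PR accelerates with SSSE3. In this sketch, half of the schedule work for a 64-byte input is done once up front instead of on every call.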

On small messages, performance exceeds that of both OpenSSL and BLST.

mratsim added 5 commits September 19, 2022 01:47

sha256: separate message scheduling and state updates to help impleme…

5f9ac5f

…nt specific use-cases like #205; also implement SSSE3 acceleration (2006, Intel Core 2 Duo)

sha256: simplify update flow, store less metadata in context

17d78f2

sha256: Fix reworked update function

69d77a4

Implement x86 hardware SHA acceleration

aa592cc

typo

d45f49d

mratsim force-pushed the sha256-simd branch from a181283 to d45f49d September 18, 2022 23:50

mratsim merged commit 351a3f6 into master Sep 19, 2022

mratsim deleted the sha256-simd branch September 19, 2022 00:03
