Sha256 refactor #206

Merged
merged 5 commits into from
Sep 19, 2022

@mratsim mratsim commented Sep 18, 2022

This PR updates the SHA256 implementation. It:

  • cleanly separates message scheduling from the hashing rounds. Message scheduling is independent of previous data (it depends only on the current block) and can be parallelized, 4x uint32 per 4x uint32. For known padding blocks, such as those that arise in Merkle trees (see Accelerate Merkle tree hashing #205), the schedule can be precomputed for significant speedups.

  • implements an SSSE3 code path (SSSE3 was introduced in 2006 with the Core 2 Duo) to parallelize message scheduling. This speeds up SHA256 by 30%.

    Note: the SIMD rotate intrinsic _mm_ror_epi32, which would allow a trivial translation of sha256, was only added with AVX512F+AVX512VL.
    That speedup would only help on Skylake-X, the only architecture with AVX512VL but no hardware SHA (Cannon Lake never shipped), and it would be limited to replacing shifts+xors with rotates.

  • implements hardware SHA acceleration

  • reduces the size of the sha256 context and simplifies its update flow
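
To illustrate the split between message scheduling and the hashing rounds, here is a minimal scalar sketch in portable C. It is not the code from this PR (which uses SSSE3 and hardware SHA paths); all function names (sha256_schedule, sha256_rounds, sha256_pad64_schedule, sha256_64bytes) are hypothetical and chosen for this example only. It assumes the Merkle-tree case where every message is exactly 64 bytes, so the second (padding) block is constant and its schedule can be computed once and reused.

```c
#include <stdint.h>

static const uint32_t SHA256_IV[8] = {
    0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
    0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};

static const uint32_t K[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2};

/* Without a SIMD rotate instruction, a rotate is emulated with two shifts. */
static inline uint32_t rotr(uint32_t x, unsigned n) {
    return (x >> n) | (x << (32 - n));
}

/* Step 1: message schedule. It reads only the 64-byte block, never the
   hash state, so it can be vectorized or precomputed for known blocks. */
static void sha256_schedule(uint32_t w[64], const uint8_t block[64]) {
    for (int i = 0; i < 16; i++) {
        w[i] = ((uint32_t)block[4 * i] << 24) | ((uint32_t)block[4 * i + 1] << 16) |
               ((uint32_t)block[4 * i + 2] << 8) | (uint32_t)block[4 * i + 3];
    }
    for (int i = 16; i < 64; i++) {
        uint32_t s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3);
        uint32_t s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10);
        w[i] = w[i - 16] + s0 + w[i - 7] + s1;
    }
}

/* Step 2: the 64 hashing rounds. This is the part that depends on the
   running state h and must be executed sequentially. */
static void sha256_rounds(uint32_t h[8], const uint32_t w[64]) {
    uint32_t a = h[0], b = h[1], c = h[2], d = h[3];
    uint32_t e = h[4], f = h[5], g = h[6], hh = h[7];
    for (int i = 0; i < 64; i++) {
        uint32_t S1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25);
        uint32_t ch = (e & f) ^ (~e & g);
        uint32_t t1 = hh + S1 + ch + K[i] + w[i];
        uint32_t S0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22);
        uint32_t maj = (a & b) ^ (a & c) ^ (b & c);
        uint32_t t2 = S0 + maj;
        hh = g; g = f; f = e; e = d + t1;
        d = c; c = b; b = a; a = t1 + t2;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d;
    h[4] += e; h[5] += f; h[6] += g; h[7] += hh;
}

/* Schedule of the padding block that follows any 64-byte message:
   0x80, zeros, then the bit length 512 = 0x0200 in big-endian. */
static void sha256_pad64_schedule(uint32_t pad_w[64]) {
    uint8_t pad[64] = {0x80};
    pad[62] = 0x02;
    sha256_schedule(pad_w, pad);
}

/* Hash exactly 64 bytes (e.g. two 32-byte child hashes), reusing the
   cached padding schedule instead of recomputing it for every node. */
static void sha256_64bytes(uint32_t out[8], const uint8_t msg[64],
                           const uint32_t pad_w[64]) {
    uint32_t w[64];
    for (int i = 0; i < 8; i++) out[i] = SHA256_IV[i];
    sha256_schedule(w, msg);
    sha256_rounds(out, w);
    sha256_rounds(out, pad_w); /* no schedule work for the padding block */
}
```

In this sketch a caller would run sha256_pad64_schedule once, then call sha256_64bytes for every tree node, which removes one of the two schedule computations per node. The output is left as eight 32-bit words; serializing them big-endian gives the usual 32-byte digest.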

Performance on small messages is better than that of OpenSSL and BLST.
image

@mratsim mratsim merged commit 351a3f6 into master Sep 19, 2022
@mratsim mratsim deleted the sha256-simd branch September 19, 2022 00:03