-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement much faster sha256 and sha512. #41
Conversation
Looks very interesting, thanks! Left some notes. |
BTW, is |
What do you mean by that? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Looks interesting indeed.
|
@0xdeafbeef it makes the build deterministic, which makes it easier to spot problems arising from particular dependency changes. It's something we do across the board, although perhaps there are repos like this one which it makes less sense for. |
@0xdeafbeef did you say you compared the If they're the same speed (which is what I'd expect), then it probably doesn't make sense to include ASM SHA-NI support as we already have that case covered in pure Rust. |
Speed is the same. I think we should include it because if somebody uses |
Since we already have the intrinsic code in the Otherwise, there is duplication of the feature across the |
Hmm, build failure seems unrelated I think? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. One minor suggestion.
@0xdeafbeef can you rebase? I think #42 should've taken care of the build failures. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I think it looks good for merging. I have only two nits and it would be nice to rebase it first.
sha2/src/lib.rs
Outdated
extern "C" { | ||
fn sha256_compress(state: &mut [u32; 8], block: &[u8; 64]); | ||
fn sha256_transform_rorx(state: &mut [u32; 8], block: *const [u8; 64], num_blocks: u64); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You forgot to change num_blocks
to usize
here. Also we probably should change sha256_compress
to explicit pointer and length as well (same for sha512_compress
). IIRC memory layout of slices is not guaranteed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like it's guaranteed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your link talks about layout of slice itself (i.e. about how elements of a slice a stored in memory). In this context it's more about ABI guarantees, i.e. I don't think it's currently guaranteed that val: &[u8; 16]
is equivalent to val_ptr: *const [u8; 16], len: usize
when used in extern "C" fn
s. Can you please modify the signature just to be extra safe?
@@ -13,23 +13,37 @@ | |||
#[cfg(not(any(target_arch = "x86_64", target_arch = "x86", target_arch = "aarch64")))] | |||
compile_error!("crate can only be used on x86, x86-64 and aarch64 architectures"); | |||
|
|||
cpufeatures::new!(cpuid_avx2, "avx2"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gate this line on #[cfg(any(target_arch = "x86_64", target_arch = "x86"))]
. Otherwise it causes compilation failure on Aarch64 targets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You forgot to modify the compress256
function (see the CI failure). Currently it tries to use the cpuid_avx2
module on all targets. I think the easiest solution would be to introduce two function with the same name one gated on x86(-64) and another one on AArch64.
BTW could you also compare performance of the AVX2 based assembly with the intrinsics-based implementation from RustCrypto/hashes#312? |
asm
intrinsic
Force soft.
I think that asm version is not needed anymore. |
After pinning to the same core
intrinsic
@newpavlov should I close pr? |
Hm, I am not 100% sure. Some may prefer the assembly implementation from reliability point of view, since with an intrinsics-based implementation we at the mercy of the compiler and in some cases achieved performance can be brittle. From another point of view, people usually expect that an assembly implementation is faster than a "software" one. @tarcieri |
Yeah, it's definitely a tradeoff. I think the biggest risk is actually miscompilation (see e.g. rust-lang/rust#79865). That said I'd weakly be in favor of an all-intrinsics approach if performance is comparable to assembly. I think that better fits the philosophy of "Rust Crypto", and unless there are big performance wins with ASM it's probably best avoided, at least within the crates we maintain. A pure Rust approach solves a lot of problems, especially relating to portability. Relevant: RustCrypto/hashes#315 |
I also lean towards the stance "assembly impls only for sufficient performance improvements", so I guess we can close this PR. @0xdeafbeef |
I took sha256 and sha512 variants from linux sources.
On AMD Ryzen 9 5900HS comparing
with
gives such results:
Closes #5