chacha: safer outputting #1181
Thanks! Could you do |
It is actually between 1-2% slower. Looking at the ASM, the optimizer does the right thing with the bytewise loop (unroll the loop, move data in SIMD chunks), but it doesn't see through the wordwise loop. However, I found that if I manually unroll the loop, the optimizer produces SIMD output equivalent to the current Benchmarks (SSE4.1 machine): Before this PR: After original PR: After updated PR: |
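As an illustration of the loop-versus-unrolled distinction discussed in this comment, here is a minimal sketch. It is not the code from this PR: the function names, the `[[u32; 4]; 4]` row layout, and the 16-word output buffer are assumptions chosen for the example. Both forms produce identical output; whether the optimizer emits the same SIMD moves for each would have to be checked in the generated assembly or with a benchmark on the target machine.

```rust
/// Loop form (hypothetical): copy four rows of four u32 words into `out`.
fn write_rows_loop(rows: &[[u32; 4]; 4], out: &mut [u32; 16]) {
    for (chunk, row) in out.chunks_exact_mut(4).zip(rows.iter()) {
        chunk.copy_from_slice(row);
    }
}

/// Manually unrolled form (hypothetical): the same copies written out
/// one by one, so every offset is a compile-time constant.
fn write_rows_unrolled(rows: &[[u32; 4]; 4], out: &mut [u32; 16]) {
    out[0..4].copy_from_slice(&rows[0]);
    out[4..8].copy_from_slice(&rows[1]);
    out[8..12].copy_from_slice(&rows[2]);
    out[12..16].copy_from_slice(&rows[3]);
}
```

Both versions are safe code; the unrolled one only trades brevity for constant offsets.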
What if you use explicit indexing ( |
This reverts commit 7d9607a. (Had a bug, after fixing the bug perf was poor)
@Ralith: Thanks for the idea, but in this case I'm getting poor performance with a 0..4 loop. Tried as follows:
Ah well, thanks for trying! |
Perf numbers for another PR I'm making weren't what I expected, but I narrowed the results down to this:
That's 15-29% slower. (CPU is 5800X aka Vermeer/Zen 3.) |
What is "before", and what is "after"? Your numbers look faster. @kazcw Did you perform your benchmarks with native optimizations or without? |
I also observe the performance regression on a Ryzen 9 4900HS, independent of native optimizations. So it looks like the new code does not optimize properly for AVX? Before:
After:
mod guts was originally designed for the byteslice interface that RustCrypto APIs require, but the algorithm operates on u32 words internally and rand wants a wordslice interface, so we were converting to bytes in mod guts and then back to words in mod chacha. We can simply output directly to a wordslice in guts. This is simpler, may be marginally faster, and avoids an unsafe block (cf. #1170).
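The round trip described above can be sketched as follows. This is a simplified illustration, not the crate's actual code: the function names and the plain `[u32; 16]` state are assumptions, and little-endian byte order is assumed for the intermediate buffer. The first function goes through bytes and back; the second writes the words directly.

```rust
/// Hypothetical "before": serialize words to bytes, then parse them back.
fn words_via_bytes(state: &[u32; 16], out: &mut [u32; 16]) {
    let mut bytes = [0u8; 64];
    // Word -> bytes (the byteslice interface).
    for (chunk, word) in bytes.chunks_exact_mut(4).zip(state.iter()) {
        chunk.copy_from_slice(&word.to_le_bytes());
    }
    // Bytes -> word (what the caller then had to undo).
    for (dst, chunk) in out.iter_mut().zip(bytes.chunks_exact(4)) {
        *dst = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
    }
}

/// Hypothetical "after": output straight to the wordslice.
fn words_direct(state: &[u32; 16], out: &mut [u32; 16]) {
    out.copy_from_slice(state);
}
```

Both functions are safe code and yield the same words; the direct version has no intermediate buffer and no conversion step.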