-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AES hash is significantly slower than fallback for short strings on Broadwell #66
Comments
Yes, in intel systems there was a major optimization introduced in skylake which reduced the latency of the the AES instruction from 7 to 4 cycles. See: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=aes&expand=227 It is worth noting that it affected the latency but not the throughput, so if instructions get properly aligned and pipelined (assuming there is other work to do) the delay should not be an issue. If its showing up in macrobenchmarks that obviously isn't happening. From the raw results above the strings less than 8 characters are slower than those which are longer. I've seen that before and it came down to code alignment. It is possible to force alignments, and I did some experimenting with that but couldn't find anything that consistently gave better results than letting the compiler figure it out. But that's not likely the same issue you are seeing in the macro benchmarks. (Or at least not entirely). It is possible to switch the algorithm based on the type in nightly, using specialization. However it is not possible to dispatch based on the size. So I cannot change which algorithm is used, but can only change the update function within the algorithm. There is already a code path for < 8 byte strings, in the aes variant. It should be possible to replace this with an alternate update function, but it's not as straightforward as copying the code from the fallback because it is designed to work with a 64bit state not a 128bit one, and order of the update is different. I have to think more about how to deal with this. If you have any ideas, let me know. |
I thought of a way to reduce it to 3 aes rounds instead of 4. If we want to go beyond that we might need more information. |
The macrobenchmark involves handling JSON documents with keys that are of variable lengths using IndexMap, but based on the performance numbers (10-20% performance decrease with AES enabled), and based on examination of the documents, I would presume that the key lengths skew a lot shorter. A nightly only solution would be fine for my use-case, but probably not for most people using this library. |
@as-com Can you run a test with the short-string branch and let me know how that performs? |
Testing on the same machine with Rust Nightly (1.51, 2021-01-14), the performance appears to be significantly improved, but still slightly slower than fallback:
In the JSON-handling macrobenchmark, performance appears to be mostly unchanged compared to the master branch with AES enabled (i.e. still slower by 10% or so). I suspect there is some weirdness going on with AES instructions stalling something, or something else. |
Well, there is also the size of the state. The AES version needs to create keys which consist of 3 128 bit values. The fallback only needs 4 64 bit values. So the difference in instantiation of the hasher may also be different. |
I had to restructure the way specialization worked, and the approach I had earlier won't work. @as-com Let me know how this preforms for you. |
On Rust Nightly (1.51 2021-01-22), performance is either on-par or improved in the benchmark:
In the JSON macrobenchmark, performance appears to be unchanged compared to the master branch. Disabling AES support appears to cause about a 6% increase in performance. Seems the larger state of the AES hash is the real problem here. |
This allows strings less than 8 bytes to be special cased to avoid the AES call as mentioned in #66 Signed-off-by: Tom Kaitchuck <Tom.Kaitchuck@gmail.com>
@as-com can you try the json benchmark with HashBrown 0.10.0 on nightly? (I know it's yanked and am attempting to sort that out) |
Running the JSON benchmark with hashbrown's master branch (feature Compared to hashbrown 0.9.1 with aHash 0.7, performance is unchanged. |
Note to self: run tests again in light of rust-lang/rust#83027 and rust-lang/rust#83084 |
Update: the performance regression from using |
Tested on a Broadwell Xeon E5-2690 v4 with Rust Nightly (1.51, 2020-01-09):
This performance difference is very noticeable in some macrobenchmarks that involve aHash-powered hashmaps. If this is an inherent limitation of the AES-powered hash, perhaps it would be nice to have a feature flag or some other argument to force the use of the fallback hash if the hashed values are known to be short.
Raw test results
The text was updated successfully, but these errors were encountered: