According to this comment in hashbrown, it might not be worth the trouble:
// Use the SSE2 implementation if possible: it allows us to scan 16 buckets
// at once instead of 8. We don't bother with AVX since it would require
// runtime dispatch and wouldn't gain us much anyways: the probability of
// finding a match drops off drastically after the first few buckets.
//
// I attempted an implementation on ARM using NEON instructions, but it
// turns out that most NEON instructions have multi-cycle latency, which in
// the end outweighs any gains over the generic implementation.
Also, according to local benchmarks someone ran for me on an M1 Mac mini, the non-SIMD version there still easily outperformed the SIMD version on an AMD Ryzen 9 5900X 😃
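For context on what "the generic implementation" scanning 8 buckets at once means, here is a minimal sketch of the portable SWAR (SIMD-within-a-register) technique: the 8 control bytes of a group are loaded into one `u64`, the target byte is XORed into every lane so matching lanes become zero, and a bit trick marks the zero lanes. This is not hashbrown's code; `GROUP_WIDTH`, `repeat`, and `match_byte` are hypothetical names, the control-byte values in `main` assume hashbrown-style tags (0xFF empty, 0x80 deleted, 7-bit hash tags for full slots), and hashbrown's own generic path may use a cheaper variant of the zero-byte test.

```rust
/// Number of control bytes scanned per group by this portable (non-SIMD) sketch.
const GROUP_WIDTH: usize = 8;

/// Broadcast one byte into every lane of a u64 (all lanes equal, so byte order is irrelevant).
const fn repeat(byte: u8) -> u64 {
    u64::from_ne_bytes([byte; GROUP_WIDTH])
}

/// Returns a bitmask with bit `i` set iff control byte `i` of `group` equals `byte`.
fn match_byte(group: [u8; GROUP_WIDTH], byte: u8) -> u8 {
    // Load as little-endian so lane i occupies bits 8*i..8*i+8.
    let word = u64::from_le_bytes(group);
    // Lanes equal to `byte` become 0x00; all other lanes stay non-zero.
    let x = word ^ repeat(byte);
    // Exact per-lane zero test: (x & 0x7f) + 0x7f sets a lane's high bit iff its low
    // 7 bits are non-zero, and it can never carry into the next lane (max 0xfe).
    // OR-ing in x and 0x7f makes non-zero lanes 0xff and zero lanes 0x7f, so the
    // final NOT leaves 0x80 in exactly the lanes that were zero.
    let low7 = repeat(0x7f);
    let zero_lanes = !((x & low7).wrapping_add(low7) | x | low7);
    // Compress the per-lane high bits into an 8-bit mask (kept as a loop for clarity).
    let mut mask = 0u8;
    for i in 0..GROUP_WIDTH {
        if zero_lanes & (0x80u64 << (8 * i)) != 0 {
            mask |= 1 << i;
        }
    }
    mask
}

fn main() {
    // Hypothetical group: tag 0x12 appears in slots 0, 2 and 5.
    let group = [0x12, 0xFF, 0x12, 0x05, 0x80, 0x12, 0x00, 0x01];
    println!("{:08b}", match_byte(group, 0x12)); // prints "00100101"
}
```

On x86_64, the SSE2 path does the same work for a 16-byte group using `_mm_cmpeq_epi8` to compare all lanes and `_mm_movemask_epi8` to collect the per-lane high bits, which is where the "16 buckets at once instead of 8" in the comment comes from.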
This was briefly mentioned in #16, but as ARM devices become more popular, it would be great to have an accelerated implementation for them as well.