Provide a SIMD implementation of swisstable_group_query suitable for ARM #17

wesleywiser · 2021-09-20T13:41:36Z

Briefly mentioned in #16, but as ARM devices become more popular, it would be great to have an accelerated implementation for them as well.

michaelwoerister · 2021-09-20T13:56:04Z

According to this comment in hashbrown it might not be worth the trouble:

// Use the SSE2 implementation if possible: it allows us to scan 16 buckets
// at once instead of 8. We don't bother with AVX since it would require
// runtime dispatch and wouldn't gain us much anyways: the probability of
// finding a match drops off drastically after the first few buckets.
//
// I attempted an implementation on ARM using NEON instructions, but it
// turns out that most NEON instructions have multi-cycle latency, which in
// the end outweighs any gains over the generic implementation.
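For context, the "generic implementation" mentioned in that comment scans 8 control bytes at once using ordinary 64-bit integer arithmetic rather than vector instructions. Below is a minimal sketch of that word-at-a-time technique. The names `repeat`, `match_byte_generic` and `matching_indices` are hypothetical and chosen for illustration only; they are not the actual API of this crate or of hashbrown.

```rust
/// Repeat one byte across all eight lanes of a 64-bit word.
fn repeat(byte: u8) -> u64 {
    u64::from_ne_bytes([byte; 8])
}

/// Hypothetical helper: returns a mask with the high bit (0x80) set in every
/// byte lane of `group` that equals `byte`. Lane 0 is the lowest byte.
///
/// Caveat: this zero-byte trick can report a false positive in the lane
/// directly above a true match, so a caller must still compare full keys.
fn match_byte_generic(group: [u8; 8], byte: u8) -> u64 {
    let word = u64::from_le_bytes(group);
    // Lanes equal to `byte` become 0x00 after the XOR.
    let cmp = word ^ repeat(byte);
    // Classic "find a zero byte" bit trick applied to all lanes at once.
    cmp.wrapping_sub(repeat(0x01)) & !cmp & repeat(0x80)
}

/// Hypothetical helper: turn the mask into a list of matching lane indices.
fn matching_indices(mut mask: u64) -> Vec<usize> {
    let mut out = Vec::new();
    while mask != 0 {
        out.push((mask.trailing_zeros() / 8) as usize);
        mask &= mask - 1; // clear the lowest set bit
    }
    out
}
```

For example, with the group `[0x12, 0x34, 0x12, 0xFF, 0x80, 0x00, 0x12, 0x7F]` and the byte `0x12`, `matching_indices` yields lanes 0, 2 and 6. Everything here is plain integer arithmetic, which is presumably why it holds up well against a NEON version whose instructions have multi-cycle latency.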

Also, according to local benchmarks someone ran for me on an M1 Mac mini, the non-SIMD version there still easily outperformed the SIMD version on an AMD Ryzen 5900X 😃

michaelwoerister · 2021-10-11T09:08:42Z

I just found this PR/discussion in the hashbrown repo: rust-lang/hashbrown#269
Very interesting!
