Take advantage of avx512 instructions when available #85

tkaitchuck · 2021-05-22T05:22:10Z

Using the `_mm512_aesdec_epi128` intrinsic (https://doc.rust-lang.org/nightly/core/arch/x86_64/fn._mm512_aesdec_epi128.html), which applies an AES round to four 128-bit lanes at once,
it should be possible to get up to 4x the throughput on large strings.

Note: Currently very few processors support this.
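
To illustrate where the 4x figure comes from, here is a minimal sketch of the lane-wise behaviour: one AES decryption round applied independently to four 128-bit blocks. It is not aHash's code and does not use the 512-bit intrinsic itself. It assumes the stable 128-bit `_mm_aesdec_si128` intrinsic as a stand-in, called four times, which is the work a single 512-bit instruction would do in one step. The names `aesdec_x4` and `aesdec_x4_aesni` are hypothetical.

```rust
/// Applies one AES decryption round to four independent 128-bit lanes.
/// Returns `None` when the CPU (or target architecture) lacks AES-NI.
/// Hypothetical helper for illustration only.
pub fn aesdec_x4(blocks: [u128; 4], round_keys: [u128; 4]) -> Option<[u128; 4]> {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("aes") {
            // SAFETY: AES-NI support was verified at runtime just above.
            return Some(unsafe { aesdec_x4_aesni(blocks, round_keys) });
        }
    }
    // No hardware AES available: the caller must use a fallback path.
    let _ = (blocks, round_keys);
    None
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "aes")]
unsafe fn aesdec_x4_aesni(blocks: [u128; 4], round_keys: [u128; 4]) -> [u128; 4] {
    use std::arch::x86_64::{__m128i, _mm_aesdec_si128};
    let mut out = [0u128; 4];
    // Four separate 128-bit rounds; a 512-bit register would hold all
    // four lanes and process them with a single instruction.
    for i in 0..4 {
        unsafe {
            // u128 and __m128i are both 16 bytes, so the transmutes are sound.
            let block: __m128i = std::mem::transmute(blocks[i]);
            let key: __m128i = std::mem::transmute(round_keys[i]);
            out[i] = std::mem::transmute(_mm_aesdec_si128(block, key));
        }
    }
    out
}
```

Because the lanes never interact, the same loop body could in principle be replaced by one wide instruction on hardware that supports it, with the 128-bit path kept as the fallback.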

SchrodingerZhu · 2022-10-19T19:25:20Z

@tkaitchuck
I have an ongoing investigation at:
SchrodingerZhu@c117213

This uses 256-bit SIMD registers, which relatively recent Intel and AMD CPUs both support.
I haven't run the benchmarks yet, but based on my previous experience with VPCLMULQDQ (which is used in CRC64 calculation), this change should bring a speedup.

Also note that if you really want to use 512-bit registers, you can just unroll the loops further; similar tricks apply.

As you may have noticed, the commit above is a little messy. This is because I found some potential bugs in Rust's core library that lead to bad code generation on Zen 3 CPUs. I will sync the info once I have reported the issue to Rust's stdarch library.

tkaitchuck added the enhancement (New feature or request) label on May 22, 2021

SchrodingerZhu mentioned this issue Oct 19, 2022

VAES should not be restricted to AVX512 rust-lang/stdarch#1343

Closed
