VAES should not be restricted to AVX512 #1343

SchrodingerZhu · 2022-10-19T19:32:58Z

stdarch/crates/core_arch/src/x86/avx512vaes.rs

Line 65 in 790411f

pub unsafe fn _mm256_aesdec_epi128(a: __m256i, round_key: __m256i) -> __m256i {

It seems that the stdarch library unconditionally puts the VAES-related intrinsics into the AVX512VL scope, even though VAES is actually available on platforms without AVX512 support (for example, AMD Zen 3).

I spotted the issue while investigating VAES with ahash: tkaitchuck/aHash#85

When using _mm256_aesenc_epi128, instead of generating something like

   2cf7e:       c5 d1 6c ee             vpunpcklqdq %xmm6,%xmm5,%xmm5
   2cf82:       c4 c1 f9 6e f2          vmovq  %r10,%xmm6
   2cf87:       c5 c9 6c f7             vpunpcklqdq %xmm7,%xmm6,%xmm6
   2cf8b:       c4 e2 7d dc c4          vaesenc %ymm4,%ymm0,%ymm0
   2cf90:       c4 e2 75 dc cb          vaesenc %ymm3,%ymm1,%ymm1
   2cf95:       c4 e3 7d 38 f6 01       vinserti128 $0x1,%xmm6,%ymm0,%ymm6
   2cf9b:       c4 e3 55 02 ee f0       vpblendd $0xf0,%ymm6,%ymm5,%ymm5
   2cfa1:       c4 e2 55 00 ea          vpshufb %ymm2,%ymm5,%ymm5
   2cfa6:       c5 e5 d4 dd             vpaddq %ymm5,%ymm3,%ymm3
   2cfaa:       c4 e2 65 00 da          vpshufb %ymm2,%ymm3,%ymm3
   2cfaf:       c5 dd d4 db             vpaddq %ymm3,%ymm4,%ymm3
   2cfb3:       c4 e3 7d 39 dc 01       vextracti128 $0x1,%ymm3,%xmm4
   2cfb9:       c4 e3 f9 16 d8 01       vpextrq $0x1,%xmm3,%rax
   2cfbf:       c4 e1 f9 7e d9          vmovq  %xmm3,%rcx
   2cfc4:       c4 c3 f9 16 e1 01       vpextrq $0x1,%xmm4,%r9
   2cfca:       c4 c1 f9 7e e2          vmovq  %xmm4,%r10

the compiler generates

   2cf8d:       c5 fc 29 84 24 80 00    vmovaps %ymm0,0x80(%rsp)
   2cf94:       00 00 
   2cf96:       c5 fd 7f 94 24 20 01    vmovdqa %ymm2,0x120(%rsp)
   2cf9d:       00 00 
   2cf9f:       c5 fc 29 8c 24 40 01    vmovaps %ymm1,0x140(%rsp)
   2cfa6:       00 00 
   2cfa8:       c5 f8 77                vzeroupper
   2cfab:       e8 c0 db ff ff          call   2ab70 <core::core_arch::x86::avx512vaes::_mm256_aesenc_epi128::hdf0bd31f9011eecd>
   2cfb0:       c5 fd 6f 84 24 a0 00    vmovdqa 0xa0(%rsp),%ymm0
   2cfb7:       00 00 
   2cfb9:       c5 fd 6f 1c 24          vmovdqa (%rsp),%ymm3
   2cfbe:       c5 fd d4 44 24 60       vpaddq 0x60(%rsp),%ymm0,%ymm0
   2cfc4:       c4 e2 7d 00 05 d3 ad    vpshufb 0x8add3(%rip),%ymm0,%ymm0        # b7da0 <_fini+0x10a0>
   2cfcb:       08 00 
   2cfcd:       c5 fd d4 44 24 40       vpaddq 0x40(%rsp),%ymm0,%ymm0
   2cfd3:       c4 c3 f9 16 c7 01       vpextrq $0x1,%xmm0,%r15
   2cfd9:       c4 e1 f9 7e c3          vmovq  %xmm0,%rbx
   2cfde:       c4 e3 7d 39 c0 01       vextracti128 $0x1,%ymm0,%xmm0
   2cfe4:       c4 c3 f9 16 c6 01       vpextrq $0x1,%xmm0,%r14
   2cfea:       c4 e1 f9 7e c6          vmovq  %xmm0,%rsi

It is very strange that the compiler does not inline the function call under the release profile, even when target-cpu=native is set.
However, when I explicitly write

    extern "C" {
        #[link_name = "llvm.x86.aesni.aesenc.256"]
        fn aesenc_256(a: __m256i, round_key: __m256i) -> __m256i;
    }

    unsafe {
        transmute(aesenc_256(transmute(value), transmute(xor)))
    }

The compiler then generates the first listing above, as expected.

I suspect this is because the intrinsic is marked as an avx512vl instruction.
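The suspected mechanism can be sketched on stable Rust. As far as I understand it, LLVM will not inline a function into a caller unless the caller enables all of the callee's target features, so a `#[target_feature]` function called from code built without that feature stays an out-of-line call. The sketch below uses AVX2 as a stand-in for the avx512vl gate, and `add_u64x4_avx2` and `add_u64x4` are hypothetical names that appear neither in stdarch nor in aHash.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper compiled with AVX2 enabled, whatever the crate-wide
/// target features are. A caller that does not also enable AVX2 is expected
/// to reach it through a real call instead of having it inlined.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn add_u64x4_avx2(a: [u64; 4], b: [u64; 4]) -> [u64; 4] {
    unsafe {
        // Unaligned loads: arrays of u64 are only guaranteed 8-byte alignment.
        let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
        let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
        // Lane-wise wrapping addition of four 64-bit integers.
        let sum = _mm256_add_epi64(va, vb);
        let mut out = [0u64; 4];
        _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, sum);
        out
    }
}

/// Hypothetical safe wrapper: dispatches at run time and falls back to
/// scalar code when AVX2 is missing or the target is not x86_64.
pub fn add_u64x4(a: [u64; 4], b: [u64; 4]) -> [u64; 4] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified on the line above.
            return unsafe { add_u64x4_avx2(a, b) };
        }
    }
    // Portable fallback with the same wrapping semantics as the SIMD path.
    let mut out = [0u64; 4];
    for i in 0..4 {
        out[i] = a[i].wrapping_add(b[i]);
    }
    out
}
```

If this reading is right, target-cpu=native on a Zen 3 machine enables vaes but not avx512vl, so an intrinsic gated on avx512vl lands in the same situation as `add_u64x4_avx2` does when the crate is built without AVX2.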


SchrodingerZhu · 2022-10-19T19:34:33Z

My CPU info:

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  128
  On-line CPU(s) list:   0-127
Vendor ID:               AuthenticAMD
  Model name:            AMD EPYC 7773X 64-Core Processor
    CPU family:          25
    Model:               1
    Thread(s) per core:  2
    Core(s) per socket:  64
    Socket(s):           1
    Stepping:            2
    Frequency boost:     enabled
    CPU(s) scaling MHz:  63%
    CPU max MHz:         3527.7339
    CPU min MHz:         1500.0000
    BogoMIPS:            4404.54
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 
                         fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3
                         invpcid_single hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr 
                         rdpru wbnoinvd amd_ppin brs arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca

Both versions run without problems, despite the unsatisfactory code generation.
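For anyone who wants to check their own machine, here is a small sketch that tests a flag list like the one above for "vaes without avx512f". It assumes the flags arrive as a whitespace-separated string (for example, copied from lscpu or /proc/cpuinfo). `has_flag` and `vaes_without_avx512` are hypothetical helper names.

```rust
/// Returns true if `name` appears as a whole entry in a whitespace-separated
/// flag list. Whole-entry matching keeps "vaes" from matching a longer flag
/// such as "avx512vaes".
fn has_flag(flags: &str, name: &str) -> bool {
    flags.split_whitespace().any(|f| f == name)
}

/// Hypothetical check for the situation described in this issue:
/// VAES is reported, but the AVX-512 foundation flag is not.
fn vaes_without_avx512(flags: &str) -> bool {
    has_flag(flags, "vaes") && !has_flag(flags, "avx512f")
}
```

Applied to the flag list above, the check should hold, since the list contains vaes and vpclmulqdq but no avx512 entries.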


SchrodingerZhu mentioned this issue Oct 20, 2022

Consider renaming the avx512gfni feature to just gfni rust-lang/rust#100752

Closed

Amanieu added a commit to Amanieu/stdarch that referenced this issue Oct 27, 2022

Don't require AVX512 for 256-bit VAES intrinsics

e47f3c0

These are available on AMD CPUs without AVX512, Fixes rust-lang#1343

Amanieu mentioned this issue Oct 27, 2022

Don't require AVX512 for 256-bit VAES intrinsics #1348

Merged

Amanieu closed this as completed in #1348 Oct 27, 2022
