_mm_cvtpd_epi64: would convert 2x double to 64-bit integers using the MXCSR rounding mode, which would speed things up on Arm and non-AVX x86 (actually an existing AVX512DQ + AVX512VL instruction)
_mm_abs_ps
_mm_movemask_epi16
_mm_cmpge_epi8
_mm_cmpge_epi16 (twice)
_mm_cmple_epi8
_mm_cmple_epi16
_mm_not_si128
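To make the wishes above concrete, here is a minimal sketch of how a few of them could be emulated with plain SSE2. The `wish_` prefix and these helper names are hypothetical, chosen only to avoid clashing with real intrinsics; the semantics (for example, a 16-bit movemask returning one sign bit per lane) are an assumption about what was wished for, not a confirmed specification.

```c
#include <emmintrin.h> /* SSE2 */

/* Hypothetical helper: absolute value of 4 floats by clearing the sign bit. */
static inline __m128 wish_mm_abs_ps(__m128 a) {
    const __m128 sign_mask = _mm_set1_ps(-0.0f); /* only the sign bit set */
    return _mm_andnot_ps(sign_mask, a);          /* ~sign_mask & a */
}

/* Hypothetical helper: bitwise NOT of all 128 bits. */
static inline __m128i wish_mm_not_si128(__m128i a) {
    return _mm_xor_si128(a, _mm_set1_epi32(-1));
}

/* Hypothetical helpers: signed a >= b is the complement of a < b. */
static inline __m128i wish_mm_cmpge_epi8(__m128i a, __m128i b) {
    return wish_mm_not_si128(_mm_cmplt_epi8(a, b));
}
static inline __m128i wish_mm_cmpge_epi16(__m128i a, __m128i b) {
    return wish_mm_not_si128(_mm_cmplt_epi16(a, b));
}

/* Hypothetical helpers: signed a <= b is the complement of a > b. */
static inline __m128i wish_mm_cmple_epi8(__m128i a, __m128i b) {
    return wish_mm_not_si128(_mm_cmpgt_epi8(a, b));
}
static inline __m128i wish_mm_cmple_epi16(__m128i a, __m128i b) {
    return wish_mm_not_si128(_mm_cmpgt_epi16(a, b));
}

/* Hypothetical helper: one sign bit per 16-bit lane, in bits 0..7.
 * Signed saturating pack keeps each lane's sign in the low 8 bytes;
 * the high 8 bytes come from zero and contribute no mask bits. */
static inline int wish_mm_movemask_epi16(__m128i a) {
    return _mm_movemask_epi8(_mm_packs_epi16(a, _mm_setzero_si128()));
}
```

Each of these costs one or two extra instructions over a native version, which is presumably why they end up on a wish list rather than being merely impossible.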
Ideas from Alfred Klomp
mm_absdiff_epu16
mm_absdiff_epu8
mm_blendv_si128
mm_bswap_epi16
mm_bswap_epi32
mm_bswap_epi64
mm_bswap_si128
mm_cmpge_epu16
mm_cmpge_epu8
mm_cmpgt_epu16
mm_cmpgt_epu8
mm_cmple_epu16
mm_cmple_epu8
mm_cmplt_epu16
mm_cmplt_epu8
mm_div255_epu16
mm_div_epu8
mm_divfast_epu16
mm_divfast_epu8
mm_max_epu16
mm_min_epu16
mm_not_si128
mm_scale_epu8
_mm256_unpacklo_si128
_mm256_unpackhi_si128
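Several of the unsigned operations listed above can be approximated in SSE2 with well-known min/max and saturating-subtract identities. The following is a rough sketch under that assumption; the `wish_` names are hypothetical, and the code is not taken from Alfred Klomp's own implementation.

```c
#include <emmintrin.h> /* SSE2 */

/* Unsigned a >= b holds exactly when max(a, b) == a. */
static inline __m128i wish_mm_cmpge_epu8(__m128i a, __m128i b) {
    return _mm_cmpeq_epi8(_mm_max_epu8(a, b), a);
}

/* Unsigned a <= b holds exactly when min(a, b) == a. */
static inline __m128i wish_mm_cmple_epu8(__m128i a, __m128i b) {
    return _mm_cmpeq_epi8(_mm_min_epu8(a, b), a);
}

/* |a - b| for unsigned bytes: one of the two saturating differences
 * is zero, the other is the absolute difference. */
static inline __m128i wish_mm_absdiff_epu8(__m128i a, __m128i b) {
    return _mm_or_si128(_mm_subs_epu8(a, b), _mm_subs_epu8(b, a));
}

/* Same idea for unsigned 16-bit lanes. */
static inline __m128i wish_mm_absdiff_epu16(__m128i a, __m128i b) {
    return _mm_or_si128(_mm_subs_epu16(a, b), _mm_subs_epu16(b, a));
}

/* Unsigned 16-bit min: a - sat(a - b) is b when a > b, otherwise a. */
static inline __m128i wish_mm_min_epu16(__m128i a, __m128i b) {
    return _mm_sub_epi16(a, _mm_subs_epu16(a, b));
}

/* Unsigned 16-bit max: b + sat(a - b) is a when a > b, otherwise b. */
static inline __m128i wish_mm_max_epu16(__m128i a, __m128i b) {
    return _mm_add_epi16(b, _mm_subs_epu16(a, b));
}

/* Swap the two bytes inside every 16-bit lane. */
static inline __m128i wish_mm_bswap_epi16(__m128i a) {
    return _mm_or_si128(_mm_slli_epi16(a, 8), _mm_srli_epi16(a, 8));
}
```

SSE4.1 provides `_mm_min_epu16` and `_mm_max_epu16` natively, so the two 16-bit min/max helpers only matter when targeting SSE2 alone.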
Add one here every time you wish for one: