[X86] X86FixupVectorConstantsPass - add support for VPMOVSX/VPMOVZX constant extension #71078

Closed
RKSimon opened this issue Nov 2, 2023 · 3 comments

Comments


RKSimon commented Nov 2, 2023

X86FixupVectorConstantsPass currently just handles folding full vector loads to broadcasts, but we're missing the opportunity to use sign/zero extension loads for non-uniform cases.

void foo(const int *src, float *dst) {
    for (int i = 0; i != 16; ++i) {
        *dst++ = (float)(*src++ + ((i % 8) + 1));
    }
}

llc -mcpu=x86-64-v3

foo(int const*, float*): # @foo(int const*, float*)
  vmovdqa .LCPI0_0(%rip), %ymm0 # ymm0 = [1,2,3,4,5,6,7,8]
  vpaddd (%rdi), %ymm0, %ymm1
  vcvtdq2ps %ymm1, %ymm1
  vmovups %ymm1, (%rsi)
  vpaddd 32(%rdi), %ymm0, %ymm0
  vcvtdq2ps %ymm0, %ymm0
  vmovups %ymm0, 32(%rsi)
  vzeroupper
  retq

We can reduce the size of the constant pool entry from 32 bytes to 8 bytes by replacing the vmovdqa load with vpmovzxbd, which loads eight bytes and zero-extends each one to a 32-bit element.
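
As a rough illustration of the check such a fold needs, the sketch below finds the narrowest element width from which a vector of 32-bit constants can be rebuilt by zero or sign extension. It is a standalone model, not code from X86FixupVectorConstantsPass; the helper names `min_zext_bits`, `min_sext_bits` and `pool_bytes` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: narrowest width (8, 16 or 32 bits) such that
   zero-extending each narrowed element reproduces the original i32. */
static unsigned min_zext_bits(const int32_t *elts, size_t n) {
    assert(elts != NULL || n == 0);
    unsigned bits = 8;
    for (size_t i = 0; i < n; ++i) {
        uint32_t u = (uint32_t)elts[i];
        if (u > UINT16_MAX)
            return 32;          /* upper bits set: cannot narrow */
        if (u > UINT8_MAX)
            bits = 16;          /* needs at least 16 bits */
    }
    return bits;
}

/* Hypothetical helper: same idea for sign extension. */
static unsigned min_sext_bits(const int32_t *elts, size_t n) {
    assert(elts != NULL || n == 0);
    unsigned bits = 8;
    for (size_t i = 0; i < n; ++i) {
        int32_t v = elts[i];
        if (v < INT16_MIN || v > INT16_MAX)
            return 32;
        if (v < INT8_MIN || v > INT8_MAX)
            bits = 16;
    }
    return bits;
}

/* Bytes of constant pool data needed for n elements of the given width. */
static size_t pool_bytes(size_t n, unsigned bits) {
    return n * bits / 8;
}
```

For the `[1,2,3,4,5,6,7,8]` constant above, every element fits in an unsigned byte, so the stored data shrinks from `pool_bytes(8, 32)` to `pool_bytes(8, 8)`. A vector containing negative values would fail the zero-extension check but might still pass the sign-extension one, which is why both the VPMOVZX and VPMOVSX forms are of interest.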

RKSimon added a commit that referenced this issue Nov 17, 2023
…maller vector constant data

If a smaller XMM/YMM constant matches the lower bits of a YMM/ZMM constant we already have, then ensure we reuse the same constant pool entry.

Extends the similar combines we already have to reuse VBROADCAST_LOAD/SUBV_BROADCAST_LOAD constant loads.

This is mainly a canonicalization, but should make it easier for us to merge constant loads in a future commit (related to both #70947 and better X86FixupVectorConstantsPass usage for #71078).
RKSimon added a commit that referenced this issue Nov 20, 2023
…maller vector constant data (REAPPLIED)

If a smaller XMM/YMM constant matches the lower bits of a YMM/ZMM constant we already have, then ensure we reuse the same constant pool entry.

Extends the similar combines we already have to reuse VBROADCAST_LOAD/SUBV_BROADCAST_LOAD constant loads.

This is mainly a canonicalization, but should make it easier for us to merge constant loads in a future commit (related to both #70947 and better X86FixupVectorConstantsPass usage for #71078).

Reapplied with fix to ensure we don't 'flip-flop' between multiple matching constants - only perform the fold if the new constant pool entry is larger than the current entry.
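
The reuse rule described in this commit message can be modelled in a few lines. This is only a sketch under simplifying assumptions: constants are treated as raw little-endian byte arrays, and `struct pool_entry` and `should_reuse` are hypothetical names, not LLVM APIs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a constant pool entry: its raw bytes and size. */
struct pool_entry {
    const unsigned char *bytes;
    size_t size;
};

/* Returns true if a load of `cur` could instead read the low part of `cand`.
   The candidate must be strictly larger than the current entry, so two
   equal-sized matching constants never keep replacing each other. */
static bool should_reuse(struct pool_entry cur, struct pool_entry cand) {
    if (cand.size <= cur.size)
        return false;
    /* The low bytes of the larger constant must equal the smaller one. */
    return memcmp(cur.bytes, cand.bytes, cur.size) == 0;
}
```

The strict size comparison is the part that corresponds to the "flip-flop" fix: with a non-strict comparison, two identical entries would each qualify as a replacement for the other.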
RKSimon added a commit that referenced this issue Jan 18, 2024
…78585)

This helps ensure the encoding details are next to the EVEX tag

Noticed while preparing to add more constant commenting as part of #73783 and #71078
RKSimon added a commit that referenced this issue Jan 22, 2024
…or future patches. NFC.

Add helper to convert raw APInt bit stream into ConstantDataVector elements.

This was used internally by rebuildSplatableConstant but will be reused in future patches for #73783 and #71078

RKSimon commented Jan 29, 2024

VPMOVSX cases: #79815


RKSimon commented Feb 7, 2024

VPMOVSX: b5d35fe
VPMOVZX: 7bfcf8c

RKSimon closed this as completed Feb 7, 2024