Compiler no longer producing single horizontal add instruction after ab2c499d #49736

Closed
dyung opened this issue May 18, 2021 · 2 comments · Fixed by #114101
Labels: bugzilla (Issues migrated from bugzilla), vectorizers


@dyung
Collaborator

dyung commented May 18, 2021

Bugzilla Link: 50392
Version: trunk
OS: All
CC: @alexey-bataev, @anton-afanasyev, @RKSimon

Extended Description

One of our internal tests compiles the following code and verifies that it generates a horizontal add instruction in the resulting assembly when optimizations are enabled and targeting btver2.

#include <immintrin.h>

/* Lane 1 of the result is left undefined by the -1 shuffle index. */
__attribute__((noinline))
__m256d add_pd_002(__m256d a, __m256d b) {
  __m256d r = (__m256d){ a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3] };
  return __builtin_shufflevector(r, a, 0, -1, 2, 3);
}

Prior to commit ab2c499, when compiled with "-g0 -O3 -march=btver2", the compiler would produce the following code:

        vinsertf128     $1, %xmm1, %ymm0, %ymm0
        vhaddpd %ymm1, %ymm0, %ymm0

But following the mentioned change, the compiler is now producing the following code instead:

        vextractf128    $1, %ymm1, %xmm2
        vhaddpd %xmm0, %xmm0, %xmm0
        vhaddpd %xmm2, %xmm1, %xmm1
        vperm2f128      $2, %ymm0, %ymm1, %ymm0 # ymm0 = ymm0[0,1],ymm1[0,1]
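
For readers comparing the two listings, here is a scalar sketch of what each sequence computes. It is an illustration only: the function names are hypothetical, the lane behaviour of vhaddpd, vinsertf128, vextractf128 and vperm2f128 is modelled by hand from their documented semantics, and lane 1 is treated as a don't-care because the source shuffle uses index -1 there. Under those assumptions both sequences agree on lanes 0, 2 and 3, so the regression is in instruction count (four instead of two), not in the result.

```c
#include <assert.h>

/* Scalar model of 256-bit vhaddpd (Intel operand order: dst, src1, src2). */
void hadd_pd256(const double s1[4], const double s2[4], double dst[4]) {
    dst[0] = s1[0] + s1[1];
    dst[1] = s2[0] + s2[1];
    dst[2] = s1[2] + s1[3];
    dst[3] = s2[2] + s2[3];
}

/* Scalar model of 128-bit vhaddpd. */
void hadd_pd128(const double s1[2], const double s2[2], double dst[2]) {
    dst[0] = s1[0] + s1[1];
    dst[1] = s2[0] + s2[1];
}

/* Old output: vinsertf128 builds {a0, a1, b0, b1}, then one 256-bit hadd. */
void old_sequence(const double a[4], const double b[4], double out[4]) {
    double t[4] = { a[0], a[1], b[0], b[1] };
    hadd_pd256(t, b, out);
}

/* New output: extract the high half of b, two 128-bit hadds, then
   vperm2f128 concatenates the two low halves. */
void new_sequence(const double a[4], const double b[4], double out[4]) {
    double b_hi[2] = { b[2], b[3] };   /* vextractf128 $1, %ymm1, %xmm2 */
    double lo[2], hi[2];
    hadd_pd128(a, a, lo);              /* vhaddpd %xmm0, %xmm0, %xmm0 */
    hadd_pd128(b, b_hi, hi);           /* vhaddpd %xmm2, %xmm1, %xmm1 */
    out[0] = lo[0]; out[1] = lo[1];
    out[2] = hi[0]; out[3] = hi[1];
}

/* Returns 1 when both sequences produce the lanes the source defines
   (0, 2 and 3); lane 1 is ignored. */
int sequences_agree(const double a[4], const double b[4]) {
    double o[4], n[4];
    old_sequence(a, b, o);
    new_sequence(a, b, n);
    return o[0] == a[0] + a[1] && n[0] == o[0]
        && o[2] == b[0] + b[1] && n[2] == o[2]
        && o[3] == b[2] + b[3] && n[3] == o[3];
}
```
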
@dyung
Collaborator Author

dyung commented May 18, 2021

Link to review of the change that caused the regression: https://reviews.llvm.org/D98714

@anton-afanasyev
Contributor

To add more details: this should be fixed once Alexey's patch for non-power-of-two vector sizes (https://reviews.llvm.org/D57059) lands.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021
@RKSimon RKSimon self-assigned this Oct 29, 2024
RKSimon added a commit that referenced this issue Oct 30, 2024
I've kept the old PR50392 tag since this is such an old issue....
smallp-o-p pushed a commit to smallp-o-p/llvm-project that referenced this issue Nov 3, 2024
…"binop (shuffle), (shuffle)" (llvm#114101)

Add foldPermuteOfBinops - to fold a permute (single source shuffle) through a binary op that is being fed by other shuffles.

Fixes llvm#94546
Fixes llvm#49736
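
The fold named in the commit message relies on a general identity: a lane-wise binary op commutes with a lane permutation, and a permute applied to a shuffle collapses into a single shuffle with a composed mask. The scalar sketch below illustrates only that identity, using addition as the binary op. All helper names are hypothetical, undefined (-1) mask entries are not modelled, and this is not LLVM's implementation of foldPermuteOfBinops.

```c
#include <assert.h>

#define N 4

/* Two-source shuffle: index m in [0,N) picks x[m], m in [N,2N) picks y[m-N]. */
void shuffle2(const double x[N], const double y[N], const int mask[N],
              double out[N]) {
    for (int i = 0; i < N; ++i)
        out[i] = mask[i] < N ? x[mask[i]] : y[mask[i] - N];
}

/* Permuting a shuffle result equals shuffling with composed[i] = mask[perm[i]]. */
void compose_mask(const int mask[N], const int perm[N], int composed[N]) {
    for (int i = 0; i < N; ++i)
        composed[i] = mask[perm[i]];
}

/* Unfolded form: permute(add(shuffle(x, y, m0), shuffle(z, w, m1)), perm). */
void unfolded_add(const double x[N], const double y[N],
                  const double z[N], const double w[N],
                  const int m0[N], const int m1[N], const int perm[N],
                  double out[N]) {
    double s0[N], s1[N], sum[N];
    shuffle2(x, y, m0, s0);
    shuffle2(z, w, m1, s1);
    for (int i = 0; i < N; ++i) sum[i] = s0[i] + s1[i];
    for (int i = 0; i < N; ++i) out[i] = sum[perm[i]];
}

/* Folded form: add(shuffle(x, y, m0 o perm), shuffle(z, w, m1 o perm)). */
void folded_add(const double x[N], const double y[N],
                const double z[N], const double w[N],
                const int m0[N], const int m1[N], const int perm[N],
                double out[N]) {
    int c0[N], c1[N];
    double s0[N], s1[N];
    compose_mask(m0, perm, c0);
    compose_mask(m1, perm, c1);
    shuffle2(x, y, c0, s0);
    shuffle2(z, w, c1, s1);
    for (int i = 0; i < N; ++i) out[i] = s0[i] + s1[i];
}
```

The folded form needs two shuffles and one add, where the unfolded form needs two shuffles, one add and a trailing permute.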