Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AArch64][GlobalISel] Legalization cycle with shufflevector #81244

Closed
nikic opened this issue Feb 9, 2024 · 10 comments · Fixed by #81479
Closed

[AArch64][GlobalISel] Legalization cycle with shufflevector #81244

nikic opened this issue Feb 9, 2024 · 10 comments · Fixed by #81479

Comments

@nikic
Copy link
Contributor

nikic commented Feb 9, 2024

; RUN: llc -mtriple=aarch64-- -O0 < %s
define <4 x i8> @test(<2 x i8> %arg) {
  %shuffle = shufflevector <2 x i8> %arg, <2 x i8> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>
  ret <4 x i8> %shuffle
}

Results in a GlobalISel legalization cycle.

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 9, 2024

@llvm/issue-subscribers-backend-aarch64

Author: Nikita Popov (nikic)

```llvm ; RUN: llc -mtriple=aarch64-- -O0 < %s define <16 x i8> @test(<2 x i8> %arg) { %shuffle = shufflevector <2 x i8> %arg, <2 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison> ret <16 x i8> %shuffle } ``` Results in a GlobalISel legalization cycle.

@nikic
Copy link
Contributor Author

nikic commented Feb 9, 2024

We start out with

bb.1 (%ir-block.0):
  liveins: $d0
  %1:_(<2 x s32>) = COPY $d0
  %0:_(<2 x s8>) = G_TRUNC %1:_(<2 x s32>)
  %8:_(s8) = G_IMPLICIT_DEF
  %6:_(<2 x s8>) = G_BUILD_VECTOR %8:_(s8), %8:_(s8)
  %7:_(<4 x s8>) = G_CONCAT_VECTORS %0:_(<2 x s8>), %6:_(<2 x s8>)
  %5:_(<4 x s16>) = G_ANYEXT %7:_(<4 x s8>)
  $d0 = COPY %5:_(<4 x s16>)
  RET_ReallyLR implicit $d0

Then after one iteration we get:

bb.1 (%ir-block.0):
  liveins: $d0
  %1:_(<2 x s32>) = COPY $d0
  %9:_(s32), %10:_(s32) = G_UNMERGE_VALUES %1:_(<2 x s32>)
  %11:_(s16) = G_TRUNC %9:_(s32)
  %12:_(s16) = G_TRUNC %10:_(s32)
  %13:_(<2 x s16>) = G_BUILD_VECTOR %11:_(s16), %12:_(s16)
  %0:_(<2 x s8>) = G_TRUNC %13:_(<2 x s16>)
  %15:_(s32) = G_IMPLICIT_DEF
  %14:_(s32) = COPY %15:_(s32)
  %16:_(<2 x s32>) = G_BUILD_VECTOR %14:_(s32), %15:_(s32)
  %6:_(<2 x s8>) = G_TRUNC %16:_(<2 x s32>)
  %7:_(<4 x s8>) = G_CONCAT_VECTORS %0:_(<2 x s8>), %6:_(<2 x s8>)
  %5:_(<4 x s16>) = G_ANYEXT %7:_(<4 x s8>)
  $d0 = COPY %5:_(<4 x s16>)
  RET_ReallyLR implicit $d0

After two we get:

bb.1 (%ir-block.0):
  liveins: $d0
  %1:_(<2 x s32>) = COPY $d0
  %0:_(<2 x s8>) = G_TRUNC %1:_(<2 x s32>)
  %15:_(s32) = G_IMPLICIT_DEF
  %14:_(s32) = COPY %15:_(s32)
  %22:_(s16) = G_TRUNC %14:_(s32)
  %23:_(s16) = G_TRUNC %15:_(s32)
  %24:_(<2 x s16>) = G_BUILD_VECTOR %22:_(s16), %23:_(s16)
  %6:_(<2 x s8>) = G_TRUNC %24:_(<2 x s16>)
  %7:_(<4 x s8>) = G_CONCAT_VECTORS %0:_(<2 x s8>), %6:_(<2 x s8>)
  %5:_(<4 x s16>) = G_ANYEXT %7:_(<4 x s8>)
  $d0 = COPY %5:_(<4 x s16>)
  RET_ReallyLR implicit $d0

And then it repeats.

@tschuett
Copy link
Member

tschuett commented Feb 9, 2024

llc -march=aarch64 -global-isel -stop-after=irtranslator foo.ll -o foo.mir:

 bb.1 (%ir-block.0):
    liveins: $d0

    %1:_(<2 x s32>) = COPY $d0
    %0:_(<2 x s8>) = G_TRUNC %1(<2 x s32>)
    %4:_(s8) = G_CONSTANT i8 0
    %3:_(<2 x s8>) = G_BUILD_VECTOR %4(s8), %4(s8)
    %2:_(<4 x s8>) = G_SHUFFLE_VECTOR %0(<2 x s8>), %3, shufflemask(0, 1, undef, undef)
    %5:_(<4 x s16>) = G_ANYEXT %2(<4 x s8>)
    $d0 = COPY %5(<4 x s16>)
    RET_ReallyLR implicit $d0

llc -march=aarch64 -run-pass=aarch64-prelegalizer-combiner foo.mir -o foo2.mir

  liveins: $d0

    %1:_(<2 x s32>) = COPY $d0
    %0:_(<2 x s8>) = G_TRUNC %1(<2 x s32>)
    %6:_(<2 x s8>) = G_IMPLICIT_DEF
    %7:_(<4 x s8>) = G_CONCAT_VECTORS %0(<2 x s8>), %6(<2 x s8>)
    %5:_(<4 x s16>) = G_ANYEXT %7(<4 x s8>)
    $d0 = COPY %5(<4 x s16>)
    RET_ReallyLR implicit $d0

@tschuett
Copy link
Member

tschuett commented Feb 9, 2024

The G_CONCAT_VECTORS is illegal. There are no attempts to make illegal concat vectors legal:

getActionDefinitionsBuilder(G_CONCAT_VECTORS)

The G_IMPLICIT_DEF is illegal.

{G_IMPLICIT_DEF, G_FREEZE, G_CONSTANT_FOLD_BARRIER})

G_CONCAT_VECTORS and G_IMPLICIT_DEF would need a .moreElementsToNextPow2(0);.

@tschuett
Copy link
Member

tschuett commented Feb 9, 2024

It seems to be a bigger issue. I tried .moreElementsToNextPow2(0);., but it is still insufficient.

@nikic
Copy link
Contributor Author

nikic commented Feb 9, 2024

I spent way too much time today trying to figure this out, but have a really hard time wrapping my head around GlobalISel legalization.

I think the issue could be related to three things:

  • Trying to legalize trunc from <2 x s32> to <2 x s8> by splitting it into trunc <2 x s32> to <2 x s16> and then trunc <2 x s16> to <2 x s8>
  • Ability to combine such trunc pairs back to the original trunc.
  • Scalarization of trunc into scalar trunc and build_vector, where the build_vector then gets legalized back into ... a trunc.

cc @arsenm

@tschuett
Copy link
Member

tschuett commented Feb 9, 2024

%7:_(<4 x s8>) = G_CONCAT_VECTORS %0(<2 x s8>), %6(<2 x s8>)

must become

%7:_(<4 x s32>) = G_CONCAT_VECTORS %0(<2 x s32>), %6(<2 x s32>)

Note the change from s8 to s32. The legalizer has no means to perform this change.
Then

%6:_(<2 x s8>) = G_IMPLICIT_DEF

must become

%6:_(<2 x s32>) = G_IMPLICIT_DEF

which is legal.

@tschuett
Copy link
Member

tschuett commented Feb 9, 2024

Maybe a .widenScalarOrEltToNextPow2(0) for G_CONCAT_VECTOR could improve the situation.

nikic added a commit that referenced this issue Feb 13, 2024
If we have something like G_TRUNC from v2s32 to v2s16, then lowering
this to a concat of two G_TRUNC s32 to s16 followed by G_TRUNC from
v2s16 to v2s8 does not bring us any closer to legality. In fact, the
first part of that is a G_BUILD_VECTOR whose legalization will produce a
new G_TRUNC from v2s32 to v2s16, and both G_TRUNCs will then get
combined to the original, causing a legalization cycle.

Make the lowering condition more precise, by requiring that the original
vector is >128 bits, which is I believe the only case where this
specific splitting approach is useful.

Note that this doesn't actually produce a legal result (the alwaysLegal
is a lie, as before), but it will cause a proper globalisel abort
instead of an infinite legalization loop.

Fixes #81244.
@nikic
Copy link
Contributor Author

nikic commented Feb 13, 2024

/cherry-pick 070848c

llvmbot pushed a commit to llvmbot/llvm-project that referenced this issue Feb 13, 2024
If we have something like G_TRUNC from v2s32 to v2s16, then lowering
this to a concat of two G_TRUNC s32 to s16 followed by G_TRUNC from
v2s16 to v2s8 does not bring us any closer to legality. In fact, the
first part of that is a G_BUILD_VECTOR whose legalization will produce a
new G_TRUNC from v2s32 to v2s16, and both G_TRUNCs will then get
combined to the original, causing a legalization cycle.

Make the lowering condition more precise, by requiring that the original
vector is >128 bits, which is I believe the only case where this
specific splitting approach is useful.

Note that this doesn't actually produce a legal result (the alwaysLegal
is a lie, as before), but it will cause a proper globalisel abort
instead of an infinite legalization loop.

Fixes llvm#81244.

(cherry picked from commit 070848c)
@llvmbot
Copy link
Collaborator

llvmbot commented Feb 13, 2024

/pull-request #81581

nikic added a commit to rust-lang/llvm-project that referenced this issue Feb 13, 2024
If we have something like G_TRUNC from v2s32 to v2s16, then lowering
this to a concat of two G_TRUNC s32 to s16 followed by G_TRUNC from
v2s16 to v2s8 does not bring us any closer to legality. In fact, the
first part of that is a G_BUILD_VECTOR whose legalization will produce a
new G_TRUNC from v2s32 to v2s16, and both G_TRUNCs will then get
combined to the original, causing a legalization cycle.

Make the lowering condition more precise, by requiring that the original
vector is >128 bits, which is I believe the only case where this
specific splitting approach is useful.

Note that this doesn't actually produce a legal result (the alwaysLegal
is a lie, as before), but it will cause a proper globalisel abort
instead of an infinite legalization loop.

Fixes llvm#81244.

(cherry picked from commit 070848c)
llvmbot pushed a commit to llvmbot/llvm-project that referenced this issue Feb 13, 2024
If we have something like G_TRUNC from v2s32 to v2s16, then lowering
this to a concat of two G_TRUNC s32 to s16 followed by G_TRUNC from
v2s16 to v2s8 does not bring us any closer to legality. In fact, the
first part of that is a G_BUILD_VECTOR whose legalization will produce a
new G_TRUNC from v2s32 to v2s16, and both G_TRUNCs will then get
combined to the original, causing a legalization cycle.

Make the lowering condition more precise, by requiring that the original
vector is >128 bits, which is I believe the only case where this
specific splitting approach is useful.

Note that this doesn't actually produce a legal result (the alwaysLegal
is a lie, as before), but it will cause a proper globalisel abort
instead of an infinite legalization loop.

Fixes llvm#81244.

(cherry picked from commit 070848c)
tstellar pushed a commit to tstellar/llvm-project that referenced this issue Feb 14, 2024
If we have something like G_TRUNC from v2s32 to v2s16, then lowering
this to a concat of two G_TRUNC s32 to s16 followed by G_TRUNC from
v2s16 to v2s8 does not bring us any closer to legality. In fact, the
first part of that is a G_BUILD_VECTOR whose legalization will produce a
new G_TRUNC from v2s32 to v2s16, and both G_TRUNCs will then get
combined to the original, causing a legalization cycle.

Make the lowering condition more precise, by requiring that the original
vector is >128 bits, which is I believe the only case where this
specific splitting approach is useful.

Note that this doesn't actually produce a legal result (the alwaysLegal
is a lie, as before), but it will cause a proper globalisel abort
instead of an infinite legalization loop.

Fixes llvm#81244.

(cherry picked from commit 070848c)
tstellar pushed a commit to tstellar/llvm-project that referenced this issue Feb 14, 2024
If we have something like G_TRUNC from v2s32 to v2s16, then lowering
this to a concat of two G_TRUNC s32 to s16 followed by G_TRUNC from
v2s16 to v2s8 does not bring us any closer to legality. In fact, the
first part of that is a G_BUILD_VECTOR whose legalization will produce a
new G_TRUNC from v2s32 to v2s16, and both G_TRUNCs will then get
combined to the original, causing a legalization cycle.

Make the lowering condition more precise, by requiring that the original
vector is >128 bits, which is I believe the only case where this
specific splitting approach is useful.

Note that this doesn't actually produce a legal result (the alwaysLegal
is a lie, as before), but it will cause a proper globalisel abort
instead of an infinite legalization loop.

Fixes llvm#81244.

(cherry picked from commit 070848c)
tstellar pushed a commit to tstellar/llvm-project that referenced this issue Feb 14, 2024
If we have something like G_TRUNC from v2s32 to v2s16, then lowering
this to a concat of two G_TRUNC s32 to s16 followed by G_TRUNC from
v2s16 to v2s8 does not bring us any closer to legality. In fact, the
first part of that is a G_BUILD_VECTOR whose legalization will produce a
new G_TRUNC from v2s32 to v2s16, and both G_TRUNCs will then get
combined to the original, causing a legalization cycle.

Make the lowering condition more precise, by requiring that the original
vector is >128 bits, which is I believe the only case where this
specific splitting approach is useful.

Note that this doesn't actually produce a legal result (the alwaysLegal
is a lie, as before), but it will cause a proper globalisel abort
instead of an infinite legalization loop.

Fixes llvm#81244.

(cherry picked from commit 070848c)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging a pull request may close this issue.

3 participants