Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[X86][Inline] Skip inline asm in inlining target feature check #83820

Merged
merged 1 commit into from
Mar 5, 2024

Conversation

nikic
Copy link
Contributor

@nikic nikic commented Mar 4, 2024

When inlining across functions with different target features, we perform roughly two checks:

  1. The caller features must be a superset of the callee features.
  2. Calls in the callee cannot use types where the target features would change the call ABI (e.g. by changing whether something is passed in a zmm or two ymm registers). The latter check is very crude right now.

The latter check currently also catches inline asm "calls". I believe that inline asm should be excluded from this check, as it is independent from the usual call ABI, and instead governed by the inline asm constraint string.

Fixes #67054.

When inlining across functions with different target features, we
perform roughly two checks:
 1. The caller features must be a superset of the callee features.
 2. Calls in the callee cannot use types where the target feeatures
    would change the call ABI (e.g. by changing whether something
    is passed in a zmm or two ymm registers). The latter check is
    very crude right now.

The latter check currently also catches inline asm "calls".
I believe that inline asm should be excluded from this check, as
it is independent from the usual call ABI, and instead governed by
the inline asm constraint string.
@llvmbot
Copy link
Collaborator

llvmbot commented Mar 4, 2024

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-x86

Author: Nikita Popov (nikic)

Changes

When inlining across functions with different target features, we perform roughly two checks:

  1. The caller features must be a superset of the callee features.
  2. Calls in the callee cannot use types where the target feeatures would change the call ABI (e.g. by changing whether something is passed in a zmm or two ymm registers). The latter check is very crude right now.

The latter check currently also catches inline asm "calls". I believe that inline asm should be excluded from this check, as it is independent from the usual call ABI, and instead governed by the inline asm constraint string.

Fixes #67054.


Full diff: https://github.com/llvm/llvm-project/pull/83820.diff

2 Files Affected:

  • (modified) llvm/lib/Target/X86/X86TargetTransformInfo.cpp (+4)
  • (modified) llvm/test/Transforms/Inline/X86/call-abi-compatibility.ll (+6-11)
diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
index 18bf32fe1acaad..4cca291a245622 100644
--- a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
+++ b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
@@ -6087,6 +6087,10 @@ bool X86TTIImpl::areInlineCompatible(const Function *Caller,
 
   for (const Instruction &I : instructions(Callee)) {
     if (const auto *CB = dyn_cast<CallBase>(&I)) {
+      // Having more target features is fine for inline ASM.
+      if (CB->isInlineAsm())
+        continue;
+
       SmallVector<Type *, 8> Types;
       for (Value *Arg : CB->args())
         Types.push_back(Arg->getType());
diff --git a/llvm/test/Transforms/Inline/X86/call-abi-compatibility.ll b/llvm/test/Transforms/Inline/X86/call-abi-compatibility.ll
index f03270bafea999..6f582cab2f1452 100644
--- a/llvm/test/Transforms/Inline/X86/call-abi-compatibility.ll
+++ b/llvm/test/Transforms/Inline/X86/call-abi-compatibility.ll
@@ -94,27 +94,22 @@ define internal void @caller_not_avx4() {
 
 declare i64 @caller_unknown_simple(i64)
 
-; FIXME: This call should get inlined, because the callee only contains
+; This call should get inlined, because the callee only contains
 ; inline ASM, not real calls.
 define <8 x i64> @caller_inline_asm(ptr %p0, i64 %k, ptr %p1, ptr %p2) #0 {
 ; CHECK-LABEL: define {{[^@]+}}@caller_inline_asm
 ; CHECK-SAME: (ptr [[P0:%.*]], i64 [[K:%.*]], ptr [[P1:%.*]], ptr [[P2:%.*]]) #[[ATTR2:[0-9]+]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call <8 x i64> @callee_inline_asm(ptr [[P0]], i64 [[K]], ptr [[P1]], ptr [[P2]])
-; CHECK-NEXT:    ret <8 x i64> [[CALL]]
+; CHECK-NEXT:    [[SRC_I:%.*]] = load <8 x i64>, ptr [[P0]], align 64
+; CHECK-NEXT:    [[A_I:%.*]] = load <8 x i64>, ptr [[P1]], align 64
+; CHECK-NEXT:    [[B_I:%.*]] = load <8 x i64>, ptr [[P2]], align 64
+; CHECK-NEXT:    [[TMP1:%.*]] = call <8 x i64> asm "vpaddb\09$($3, $2, $0 {$1}", "=v,^Yk,v,v,0,~{dirflag},~{fpsr},~{flags}"(i64 [[K]], <8 x i64> [[A_I]], <8 x i64> [[B_I]], <8 x i64> [[SRC_I]])
+; CHECK-NEXT:    ret <8 x i64> [[TMP1]]
 ;
   %call = call <8 x i64> @callee_inline_asm(ptr %p0, i64 %k, ptr %p1, ptr %p2)
   ret <8 x i64> %call
 }
 
 define internal <8 x i64> @callee_inline_asm(ptr %p0, i64 %k, ptr %p1, ptr %p2) #1 {
-; CHECK-LABEL: define {{[^@]+}}@callee_inline_asm
-; CHECK-SAME: (ptr [[P0:%.*]], i64 [[K:%.*]], ptr [[P1:%.*]], ptr [[P2:%.*]]) #[[ATTR3:[0-9]+]] {
-; CHECK-NEXT:    [[SRC:%.*]] = load <8 x i64>, ptr [[P0]], align 64
-; CHECK-NEXT:    [[A:%.*]] = load <8 x i64>, ptr [[P1]], align 64
-; CHECK-NEXT:    [[B:%.*]] = load <8 x i64>, ptr [[P2]], align 64
-; CHECK-NEXT:    [[TMP1:%.*]] = tail call <8 x i64> asm "vpaddb\09$($3, $2, $0 {$1}", "=v,^Yk,v,v,0,~{dirflag},~{fpsr},~{flags}"(i64 [[K]], <8 x i64> [[A]], <8 x i64> [[B]], <8 x i64> [[SRC]])
-; CHECK-NEXT:    ret <8 x i64> [[TMP1]]
-;
   %src = load <8 x i64>, ptr %p0, align 64
   %a = load <8 x i64>, ptr %p1, align 64
   %b = load <8 x i64>, ptr %p2, align 64

Copy link
Contributor

@phoebewang phoebewang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can always drop the second check. I assume if a callee calls something unmatching its feature, e.g., 512-bit vector with AVX2. User would have got a warning about ABI change. Such user scenario is not ABI compatible already. We should not care it at all. Passing 512-bit vector with AVX512 feature is guaranteed not split in two ymm registers. We should not worry about it's changed by inlining.

Anyway, inline asm never has such oversized argument problem. So I'm good to start from it.

@KanRobert KanRobert self-requested a review March 4, 2024 13:34
@KanRobert
Copy link
Contributor

feeatures -> features

@nikic
Copy link
Contributor Author

nikic commented Mar 4, 2024

I think we can always drop the second check. I assume if a callee calls something unmatching its feature, e.g., 512-bit vector with AVX2. User would have got a warning about ABI change. Such user scenario is not ABI compatible already. We should not care it at all. Passing 512-bit vector with AVX512 feature is guaranteed not split in two ymm registers. We should not worry about it's changed by inlining.

The problem is that this doesn't just happen in user-written code, but also as a result of argument promotion for example. We do want vector arguments to be promoted, but after this has happened, inlining may no longer be safe (even if vector types are never passed by value in the original code).

@phoebewang
Copy link
Contributor

phoebewang commented Mar 4, 2024

I think we can always drop the second check. I assume if a callee calls something unmatching its feature, e.g., 512-bit vector with AVX2. User would have got a warning about ABI change. Such user scenario is not ABI compatible already. We should not care it at all. Passing 512-bit vector with AVX512 feature is guaranteed not split in two ymm registers. We should not worry about it's changed by inlining.

The problem is that this doesn't just happen in user-written code, but also as a result of argument promotion for example. We do want vector arguments to be promoted, but after this has happened, inlining may no longer be safe (even if vector types are never passed by value in the original code).

I think I have fixed the argument promotion issue by https://reviews.llvm.org/D123284
It is the only one that has the problem AFAIK.

@nikic
Copy link
Contributor Author

nikic commented Mar 4, 2024

I think we can always drop the second check. I assume if a callee calls something unmatching its feature, e.g., 512-bit vector with AVX2. User would have got a warning about ABI change. Such user scenario is not ABI compatible already. We should not care it at all. Passing 512-bit vector with AVX512 feature is guaranteed not split in two ymm registers. We should not worry about it's changed by inlining.

The problem is that this doesn't just happen in user-written code, but also as a result of argument promotion for example. We do want vector arguments to be promoted, but after this has happened, inlining may no longer be safe (even if vector types are never passed by value in the original code).

I think I have fixed the argument promotion issue by https://reviews.llvm.org/D123284 It is the only one that has the problem AFAIK.

I don't think it (fully) addresses the issue, at least because non-clang frontends do not use min-legal-vector-width.

@phoebewang
Copy link
Contributor

I think we can always drop the second check. I assume if a callee calls something unmatching its feature, e.g., 512-bit vector with AVX2. User would have got a warning about ABI change. Such user scenario is not ABI compatible already. We should not care it at all. Passing 512-bit vector with AVX512 feature is guaranteed not split in two ymm registers. We should not worry about it's changed by inlining.

The problem is that this doesn't just happen in user-written code, but also as a result of argument promotion for example. We do want vector arguments to be promoted, but after this has happened, inlining may no longer be safe (even if vector types are never passed by value in the original code).

I think I have fixed the argument promotion issue by https://reviews.llvm.org/D123284 It is the only one that has the problem AFAIK.

I don't think it (fully) addresses the issue, at least because non-clang frontends do not use min-legal-vector-width.

The design of min-legal-vector-width is safe to non-clang frontends. Without the attribute, backend assume it's MAX_INT.

@nikic nikic merged commit e84182a into llvm:main Mar 5, 2024
7 checks passed
@nikic nikic deleted the x86-inline-asm branch March 5, 2024 13:21
@nikic
Copy link
Contributor Author

nikic commented Mar 5, 2024

@phoebewang I just double checked to confirm that your patch does not address this issue. Here's a very simple example:

target triple = "x86_64-unknown-linux-gnu"

@g = external global i8
 
define void @test1(ptr %p) nounwind "target-features"="+avx" {
  call void @test2(ptr %p) 
  ret void
} 
 
define internal void @test2(ptr %p) {
  call void @test3(ptr %p)
  ret void
} 
 
define internal void @test3(ptr %p) nounwind noinline {
  %v = load <4 x i64>, ptr %p
  store <4 x i64> %v, ptr @g
  ret void 
}  

Note that this does not pass any vectors by value.

Now comment out the check in areInlineCompatible and run build/bin/opt -S -passes=argpromotion,inline | build/bin/llc and you will get the argument passed in ymm0 in test1 and accepted in xmm0 and xmm1 in test3.

@nikic nikic added this to the LLVM 18.X Release milestone Mar 5, 2024
@nikic
Copy link
Contributor Author

nikic commented Mar 5, 2024

/cherry-pick cad6ad2 e84182a

llvmbot pushed a commit to llvmbot/llvm-project that referenced this pull request Mar 5, 2024
…83820)

When inlining across functions with different target features, we
perform roughly two checks:
 1. The caller features must be a superset of the callee features.
2. Calls in the callee cannot use types where the target features would
change the call ABI (e.g. by changing whether something is passed in a
zmm or two ymm registers). The latter check is very crude right now.

The latter check currently also catches inline asm "calls". I believe
that inline asm should be excluded from this check, as it is independent
from the usual call ABI, and instead governed by the inline asm
constraint string.

Fixes llvm#67054.

(cherry picked from commit e84182a)
@llvmbot
Copy link
Collaborator

llvmbot commented Mar 5, 2024

/pull-request #84029

@phoebewang
Copy link
Contributor

@phoebewang I just double checked to confirm that your patch does not address this issue. Here's a very simple example:

target triple = "x86_64-unknown-linux-gnu"

@g = external global i8
 
define void @test1(ptr %p) nounwind "target-features"="+avx" {
  call void @test2(ptr %p) 
  ret void
} 
 
define internal void @test2(ptr %p) {
  call void @test3(ptr %p)
  ret void
} 
 
define internal void @test3(ptr %p) nounwind noinline {
  %v = load <4 x i64>, ptr %p
  store <4 x i64> %v, ptr @g
  ret void 
}  

Note that this does not pass any vectors by value.

Now comment out the check in areInlineCompatible and run build/bin/opt -S -passes=argpromotion,inline | build/bin/llc and you will get the argument passed in ymm0 in test1 and accepted in xmm0 and xmm1 in test3.

I didn't get ymm0 in test1:
https://godbolt.org/z/4jY9vqeMa
https://godbolt.org/z/h76jW5a9W

The argument is passed by pointer between test1 and test2, because in areTypesABICompatible, we have made sure of caller and callee have identical features. https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h#L844

The only hole was min-legal-vector-width which had been fixed.

@phoebewang
Copy link
Contributor

Sorry, I missed the condition areInlineCompatible. Let me check it again.

@phoebewang
Copy link
Contributor

@nikic how about d465850?

llvmbot pushed a commit to llvmbot/llvm-project that referenced this pull request Mar 13, 2024
…83820)

When inlining across functions with different target features, we
perform roughly two checks:
 1. The caller features must be a superset of the callee features.
2. Calls in the callee cannot use types where the target features would
change the call ABI (e.g. by changing whether something is passed in a
zmm or two ymm registers). The latter check is very crude right now.

The latter check currently also catches inline asm "calls". I believe
that inline asm should be excluded from this check, as it is independent
from the usual call ABI, and instead governed by the inline asm
constraint string.

Fixes llvm#67054.

(cherry picked from commit e84182a)
@pointhex pointhex mentioned this pull request May 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

X86 rejects inlining of target-feature-wise compatible functions
4 participants