[X86][Inline] Skip inline asm in inlining target feature check #83820

nikic · 2024-03-04T11:00:42Z

When inlining across functions with different target features, we perform roughly two checks:

1. The caller features must be a superset of the callee features.
2. Calls in the callee cannot use types where the target features would change the call ABI (e.g. by changing whether something is passed in a zmm or two ymm registers). The latter check is very crude right now.

The latter check currently also catches inline asm "calls". I believe that inline asm should be excluded from this check, as it is independent from the usual call ABI, and instead governed by the inline asm constraint string.
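The two checks, and the effect of skipping inline asm, can be sketched roughly as follows. This is a minimal standalone model, not the actual code in X86TargetTransformInfo.cpp: the `Func` and `Call` structs, their fields, and the `canInline` function are hypothetical stand-ins for LLVM's `Function`, `CallBase`, and `areInlineCompatible`, and the early return for identical feature sets is an assumption about how the "different target features" case is separated out.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for LLVM's CallBase and Function.
struct Call {
  bool IsInlineAsm;          // true for an inline asm "call"
  bool UsesABISensitiveType; // e.g. a vector or aggregate argument/return
};

struct Func {
  std::set<std::string> Features; // e.g. {"avx", "avx512f"}
  std::vector<Call> Calls;        // calls appearing in the function body
};

// Check 1: every callee feature must also be enabled in the caller.
// std::includes requires sorted ranges, which std::set guarantees.
bool callerHasCalleeFeatures(const Func &Caller, const Func &Callee) {
  return std::includes(Caller.Features.begin(), Caller.Features.end(),
                       Callee.Features.begin(), Callee.Features.end());
}

bool canInline(const Func &Caller, const Func &Callee) {
  if (!callerHasCalleeFeatures(Caller, Callee))
    return false;

  // Assumption: with identical features the call ABI cannot change.
  if (Caller.Features == Callee.Features)
    return true;

  // Check 2 (crude): reject calls whose types may change the call ABI.
  for (const Call &C : Callee.Calls) {
    if (C.IsInlineAsm)
      continue; // governed by the constraint string, not the call ABI
    if (C.UsesABISensitiveType)
      return false;
  }
  return true;
}
```

In this model, a callee whose only "call" is inline asm taking wide vector operands is inlinable into a caller with more features, while the same callee containing a real call with such operands is not.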

Fixes #67054.

llvmbot · 2024-03-04T11:01:10Z

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-x86

Author: Nikita Popov (nikic)

Changes

When inlining across functions with different target features, we perform roughly two checks:

The caller features must be a superset of the callee features.
Calls in the callee cannot use types where the target feeatures would change the call ABI (e.g. by changing whether something is passed in a zmm or two ymm registers). The latter check is very crude right now.

The latter check currently also catches inline asm "calls". I believe that inline asm should be excluded from this check, as it is independent from the usual call ABI, and instead governed by the inline asm constraint string.

Fixes #67054.

Full diff: https://github.com/llvm/llvm-project/pull/83820.diff

2 Files Affected:

(modified) llvm/lib/Target/X86/X86TargetTransformInfo.cpp (+4)
(modified) llvm/test/Transforms/Inline/X86/call-abi-compatibility.ll (+6-11)

diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
index 18bf32fe1acaad..4cca291a245622 100644
--- a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
+++ b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
@@ -6087,6 +6087,10 @@ bool X86TTIImpl::areInlineCompatible(const Function *Caller,
 
   for (const Instruction &I : instructions(Callee)) {
     if (const auto *CB = dyn_cast<CallBase>(&I)) {
+      // Having more target features is fine for inline ASM.
+      if (CB->isInlineAsm())
+        continue;
+
       SmallVector<Type *, 8> Types;
       for (Value *Arg : CB->args())
         Types.push_back(Arg->getType());
diff --git a/llvm/test/Transforms/Inline/X86/call-abi-compatibility.ll b/llvm/test/Transforms/Inline/X86/call-abi-compatibility.ll
index f03270bafea999..6f582cab2f1452 100644
--- a/llvm/test/Transforms/Inline/X86/call-abi-compatibility.ll
+++ b/llvm/test/Transforms/Inline/X86/call-abi-compatibility.ll
@@ -94,27 +94,22 @@ define internal void @caller_not_avx4() {
 
 declare i64 @caller_unknown_simple(i64)
 
-; FIXME: This call should get inlined, because the callee only contains
+; This call should get inlined, because the callee only contains
 ; inline ASM, not real calls.
 define <8 x i64> @caller_inline_asm(ptr %p0, i64 %k, ptr %p1, ptr %p2) #0 {
 ; CHECK-LABEL: define {{[^@]+}}@caller_inline_asm
 ; CHECK-SAME: (ptr [[P0:%.*]], i64 [[K:%.*]], ptr [[P1:%.*]], ptr [[P2:%.*]]) #[[ATTR2:[0-9]+]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call <8 x i64> @callee_inline_asm(ptr [[P0]], i64 [[K]], ptr [[P1]], ptr [[P2]])
-; CHECK-NEXT:    ret <8 x i64> [[CALL]]
+; CHECK-NEXT:    [[SRC_I:%.*]] = load <8 x i64>, ptr [[P0]], align 64
+; CHECK-NEXT:    [[A_I:%.*]] = load <8 x i64>, ptr [[P1]], align 64
+; CHECK-NEXT:    [[B_I:%.*]] = load <8 x i64>, ptr [[P2]], align 64
+; CHECK-NEXT:    [[TMP1:%.*]] = call <8 x i64> asm "vpaddb\09$($3, $2, $0 {$1}", "=v,^Yk,v,v,0,~{dirflag},~{fpsr},~{flags}"(i64 [[K]], <8 x i64> [[A_I]], <8 x i64> [[B_I]], <8 x i64> [[SRC_I]])
+; CHECK-NEXT:    ret <8 x i64> [[TMP1]]
 ;
   %call = call <8 x i64> @callee_inline_asm(ptr %p0, i64 %k, ptr %p1, ptr %p2)
   ret <8 x i64> %call
 }
 
 define internal <8 x i64> @callee_inline_asm(ptr %p0, i64 %k, ptr %p1, ptr %p2) #1 {
-; CHECK-LABEL: define {{[^@]+}}@callee_inline_asm
-; CHECK-SAME: (ptr [[P0:%.*]], i64 [[K:%.*]], ptr [[P1:%.*]], ptr [[P2:%.*]]) #[[ATTR3:[0-9]+]] {
-; CHECK-NEXT:    [[SRC:%.*]] = load <8 x i64>, ptr [[P0]], align 64
-; CHECK-NEXT:    [[A:%.*]] = load <8 x i64>, ptr [[P1]], align 64
-; CHECK-NEXT:    [[B:%.*]] = load <8 x i64>, ptr [[P2]], align 64
-; CHECK-NEXT:    [[TMP1:%.*]] = tail call <8 x i64> asm "vpaddb\09$($3, $2, $0 {$1}", "=v,^Yk,v,v,0,~{dirflag},~{fpsr},~{flags}"(i64 [[K]], <8 x i64> [[A]], <8 x i64> [[B]], <8 x i64> [[SRC]])
-; CHECK-NEXT:    ret <8 x i64> [[TMP1]]
-;
   %src = load <8 x i64>, ptr %p0, align 64
   %a = load <8 x i64>, ptr %p1, align 64
   %b = load <8 x i64>, ptr %p2, align 64

phoebewang

I think we can always drop the second check. I assume that if a callee calls something that does not match its features, e.g. passes a 512-bit vector with only AVX2, the user would already have gotten a warning about the ABI change. Such a scenario is already not ABI compatible, so we should not care about it at all. Passing a 512-bit vector with the AVX512 feature is guaranteed not to be split into two ymm registers, so we should not worry about that being changed by inlining.

Anyway, inline asm never has this oversized-argument problem, so I'm fine with starting from it.

KanRobert · 2024-03-04T13:34:33Z

feeatures -> features

nikic · 2024-03-04T13:35:48Z

The problem is that this doesn't just happen in user-written code, but also as a result of argument promotion for example. We do want vector arguments to be promoted, but after this has happened, inlining may no longer be safe (even if vector types are never passed by value in the original code).

phoebewang · 2024-03-04T13:43:41Z

I think I fixed the argument promotion issue with https://reviews.llvm.org/D123284.
It is the only one that has the problem, AFAIK.

nikic · 2024-03-04T13:53:49Z

I don't think it (fully) addresses the issue, at least because non-clang frontends do not use min-legal-vector-width.

phoebewang · 2024-03-04T14:06:02Z

The design of min-legal-vector-width is safe for non-clang frontends. Without the attribute, the backend assumes it is MAX_INT.

nikic · 2024-03-05T15:16:54Z

@phoebewang I just double checked to confirm that your patch does not address this issue. Here's a very simple example:

target triple = "x86_64-unknown-linux-gnu"

@g = external global i8
 
define void @test1(ptr %p) nounwind "target-features"="+avx" {
  call void @test2(ptr %p) 
  ret void
} 
 
define internal void @test2(ptr %p) {
  call void @test3(ptr %p)
  ret void
} 
 
define internal void @test3(ptr %p) nounwind noinline {
  %v = load <4 x i64>, ptr %p
  store <4 x i64> %v, ptr @g
  ret void 
}

Note that this does not pass any vectors by value.

Now comment out the check in areInlineCompatible and run build/bin/opt -S -passes=argpromotion,inline | build/bin/llc and you will get the argument passed in ymm0 in test1 and accepted in xmm0 and xmm1 in test3.
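The register mismatch described above comes down to simple arithmetic on vector and register widths. The following is a rough sketch of that arithmetic only, assuming the widest available vector register is chosen purely from the enabled features; `widestVectorRegBits` and `regsForVectorArg` are hypothetical helpers, not LLVM APIs, and the real calling-convention lowering has more cases.

```cpp
#include <set>
#include <string>

// Hypothetical helper: widest vector register (in bits) for a feature set.
unsigned widestVectorRegBits(const std::set<std::string> &Features) {
  if (Features.count("avx512f"))
    return 512; // zmm
  if (Features.count("avx"))
    return 256; // ymm
  return 128;   // xmm, the x86-64 baseline
}

// Number of registers needed to pass one vector argument by value,
// rounding up when the vector is wider than the widest register.
unsigned regsForVectorArg(unsigned VectorBits,
                          const std::set<std::string> &Features) {
  unsigned RegBits = widestVectorRegBits(Features);
  return (VectorBits + RegBits - 1) / RegBits;
}
```

Under this model, the promoted `<4 x i64>` argument (256 bits) needs one register in a function with `+avx` such as test1, but two in a function without it such as test3, so the two sides of the call disagree on where the value lives.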

nikic · 2024-03-05T15:17:52Z

/cherry-pick cad6ad2 e84182a

llvmbot · 2024-03-05T15:24:05Z

/pull-request #84029

phoebewang · 2024-03-06T00:37:39Z

I didn't get ymm0 in test1:
https://godbolt.org/z/4jY9vqeMa
https://godbolt.org/z/h76jW5a9W

The argument is passed by pointer between test1 and test2, because areTypesABICompatible makes sure that the caller and callee have identical features. https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h#L844

The only hole was min-legal-vector-width, which has been fixed.

phoebewang · 2024-03-06T00:51:11Z

Sorry, I missed the condition areInlineCompatible. Let me check it again.

phoebewang · 2024-03-06T02:42:12Z

@nikic how about d465850?

nikic requested review from phoebewang, topperc and kazutakahirata March 4, 2024 11:00

llvmbot added backend:X86 llvm:transforms labels Mar 4, 2024

phoebewang approved these changes Mar 4, 2024

KanRobert self-requested a review March 4, 2024 13:34

nikic merged commit e84182a into llvm:main Mar 5, 2024
7 checks passed

nikic deleted the x86-inline-asm branch March 5, 2024 13:21

nikic added this to the LLVM 18.X Release milestone Mar 5, 2024

bjacob mentioned this pull request Mar 6, 2024

Missing CPU features attributes on dispatch functions lead to UB / missed target instructions iree-org/iree#16670

Open

pointhex mentioned this pull request May 7, 2024

getStyleDiagHandler #91314

Closed

aemerson mentioned this pull request May 9, 2024

release/18.x: [AArc64][GlobalISel] Fix legalizer assert for G_INSERT_VECTOR_ELT - manual merge #91672

Merged
