Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ARM] Armv8-R does not require fp64 or neon. #88287

Merged
merged 1 commit into from
May 7, 2024

Conversation

chrisnc
Copy link
Contributor

@chrisnc chrisnc commented Apr 10, 2024

This was addressed for AArch64 here, but the same applies to ARM.

Move the enablement of neon+fp64 to -mcpu=cortex-r52, which optionally supports these features.

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added clang Clang issues not falling into any other category backend:ARM llvm:analysis labels Apr 10, 2024
@llvmbot
Copy link
Member

llvmbot commented Apr 10, 2024

@llvm/pr-subscribers-clang-driver
@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-backend-arm

Author: Chris Copeland (chrisnc)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/88287.diff

14 Files Affected:

  • (modified) clang/test/Preprocessor/arm-target-features.c (+1-1)
  • (modified) llvm/include/llvm/TargetParser/ARMTargetParser.def (+1-1)
  • (modified) llvm/lib/Target/ARM/ARM.td (+3-3)
  • (modified) llvm/test/Analysis/CostModel/ARM/arith.ll (+1-1)
  • (modified) llvm/test/Analysis/CostModel/ARM/cast.ll (+2-2)
  • (modified) llvm/test/Analysis/CostModel/ARM/cast_ldst.ll (+2-2)
  • (modified) llvm/test/Analysis/CostModel/ARM/cmps.ll (+2-2)
  • (modified) llvm/test/Analysis/CostModel/ARM/divrem.ll (+1-1)
  • (modified) llvm/test/CodeGen/ARM/build-attributes.ll (+2-2)
  • (modified) llvm/test/CodeGen/ARM/fpconv.ll (+2-2)
  • (modified) llvm/test/CodeGen/ARM/half.ll (+2-2)
  • (modified) llvm/test/CodeGen/ARM/pr42638-VMOVRRDCombine.ll (+1-1)
  • (modified) llvm/test/CodeGen/ARM/useaa.ll (+2-2)
  • (modified) llvm/unittests/TargetParser/TargetParserTest.cpp (+1-1)
diff --git a/clang/test/Preprocessor/arm-target-features.c b/clang/test/Preprocessor/arm-target-features.c
index 236c9f2479b705..5805e9e0bf0b50 100644
--- a/clang/test/Preprocessor/arm-target-features.c
+++ b/clang/test/Preprocessor/arm-target-features.c
@@ -96,7 +96,7 @@
 // CHECK-V8R-ALLOW-FP-INSTR: #define __ARM_FEATURE_CRC32 1
 // CHECK-V8R-ALLOW-FP-INSTR: #define __ARM_FEATURE_DIRECTED_ROUNDING 1
 // CHECK-V8R-ALLOW-FP-INSTR: #define __ARM_FEATURE_NUMERIC_MAXMIN 1
-// CHECK-V8R-ALLOW-FP-INSTR: #define __ARM_FP 0xe
+// CHECK-V8R-ALLOW-FP-INSTR: #define __ARM_FP 0x4
 
 // RUN: %clang -target armv7a-none-linux-gnu -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-V7 %s
 // CHECK-V7: #define __ARMEL__ 1
diff --git a/llvm/include/llvm/TargetParser/ARMTargetParser.def b/llvm/include/llvm/TargetParser/ARMTargetParser.def
index b821d224d7a82c..dee0f383b4b234 100644
--- a/llvm/include/llvm/TargetParser/ARMTargetParser.def
+++ b/llvm/include/llvm/TargetParser/ARMTargetParser.def
@@ -329,7 +329,7 @@ ARM_CPU_NAME("cortex-r7", ARMV7R, FK_VFPV3_D16_FP16, false,
              (ARM::AEK_MP | ARM::AEK_HWDIVARM))
 ARM_CPU_NAME("cortex-r8", ARMV7R, FK_VFPV3_D16_FP16, false,
              (ARM::AEK_MP | ARM::AEK_HWDIVARM))
-ARM_CPU_NAME("cortex-r52", ARMV8R, FK_NEON_FP_ARMV8, true, ARM::AEK_NONE)
+ARM_CPU_NAME("cortex-r52", ARMV8R, FK_FPV5_SP_D16, true, ARM::AEK_NONE)
 ARM_CPU_NAME("sc300", ARMV7M, FK_NONE, false, ARM::AEK_NONE)
 ARM_CPU_NAME("cortex-m3", ARMV7M, FK_NONE, true, ARM::AEK_NONE)
 ARM_CPU_NAME("cortex-m4", ARMV7EM, FK_FPV4_SP_D16, true, ARM::AEK_NONE)
diff --git a/llvm/lib/Target/ARM/ARM.td b/llvm/lib/Target/ARM/ARM.td
index 66596dbda83c95..6ee8c0540e6488 100644
--- a/llvm/lib/Target/ARM/ARM.td
+++ b/llvm/lib/Target/ARM/ARM.td
@@ -1167,9 +1167,7 @@ def ARMv8r    : Architecture<"armv8-r",   "ARMv8r",   [HasV8Ops,
                                                        FeatureDSP,
                                                        FeatureCRC,
                                                        FeatureMP,
-                                                       FeatureVirtualization,
-                                                       FeatureFPARMv8,
-                                                       FeatureNEON]>;
+                                                       FeatureVirtualization]>;
 
 def ARMv8mBaseline : Architecture<"armv8-m.base", "ARMv8mBaseline",
                                                       [HasV8MBaselineOps,
@@ -1726,6 +1724,8 @@ def : ProcNoItin<"kryo",                                [ARMv8a, ProcKryo,
                                                          FeatureCRC]>;
 
 def : ProcessorModel<"cortex-r52", CortexR52Model,      [ARMv8r, ProcR52,
+                                                         FeatureFPARMv8_D16_SP,
+                                                         FeatureFP16,
                                                          FeatureUseMISched,
                                                          FeatureFPAO]>;
 
diff --git a/llvm/test/Analysis/CostModel/ARM/arith.ll b/llvm/test/Analysis/CostModel/ARM/arith.ll
index 3a137a5af36664..8f173596c3b9a0 100644
--- a/llvm/test/Analysis/CostModel/ARM/arith.ll
+++ b/llvm/test/Analysis/CostModel/ARM/arith.ll
@@ -4,7 +4,7 @@
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=thumbv8.1m.main-none-eabi -mattr=+mve,+mve4beat < %s | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-MVE4
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=thumbv8m.main-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-MAIN
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=thumbv8m.base-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-BASE
-; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=armv8r-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8R
+; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=armv8r-none-eabi -mattr=+neon,+fp-armv8 < %s | FileCheck %s --check-prefix=CHECK-V8R
 ; RUN: opt -passes="print<cost-model>" -cost-kind=code-size 2>&1 -disable-output -mtriple=thumbv8.1m.main -mattr=+mve < %s | FileCheck %s --check-prefix=CHECK-MVE-SIZE
 
 target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
diff --git a/llvm/test/Analysis/CostModel/ARM/cast.ll b/llvm/test/Analysis/CostModel/ARM/cast.ll
index 60addd3077ed14..ae0d2347ec8be0 100644
--- a/llvm/test/Analysis/CostModel/ARM/cast.ll
+++ b/llvm/test/Analysis/CostModel/ARM/cast.ll
@@ -3,11 +3,11 @@
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=thumbv8.1m.main-none-eabi -mattr=+mve.fp < %s | FileCheck %s --check-prefix=CHECK-MVE-RECIP
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=thumbv8m.main-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-MAIN-RECIP
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=thumbv8m.base-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-BASE-RECIP
-; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=armv8r-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8R-RECIP
+; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=armv8r-none-eabi -mattr=+neon,+fp-armv8 < %s | FileCheck %s --check-prefix=CHECK-V8R-RECIP
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -cost-kind=code-size -mtriple=thumbv8.1m.main-none-eabi -mattr=+mve.fp < %s | FileCheck %s --check-prefix=CHECK-MVE-SIZE
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -cost-kind=code-size -mtriple=thumbv8m.main-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-MAIN-SIZE
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -cost-kind=code-size -mtriple=thumbv8m.base-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-BASE-SIZE
-; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -cost-kind=code-size -mtriple=armv8r-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8R-SIZE
+; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -cost-kind=code-size -mtriple=armv8r-none-eabi -mattr=+neon,+fp-armv8 < %s | FileCheck %s --check-prefix=CHECK-V8R-SIZE
 
 target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
 
diff --git a/llvm/test/Analysis/CostModel/ARM/cast_ldst.ll b/llvm/test/Analysis/CostModel/ARM/cast_ldst.ll
index db700eb3baeefe..4a2f9a25dc152f 100644
--- a/llvm/test/Analysis/CostModel/ARM/cast_ldst.ll
+++ b/llvm/test/Analysis/CostModel/ARM/cast_ldst.ll
@@ -3,11 +3,11 @@
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=thumbv8.1m.main-none-eabi -mattr=+mve.fp < %s | FileCheck %s --check-prefix=CHECK-MVE-RECIP
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=thumbv8m.main-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-MAIN-RECIP
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=thumbv8m.base-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-BASE-RECIP
-; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=armv8r-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8R-RECIP
+; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=armv8r-none-eabi -mattr=+neon,+fp-armv8 < %s | FileCheck %s --check-prefix=CHECK-V8R-RECIP
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -cost-kind=code-size -mtriple=thumbv8.1m.main-none-eabi -mattr=+mve.fp < %s | FileCheck %s --check-prefix=CHECK-MVE-SIZE
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -cost-kind=code-size -mtriple=thumbv8m.main-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-MAIN-SIZE
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -cost-kind=code-size -mtriple=thumbv8m.base-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-BASE-SIZE
-; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -cost-kind=code-size -mtriple=armv8r-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8R-SIZE
+; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -cost-kind=code-size -mtriple=armv8r-none-eabi -mattr=+neon,+fp-armv8 < %s | FileCheck %s --check-prefix=CHECK-V8R-SIZE
 
 target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
 
diff --git a/llvm/test/Analysis/CostModel/ARM/cmps.ll b/llvm/test/Analysis/CostModel/ARM/cmps.ll
index 7f89f521e77cbc..184b7076d02bea 100644
--- a/llvm/test/Analysis/CostModel/ARM/cmps.ll
+++ b/llvm/test/Analysis/CostModel/ARM/cmps.ll
@@ -2,11 +2,11 @@
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=thumbv8.1m.main-none-eabi -mattr=+mve.fp < %s | FileCheck %s --check-prefix=CHECK-MVE-RECIP
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=thumbv8m.main-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-MAIN-RECIP
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=thumbv8m.base-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-BASE-RECIP
-; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=armv8r-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8R-RECIP
+; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=armv8r-none-eabi -mattr=+neon,+fp-armv8 < %s | FileCheck %s --check-prefix=CHECK-V8R-RECIP
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -cost-kind=code-size -mtriple=thumbv8.1m.main-none-eabi -mattr=+mve.fp < %s | FileCheck %s --check-prefix=CHECK-MVE-SIZE
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -cost-kind=code-size -mtriple=thumbv8m.main-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-MAIN-SIZE
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -cost-kind=code-size -mtriple=thumbv8m.base-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-BASE-SIZE
-; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -cost-kind=code-size -mtriple=armv8r-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8R-SIZE
+; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -cost-kind=code-size -mtriple=armv8r-none-eabi -mattr=+neon,+fp-armv8 < %s | FileCheck %s --check-prefix=CHECK-V8R-SIZE
 
 target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
 
diff --git a/llvm/test/Analysis/CostModel/ARM/divrem.ll b/llvm/test/Analysis/CostModel/ARM/divrem.ll
index b582a61c2a0fc8..9aba39327f6601 100644
--- a/llvm/test/Analysis/CostModel/ARM/divrem.ll
+++ b/llvm/test/Analysis/CostModel/ARM/divrem.ll
@@ -3,7 +3,7 @@
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=thumbv8.1m.main-none-eabi -mattr=+mve.fp < %s | FileCheck %s --check-prefix=CHECK-MVE
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=thumbv8m.main-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-MAIN
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=thumbv8m.base-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-BASE
-; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=armv8r-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8R
+; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=armv8r-none-eabi -mattr=+neon,+fp-armv8,+fp16 < %s | FileCheck %s --check-prefix=CHECK-V8R
 
 target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
 
diff --git a/llvm/test/CodeGen/ARM/build-attributes.ll b/llvm/test/CodeGen/ARM/build-attributes.ll
index e8c83b92f3f947..2812bf81acf98c 100644
--- a/llvm/test/CodeGen/ARM/build-attributes.ll
+++ b/llvm/test/CodeGen/ARM/build-attributes.ll
@@ -221,8 +221,8 @@
 
 ; ARMv8-R
 ; RUN: llc < %s -mtriple=arm-none-none-eabi -mcpu=cortex-r52 -mattr=-vfp2sp,-fp16 | FileCheck %s --check-prefix=ARMv8R --check-prefix=ARMv8R-NOFPU
-; RUN: llc < %s -mtriple=arm-none-none-eabi -mcpu=cortex-r52 -mattr=-neon,-fp64,-d32 | FileCheck %s --check-prefix=ARMv8R --check-prefix=ARMv8R-SP
-; RUN: llc < %s -mtriple=arm-none-none-eabi -mcpu=cortex-r52 | FileCheck %s --check-prefix=ARMv8R --check-prefix=ARMv8R-NEON
+; RUN: llc < %s -mtriple=arm-none-none-eabi -mcpu=cortex-r52 | FileCheck %s --check-prefix=ARMv8R --check-prefix=ARMv8R-SP
+; RUN: llc < %s -mtriple=arm-none-none-eabi -mcpu=cortex-r52 -mattr=+neon,+fp-armv8 | FileCheck %s --check-prefix=ARMv8R --check-prefix=ARMv8R-NEON
 
 ; ARMv8-M
 ; RUN: llc < %s -mtriple=thumbv8-none-none-eabi -mcpu=cortex-m23 | FileCheck %s --check-prefix=STRICT-ALIGN
diff --git a/llvm/test/CodeGen/ARM/fpconv.ll b/llvm/test/CodeGen/ARM/fpconv.ll
index 929da5f18c813e..5dfc2a6a4629b1 100644
--- a/llvm/test/CodeGen/ARM/fpconv.ll
+++ b/llvm/test/CodeGen/ARM/fpconv.ll
@@ -1,7 +1,7 @@
 ; RUN: llc -mtriple=arm-eabi -mattr=+vfp2 %s -o - | FileCheck %s --check-prefix=CHECK-VFP
 ; RUN: llc -mtriple=arm-apple-darwin %s -o - | FileCheck %s
-; RUN: llc -mtriple=armv8r-none-none-eabi %s -o - | FileCheck %s --check-prefix=CHECK-VFP
-; RUN: llc -mtriple=armv8r-none-none-eabi -mattr=-fp64 %s -o - | FileCheck %s --check-prefix=CHECK-VFP-SP
+; RUN: llc -mtriple=armv8r-none-none-eabi -mattr=+neon,+fp-armv8 %s -o - | FileCheck %s --check-prefix=CHECK-VFP
+; RUN: llc -mtriple=armv8r-none-none-eabi -mattr=+fp-armv8,-fp64,-d32 %s -o - | FileCheck %s --check-prefix=CHECK-VFP-SP
 
 define float @f1(double %x) {
 ;CHECK-VFP-LABEL: f1:
diff --git a/llvm/test/CodeGen/ARM/half.ll b/llvm/test/CodeGen/ARM/half.ll
index 9b53dc77f22739..050b400539b4dd 100644
--- a/llvm/test/CodeGen/ARM/half.ll
+++ b/llvm/test/CodeGen/ARM/half.ll
@@ -1,8 +1,8 @@
 ; RUN: llc < %s -mtriple=thumbv7-apple-ios7.0 | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-OLD
 ; RUN: llc < %s -mtriple=thumbv7s-apple-ios7.0 | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-F16
 ; RUN: llc < %s -mtriple=thumbv8-apple-ios7.0 | FileCheck %s --check-prefix=CHECK  --check-prefix=CHECK-V8
-; RUN: llc < %s -mtriple=armv8r-none-none-eabi | FileCheck %s --check-prefix=CHECK  --check-prefix=CHECK-V8
-; RUN: llc < %s -mtriple=armv8r-none-none-eabi -mattr=-fp64 | FileCheck %s --check-prefix=CHECK  --check-prefix=CHECK-V8-SP
+; RUN: llc < %s -mtriple=armv8r-none-none-eabi -mattr=+fp-armv8,+fp16 | FileCheck %s --check-prefix=CHECK  --check-prefix=CHECK-V8
+; RUN: llc < %s -mtriple=armv8r-none-none-eabi -mattr=+fp-armv8,+fp16,-fp64 | FileCheck %s --check-prefix=CHECK  --check-prefix=CHECK-V8-SP
 ; RUN: llc < %s -mtriple=armv8.1m-none-none-eabi -mattr=+fp-armv8 | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-V8
 ; RUN: llc < %s -mtriple=armv8.1m-none-none-eabi -mattr=+fp-armv8,-fp64 | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-V8-SP
 ; RUN: llc < %s -mtriple=armv8.1m-none-none-eabi -mattr=+mve.fp,+fp64 | FileCheck %s --check-prefix=CHECK-V8
diff --git a/llvm/test/CodeGen/ARM/pr42638-VMOVRRDCombine.ll b/llvm/test/CodeGen/ARM/pr42638-VMOVRRDCombine.ll
index 8ee88fd18f3c81..245780e6836ea2 100644
--- a/llvm/test/CodeGen/ARM/pr42638-VMOVRRDCombine.ll
+++ b/llvm/test/CodeGen/ARM/pr42638-VMOVRRDCombine.ll
@@ -16,7 +16,7 @@ entry:
 
 declare void @llvm.va_start(ptr)
 
-attributes #0 = { "target-cpu"="cortex-r52" "target-features"="-fp64"  }
+attributes #0 = { "target-cpu"="cortex-r52" }
 
 ; Ensures that the machine scheduler does not move accessing the upper
 ; 32 bits of the double to before actually storing it to memory
diff --git a/llvm/test/CodeGen/ARM/useaa.ll b/llvm/test/CodeGen/ARM/useaa.ll
index f8207a1056e3b4..bbf2bb043d3f31 100644
--- a/llvm/test/CodeGen/ARM/useaa.ll
+++ b/llvm/test/CodeGen/ARM/useaa.ll
@@ -1,7 +1,7 @@
-; RUN: llc < %s -mtriple=armv8r-eabi -mcpu=cortex-r52 | FileCheck %s --check-prefix=CHECK --check-prefix=USEAA
+; RUN: llc < %s -mtriple=armv8r-eabi -mcpu=cortex-r52 -mattr=+neon,+fp-armv8 | FileCheck %s --check-prefix=CHECK --check-prefix=USEAA
 ; RUN: llc < %s -mtriple=armv7m-eabi -mcpu=cortex-m4 | FileCheck %s --check-prefix=CHECK --check-prefix=USEAA
 ; RUN: llc < %s -mtriple=armv8m-eabi -mcpu=cortex-m33 | FileCheck %s --check-prefix=CHECK --check-prefix=USEAA
-; RUN: llc < %s -mtriple=armv8r-eabi -mcpu=generic | FileCheck %s --check-prefix=CHECK --check-prefix=GENERIC
+; RUN: llc < %s -mtriple=armv8r-eabi -mcpu=generic -mattr=+neon,+fp-armv8 | FileCheck %s --check-prefix=CHECK --check-prefix=GENERIC
 
 ; Check we use AA during codegen, so can interleave these loads/stores.
 
diff --git a/llvm/unittests/TargetParser/TargetParserTest.cpp b/llvm/unittests/TargetParser/TargetParserTest.cpp
index 2c72a7229b5274..82c5c0d6c12175 100644
--- a/llvm/unittests/TargetParser/TargetParserTest.cpp
+++ b/llvm/unittests/TargetParser/TargetParserTest.cpp
@@ -351,7 +351,7 @@ INSTANTIATE_TEST_SUITE_P(
                                    ARM::AEK_MP | ARM::AEK_HWDIVARM |
                                        ARM::AEK_HWDIVTHUMB | ARM::AEK_DSP,
                                    "7-R"),
-        ARMCPUTestParams<uint64_t>("cortex-r52", "armv8-r", "neon-fp-armv8",
+        ARMCPUTestParams<uint64_t>("cortex-r52", "armv8-r", "fpv5-sp-d16",
                                    ARM::AEK_NONE | ARM::AEK_CRC | ARM::AEK_MP |
                                        ARM::AEK_VIRT | ARM::AEK_HWDIVARM |
                                        ARM::AEK_HWDIVTHUMB | ARM::AEK_DSP,

@llvmbot
Copy link
Member

llvmbot commented Apr 10, 2024

@llvm/pr-subscribers-clang

Author: Chris Copeland (chrisnc)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/88287.diff

14 Files Affected:

  • (modified) clang/test/Preprocessor/arm-target-features.c (+1-1)
  • (modified) llvm/include/llvm/TargetParser/ARMTargetParser.def (+1-1)
  • (modified) llvm/lib/Target/ARM/ARM.td (+3-3)
  • (modified) llvm/test/Analysis/CostModel/ARM/arith.ll (+1-1)
  • (modified) llvm/test/Analysis/CostModel/ARM/cast.ll (+2-2)
  • (modified) llvm/test/Analysis/CostModel/ARM/cast_ldst.ll (+2-2)
  • (modified) llvm/test/Analysis/CostModel/ARM/cmps.ll (+2-2)
  • (modified) llvm/test/Analysis/CostModel/ARM/divrem.ll (+1-1)
  • (modified) llvm/test/CodeGen/ARM/build-attributes.ll (+2-2)
  • (modified) llvm/test/CodeGen/ARM/fpconv.ll (+2-2)
  • (modified) llvm/test/CodeGen/ARM/half.ll (+2-2)
  • (modified) llvm/test/CodeGen/ARM/pr42638-VMOVRRDCombine.ll (+1-1)
  • (modified) llvm/test/CodeGen/ARM/useaa.ll (+2-2)
  • (modified) llvm/unittests/TargetParser/TargetParserTest.cpp (+1-1)
diff --git a/clang/test/Preprocessor/arm-target-features.c b/clang/test/Preprocessor/arm-target-features.c
index 236c9f2479b705..5805e9e0bf0b50 100644
--- a/clang/test/Preprocessor/arm-target-features.c
+++ b/clang/test/Preprocessor/arm-target-features.c
@@ -96,7 +96,7 @@
 // CHECK-V8R-ALLOW-FP-INSTR: #define __ARM_FEATURE_CRC32 1
 // CHECK-V8R-ALLOW-FP-INSTR: #define __ARM_FEATURE_DIRECTED_ROUNDING 1
 // CHECK-V8R-ALLOW-FP-INSTR: #define __ARM_FEATURE_NUMERIC_MAXMIN 1
-// CHECK-V8R-ALLOW-FP-INSTR: #define __ARM_FP 0xe
+// CHECK-V8R-ALLOW-FP-INSTR: #define __ARM_FP 0x4
 
 // RUN: %clang -target armv7a-none-linux-gnu -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-V7 %s
 // CHECK-V7: #define __ARMEL__ 1
diff --git a/llvm/include/llvm/TargetParser/ARMTargetParser.def b/llvm/include/llvm/TargetParser/ARMTargetParser.def
index b821d224d7a82c..dee0f383b4b234 100644
--- a/llvm/include/llvm/TargetParser/ARMTargetParser.def
+++ b/llvm/include/llvm/TargetParser/ARMTargetParser.def
@@ -329,7 +329,7 @@ ARM_CPU_NAME("cortex-r7", ARMV7R, FK_VFPV3_D16_FP16, false,
              (ARM::AEK_MP | ARM::AEK_HWDIVARM))
 ARM_CPU_NAME("cortex-r8", ARMV7R, FK_VFPV3_D16_FP16, false,
              (ARM::AEK_MP | ARM::AEK_HWDIVARM))
-ARM_CPU_NAME("cortex-r52", ARMV8R, FK_NEON_FP_ARMV8, true, ARM::AEK_NONE)
+ARM_CPU_NAME("cortex-r52", ARMV8R, FK_FPV5_SP_D16, true, ARM::AEK_NONE)
 ARM_CPU_NAME("sc300", ARMV7M, FK_NONE, false, ARM::AEK_NONE)
 ARM_CPU_NAME("cortex-m3", ARMV7M, FK_NONE, true, ARM::AEK_NONE)
 ARM_CPU_NAME("cortex-m4", ARMV7EM, FK_FPV4_SP_D16, true, ARM::AEK_NONE)
diff --git a/llvm/lib/Target/ARM/ARM.td b/llvm/lib/Target/ARM/ARM.td
index 66596dbda83c95..6ee8c0540e6488 100644
--- a/llvm/lib/Target/ARM/ARM.td
+++ b/llvm/lib/Target/ARM/ARM.td
@@ -1167,9 +1167,7 @@ def ARMv8r    : Architecture<"armv8-r",   "ARMv8r",   [HasV8Ops,
                                                        FeatureDSP,
                                                        FeatureCRC,
                                                        FeatureMP,
-                                                       FeatureVirtualization,
-                                                       FeatureFPARMv8,
-                                                       FeatureNEON]>;
+                                                       FeatureVirtualization]>;
 
 def ARMv8mBaseline : Architecture<"armv8-m.base", "ARMv8mBaseline",
                                                       [HasV8MBaselineOps,
@@ -1726,6 +1724,8 @@ def : ProcNoItin<"kryo",                                [ARMv8a, ProcKryo,
                                                          FeatureCRC]>;
 
 def : ProcessorModel<"cortex-r52", CortexR52Model,      [ARMv8r, ProcR52,
+                                                         FeatureFPARMv8_D16_SP,
+                                                         FeatureFP16,
                                                          FeatureUseMISched,
                                                          FeatureFPAO]>;
 
diff --git a/llvm/test/Analysis/CostModel/ARM/arith.ll b/llvm/test/Analysis/CostModel/ARM/arith.ll
index 3a137a5af36664..8f173596c3b9a0 100644
--- a/llvm/test/Analysis/CostModel/ARM/arith.ll
+++ b/llvm/test/Analysis/CostModel/ARM/arith.ll
@@ -4,7 +4,7 @@
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=thumbv8.1m.main-none-eabi -mattr=+mve,+mve4beat < %s | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-MVE4
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=thumbv8m.main-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-MAIN
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=thumbv8m.base-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-BASE
-; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=armv8r-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8R
+; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=armv8r-none-eabi -mattr=+neon,+fp-armv8 < %s | FileCheck %s --check-prefix=CHECK-V8R
 ; RUN: opt -passes="print<cost-model>" -cost-kind=code-size 2>&1 -disable-output -mtriple=thumbv8.1m.main -mattr=+mve < %s | FileCheck %s --check-prefix=CHECK-MVE-SIZE
 
 target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
diff --git a/llvm/test/Analysis/CostModel/ARM/cast.ll b/llvm/test/Analysis/CostModel/ARM/cast.ll
index 60addd3077ed14..ae0d2347ec8be0 100644
--- a/llvm/test/Analysis/CostModel/ARM/cast.ll
+++ b/llvm/test/Analysis/CostModel/ARM/cast.ll
@@ -3,11 +3,11 @@
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=thumbv8.1m.main-none-eabi -mattr=+mve.fp < %s | FileCheck %s --check-prefix=CHECK-MVE-RECIP
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=thumbv8m.main-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-MAIN-RECIP
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=thumbv8m.base-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-BASE-RECIP
-; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=armv8r-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8R-RECIP
+; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=armv8r-none-eabi -mattr=+neon,+fp-armv8 < %s | FileCheck %s --check-prefix=CHECK-V8R-RECIP
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -cost-kind=code-size -mtriple=thumbv8.1m.main-none-eabi -mattr=+mve.fp < %s | FileCheck %s --check-prefix=CHECK-MVE-SIZE
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -cost-kind=code-size -mtriple=thumbv8m.main-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-MAIN-SIZE
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -cost-kind=code-size -mtriple=thumbv8m.base-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-BASE-SIZE
-; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -cost-kind=code-size -mtriple=armv8r-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8R-SIZE
+; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -cost-kind=code-size -mtriple=armv8r-none-eabi -mattr=+neon,+fp-armv8 < %s | FileCheck %s --check-prefix=CHECK-V8R-SIZE
 
 target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
 
diff --git a/llvm/test/Analysis/CostModel/ARM/cast_ldst.ll b/llvm/test/Analysis/CostModel/ARM/cast_ldst.ll
index db700eb3baeefe..4a2f9a25dc152f 100644
--- a/llvm/test/Analysis/CostModel/ARM/cast_ldst.ll
+++ b/llvm/test/Analysis/CostModel/ARM/cast_ldst.ll
@@ -3,11 +3,11 @@
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=thumbv8.1m.main-none-eabi -mattr=+mve.fp < %s | FileCheck %s --check-prefix=CHECK-MVE-RECIP
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=thumbv8m.main-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-MAIN-RECIP
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=thumbv8m.base-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-BASE-RECIP
-; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=armv8r-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8R-RECIP
+; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=armv8r-none-eabi -mattr=+neon,+fp-armv8 < %s | FileCheck %s --check-prefix=CHECK-V8R-RECIP
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -cost-kind=code-size -mtriple=thumbv8.1m.main-none-eabi -mattr=+mve.fp < %s | FileCheck %s --check-prefix=CHECK-MVE-SIZE
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -cost-kind=code-size -mtriple=thumbv8m.main-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-MAIN-SIZE
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -cost-kind=code-size -mtriple=thumbv8m.base-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-BASE-SIZE
-; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -cost-kind=code-size -mtriple=armv8r-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8R-SIZE
+; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -cost-kind=code-size -mtriple=armv8r-none-eabi -mattr=+neon,+fp-armv8 < %s | FileCheck %s --check-prefix=CHECK-V8R-SIZE
 
 target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
 
diff --git a/llvm/test/Analysis/CostModel/ARM/cmps.ll b/llvm/test/Analysis/CostModel/ARM/cmps.ll
index 7f89f521e77cbc..184b7076d02bea 100644
--- a/llvm/test/Analysis/CostModel/ARM/cmps.ll
+++ b/llvm/test/Analysis/CostModel/ARM/cmps.ll
@@ -2,11 +2,11 @@
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=thumbv8.1m.main-none-eabi -mattr=+mve.fp < %s | FileCheck %s --check-prefix=CHECK-MVE-RECIP
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=thumbv8m.main-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-MAIN-RECIP
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=thumbv8m.base-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-BASE-RECIP
-; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=armv8r-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8R-RECIP
+; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=armv8r-none-eabi -mattr=+neon,+fp-armv8 < %s | FileCheck %s --check-prefix=CHECK-V8R-RECIP
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -cost-kind=code-size -mtriple=thumbv8.1m.main-none-eabi -mattr=+mve.fp < %s | FileCheck %s --check-prefix=CHECK-MVE-SIZE
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -cost-kind=code-size -mtriple=thumbv8m.main-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-MAIN-SIZE
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -cost-kind=code-size -mtriple=thumbv8m.base-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-BASE-SIZE
-; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -cost-kind=code-size -mtriple=armv8r-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8R-SIZE
+; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -cost-kind=code-size -mtriple=armv8r-none-eabi -mattr=+neon,+fp-armv8 < %s | FileCheck %s --check-prefix=CHECK-V8R-SIZE
 
 target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
 
diff --git a/llvm/test/Analysis/CostModel/ARM/divrem.ll b/llvm/test/Analysis/CostModel/ARM/divrem.ll
index b582a61c2a0fc8..9aba39327f6601 100644
--- a/llvm/test/Analysis/CostModel/ARM/divrem.ll
+++ b/llvm/test/Analysis/CostModel/ARM/divrem.ll
@@ -3,7 +3,7 @@
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=thumbv8.1m.main-none-eabi -mattr=+mve.fp < %s | FileCheck %s --check-prefix=CHECK-MVE
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=thumbv8m.main-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-MAIN
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=thumbv8m.base-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8M-BASE
-; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=armv8r-none-eabi < %s | FileCheck %s --check-prefix=CHECK-V8R
+; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=armv8r-none-eabi -mattr=+neon,+fp-armv8,+fp16 < %s | FileCheck %s --check-prefix=CHECK-V8R
 
 target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
 
diff --git a/llvm/test/CodeGen/ARM/build-attributes.ll b/llvm/test/CodeGen/ARM/build-attributes.ll
index e8c83b92f3f947..2812bf81acf98c 100644
--- a/llvm/test/CodeGen/ARM/build-attributes.ll
+++ b/llvm/test/CodeGen/ARM/build-attributes.ll
@@ -221,8 +221,8 @@
 
 ; ARMv8-R
 ; RUN: llc < %s -mtriple=arm-none-none-eabi -mcpu=cortex-r52 -mattr=-vfp2sp,-fp16 | FileCheck %s --check-prefix=ARMv8R --check-prefix=ARMv8R-NOFPU
-; RUN: llc < %s -mtriple=arm-none-none-eabi -mcpu=cortex-r52 -mattr=-neon,-fp64,-d32 | FileCheck %s --check-prefix=ARMv8R --check-prefix=ARMv8R-SP
-; RUN: llc < %s -mtriple=arm-none-none-eabi -mcpu=cortex-r52 | FileCheck %s --check-prefix=ARMv8R --check-prefix=ARMv8R-NEON
+; RUN: llc < %s -mtriple=arm-none-none-eabi -mcpu=cortex-r52 | FileCheck %s --check-prefix=ARMv8R --check-prefix=ARMv8R-SP
+; RUN: llc < %s -mtriple=arm-none-none-eabi -mcpu=cortex-r52 -mattr=+neon,+fp-armv8 | FileCheck %s --check-prefix=ARMv8R --check-prefix=ARMv8R-NEON
 
 ; ARMv8-M
 ; RUN: llc < %s -mtriple=thumbv8-none-none-eabi -mcpu=cortex-m23 | FileCheck %s --check-prefix=STRICT-ALIGN
diff --git a/llvm/test/CodeGen/ARM/fpconv.ll b/llvm/test/CodeGen/ARM/fpconv.ll
index 929da5f18c813e..5dfc2a6a4629b1 100644
--- a/llvm/test/CodeGen/ARM/fpconv.ll
+++ b/llvm/test/CodeGen/ARM/fpconv.ll
@@ -1,7 +1,7 @@
 ; RUN: llc -mtriple=arm-eabi -mattr=+vfp2 %s -o - | FileCheck %s --check-prefix=CHECK-VFP
 ; RUN: llc -mtriple=arm-apple-darwin %s -o - | FileCheck %s
-; RUN: llc -mtriple=armv8r-none-none-eabi %s -o - | FileCheck %s --check-prefix=CHECK-VFP
-; RUN: llc -mtriple=armv8r-none-none-eabi -mattr=-fp64 %s -o - | FileCheck %s --check-prefix=CHECK-VFP-SP
+; RUN: llc -mtriple=armv8r-none-none-eabi -mattr=+neon,+fp-armv8 %s -o - | FileCheck %s --check-prefix=CHECK-VFP
+; RUN: llc -mtriple=armv8r-none-none-eabi -mattr=+fp-armv8,-fp64,-d32 %s -o - | FileCheck %s --check-prefix=CHECK-VFP-SP
 
 define float @f1(double %x) {
 ;CHECK-VFP-LABEL: f1:
diff --git a/llvm/test/CodeGen/ARM/half.ll b/llvm/test/CodeGen/ARM/half.ll
index 9b53dc77f22739..050b400539b4dd 100644
--- a/llvm/test/CodeGen/ARM/half.ll
+++ b/llvm/test/CodeGen/ARM/half.ll
@@ -1,8 +1,8 @@
 ; RUN: llc < %s -mtriple=thumbv7-apple-ios7.0 | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-OLD
 ; RUN: llc < %s -mtriple=thumbv7s-apple-ios7.0 | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-F16
 ; RUN: llc < %s -mtriple=thumbv8-apple-ios7.0 | FileCheck %s --check-prefix=CHECK  --check-prefix=CHECK-V8
-; RUN: llc < %s -mtriple=armv8r-none-none-eabi | FileCheck %s --check-prefix=CHECK  --check-prefix=CHECK-V8
-; RUN: llc < %s -mtriple=armv8r-none-none-eabi -mattr=-fp64 | FileCheck %s --check-prefix=CHECK  --check-prefix=CHECK-V8-SP
+; RUN: llc < %s -mtriple=armv8r-none-none-eabi -mattr=+fp-armv8,+fp16 | FileCheck %s --check-prefix=CHECK  --check-prefix=CHECK-V8
+; RUN: llc < %s -mtriple=armv8r-none-none-eabi -mattr=+fp-armv8,+fp16,-fp64 | FileCheck %s --check-prefix=CHECK  --check-prefix=CHECK-V8-SP
 ; RUN: llc < %s -mtriple=armv8.1m-none-none-eabi -mattr=+fp-armv8 | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-V8
 ; RUN: llc < %s -mtriple=armv8.1m-none-none-eabi -mattr=+fp-armv8,-fp64 | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-V8-SP
 ; RUN: llc < %s -mtriple=armv8.1m-none-none-eabi -mattr=+mve.fp,+fp64 | FileCheck %s --check-prefix=CHECK-V8
diff --git a/llvm/test/CodeGen/ARM/pr42638-VMOVRRDCombine.ll b/llvm/test/CodeGen/ARM/pr42638-VMOVRRDCombine.ll
index 8ee88fd18f3c81..245780e6836ea2 100644
--- a/llvm/test/CodeGen/ARM/pr42638-VMOVRRDCombine.ll
+++ b/llvm/test/CodeGen/ARM/pr42638-VMOVRRDCombine.ll
@@ -16,7 +16,7 @@ entry:
 
 declare void @llvm.va_start(ptr)
 
-attributes #0 = { "target-cpu"="cortex-r52" "target-features"="-fp64"  }
+attributes #0 = { "target-cpu"="cortex-r52" }
 
 ; Ensures that the machine scheduler does not move accessing the upper
 ; 32 bits of the double to before actually storing it to memory
diff --git a/llvm/test/CodeGen/ARM/useaa.ll b/llvm/test/CodeGen/ARM/useaa.ll
index f8207a1056e3b4..bbf2bb043d3f31 100644
--- a/llvm/test/CodeGen/ARM/useaa.ll
+++ b/llvm/test/CodeGen/ARM/useaa.ll
@@ -1,7 +1,7 @@
-; RUN: llc < %s -mtriple=armv8r-eabi -mcpu=cortex-r52 | FileCheck %s --check-prefix=CHECK --check-prefix=USEAA
+; RUN: llc < %s -mtriple=armv8r-eabi -mcpu=cortex-r52 -mattr=+neon,+fp-armv8 | FileCheck %s --check-prefix=CHECK --check-prefix=USEAA
 ; RUN: llc < %s -mtriple=armv7m-eabi -mcpu=cortex-m4 | FileCheck %s --check-prefix=CHECK --check-prefix=USEAA
 ; RUN: llc < %s -mtriple=armv8m-eabi -mcpu=cortex-m33 | FileCheck %s --check-prefix=CHECK --check-prefix=USEAA
-; RUN: llc < %s -mtriple=armv8r-eabi -mcpu=generic | FileCheck %s --check-prefix=CHECK --check-prefix=GENERIC
+; RUN: llc < %s -mtriple=armv8r-eabi -mcpu=generic -mattr=+neon,+fp-armv8 | FileCheck %s --check-prefix=CHECK --check-prefix=GENERIC
 
 ; Check we use AA during codegen, so can interleave these loads/stores.
 
diff --git a/llvm/unittests/TargetParser/TargetParserTest.cpp b/llvm/unittests/TargetParser/TargetParserTest.cpp
index 2c72a7229b5274..82c5c0d6c12175 100644
--- a/llvm/unittests/TargetParser/TargetParserTest.cpp
+++ b/llvm/unittests/TargetParser/TargetParserTest.cpp
@@ -351,7 +351,7 @@ INSTANTIATE_TEST_SUITE_P(
                                    ARM::AEK_MP | ARM::AEK_HWDIVARM |
                                        ARM::AEK_HWDIVTHUMB | ARM::AEK_DSP,
                                    "7-R"),
-        ARMCPUTestParams<uint64_t>("cortex-r52", "armv8-r", "neon-fp-armv8",
+        ARMCPUTestParams<uint64_t>("cortex-r52", "armv8-r", "fpv5-sp-d16",
                                    ARM::AEK_NONE | ARM::AEK_CRC | ARM::AEK_MP |
                                        ARM::AEK_VIRT | ARM::AEK_HWDIVARM |
                                        ARM::AEK_HWDIVTHUMB | ARM::AEK_DSP,

@chrisnc
Copy link
Contributor Author

chrisnc commented Apr 18, 2024

ping @ostannard

Copy link
Collaborator

@ostannard ostannard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@davemgreen
Copy link
Collaborator

Does this disable neon by default for cortex-r52? If so I don't think it should be doing that, it would be a break in the existing behaviour, and should at least be in the release notes.

The general rule for -mcpu options is that they should (roughly) enable the maximum set of features available, and from there can be disabled with -mfpu or +nofp options. Essentially the default is -mfpu=auto. Keeping to this keeps things consistent between cores, so long as people understand the rule. For architectures the default should be without, but as far as I understand we still default to -mfpu=auto.

@chrisnc
Copy link
Contributor Author

chrisnc commented Apr 18, 2024

@davemgreen yes, this does change the cortex-r52 default features as well. The problem is that the cortex-r52 is the "default" CPU for armv8r (probably because it's the only one, ignoring r52plus), so if I don't reduce the feature set of that CPU, armv8r will continue to imply features that are not true of all processors in that family. I can add this to the release notes if there's no other way to achieve what is needed here. Would removing the default CPU for armv8r be acceptable?
Regardless, this is an intentional behavior change because currently you get all neon and dp features just by using this architecture, so I can add release notes about that if necessary.

@chrisnc chrisnc force-pushed the armv8r-no-fp-simd branch 2 times, most recently from ccc3022 to a021b57 Compare April 19, 2024 04:35
Copy link

github-actions bot commented Apr 19, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@chrisnc chrisnc force-pushed the armv8r-no-fp-simd branch from a021b57 to 4681c30 Compare April 19, 2024 04:39
@chrisnc
Copy link
Contributor Author

chrisnc commented Apr 19, 2024

I've updated the PR to use the proposed approach of making generic the default CPU for armv8r and let cortex-r52 have the full dp+neon suite. Let me know if this is alright. I still need to fix up a few of the tests to match this new behavior.

@chrisnc chrisnc force-pushed the armv8r-no-fp-simd branch 2 times, most recently from 65e44d6 to a4a27e9 Compare April 20, 2024 21:19
@llvmbot llvmbot added the clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' label Apr 20, 2024
@chrisnc
Copy link
Contributor Author

chrisnc commented Apr 21, 2024

Another option is to include FeatureFPARMv8_D16_SP in ARMv8r. The R-profile supplement of the Arm manual does say that this is a minimum feature requirement (as opposed to just being a variant of the R52).

@davemgreen
Copy link
Collaborator

As far as I understand this will remove the tuning we do for cortex-r52 when using armv8r, which will mean a little less performance but the tuning features in the Arm backend are not handled as well as they could be.

Can you add release note explaining what will change? Thanks.

@davemgreen davemgreen requested a review from smithp35 April 21, 2024 19:12
@chrisnc chrisnc force-pushed the armv8r-no-fp-simd branch from a4a27e9 to 575128c Compare April 22, 2024 00:29
@chrisnc
Copy link
Contributor Author

chrisnc commented Apr 22, 2024

Added an item to the release notes and fixed another place where fp64+neon was being added (the target parser); now I see the expected results when using just armv8r-none-eabi (sans -mcpu=cortex-r52).

$ bin/clang --target=armv8r-none-eabihf -O3 -c -o foo.o foo.c
$ arm-none-eabi-readelf -a
Attribute Section: aeabi
File Attributes
  Tag_conformance: "2.09"
  Tag_CPU_arch: v8-R
  Tag_CPU_arch_profile: Realtime
  Tag_ARM_ISA_use: Yes
  Tag_THUMB_ISA_use: Thumb-2
  Tag_FP_arch: FPv5/FP-D16 for ARMv8
  Tag_ABI_PCS_R9_use: V6
  Tag_ABI_PCS_GOT_use: direct
  Tag_ABI_PCS_wchar_t: 4
  Tag_ABI_FP_denormal: Needed
  Tag_ABI_FP_exceptions: Unused
  Tag_ABI_FP_number_model: IEEE 754
  Tag_ABI_align_needed: 8-byte
  Tag_ABI_align_preserved: 8-byte, except leaf SP
  Tag_ABI_enum_size: int
  Tag_ABI_HardFP_use: SP only
  Tag_ABI_VFP_args: VFP registers
  Tag_ABI_optimization_goals: Aggressive Speed
  Tag_CPU_unaligned_access: v6
  Tag_FP_HP_extension: Allowed
  Tag_ABI_FP_16bit_format: IEEE 754
  Tag_MPextension_use: Allowed
  Tag_Virtualization_use: Virtualization Extensions

@chrisnc chrisnc requested a review from ostannard April 22, 2024 00:38
@chrisnc chrisnc force-pushed the armv8r-no-fp-simd branch from 575128c to 46803a6 Compare April 22, 2024 00:45
@davemgreen
Copy link
Collaborator

I'm not sure I would make this change, mostly due to it potentially causing a break for existing users and making performance worse, but can see the reasoning. I am willing to defer to others if they have an opinion.

@chrisnc
Copy link
Contributor Author

chrisnc commented Apr 24, 2024

@davemgreen which change? Specifying -mcpu=cortex-r52 will behave the same way as before. The original manual for the R52 provided for a no-neon sp-only variant, and they exist in the wild, and this lets "architecture-generic" builds automatically support both.

One example where this comes up is in the Rust project, which recently gained armv8r support, but due to the over-spec'ing of the base armv8r in LLVM, it requires -feature negations in the default target feature list to build the core crate in a way that will support lesser r52s. Then, when users specify cortex-r52 as their target cpu, they are still left with the -feature negations overriding what the r52 should enable by default. rust-lang/rust#123159 (comment) This change will allow the feature flags and target CPUs to interact in a more predictable way as compared to other Cortex-R and -M CPUs.

@chrisnc
Copy link
Contributor Author

chrisnc commented May 2, 2024

Friendly bump. :) @ostannard @davemgreen

@davemgreen
Copy link
Collaborator

which change? Specifying -mcpu=cortex-r52 will behave the same way as before. The original manual for the R52 provided for a no-neon sp-only variant, and they exist in the wild, and this lets "architecture-generic" builds automatically support both.

I just meant the break in functionality for existing users of -march=armv8r, and the drop in performance from less R52 tuning features/scheduling and the lack of Neon. I believe this is more correct, but users in the past could still get the same results by specifying -march=armv8r -mfpu=fpv5-sp-d16. I'm hoping someone else can come along and have an opinion on it one way or another.

@chrisnc
Copy link
Contributor Author

chrisnc commented May 2, 2024

Understood. FWIW, arm-none-eabi-gcc does not enable any FPU features by default with -march=armv8-r (and will error out if you combine that with -mfloat-abi=hard), which is what the first pass of this PR went for, but I think where we landed is a decent middle ground.

@smithp35
Copy link
Collaborator

smithp35 commented May 3, 2024

I can chime in with an opinion but unfortunately I think it may be different to everyone elses!

This is a bit of an awkward situation as I think we have to balance several things:

  • Consistency between v8-R and AArch32 (ARM) and AArch64 (more consistent the better)
  • Consistency between clang and gcc (more consistent the better)
  • Historical behaviour of -march=armv8-r (how often is it used vs -mcpu)
  • Historical behaviour of -mcpu=cortex-r82 (keeping these stable as this is the most specific target)
  • Consistency with other -march and -mcpu with respect to feature enablement heuristics (helps transition between options).

As a general rule/heuristic for Arm feature enablement going forward, you will find exceptions to these rules in LLVM, particularly pre Armv8. we tend towards:

  • -march enables the mandatory features, but not the optional features, with specific exceptions such as SVE for Armv9 and floating point and SIMD for AArch64 Armv8
  • -mcpu enables the mandatory and optional features, with specific exceptions such as crypto

The Cortex-R series are intended for deeply embedded applications so it is normally possible for a user to directly target the CPU rather than the architecture. As there is only one v8-R CPU I'm fully expecting most users to use -mcpu=cortex-r82 rather than -march=armv8-r.

With these in mind. I propose:

  • -march=armv8-r enables the mandatory architectural features and including the mandatory minimum FPU given that all implementations of v8-R have that. It doesn't make sense to make it optional for puritys sake.
  • Make -march=armv8-r use a generic CPU so that we don't need to change cortex-r82. Yes this will make the performance of v8-R worse without -mtune, but I think it is preferable to keep -mcpu=cortex-r82 unaltered as that is what most people will be using.
  • Do not change -mcpu=cortex-r82 from its current default.

This is again an opinion based the weights that I put on the trade-off dimensions.

@smithp35
Copy link
Collaborator

smithp35 commented May 3, 2024

Apologies I read the comments, rather than go through the code. It looks the code has already made the v8-r imply a generic CPU.

So it looks like the code is doing what my proposal states so no objections from me and apologies for the noise.

@chrisnc
Copy link
Contributor Author

chrisnc commented May 3, 2024

@smithp35 thanks! Yes, I iterated on the specific direction here after the feedback about not changing the behavior of -mcpu=cortex-r52, but the first comment in the thread reflects the initial state, so I'll change that. I believe the cortex-r82 is AArch64-only, and so is not relevant here except w.r.t. consistency. I'm not familiar with how default CPUs are set in AArch64, but experimentally, --target=aarch64-none-elf -march=armv8-r does not generate the same code on a simple example as using --target=aarch64-none-elf -mcpu=cortex-r82 currently, so it does not appear to be a default there, unlike cortex-r52 without this change. Other than s/82/52/g, I confirm that what you've written is in line with what this change is doing.

Copy link
Collaborator

@smithp35 smithp35 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the confirmation.

@chrisnc
Copy link
Contributor Author

chrisnc commented May 5, 2024

If there's no other feedback, could someone hit the merge button for me?

@ostannard ostannard merged commit 651bdb9 into llvm:main May 7, 2024
5 checks passed
Copy link

github-actions bot commented May 7, 2024

@chrisnc Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested
by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as
the builds can include changes from many authors. It is not uncommon for your
change to be included in a build that fails due to someone else's changes, or
infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself.
This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

@chrisnc chrisnc deleted the armv8r-no-fp-simd branch May 7, 2024 15:41
chrisnc added a commit to chrisnc/rust that referenced this pull request Sep 13, 2024
This is a follow-up to rust-lang#123159, but applied to Armv8-R.

This required llvm/llvm-project#88287 to work
properly. Now that this change exists in rustc's llvm, we can fix
Armv8-R's default fpu features.
chrisnc added a commit to chrisnc/rust that referenced this pull request Sep 14, 2024
This is a follow-up to rust-lang#123159, but applied to Armv8-R.

This required llvm/llvm-project#88287 to work
properly. Now that this change exists in rustc's llvm, we can fix
Armv8-R's default fpu features.

Add a run-make test that target-cpu=cortex-r52 enables double-precision
and neon.
chrisnc added a commit to chrisnc/rust that referenced this pull request Sep 14, 2024
This is a follow-up to rust-lang#123159, but applied to Armv8-R.

This required llvm/llvm-project#88287 to work
properly. Now that this change exists in rustc's llvm, we can fix
Armv8-R's default fpu features.

Add a run-make test that target-cpu=cortex-r52 enables double-precision
and neon.
chrisnc added a commit to chrisnc/rust that referenced this pull request Sep 14, 2024
This is a follow-up to rust-lang#123159, but applied to Armv8-R.

This required llvm/llvm-project#88287 to work
properly. Now that this change exists in rustc's llvm, we can fix
Armv8-R's default fpu features. In Armv8-R's case, the default features
from LLVM for floating-point are sufficient, because there is no
integer-only variant of this architecture.

Add a run-make test that target-cpu=cortex-r52 enables double-precision
and neon.
chrisnc added a commit to chrisnc/rust that referenced this pull request Sep 15, 2024
This is a follow-up to rust-lang#123159, but applied to Armv8-R.

This required llvm/llvm-project#88287 to work
properly. Now that this change exists in rustc's llvm, we can fix
Armv8-R's default fpu features. In Armv8-R's case, the default features
from LLVM for floating-point are sufficient, because there is no
integer-only variant of this architecture.

Add a run-make test that an appropriate target-cpu enables
double-precision and neon for thumbv8m.main-none-eabihf and
armv8r-none-eabihf.
chrisnc added a commit to chrisnc/rust that referenced this pull request Sep 15, 2024
This is a follow-up to rust-lang#123159, but applied to Armv8-R.

This required llvm/llvm-project#88287 to work
properly. Now that this change exists in rustc's llvm, we can fix
Armv8-R's default fpu features. In Armv8-R's case, the default features
from LLVM for floating-point are sufficient, because there is no
integer-only variant of this architecture.

Add a run-make test that an appropriate target-cpu enables
double-precision and neon for thumbv8m.main-none-eabihf and
armv8r-none-eabihf.
chrisnc added a commit to chrisnc/rust that referenced this pull request Sep 15, 2024
This is a follow-up to rust-lang#123159, but applied to Armv8-R.

This required llvm/llvm-project#88287 to work
properly. Now that this change exists in rustc's llvm, we can fix
Armv8-R's default fpu features. In Armv8-R's case, the default features
from LLVM for floating-point are sufficient, because there is no
integer-only variant of this architecture.

Add a run-make test that an appropriate target-cpu enables
double-precision and neon for thumbv7em-none-eabihf and
thumbv8m.main-none-eabihf.
chrisnc added a commit to chrisnc/rust that referenced this pull request Sep 15, 2024
This is a follow-up to rust-lang#123159, but applied to Armv8-R.

This required llvm/llvm-project#88287 to work
properly. Now that this change exists in rustc's llvm, we can fix
Armv8-R's default fpu features. In Armv8-R's case, the default features
from LLVM for floating-point are sufficient, because there is no
integer-only variant of this architecture.

Add a run-make test that an appropriate target-cpu enables
double-precision and neon for thumbv7em-none-eabihf and
thumbv8m.main-none-eabihf.
chrisnc added a commit to chrisnc/rust that referenced this pull request Sep 15, 2024
This is a follow-up to rust-lang#123159, but applied to Armv8-R.

This required llvm/llvm-project#88287 to work
properly. Now that this change exists in rustc's llvm, we can fix
Armv8-R's default fpu features. In Armv8-R's case, the default features
from LLVM for floating-point are sufficient, because there is no
integer-only variant of this architecture.

Add a run-make test that an appropriate target-cpu enables
double-precision and neon for thumbv7em-none-eabihf and
thumbv8m.main-none-eabihf.
bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 15, 2024
Fix target-cpu fpu features on Armv8-R.

This is a follow-up to rust-lang#123159, but applied to Armv8-R.

This required llvm/llvm-project#88287 to work properly. Now that this change exists in rustc's llvm, we can fix Armv8-R's default fpu features. In Armv8-R's case, the default features from LLVM for floating-point are sufficient, because there is no integer-only variant of this architecture.

Add a run-make test that target-cpu=cortex-r52 enables double-precision and neon.

try-job: dist-aarch64-linux
try-job: dist-various-1
try-job: dist-various-2
try-job: dist-x86_64-linux
try-job: dist-x86_64-msvc
chrisnc added a commit to chrisnc/rust that referenced this pull request Sep 15, 2024
This is a follow-up to rust-lang#123159, but applied to Armv8-R.

This required llvm/llvm-project#88287 to work
properly. Now that this change exists in rustc's llvm, we can fix
Armv8-R's default fpu features. In Armv8-R's case, the default features
from LLVM for floating-point are sufficient, because there is no
integer-only variant of this architecture.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Sep 15, 2024
…ingjubilee

Fix target-cpu fpu features on Armv8-R.

This is a follow-up to rust-lang#123159, but applied to Armv8-R.

This required llvm/llvm-project#88287 to work properly. Now that this change exists in rustc's llvm, we can fix Armv8-R's default fpu features. In Armv8-R's case, the default features from LLVM for floating-point are sufficient, because there is no integer-only variant of this architecture.
workingjubilee added a commit to workingjubilee/rustc that referenced this pull request Sep 15, 2024
…ingjubilee

Fix target-cpu fpu features on Armv8-R.

This is a follow-up to rust-lang#123159, but applied to Armv8-R.

This required llvm/llvm-project#88287 to work properly. Now that this change exists in rustc's llvm, we can fix Armv8-R's default fpu features. In Armv8-R's case, the default features from LLVM for floating-point are sufficient, because there is no integer-only variant of this architecture.
workingjubilee added a commit to workingjubilee/rustc that referenced this pull request Sep 15, 2024
…ingjubilee

Fix target-cpu fpu features on Armv8-R.

This is a follow-up to rust-lang#123159, but applied to Armv8-R.

This required llvm/llvm-project#88287 to work properly. Now that this change exists in rustc's llvm, we can fix Armv8-R's default fpu features. In Armv8-R's case, the default features from LLVM for floating-point are sufficient, because there is no integer-only variant of this architecture.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Sep 15, 2024
…ingjubilee

Fix target-cpu fpu features on Armv8-R.

This is a follow-up to rust-lang#123159, but applied to Armv8-R.

This required llvm/llvm-project#88287 to work properly. Now that this change exists in rustc's llvm, we can fix Armv8-R's default fpu features. In Armv8-R's case, the default features from LLVM for floating-point are sufficient, because there is no integer-only variant of this architecture.
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Sep 15, 2024
Rollup merge of rust-lang#130295 - chrisnc:armv8r-feature-fix, r=workingjubilee

Fix target-cpu fpu features on Armv8-R.

This is a follow-up to rust-lang#123159, but applied to Armv8-R.

This required llvm/llvm-project#88287 to work properly. Now that this change exists in rustc's llvm, we can fix Armv8-R's default fpu features. In Armv8-R's case, the default features from LLVM for floating-point are sufficient, because there is no integer-only variant of this architecture.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:ARM clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang Clang issues not falling into any other category llvm:analysis
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants