[AArch64] Add support for Cortex-R82AE and improve Cortex-R82 #90440

Merged 4 commits on Apr 30, 2024

@jthackray (Contributor) commented Apr 29, 2024

  • [AArch64] Add support for Cortex-R82AE and improve Cortex-R82

Cortex-R82AE is an Armv8-R AArch64 CPU. This change also updates the
Cortex-R82 feature flags to be more accurate.

Technical Reference Manual for Cortex-R82AE:
   https://developer.arm.com/documentation/101550/latest/
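
As a rough illustration of the feature-flag update described above, the sketch below models a per-CPU extension table in plain C++. The `Ext` enum, `ExtSet`, `makeSet`, and `findCpu` are hypothetical stand-ins, not LLVM's API. The extension names only mirror the `AEK_*` entries that this patch lists explicitly for `cortex-r82` and `cortex-r82ae` in `AArch64TargetParser.h`; extensions implied by the Armv8-R architecture itself are not modelled.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string_view>

// Hypothetical stand-ins for a few AArch64::AEK_* extension kinds.
enum Ext : std::size_t {
  CRC, DOTPROD, FLAGM, FP, RCPC, PAUTH, PERFMON, RAS, RDM, PREDRES, LSE,
  NumExts
};

using ExtSet = std::bitset<NumExts>;

// Build an extension set from a list of extension kinds.
inline ExtSet makeSet(std::initializer_list<Ext> Exts) {
  ExtSet S;
  for (Ext E : Exts) {
    assert(E < NumExts && "extension index out of range");
    S.set(E);
  }
  return S;
}

struct CpuEntry {
  std::string_view Name;
  ExtSet Exts; // Only the extensions named explicitly for this CPU.
};

// Look up a CPU by name; returns nullptr if the name is unknown.
inline const CpuEntry *findCpu(std::string_view Name) {
  // Both entries carry the same explicit list, as in the patch.
  static const CpuEntry Table[] = {
      {"cortex-r82", makeSet({CRC, DOTPROD, FLAGM, FP, RCPC, PAUTH, PERFMON,
                              RAS, RDM, PREDRES})},
      {"cortex-r82ae", makeSet({CRC, DOTPROD, FLAGM, FP, RCPC, PAUTH, PERFMON,
                                RAS, RDM, PREDRES})},
  };
  for (const CpuEntry &E : Table)
    if (E.Name == Name)
      return &E;
  return nullptr;
}
```

With this table, `findCpu("cortex-r82")->Exts == findCpu("cortex-r82ae")->Exts` holds, which reflects that the patch gives both CPUs the same explicit extension list. `LSE` is absent from the modelled sets because the new list no longer names it explicitly; the patch's unit test still expects it for both CPUs.
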
@llvmbot added the clang and backend:AArch64 labels Apr 29, 2024
@jthackray closed this Apr 29, 2024
@llvmbot (Member) commented Apr 29, 2024

@llvm/pr-subscribers-backend-aarch64

Author: Jonathan Thackray (jthackray)

Changes
  • Fix mismatches between function parameter definitions and declarations (#89512)
  • Revert "[llvm][RISCV] Enable trailing fences for seq-cst stores by default (#87376)"
  • Revert "[RISCV] Support RISCV Atomics ABI attributes (#84597)"
  • [SelectionDAG] Treat CopyFromReg as freezing the value (#85932)
  • [DAGCombiner] Do not always fold FREEZE over BUILD_VECTOR (#85932)
  • [AArch64] Add support for Neoverse-N3, Neoverse-V3 and Neoverse-V3AE (#90143)
  • [clang][X86] Fix -Wundef warning in cpuid.h (#89842)
  • Add test cases for SELECT->AND miscompiles in DAGCombiner
  • [M68k] Add support for MOVEQ instruction (#88542)
  • [Transforms] Debug values are not remapped when cloning. (#87747)
  • [RISCV][NFC] Future-proof reference to ISA manual in RISCVInstrInfoC.td
  • [DAG] visitORCommutative - fold build_pair(not(x),not(y)) -> not(build_pair(x,y)) style patterns (#90050)
  • [NFC][OpenACC] Remove stale FIXME comment in a test
  • DAG: Simplify demanded bits for truncating atomic_store (#90113)
  • [Offload] Remove remaining __tgt_register_requires references (#90198)
  • Revert "[TableGen] Ignore inaccessible memory when checking pattern flags (#90061)"
  • [SLP]Attempt to vectorize long stores, if short one failed.
  • [mlir][MemRef] Add ExtractStridedMetadataOpCollapseShapeFolder (#89954)
  • [mlir] Mark isa/dyn_cast/cast/... member functions deprecated. (#89998)
  • [clang] Add test for CWG2149 "Brace elision and array length deduction" (#90079)
  • [libc++][ranges] LWG3984: ranges::to's recursion branch may be ill-formed (#87964)
  • [clang-tidy][NFC] Fix broken link in documentation of cert-env33-c (#90216)
  • [mlir] Fix -Wdeprecated-declarations of cast in VCIXToLLVMIRTranslation.cpp (NFC)
  • [mlir] Add sub-byte type emulation support for memref.collapse_shape (#89962)
  • [MC] Rename temporary symbols of empty name to ".L0 " (#89693)
  • [X86] Regenerate subreg-to-reg tests with update_llc_test_checks.py
  • [C++17] Support _GCC[CON|DE]STRUCTIVE_SIZE (#89446)
  • [AArch64][SVE2] SVE2 NBSL instruction lowering. (#89732)
  • [libc++][ranges] Exports operator|. (#90071)
  • [NFC] update comments from an earlier version of SuffixTree (#89800)
  • [scudo] Reflect the allowed values for M_DECAY_TIME on Android (#89114)
  • [DXIL] Fix build warning (#90226)
  • [OpenMP][AIX] Use syssmt() to get the number of SMTs per physical CPU (#89985)
  • [RISCV] Flatten the ImpliedExts table in RISCVISAInfo.cpp (#89975)
  • [LV] Add tests showing missed propgation of versiond stride values.
  • [mlir][sparse] fold sparse convert into producer linalg op. (#89999)
  • [Arm64EC] Improve alignment mangling in arm64ec thunks. (#90115)
  • [RISCV] Add an instruction PrettyPrinter to llvm-objdump (#90093)
  • [APINotes] Allow annotating a C++ type as non-copyable in Swift
  • [lldb] Switch to llvm::DWARFUnitHeader (#89808)
  • [SLP]Fix PR90224: check that users of gep are all vectorized.
  • [lldb] Fix typo in CumulativeSystemTimeIsValid check (#89680)
  • [Libomptarget] Rename libomptarget.rtl.x86_64 to libomptarget.rtl.host (#86868)
  • [RISCV] Consistently use uint32_t in Disassembler decode functions. NFC
  • [Driver,test] Replace CHECK-NOT: warning with -### -Werror
  • [HLSL][SPIR-V] Target directx is required
  • Revert "[mlir] Mark isa/dyn_cast/cast/... member functions deprecated. (#89998)" (#90250)
  • [RISCV] Fix off by 1 typo in decodeVMaskReg. NFC
  • Implement the DWARF 6 language and version attributes. (#89980)
  • [AMDGPU] Support byte_sel modifier on v_cvt_sr_fp8_f32 and v_cvt_sr_bf8_f32 (#90244)
  • [ci] Add clang project dependency for bolt testing (#90262)
  • [NFC] [HWASan] factor out debug record annotation (#90252)
  • [lldb][sbapi] Fix API break in SBDebugger broadcast bits (#90261)
  • [VPlan] Also propagate versioned strides to users via sext/zext.
  • [flang][cuda] Avoid to issue data transfer in device context (#90247)
  • [WebAssembly] Add half-precision feature (#90248)
  • [BOLT][NFC] Use getEHFrameHdrSectionName() (#90257)
  • [alpha.webkit.UncountedCallArgsChecker] Avoid emitting warnings for Ref, RefPtr, and their variants. (#90153)
  • [ASan][Test] Remove hardcoded linker version from test (#90147)
  • [AArch64] Add support for Cortex-R82AE and improve Cortex-R82

Full diff: https://github.com/llvm/llvm-project/pull/90440.diff

8 Files Affected:

  • (modified) clang/docs/ReleaseNotes.rst (+1)
  • (modified) clang/test/Misc/target-invalid-cpu-note.c (+2-2)
  • (modified) llvm/docs/ReleaseNotes.rst (+1-1)
  • (modified) llvm/include/llvm/TargetParser/AArch64TargetParser.h (+12-1)
  • (modified) llvm/lib/Target/AArch64/AArch64Processors.td (+14-1)
  • (modified) llvm/lib/Target/AArch64/AArch64Subtarget.cpp (+1)
  • (modified) llvm/lib/TargetParser/Host.cpp (+1)
  • (modified) llvm/unittests/TargetParser/TargetParserTest.cpp (+15-2)
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 5d4d152b2eb540..c92d480023f4d4 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -645,6 +645,7 @@ Arm and AArch64 Support
     * Arm Cortex-A78AE (cortex-a78ae).
     * Arm Cortex-A520AE (cortex-a520ae).
     * Arm Cortex-A720AE (cortex-a720ae).
+    * Arm Cortex-R82AE (cortex-r82ae).
     * Arm Neoverse-N3 (neoverse-n3).
     * Arm Neoverse-V3 (neoverse-v3).
     * Arm Neoverse-V3AE (neoverse-v3ae).
diff --git a/clang/test/Misc/target-invalid-cpu-note.c b/clang/test/Misc/target-invalid-cpu-note.c
index 21d80b7134508f..768b243b04e3a3 100644
--- a/clang/test/Misc/target-invalid-cpu-note.c
+++ b/clang/test/Misc/target-invalid-cpu-note.c
@@ -5,11 +5,11 @@
 
 // RUN: not %clang_cc1 -triple arm64--- -target-cpu not-a-cpu -fsyntax-only %s 2>&1 | FileCheck %s --check-prefix AARCH64
 // AARCH64: error: unknown target CPU 'not-a-cpu'
-// AARCH64-NEXT: note: valid target CPU values are: cortex-a34, cortex-a35, cortex-a53, cortex-a55, cortex-a510, cortex-a520, cortex-a520ae, cortex-a57, cortex-a65, cortex-a65ae, cortex-a72, cortex-a73, cortex-a75, cortex-a76, cortex-a76ae, cortex-a77, cortex-a78, cortex-a78ae, cortex-a78c, cortex-a710, cortex-a715, cortex-a720, cortex-a720ae, cortex-r82, cortex-x1, cortex-x1c, cortex-x2, cortex-x3, cortex-x4, neoverse-e1, neoverse-n1, neoverse-n2, neoverse-n3, neoverse-512tvb, neoverse-v1, neoverse-v2, neoverse-v3, neoverse-v3ae, cyclone, apple-a7, apple-a8, apple-a9, apple-a10, apple-a11, apple-a12, apple-a13, apple-a14, apple-a15, apple-a16, apple-a17, apple-m1, apple-m2, apple-m3, apple-s4, apple-s5, exynos-m3, exynos-m4, exynos-m5, falkor, saphira, kryo, thunderx2t99, thunderx3t110, thunderx, thunderxt88, thunderxt81, thunderxt83, tsv110, a64fx, carmel, ampere1, ampere1a, ampere1b, cobalt-100, grace{{$}}
+// AARCH64-NEXT: note: valid target CPU values are: cortex-a34, cortex-a35, cortex-a53, cortex-a55, cortex-a510, cortex-a520, cortex-a520ae, cortex-a57, cortex-a65, cortex-a65ae, cortex-a72, cortex-a73, cortex-a75, cortex-a76, cortex-a76ae, cortex-a77, cortex-a78, cortex-a78ae, cortex-a78c, cortex-a710, cortex-a715, cortex-a720, cortex-a720ae, cortex-r82, cortex-r82ae, cortex-x1, cortex-x1c, cortex-x2, cortex-x3, cortex-x4, neoverse-e1, neoverse-n1, neoverse-n2, neoverse-n3, neoverse-512tvb, neoverse-v1, neoverse-v2, neoverse-v3, neoverse-v3ae, cyclone, apple-a7, apple-a8, apple-a9, apple-a10, apple-a11, apple-a12, apple-a13, apple-a14, apple-a15, apple-a16, apple-a17, apple-m1, apple-m2, apple-m3, apple-s4, apple-s5, exynos-m3, exynos-m4, exynos-m5, falkor, saphira, kryo, thunderx2t99, thunderx3t110, thunderx, thunderxt88, thunderxt81, thunderxt83, tsv110, a64fx, carmel, ampere1, ampere1a, ampere1b, cobalt-100, grace{{$}}
 
 // RUN: not %clang_cc1 -triple arm64--- -tune-cpu not-a-cpu -fsyntax-only %s 2>&1 | FileCheck %s --check-prefix TUNE_AARCH64
 // TUNE_AARCH64: error: unknown target CPU 'not-a-cpu'
-// TUNE_AARCH64-NEXT: note: valid target CPU values are: cortex-a34, cortex-a35, cortex-a53, cortex-a55, cortex-a510, cortex-a520, cortex-a520ae, cortex-a57, cortex-a65, cortex-a65ae, cortex-a72, cortex-a73, cortex-a75, cortex-a76, cortex-a76ae, cortex-a77, cortex-a78, cortex-a78ae, cortex-a78c, cortex-a710, cortex-a715, cortex-a720, cortex-a720ae, cortex-r82, cortex-x1, cortex-x1c, cortex-x2, cortex-x3, cortex-x4, neoverse-e1, neoverse-n1, neoverse-n2, neoverse-n3, neoverse-512tvb, neoverse-v1, neoverse-v2, neoverse-v3, neoverse-v3ae, cyclone, apple-a7, apple-a8, apple-a9, apple-a10, apple-a11, apple-a12, apple-a13, apple-a14, apple-a15, apple-a16, apple-a17, apple-m1, apple-m2, apple-m3, apple-s4, apple-s5, exynos-m3, exynos-m4, exynos-m5, falkor, saphira, kryo, thunderx2t99, thunderx3t110, thunderx, thunderxt88, thunderxt81, thunderxt83, tsv110, a64fx, carmel, ampere1, ampere1a, ampere1b, cobalt-100, grace{{$}}
+// TUNE_AARCH64-NEXT: note: valid target CPU values are: cortex-a34, cortex-a35, cortex-a53, cortex-a55, cortex-a510, cortex-a520, cortex-a520ae, cortex-a57, cortex-a65, cortex-a65ae, cortex-a72, cortex-a73, cortex-a75, cortex-a76, cortex-a76ae, cortex-a77, cortex-a78, cortex-a78ae, cortex-a78c, cortex-a710, cortex-a715, cortex-a720, cortex-a720ae, cortex-r82, cortex-r82ae, cortex-x1, cortex-x1c, cortex-x2, cortex-x3, cortex-x4, neoverse-e1, neoverse-n1, neoverse-n2, neoverse-n3, neoverse-512tvb, neoverse-v1, neoverse-v2, neoverse-v3, neoverse-v3ae, cyclone, apple-a7, apple-a8, apple-a9, apple-a10, apple-a11, apple-a12, apple-a13, apple-a14, apple-a15, apple-a16, apple-a17, apple-m1, apple-m2, apple-m3, apple-s4, apple-s5, exynos-m3, exynos-m4, exynos-m5, falkor, saphira, kryo, thunderx2t99, thunderx3t110, thunderx, thunderxt88, thunderxt81, thunderxt83, tsv110, a64fx, carmel, ampere1, ampere1a, ampere1b, cobalt-100, grace{{$}}
 
 // RUN: not %clang_cc1 -triple i386--- -target-cpu not-a-cpu -fsyntax-only %s 2>&1 | FileCheck %s --check-prefix X86
 // X86: error: unknown target CPU 'not-a-cpu'
diff --git a/llvm/docs/ReleaseNotes.rst b/llvm/docs/ReleaseNotes.rst
index 64a69832521290..4c07abb744a238 100644
--- a/llvm/docs/ReleaseNotes.rst
+++ b/llvm/docs/ReleaseNotes.rst
@@ -70,7 +70,7 @@ Changes to the AArch64 Backend
 ------------------------------
 
 * Added support for Cortex-A78AE, Cortex-A520AE, Cortex-A720AE,
-  Neoverse-N3, Neoverse-V3 and Neoverse-V3AE CPUs.
+  Cortex-R82AE, Neoverse-N3, Neoverse-V3 and Neoverse-V3AE CPUs.
 
 Changes to the AMDGPU Backend
 -----------------------------
diff --git a/llvm/include/llvm/TargetParser/AArch64TargetParser.h b/llvm/include/llvm/TargetParser/AArch64TargetParser.h
index 0d1cfd152151aa..c3d033e1659139 100644
--- a/llvm/include/llvm/TargetParser/AArch64TargetParser.h
+++ b/llvm/include/llvm/TargetParser/AArch64TargetParser.h
@@ -632,7 +632,18 @@ inline constexpr CpuInfo CpuInfos[] = {
                                AArch64::AEK_PAUTH, AArch64::AEK_SVE2BITPERM,
                                AArch64::AEK_FLAGM, AArch64::AEK_PERFMON,
                                AArch64::AEK_PREDRES, AArch64::AEK_PROFILE})},
-    {"cortex-r82", ARMV8R, AArch64::ExtensionBitset({AArch64::AEK_LSE})},
+    {"cortex-r82", ARMV8R,
+     AArch64::ExtensionBitset({AArch64::AEK_CRC, AArch64::AEK_DOTPROD,
+                               AArch64::AEK_FLAGM, AArch64::AEK_FP,
+                               AArch64::AEK_RCPC, AArch64::AEK_PAUTH,
+                               AArch64::AEK_PERFMON, AArch64::AEK_RAS,
+                               AArch64::AEK_RDM, AArch64::AEK_PREDRES})},
+    {"cortex-r82ae", ARMV8R,
+     AArch64::ExtensionBitset({AArch64::AEK_CRC, AArch64::AEK_DOTPROD,
+                               AArch64::AEK_FLAGM, AArch64::AEK_FP,
+                               AArch64::AEK_RCPC, AArch64::AEK_PAUTH,
+                               AArch64::AEK_PERFMON, AArch64::AEK_RAS,
+                               AArch64::AEK_RDM, AArch64::AEK_PREDRES})},
     {"cortex-x1", ARMV8_2A,
      AArch64::ExtensionBitset({AArch64::AEK_AES, AArch64::AEK_SHA2,
                                AArch64::AEK_FP16, AArch64::AEK_DOTPROD,
diff --git a/llvm/lib/Target/AArch64/AArch64Processors.td b/llvm/lib/Target/AArch64/AArch64Processors.td
index 8772e51bf0ab42..f2286ae17dba56 100644
--- a/llvm/lib/Target/AArch64/AArch64Processors.td
+++ b/llvm/lib/Target/AArch64/AArch64Processors.td
@@ -194,6 +194,11 @@ def TuneR82 : SubtargetFeature<"cortex-r82", "ARMProcFamily",
                                "Cortex-R82 ARM processors", [
                                FeaturePostRAScheduler]>;
 
+def TuneR82AE : SubtargetFeature<"cortex-r82ae", "ARMProcFamily",
+                                 "CortexR82AE",
+                                 "Cortex-R82-AE ARM processors",
+                                 [FeaturePostRAScheduler]>;
+
 def TuneX1 : SubtargetFeature<"cortex-x1", "ARMProcFamily", "CortexX1",
                                   "Cortex-X1 ARM processors", [
                                   FeatureCmpBccFusion,
@@ -667,7 +672,13 @@ def ProcessorFeatures {
   list<SubtargetFeature> R82  = [HasV8_0rOps, FeaturePerfMon, FeatureFullFP16,
                                  FeatureFP16FML, FeatureSSBS, FeaturePredRes,
                                  FeatureSB, FeatureRDM, FeatureDotProd,
-                                 FeatureComplxNum, FeatureJS];
+                                 FeatureComplxNum, FeatureJS,
+                                 FeatureCacheDeepPersist];
+  list<SubtargetFeature> R82AE = [HasV8_0rOps, FeaturePerfMon, FeatureFullFP16,
+                                  FeatureFP16FML, FeatureSSBS, FeaturePredRes,
+                                  FeatureSB, FeatureRDM, FeatureDotProd,
+                                  FeatureComplxNum, FeatureJS,
+                                  FeatureCacheDeepPersist];
   list<SubtargetFeature> X1   = [HasV8_2aOps, FeatureCrypto, FeatureFPARMv8,
                                  FeatureNEON, FeatureRCPC, FeaturePerfMon,
                                  FeatureSPE, FeatureFullFP16, FeatureDotProd,
@@ -854,6 +865,8 @@ def : ProcessorModel<"cortex-a720ae", NeoverseN2Model, ProcessorFeatures.A720AE,
                      [TuneA720AE]>;
 def : ProcessorModel<"cortex-r82", CortexA55Model, ProcessorFeatures.R82,
                      [TuneR82]>;
+def : ProcessorModel<"cortex-r82ae", CortexA55Model, ProcessorFeatures.R82AE,
+                     [TuneR82AE]>;
 def : ProcessorModel<"cortex-x1", CortexA57Model, ProcessorFeatures.X1,
                      [TuneX1]>;
 def : ProcessorModel<"cortex-x1c", CortexA57Model, ProcessorFeatures.X1C,
diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
index 7d34dd1c776878..747d82639a9b4f 100644
--- a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
+++ b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
@@ -143,6 +143,7 @@ void AArch64Subtarget::initializeProperties(bool HasMinSize) {
   case CortexA78AE:
   case CortexA78C:
   case CortexR82:
+  case CortexR82AE:
   case CortexX1:
     PrefFunctionAlignment = Align(16);
     PrefLoopAlignment = Align(32);
diff --git a/llvm/lib/TargetParser/Host.cpp b/llvm/lib/TargetParser/Host.cpp
index 0a93b06f40c248..8823ae370ed2b0 100644
--- a/llvm/lib/TargetParser/Host.cpp
+++ b/llvm/lib/TargetParser/Host.cpp
@@ -214,6 +214,7 @@ StringRef sys::detail::getHostCPUNameForARM(StringRef ProcCpuinfoContent) {
         .Case("0xc18", "cortex-r8")
         .Case("0xd13", "cortex-r52")
         .Case("0xd15", "cortex-r82")
+        .Case("0xd14", "cortex-r82ae")
         .Case("0xd02", "cortex-a34")
         .Case("0xd04", "cortex-a35")
         .Case("0xd03", "cortex-a53")
diff --git a/llvm/unittests/TargetParser/TargetParserTest.cpp b/llvm/unittests/TargetParser/TargetParserTest.cpp
index 75e235008b4f25..816aea44a9bc51 100644
--- a/llvm/unittests/TargetParser/TargetParserTest.cpp
+++ b/llvm/unittests/TargetParser/TargetParserTest.cpp
@@ -1390,7 +1390,20 @@ INSTANTIATE_TEST_SUITE_P(
                  AArch64::AEK_DOTPROD, AArch64::AEK_FP, AArch64::AEK_SIMD,
                  AArch64::AEK_FP16, AArch64::AEK_FP16FML, AArch64::AEK_RAS,
                  AArch64::AEK_RCPC, AArch64::AEK_LSE, AArch64::AEK_SB,
-                 AArch64::AEK_JSCVT, AArch64::AEK_FCMA, AArch64::AEK_PAUTH}),
+                 AArch64::AEK_JSCVT, AArch64::AEK_FCMA, AArch64::AEK_PAUTH,
+                 AArch64::AEK_FLAGM, AArch64::AEK_PERFMON,
+                 AArch64::AEK_PREDRES}),
+            "8-R"),
+        ARMCPUTestParams<AArch64::ExtensionBitset>(
+            "cortex-r82ae", "armv8-r", "crypto-neon-fp-armv8",
+            AArch64::ExtensionBitset(
+                {AArch64::AEK_CRC, AArch64::AEK_RDM, AArch64::AEK_SSBS,
+                 AArch64::AEK_DOTPROD, AArch64::AEK_FP, AArch64::AEK_SIMD,
+                 AArch64::AEK_FP16, AArch64::AEK_FP16FML, AArch64::AEK_RAS,
+                 AArch64::AEK_RCPC, AArch64::AEK_LSE, AArch64::AEK_SB,
+                 AArch64::AEK_JSCVT, AArch64::AEK_FCMA, AArch64::AEK_PAUTH,
+                 AArch64::AEK_FLAGM, AArch64::AEK_PERFMON,
+                 AArch64::AEK_PREDRES}),
             "8-R"),
         ARMCPUTestParams<AArch64::ExtensionBitset>(
             "cortex-x1", "armv8.2-a", "crypto-neon-fp-armv8",
@@ -1806,7 +1819,7 @@ INSTANTIATE_TEST_SUITE_P(
     ARMCPUTestParams<AArch64::ExtensionBitset>::PrintToStringParamName);
 
 // Note: number of CPUs includes aliases.
-static constexpr unsigned NumAArch64CPUArchs = 75;
+static constexpr unsigned NumAArch64CPUArchs = 76;
 
 TEST(TargetParserTest, testAArch64CPUArchList) {
   SmallVector<StringRef, NumAArch64CPUArchs> List;
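
The `Host.cpp` hunk above adds one more part-number case to host CPU detection. A minimal standalone sketch of that kind of lookup follows, assuming the caller has already extracted the "CPU part" value and checked the implementer. `cpuNameForArmPart` is a hypothetical helper and not the LLVM function, and the `"generic"` fallback is an assumption of this sketch. Only the three Cortex-R part numbers visible in the diff are included.

```cpp
#include <string_view>

// Hypothetical helper: map an Arm "CPU part" value to a CPU name.
// The table holds only the Cortex-R entries shown in the diff.
inline std::string_view cpuNameForArmPart(std::string_view Part) {
  struct Entry {
    std::string_view Id;   // Part number as a hex string.
    std::string_view Name; // CPU name as accepted by -mcpu.
  };
  static constexpr Entry Table[] = {
      {"0xd13", "cortex-r52"},
      {"0xd15", "cortex-r82"},
      {"0xd14", "cortex-r82ae"}, // Entry added by this patch.
  };
  for (const Entry &E : Table)
    if (E.Id == Part)
      return E.Name;
  return "generic"; // Assumed fallback for unknown part numbers.
}
```

A linear scan is enough here because the table is tiny; the real code uses a chained string switch, which expresses the same mapping.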

@llvmbot (Member) commented Apr 29, 2024

@llvm/pr-subscribers-clang

Author: Jonathan Thackray (jthackray)

         .Case("0xd13", "cortex-r52")
         .Case("0xd15", "cortex-r82")
+        .Case("0xd14", "cortex-r82ae")
         .Case("0xd02", "cortex-a34")
         .Case("0xd04", "cortex-a35")
         .Case("0xd03", "cortex-a53")
diff --git a/llvm/unittests/TargetParser/TargetParserTest.cpp b/llvm/unittests/TargetParser/TargetParserTest.cpp
index 75e235008b4f25..816aea44a9bc51 100644
--- a/llvm/unittests/TargetParser/TargetParserTest.cpp
+++ b/llvm/unittests/TargetParser/TargetParserTest.cpp
@@ -1390,7 +1390,20 @@ INSTANTIATE_TEST_SUITE_P(
                  AArch64::AEK_DOTPROD, AArch64::AEK_FP, AArch64::AEK_SIMD,
                  AArch64::AEK_FP16, AArch64::AEK_FP16FML, AArch64::AEK_RAS,
                  AArch64::AEK_RCPC, AArch64::AEK_LSE, AArch64::AEK_SB,
-                 AArch64::AEK_JSCVT, AArch64::AEK_FCMA, AArch64::AEK_PAUTH}),
+                 AArch64::AEK_JSCVT, AArch64::AEK_FCMA, AArch64::AEK_PAUTH,
+                 AArch64::AEK_FLAGM, AArch64::AEK_PERFMON,
+                 AArch64::AEK_PREDRES}),
+            "8-R"),
+        ARMCPUTestParams<AArch64::ExtensionBitset>(
+            "cortex-r82ae", "armv8-r", "crypto-neon-fp-armv8",
+            AArch64::ExtensionBitset(
+                {AArch64::AEK_CRC, AArch64::AEK_RDM, AArch64::AEK_SSBS,
+                 AArch64::AEK_DOTPROD, AArch64::AEK_FP, AArch64::AEK_SIMD,
+                 AArch64::AEK_FP16, AArch64::AEK_FP16FML, AArch64::AEK_RAS,
+                 AArch64::AEK_RCPC, AArch64::AEK_LSE, AArch64::AEK_SB,
+                 AArch64::AEK_JSCVT, AArch64::AEK_FCMA, AArch64::AEK_PAUTH,
+                 AArch64::AEK_FLAGM, AArch64::AEK_PERFMON,
+                 AArch64::AEK_PREDRES}),
             "8-R"),
         ARMCPUTestParams<AArch64::ExtensionBitset>(
             "cortex-x1", "armv8.2-a", "crypto-neon-fp-armv8",
@@ -1806,7 +1819,7 @@ INSTANTIATE_TEST_SUITE_P(
     ARMCPUTestParams<AArch64::ExtensionBitset>::PrintToStringParamName);
 
 // Note: number of CPUs includes aliases.
-static constexpr unsigned NumAArch64CPUArchs = 75;
+static constexpr unsigned NumAArch64CPUArchs = 76;
 
 TEST(TargetParserTest, testAArch64CPUArchList) {
   SmallVector<StringRef, NumAArch64CPUArchs> List;
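
Note on the Host.cpp hunk above: the new `.Case("0xd14", "cortex-r82ae")` entry maps the "CPU part" value read from /proc/cpuinfo to an LLVM CPU name. The following is a minimal standalone sketch of that lookup, not the LLVM implementation. The names `kArmParts` and `cpuNameForPart` are hypothetical, only the three R-profile entries visible in the hunk are reproduced, and the "generic" fallback is an assumption about the default case.

```cpp
#include <array>
#include <cassert>
#include <string_view>
#include <utility>

// Hypothetical stand-in for the StringSwitch in getHostCPUNameForARM.
// Only the entries visible in the hunk above are listed here.
constexpr std::array<std::pair<std::string_view, std::string_view>, 3>
    kArmParts{{
        {"0xd13", "cortex-r52"},
        {"0xd15", "cortex-r82"},
        {"0xd14", "cortex-r82ae"}, // entry added by this PR
    }};

// Returns the CPU name for a "CPU part" value.
// Falls back to "generic" for unknown parts (assumed default).
constexpr std::string_view cpuNameForPart(std::string_view part) {
  for (const auto &[id, name] : kArmParts)
    if (id == part)
      return name;
  return "generic";
}

// Compile-time checks of the mapping shown in the diff.
static_assert(cpuNameForPart("0xd14") == "cortex-r82ae");
static_assert(cpuNameForPart("0xd15") == "cortex-r82");
```

The table is searched linearly, which is enough for a sketch; the real code chains `.Case(...)` calls on the part string instead.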

@jthackray jthackray reopened this Apr 29, 2024

@davemgreen davemgreen left a comment

Thanks. LGTM

@jthackray jthackray merged commit e50a857 into llvm:main Apr 30, 2024
4 of 5 checks passed