Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Clang] Bring initFeatureMap back to AArch64TargetInfo. #96832

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

weiweichen
Copy link
Contributor

We noticed that TargetInfo::CreateTargetInfo is giving up back empty opt->featureMap for AArch64, which is probably related to this change.

Trying to bring the info back (if possible).

@llvmbot llvmbot added clang Clang issues not falling into any other category backend:AArch64 clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Jun 26, 2024
@llvmbot
Copy link
Member

llvmbot commented Jun 26, 2024

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-backend-aarch64

Author: weiwei chen (weiweichen)

Changes

We noticed that TargetInfo::CreateTargetInfo is giving up back empty opt->featureMap for AArch64, which is probably related to this change.

Trying to bring the info back (if possible).


Full diff: https://github.com/llvm/llvm-project/pull/96832.diff

2 Files Affected:

  • (modified) clang/lib/Basic/Targets/AArch64.cpp (+18)
  • (modified) clang/lib/Basic/Targets/AArch64.h (+5)
diff --git a/clang/lib/Basic/Targets/AArch64.cpp b/clang/lib/Basic/Targets/AArch64.cpp
index 2692ddec26ff4..efc5c7e6e931a 100644
--- a/clang/lib/Basic/Targets/AArch64.cpp
+++ b/clang/lib/Basic/Targets/AArch64.cpp
@@ -1170,6 +1170,24 @@ ParsedTargetAttr AArch64TargetInfo::parseTargetAttr(StringRef Features) const {
   return Ret;
 }
 
+bool AArch64TargetInfo::initFeatureMap(
+     llvm::StringMap<bool> &Features, DiagnosticsEngine &Diags, StringRef CPU,
+     const std::vector<std::string> &FeaturesVec) const {
+   std::vector<std::string> UpdatedFeaturesVec;
+   // Parse the CPU and add any implied features.
+   std::optional<llvm::AArch64::CpuInfo> CpuInfo = llvm::AArch64::parseCpu(CPU);
+   if (CpuInfo) {
+     auto Exts = CpuInfo->getImpliedExtensions();
+     std::vector<StringRef> CPUFeats;
+     llvm::AArch64::getExtensionFeatures(Exts, CPUFeats);
+     for (auto F : CPUFeats) {
+       assert((F[0] == '+' || F[0] == '-') && "Expected +/- in target feature!");
+       UpdatedFeaturesVec.push_back(F.str());
+     }
+   }
+   return TargetInfo::initFeatureMap(Features, Diags, CPU, UpdatedFeaturesVec);
+}
+
 bool AArch64TargetInfo::hasBFloat16Type() const {
   return true;
 }
diff --git a/clang/lib/Basic/Targets/AArch64.h b/clang/lib/Basic/Targets/AArch64.h
index 71510fe289510..1af80db0de05c 100644
--- a/clang/lib/Basic/Targets/AArch64.h
+++ b/clang/lib/Basic/Targets/AArch64.h
@@ -107,6 +107,11 @@ class LLVM_LIBRARY_VISIBILITY AArch64TargetInfo : public TargetInfo {
   unsigned multiVersionSortPriority(StringRef Name) const override;
   unsigned multiVersionFeatureCost() const override;
 
+  bool
+  initFeatureMap(llvm::StringMap<bool> &Features, DiagnosticsEngine &Diags,
+                  StringRef CPU,
+                  const std::vector<std::string> &FeaturesVec) const override;
+
   bool useFP16ConversionIntrinsics() const override {
     return false;
   }

@weiweichen weiweichen force-pushed the weiweic/aarch64-init-feature-map branch from 6f266f1 to e04415c Compare June 26, 2024 23:38
@weiweichen
Copy link
Contributor Author

@willghatch 👀

@weiweichen
Copy link
Contributor Author

Not sure if this is the right fix, so any clang experts here, please take a look to see if this is acceptable?

@willghatch
Copy link
Contributor

@labrinea please take a look. The PR mentioned above removed this code and broke AArch64 feature detection for us.

@davemgreen
Copy link
Collaborator

Could you explain more about what broke? Are you using target(..) attributes?

@tmatheson-arm
Copy link
Contributor

And please add a test to cover whatever broke.

@weiweichen
Copy link
Contributor Author

Could you explain more about what broke? Are you using target(..) attributes?

TargetInfo::CreateTargetInfo is giving up back empty opt->featureMap for AArch64. But before the PR mentioned, we can get a list of features. We can also get a list of feature for other targets like X86, etc. Is there a different way to get them back for AArch64 if this change is not desirable?

@tmatheson-arm
Copy link
Contributor

The only thing AArch64TargetInfo::initFeatureMap adds is features from the selected CPU. IMHO this was not an appropriate place to be doing that. Since #94279 the CPU features are added either by AArch64TargetInfo::parseTargetAttr when dealing with __attribute(target(...)), and by getAArch64ArchFeaturesFromMcpu if you are in the clang driver dealing with -mcpu. Basically anywhere you see AArch64::ExtensionSet::addCPUDefaults used. This approach properly handles feature dependencies, which the old AArch64TargetInfo::initFeatureMap approach did not.

What inputs are you giving when you create the TargetInfo? What features do you expect to see that are missing? A concrete example would help to understand.

@weiweichen
Copy link
Contributor Author

weiweichen commented Jun 27, 2024

The only thing AArch64TargetInfo::initFeatureMap adds is features from the selected CPU. IMHO this was not an appropriate place to be doing that. Since #94279 the CPU features are added either by AArch64TargetInfo::parseTargetAttr when dealing with __attribute(target(...)), and by getAArch64ArchFeaturesFromMcpu if you are in the clang driver dealing with -mcpu. Basically anywhere you see AArch64::ExtensionSet::addCPUDefaults used. This approach properly handles feature dependencies, which the old AArch64TargetInfo::initFeatureMap approach did not.

What inputs are you giving when you create the TargetInfo? What features do you expect to see that are missing? A concrete example would help to understand.

We are using llvm::sys::getDefaultTargetTriple() and llvm::sys::getHostCPUName() to get TargetInfo, and was getting 👇 features before the aarch64 PR landed. (We still get features correctly for other backends).

+aes +bf16 +complxnum +crc +dotprod +fp-armv8 +fp16fml +fullfp16 +i8mm +jsconv +lse +neon +pauth +rand +ras +rcpc +rdm +sha2 +sha3 +sm4 +spe +ssbs +sve

Is this still possible to get for AArch64? Is there a different API we can use instead?

@tmatheson-arm
Copy link
Contributor

I think a test demonstrating the problem would be the fastest way forward.

@weiweichen
Copy link
Contributor Author

Here is a self contained simple repro test

cpp file:

#include "clang/Basic/Diagnostic.h"
#include "clang/Basic/DiagnosticIDs.h"
#include "clang/Basic/DiagnosticOptions.h"
#include "clang/Basic/TargetInfo.h"
#include "llvm/ADT/SmallString.h"
#include "llvm/TargetParser/Host.h"

using namespace llvm;

// Intercept diagnostics from Clang and then bundle them up in an `Error` if
// something bad happens.
struct DiagInterceptor : public clang::DiagnosticConsumer {
  void HandleDiagnostic(clang::DiagnosticsEngine::Level level,
                        const clang::Diagnostic &info) override {
    if (level >= clang::DiagnosticsEngine::Level::Error) {
      // Keep the last message.
      msg.clear();
      info.FormatDiagnostic(msg);
    }
  }

  SmallString<64> msg;
};

int main() {
      // Instantiate the Clang diagnostic engine. Pass in our interceptor.
  clang::IntrusiveRefCntPtr<clang::DiagnosticIDs> ids(
      new clang::DiagnosticIDs());
  clang::IntrusiveRefCntPtr<clang::DiagnosticOptions> diagOpts(
      new clang::DiagnosticOptions());
  DiagInterceptor interceptor;
  clang::DiagnosticsEngine diags(std::move(ids), std::move(diagOpts),
                                 &interceptor, /*ShouldOwnClient=*/false);

  std::string triple = llvm::sys::getDefaultTargetTriple();
  std::string cpu(llvm::sys::getHostCPUName());

  auto opts = std::make_shared<clang::TargetOptions>();
  opts->Triple = triple;
  opts->CPU = cpu;

  // Ask Clang to create the target info for the architecture and CPU. This will
  // populate `opts` with the full target triple and feature set.
  auto targetInfo = std::unique_ptr<clang::TargetInfo>(
      clang::TargetInfo::CreateTargetInfo(diags, opts));

  llvm::dbgs() << cpu << "\n";
  llvm::dbgs() << triple << "\n";
  
  for (StringRef feature : opts->Features) {
    llvm::dbgs() << feature << "\n";
  }

  return 0;
}

CMakeLists.txt

cmake_minimum_required(VERSION 3.20.0)
project(TestClangAPI)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED YES)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fno-exceptions -fno-rtti")

find_package(LLVM REQUIRED CONFIG)
find_package(Clang REQUIRED CONFIG)

message(STATUS "Found LLVM ${LLVM_PACKAGE_VERSION}")
message(STATUS "Using LLVMConfig.cmake in: ${LLVM_DIR}")

# Set your project compile flags.
# E.g. if using the C++ header files
# you will need to enable C++11 support
# for your compiler.

include_directories(${LLVM_INCLUDE_DIRS})
include_directories(${CLANG_INCLUDE_DIRS})
message(STATUS "llvm include dires: ${LLVM_INCLUDE_DIRS}")
message(STATUS "clang include dires: ${CLANG_INCLUDE_DIRS}")
separate_arguments(LLVM_DEFINITIONS_LIST NATIVE_COMMAND ${LLVM_DEFINITIONS})
add_definitions(${LLVM_DEFINITIONS_LIST})

# Now build our tools
add_executable(test test.cpp)


# Link against LLVM libraries
target_link_libraries(test LLVMTargetParser LLVMTarget LLVMSupport clangBasic)

To build the test (given that LLVM and Clang can be found correctly):

mkdir -p build
cd build
cmake -G Ninja .. --fresh
ninja test
./test

Before this change, we get (non empty features):

neoverse-v1
aarch64-unknown-linux-gnu
+aes
+bf16
+complxnum
+crc
+dotprod
+fp-armv8
+fp16fml
+fullfp16
+i8mm
+jsconv
+lse
+neon
+pauth
+rand
+ras
+rcpc
+rdm
+sha2
+sha3
+sm4
+spe
+ssbs
+sve

With current top of upstream, we get (empty features):

neoverse-v1
aarch64-unknown-linux-gnu

And the same test can be run an x86 machine, and we get (non empty features with top of upstream):

skylake-avx512
x86_64-unknown-linux-gnu
+adx
+aes
+avx
+avx2
+avx512bw
+avx512cd
+avx512dq
+avx512f
+avx512vl
+bmi
+bmi2
+clflushopt
+clwb
+cmov
+crc32
+cx16
+cx8
+evex512
+f16c
+fma
+fsgsbase
+fxsr
+invpcid
+lzcnt
+mmx
+movbe
+pclmul
+pku
+popcnt
+prfchw
+rdrnd
+rdseed
+sahf
+sse
+sse2
+sse3
+sse4.1
+sse4.2
+ssse3
+x87
+xsave
+xsavec
+xsaveopt

This looks like the upstream change, while working good for Clang internally, it has broken the public clang API clang::TargetInfo::CreateTargetInfo which makes it inconsistent between different backends 😢 .

As a downstream consumer of upstream, we are broken due to the change. However, we are able to workaround it by adding a wrapper to pull the old logic back for AArch64 backend on our side, but it probably would be much better if upstream can be fixed for AArch64.

Please let me know if I can provide more info on reproduce the issue.

@tmatheson-arm
Copy link
Contributor

Thank you for the example, I understand what is happening how.

  • Before [AArch64] Decouple feature dependency expansion. #94279, we used to add CPU features in AArch64::initFeatureMap.
  • In [AArch64] Decouple feature dependency expansion. #94279, we decided that actually you should do that in the Driver, which should put all -target-features it wants on the -cc1 command line. The Driver is responsible for expanding the CPU (and architecture) feature dependencies and their interaction with any modifiers on the command line.
  • This means that when initFeatureMap runs in clang, FeaturesAsWritten is populated with the CPU features and is used to initialise the FeatureMap.
  • In contrast, you are not using the Driver, and do not populate FeaturesAsWritten with the CPU features.
  • Instead, you expect initFeatureMap to add CPU features. This is not unreasonable, given that the CPU is passed the function and several other backends add CPU features at this stage.

This bit of code used to crudely add the CPU features to the end of the feature list. However there are some problems with that approach, which we attempted to rectify in #94279:

  • CPU features that were explicitly disabled on the command line could actually end up enabled in the backend
  • The architecture features (i.e. implied by -march) were not treated the same way as the CPU features (-mcpu)

For example, if you wrote: clang -mcpu=cortex-a75+norcpc -###, you would see all the Cortex-A75 features expanded on the -cc1 command line, but with RCPC disabled: -target-feature -rcpc. But in this case, AArch64::initFeatureMap would have re-added +rcpc, overriding the command line. (This is technically not the case after this line was added, but the general point is that initFeatureMap broke feature dependency resolution in ways that are difficult to reason about).

There doesn't seem to be a way to specify an architecture in TargetOptions, which looks odd to me. That means there is no way to select e.g. armv9.4-a in your example, except by manually adding the features in TargetOptions::Features or TargetOptions::FeaturesAsWritten.

So the way that we set up the AArch64 backend in #94279 is to require you to calculate your feature set up front, which are then trivially passed through by the default TargetInfo::initFeatureMap.

I'm not sure there is a clear answer on this one. I can't see a way to easily let AArch64:: initFeatureMap add CPU features again without breaking the dependency resolution. I am open to suggestions though.

If you wanted to go the route of building the feature list before calling initFeatureMap, the functions tools::getTargetFeatures and aarch64::getAArch64TargetFeatures can do that for you. Currently they require a const Driver &D, but fundamentally I think they just need a DiagnosticsEngine& so that could be changed.

I'm open to other suggestions too.

@weiweichen
Copy link
Contributor Author

Thank you for the example, I understand what is happening how.

  • Before [AArch64] Decouple feature dependency expansion. #94279, we used to add CPU features in AArch64::initFeatureMap.
  • In [AArch64] Decouple feature dependency expansion. #94279, we decided that actually you should do that in the Driver, which should put all -target-features it wants on the -cc1 command line. The Driver is responsible for expanding the CPU (and architecture) feature dependencies and their interaction with any modifiers on the command line.
  • This means that when initFeatureMap runs in clang, FeaturesAsWritten is populated with the CPU features and is used to initialise the FeatureMap.
  • In contrast, you are not using the Driver, and do not populate FeaturesAsWritten with the CPU features.
  • Instead, you expect initFeatureMap to add CPU features. This is not unreasonable, given that the CPU is passed the function and several other backends add CPU features at this stage.

This bit of code used to crudely add the CPU features to the end of the feature list. However there are some problems with that approach, which we attempted to rectify in #94279:

  • CPU features that were explicitly disabled on the command line could actually end up enabled in the backend
  • The architecture features (i.e. implied by -march) were not treated the same way as the CPU features (-mcpu)

For example, if you wrote: clang -mcpu=cortex-a75+norcpc -###, you would see all the Cortex-A75 features expanded on the -cc1 command line, but with RCPC disabled: -target-feature -rcpc. But in this case, AArch64::initFeatureMap would have re-added +rcpc, overriding the command line. (This is technically not the case after this line was added, but the general point is that initFeatureMap broke feature dependency resolution in ways that are difficult to reason about).

There doesn't seem to be a way to specify an architecture in TargetOptions, which looks odd to me. That means there is no way to select e.g. armv9.4-a in your example, except by manually adding the features in TargetOptions::Features or TargetOptions::FeaturesAsWritten.

So the way that we set up the AArch64 backend in #94279 is to require you to calculate your feature set up front, which are then trivially passed through by the default TargetInfo::initFeatureMap.

I'm not sure there is a clear answer on this one. I can't see a way to easily let AArch64:: initFeatureMap add CPU features again without breaking the dependency resolution. I am open to suggestions though.

If you wanted to go the route of building the feature list before calling initFeatureMap, the functions tools::getTargetFeatures and aarch64::getAArch64TargetFeatures can do that for you. Currently they require a const Driver &D, but fundamentally I think they just need a DiagnosticsEngine& so that could be changed.

I'm open to other suggestions too.

Oh, thank you so much for the explanation and suggestions on the APIs to try! We'll play with the APIs to see if we can get a cleaner workaround on our side. On the other hand, as for the clang API's perspective, maybe it's ok that different backends have different behavior in terms how they retrieve target features or is there room to improve for consistency as well?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AArch64 clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants