-
Notifications
You must be signed in to change notification settings - Fork 12k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GlobalOpt] Cache whether CC is changeable #71381
Conversation
The hasAddressTaken() call in hasOnlyColdCalls() has quadratic complexity if there are many cold calls to a function: We're going to visit each use of the function, and then for each of them iterate all the users. We've recently encounted a case where GlobalOpt spends more than an hour in these hasAddressTaken() checks when full LTO is used. Avoid this by moving the hasAddressTaken() check into hasChangeableCC() and caching its result, so it is only computed one per function.
@llvm/pr-subscribers-llvm-transforms Author: Nikita Popov (nikic) ChangesThe hasAddressTaken() call in hasOnlyColdCalls() has quadratic complexity if there are many cold calls to a function: We're going to visit each call of the function, and then for each of them iterate all the users of the function. We've recently encountered a case where GlobalOpt spends more than an hour in these hasAddressTaken() checks when full LTO is used. Avoid this by moving the hasAddressTaken() check into hasChangeableCC() and caching its result, so it is only computed once per function. Full diff: https://github.com/llvm/llvm-project/pull/71381.diff 1 Files Affected:
diff --git a/llvm/lib/Transforms/IPO/GlobalOpt.cpp b/llvm/lib/Transforms/IPO/GlobalOpt.cpp
index 56a52d13c20d0cc..755ecf9d5cb713b 100644
--- a/llvm/lib/Transforms/IPO/GlobalOpt.cpp
+++ b/llvm/lib/Transforms/IPO/GlobalOpt.cpp
@@ -1680,13 +1680,16 @@ static void RemoveAttribute(Function *F, Attribute::AttrKind A) {
/// idea here is that we don't want to mess with the convention if the user
/// explicitly requested something with performance implications like coldcc,
/// GHC, or anyregcc.
-static bool hasChangeableCC(Function *F) {
+static bool hasChangeableCCImpl(Function *F) {
CallingConv::ID CC = F->getCallingConv();
// FIXME: Is it worth transforming x86_stdcallcc and x86_fastcallcc?
if (CC != CallingConv::C && CC != CallingConv::X86_ThisCall)
return false;
+ if (F->isVarArg())
+ return false;
+
// FIXME: Change CC for the whole chain of musttail calls when possible.
//
// Can't change CC of the function that either has musttail calls, or is a
@@ -1706,7 +1709,16 @@ static bool hasChangeableCC(Function *F) {
if (BB.getTerminatingMustTailCall())
return false;
- return true;
+ return !F->hasAddressTaken();
+}
+
+using ChangeableCCCacheTy = SmallDenseMap<Function *, bool, 8>;
+static bool hasChangeableCC(Function *F,
+ ChangeableCCCacheTy &ChangeableCCCache) {
+ auto Res = ChangeableCCCache.try_emplace(F, false);
+ if (Res.second)
+ Res.first->second = hasChangeableCCImpl(F);
+ return Res.first->second;
}
/// Return true if the block containing the call site has a BlockFrequency of
@@ -1760,7 +1772,8 @@ static void changeCallSitesToColdCC(Function *F) {
// coldcc calling convention.
static bool
hasOnlyColdCalls(Function &F,
- function_ref<BlockFrequencyInfo &(Function &)> GetBFI) {
+ function_ref<BlockFrequencyInfo &(Function &)> GetBFI,
+ ChangeableCCCacheTy &ChangeableCCCache) {
for (BasicBlock &BB : F) {
for (Instruction &I : BB) {
if (CallInst *CI = dyn_cast<CallInst>(&I)) {
@@ -1779,8 +1792,7 @@ hasOnlyColdCalls(Function &F,
if (!CalledFn->hasLocalLinkage())
return false;
// Check if it's valid to use coldcc calling convention.
- if (!hasChangeableCC(CalledFn) || CalledFn->isVarArg() ||
- CalledFn->hasAddressTaken())
+ if (!hasChangeableCC(CalledFn, ChangeableCCCache))
return false;
BlockFrequencyInfo &CallerBFI = GetBFI(F);
if (!isColdCallSite(*CI, CallerBFI))
@@ -1905,9 +1917,10 @@ OptimizeFunctions(Module &M,
bool Changed = false;
+ ChangeableCCCacheTy ChangeableCCCache;
std::vector<Function *> AllCallsCold;
for (Function &F : llvm::make_early_inc_range(M))
- if (hasOnlyColdCalls(F, GetBFI))
+ if (hasOnlyColdCalls(F, GetBFI, ChangeableCCCache))
AllCallsCold.push_back(&F);
// Optimize functions.
@@ -1969,7 +1982,7 @@ OptimizeFunctions(Module &M,
continue;
}
- if (hasChangeableCC(&F) && !F.isVarArg() && !F.hasAddressTaken()) {
+ if (hasChangeableCC(&F, ChangeableCCCache)) {
NumInternalFunc++;
TargetTransformInfo &TTI = GetTTI(F);
// Change the calling convention to coldcc if either stress testing is
@@ -1979,6 +1992,7 @@ OptimizeFunctions(Module &M,
if (EnableColdCCStressTest ||
(TTI.useColdCCForColdCall(F) &&
isValidCandidateForColdCC(F, GetBFI, AllCallsCold))) {
+ ChangeableCCCache.erase(&F);
F.setCallingConv(CallingConv::Cold);
changeCallSitesToColdCC(&F);
Changed = true;
@@ -1986,7 +2000,7 @@ OptimizeFunctions(Module &M,
}
}
- if (hasChangeableCC(&F) && !F.isVarArg() && !F.hasAddressTaken()) {
+ if (hasChangeableCC(&F, ChangeableCCCache)) {
// If this function has a calling convention worth changing, is not a
// varargs function, and is only called directly, promote it to use the
// Fast calling convention.
|
The hasAddressTaken() call in hasOnlyColdCalls() has quadratic complexity if there are many cold calls to a function: We're going to visit each call of the function, and then for each of them iterate all the users of the function. We've recently encountered a case where GlobalOpt spends more than an hour in these hasAddressTaken() checks when full LTO is used. Avoid this by moving the hasAddressTaken() check into hasChangeableCC() and caching its result, so it is only computed once per function. (cherry picked from commit e360a16)
Adapted to rebase on apple/stable/20200714
The hasAddressTaken() call in hasOnlyColdCalls() has quadratic complexity if there are many cold calls to a function: We're going to visit each call of the function, and then for each of them iterate all the users of the function. We've recently encountered a case where GlobalOpt spends more than an hour in these hasAddressTaken() checks when full LTO is used. Avoid this by moving the hasAddressTaken() check into hasChangeableCC() and caching its result, so it is only computed once per function.
The hasAddressTaken() call in hasOnlyColdCalls() has quadratic complexity if there are many cold calls to a function: We're going to visit each call of the function, and then for each of them iterate all the users of the function. We've recently encountered a case where GlobalOpt spends more than an hour in these hasAddressTaken() checks when full LTO is used. Avoid this by moving the hasAddressTaken() check into hasChangeableCC() and caching its result, so it is only computed once per function. (cherry picked from commit e360a16) Adapted to LLVM11 by Ming-Chuan Lin <mingcl@google.com>
The hasAddressTaken() call in hasOnlyColdCalls() has quadratic complexity if there are many cold calls to a function: We're going to visit each call of the function, and then for each of them iterate all the users of the function. We've recently encountered a case where GlobalOpt spends more than an hour in these hasAddressTaken() checks when full LTO is used. Avoid this by moving the hasAddressTaken() check into hasChangeableCC() and caching its result, so it is only computed once per function. (cherry picked from commit e360a16)
The hasAddressTaken() call in hasOnlyColdCalls() has quadratic complexity if there are many cold calls to a function: We're going to visit each call of the function, and then for each of them iterate all the users of the function. We've recently encountered a case where GlobalOpt spends more than an hour in these hasAddressTaken() checks when full LTO is used. Avoid this by moving the hasAddressTaken() check into hasChangeableCC() and caching its result, so it is only computed once per function. (cherry picked from commit e360a16)
The hasAddressTaken() call in hasOnlyColdCalls() has quadratic complexity if there are many cold calls to a function: We're going to visit each call of the function, and then for each of them iterate all the users of the function.
We've recently encountered a case where GlobalOpt spends more than an hour in these hasAddressTaken() checks when full LTO is used.
Avoid this by moving the hasAddressTaken() check into hasChangeableCC() and caching its result, so it is only computed once per function.