[ctx_prof] Add Inlining support #106154
Conversation
This stack of pull requests is managed by Graphite.
✅ With the latest revision this PR passed the C/C++ code formatter.
Force-pushed 36e3144 to 176f656
Force-pushed bf293b4 to 5733f3b
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-pgo

Author: Mircea Trofin (mtrofin)

Changes

Patch is 21.37 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/106154.diff

9 Files Affected:
diff --git a/llvm/include/llvm/Analysis/CtxProfAnalysis.h b/llvm/include/llvm/Analysis/CtxProfAnalysis.h
index 10aef6f6067b6f..7b18f88f2ec955 100644
--- a/llvm/include/llvm/Analysis/CtxProfAnalysis.h
+++ b/llvm/include/llvm/Analysis/CtxProfAnalysis.h
@@ -63,6 +63,16 @@ class PGOContextualProfile {
return getDefinedFunctionGUID(F) != 0;
}
+ uint32_t getNrCounters(const Function &F) const {
+ assert(isFunctionKnown(F));
+ return FuncInfo.find(getDefinedFunctionGUID(F))->second.NextCounterIndex;
+ }
+
+ uint32_t getNrCallsites(const Function &F) const {
+ assert(isFunctionKnown(F));
+ return FuncInfo.find(getDefinedFunctionGUID(F))->second.NextCallsiteIndex;
+ }
+
uint32_t allocateNextCounterIndex(const Function &F) {
assert(isFunctionKnown(F));
return FuncInfo.find(getDefinedFunctionGUID(F))->second.NextCounterIndex++;
@@ -113,9 +123,7 @@ class CtxProfAnalysisPrinterPass
: public PassInfoMixin<CtxProfAnalysisPrinterPass> {
public:
enum class PrintMode { Everything, JSON };
- explicit CtxProfAnalysisPrinterPass(raw_ostream &OS,
- PrintMode Mode = PrintMode::Everything)
- : OS(OS), Mode(Mode) {}
+ explicit CtxProfAnalysisPrinterPass(raw_ostream &OS);
PreservedAnalyses run(Module &M, ModuleAnalysisManager &MAM);
static bool isRequired() { return true; }
diff --git a/llvm/include/llvm/IR/IntrinsicInst.h b/llvm/include/llvm/IR/IntrinsicInst.h
index 71a96e0671c2f1..fc8d1b3d1947e3 100644
--- a/llvm/include/llvm/IR/IntrinsicInst.h
+++ b/llvm/include/llvm/IR/IntrinsicInst.h
@@ -1516,6 +1516,8 @@ class InstrProfInstBase : public IntrinsicInst {
return const_cast<Value *>(getArgOperand(0))->stripPointerCasts();
}
+ void setNameValue(Value *V) { setArgOperand(0, V); }
+
// The hash of the CFG for the instrumented function.
ConstantInt *getHash() const {
return cast<ConstantInt>(const_cast<Value *>(getArgOperand(1)));
diff --git a/llvm/include/llvm/ProfileData/PGOCtxProfReader.h b/llvm/include/llvm/ProfileData/PGOCtxProfReader.h
index f7f88966f7573f..c64e6e79f96c4c 100644
--- a/llvm/include/llvm/ProfileData/PGOCtxProfReader.h
+++ b/llvm/include/llvm/ProfileData/PGOCtxProfReader.h
@@ -74,6 +74,11 @@ class PGOCtxProfContext final {
Iter->second.emplace(Other.guid(), std::move(Other));
}
+ void ingestAllContexts(uint32_t CSId, CallTargetMapTy &&Other) {
+ auto [_, Inserted] = callsites().try_emplace(CSId, std::move(Other));
+ assert(Inserted);
+ }
+
void resizeCounters(uint32_t Size) { Counters.resize(Size); }
bool hasCallsite(uint32_t I) const {
diff --git a/llvm/include/llvm/Transforms/Utils/Cloning.h b/llvm/include/llvm/Transforms/Utils/Cloning.h
index 6226062dd713f6..2ddcfeb1501e28 100644
--- a/llvm/include/llvm/Transforms/Utils/Cloning.h
+++ b/llvm/include/llvm/Transforms/Utils/Cloning.h
@@ -20,6 +20,7 @@
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Twine.h"
#include "llvm/Analysis/AssumptionCache.h"
+#include "llvm/Analysis/CtxProfAnalysis.h"
#include "llvm/Analysis/InlineCost.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/ValueHandle.h"
@@ -270,6 +271,17 @@ InlineResult InlineFunction(CallBase &CB, InlineFunctionInfo &IFI,
bool InsertLifetime = true,
Function *ForwardVarArgsTo = nullptr);
+/// Same as above, but it will update the contextual profile. If the contextual
+/// profile is invalid (i.e. not loaded because it is not present), it defaults
+/// to the behavior of the non-contextual profile updating variant above. This
+/// makes it easy to drop-in replace uses of the non-contextual overload.
+InlineResult InlineFunction(CallBase &CB, InlineFunctionInfo &IFI,
+ CtxProfAnalysis::Result &CtxProf,
+ bool MergeAttributes = false,
+ AAResults *CalleeAAR = nullptr,
+ bool InsertLifetime = true,
+ Function *ForwardVarArgsTo = nullptr);
+
/// Clones a loop \p OrigLoop. Returns the loop and the blocks in \p
/// Blocks.
///
diff --git a/llvm/lib/Analysis/CtxProfAnalysis.cpp b/llvm/lib/Analysis/CtxProfAnalysis.cpp
index 2cd3f2114397e5..4a66eb1dd0a31a 100644
--- a/llvm/lib/Analysis/CtxProfAnalysis.cpp
+++ b/llvm/lib/Analysis/CtxProfAnalysis.cpp
@@ -29,6 +29,15 @@ cl::opt<std::string>
UseCtxProfile("use-ctx-profile", cl::init(""), cl::Hidden,
cl::desc("Use the specified contextual profile file"));
+static cl::opt<CtxProfAnalysisPrinterPass::PrintMode> PrintLevel(
+ "ctx-profile-printer-level",
+ cl::init(CtxProfAnalysisPrinterPass::PrintMode::Everything), cl::Hidden,
+ cl::values(clEnumValN(CtxProfAnalysisPrinterPass::PrintMode::Everything,
+ "everything", "print everything - most verbose"),
+ clEnumValN(CtxProfAnalysisPrinterPass::PrintMode::JSON, "json",
+ "just the json representation of the profile")),
+ cl::desc("Verbosity level of the contextual profile printer pass."));
+
namespace llvm {
namespace json {
Value toJSON(const PGOCtxProfContext &P) {
@@ -150,7 +159,6 @@ PGOContextualProfile CtxProfAnalysis::run(Module &M,
// If we made it this far, the Result is valid - which we mark by setting
// .Profiles.
// Trim first the roots that aren't in this module.
- DenseSet<GlobalValue::GUID> ProfiledGUIDs;
for (auto &[RootGuid, _] : llvm::make_early_inc_range(*MaybeCtx))
if (!Result.FuncInfo.contains(RootGuid))
MaybeCtx->erase(RootGuid);
@@ -165,6 +173,10 @@ PGOContextualProfile::getDefinedFunctionGUID(const Function &F) const {
return 0;
}
+CtxProfAnalysisPrinterPass::CtxProfAnalysisPrinterPass(raw_ostream &OS)
+ : OS(OS),
+ Mode(PrintLevel.getNumOccurrences() > 0 ? PrintLevel : PrintMode::JSON) {}
+
PreservedAnalyses CtxProfAnalysisPrinterPass::run(Module &M,
ModuleAnalysisManager &MAM) {
CtxProfAnalysis::Result &C = MAM.getResult<CtxProfAnalysis>(M);
diff --git a/llvm/lib/Transforms/IPO/ModuleInliner.cpp b/llvm/lib/Transforms/IPO/ModuleInliner.cpp
index 5e91ab80d7505f..b7e4531c8e390d 100644
--- a/llvm/lib/Transforms/IPO/ModuleInliner.cpp
+++ b/llvm/lib/Transforms/IPO/ModuleInliner.cpp
@@ -20,6 +20,7 @@
#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/BlockFrequencyInfo.h"
+#include "llvm/Analysis/CtxProfAnalysis.h"
#include "llvm/Analysis/InlineAdvisor.h"
#include "llvm/Analysis/InlineCost.h"
#include "llvm/Analysis/InlineOrder.h"
@@ -113,6 +114,8 @@ PreservedAnalyses ModuleInlinerPass::run(Module &M,
return PreservedAnalyses::all();
}
+ auto &CtxProf = MAM.getResult<CtxProfAnalysis>(M);
+
bool Changed = false;
ProfileSummaryInfo *PSI = MAM.getCachedResult<ProfileSummaryAnalysis>(M);
@@ -213,7 +216,7 @@ PreservedAnalyses ModuleInlinerPass::run(Module &M,
&FAM.getResult<BlockFrequencyAnalysis>(Callee));
InlineResult IR =
- InlineFunction(*CB, IFI, /*MergeAttributes=*/true,
+ InlineFunction(*CB, IFI, CtxProf, /*MergeAttributes=*/true,
&FAM.getResult<AAManager>(*CB->getCaller()));
if (!IR.isSuccess()) {
Advice->recordUnsuccessfulInlining(IR);
diff --git a/llvm/lib/Transforms/Utils/InlineFunction.cpp b/llvm/lib/Transforms/Utils/InlineFunction.cpp
index 94e87656a192c7..b3c048e9584d47 100644
--- a/llvm/lib/Transforms/Utils/InlineFunction.cpp
+++ b/llvm/lib/Transforms/Utils/InlineFunction.cpp
@@ -23,6 +23,7 @@
#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/Analysis/CallGraph.h"
#include "llvm/Analysis/CaptureTracking.h"
+#include "llvm/Analysis/CtxProfAnalysis.h"
#include "llvm/Analysis/IndirectCallVisitor.h"
#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/Analysis/MemoryProfileInfo.h"
@@ -46,6 +47,7 @@
#include "llvm/IR/Dominators.h"
#include "llvm/IR/EHPersonalities.h"
#include "llvm/IR/Function.h"
+#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InlineAsm.h"
#include "llvm/IR/InstrTypes.h"
@@ -71,6 +73,7 @@
#include <algorithm>
#include <cassert>
#include <cstdint>
+#include <deque>
#include <iterator>
#include <limits>
#include <optional>
@@ -2116,6 +2119,199 @@ inlineRetainOrClaimRVCalls(CallBase &CB, objcarc::ARCInstKind RVCallKind,
}
}
+// In contextual profiling, when an inline succeeds, we want to remap the
+// indices of the callee in the index space of the caller. We can't just leave
+// them as-is because the same callee may appear in other places in this caller
+// (other callsites), and its (callee's) counters and sub-contextual profile
+// tree would be potentially different.
+// Not all BBs of the callee may survive the opportunistic DCE InlineFunction
+// does (same goes for callsites in the callee).
+// We will return a pair of vectors, one for basic block IDs and one for
+// callsites. For such a vector V, V[Idx] will be -1 if the callee
+// instrumentation with index Idx did not survive inlining, and a new value
+// otherwise.
+// This function will update the instrumentation intrinsics accordingly,
+// replacing the "name" operand to point to the caller, and mapping indices as
+// described above.
+// The return values will be then used to update the contextual profile.
+// Note: we only update the "name" and "index" operands in the instrumentation
+// intrinsics, we leave the hash and total nr of indices as-is, it's not worth
+// updating those.
+static const std::pair<std::vector<int64_t>, std::vector<int64_t>>
+remapIndices(Function &Caller, BasicBlock *StartBB,
+ CtxProfAnalysis::Result &CtxProf, uint32_t CalleeCounters,
+ uint32_t CalleeCallsites) {
+ // We'll allocate a new ID to imported callsite counters and callsites. We're
+ // using -1 to indicate a counter we delete. Most likely the entry, for
+ // example, will be deleted - we don't want 2 IDs in the same BB, and the
+ // entry would have been cloned in the callsite's old BB.
+ std::vector<int64_t> CalleeCounterMap;
+ std::vector<int64_t> CalleeCallsiteMap;
+ CalleeCounterMap.resize(CalleeCounters, -1);
+ CalleeCallsiteMap.resize(CalleeCallsites, -1);
+
+ auto RewriteInstrIfNeeded = [&](InstrProfIncrementInst &Ins) -> bool {
+ if (Ins.getNameValue() == &Caller)
+ return false;
+ const auto OldID = static_cast<uint32_t>(Ins.getIndex()->getZExtValue());
+ if (CalleeCounterMap[OldID] == -1)
+ CalleeCounterMap[OldID] = CtxProf.allocateNextCounterIndex(Caller);
+ const auto NewID = static_cast<uint32_t>(CalleeCounterMap[OldID]);
+
+ Ins.setNameValue(&Caller);
+ Ins.setIndex(NewID);
+ return true;
+ };
+
+ auto RewriteCallsiteInsIfNeeded = [&](InstrProfCallsite &Ins) -> bool {
+ if (Ins.getNameValue() == &Caller)
+ return false;
+ const auto OldID = static_cast<uint32_t>(Ins.getIndex()->getZExtValue());
+ if (CalleeCallsiteMap[OldID] == -1)
+ CalleeCallsiteMap[OldID] = CtxProf.allocateNextCallsiteIndex(Caller);
+ const auto NewID = static_cast<uint32_t>(CalleeCallsiteMap[OldID]);
+
+ Ins.setNameValue(&Caller);
+ Ins.setIndex(NewID);
+ return true;
+ };
+
+ std::deque<BasicBlock *> Worklist;
+ DenseSet<const BasicBlock *> Seen;
+ // We will traverse the BBs starting from the callsite BB. The callsite BB
+ // will have at least a BB ID - maybe its own, and in any case the one coming
+ // from the cloned function's entry BB. The other BBs we'll start seeing from
+ // there on may or may not have BB IDs. BBs with IDs belonging to our caller
+ // are definitely not coming from the imported function and form a boundary
+ // past which we don't need to traverse anymore. BBs may have no
+ // instrumentation, in which case we'll traverse past them.
+ // An invariant we'll keep is that a BB will have at most 1 BB ID. For
+ // example, the callsite BB will delete the callee BB's instrumentation. This
+ // doesn't result in information loss: the entry BB of the caller will have
+ // the same count as the callsite's BB.
+ // At the end of this traversal, all the callee's instrumentation would be
+ // mapped into the caller's instrumentation index space. Some of the callee's
+ // counters may be deleted (as mentioned, this should result in no loss of
+ // information).
+ Worklist.push_back(StartBB);
+ while (!Worklist.empty()) {
+ auto *BB = Worklist.front();
+ Worklist.pop_front();
+ bool Changed = false;
+ auto *BBID = CtxProfAnalysis::getBBInstrumentation(*BB);
+ if (BBID) {
+ Changed |= RewriteInstrIfNeeded(*BBID);
+ // this may be the entryblock from the inlined callee, coming into a BB
+ // that didn't have instrumentation because of MST decisions. Let's make
+ // sure it's placed accordingly. This is a noop elsewhere.
+ BBID->moveBefore(&*BB->getFirstInsertionPt());
+ }
+ for (auto &I : llvm::make_early_inc_range(*BB)) {
+ if (auto *Inc = dyn_cast<InstrProfIncrementInst>(&I)) {
+ if (Inc != BBID) {
+ Inc->eraseFromParent();
+ Changed = true;
+ }
+ } else if (auto *CS = dyn_cast<InstrProfCallsite>(&I)) {
+ Changed |= RewriteCallsiteInsIfNeeded(*CS);
+ }
+ }
+ if (!BBID || Changed)
+ for (auto *Succ : successors(BB))
+ if (Seen.insert(Succ).second)
+ Worklist.push_back(Succ);
+ }
+ return {std::move(CalleeCounterMap), std::move(CalleeCallsiteMap)};
+}
+
+llvm::InlineResult llvm::InlineFunction(CallBase &CB, InlineFunctionInfo &IFI,
+ CtxProfAnalysis::Result &CtxProf,
+ bool MergeAttributes,
+ AAResults *CalleeAAR,
+ bool InsertLifetime,
+ Function *ForwardVarArgsTo) {
+ if (!CtxProf)
+ return InlineFunction(CB, IFI, MergeAttributes, CalleeAAR, InsertLifetime,
+ ForwardVarArgsTo);
+
+ auto &Caller = *CB.getCaller();
+ auto &Callee = *CB.getCalledFunction();
+ auto *StartBB = CB.getParent();
+
+ // Get some preliminary data about the callsite before it might get inlined.
+ // Inlining shouldn't delete the callee, but it's cleaner (and low-cost) to
+ // get this data upfront and rely less on InlineFunction's behavior.
+ const auto CalleeGUID = AssignGUIDPass::getGUID(Callee);
+ auto *CallsiteIDIns = CtxProfAnalysis::getCallsiteInstrumentation(CB);
+ const auto CallsiteID =
+ static_cast<uint32_t>(CallsiteIDIns->getIndex()->getZExtValue());
+
+ const auto NrCalleeCounters = CtxProf.getNrCounters(Callee);
+ const auto NrCalleeCallsites = CtxProf.getNrCallsites(Callee);
+
+ auto Ret = InlineFunction(CB, IFI, MergeAttributes, CalleeAAR, InsertLifetime,
+ ForwardVarArgsTo);
+ if (!Ret.isSuccess())
+ return Ret;
+
+ // Inlining succeeded, we don't need the instrumentation of the inlined
+ // callsite.
+ CallsiteIDIns->eraseFromParent();
+
+ // Assigning Maps and then capturing references into it in the lambda because
+ // captured structured bindings are a C++20 extension. We do also need a
+ // capture here, though.
+ const auto IndicesMaps = remapIndices(Caller, StartBB, CtxProf,
+ NrCalleeCounters, NrCalleeCallsites);
+ const uint32_t NewCountersSize = CtxProf.getNrCounters(Caller);
+
+ auto Updater = [&](PGOCtxProfContext &Ctx) {
+ assert(Ctx.guid() == AssignGUIDPass::getGUID(Caller));
+ const auto &[CalleeCounterMap, CalleeCallsiteMap] = IndicesMaps;
+ assert(
+ (Ctx.counters().size() +
+ llvm::count_if(CalleeCounterMap, [](auto V) { return V != -1; }) ==
+ NewCountersSize) &&
+ "The caller's counters size should have grown by the number of new "
+ "distinct counters inherited from the inlined callee.");
+ Ctx.resizeCounters(NewCountersSize);
+ // If the callsite wasn't exercised in this context, the value of the
+ // counters coming from it is 0 and so we're done.
+ auto CSIt = Ctx.callsites().find(CallsiteID);
+ if (CSIt == Ctx.callsites().end())
+ return;
+ auto CalleeCtxIt = CSIt->second.find(CalleeGUID);
+ // The callsite was exercised, but not with this callee (so presumably this
+ // is an indirect callsite). Again, we're done here.
+ if (CalleeCtxIt == CSIt->second.end())
+ return;
+
+ // Let's pull in the counter values and the subcontexts coming from the
+ // inlined callee.
+ auto &CalleeCtx = CalleeCtxIt->second;
+ assert(CalleeCtx.guid() == CalleeGUID);
+
+ for (auto I = 0U; I < CalleeCtx.counters().size(); ++I) {
+ const int64_t NewIndex = CalleeCounterMap[I];
+ if (NewIndex >= 0)
+ Ctx.counters()[NewIndex] = CalleeCtx.counters()[I];
+ }
+ for (auto &[I, OtherSet] : CalleeCtx.callsites()) {
+ const int64_t NewCSIdx = CalleeCallsiteMap[I];
+ if (NewCSIdx >= 0)
+ Ctx.ingestAllContexts(NewCSIdx, std::move(OtherSet));
+ }
+ // We know the traversal is preorder, so it wouldn't have yet looked at the
+ // sub-contexts of this context that it's currently visiting. Meaning, the
+ // erase below invalidates no iterators.
+ auto Deleted = Ctx.callsites().erase(CallsiteID);
+ assert(Deleted);
+ (void)Deleted;
+ };
+ CtxProf.update(Updater, &Caller);
+ return Ret;
+}
+
/// This function inlines the called function into the basic block of the
/// caller. This returns false if it is not possible to inline this call.
/// The program is still in a well defined state if this occurs though.
diff --git a/llvm/test/Analysis/CtxProfAnalysis/inline.ll b/llvm/test/Analysis/CtxProfAnalysis/inline.ll
new file mode 100644
index 00000000000000..97158e3e0640cd
--- /dev/null
+++ b/llvm/test/Analysis/CtxProfAnalysis/inline.ll
@@ -0,0 +1,101 @@
+; RUN: rm -rf %t
+; RUN: split-file %s %t
+; RUN: llvm-ctxprof-util fromJSON --input=%t/profile.json --output=%t/profile.ctxprofdata
+
+; RUN: opt -passes='module-inline,print<ctx-prof-analysis>' %t/module.ll -S \
+; RUN: -use-ctx-profile=%t/profile.ctxprofdata -ctx-profile-printer-level=json \
+; RUN: -o - 2> %t/profile-final.txt | FileCheck %s
+; RUN: %python %S/json_equals.py %t/profile-final.txt %t/expected.json
+
+; CHECK-LABEL: @entrypoint
+; CHECK-LABEL: yes:
+; CHECK: call void @llvm.instrprof.increment(ptr @entrypoint, i64 0, i32 3, i32 1)
+; CHECK-NEXT: br label %loop.i
+; CHECK-LABEL: loop.i:
+; CHECK-NEXT: %indvar.i = phi i32 [ %indvar.next.i, %loop.i ], [ 0, %yes ]
+; CHECK-NEXT: call void @llvm.instrprof.increment(ptr @entrypoint, i64 0, i32 2, i32 3)
+; CHECK-NEXT: %b.i = add i32 %x, %indvar.i
+; CHECK-NEXT: call void @llvm.instrprof.callsite(ptr @entrypoint, i64 0, i32 1, i32 2, ptr @b)
+; CHECK-NEXT: %call3.i = call i32 @b() #1
+; CHECK-LABEL: no:
+; CHECK-NEXT: call void @llvm.instrprof.increment(ptr @entrypoint, i64 0, i32 3, i32 2)
+; CHECK-NEXT: call void @llvm.instrprof.callsite(ptr @entrypoint, i64 0, i32 2, i32 1, ptr @a)
+; CHECK-NEXT: %call2 = call i32 @a(i32 %x) #1
+; CHECK-NEXT: br label %exit
+
+
+;--- module.ll
+define i32 @entrypoint(i32 %x) !guid !0 {
+ call void @llvm.instrprof.increment(ptr @entrypoint, i64 0, i32 3, i32 0)
+ %t = icmp eq i32 %x, 0
+ br i1 %t, label %yes, label %no
+yes:
+ call void @llvm.instrprof.increment(ptr @entrypoint, i64 0, i32 3, i32 1)
+ call void @llvm.instrprof.callsite(ptr @entrypoint, i64 0, i32 2, i32 0, ptr @a)
+ %call1 = call i32 @a(i32 %x) alwaysinline
+ br label %exit
+no:
+ call void @llvm.instrprof.increment(ptr @entrypoint, i64 0, i32 3, i32 2)
+ call void @llvm.instrprof.callsite(ptr @entrypoint, i64 0, i32 2, i32 1, ptr @a)
+ %call2 = call i32 @a(i32 %x) noinline
+ br label %exit
+exit:
+ %ret = phi i32 [%call1, %yes], [%call2, %no]
+ ret i32 %ret
+}
+
+define i32 @a(i32 %x) !guid !1 {
+entry:
+ call void @llvm.instrprof.increment(ptr @a, i64 0, i32 2, i32 0)
+ br label %loop
+loop:
+ %indvar = phi i32 [%indvar.next, %loop], [0, %entry]
+ call void @llvm.instrprof.increment(ptr @a, i64 0, i32 2, i32 1)
+ %b = add i32 %x, %indvar
+ call void @llvm.instrprof.callsite(ptr @a, i64 0, i32 1, i32 0, ptr @b)
+ %call3 = call i32 @b() noinline
+ %indvar.next = add i32 %indvar,...
[truncated]
✅ With the latest revision this PR passed the Python code formatter.
Force-pushed b6279c7 to e11ff65
Haven't reviewed the meat of the PR yet, but one question below and also a request to add a short high level summary to the PR.
@@ -63,6 +63,16 @@ class PGOContextualProfile {
    return getDefinedFunctionGUID(F) != 0;
  }

  uint32_t getNrCounters(const Function &F) const {
nit: what's Nr short for? Maybe add a comment to describe it, or put the full name.
Number. "No" is the other common abbreviation, but can be read as a negation. I could rename to "getNumberOfCounters".
using "num".
  auto Ret = InlineFunction(CB, IFI, MergeAttributes, CalleeAAR, InsertLifetime,
                            ForwardVarArgsTo);
  if (!Ret.isSuccess())
    return Ret;
Naive question: what are the scenarios for unsuccessful inlines if the contextual profile shows a <caller, inlined-callee> pair?
The contextual profile never shows a <caller, inlined-callee> pair. It shows <caller, callee> pairs. Whether the post-instrumentation inliner pass inlined functions or not is immaterial - because those would be decisions about inlining on an instrumented binary - and also transparent - because the instrumentation mechanism doesn't care how we get from a callsite instrumentation to the entry BB of a function.
All that is about the instrumented run and explains what the profile contains: call chains (with the observation above). Here we're doing profile use, though. There might be confusion about "why instrumentation...": we carry around instrumentation so we can use it as a marker - to tie counter values back to BBs later, when we flatten and lower the profile.
So at this stage, any inlining decision may be unsuccessful for whatever legal reason an inlining could be unsuccessful, irrespective of contextual profiling or not (we mostly try to avoid trying to inline callsites that can't be inlined, but IIUC there are some cases that only the actual attempt uncovers).
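(For readers unfamiliar with the data structure: a minimal sketch, not part of this patch, of how the nesting described above could be walked, assuming only the `PGOCtxProfContext` accessors this patch already uses: `guid()`, `counters()`, `callsites()`.)

```cpp
#include "llvm/ProfileData/PGOCtxProfReader.h"
#include "llvm/Support/Debug.h"

using namespace llvm;

// Each context node is a function reached through a specific call chain from a
// root: it owns its own counters and, per callsite index, one sub-context per
// callee GUID observed there. Inlining never appears here; only
// <caller, callee> nesting does.
static void printContextTree(PGOCtxProfContext &Ctx, unsigned Depth = 0) {
  dbgs().indent(2 * Depth) << "guid=" << Ctx.guid()
                           << " counters=" << Ctx.counters().size() << "\n";
  for (auto &[CallsiteIdx, Targets] : Ctx.callsites())
    for (auto &[CalleeGUID, SubCtx] : Targets)
      printContextTree(SubCtx, Depth + 1);
}
```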
; CHECK-LABEL: @entrypoint
; CHECK-LABEL: yes:
; CHECK: call void @llvm.instrprof.increment(ptr @entrypoint, i64 0, i32 3, i32 1)
Naive question: is it intended that the `llvm.instrprof.increment` intrinsic is present in ctx profile use IR?
Yes. We keep the instrumentation until we flatten the profile and then lower the flattened profile to `MD_prof`. The instrumentation will serve as a means to associate counter values in the contextual profile to BBs.
             uint32_t CalleeCallsites) {
  // We'll allocate a new ID to imported callsite counters and callsites. We're
  // using -1 to indicate a counter we delete. Most likely the entry, for
  // example, will be deleted - we don't want 2 IDs in the same BB, and the
maybe "...the callee entry BB, for example, would have been deleted when the callee was inlined"..
No :) the callee entry BB is cloned in the BB of the callsite. What the "deletion" is about is the ID - i.e. the instrumentation instruction (the "ID" is the index of the instrumentation instruction. It IDs the BB, basically) - and it's something `remapIndices` will likely do. It won't always do that: if the callsite BB didn't have an ID (because MST decided not to add instrumentation to that BB), then we have no reason to delete the instrumentation the callee entry BB comes with.
I added the word "ID" there ("Most likely the entry ID, for example...")
    }
    for (auto &I : llvm::make_early_inc_range(*BB)) {
      if (auto *Inc = dyn_cast<InstrProfIncrementInst>(&I)) {
        if (Inc != BBID) {
what does this case correspond to?
Took me a bit to remember, so... added a comment, ptal.
        Changed |= RewriteCallsiteInsIfNeeded(*CS);
      }
    }
    if (!BBID || Changed)
what does the !BBID case mean in practice - i.e. why does it mean we need to look at successors?
The instrumentation instructions are added on MST edges (PGOInstrument...), so no BBID means "MST thought this is a likely cold BB". Which it could've thought in the callee, too, we don't know.
; RUN:   -o - 2> %t/profile-final.txt | FileCheck %s
; RUN: %python %S/json_equals.py %t/profile-final.txt %t/expected.json

; CHECK-LABEL: @entrypoint
some comments noting what we are looking for would be good
done, ptal
Force-pushed aef887e to f2349d6
lgtm with some comment fixes
Force-pushed f2349d6 to a6d5e9f
Force-pushed a6d5e9f to e1ab7de
Reverts 3209766 "[ctx_prof] Add Inlining support (llvm#106154)". Matthew looking to reland base patch. Change-Id: I38c1a12d756af27dd7d748020c49b86caddd07e8
Pass to flatten and lower the contextual profile to profile (i.e. `MD_prof`) metadata. This is expected to be used after all IPO transformations have happened. Prior to lowering, the instrumentation is maintained during IPO and the contextual profile is kept in sync (see PRs #105469, #106154). Flattening (#104539) sums up all the counters belonging to all of a function's context nodes. We first propagate counter values (from the flattened profile) using the same propagation algorithm as `PGOUseFunc::populateCounters`, then map the edge values to `branch_weights`. Functions in the module that don't have an entry in the flattened profile are deemed cold, and any `MD_prof` metadata they may have is reset. The profile summary is also reset at this point. Issue [#89287](#89287)
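(A rough sketch of the flattening step described above, under assumptions: the `AllNodes` list stands in for a hypothetical walk over every context node, and the counter width is assumed to be `uint64_t`; the actual pass is not reproduced here.)

```cpp
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/GlobalValue.h"
#include "llvm/ProfileData/PGOCtxProfReader.h"
#include <vector>

using namespace llvm;

// Sum, per function GUID, the counters of every context node belonging to that
// function. `AllNodes` stands in for a preorder traversal of all context nodes
// reachable from the profile's roots (hypothetical helper, not a real API).
static DenseMap<GlobalValue::GUID, SmallVector<uint64_t>>
flattenCounters(const std::vector<PGOCtxProfContext *> &AllNodes) {
  DenseMap<GlobalValue::GUID, SmallVector<uint64_t>> Flat;
  for (PGOCtxProfContext *Ctx : AllNodes) {
    auto &Sums = Flat[Ctx->guid()];
    if (Sums.size() < Ctx->counters().size())
      Sums.resize(Ctx->counters().size()); // new entries start at 0
    for (size_t I = 0, E = Ctx->counters().size(); I != E; ++I)
      Sums[I] += Ctx->counters()[I];
  }
  return Flat;
}
```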
Add an overload of `InlineFunction` that updates the contextual profile. If there is no contextual profile, this overload is equivalent to the non-contextual profile variant.

Post-inlining, the update mainly consists of:
- making the PGO instrumentation of the callee "the caller's": the owner function (the "name" parameter of the instrumentation instructions) becomes the caller, and new index values are allocated for each of the callee's indices (this happens for both increment and callsite instrumentation instructions)
- in the contextual profile:
  - each context corresponding to the caller has its counters updated to incorporate the counters inherited from the callee at the inlined callsite. Counter values are copied as-is because no scaling is required since the profile is contextual.
  - the contexts of the callee (at the inlined callsite) are moved to the caller.
  - the callee context at the inlined callsite is deleted.

Change-Id: If4814bc9aeea2a55e597b93cb0ecb33aa73be01f
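To make the index remapping above concrete, here is a tiny standalone illustration with made-up numbers; it is not code from the patch, only the arithmetic the update performs.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
  // The caller already uses counter indices 0..2; the inlined callee used
  // 0..1. The callee's entry counter (index 0) is dropped because the
  // callsite's BB already carries an ID; the surviving counter (index 1) gets
  // the next free caller index, 3. -1 marks instrumentation that didn't
  // survive inlining.
  uint32_t NextCallerCounterIdx = 3;
  std::vector<int64_t> CalleeCounterMap(2, -1);
  CalleeCounterMap[1] = NextCallerCounterIdx++;

  // Per caller context, the caller's counter vector grows and the callee's
  // counter values are copied as-is (no scaling: the profile is contextual).
  std::vector<uint64_t> CallerCounters{10, 7, 3}; // this context's counters
  std::vector<uint64_t> CalleeCounters{7, 42};    // from the callee's context
  CallerCounters.resize(NextCallerCounterIdx, 0);
  for (size_t OldIdx = 0; OldIdx < CalleeCounters.size(); ++OldIdx)
    if (CalleeCounterMap[OldIdx] >= 0)
      CallerCounters[CalleeCounterMap[OldIdx]] = CalleeCounters[OldIdx];

  // Expected output: 10 7 3 42
  for (uint64_t C : CallerCounters)
    std::printf("%llu ", static_cast<unsigned long long>(C));
  std::printf("\n");
}
```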
Add an overload of `InlineFunction` that updates the contextual profile. If there is no contextual profile, this overload is equivalent to the non-contextual profile variant.

Post-inlining, the update mainly consists of:
- making the PGO instrumentation of the callee "the caller's": the owner function named in the instrumentation instructions becomes the caller, and new index values are allocated for each of the callee's increment and callsite indices
- in the contextual profile: updating each caller context's counters with those inherited from the callee at the inlined callsite (copied as-is, since no scaling is needed for a contextual profile), moving the callee's contexts at the inlined callsite into the caller, and deleting the callee context at that callsite

Issue #89287
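For reference, a minimal sketch of how a pass might call the new overload, mirroring the `ModuleInliner.cpp` change in this PR; the surrounding pass state (`M`, `MAM`, `FAM`, `IFI`, `CB`, `Advice`) is assumed to be set up as in that pass.

```cpp
// Fetch the contextual profile once per module...
CtxProfAnalysis::Result &CtxProf = MAM.getResult<CtxProfAnalysis>(M);

// ...then, per call site, use the profile-updating overload. If the profile
// is invalid (e.g. not loaded), this behaves exactly like the existing
// non-contextual overload.
InlineResult IR =
    InlineFunction(*CB, IFI, CtxProf, /*MergeAttributes=*/true,
                   &FAM.getResult<AAManager>(*CB->getCaller()));
if (!IR.isSuccess())
  Advice->recordUnsuccessfulInlining(IR);
```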