
[ctx_prof] Add Inlining support #106154

Merged
merged 1 commit into main from users/mtrofin/08-22-_ctx_prof_add_inlining_support
Sep 3, 2024

Conversation

mtrofin
Member

@mtrofin mtrofin commented Aug 26, 2024

Add an overload of InlineFunction that updates the contextual profile. If there is no contextual profile, this overload is equivalent to the non-contextual profile variant.

Post-inlining, the update mainly consists of:

  • making the PGO instrumentation of the callee "the caller's": the owner function (the "name" parameter of the instrumentation instructions) becomes the caller, and new index values are allocated for each of the callee's indices (this happens for both increment and callsite instrumentation instructions)
  • in the contextual profile:
    • each context corresponding to the caller has its counters updated to incorporate the counters inherited from the callee at the inlined callsite. Counter values are copied as-is; because the profile is contextual, no scaling is required.
    • the contexts of the callee (at the inlined callsite) are moved to the caller.
    • the callee context at the inlined callsite is deleted.

Issue #89287
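A minimal usage sketch, mirroring what the ModuleInliner change in the diff below does; the wrapper function and the surrounding analysis-manager plumbing are illustrative assumptions, not part of the patch:

```c++
// Sketch only: pass the CtxProfAnalysis result to the new InlineFunction
// overload. If no contextual profile was loaded, the overload behaves exactly
// like the non-contextual variant, so call sites can switch unconditionally.
#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/CtxProfAnalysis.h"
#include "llvm/IR/PassManager.h"
#include "llvm/Transforms/Utils/Cloning.h"

using namespace llvm;

static InlineResult inlineWithCtxProf(CallBase &CB, InlineFunctionInfo &IFI,
                                      ModuleAnalysisManager &MAM, Module &M,
                                      AAResults *CallerAA) {
  // CtxProfAnalysis::Result; "invalid" (not loaded) falls back to the plain
  // InlineFunction behavior inside the overload.
  auto &CtxProf = MAM.getResult<CtxProfAnalysis>(M);
  return InlineFunction(CB, IFI, CtxProf, /*MergeAttributes=*/true, CallerAA);
}
```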

github-actions bot commented Aug 26, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@mtrofin mtrofin force-pushed the users/mtrofin/08-20-_ctx_prof_add_support_for_icp branch 7 times, most recently from 36e3144 to 176f656 Compare August 27, 2024 22:46
Base automatically changed from users/mtrofin/08-20-_ctx_prof_add_support_for_icp to main August 27, 2024 22:50
@mtrofin mtrofin force-pushed the users/mtrofin/08-22-_ctx_prof_add_inlining_support branch 2 times, most recently from bf293b4 to 5733f3b Compare August 28, 2024 22:00
@mtrofin mtrofin marked this pull request as ready for review August 28, 2024 22:00
@llvmbot
Member

llvmbot commented Aug 28, 2024

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-llvm-analysis
@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-pgo

Author: Mircea Trofin (mtrofin)

Changes

Patch is 21.37 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/106154.diff

9 Files Affected:

  • (modified) llvm/include/llvm/Analysis/CtxProfAnalysis.h (+11-3)
  • (modified) llvm/include/llvm/IR/IntrinsicInst.h (+2)
  • (modified) llvm/include/llvm/ProfileData/PGOCtxProfReader.h (+5)
  • (modified) llvm/include/llvm/Transforms/Utils/Cloning.h (+12)
  • (modified) llvm/lib/Analysis/CtxProfAnalysis.cpp (+13-1)
  • (modified) llvm/lib/Transforms/IPO/ModuleInliner.cpp (+4-1)
  • (modified) llvm/lib/Transforms/Utils/InlineFunction.cpp (+196)
  • (added) llvm/test/Analysis/CtxProfAnalysis/inline.ll (+101)
  • (added) llvm/test/Analysis/CtxProfAnalysis/json_equals.py (+13)
diff --git a/llvm/include/llvm/Analysis/CtxProfAnalysis.h b/llvm/include/llvm/Analysis/CtxProfAnalysis.h
index 10aef6f6067b6f..7b18f88f2ec955 100644
--- a/llvm/include/llvm/Analysis/CtxProfAnalysis.h
+++ b/llvm/include/llvm/Analysis/CtxProfAnalysis.h
@@ -63,6 +63,16 @@ class PGOContextualProfile {
     return getDefinedFunctionGUID(F) != 0;
   }
 
+  uint32_t getNrCounters(const Function &F) const {
+    assert(isFunctionKnown(F));
+    return FuncInfo.find(getDefinedFunctionGUID(F))->second.NextCounterIndex;
+  }
+
+  uint32_t getNrCallsites(const Function &F) const {
+    assert(isFunctionKnown(F));
+    return FuncInfo.find(getDefinedFunctionGUID(F))->second.NextCallsiteIndex;
+  }
+
   uint32_t allocateNextCounterIndex(const Function &F) {
     assert(isFunctionKnown(F));
     return FuncInfo.find(getDefinedFunctionGUID(F))->second.NextCounterIndex++;
@@ -113,9 +123,7 @@ class CtxProfAnalysisPrinterPass
     : public PassInfoMixin<CtxProfAnalysisPrinterPass> {
 public:
   enum class PrintMode { Everything, JSON };
-  explicit CtxProfAnalysisPrinterPass(raw_ostream &OS,
-                                      PrintMode Mode = PrintMode::Everything)
-      : OS(OS), Mode(Mode) {}
+  explicit CtxProfAnalysisPrinterPass(raw_ostream &OS);
 
   PreservedAnalyses run(Module &M, ModuleAnalysisManager &MAM);
   static bool isRequired() { return true; }
diff --git a/llvm/include/llvm/IR/IntrinsicInst.h b/llvm/include/llvm/IR/IntrinsicInst.h
index 71a96e0671c2f1..fc8d1b3d1947e3 100644
--- a/llvm/include/llvm/IR/IntrinsicInst.h
+++ b/llvm/include/llvm/IR/IntrinsicInst.h
@@ -1516,6 +1516,8 @@ class InstrProfInstBase : public IntrinsicInst {
     return const_cast<Value *>(getArgOperand(0))->stripPointerCasts();
   }
 
+  void setNameValue(Value *V) { setArgOperand(0, V); }
+
   // The hash of the CFG for the instrumented function.
   ConstantInt *getHash() const {
     return cast<ConstantInt>(const_cast<Value *>(getArgOperand(1)));
diff --git a/llvm/include/llvm/ProfileData/PGOCtxProfReader.h b/llvm/include/llvm/ProfileData/PGOCtxProfReader.h
index f7f88966f7573f..c64e6e79f96c4c 100644
--- a/llvm/include/llvm/ProfileData/PGOCtxProfReader.h
+++ b/llvm/include/llvm/ProfileData/PGOCtxProfReader.h
@@ -74,6 +74,11 @@ class PGOCtxProfContext final {
     Iter->second.emplace(Other.guid(), std::move(Other));
   }
 
+  void ingestAllContexts(uint32_t CSId, CallTargetMapTy &&Other) {
+    auto [_, Inserted] = callsites().try_emplace(CSId, std::move(Other));
+    assert(Inserted);
+  }
+
   void resizeCounters(uint32_t Size) { Counters.resize(Size); }
 
   bool hasCallsite(uint32_t I) const {
diff --git a/llvm/include/llvm/Transforms/Utils/Cloning.h b/llvm/include/llvm/Transforms/Utils/Cloning.h
index 6226062dd713f6..2ddcfeb1501e28 100644
--- a/llvm/include/llvm/Transforms/Utils/Cloning.h
+++ b/llvm/include/llvm/Transforms/Utils/Cloning.h
@@ -20,6 +20,7 @@
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/Twine.h"
 #include "llvm/Analysis/AssumptionCache.h"
+#include "llvm/Analysis/CtxProfAnalysis.h"
 #include "llvm/Analysis/InlineCost.h"
 #include "llvm/IR/BasicBlock.h"
 #include "llvm/IR/ValueHandle.h"
@@ -270,6 +271,17 @@ InlineResult InlineFunction(CallBase &CB, InlineFunctionInfo &IFI,
                             bool InsertLifetime = true,
                             Function *ForwardVarArgsTo = nullptr);
 
+/// Same as above, but it will update the contextual profile. If the contextual
+/// profile is invalid (i.e. not loaded because it is not present), it defaults
+/// to the behavior of the non-contextual profile updating variant above. This
+/// makes it easy to drop-in replace uses of the non-contextual overload.
+InlineResult InlineFunction(CallBase &CB, InlineFunctionInfo &IFI,
+                            CtxProfAnalysis::Result &CtxProf,
+                            bool MergeAttributes = false,
+                            AAResults *CalleeAAR = nullptr,
+                            bool InsertLifetime = true,
+                            Function *ForwardVarArgsTo = nullptr);
+
 /// Clones a loop \p OrigLoop.  Returns the loop and the blocks in \p
 /// Blocks.
 ///
diff --git a/llvm/lib/Analysis/CtxProfAnalysis.cpp b/llvm/lib/Analysis/CtxProfAnalysis.cpp
index 2cd3f2114397e5..4a66eb1dd0a31a 100644
--- a/llvm/lib/Analysis/CtxProfAnalysis.cpp
+++ b/llvm/lib/Analysis/CtxProfAnalysis.cpp
@@ -29,6 +29,15 @@ cl::opt<std::string>
     UseCtxProfile("use-ctx-profile", cl::init(""), cl::Hidden,
                   cl::desc("Use the specified contextual profile file"));
 
+static cl::opt<CtxProfAnalysisPrinterPass::PrintMode> PrintLevel(
+    "ctx-profile-printer-level",
+    cl::init(CtxProfAnalysisPrinterPass::PrintMode::Everything), cl::Hidden,
+    cl::values(clEnumValN(CtxProfAnalysisPrinterPass::PrintMode::Everything,
+                          "everything", "print everything - most verbose"),
+               clEnumValN(CtxProfAnalysisPrinterPass::PrintMode::JSON, "json",
+                          "just the json representation of the profile")),
+    cl::desc("Verbosity level of the contextual profile printer pass."));
+
 namespace llvm {
 namespace json {
 Value toJSON(const PGOCtxProfContext &P) {
@@ -150,7 +159,6 @@ PGOContextualProfile CtxProfAnalysis::run(Module &M,
   // If we made it this far, the Result is valid - which we mark by setting
   // .Profiles.
   // Trim first the roots that aren't in this module.
-  DenseSet<GlobalValue::GUID> ProfiledGUIDs;
   for (auto &[RootGuid, _] : llvm::make_early_inc_range(*MaybeCtx))
     if (!Result.FuncInfo.contains(RootGuid))
       MaybeCtx->erase(RootGuid);
@@ -165,6 +173,10 @@ PGOContextualProfile::getDefinedFunctionGUID(const Function &F) const {
   return 0;
 }
 
+CtxProfAnalysisPrinterPass::CtxProfAnalysisPrinterPass(raw_ostream &OS)
+    : OS(OS),
+      Mode(PrintLevel.getNumOccurrences() > 0 ? PrintLevel : PrintMode::JSON) {}
+
 PreservedAnalyses CtxProfAnalysisPrinterPass::run(Module &M,
                                                   ModuleAnalysisManager &MAM) {
   CtxProfAnalysis::Result &C = MAM.getResult<CtxProfAnalysis>(M);
diff --git a/llvm/lib/Transforms/IPO/ModuleInliner.cpp b/llvm/lib/Transforms/IPO/ModuleInliner.cpp
index 5e91ab80d7505f..b7e4531c8e390d 100644
--- a/llvm/lib/Transforms/IPO/ModuleInliner.cpp
+++ b/llvm/lib/Transforms/IPO/ModuleInliner.cpp
@@ -20,6 +20,7 @@
 #include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/Analysis/AssumptionCache.h"
 #include "llvm/Analysis/BlockFrequencyInfo.h"
+#include "llvm/Analysis/CtxProfAnalysis.h"
 #include "llvm/Analysis/InlineAdvisor.h"
 #include "llvm/Analysis/InlineCost.h"
 #include "llvm/Analysis/InlineOrder.h"
@@ -113,6 +114,8 @@ PreservedAnalyses ModuleInlinerPass::run(Module &M,
     return PreservedAnalyses::all();
   }
 
+  auto &CtxProf = MAM.getResult<CtxProfAnalysis>(M);
+
   bool Changed = false;
 
   ProfileSummaryInfo *PSI = MAM.getCachedResult<ProfileSummaryAnalysis>(M);
@@ -213,7 +216,7 @@ PreservedAnalyses ModuleInlinerPass::run(Module &M,
         &FAM.getResult<BlockFrequencyAnalysis>(Callee));
 
     InlineResult IR =
-        InlineFunction(*CB, IFI, /*MergeAttributes=*/true,
+        InlineFunction(*CB, IFI, CtxProf, /*MergeAttributes=*/true,
                        &FAM.getResult<AAManager>(*CB->getCaller()));
     if (!IR.isSuccess()) {
       Advice->recordUnsuccessfulInlining(IR);
diff --git a/llvm/lib/Transforms/Utils/InlineFunction.cpp b/llvm/lib/Transforms/Utils/InlineFunction.cpp
index 94e87656a192c7..b3c048e9584d47 100644
--- a/llvm/lib/Transforms/Utils/InlineFunction.cpp
+++ b/llvm/lib/Transforms/Utils/InlineFunction.cpp
@@ -23,6 +23,7 @@
 #include "llvm/Analysis/BlockFrequencyInfo.h"
 #include "llvm/Analysis/CallGraph.h"
 #include "llvm/Analysis/CaptureTracking.h"
+#include "llvm/Analysis/CtxProfAnalysis.h"
 #include "llvm/Analysis/IndirectCallVisitor.h"
 #include "llvm/Analysis/InstructionSimplify.h"
 #include "llvm/Analysis/MemoryProfileInfo.h"
@@ -46,6 +47,7 @@
 #include "llvm/IR/Dominators.h"
 #include "llvm/IR/EHPersonalities.h"
 #include "llvm/IR/Function.h"
+#include "llvm/IR/GlobalVariable.h"
 #include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/InlineAsm.h"
 #include "llvm/IR/InstrTypes.h"
@@ -71,6 +73,7 @@
 #include <algorithm>
 #include <cassert>
 #include <cstdint>
+#include <deque>
 #include <iterator>
 #include <limits>
 #include <optional>
@@ -2116,6 +2119,199 @@ inlineRetainOrClaimRVCalls(CallBase &CB, objcarc::ARCInstKind RVCallKind,
   }
 }
 
+// In contextual profiling, when an inline succeeds, we want to remap the
+// indices of the callee in the index space of the caller. We can't just leave
+// them as-is because the same callee may appear in other places in this caller
+// (other callsites), and its (callee's) counters and sub-contextual profile
+// tree would be potentially different.
+// Not all BBs of the callee may survive the opportunistic DCE InlineFunction
+// does (same goes for callsites in the callee).
+// We will return a pair of vectors, one for basic block IDs and one for
+// callsites. For such a vector V, V[Idx] will be -1 if the callee
+// instrumentation with index Idx did not survive inlining, and a new value
+// otherwise.
+// This function will update the instrumentation intrinsics accordingly,
+// replacing the "name" operand to point to the caller, and mapping indices as
+// described above.
+// The return values will be then used to update the contextual profile.
+// Note: we only update the "name" and "index" operands in the instrumentation
+// intrinsics, we leave the hash and total nr of indices as-is, it's not worth
+// updating those.
+static const std::pair<std::vector<int64_t>, std::vector<int64_t>>
+remapIndices(Function &Caller, BasicBlock *StartBB,
+             CtxProfAnalysis::Result &CtxProf, uint32_t CalleeCounters,
+             uint32_t CalleeCallsites) {
+  // We'll allocate a new ID to imported callsite counters and callsites. We're
+  // using -1 to indicate a counter we delete. Most likely the entry, for
+  // example, will be deleted - we don't want 2 IDs in the same BB, and the
+  // entry would have been cloned in the callsite's old BB.
+  std::vector<int64_t> CalleeCounterMap;
+  std::vector<int64_t> CalleeCallsiteMap;
+  CalleeCounterMap.resize(CalleeCounters, -1);
+  CalleeCallsiteMap.resize(CalleeCallsites, -1);
+
+  auto RewriteInstrIfNeeded = [&](InstrProfIncrementInst &Ins) -> bool {
+    if (Ins.getNameValue() == &Caller)
+      return false;
+    const auto OldID = static_cast<uint32_t>(Ins.getIndex()->getZExtValue());
+    if (CalleeCounterMap[OldID] == -1)
+      CalleeCounterMap[OldID] = CtxProf.allocateNextCounterIndex(Caller);
+    const auto NewID = static_cast<uint32_t>(CalleeCounterMap[OldID]);
+
+    Ins.setNameValue(&Caller);
+    Ins.setIndex(NewID);
+    return true;
+  };
+
+  auto RewriteCallsiteInsIfNeeded = [&](InstrProfCallsite &Ins) -> bool {
+    if (Ins.getNameValue() == &Caller)
+      return false;
+    const auto OldID = static_cast<uint32_t>(Ins.getIndex()->getZExtValue());
+    if (CalleeCallsiteMap[OldID] == -1)
+      CalleeCallsiteMap[OldID] = CtxProf.allocateNextCallsiteIndex(Caller);
+    const auto NewID = static_cast<uint32_t>(CalleeCallsiteMap[OldID]);
+
+    Ins.setNameValue(&Caller);
+    Ins.setIndex(NewID);
+    return true;
+  };
+
+  std::deque<BasicBlock *> Worklist;
+  DenseSet<const BasicBlock *> Seen;
+  // We will traverse the BBs starting from the callsite BB. The callsite BB
+  // will have at least a BB ID - maybe its own, and in any case the one coming
+  // from the cloned function's entry BB. The other BBs we'll start seeing from
+  // there on may or may not have BB IDs. BBs with IDs belonging to our caller
+  // are definitely not coming from the imported function and form a boundary
+  // past which we don't need to traverse anymore. BBs may have no
+  // instrumentation, in which case we'll traverse past them.
+  // An invariant we'll keep is that a BB will have at most 1 BB ID. For
+  // example, the callsite BB will delete the callee BB's instrumentation. This
+  // doesn't result in information loss: the entry BB of the caller will have
+  // the same count as the callsite's BB.
+  // At the end of this traversal, all the callee's instrumentation would be
+  // mapped into the caller's instrumentation index space. Some of the callee's
+  // counters may be deleted (as mentioned, this should result in no loss of
+  // information).
+  Worklist.push_back(StartBB);
+  while (!Worklist.empty()) {
+    auto *BB = Worklist.front();
+    Worklist.pop_front();
+    bool Changed = false;
+    auto *BBID = CtxProfAnalysis::getBBInstrumentation(*BB);
+    if (BBID) {
+      Changed |= RewriteInstrIfNeeded(*BBID);
+      // this may be the entryblock from the inlined callee, coming into a BB
+      // that didn't have instrumentation because of MST decisions. Let's make
+      // sure it's placed accordingly. This is a noop elsewhere.
+      BBID->moveBefore(&*BB->getFirstInsertionPt());
+    }
+    for (auto &I : llvm::make_early_inc_range(*BB)) {
+      if (auto *Inc = dyn_cast<InstrProfIncrementInst>(&I)) {
+        if (Inc != BBID) {
+          Inc->eraseFromParent();
+          Changed = true;
+        }
+      } else if (auto *CS = dyn_cast<InstrProfCallsite>(&I)) {
+        Changed |= RewriteCallsiteInsIfNeeded(*CS);
+      }
+    }
+    if (!BBID || Changed)
+      for (auto *Succ : successors(BB))
+        if (Seen.insert(Succ).second)
+          Worklist.push_back(Succ);
+  }
+  return {std::move(CalleeCounterMap), std::move(CalleeCallsiteMap)};
+}
+
+llvm::InlineResult llvm::InlineFunction(CallBase &CB, InlineFunctionInfo &IFI,
+                                        CtxProfAnalysis::Result &CtxProf,
+                                        bool MergeAttributes,
+                                        AAResults *CalleeAAR,
+                                        bool InsertLifetime,
+                                        Function *ForwardVarArgsTo) {
+  if (!CtxProf)
+    return InlineFunction(CB, IFI, MergeAttributes, CalleeAAR, InsertLifetime,
+                          ForwardVarArgsTo);
+
+  auto &Caller = *CB.getCaller();
+  auto &Callee = *CB.getCalledFunction();
+  auto *StartBB = CB.getParent();
+
+  // Get some preliminary data about the callsite before it might get inlined.
+  // Inlining shouldn't delete the callee, but it's cleaner (and low-cost) to
+  // get this data upfront and rely less on InlineFunction's behavior.
+  const auto CalleeGUID = AssignGUIDPass::getGUID(Callee);
+  auto *CallsiteIDIns = CtxProfAnalysis::getCallsiteInstrumentation(CB);
+  const auto CallsiteID =
+      static_cast<uint32_t>(CallsiteIDIns->getIndex()->getZExtValue());
+
+  const auto NrCalleeCounters = CtxProf.getNrCounters(Callee);
+  const auto NrCalleeCallsites = CtxProf.getNrCallsites(Callee);
+
+  auto Ret = InlineFunction(CB, IFI, MergeAttributes, CalleeAAR, InsertLifetime,
+                            ForwardVarArgsTo);
+  if (!Ret.isSuccess())
+    return Ret;
+
+  // Inlining succeeded, we don't need the instrumentation of the inlined
+  // callsite.
+  CallsiteIDIns->eraseFromParent();
+
+  // Assigning Maps and then capturing references into it in the lambda because
+  // captured structured bindings are a C++20 extension. We do also need a
+  // capture here, though.
+  const auto IndicesMaps = remapIndices(Caller, StartBB, CtxProf,
+                                        NrCalleeCounters, NrCalleeCallsites);
+  const uint32_t NewCountersSize = CtxProf.getNrCounters(Caller);
+
+  auto Updater = [&](PGOCtxProfContext &Ctx) {
+    assert(Ctx.guid() == AssignGUIDPass::getGUID(Caller));
+    const auto &[CalleeCounterMap, CalleeCallsiteMap] = IndicesMaps;
+    assert(
+        (Ctx.counters().size() +
+             llvm::count_if(CalleeCounterMap, [](auto V) { return V != -1; }) ==
+         NewCountersSize) &&
+        "The caller's counters size should have grown by the number of new "
+        "distinct counters inherited from the inlined callee.");
+    Ctx.resizeCounters(NewCountersSize);
+    // If the callsite wasn't exercised in this context, the value of the
+    // counters coming from it is 0 and so we're done.
+    auto CSIt = Ctx.callsites().find(CallsiteID);
+    if (CSIt == Ctx.callsites().end())
+      return;
+    auto CalleeCtxIt = CSIt->second.find(CalleeGUID);
+    // The callsite was exercised, but not with this callee (so presumably this
+    // is an indirect callsite). Again, we're done here.
+    if (CalleeCtxIt == CSIt->second.end())
+      return;
+
+    // Let's pull in the counter values and the subcontexts coming from the
+    // inlined callee.
+    auto &CalleeCtx = CalleeCtxIt->second;
+    assert(CalleeCtx.guid() == CalleeGUID);
+
+    for (auto I = 0U; I < CalleeCtx.counters().size(); ++I) {
+      const int64_t NewIndex = CalleeCounterMap[I];
+      if (NewIndex >= 0)
+        Ctx.counters()[NewIndex] = CalleeCtx.counters()[I];
+    }
+    for (auto &[I, OtherSet] : CalleeCtx.callsites()) {
+      const int64_t NewCSIdx = CalleeCallsiteMap[I];
+      if (NewCSIdx >= 0)
+        Ctx.ingestAllContexts(NewCSIdx, std::move(OtherSet));
+    }
+    // We know the traversal is preorder, so it wouldn't have yet looked at the
+    // sub-contexts of this context that it's currently visiting. Meaning, the
+    // erase below invalidates no iterators.
+    auto Deleted = Ctx.callsites().erase(CallsiteID);
+    assert(Deleted);
+    (void)Deleted;
+  };
+  CtxProf.update(Updater, &Caller);
+  return Ret;
+}
+
 /// This function inlines the called function into the basic block of the
 /// caller. This returns false if it is not possible to inline this call.
 /// The program is still in a well defined state if this occurs though.
diff --git a/llvm/test/Analysis/CtxProfAnalysis/inline.ll b/llvm/test/Analysis/CtxProfAnalysis/inline.ll
new file mode 100644
index 00000000000000..97158e3e0640cd
--- /dev/null
+++ b/llvm/test/Analysis/CtxProfAnalysis/inline.ll
@@ -0,0 +1,101 @@
+; RUN: rm -rf %t
+; RUN: split-file %s %t
+; RUN: llvm-ctxprof-util fromJSON --input=%t/profile.json --output=%t/profile.ctxprofdata
+
+; RUN: opt -passes='module-inline,print<ctx-prof-analysis>' %t/module.ll -S \
+; RUN:   -use-ctx-profile=%t/profile.ctxprofdata -ctx-profile-printer-level=json \
+; RUN:   -o - 2> %t/profile-final.txt | FileCheck %s
+; RUN: %python %S/json_equals.py %t/profile-final.txt %t/expected.json
+
+; CHECK-LABEL: @entrypoint
+; CHECK-LABEL: yes:
+; CHECK:         call void @llvm.instrprof.increment(ptr @entrypoint, i64 0, i32 3, i32 1)
+; CHECK-NEXT:    br label %loop.i
+; CHECK-LABEL:  loop.i:
+; CHECK-NEXT:    %indvar.i = phi i32 [ %indvar.next.i, %loop.i ], [ 0, %yes ]
+; CHECK-NEXT:    call void @llvm.instrprof.increment(ptr @entrypoint, i64 0, i32 2, i32 3)
+; CHECK-NEXT:    %b.i = add i32 %x, %indvar.i
+; CHECK-NEXT:    call void @llvm.instrprof.callsite(ptr @entrypoint, i64 0, i32 1, i32 2, ptr @b)
+; CHECK-NEXT:    %call3.i = call i32 @b() #1
+; CHECK-LABEL: no:
+; CHECK-NEXT:    call void @llvm.instrprof.increment(ptr @entrypoint, i64 0, i32 3, i32 2)
+; CHECK-NEXT:    call void @llvm.instrprof.callsite(ptr @entrypoint, i64 0, i32 2, i32 1, ptr @a)
+; CHECK-NEXT:    %call2 = call i32 @a(i32 %x) #1
+; CHECK-NEXT:    br label %exit
+
+
+;--- module.ll
+define i32 @entrypoint(i32 %x) !guid !0 {
+  call void @llvm.instrprof.increment(ptr @entrypoint, i64 0, i32 3, i32 0)
+  %t = icmp eq i32 %x, 0
+  br i1 %t, label %yes, label %no
+yes:
+  call void @llvm.instrprof.increment(ptr @entrypoint, i64 0, i32 3, i32 1)
+  call void @llvm.instrprof.callsite(ptr @entrypoint, i64 0, i32 2, i32 0, ptr @a)
+  %call1 = call i32 @a(i32 %x) alwaysinline
+  br label %exit
+no:
+  call void @llvm.instrprof.increment(ptr @entrypoint, i64 0, i32 3, i32 2)
+  call void @llvm.instrprof.callsite(ptr @entrypoint, i64 0, i32 2, i32 1, ptr @a)
+  %call2 = call i32 @a(i32 %x) noinline
+  br label %exit
+exit:
+  %ret = phi i32 [%call1, %yes], [%call2, %no]
+  ret i32 %ret
+}
+
+define i32 @a(i32 %x) !guid !1 {
+entry:
+  call void @llvm.instrprof.increment(ptr @a, i64 0, i32 2, i32 0)
+  br label %loop
+loop:
+  %indvar = phi i32 [%indvar.next, %loop], [0, %entry]
+  call void @llvm.instrprof.increment(ptr @a, i64 0, i32 2, i32 1)
+  %b = add i32 %x, %indvar
+  call void @llvm.instrprof.callsite(ptr @a, i64 0, i32 1, i32 0, ptr @b)
+  %call3 = call i32 @b() noinline
+  %indvar.next = add i32 %indvar,...
[truncated]


github-actions bot commented Aug 28, 2024

✅ With the latest revision this PR passed the Python code formatter.

@mtrofin mtrofin force-pushed the users/mtrofin/08-22-_ctx_prof_add_inlining_support branch 2 times, most recently from b6279c7 to e11ff65 Compare August 28, 2024 22:11
Contributor

@teresajohnson teresajohnson left a comment


Haven't reviewed the meat of the PR yet, but one question below and also a request to add a short high level summary to the PR.

@@ -63,6 +63,16 @@ class PGOContextualProfile {
return getDefinedFunctionGUID(F) != 0;
}

uint32_t getNrCounters(const Function &F) const {
Contributor

nit: what's Nr short for? Maybe add a comment to describe it, or put the full name.

Member Author

Number. "No" is the other common abbreviation, but can be read as a negation. I could rename to "getNumberOfCounters".

Member Author

using "num".

auto Ret = InlineFunction(CB, IFI, MergeAttributes, CalleeAAR, InsertLifetime,
ForwardVarArgsTo);
if (!Ret.isSuccess())
return Ret;
Contributor

naive question, what are the scenarios for un-successful inlines if contextual profile shows a <caller, inlined-callee> pair?

Member Author

The contextual profile never shows a <caller, inlined-callee> pair. It shows <caller, callee> pairs. Whether or not the post-instrumentation inliner pass inlined functions is immaterial - because those would be decisions about inlining on an instrumented binary - and also transparent - because the instrumentation mechanism doesn't care how we get from a callsite instrumentation to the entry BB of a function.

All that is about the instrumented run and explains what the profile contains: call chains (with the observation above). Here we're doing profile use, though. There might be confusion about "why instrumentation...": we carry around instrumentation so we can use it as a marker - to tie counter values back to BBs later, when we flatten and lower the profile.

So at this stage, any inlining decision may be unsuccessful for whatever legality reason an inline could fail, irrespective of contextual profiling (we mostly try to avoid attempting to inline callsites that can't be inlined, but IIUC there are some cases that only the actual attempt uncovers).


; CHECK-LABEL: @entrypoint
; CHECK-LABEL: yes:
; CHECK: call void @llvm.instrprof.increment(ptr @entrypoint, i64 0, i32 3, i32 1)
Contributor

Naive question, is it intended that llvm.instrprof.increment intrinsic is present in a ctx profile use IR?

Member Author

yes. we keep the instrumentation until we flatten the profile and then lower the flattened profile to MD_prof. The instrumentation will serve as a means to associate counter values in the contextual profile to BBs.

uint32_t CalleeCallsites) {
// We'll allocate a new ID to imported callsite counters and callsites. We're
// using -1 to indicate a counter we delete. Most likely the entry, for
// example, will be deleted - we don't want 2 IDs in the same BB, and the
Contributor

maybe "...the callee entry BB, for example, would have been deleted when the callee was inlined"..

Member Author

no :) the callee entry BB is cloned into the callsite's BB. What the "deletion" is about is the ID - i.e. the instrumentation instruction (the "ID" is the index of the instrumentation instruction; it IDs the BB, basically) - and it's something remapIndices will likely do. It won't always do that: if the callsite BB didn't have an ID (because MST decided not to add instrumentation to that BB), then we have no reason to delete the instrumentation the callee entry BB comes with.

Member Author

I added the word "ID" there ("Most likely the entry ID, for example...")

}
for (auto &I : llvm::make_early_inc_range(*BB)) {
if (auto *Inc = dyn_cast<InstrProfIncrementInst>(&I)) {
if (Inc != BBID) {
Contributor

what does this case correspond to?

Member Author

Took me a bit to remember, so... added a comment, ptal.

Changed |= RewriteCallsiteInsIfNeeded(*CS);
}
}
if (!BBID || Changed)
Contributor

what does the !BBID case mean in practice - i.e. why does it mean we need to look at successors?

Member Author

The instrumentation instructions are added on MST edges (PGOInstrument...), so no BBID means "MST thought this is a likely cold BB". Which it could've thought in the callee, too, we don't know.

; RUN: -o - 2> %t/profile-final.txt | FileCheck %s
; RUN: %python %S/json_equals.py %t/profile-final.txt %t/expected.json

; CHECK-LABEL: @entrypoint
Contributor

some comments noting what we are looking for would be good

Member Author

@mtrofin mtrofin Aug 30, 2024

done, ptal

@mtrofin mtrofin force-pushed the users/mtrofin/08-22-_ctx_prof_add_inlining_support branch 5 times, most recently from aef887e to f2349d6 Compare September 3, 2024 19:30
Contributor

@teresajohnson teresajohnson left a comment

lgtm with some comment fixes

@mtrofin mtrofin force-pushed the users/mtrofin/08-22-_ctx_prof_add_inlining_support branch from f2349d6 to a6d5e9f Compare September 3, 2024 21:13
@mtrofin mtrofin force-pushed the users/mtrofin/08-22-_ctx_prof_add_inlining_support branch from a6d5e9f to e1ab7de Compare September 3, 2024 21:37
Member Author

mtrofin commented Sep 3, 2024

Merge activity

  • Sep 3, 7:12 PM EDT: @mtrofin started a stack merge that includes this pull request via Graphite.
  • Sep 3, 7:14 PM EDT: @mtrofin merged this pull request with Graphite.

@mtrofin mtrofin merged commit 3209766 into main Sep 3, 2024
8 checks passed
@mtrofin mtrofin deleted the users/mtrofin/08-22-_ctx_prof_add_inlining_support branch September 3, 2024 23:14
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Sep 4, 2024
reverts: 3209766 [ctx_prof] Add Inlining support (llvm#106154)

Matthew looking to reland base patch

Change-Id: I38c1a12d756af27dd7d748020c49b86caddd07e8
mtrofin added a commit that referenced this pull request Sep 6, 2024
Pass to flatten and lower the contextual profile to profile (i.e. `MD_prof`) metadata. This is expected to be used after all IPO transformations have happened.

Prior to lowering, the instrumentation is maintained during IPO and the contextual profile is kept in sync (see PRs #105469, #106154). Flattening (#104539) sums up the counters belonging to all of a function's context nodes.

We first propagate counter values (from the flattened profile) using the same propagation algorithm as `PGOUseFunc::populateCounters`, then map the edge values to `branch_weights`. Functions in the module that don't have an entry in the flattened profile are deemed cold, and any `MD_prof` metadata they may have is reset. The profile summary is also reset at this point.

Issue [#89287](#89287)
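For a rough intuition of the flattening step described in that commit, a conceptual sketch follows; the types are simplified stand-ins for illustration, not the actual PGOCtxProfContext API:

```c++
// Conceptual sketch only: walk a tree of contexts and accumulate, per function
// GUID, the element-wise sum of the counters from every context node belonging
// to that function.
#include <cstdint>
#include <map>
#include <vector>

struct CtxNode {
  uint64_t Guid;                                       // function this node belongs to
  std::vector<uint64_t> Counters;                      // one slot per instrumented BB
  std::map<uint32_t, std::vector<CtxNode>> Callsites;  // sub-contexts per callsite index
};

using FlatProfile = std::map<uint64_t, std::vector<uint64_t>>;

static void flatten(const CtxNode &Node, FlatProfile &Flat) {
  auto &Sum = Flat[Node.Guid];
  if (Sum.size() < Node.Counters.size())
    Sum.resize(Node.Counters.size(), 0);
  // Same GUID means the same counter index space, so a plain element-wise sum.
  for (size_t I = 0; I < Node.Counters.size(); ++I)
    Sum[I] += Node.Counters[I];
  for (const auto &[CSId, Subs] : Node.Callsites)
    for (const auto &Sub : Subs)
      flatten(Sub, Flat);
}
```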
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Sep 7, 2024
Add an overload of `InlineFunction` that updates the contextual profile. If there is no contextual profile, this overload is equivalent to the non-contextual profile variant.

Post-inlining, the update mainly consists of:
- making the PGO instrumentation of the callee "the caller's": the owner function (the "name" parameter of the instrumentation instructions) becomes the caller, and new index values are allocated for each of the callee's indices (this happens for both increment and callsite instrumentation instructions)
- in the contextual profile:
   - each context corresponding to the caller has its counters updated to incorporate the counters inherited from the callee at the inlined callsite. Counter values are copied as-is because no scaling is required since the profile is contextual.
   - the contexts of the callee (at the inlined callsite) are moved to the caller.
   - the callee context at the inlined callsite is deleted.

Change-Id: If4814bc9aeea2a55e597b93cb0ecb33aa73be01f
VitaNuo pushed a commit to VitaNuo/llvm-project that referenced this pull request Sep 12, 2024
Pass to flatten and lower the contextual profile to profile (i.e. `MD_prof`) metadata. This is expected to be used after all IPO transformations have happened.

Prior to lowering, the instrumentation is maintained during IPO and the contextual profile is kept in sync (see PRs llvm#105469, llvm#106154). Flattening (llvm#104539) sums up the counters belonging to all of a function's context nodes.

We first propagate counter values (from the flattened profile) using the same propagation algorithm as `PGOUseFunc::populateCounters`, then map the edge values to `branch_weights`. Functions in the module that don't have an entry in the flattened profile are deemed cold, and any `MD_prof` metadata they may have is reset. The profile summary is also reset at this point.

Issue [llvm#89287](llvm#89287)