Skip to content

Commit

Permalink
[SamplePGO] Add a cutoff for number of profile matching anchors (#95542)
Browse files Browse the repository at this point in the history
The algorithm added by PR #87375 can be potentially quadratic in the
number of anchors. This is almost never a problem because normally
functions have a reasonable number of function calls.

However, in some rare cases of auto-generated code we observed very
large functions that trigger quadratic behaviour here (resulting in
>130GB of peak heap memory usage for clang). Let's add a knob for
controlling the max number of callsites in a function above which stale
profile matching won't be performed.
  • Loading branch information
amharc authored Jun 17, 2024
1 parent 3a2e442 commit 21ba91c
Show file tree
Hide file tree
Showing 2 changed files with 19 additions and 0 deletions.
16 changes: 16 additions & 0 deletions llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@
#include "llvm/Transforms/IPO/SampleProfileMatcher.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/MDBuilder.h"
#include "llvm/Support/CommandLine.h"

using namespace llvm;
using namespace sampleprof;
Expand All @@ -24,6 +25,11 @@ extern cl::opt<bool> SalvageStaleProfile;
extern cl::opt<bool> PersistProfileStaleness;
extern cl::opt<bool> ReportProfileStaleness;

static cl::opt<unsigned> SalvageStaleProfileMaxCallsites(
"salvage-stale-profile-max-callsites", cl::Hidden, cl::init(UINT_MAX),
cl::desc("The maximum number of callsites in a function, above which stale "
"profile matching will be skipped."));

void SampleProfileMatcher::findIRAnchors(const Function &F,
AnchorMap &IRAnchors) {
// For inlined code, recover the original callsite and callee by finding the
Expand Down Expand Up @@ -300,6 +306,16 @@ void SampleProfileMatcher::runStaleProfileMatching(
if (FilteredIRAnchorsList.empty() || FilteredProfileAnchorList.empty())
return;

if (FilteredIRAnchorsList.size() > SalvageStaleProfileMaxCallsites ||
FilteredProfileAnchorList.size() > SalvageStaleProfileMaxCallsites) {
LLVM_DEBUG(dbgs() << "Skip stale profile matching for " << F.getName()
<< " because the number of callsites in the IR is "
<< FilteredIRAnchorsList.size()
<< " and in the profile is "
<< FilteredProfileAnchorList.size() << "\n");
return;
}

// Match the callsite anchors by finding the longest common subsequence
// between IR and profile. Note that we need to use IR anchor as base(A side)
// to align with the order of IRToProfileLocationMap.
Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
; REQUIRES: x86_64-linux
; REQUIRES: asserts
; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/pseudo-probe-stale-profile-matching-LCS.prof --salvage-stale-profile -S --debug-only=sample-profile,sample-profile-matcher,sample-profile-impl 2>&1 | FileCheck %s
; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/pseudo-probe-stale-profile-matching-LCS.prof --salvage-stale-profile -S --debug-only=sample-profile,sample-profile-matcher,sample-profile-impl --salvage-stale-profile-max-callsites=6 2>&1 | FileCheck %s -check-prefix=CHECK-MAX-CALLSITES

; CHECK: Run stale profile matching for test_direct_call
; CHECK: Location is matched from 1 to 1
Expand All @@ -27,6 +28,8 @@
; CHECK: Callsite with callee:unknown.indirect.callee is matched from 9 to 6
; CHECK: Callsite with callee:C is matched from 10 to 7

; CHECK-MAX-CALLSITES: Skip stale profile matching for test_direct_call
; CHECK-MAX-CALLSITES-NOT: Skip stale profile matching for test_indirect_call

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
Expand Down

0 comments on commit 21ba91c

Please sign in to comment.