Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BOLT] Match blocks with pseudo probes #99891

Merged
merged 65 commits into from
Nov 12, 2024

Conversation

shawbyoung
Copy link
Contributor

@shawbyoung shawbyoung commented Jul 22, 2024

Match inline trees first between profile and the binary: by GUID,
checksum, parent, and inline site for inlined functions. Map profile
probes to binary probes via matched inline tree nodes. Each binary probe
has an associated binary basic block. If all probes from one profile
basic block map to the same binary basic block, it’s an exact match,
otherwise the block is determined by majority vote and reported as loose
match.

Pseudo probe matching happens between exact hash matching and call/loose
matching.

Introduce ProbeMatchSpec - a mechanism to match probes belonging to
another binary function. For example, given functions foo and bar:

void foo() {
  bar();
}

profiled binary: bar is not inlined => have top-level function bar
new binary where the profile is applied to: bar is inlined into foo.

Currently, BOLT does 1:1 matching between profile functions and binary
functions based on the name. #100446 will extend this to N:M where
multiple profiles can be matched to one binary function (as in the
example above where binary function foo would use profiles for foo and
bar), and one profile can be matched to multiple binary functions (e.g.
if bar was inlined into multiple functions).

In this diff, ProbeMatchSpecs would only have one BinaryFunctionProfile
(existing name-based matching).

Test Plan: Added match-blocks-with-pseudo-probes.test

Performance test:

  • Setup:
    • Baseline no-BOLT: Clang with pseudo probes, ThinLTO + CSSPGO
      ([Clang][CMake] Add CSSPGO support to LLVM_BUILD_INSTRUMENTED #79942)
    • BOLT fresh: BOLTed Clang using fresh profile,
    • BOLT stale (hash): BOLTed Clang using stale profile (collected on
      Clang 10K commits back), -infer-stale-profile (hash+call block
      matching)
    • BOLT stale (+probe): BOLTed Clang using stale profile,
      -infer-stale-profile with -stale-matching-with-pseudo-probes
      (hash+call+pseudo probe block matching)
    • 2S Intel SKX Xeon 6138 with 40C/80T and 256GB RAM, using 20C/40T for
      build,
    • BOLT profiles are collected on Clang compiling large preprocessed
      C++ file.
  • Benchmark: building Clang (average of 5 runs), see driver in
    aaupov/llvm-devmtg-2022
  • Results, wall time, lower is better:
    • Baseline no-BOLT: 429.52 +- 2.61s,
    • BOLT stale (hash): 413.21 +- 2.19s,
    • BOLT stale (+probe): 409.69 +- 1.41s,
    • BOLT fresh: 384.50 +- 1.80s.

shawbyoung and others added 4 commits July 22, 2024 09:19
bolt/lib/Profile/StaleProfileMatching.cpp Outdated Show resolved Hide resolved
bolt/lib/Profile/StaleProfileMatching.cpp Outdated Show resolved Hide resolved
@shawbyoung
Copy link
Contributor Author

cc @WenleiHe @wlei-llvm

Created using spr 1.3.4
Created using spr 1.3.4
@shawbyoung shawbyoung marked this pull request as ready for review July 23, 2024 23:10
@llvmbot llvmbot added the BOLT label Jul 23, 2024
@llvmbot
Copy link
Member

llvmbot commented Jul 23, 2024

@llvm/pr-subscribers-bolt

Author: Shaw Young (shawbyoung)

Changes

Implemented pseudo probe block matching. When matched functions have
equal pseudo probe checksums, the indices of block pseudo probes are
used to match blocks following opcode and call hash block matching.

Test Plan: Added match-blocks-with-pseudo-probes.test.


Full diff: https://github.com/llvm/llvm-project/pull/99891.diff

4 Files Affected:

  • (modified) bolt/include/bolt/Core/BinaryContext.h (+8-4)
  • (modified) bolt/lib/Passes/BinaryPasses.cpp (+14-4)
  • (modified) bolt/lib/Profile/StaleProfileMatching.cpp (+170-15)
  • (added) bolt/test/X86/match-blocks-with-pseudo-probes.test (+62)
diff --git a/bolt/include/bolt/Core/BinaryContext.h b/bolt/include/bolt/Core/BinaryContext.h
index b3cf9f834cc08..39f2ac512d305 100644
--- a/bolt/include/bolt/Core/BinaryContext.h
+++ b/bolt/include/bolt/Core/BinaryContext.h
@@ -717,12 +717,16 @@ class BinaryContext {
     /// Stats for stale profile matching:
     ///   the total number of basic blocks in the profile
     uint32_t NumStaleBlocks{0};
-    ///   the number of matched basic blocks
-    uint32_t NumMatchedBlocks{0};
+    ///   the number of exactly matched basic blocks
+    uint32_t NumExactMatchedBlocks{0};
+    ///   the number of pseudo probe matched basic blocks
+    uint32_t NumPseudoProbeMatchedBlocks{0};
     ///   the total count of samples in the profile
     uint64_t StaleSampleCount{0};
-    ///   the count of matched samples
-    uint64_t MatchedSampleCount{0};
+    ///   the count of exactly matched samples
+    uint64_t ExactMatchedSampleCount{0};
+    ///   the count of pseudo probe matched samples
+    uint64_t PseudoProbeMatchedSampleCount{0};
     ///   the number of stale functions that have matching number of blocks in
     ///   the profile
     uint64_t NumStaleFuncsWithEqualBlockCount{0};
diff --git a/bolt/lib/Passes/BinaryPasses.cpp b/bolt/lib/Passes/BinaryPasses.cpp
index fa95ad7324ac1..b786f07a6a665 100644
--- a/bolt/lib/Passes/BinaryPasses.cpp
+++ b/bolt/lib/Passes/BinaryPasses.cpp
@@ -1519,10 +1519,20 @@ Error PrintProgramStats::runOnFunctions(BinaryContext &BC) {
         "BOLT-INFO: inference found an exact match for %.2f%% of basic blocks"
         " (%zu out of %zu stale) responsible for %.2f%% samples"
         " (%zu out of %zu stale)\n",
-        100.0 * BC.Stats.NumMatchedBlocks / BC.Stats.NumStaleBlocks,
-        BC.Stats.NumMatchedBlocks, BC.Stats.NumStaleBlocks,
-        100.0 * BC.Stats.MatchedSampleCount / BC.Stats.StaleSampleCount,
-        BC.Stats.MatchedSampleCount, BC.Stats.StaleSampleCount);
+        100.0 * BC.Stats.NumExactMatchedBlocks / BC.Stats.NumStaleBlocks,
+        BC.Stats.NumExactMatchedBlocks, BC.Stats.NumStaleBlocks,
+        100.0 * BC.Stats.ExactMatchedSampleCount / BC.Stats.StaleSampleCount,
+        BC.Stats.ExactMatchedSampleCount, BC.Stats.StaleSampleCount);
+    BC.outs() << format(
+        "BOLT-INFO: inference found a pseudo probe match for %.2f%% of basic "
+        "blocks"
+        " (%zu out of %zu stale) responsible for %.2f%% samples"
+        " (%zu out of %zu stale)\n",
+        100.0 * BC.Stats.NumPseudoProbeMatchedBlocks / BC.Stats.NumStaleBlocks,
+        BC.Stats.NumPseudoProbeMatchedBlocks, BC.Stats.NumStaleBlocks,
+        100.0 * BC.Stats.PseudoProbeMatchedSampleCount /
+            BC.Stats.StaleSampleCount,
+        BC.Stats.PseudoProbeMatchedSampleCount, BC.Stats.StaleSampleCount);
   }
 
   if (const uint64_t NumUnusedObjects = BC.getNumUnusedProfiledObjects()) {
diff --git a/bolt/lib/Profile/StaleProfileMatching.cpp b/bolt/lib/Profile/StaleProfileMatching.cpp
index cd6e96f7e2cf4..c621c29a0db83 100644
--- a/bolt/lib/Profile/StaleProfileMatching.cpp
+++ b/bolt/lib/Profile/StaleProfileMatching.cpp
@@ -45,6 +45,7 @@ namespace opts {
 
 extern cl::opt<bool> TimeRewrite;
 extern cl::OptionCategory BoltOptCategory;
+extern cl::opt<unsigned> Verbosity;
 
 cl::opt<bool>
     InferStaleProfile("infer-stale-profile",
@@ -194,7 +195,13 @@ class StaleMatcher {
   /// Initialize stale matcher.
   void init(const std::vector<FlowBlock *> &Blocks,
             const std::vector<BlendedBlockHash> &Hashes,
-            const std::vector<uint64_t> &CallHashes) {
+            const std::vector<uint64_t> &CallHashes,
+            const std::unordered_map<uint64_t,
+                                     std::vector<const MCDecodedPseudoProbe *>>
+                &IndexToBinaryPseudoProbes,
+            const std::unordered_map<const MCDecodedPseudoProbe *, FlowBlock *>
+                &BinaryPseudoProbeToBlock,
+            const uint64_t YamlBFGUID) {
     assert(Blocks.size() == Hashes.size() &&
            Hashes.size() == CallHashes.size() &&
            "incorrect matcher initialization");
@@ -205,14 +212,24 @@ class StaleMatcher {
       if (CallHashes[I])
         CallHashToBlocks[CallHashes[I]].push_back(
             std::make_pair(Hashes[I], Block));
+      this->Blocks.push_back(Block);
     }
+    this->IndexToBinaryPseudoProbes = IndexToBinaryPseudoProbes;
+    this->BinaryPseudoProbeToBlock = BinaryPseudoProbeToBlock;
+    this->YamlBFGUID = YamlBFGUID;
   }
 
   /// Find the most similar block for a given hash.
-  const FlowBlock *matchBlock(BlendedBlockHash BlendedHash,
-                              uint64_t CallHash) const {
+  const FlowBlock *
+  matchBlock(BlendedBlockHash BlendedHash, uint64_t CallHash,
+             const std::vector<yaml::bolt::PseudoProbeInfo> &PseudoProbes) {
     const FlowBlock *BestBlock = matchWithOpcodes(BlendedHash);
-    return BestBlock ? BestBlock : matchWithCalls(BlendedHash, CallHash);
+    if (BestBlock)
+      ++MatchedWithOpcodes;
+    BestBlock = BestBlock ? BestBlock : matchWithCalls(BlendedHash, CallHash);
+    return BestBlock || !YamlBFGUID
+               ? BestBlock
+               : matchWithPseudoProbes(BlendedHash, PseudoProbes);
   }
 
   /// Returns true if the two basic blocks (in the binary and in the profile)
@@ -223,10 +240,32 @@ class StaleMatcher {
     return Hash1.InstrHash == Hash2.InstrHash;
   }
 
+  /// Returns true if a profiled block was matched with its pseudo probe.
+  bool isPseudoProbeMatch(BlendedBlockHash YamlBBHash) {
+    return MatchedWithPseudoProbes.find(YamlBBHash.combine()) !=
+           MatchedWithPseudoProbes.end();
+  }
+
+  /// Returns the number of blocks matched with pseudo probes.
+  size_t getNumBlocksMatchedWithPseudoProbes() const {
+    return MatchedWithPseudoProbes.size();
+  }
+
+  /// Returns the number of blocks matched with opcodes.
+  size_t getNumBlocksMatchedWithOpcodes() const { return MatchedWithOpcodes; }
+
 private:
   using HashBlockPairType = std::pair<BlendedBlockHash, FlowBlock *>;
   std::unordered_map<uint16_t, std::vector<HashBlockPairType>> OpHashToBlocks;
   std::unordered_map<uint64_t, std::vector<HashBlockPairType>> CallHashToBlocks;
+  std::unordered_map<uint64_t, std::vector<const MCDecodedPseudoProbe *>>
+      IndexToBinaryPseudoProbes;
+  std::unordered_map<const MCDecodedPseudoProbe *, FlowBlock *>
+      BinaryPseudoProbeToBlock;
+  std::unordered_set<uint64_t> MatchedWithPseudoProbes;
+  std::vector<const FlowBlock *> Blocks;
+  uint64_t YamlBFGUID{0};
+  uint64_t MatchedWithOpcodes{0};
 
   // Uses OpcodeHash to find the most similar block for a given hash.
   const FlowBlock *matchWithOpcodes(BlendedBlockHash BlendedHash) const {
@@ -266,6 +305,68 @@ class StaleMatcher {
     }
     return BestBlock;
   }
+  // Uses pseudo probe information to attach the profile to the appropriate
+  // block.
+  const FlowBlock *matchWithPseudoProbes(
+      BlendedBlockHash BlendedHash,
+      const std::vector<yaml::bolt::PseudoProbeInfo> &PseudoProbes) {
+    // Searches for the pseudo probe attached to the matched function's block,
+    // ignoring pseudo probes attached to function calls and inlined functions'
+    // blocks.
+    if (opts::Verbosity >= 3)
+      outs() << "BOLT-INFO: attempting to match block with pseudo probes\n";
+
+    std::vector<const yaml::bolt::PseudoProbeInfo *> BlockPseudoProbes;
+    for (const auto &PseudoProbe : PseudoProbes) {
+      // Ensures that pseudo probe information belongs to the appropriate
+      // function and not an inlined function.
+      if (PseudoProbe.GUID != YamlBFGUID)
+        continue;
+      // Skips pseudo probes attached to function calls.
+      if (PseudoProbe.Type != static_cast<uint8_t>(PseudoProbeType::Block))
+        continue;
+
+      BlockPseudoProbes.push_back(&PseudoProbe);
+    }
+    // Returns nullptr if there is not a 1:1 mapping of the yaml block pseudo
+    // probe and binary pseudo probe.
+    if (BlockPseudoProbes.size() == 0) {
+      if (opts::Verbosity >= 3)
+        errs() << "BOLT-WARNING: no pseudo probes in profile block\n";
+      return nullptr;
+    }
+    if (BlockPseudoProbes.size() > 1) {
+      if (opts::Verbosity >= 3)
+        errs() << "BOLT-WARNING: more than 1 pseudo probes in profile block\n";
+      return nullptr;
+    }
+    uint64_t Index = BlockPseudoProbes[0]->Index;
+    if (Index > Blocks.size()) {
+      if (opts::Verbosity >= 3)
+        errs() << "BOLT-WARNING: invalid index block pseudo probe index\n";
+      return nullptr;
+    }
+    auto It = IndexToBinaryPseudoProbes.find(Index);
+    if (It == IndexToBinaryPseudoProbes.end()) {
+      if (opts::Verbosity >= 3)
+        errs() << "BOLT-WARNING: no block pseudo probes found within binary "
+                  "block at index\n";
+      return nullptr;
+    }
+    if (It->second.size() > 1) {
+      if (opts::Verbosity >= 3)
+        errs() << "BOLT-WARNING: more than 1 block pseudo probes in binary "
+                  "block at index\n";
+      return nullptr;
+    }
+    const MCDecodedPseudoProbe *BinaryPseudoProbe = It->second[0];
+    auto BinaryPseudoProbeIt = BinaryPseudoProbeToBlock.find(BinaryPseudoProbe);
+    assert(BinaryPseudoProbeIt != BinaryPseudoProbeToBlock.end() &&
+           "All binary pseudo probes should belong a binary basic block");
+
+    MatchedWithPseudoProbes.insert(BlendedHash.combine());
+    return BinaryPseudoProbeIt->second;
+  }
 };
 
 void BinaryFunction::computeBlockHashes(HashFunction HashFunction) const {
@@ -447,18 +548,22 @@ createFlowFunction(const BinaryFunction::BasicBlockOrderType &BlockOrder) {
 /// of the basic blocks in the binary, the count is "matched" to the block.
 /// Similarly, if both the source and the target of a count in the profile are
 /// matched to a jump in the binary, the count is recorded in CFG.
-size_t
-matchWeightsByHashes(BinaryContext &BC,
-                     const BinaryFunction::BasicBlockOrderType &BlockOrder,
-                     const yaml::bolt::BinaryFunctionProfile &YamlBF,
-                     FlowFunction &Func, HashFunction HashFunction,
-                     YAMLProfileReader::ProfileLookupMap &IdToYamlBF) {
+size_t matchWeightsByHashes(
+    BinaryContext &BC, const BinaryFunction::BasicBlockOrderType &BlockOrder,
+    const yaml::bolt::BinaryFunctionProfile &YamlBF, FlowFunction &Func,
+    HashFunction HashFunction, YAMLProfileReader::ProfileLookupMap &IdToYamlBF,
+    const BinaryFunction &BF) {
 
   assert(Func.Blocks.size() == BlockOrder.size() + 2);
 
   std::vector<uint64_t> CallHashes;
   std::vector<FlowBlock *> Blocks;
   std::vector<BlendedBlockHash> BlendedHashes;
+  std::unordered_map<uint64_t, std::vector<const MCDecodedPseudoProbe *>>
+      IndexToBinaryPseudoProbes;
+  std::unordered_map<const MCDecodedPseudoProbe *, FlowBlock *>
+      BinaryPseudoProbeToBlock;
+  const MCPseudoProbeDecoder *PseudoProbeDecoder = BC.getPseudoProbeDecoder();
   for (uint64_t I = 0; I < BlockOrder.size(); I++) {
     const BinaryBasicBlock *BB = BlockOrder[I];
     assert(BB->getHash() != 0 && "empty hash of BinaryBasicBlock");
@@ -478,11 +583,49 @@ matchWeightsByHashes(BinaryContext &BC,
     Blocks.push_back(&Func.Blocks[I + 1]);
     BlendedBlockHash BlendedHash(BB->getHash());
     BlendedHashes.push_back(BlendedHash);
+    if (PseudoProbeDecoder) {
+      const AddressProbesMap &ProbeMap =
+          PseudoProbeDecoder->getAddress2ProbesMap();
+      const uint64_t FuncAddr = BF.getAddress();
+      const std::pair<uint64_t, uint64_t> &BlockRange =
+          BB->getInputAddressRange();
+      const auto &BlockProbes =
+          llvm::make_range(ProbeMap.lower_bound(FuncAddr + BlockRange.first),
+                           ProbeMap.lower_bound(FuncAddr + BlockRange.second));
+      for (const auto &[_, Probes] : BlockProbes) {
+        for (const MCDecodedPseudoProbe &Probe : Probes) {
+          if (Probe.getInlineTreeNode()->hasInlineSite())
+            continue;
+          if (Probe.getType() != static_cast<uint8_t>(PseudoProbeType::Block))
+            continue;
+          IndexToBinaryPseudoProbes[Probe.getIndex()].push_back(&Probe);
+          BinaryPseudoProbeToBlock[&Probe] = Blocks[I];
+        }
+      }
+    }
+
     LLVM_DEBUG(dbgs() << "BB with index " << I << " has hash = "
                       << Twine::utohexstr(BB->getHash()) << "\n");
   }
+
+  uint64_t BFPseudoProbeDescHash = 0;
+  if (BF.getGUID() != 0) {
+    assert(PseudoProbeDecoder &&
+           "If BF has pseudo probe, BC should have a pseudo probe decoder");
+    auto &GUID2FuncDescMap = PseudoProbeDecoder->getGUID2FuncDescMap();
+    auto It = GUID2FuncDescMap.find(BF.getGUID());
+    if (It != GUID2FuncDescMap.end())
+      BFPseudoProbeDescHash = It->second.FuncHash;
+  }
+  uint64_t YamlBFGUID =
+      BFPseudoProbeDescHash && YamlBF.PseudoProbeDescHash &&
+              BFPseudoProbeDescHash == YamlBF.PseudoProbeDescHash
+          ? static_cast<uint64_t>(YamlBF.GUID)
+          : 0;
+
   StaleMatcher Matcher;
-  Matcher.init(Blocks, BlendedHashes, CallHashes);
+  Matcher.init(Blocks, BlendedHashes, CallHashes, IndexToBinaryPseudoProbes,
+               BinaryPseudoProbeToBlock, YamlBFGUID);
 
   // Index in yaml profile => corresponding (matched) block
   DenseMap<uint64_t, const FlowBlock *> MatchedBlocks;
@@ -502,7 +645,7 @@ matchWeightsByHashes(BinaryContext &BC,
       else
         llvm_unreachable("Unhandled HashFunction");
     }
-    MatchedBlock = Matcher.matchBlock(YamlHash, CallHash);
+    MatchedBlock = Matcher.matchBlock(YamlHash, CallHash, YamlBB.PseudoProbes);
     if (MatchedBlock == nullptr && YamlBB.Index == 0)
       MatchedBlock = Blocks[0];
     if (MatchedBlock != nullptr) {
@@ -516,9 +659,13 @@ matchWeightsByHashes(BinaryContext &BC,
                         << "\n");
       // Update matching stats accounting for the matched block.
       if (Matcher.isHighConfidenceMatch(BinHash, YamlHash)) {
-        ++BC.Stats.NumMatchedBlocks;
-        BC.Stats.MatchedSampleCount += YamlBB.ExecCount;
+        ++BC.Stats.NumExactMatchedBlocks;
+        BC.Stats.ExactMatchedSampleCount += YamlBB.ExecCount;
         LLVM_DEBUG(dbgs() << "  exact match\n");
+      } else if (Matcher.isPseudoProbeMatch(YamlHash)) {
+        ++BC.Stats.NumPseudoProbeMatchedBlocks;
+        BC.Stats.PseudoProbeMatchedSampleCount += YamlBB.ExecCount;
+        LLVM_DEBUG(dbgs() << "  pseudo probe match\n");
       } else {
         LLVM_DEBUG(dbgs() << "  loose match\n");
       }
@@ -535,6 +682,14 @@ matchWeightsByHashes(BinaryContext &BC,
     BC.Stats.StaleSampleCount += YamlBB.ExecCount;
   }
 
+  if (opts::Verbosity >= 2) {
+    outs() << "BOLT-INFO: "
+           << Matcher.getNumBlocksMatchedWithPseudoProbes()
+           << " blocks matched with pseudo probes\n"
+           << "BOLT-INFO: " << Matcher.getNumBlocksMatchedWithOpcodes()
+           << " blocks matched with opcodes\n";
+  }
+
   // Match jumps from the profile to the jumps from CFG
   std::vector<uint64_t> OutWeight(Func.Blocks.size(), 0);
   std::vector<uint64_t> InWeight(Func.Blocks.size(), 0);
@@ -828,7 +983,7 @@ bool YAMLProfileReader::inferStaleProfile(
   // Match as many block/jump counts from the stale profile as possible
   size_t MatchedBlocks =
       matchWeightsByHashes(BF.getBinaryContext(), BlockOrder, YamlBF, Func,
-                           YamlBP.Header.HashFunction, IdToYamLBF);
+                           YamlBP.Header.HashFunction, IdToYamLBF, BF);
 
   // Adjust the flow function by marking unreachable blocks Unlikely so that
   // they don't get any counts assigned.
diff --git a/bolt/test/X86/match-blocks-with-pseudo-probes.test b/bolt/test/X86/match-blocks-with-pseudo-probes.test
new file mode 100644
index 0000000000000..83f9c20f31ba6
--- /dev/null
+++ b/bolt/test/X86/match-blocks-with-pseudo-probes.test
@@ -0,0 +1,62 @@
+## Tests stale block matching with pseudo probes.
+
+# REQUIRES: system-linux
+# RUN: split-file %s %t
+# RUN: llvm-mc -filetype=obj -triple x86_64-unknown-unknown %t/main.s -o %t.o
+# RUN: %clang %cflags %t.o -o %t.exe -Wl,-q -nostdlib
+# RUN: llvm-bolt %t.exe -o %t.out --data %t/yaml -v=2 \
+# RUN:   --print-cfg --funcs=main --profile-ignore-hash=0 --infer-stale-profile 2>&1 | FileCheck %s
+
+# CHECK: BOLT-INFO: inference found a pseudo probe match for 100.00% of basic blocks (1 out of 1 stale) responsible for -nan% samples (0 out of 0 stale)
+
+#--- main.s
+ .text
+  .globl  main                            # -- Begin function main
+  .p2align        4, 0x90
+  .type   main,@function
+main:                                   # @main
+# %bb.0:
+  pushq   %rbp
+  movq    %rsp, %rbp
+  movl    $0, -4(%rbp)
+  .pseudoprobe    15822663052811949562 1 0 0 main
+  xorl    %eax, %eax
+  popq    %rbp
+  retq
+.Lfunc_end0:
+  .size   main, .Lfunc_end0-main
+                                  # -- End function
+  .section        .pseudo_probe_desc,"",@progbits
+  .quad   -2624081020897602054
+  .quad   4294967295
+  .byte   4
+  .ascii  "main"
+  .ident  "clang version 17.0.6 (CentOS 17.0.6-5.el9)"
+  .section        ".note.GNU-stack","",@progbits
+  .addrsig
+
+#--- yaml
+---
+header:
+  profile-version: 1
+  binary-name:     'match-blocks-with-pseudo-probes.s.tmp.exe'
+  binary-build-id: '<unknown>'
+  profile-flags:   [ lbr ]
+  profile-origin:  branch profile reader
+  profile-events:  ''
+  dfs-order:       false
+  hash-func:       xxh3
+functions:
+  - name:                   main
+    fid:                    0
+    hash:                   0x0000000000000001
+    exec:                   1
+    nblocks:                6
+    guid:                   0xDB956436E78DD5FA
+    pseudo_probe_desc_hash: 4294967295    #lookup in code in a second
+    blocks:
+      - bid:             1
+        hash:            0x0000000000000001
+        insns:           1
+        succ:            [ { bid: 3, cnt: 1} ]
+        pseudo_probes: [ { guid: 0xDB956436E78DD5FA, id: 1, type: 0 } ]

Copy link
Contributor

@aaupov aaupov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good with a couple of nits.

bolt/lib/Profile/StaleProfileMatching.cpp Outdated Show resolved Hide resolved
bolt/lib/Profile/StaleProfileMatching.cpp Outdated Show resolved Hide resolved
bolt/lib/Profile/StaleProfileMatching.cpp Outdated Show resolved Hide resolved
shawbyoung and others added 2 commits July 24, 2024 07:42
Created using spr 1.3.4
Created using spr 1.3.4

[skip ci]
Created using spr 1.3.4
VitaNuo pushed a commit to VitaNuo/llvm-project that referenced this pull request Sep 12, 2024
The flag currently controls writing of probe information in YAML
profile. llvm#99891 adds a separate flag to use probe information for stale
profile matching. Thus `profile-use-pseudo-probes` becomes a misnomer
and `profile-write-pseudo-probes` better captures the intent.

Reviewers: maksfb, WenleiHe, ayermolo, rafaelauler, dcci

Reviewed By: rafaelauler

Pull Request: llvm#106364
aaupov added a commit that referenced this pull request Sep 13, 2024
Add probe inline tree information to YAML profile, at function level:
- function GUID,
- checksum,
- parent node id,
- call site in the parent.

This information is used for pseudo probe block matching (#99891).

The encoding adds/changes probe information in multiple levels of
YAML profile:
- BinaryProfile: add pseudo_probe_desc with GUIDs and Hashes, which
  permits deduplication of data:
  - many GUIDs are duplicate as the same callee is commonly inlined
    into multiple callers,
  - hashes are also very repetitive, especially for functions with
    low block counts.
- FunctionProfile: add inline tree (see above). Top-level function
  is included as root of function inline tree, which makes guid and
  pseudo_probe_desc_hash fields redundant.
- BlockProfile: densely-encoded block probe information:
  - probes reference their containing inline tree node,
  - separate lists for block, call, indirect call probes,
  - block probe encoding is specialized: ids are encoded as bitset
    in uint64_t. If only block probe with id=1 is present, it's
    encoded as implicit entry (id=0, omitted).
  - inline tree nodes with identical probes share probe description
    where node indices are combined into a list.

On top of #107970, profile with new probe encoding has the following
characteristics (profile for a large binary):

- Profile without probe information: 33MB, 3.8MB compressed (baseline).
- Profile with inline tree information: 92MB, 14MB compressed.

Profile processing time (YAML parsing, inference, attaching steps):
- profile without pseudo probes: 5s,
- profile with pseudo probes, without pseudo probe matching: 11s,
- with pseudo probe matching: 12.5s.

Test Plan: updated pseudoprobe-decoding-inline.test

Reviewers: wlei-llvm, ayermolo, rafaelauler, dcci, maksfb

Reviewed By: wlei-llvm, rafaelauler

Pull Request: #107137
bolt/include/bolt/Core/BinaryContext.h Outdated Show resolved Hide resolved
bolt/lib/Profile/YAMLProfileReader.cpp Outdated Show resolved Hide resolved
bolt/lib/Profile/StaleProfileMatching.cpp Outdated Show resolved Hide resolved
bolt/include/bolt/Profile/YAMLProfileReader.h Outdated Show resolved Hide resolved
bolt/lib/Profile/StaleProfileMatching.cpp Outdated Show resolved Hide resolved
bolt/include/bolt/Profile/YAMLProfileReader.h Outdated Show resolved Hide resolved
bolt/include/bolt/Profile/YAMLProfileReader.h Show resolved Hide resolved
bolt/include/bolt/Profile/YAMLProfileReader.h Outdated Show resolved Hide resolved
bolt/lib/Profile/StaleProfileMatching.cpp Outdated Show resolved Hide resolved
@aaupov
Copy link
Contributor

aaupov commented Oct 29, 2024

Ping @wlei-llvm

@wlei-llvm
Copy link
Contributor

Ping @wlei-llvm

Sorry for the delay. The new version addressed my last comment (with just minor nits). However, I didn't fully follow the new features related to ProbeMatchSpecs stuffs. Could you add more descriptions to the diff summary? Or if it’s not a lot of work, could we split it into two patches? We could commit the first part, and I will review the second part separately.

bolt/lib/Profile/YAMLProfileReader.cpp Outdated Show resolved Hide resolved
bolt/include/bolt/Core/BinaryContext.h Outdated Show resolved Hide resolved
@wlei-llvm
Copy link
Contributor

Ping @wlei-llvm

Sorry for the delay. The new version addressed my last comment (with just minor nits). However, I didn't fully follow the new features related to ProbeMatchSpecs stuffs. Could you add more descriptions to the diff summary? Or if it’s not a lot of work, could we split it into two patches? We could commit the first part, and I will review the second part separately.

NVM, I think now I get what ProbeMatchSpecs does, it's a vector because a function can have multiple sections(function split)

bolt/lib/Profile/StaleProfileMatching.cpp Outdated Show resolved Hide resolved
bolt/lib/Profile/StaleProfileMatching.cpp Outdated Show resolved Hide resolved
@aaupov
Copy link
Contributor

aaupov commented Nov 8, 2024

Ping @wlei-llvm

Sorry for the delay. The new version addressed my last comment (with just minor nits). However, I didn't fully follow the new features related to ProbeMatchSpecs stuffs. Could you add more descriptions to the diff summary? Or if it’s not a lot of work, could we split it into two patches? We could commit the first part, and I will review the second part separately.

NVM, I think now I get what ProbeMatchSpecs does, it's a vector because a function can have multiple sections(function split)

Thank you for reviewing and sorry for the delay from my end, was busy with profile quality work.

ProbeMatchSpecs is a mechanism to match probes belonging to another binary function. I'm going to utilize it in probe-based function matching (#100446). For example:

source function:

void foo() {
  bar();
}

profiled binary: bar is not inlined => have top-level function bar
new binary (where the profile is applied to): bar is inlined into foo.

Right now, BOLT does 1:1 matching between profile functions and binary functions based on the name. #100446 will extend this to N:M where multiple profiles can be matched to one binary function (as in the example above where binary function foo would use profiles for foo and bar), and one profile can be matched to multiple binary functions (eg if bar was inlined into multiple functions).

In this diff, ProbeMatchSpecs would only have one BinaryFunctionProfile (existing name-based matching).

Created using spr 1.3.4

[skip ci]
Created using spr 1.3.4
aaupov added a commit that referenced this pull request Nov 8, 2024
YAML function profiles have sparse function IDs, assigned from
sequential function IDs from profiled binary. For example, for one large
binary, YAML profile has 15K functions, but the highest ID is ~600K,
close to number of functions in the profiled binary.

In `matchProfileToFunction`, `YamlProfileToFunction` vector was resized
to match function ID, which entails a 40X overcommit. Change the type of
`YamlProfileToFunction` to DenseMap to reduce memory utilization.

#99891 makes use of it for profile lookup associated with a given binary
function.
Created using spr 1.3.4

[skip ci]
Created using spr 1.3.4
@aaupov aaupov changed the base branch from users/shawbyoung/spr/main.bolt-match-blocks-with-pseudo-probes to main November 8, 2024 23:26
@wlei-llvm
Copy link
Contributor

Ping @wlei-llvm

Sorry for the delay. The new version addressed my last comment (with just minor nits). However, I didn't fully follow the new features related to ProbeMatchSpecs stuffs. Could you add more descriptions to the diff summary? Or if it’s not a lot of work, could we split it into two patches? We could commit the first part, and I will review the second part separately.

NVM, I think now I get what ProbeMatchSpecs does, it's a vector because a function can have multiple sections(function split)

Thank you for reviewing and sorry for the delay from my end, was busy with profile quality work.

ProbeMatchSpecs is a mechanism to match probes belonging to another binary function. I'm going to utilize it in probe-based function matching (#100446). For example:

source function:

void foo() {
  bar();
}

profiled binary: bar is not inlined => have top-level function bar new binary (where the profile is applied to): bar is inlined into foo.

Right now, BOLT does 1:1 matching between profile functions and binary functions based on the name. #100446 will extend this to N:M where multiple profiles can be matched to one binary function (as in the example above where binary function foo would use profiles for foo and bar), and one profile can be matched to multiple binary functions (eg if bar was inlined into multiple functions).

In this diff, ProbeMatchSpecs would only have one BinaryFunctionProfile (existing name-based matching).

Thanks for the explanation! I was confused of the use of ProbeMatchSpecs, it would be great to add more descriptions in the diff summary or somewhere in the comments, or in the another patch when it's used(IMO if we add a feature, but we doesn't use it in the patch, it should be in the future patch when it's used)

Copy link
Contributor

@wlei-llvm wlei-llvm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for addressing the comments!

@aaupov
Copy link
Contributor

aaupov commented Nov 9, 2024

Ping @wlei-llvm

Sorry for the delay. The new version addressed my last comment (with just minor nits). However, I didn't fully follow the new features related to ProbeMatchSpecs stuffs. Could you add more descriptions to the diff summary? Or if it’s not a lot of work, could we split it into two patches? We could commit the first part, and I will review the second part separately.

NVM, I think now I get what ProbeMatchSpecs does, it's a vector because a function can have multiple sections(function split)

Thank you for reviewing and sorry for the delay from my end, was busy with profile quality work.
ProbeMatchSpecs is a mechanism to match probes belonging to another binary function. I'm going to utilize it in probe-based function matching (#100446). For example:
source function:

void foo() {
  bar();
}

profiled binary: bar is not inlined => have top-level function bar new binary (where the profile is applied to): bar is inlined into foo.
Right now, BOLT does 1:1 matching between profile functions and binary functions based on the name. #100446 will extend this to N:M where multiple profiles can be matched to one binary function (as in the example above where binary function foo would use profiles for foo and bar), and one profile can be matched to multiple binary functions (eg if bar was inlined into multiple functions).
In this diff, ProbeMatchSpecs would only have one BinaryFunctionProfile (existing name-based matching).

Thanks for the explanation! I was confused of the use of ProbeMatchSpecs, it would be great to add more descriptions in the diff summary or somewhere in the comments, or in the another patch when it's used(IMO if we add a feature, but we doesn't use it in the patch, it should be in the future patch when it's used)

Thank you. Added description of ProbeMatchSpecs to the summary.
I decided to introduce it in this diff because there's tight coupling between probe-based block matching and function matching. This way, probe-based function matching would not need to change how block matching works.

@wlei-llvm
Copy link
Contributor

Ping @wlei-llvm

Sorry for the delay. The new version addressed my last comment (with just minor nits). However, I didn't fully follow the new features related to ProbeMatchSpecs stuffs. Could you add more descriptions to the diff summary? Or if it’s not a lot of work, could we split it into two patches? We could commit the first part, and I will review the second part separately.

NVM, I think now I get what ProbeMatchSpecs does, it's a vector because a function can have multiple sections(function split)

Thank you for reviewing and sorry for the delay from my end, was busy with profile quality work.
ProbeMatchSpecs is a mechanism to match probes belonging to another binary function. I'm going to utilize it in probe-based function matching (#100446). For example:
source function:

void foo() {
  bar();
}

profiled binary: bar is not inlined => have top-level function bar new binary (where the profile is applied to): bar is inlined into foo.
Right now, BOLT does 1:1 matching between profile functions and binary functions based on the name. #100446 will extend this to N:M where multiple profiles can be matched to one binary function (as in the example above where binary function foo would use profiles for foo and bar), and one profile can be matched to multiple binary functions (eg if bar was inlined into multiple functions).
In this diff, ProbeMatchSpecs would only have one BinaryFunctionProfile (existing name-based matching).

Thanks for the explanation! I was confused of the use of ProbeMatchSpecs, it would be great to add more descriptions in the diff summary or somewhere in the comments, or in the another patch when it's used(IMO if we add a feature, but we doesn't use it in the patch, it should be in the future patch when it's used)

Thank you. Added description of ProbeMatchSpecs to the summary. I decided to introduce it in this diff because there's tight coupling between probe-based block matching and function matching. This way, probe-based function matching would not need to change how block matching works.

Makes sense, thanks for clarifying!

@aaupov aaupov merged commit 9a9af0a into main Nov 12, 2024
9 of 10 checks passed
@aaupov aaupov deleted the users/shawbyoung/spr/bolt-match-blocks-with-pseudo-probes branch November 12, 2024 15:21
@kazutakahirata
Copy link
Contributor

I've fixed a couple of warnings from this PR with 06e0869. Thanks!

Groverkss pushed a commit to iree-org/llvm-project that referenced this pull request Nov 15, 2024
YAML function profiles have sparse function IDs, assigned from
sequential function IDs from profiled binary. For example, for one large
binary, YAML profile has 15K functions, but the highest ID is ~600K,
close to number of functions in the profiled binary.

In `matchProfileToFunction`, `YamlProfileToFunction` vector was resized
to match function ID, which entails a 40X overcommit. Change the type of
`YamlProfileToFunction` to DenseMap to reduce memory utilization.

llvm#99891 makes use of it for profile lookup associated with a given binary
function.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants