[BOLT] Match blocks with pseudo probes #99891

Merged
merged 65 commits into from
Nov 12, 2024
Changes from 61 commits
5a5991a
[𝘀𝗽𝗿] initial version
shawbyoung Jul 22, 2024
94ffb45
[𝘀𝗽𝗿] changes to main this commit is based on
aaupov Jul 22, 2024
0274f69
Changed assignment of profiles with pseudo probe index
shawbyoung Jul 22, 2024
7e3d8d6
Edit test and assert
shawbyoung Jul 22, 2024
780a07e
Fixed failing asserts, pruned prospective pseudo probes for matching
shawbyoung Jul 23, 2024
1638ac1
Added logging for pseudo probe block matching
shawbyoung Jul 23, 2024
144716b
Changed pseudo probe matching failure logging to v=3
shawbyoung Jul 23, 2024
2934710
More logging
shawbyoung Jul 23, 2024
b74fc8b
Logging blocks matched with opcodes
shawbyoung Jul 23, 2024
c38fb98
Updated test
shawbyoung Jul 23, 2024
b2a3ca7
Name changes in prep for inlined block pseudo probe block matching
shawbyoung Jul 23, 2024
2eb7bf2
Rm unnecessary Blocks vec in StaleMatcher
shawbyoung Jul 24, 2024
212bd00
Improved matched block counting
shawbyoung Jul 24, 2024
eb6dfb9
Removed comment from test
shawbyoung Jul 24, 2024
16b5cfb
Added comments and check for null YamlBFGUID in StaleMatcher before P…
shawbyoung Jul 24, 2024
799f20c
Omitting braces in one line if
shawbyoung Jul 24, 2024
1e9af7f
[𝘀𝗽𝗿] changes introduced through rebase
aaupov Jul 24, 2024
e3599d4
Pseudo probe matching now is triggered by flag
shawbyoung Jul 24, 2024
33f1b2a
Omit unnecessary braces
shawbyoung Jul 24, 2024
9889f89
Change initialization of index -> probe and probe -> block mappings
shawbyoung Jul 24, 2024
022c517
Formatting
shawbyoung Jul 24, 2024
5109893
Comments
shawbyoung Jul 24, 2024
5bf4220
Changed std ADTs to LLVM
shawbyoung Jul 24, 2024
f1179b1
In matchWithPseudoProbe, hoist BlocksPseudoProbes.size(), added loggi…
shawbyoung Jul 24, 2024
5076bab
A more beautiful helper function for matchWithPseudoProbes
shawbyoung Jul 24, 2024
4f2f642
Added inlined block pseudo probe matching
shawbyoung Jul 25, 2024
327eb81
Added flag to trigger pseudo probe block matching
shawbyoung Jul 25, 2024
37793aa
Added flag for pseudo probe block matching
shawbyoung Jul 25, 2024
ba00b22
Set flag init val, changed std::string to StringRef
shawbyoung Jul 25, 2024
5e47249
[BOLT][NFC] Add timers for MetadataManager invocations
aaupov Aug 1, 2024
3902eff
[MC][NFC] Count pseudo probes and function records
aaupov Aug 26, 2024
d20d4d6
[MC][NFC] Drop unused MCDecodedPseudoProbeInlineTree::ChildrenToProce…
aaupov Jul 25, 2024
a857d32
[profgen][NFC] Pass parameter as const_ref
aaupov Aug 11, 2024
cddea6a
[MC][NFC] Statically allocate storage for decoded pseudo probes and f…
aaupov Aug 26, 2024
9746055
[MC][profgen][NFC] Expand auto for MCDecodedPseudoProbe
aaupov Aug 11, 2024
3dcef48
[MC][NFC] Reduce Address2ProbesMap size
aaupov Aug 26, 2024
ba149d9
[MC][NFC] Use vector for GUIDProbeFunctionMap
aaupov Aug 26, 2024
c35e8ac
buildAddress2ProbeMap timers
aaupov Jul 30, 2024
1c469cf
[BOLT][NFC] Rename profile-use-pseudo-probes
aaupov Aug 27, 2024
97f8101
[BOLT][NFCI] Strip suffix in getLTOCommonName
aaupov Aug 27, 2024
e0a705e
[BOLT] Only parse probes for profiled functions in profile-write-pseu…
aaupov Aug 26, 2024
66fe5d5
[BOLT] Add pseudo probe inline tree to YAML profile
aaupov Aug 31, 2024
36197b1
Reworked block probe matching
aaupov Aug 28, 2024
bfa0afc
[𝘀𝗽𝗿] changes introduced through rebase
aaupov Sep 4, 2024
0f455d0
rebase
aaupov Sep 4, 2024
8fafc04
drop logIf
aaupov Sep 4, 2024
b1be6e6
[𝘀𝗽𝗿] changes introduced through rebase
aaupov Sep 10, 2024
4c5156c
Use new profile probe encoding
aaupov Sep 10, 2024
205c79c
[𝘀𝗽𝗿] changes introduced through rebase
aaupov Sep 10, 2024
544a6ad
Test fix
aaupov Sep 10, 2024
ee214d5
[𝘀𝗽𝗿] changes introduced through rebase
aaupov Sep 11, 2024
0bb4e3a
Memoize top-level GUID->InlineTree mapping, cuts inference time by ~30%
aaupov Sep 11, 2024
880bd37
[𝘀𝗽𝗿] changes introduced through rebase
aaupov Sep 11, 2024
2ba5591
clang-format
aaupov Sep 11, 2024
75d6229
[𝘀𝗽𝗿] changes introduced through rebase
aaupov Sep 12, 2024
3b4e3f4
Update with #107137 changes
aaupov Sep 12, 2024
41e1fa0
[𝘀𝗽𝗿] changes introduced through rebase
aaupov Sep 12, 2024
7ee82b6
Move matchInlineTrees into InlineTreeNodeMapTy
aaupov Sep 12, 2024
648f2bb
[𝘀𝗽𝗿] changes introduced through rebase
aaupov Sep 26, 2024
e8e1cb9
Add test with inline trees, address comments
aaupov Sep 26, 2024
ebd3acf
Allow null block participate in majority vote, improves run-time perf…
aaupov Sep 26, 2024
c84de42
[𝘀𝗽𝗿] changes introduced through rebase
aaupov Nov 8, 2024
e7bce6d
address comments
aaupov Nov 8, 2024
2502434
[𝘀𝗽𝗿] changes introduced through rebase
aaupov Nov 8, 2024
956bcf2
rebase
aaupov Nov 8, 2024
24 changes: 20 additions & 4 deletions bolt/include/bolt/Core/BinaryContext.h
@@ -722,12 +722,28 @@ class BinaryContext {
/// Stats for stale profile matching:
/// the total number of basic blocks in the profile
uint32_t NumStaleBlocks{0};
/// the number of matched basic blocks
uint32_t NumMatchedBlocks{0};
/// the number of exactly matched basic blocks
uint32_t NumExactMatchedBlocks{0};
/// the number of loosely matched basic blocks
uint32_t NumLooseMatchedBlocks{0};
/// the number of exactly pseudo probe matched basic blocks
uint32_t NumPseudoProbeExactMatchedBlocks{0};
/// the number of loosely pseudo probe matched basic blocks
uint32_t NumPseudoProbeLooseMatchedBlocks{0};
/// the number of call matched basic blocks
uint32_t NumCallMatchedBlocks{0};
/// the total count of samples in the profile
uint64_t StaleSampleCount{0};
/// the count of matched samples
uint64_t MatchedSampleCount{0};
/// the count of exactly matched samples
uint64_t ExactMatchedSampleCount{0};
/// the count of loosely matched samples
uint64_t LooseMatchedSampleCount{0};
/// the count of exactly pseudo probe matched samples
uint64_t PseudoProbeExactMatchedSampleCount{0};
/// the count of loosely pseudo probe matched samples
uint64_t PseudoProbeLooseMatchedSampleCount{0};
/// the count of call matched samples
uint64_t CallMatchedSampleCount{0};
/// the number of stale functions that have matching number of blocks in
/// the profile
uint64_t NumStaleFuncsWithEqualBlockCount{0};
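
The hunk above splits the single matched-block and matched-sample counters into one pair per match category. A minimal sketch of how such counters could be updated is shown below. `MatchKind`, `StaleMatchStats` and `record` are hypothetical names used only for illustration; the field names are taken from the diff, but the actual update sites in BOLT are not shown in this hunk.

```cpp
#include <cstdint>

// Hypothetical match categories, one per counter pair in the hunk above.
enum class MatchKind { Exact, Loose, PseudoProbeExact, PseudoProbeLoose, Call };

// Trimmed-down stand-in for the stale-matching fields of the stats struct.
struct StaleMatchStats {
  uint32_t NumExactMatchedBlocks{0};
  uint32_t NumLooseMatchedBlocks{0};
  uint32_t NumPseudoProbeExactMatchedBlocks{0};
  uint32_t NumPseudoProbeLooseMatchedBlocks{0};
  uint32_t NumCallMatchedBlocks{0};
  uint64_t ExactMatchedSampleCount{0};
  uint64_t LooseMatchedSampleCount{0};
  uint64_t PseudoProbeExactMatchedSampleCount{0};
  uint64_t PseudoProbeLooseMatchedSampleCount{0};
  uint64_t CallMatchedSampleCount{0};

  // Record one matched block and the number of samples attributed to it.
  void record(MatchKind Kind, uint64_t Samples) {
    switch (Kind) {
    case MatchKind::Exact:
      ++NumExactMatchedBlocks;
      ExactMatchedSampleCount += Samples;
      break;
    case MatchKind::Loose:
      ++NumLooseMatchedBlocks;
      LooseMatchedSampleCount += Samples;
      break;
    case MatchKind::PseudoProbeExact:
      ++NumPseudoProbeExactMatchedBlocks;
      PseudoProbeExactMatchedSampleCount += Samples;
      break;
    case MatchKind::PseudoProbeLoose:
      ++NumPseudoProbeLooseMatchedBlocks;
      PseudoProbeLooseMatchedSampleCount += Samples;
      break;
    case MatchKind::Call:
      ++NumCallMatchedBlocks;
      CallMatchedSampleCount += Samples;
      break;
    }
  }
};
```

Keeping one block counter and one sample counter per category lets each category be reported as a share of the same stale totals.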
3 changes: 3 additions & 0 deletions bolt/include/bolt/Profile/ProfileYAMLMapping.h
@@ -174,6 +174,9 @@ struct InlineTreeNode {
uint32_t CallSiteProbe;
// Index in PseudoProbeDesc.GUID, UINT32_MAX for same as previous (omitted)
uint32_t GUIDIndex;
// Decoded contents, ParentIndexDelta becomes absolute value.
uint64_t GUID;
uint64_t Hash;
bool operator==(const InlineTreeNode &) const { return false; }
};
} // end namespace bolt
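
The comments in this hunk describe two compact encodings: `GUIDIndex` uses `UINT32_MAX` to mean "same as the previous node", and `ParentIndexDelta` is turned into an absolute value when decoded. The sketch below illustrates one way such a decode pass could look. It is an assumption, not the PR's code: `Node` and `decodeInlineTree` are hypothetical, and the running-sum interpretation of the delta is a guess based only on the comment text.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for yaml::bolt::InlineTreeNode.
struct Node {
  uint32_t ParentIndexDelta; // encoded: delta; after decoding: absolute index
  uint32_t GUIDIndex;        // UINT32_MAX means "same as previous node"
  uint64_t GUID;             // decoded GUID, filled in by the pass below
};

// Decode nodes in place. Assumption: parent deltas accumulate as a running
// sum over the nodes in order, and an omitted GUID index repeats the last
// explicit one. GUIDs plays the role of the PseudoProbeDesc.GUID table.
void decodeInlineTree(std::vector<Node> &Nodes,
                      const std::vector<uint64_t> &GUIDs) {
  uint32_t Parent = 0;
  uint32_t PrevGUIDIndex = 0;
  for (Node &N : Nodes) {
    Parent += N.ParentIndexDelta;
    N.ParentIndexDelta = Parent; // now holds the absolute parent index
    if (N.GUIDIndex == UINT32_MAX)
      N.GUIDIndex = PrevGUIDIndex;
    PrevGUIDIndex = N.GUIDIndex;
    N.GUID = GUIDs.at(N.GUIDIndex); // throws on an out-of-range index
  }
}
```

Both encodings favour small or omitted values for adjacent nodes, which is the usual motivation for delta and "same as previous" schemes in serialized profiles.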
78 changes: 74 additions & 4 deletions bolt/include/bolt/Profile/YAMLProfileReader.h
@@ -14,6 +14,8 @@
#include <unordered_set>

namespace llvm {
class MCDecodedPseudoProbeInlineTree;

namespace bolt {

class YAMLProfileReader : public ProfileReaderBase {
@@ -43,6 +45,9 @@ class YAMLProfileReader : public ProfileReaderBase {
using ProfileLookupMap =
DenseMap<uint32_t, yaml::bolt::BinaryFunctionProfile *>;

using GUIDInlineTreeMap =
std::unordered_map<uint64_t, const MCDecodedPseudoProbeInlineTree *>;

/// A class for matching binary functions to functions in the YAML profile.
/// First, a call graph is constructed for both profiled and binary functions.
/// Then functions are hashed based on the names of their callee/caller
@@ -96,6 +101,61 @@ class YAMLProfileReader : public ProfileReaderBase {
YamlBFAdjacencyMap;
};

// A class for matching inline tree nodes between profile and binary.
// Provides the mapping from profile inline tree node id to a
// corresponding binary MCDecodedPseudoProbeInlineTree node.
//
// The whole mapping process is the following:
//
// (profile) (binary)
// | blocks ^
// v |
// yaml::bolt::BinaryBasicBlockProfile ~= FlowBlock
// ||| probes ^ (majority vote)
// v ||| BBPseudoProbeToBlock
// yaml::bolt::PseudoProbeInfo MCDecodedPseudoProbe
// | InlineTreeIndex ^
// v | probe id
// [ profile node id (uint32_t) -> MCDecodedPseudoProbeInlineTree *]
// InlineTreeNodeMapTy
class InlineTreeNodeMapTy {
DenseMap<uint32_t, const MCDecodedPseudoProbeInlineTree *> Map;

void mapInlineTreeNode(uint32_t ProfileNodeIdx,
const MCDecodedPseudoProbeInlineTree *BinaryNode) {
auto Res = Map.try_emplace(ProfileNodeIdx, BinaryNode);
assert(Res.second &&
"Duplicate mapping from profile node index to binary inline tree");
(void)Res;
}

public:
/// Returns matched InlineTree * for a given profile inline_tree_id.
const MCDecodedPseudoProbeInlineTree *
getInlineTreeNode(uint32_t ProfileInlineTreeNodeId) const {
auto It = Map.find(ProfileInlineTreeNodeId);
if (It == Map.end())
return nullptr;
return It->second;
}

// Match up \p YamlInlineTree with binary inline tree rooted at \p Root.
// Return the number of matched nodes.
//
// This function populates the mapping from profile inline tree node id to a
// corresponding binary MCDecodedPseudoProbeInlineTree node.
size_t matchInlineTrees(
const MCPseudoProbeDecoder &Decoder,
const std::vector<yaml::bolt::InlineTreeNode> &YamlInlineTree,
const MCDecodedPseudoProbeInlineTree *Root);
};

// Partial probe matching specification: matched inline tree and corresponding
// BinaryFunctionProfile
using ProbeMatchSpec =
std::pair<InlineTreeNodeMapTy,
std::reference_wrapper<yaml::bolt::BinaryFunctionProfile>>;

private:
/// Adjustments for basic samples profiles (without LBR).
bool NormalizeByInsnCount{false};
@@ -105,7 +165,7 @@ class YAMLProfileReader : public ProfileReaderBase {
yaml::bolt::BinaryProfile YamlBP;

/// Map a function ID from a YAML profile to a BinaryFunction object.
std::vector<BinaryFunction *> YamlProfileToFunction;
DenseMap<uint32_t, BinaryFunction *> YamlProfileToFunction;

using FunctionSet = std::unordered_set<const BinaryFunction *>;
/// To keep track of functions that have a matched profile before the profile
@@ -129,6 +189,13 @@ class YAMLProfileReader : public ProfileReaderBase {
/// BinaryFunction pointers indexed by YamlBP functions.
std::vector<BinaryFunction *> ProfileBFs;

// Pseudo probe function GUID to inline tree node
GUIDInlineTreeMap TopLevelGUIDToInlineTree;

// Mapping from a binary function to its partial match specification
// (YAML profile and its inline tree mapping to binary).
DenseMap<BinaryFunction *, std::vector<ProbeMatchSpec>> BFToProbeMatchSpecs;

/// Populate \p Function profile with the one supplied in YAML format.
bool parseFunctionProfile(BinaryFunction &Function,
const yaml::bolt::BinaryFunctionProfile &YamlBF);
@@ -139,7 +206,8 @@ class YAMLProfileReader : public ProfileReaderBase {

/// Infer function profile from stale data (collected on older binaries).
bool inferStaleProfile(BinaryFunction &Function,
const yaml::bolt::BinaryFunctionProfile &YamlBF);
const yaml::bolt::BinaryFunctionProfile &YamlBF,
const ArrayRef<ProbeMatchSpec> ProbeMatchSpecs);

/// Initialize maps for profile matching.
void buildNameMaps(BinaryContext &BC);
@@ -156,14 +224,16 @@ class YAMLProfileReader : public ProfileReaderBase {
/// Matches functions using the call graph.
size_t matchWithCallGraph(BinaryContext &BC);

/// Matches functions using pseudo probes.
/// Populates BF->partial probe match spec map.
size_t matchWithPseudoProbes(BinaryContext &BC);

/// Matches functions with similarly named profiled functions.
size_t matchWithNameSimilarity(BinaryContext &BC);

/// Update matched YAML -> BinaryFunction pair.
void matchProfileToFunction(yaml::bolt::BinaryFunctionProfile &YamlBF,
BinaryFunction &BF) {
if (YamlBF.Id >= YamlProfileToFunction.size())
YamlProfileToFunction.resize(YamlBF.Id + 1);
YamlProfileToFunction[YamlBF.Id] = &BF;
YamlBF.Used = true;
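
The mapping diagram in the `InlineTreeNodeMapTy` comment says that several probes of one profile block are resolved to a single binary block by a majority vote, and one commit in this PR lets a null block take part in that vote. A minimal sketch of such a vote follows. `Block` and `majorityVote` are hypothetical names, the sketch picks the most frequent candidate (a plurality), and the tie-breaking rule is an assumption; the PR's actual implementation is not part of the hunks shown here.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the binary-side block type (FlowBlock in the PR).
struct Block {
  int Id;
};

// Each probe votes for the block it maps to; nullptr is a valid vote meaning
// "this probe maps to no block". Returns the candidate with the most votes.
// On a tie, the candidate that reached the top count first wins.
const Block *majorityVote(const std::vector<const Block *> &Votes) {
  std::unordered_map<const Block *, std::size_t> Counts;
  const Block *Best = nullptr;
  std::size_t BestCount = 0;
  for (const Block *B : Votes) {
    std::size_t Count = ++Counts[B];
    if (Count > BestCount) {
      BestCount = Count;
      Best = B;
    }
  }
  return Best;
}
```

Counting null votes means a profile block whose probes mostly map to nothing is left unmatched rather than being assigned to whichever block a minority of probes points at.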

46 changes: 42 additions & 4 deletions bolt/lib/Passes/BinaryPasses.cpp
@@ -1519,10 +1519,48 @@ Error PrintProgramStats::runOnFunctions(BinaryContext &BC) {
"BOLT-INFO: inference found an exact match for %.2f%% of basic blocks"
" (%zu out of %zu stale) responsible for %.2f%% samples"
" (%zu out of %zu stale)\n",
100.0 * BC.Stats.NumMatchedBlocks / BC.Stats.NumStaleBlocks,
BC.Stats.NumMatchedBlocks, BC.Stats.NumStaleBlocks,
100.0 * BC.Stats.MatchedSampleCount / BC.Stats.StaleSampleCount,
BC.Stats.MatchedSampleCount, BC.Stats.StaleSampleCount);
100.0 * BC.Stats.NumExactMatchedBlocks / BC.Stats.NumStaleBlocks,
BC.Stats.NumExactMatchedBlocks, BC.Stats.NumStaleBlocks,
100.0 * BC.Stats.ExactMatchedSampleCount / BC.Stats.StaleSampleCount,
BC.Stats.ExactMatchedSampleCount, BC.Stats.StaleSampleCount);
BC.outs() << format(
"BOLT-INFO: inference found an exact pseudo probe match for %.2f%% of "
"basic blocks (%zu out of %zu stale) responsible for %.2f%% samples"
" (%zu out of %zu stale)\n",
100.0 * BC.Stats.NumPseudoProbeExactMatchedBlocks /
BC.Stats.NumStaleBlocks,
BC.Stats.NumPseudoProbeExactMatchedBlocks, BC.Stats.NumStaleBlocks,
100.0 * BC.Stats.PseudoProbeExactMatchedSampleCount /
BC.Stats.StaleSampleCount,
BC.Stats.PseudoProbeExactMatchedSampleCount, BC.Stats.StaleSampleCount);
BC.outs() << format(
"BOLT-INFO: inference found a loose pseudo probe match for %.2f%% of "
"basic blocks (%zu out of %zu stale) responsible for %.2f%% samples"
" (%zu out of %zu stale)\n",
100.0 * BC.Stats.NumPseudoProbeLooseMatchedBlocks /
BC.Stats.NumStaleBlocks,
BC.Stats.NumPseudoProbeLooseMatchedBlocks, BC.Stats.NumStaleBlocks,
100.0 * BC.Stats.PseudoProbeLooseMatchedSampleCount /
BC.Stats.StaleSampleCount,
BC.Stats.PseudoProbeLooseMatchedSampleCount, BC.Stats.StaleSampleCount);
BC.outs() << format(
"BOLT-INFO: inference found a call match for %.2f%% of basic "
"blocks"
" (%zu out of %zu stale) responsible for %.2f%% samples"
" (%zu out of %zu stale)\n",
100.0 * BC.Stats.NumCallMatchedBlocks / BC.Stats.NumStaleBlocks,
BC.Stats.NumCallMatchedBlocks, BC.Stats.NumStaleBlocks,
100.0 * BC.Stats.CallMatchedSampleCount / BC.Stats.StaleSampleCount,
BC.Stats.CallMatchedSampleCount, BC.Stats.StaleSampleCount);
BC.outs() << format(
"BOLT-INFO: inference found a loose match for %.2f%% of basic "
"blocks"
" (%zu out of %zu stale) responsible for %.2f%% samples"
" (%zu out of %zu stale)\n",
100.0 * BC.Stats.NumLooseMatchedBlocks / BC.Stats.NumStaleBlocks,
BC.Stats.NumLooseMatchedBlocks, BC.Stats.NumStaleBlocks,
100.0 * BC.Stats.LooseMatchedSampleCount / BC.Stats.StaleSampleCount,
BC.Stats.LooseMatchedSampleCount, BC.Stats.StaleSampleCount);
}

if (const uint64_t NumUnusedObjects = BC.getNumUnusedProfiledObjects()) {
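
The five report lines in this hunk share one shape: a share of stale blocks and a share of stale samples per match category. As an illustration only, the repeated formatting could be factored into a helper along the following lines. `formatMatchLine` is a hypothetical function, not part of the PR, and it uses `std::snprintf` with `PRIu64` rather than `llvm::format`; the zero-total guard is likewise an assumption of this sketch.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper producing one "BOLT-INFO: inference found ..." line.
// Kind is the phrase for the category, e.g. "an exact" or "a loose".
std::string formatMatchLine(const char *Kind, uint64_t Blocks,
                            uint64_t StaleBlocks, uint64_t Samples,
                            uint64_t StaleSamples) {
  // Report 0% instead of dividing by zero when a total is empty.
  auto Percent = [](uint64_t Part, uint64_t Total) {
    return Total ? 100.0 * Part / Total : 0.0;
  };
  char Buf[256];
  std::snprintf(Buf, sizeof(Buf),
                "BOLT-INFO: inference found %s match for %.2f%% of basic "
                "blocks (%" PRIu64 " out of %" PRIu64
                " stale) responsible for %.2f%% samples (%" PRIu64
                " out of %" PRIu64 " stale)\n",
                Kind, Percent(Blocks, StaleBlocks), Blocks, StaleBlocks,
                Percent(Samples, StaleSamples), Samples, StaleSamples);
  return Buf;
}
```

For example, `formatMatchLine("an exact", 1, 4, 50, 200)` yields a line reporting 25.00% of blocks and 25.00% of samples.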