-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SampleFDO] Improve stale profile matching by diff algorithm #87375
Conversation
|
||
// The basic greedy version of Myers's algorithm. Refer to page 6 of the | ||
// original paper. | ||
DiffResult shortestEdit(const std::vector<Anchor> &A, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: perhaps name it 'longestCommonSequence' to convey the intention? 'shortestEdit' is implementation detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, I was thinking to use shortestEdit
to match the return type which is a set of "edit"(insertion, deletion...), but agreed that LCS is clearer in the context of stale profile matching.
@llvm/pr-subscribers-llvm-transforms Author: Lei Wang (wlei-llvm) ChangesThis change improves the matching algorithm by using the diff algorithm, the current matching algorithm only processes the callsites grouped by the same name functions, it doesn't consider the order relationships between different name functions, this sometimes fails to handle this ambiguous anchor case. For example.
The Full diff: https://github.com/llvm/llvm-project/pull/87375.diff 4 Files Affected:
diff --git a/llvm/include/llvm/Transforms/IPO/SampleProfileMatcher.h b/llvm/include/llvm/Transforms/IPO/SampleProfileMatcher.h
index 7ae6194da7c9cc..5b56638c13344b 100644
--- a/llvm/include/llvm/Transforms/IPO/SampleProfileMatcher.h
+++ b/llvm/include/llvm/Transforms/IPO/SampleProfileMatcher.h
@@ -19,6 +19,57 @@
namespace llvm {
+// Callsite location based matching anchor.
+struct Anchor {
+ LineLocation Loc;
+ FunctionId FuncId;
+
+ Anchor(const LineLocation &Loc, const FunctionId &FuncId)
+ : Loc(Loc), FuncId(FuncId) {}
+ bool operator==(const Anchor &Other) const {
+ return this->FuncId == Other.FuncId;
+ }
+};
+
+// This class implements the Myers diff algorithm used for stale profile
+// matching. The algorithm provides a simple and efficient way to find the
+// Longest Common Subsequence(LCS) or the Shortest Edit Script(SES) of two
+// sequences. For more details, refer to the paper 'An O(ND) Difference
+// Algorithm and Its Variations' by Eugene W. Myers.
+// In the scenario of profile fuzzy matching, the two sequences are the IR
+// callsite anchors and profile callsite anchors. The subsequence equivalent
+// parts from the resulting SES are used to remap the IR locations to the
+// profile locations.
+class MyersDiff {
+public:
+ struct DiffResult {
+ LocToLocMap EqualLocations;
+#ifndef NDEBUG
+ // New IR locations that are inserted in the new version.
+ std::vector<LineLocation> Insertions;
+ // Old Profile locations that are deleted in the new version.
+ std::vector<LineLocation> Deletions;
+#endif
+ void addEqualLocations(const LineLocation &IRLoc,
+ const LineLocation &ProfLoc) {
+ EqualLocations.insert({IRLoc, ProfLoc});
+ }
+#ifndef NDEBUG
+ void addInsertion(const LineLocation &IRLoc) {
+ Insertions.push_back(IRLoc);
+ }
+ void addDeletion(const LineLocation &ProfLoc) {
+ Deletions.push_back(ProfLoc);
+ }
+#endif
+ };
+
+ // The basic greedy version of Myers's algorithm. Refer to page 6 of the
+ // original paper.
+ DiffResult longestCommonSequence(const std::vector<Anchor> &A,
+ const std::vector<Anchor> &B) const;
+};
+
// Sample profile matching - fuzzy match.
class SampleProfileMatcher {
Module &M;
@@ -27,8 +78,8 @@ class SampleProfileMatcher {
const ThinOrFullLTOPhase LTOPhase;
SampleProfileMap FlattenedProfiles;
// For each function, the matcher generates a map, of which each entry is a
- // mapping from the source location of current build to the source location in
- // the profile.
+ // mapping from the source location of current build to the source location
+ // in the profile.
StringMap<LocToLocMap> FuncMappings;
// Match state for an anchor/callsite.
@@ -143,6 +194,10 @@ class SampleProfileMatcher {
}
void distributeIRToProfileLocationMap();
void distributeIRToProfileLocationMap(FunctionSamples &FS);
+ void matchNonAnchorAndWriteResults(
+ const LocToLocMap &AnchorMatchings,
+ const std::map<LineLocation, StringRef> &IRAnchors,
+ LocToLocMap &IRToProfileLocationMap);
void runStaleProfileMatching(
const Function &F, const std::map<LineLocation, StringRef> &IRAnchors,
const std::map<LineLocation, std::unordered_set<FunctionId>>
diff --git a/llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp b/llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp
index 1ca89e0091daff..de81450f0da252 100644
--- a/llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp
+++ b/llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp
@@ -122,15 +122,153 @@ void SampleProfileMatcher::findProfileAnchors(
}
}
+MyersDiff::DiffResult
+MyersDiff::longestCommonSequence(const std::vector<Anchor> &A,
+ const std::vector<Anchor> &B) const {
+ int32_t N = A.size(), M = B.size(), Max = N + M;
+ auto Index = [&](int32_t I) { return I + Max; };
+
+ DiffResult Diff;
+ if (Max == 0)
+ return Diff;
+
+ // Backtrack the SES result.
+ auto Backtrack = [&](const std::vector<std::vector<int32_t>> &Trace,
+ const std::vector<Anchor> &A,
+ const std::vector<Anchor> &B) {
+ int32_t X = N, Y = M;
+ for (int32_t D = Trace.size() - 1; X > 0 || Y > 0; D--) {
+ const auto &P = Trace[D];
+ int32_t K = X - Y;
+ int32_t PrevK = K;
+ if (K == -D || (K != D && P[Index(K - 1)] < P[Index(K + 1)]))
+ PrevK = K + 1;
+ else
+ PrevK = K - 1;
+
+ int32_t PrevX = P[Index(PrevK)];
+ int32_t PrevY = PrevX - PrevK;
+ while (X > PrevX && Y > PrevY) {
+ X--;
+ Y--;
+ Diff.addEqualLocations(A[X].Loc, B[Y].Loc);
+ }
+
+ if (D == 0)
+ break;
+
+ if (Y == PrevY) {
+ X--;
+#ifndef NDEBUG
+ Diff.addInsertion(A[X].Loc);
+#endif
+ } else if (X == PrevX) {
+ Y--;
+#ifndef NDEBUG
+ Diff.addDeletion(B[Y].Loc);
+#endif
+ }
+ X = PrevX;
+ Y = PrevY;
+ }
+ };
+
+ // The greedy LCS/SES algorithm.
+ std::vector<int32_t> V(2 * Max + 1, -1);
+ V[Index(1)] = 0;
+ std::vector<std::vector<int32_t>> Trace;
+ for (int32_t D = 0; D <= Max; D++) {
+ Trace.push_back(V);
+ for (int32_t K = -D; K <= D; K += 2) {
+ int32_t X = 0, Y = 0;
+ if (K == -D || (K != D && V[Index(K - 1)] < V[Index(K + 1)]))
+ X = V[Index(K + 1)];
+ else
+ X = V[Index(K - 1)] + 1;
+ Y = X - K;
+ while (X < N && Y < M && A[X] == B[Y])
+ X++, Y++;
+
+ V[Index(K)] = X;
+
+ if (X >= N && Y >= M) {
+ // Length of an SES is D.
+ Backtrack(Trace, A, B);
+ return Diff;
+ }
+ }
+ }
+ // Length of an SES is greater than Max.
+ return Diff;
+}
+
+void SampleProfileMatcher::matchNonAnchorAndWriteResults(
+ const LocToLocMap &AnchorMatchings,
+ const std::map<LineLocation, StringRef> &IRAnchors,
+ LocToLocMap &IRToProfileLocationMap) {
+ auto InsertMatching = [&](const LineLocation &From, const LineLocation &To) {
+ // Skip the unchanged location mapping to save memory.
+ if (From != To)
+ IRToProfileLocationMap.insert({From, To});
+ };
+
+ // Use function's beginning location as the initial anchor.
+ int32_t LocationDelta = 0;
+ SmallVector<LineLocation> LastMatchedNonAnchors;
+ for (const auto &IR : IRAnchors) {
+ const auto &Loc = IR.first;
+ [[maybe_unused]] StringRef CalleeName = IR.second;
+ bool IsMatchedAnchor = false;
+
+ // Match the anchor location in lexical order.
+ auto R = AnchorMatchings.find(Loc);
+ if (R != AnchorMatchings.end()) {
+ const auto &Candidate = R->second;
+ InsertMatching(Loc, Candidate);
+ LLVM_DEBUG(dbgs() << "Callsite with callee:" << CalleeName
+ << " is matched from " << Loc << " to " << Candidate
+ << "\n");
+ LocationDelta = Candidate.LineOffset - Loc.LineOffset;
+
+ // Match backwards for non-anchor locations.
+ // The locations in LastMatchedNonAnchors have been matched forwards
+ // based on the previous anchor, spilt it evenly and overwrite the
+ // second half based on the current anchor.
+ for (size_t I = (LastMatchedNonAnchors.size() + 1) / 2;
+ I < LastMatchedNonAnchors.size(); I++) {
+ const auto &L = LastMatchedNonAnchors[I];
+ uint32_t CandidateLineOffset = L.LineOffset + LocationDelta;
+ LineLocation Candidate(CandidateLineOffset, L.Discriminator);
+ InsertMatching(L, Candidate);
+ LLVM_DEBUG(dbgs() << "Location is rematched backwards from " << L
+ << " to " << Candidate << "\n");
+ }
+
+ IsMatchedAnchor = true;
+ LastMatchedNonAnchors.clear();
+ }
+
+ // Match forwards for non-anchor locations.
+ if (!IsMatchedAnchor) {
+ uint32_t CandidateLineOffset = Loc.LineOffset + LocationDelta;
+ LineLocation Candidate(CandidateLineOffset, Loc.Discriminator);
+ InsertMatching(Loc, Candidate);
+ LLVM_DEBUG(dbgs() << "Location is matched from " << Loc << " to "
+ << Candidate << "\n");
+ LastMatchedNonAnchors.emplace_back(Loc);
+ }
+ }
+}
+
// Call target name anchor based profile fuzzy matching.
// Input:
// For IR locations, the anchor is the callee name of direct callsite; For
// profile locations, it's the call target name for BodySamples or inlinee's
// profile name for CallsiteSamples.
// Matching heuristic:
-// First match all the anchors in lexical order, then split the non-anchor
-// locations between the two anchors evenly, first half are matched based on the
-// start anchor, second half are matched based on the end anchor.
+// First match all the anchors using the diff algorithm, then split the
+// non-anchor locations between the two anchors evenly, first half are matched
+// based on the start anchor, second half are matched based on the end anchor.
// For example, given:
// IR locations: [1, 2(foo), 3, 5, 6(bar), 7]
// Profile locations: [1, 2, 3(foo), 4, 7, 8(bar), 9]
@@ -149,77 +287,42 @@ void SampleProfileMatcher::runStaleProfileMatching(
assert(IRToProfileLocationMap.empty() &&
"Run stale profile matching only once per function");
- std::unordered_map<FunctionId, std::set<LineLocation>> CalleeToCallsitesMap;
+ std::vector<Anchor> ProfileCallsiteAnchors;
for (const auto &I : ProfileAnchors) {
const auto &Loc = I.first;
const auto &Callees = I.second;
// Filter out possible indirect calls, use direct callee name as anchor.
if (Callees.size() == 1) {
- FunctionId CalleeName = *Callees.begin();
- const auto &Candidates = CalleeToCallsitesMap.try_emplace(
- CalleeName, std::set<LineLocation>());
- Candidates.first->second.insert(Loc);
+ auto CalleeName = *Callees.begin();
+ ProfileCallsiteAnchors.emplace_back(Loc, CalleeName);
+ } else if (Callees.size() > 1) {
+ ProfileCallsiteAnchors.emplace_back(Loc,
+ FunctionId(UnknownIndirectCallee));
}
}
- auto InsertMatching = [&](const LineLocation &From, const LineLocation &To) {
- // Skip the unchanged location mapping to save memory.
- if (From != To)
- IRToProfileLocationMap.insert({From, To});
- };
-
- // Use function's beginning location as the initial anchor.
- int32_t LocationDelta = 0;
- SmallVector<LineLocation> LastMatchedNonAnchors;
+ std::vector<Anchor> IRCallsiteAnchors;
+ for (const auto &I : IRAnchors) {
+ const auto &Loc = I.first;
+ const auto &CalleeName = I.second;
+ if (CalleeName.empty())
+ continue;
+ IRCallsiteAnchors.emplace_back(Loc, FunctionId(CalleeName));
+ }
- for (const auto &IR : IRAnchors) {
- const auto &Loc = IR.first;
- auto CalleeName = IR.second;
- bool IsMatchedAnchor = false;
- // Match the anchor location in lexical order.
- if (!CalleeName.empty()) {
- auto CandidateAnchors =
- CalleeToCallsitesMap.find(getRepInFormat(CalleeName));
- if (CandidateAnchors != CalleeToCallsitesMap.end() &&
- !CandidateAnchors->second.empty()) {
- auto CI = CandidateAnchors->second.begin();
- const auto Candidate = *CI;
- CandidateAnchors->second.erase(CI);
- InsertMatching(Loc, Candidate);
- LLVM_DEBUG(dbgs() << "Callsite with callee:" << CalleeName
- << " is matched from " << Loc << " to " << Candidate
- << "\n");
- LocationDelta = Candidate.LineOffset - Loc.LineOffset;
-
- // Match backwards for non-anchor locations.
- // The locations in LastMatchedNonAnchors have been matched forwards
- // based on the previous anchor, spilt it evenly and overwrite the
- // second half based on the current anchor.
- for (size_t I = (LastMatchedNonAnchors.size() + 1) / 2;
- I < LastMatchedNonAnchors.size(); I++) {
- const auto &L = LastMatchedNonAnchors[I];
- uint32_t CandidateLineOffset = L.LineOffset + LocationDelta;
- LineLocation Candidate(CandidateLineOffset, L.Discriminator);
- InsertMatching(L, Candidate);
- LLVM_DEBUG(dbgs() << "Location is rematched backwards from " << L
- << " to " << Candidate << "\n");
- }
+ if (IRCallsiteAnchors.empty() || ProfileCallsiteAnchors.empty())
+ return;
- IsMatchedAnchor = true;
- LastMatchedNonAnchors.clear();
- }
- }
+ // Use the diff algorithm to find the LCS/SES, the resulting equal locations
+ // from IR to Profile are used as anchor to match other locations. Note that
+ // here use IR anchor as base(A) to align with the order of
+ // IRToProfileLocationMap.
+ MyersDiff Diff;
+ auto DiffRes =
+ Diff.longestCommonSequence(IRCallsiteAnchors, ProfileCallsiteAnchors);
- // Match forwards for non-anchor locations.
- if (!IsMatchedAnchor) {
- uint32_t CandidateLineOffset = Loc.LineOffset + LocationDelta;
- LineLocation Candidate(CandidateLineOffset, Loc.Discriminator);
- InsertMatching(Loc, Candidate);
- LLVM_DEBUG(dbgs() << "Location is matched from " << Loc << " to "
- << Candidate << "\n");
- LastMatchedNonAnchors.emplace_back(Loc);
- }
- }
+ matchNonAnchorAndWriteResults(DiffRes.EqualLocations, IRAnchors,
+ IRToProfileLocationMap);
}
void SampleProfileMatcher::runOnFunction(Function &F) {
diff --git a/llvm/unittests/Transforms/IPO/CMakeLists.txt b/llvm/unittests/Transforms/IPO/CMakeLists.txt
index 4e4372179b46c7..80d0eadf0cce09 100644
--- a/llvm/unittests/Transforms/IPO/CMakeLists.txt
+++ b/llvm/unittests/Transforms/IPO/CMakeLists.txt
@@ -3,6 +3,7 @@ set(LLVM_LINK_COMPONENTS
AsmParser
Core
IPO
+ ProfileData
Support
TargetParser
TransformUtils
@@ -13,6 +14,7 @@ add_llvm_unittest(IPOTests
WholeProgramDevirt.cpp
AttributorTest.cpp
FunctionSpecializationTest.cpp
+ SampleProfileMatcherTests.cpp
)
set_property(TARGET IPOTests PROPERTY FOLDER "Tests/UnitTests/TransformsTests")
diff --git a/llvm/unittests/Transforms/IPO/SampleProfileMatcherTests.cpp b/llvm/unittests/Transforms/IPO/SampleProfileMatcherTests.cpp
new file mode 100644
index 00000000000000..25d5c053d3ccbd
--- /dev/null
+++ b/llvm/unittests/Transforms/IPO/SampleProfileMatcherTests.cpp
@@ -0,0 +1,150 @@
+//===- SampleProfileMatcherTests.cpp - SampleProfileMatcher Unit Tests -----==//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Transforms/IPO/SampleProfileMatcher.h"
+#include "gtest/gtest.h"
+
+using namespace llvm;
+
+MyersDiff Diff;
+
+std::vector<Anchor>
+createAnchorsFromStrings(const std::vector<std::string> &SV) {
+ std::vector<Anchor> Anchors;
+ for (uint64_t I = 0; I < SV.size(); I++) {
+ Anchors.push_back(Anchor(LineLocation(I, 0), FunctionId(SV[I])));
+ }
+ return Anchors;
+}
+
+LocToLocMap
+createEqualLocations(const std::vector<std::pair<uint32_t, uint32_t>> &V) {
+ LocToLocMap LocMap;
+ for (auto P : V) {
+ LocMap.emplace(LineLocation(P.first, 0), LineLocation(P.second, 0));
+ }
+ return LocMap;
+}
+
+std::vector<LineLocation> createLocations(const std::vector<uint32_t> &V) {
+ std::vector<LineLocation> Locations;
+ for (auto I : V) {
+ Locations.emplace_back(LineLocation(I, 0));
+ }
+ return Locations;
+}
+
+TEST(SampleProfileMatcherTests, MyersDiffTest1) {
+
+ std::vector<Anchor> AnchorsA;
+ std::vector<Anchor> AnchorsB;
+ auto R = Diff.longestCommonSequence(AnchorsA, AnchorsB);
+ EXPECT_TRUE(R.EqualLocations.empty());
+#ifndef NDEBUG
+ EXPECT_TRUE(R.Deletions.empty());
+ EXPECT_TRUE(R.Insertions.empty());
+#endif
+}
+
+TEST(SampleProfileMatcherTests, MyersDiffTest2) {
+ std::vector<std::string> A({"a", "b", "c"});
+ std::vector<Anchor> AnchorsA = createAnchorsFromStrings(A);
+ std::vector<Anchor> AnchorsB;
+ auto R = Diff.longestCommonSequence(AnchorsA, AnchorsB);
+ EXPECT_TRUE(R.EqualLocations.empty());
+#ifndef NDEBUG
+ EXPECT_EQ(R.Insertions, createLocations(std::vector<uint32_t>({2, 1, 0})));
+ EXPECT_TRUE(R.Deletions.empty());
+#endif
+}
+
+TEST(SampleProfileMatcherTests, MyersDiffTest3) {
+
+ std::vector<Anchor> AnchorsA;
+ std::vector<std::string> B({"a", "b", "c"});
+ std::vector<Anchor> AnchorsB = createAnchorsFromStrings(B);
+ auto R = Diff.longestCommonSequence(AnchorsA, AnchorsB);
+ EXPECT_TRUE(R.EqualLocations.empty());
+#ifndef NDEBUG
+ EXPECT_TRUE(R.Insertions.empty());
+ EXPECT_EQ(R.Deletions, createLocations(std::vector<uint32_t>({2, 1, 0})));
+#endif
+}
+
+TEST(SampleProfileMatcherTests, MyersDiffTest4) {
+ std::vector<std::string> A({"a", "b", "c"});
+ std::vector<std::string> B({"a", "b", "c"});
+ std::vector<Anchor> AnchorsA = createAnchorsFromStrings(A);
+ std::vector<Anchor> AnchorsB = createAnchorsFromStrings(B);
+ LocToLocMap ExpectEqualLocations =
+ createEqualLocations({{0, 0}, {1, 1}, {2, 2}});
+ auto R = Diff.longestCommonSequence(AnchorsA, AnchorsB);
+ EXPECT_EQ(R.EqualLocations, ExpectEqualLocations);
+#ifndef NDEBUG
+ EXPECT_TRUE(R.Insertions.empty());
+ EXPECT_TRUE(R.Deletions.empty());
+#endif
+}
+
+TEST(SampleProfileMatcherTests, MyersDiffTest5) {
+ std::vector<std::string> A({"a", "b", "c"});
+ std::vector<std::string> B({"b", "c", "d"});
+ std::vector<Anchor> AnchorsA = createAnchorsFromStrings(A);
+ std::vector<Anchor> AnchorsB = createAnchorsFromStrings(B);
+ LocToLocMap ExpectEqualLocations = createEqualLocations({{1, 0}, {2, 1}});
+ auto R = Diff.longestCommonSequence(AnchorsA, AnchorsB);
+ EXPECT_EQ(R.EqualLocations, ExpectEqualLocations);
+#ifndef NDEBUG
+ EXPECT_EQ(R.Insertions, createLocations(std::vector<uint32_t>({0})));
+ EXPECT_EQ(R.Deletions, createLocations(std::vector<uint32_t>({2})));
+#endif
+}
+
+TEST(SampleProfileMatcherTests, MyersDiffTest6) {
+ std::vector<std::string> A({"a", "b", "d"});
+ std::vector<std::string> B({"a", "c", "d"});
+ std::vector<Anchor> AnchorsA = createAnchorsFromStrings(A);
+ std::vector<Anchor> AnchorsB = createAnchorsFromStrings(B);
+ LocToLocMap ExpectEqualLocations = createEqualLocations({{0, 0}, {2, 2}});
+ auto R = Diff.longestCommonSequence(AnchorsA, AnchorsB);
+ EXPECT_EQ(R.EqualLocations, ExpectEqualLocations);
+#ifndef NDEBUG
+ EXPECT_EQ(R.Insertions, createLocations(std::vector<uint32_t>({1})));
+ EXPECT_EQ(R.Deletions, createLocations(std::vector<uint32_t>({1})));
+#endif
+}
+
+TEST(SampleProfileMatcherTests, MyersDiffTest7) {
+ std::vector<std::string> A({"a", "b", "c", "a", "b", "b", "a"});
+ std::vector<std::string> B({"c", "b", "a", "b", "a", "c"});
+ std::vector<Anchor> AnchorsA = createAnchorsFromStrings(A);
+ std::vector<Anchor> AnchorsB = createAnchorsFromStrings(B);
+ LocToLocMap ExpectEqualLocations =
+ createEqualLocations({{2, 0}, {3, 2}, {4, 3}, {6, 4}});
+ auto R = Diff.longestCommonSequence(AnchorsA, AnchorsB);
+ EXPECT_EQ(R.EqualLocations, ExpectEqualLocations);
+#ifndef NDEBUG
+ EXPECT_EQ(R.Insertions, createLocations(std::vector<uint32_t>({5, 1, 0})));
+ EXPECT_EQ(R.Deletions, createLocations(std::vector<uint32_t>({5, 1})));
+#endif
+}
+
+TEST(SampleProfileMatcherTests, MyersDiffTest8) {
+ std::vector<std::string> A({"a", "c", "b", "c", "b", "d", "e"});
+ std::vector<std::string> B({"a", "b", "c", "a", "a", "b", "c", "c", "d"});
+ std::vector<Anchor> AnchorsA = createAnchorsFromStrings(A);
+ std::vector<Anchor> AnchorsB = createAnchorsFromStrings(B);
+ LocToLocMap ExpectEqualLocations =
+ createEqualLocations({{0, 0}, {2, 1}, {3, 2}, {4, 5}, {5, 8}});
+ auto R = Diff.longestCommonSequence(AnchorsA, AnchorsB);
+ EXPECT_EQ(R.EqualLocations, ExpectEqualLocations);
+#ifndef NDEBUG
+ EXPECT_EQ(R.Insertions, createLocations(std::vector<uint32_t>({6, 1})));
+ EXPECT_EQ(R.Deletions, createLocations(std::vector<uint32_t>({7, 6, 4, 3})));
+#endif
+}
|
MyersDiff::DiffResult | ||
MyersDiff::longestCommonSequence(const std::vector<Anchor> &A, | ||
const std::vector<Anchor> &B) const { | ||
int32_t N = A.size(), M = B.size(), Max = N + M; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General comment: can we replace A,B,M,N,X,Y,V names with something meaningful and more readable? Also Max is not really Max but total size of two sequence.
@@ -0,0 +1,150 @@ | |||
//===- SampleProfileMatcherTests.cpp - SampleProfileMatcher Unit Tests -----==// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slight preference to test it with lit rather than unit test. I think we should be able to craft a test case that shows different matching from the new algorithm?
That way we can also remove the test/debug only APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking more about it, I feel we don't really need this level of unit test? There are a lot algorithms in the compiler, and when developing these algorithms it's important to test the core of it, but I don't know if we need to have this level of unit test committed in tree. Think about profi, or even layout algorithm, or many more, we usually test e2e, rather than creating tests for internals of the algorithms.
I hope we can turn some of this into lit tests and also remove the debug / test functions in diffing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, removed this unit test and the debug things.
MyersDiff::longestCommonSequence(const std::vector<Anchor> &A, | ||
const std::vector<Anchor> &B) const { | ||
int32_t N = A.size(), M = B.size(), Max = N + M; | ||
auto Index = [&](int32_t I) { return I + Max; }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this simple +Max
need a lambda function?
else | ||
X = V[Index(K - 1)] + 1; | ||
Y = X - K; | ||
while (X < N && Y < M && A[X] == B[Y]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the reason you convert ProfileAnchors
and IRAnchors
into std::vector<Anchor>
is to remove indirect calls and allow random access by indexing. Looking at how std::vector<Anchor>
, I don't really see real random access (it's always sequential through X++
and Y++
). Can you simply use ProfileAnchors
and IRAnchors
directly but through an iterator/access that skips indirect call locations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
convert ProfileAnchors and IRAnchors into std::vector is to remove indirect calls
This is to remove the non-anchor
/non-callsite
location, (maybe the original name is confusing), so before the IRAnchors
does contains all the locations, calls and non-calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, I was trying to use the iterator things, but found that the X, Y is not always ++
or --
, it is updated by the previous value X = V[Index(K + 1)];
which can be random, so perhaps it's easier just use the index to access.
#endif | ||
}; | ||
|
||
// The basic greedy version of Myers's algorithm. Refer to page 6 of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Refer to page 6 ..." either mention that together in the comment above where you actually mentioned the paper, or remove it.
|
||
// The basic greedy version of Myers's algorithm. Refer to page 6 of the | ||
// original paper. | ||
DiffResult longestCommonSequence(const std::vector<Anchor> &A, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would simpler if LCS just return a "CommonSequence" as represented by LocToLocMap
.
The abstraction of DiffResult and MyersDiff seem a bit unnecessary. A standalone function LocToLocMap longestCommonSequence(..)
would be cleaner I think.
// here use IR anchor as base(A) to align with the order of | ||
// IRToProfileLocationMap. | ||
MyersDiff Diff; | ||
auto DiffRes = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: LocToLocMap MatchedAnchors = longestCommonSequence(...)
return Diff; | ||
} | ||
|
||
void SampleProfileMatcher::matchNonAnchorAndWriteResults( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the "WriteResults" part of this function? Suggest:
matchNonAnchorAndWriteResults -> matchNonAnchorLocations
AnchorMatchings -> MatchedAnchors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For "WriteResults", here it writes the matching results to IRToProfileLocationMap
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For "WriteResults", here it writes the matching results to IRToProfileLocationMap.
I mean.. of course the result will persistent somewhere, is it significant enough to include that in names?
Or should we say "readAnchorsAndMatchNonAnchorAndWriteResults" just because it reads anchors? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for the review!
Seems many suggestions are related to specialize the diff algorithm. Let me try to clarify them together here:
- The algorithm is a basic version, the result is optimal, but there are still many space for time/space optimizations, decoupling it from the stale matching could make people easily extend it in future.
- The paper is very classical(assuming it's rigorously designed and written), so I wanted to keep code/variables as much as the same as the paper to convey the intension this is really a "copy-paste" of the algorithm(rather than my maybe inaccurate interpretation).
- In very soon, I want to use this diff algorithm for other purpose: function renaming matching, which use the edit scripts to compute a similarity of profile and IR, like
similarity = equalSet / (equalSet + insertSet + deleteSet)
, so thoseinsertSet
anddeleteSet
will be used there.
can we replace A,B,M,N,X,Y,V names with something meaningful and more readable? Also Max is not really Max but total size of two sequence.
The original paper converts the problem(find the LCS) to find the shortest path in the edit graph, I guess those X,Y,M,N, are just common used notation in graph algorithm area. V
is Vector?
new defined in the paper, but I guess from "An array, V, contains the..."), the MAX
is the upper bound(the longest path). So basically all the variables are the paper's original used/defined variables( reason #2).
It seems the confusing comes up after I renamed the function from "shortestEdit" to "longestCommonSequence", which the paper actually does find the shortestEdit
in graph, I think I need to change it back to shortestEdit
.
Does this simple +Max need a lambda function?
This is also for reason #2, The index in pseudo code of the paper can be negative(guess an old c-style), this is just to simply use V[Index(K - 1)]
to looks closer to V[K - 1]
than V[Max + K - 1]
It would simpler if LCS just return a "CommonSequence" as represented by LocToLocMap.
The abstraction of DiffResult and MyersDiff seem a bit unnecessary. A standalone function LocToLocMap longestCommonSequence(..) would be cleaner I think.
Similarly here, for reason #3, it will need the other fields(insert
/delete
).
Slight preference to test it with lit rather than unit test. I think we should be able to craft a test case that shows different matching from the new algorithm?
That way we can also remove the test/debug only APIs.
Similarly, I wanted to make it easy to extend to a "shared" library, so tried to test it independently.
return Diff; | ||
} | ||
|
||
void SampleProfileMatcher::matchNonAnchorAndWriteResults( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For "WriteResults", here it writes the matching results to IRToProfileLocationMap
.
else | ||
X = V[Index(K - 1)] + 1; | ||
Y = X - K; | ||
while (X < N && Y < M && A[X] == B[Y]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
convert ProfileAnchors and IRAnchors into std::vector is to remove indirect calls
This is to remove the non-anchor
/non-callsite
location, (maybe the original name is confusing), so before the IRAnchors
does contains all the locations, calls and non-calls.
Or is it more clear to just use
So for profile matching, we know the intention is to find the LCS , and for the implementation, it still keep the |
In the new change:
|
assert(IRToProfileLocationMap.empty() && | ||
"Run stale profile matching only once per function"); | ||
|
||
std::vector<Anchor> ProfileCallsiteAnchors; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By definition, anchors are call sites. The key difference between ProfileCallsiteAnchors
and ProfileAnchors
is not callsites, but sequence vs map, plus filtering?
How about
ProfileCallsiteAnchors -> FilteredProfileAnchorList
IRAnchors -> FilteredIRAnchorsList
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
Anchor(const LineLocation &Loc, const FunctionId &FuncId) | ||
: Loc(Loc), FuncId(FuncId) {} | ||
Anchor(const LineLocation &Loc, StringRef &FName) : Loc(Loc), FuncId(FName) {} | ||
bool operator==(const Anchor &Other) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we not compare Loc
as well?
Except for this special operator== that excludes Loc, this is essentially just a typedef pair<LineLocation, FunctionId> Anchor
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, changed to using Anchor = std::pair<LineLocation, FunctionId>;
and removed this struct.
class MyersDiff { | ||
public: | ||
struct DiffResult { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope we can simplify this. The abstractions here seem unnecessary -- 1) MyersDiff
is entirely stateless, so no need for class / instances. 2) DiffResult
is really just a LocToLocMap
if we remove all the test/debug related stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, removed this class
@@ -0,0 +1,150 @@ | |||
//===- SampleProfileMatcherTests.cpp - SampleProfileMatcher Unit Tests -----==// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking more about it, I feel we don't really need this level of unit test? There are a lot algorithms in the compiler, and when developing these algorithms it's important to test the core of it, but I don't know if we need to have this level of unit test committed in tree. Think about profi, or even layout algorithm, or many more, we usually test e2e, rather than creating tests for internals of the algorithms.
I hope we can turn some of this into lit tests and also remove the debug / test functions in diffing.
const std::vector<Anchor> &B) const; | ||
}; | ||
|
||
using AnchorMap = std::map<LineLocation, Anchor>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this AnchorMap
duplicates its LineLocation
(key) as part of Anchor
(value)? Is that necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the duplicated location, now the value is only the FunctionId
We don't actually need the insert/delete set, instead we only care about the counts/ratios. Can we simply do something like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all good suggestions! addressing in the new version
Anchor(const LineLocation &Loc, const FunctionId &FuncId) | ||
: Loc(Loc), FuncId(FuncId) {} | ||
Anchor(const LineLocation &Loc, StringRef &FName) : Loc(Loc), FuncId(FName) {} | ||
bool operator==(const Anchor &Other) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, changed to using Anchor = std::pair<LineLocation, FunctionId>;
and removed this struct.
class MyersDiff { | ||
public: | ||
struct DiffResult { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, removed this class
const std::vector<Anchor> &B) const; | ||
}; | ||
|
||
using AnchorMap = std::map<LineLocation, Anchor>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the duplicated location, now the value is only the FunctionId
assert(IRToProfileLocationMap.empty() && | ||
"Run stale profile matching only once per function"); | ||
|
||
std::vector<Anchor> ProfileCallsiteAnchors; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
@@ -0,0 +1,150 @@ | |||
//===- SampleProfileMatcherTests.cpp - SampleProfileMatcher Unit Tests -----==// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, removed this unit test and the debug things.
Can we add a lit test case to verify the new diffing? something that will yield different results with new diffing algorithm. |
|
||
void SampleProfileMatcher::matchNonCallsiteLocsAndWriteResults( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might have fell through the cracks..
For "WriteResults", here it writes the matching results to IRToProfileLocationMap.
I mean.. of course the result will persistent somewhere, is it significant enough to include that in names?
Similarly, we don't say "readAnchorsAndMatchNonAnchorAndWriteResults" just because it reads anchors, right? :)
The test change in the new version can verify the algorithm(yield different results). the original one is Though it's simple and a bit tricky, I can add a complicated case. |
@@ -19,6 +19,9 @@ | |||
|
|||
namespace llvm { | |||
|
|||
using Anchor = std::pair<LineLocation, FunctionId>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: instead of defining Anchor
which is not used by AnchorMap
, how about using AnchorList = std::vector<std::pair<LineLocation, FunctionId>>
and not defining a common Anchor
type?
auto isInvalidLineOffset = [](uint32_t LineOffset) { | ||
return LineOffset & 0x8000; | ||
}; | ||
|
||
std::map<LineLocation, std::unordered_set<FunctionId>> ProfileCallsites; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid the intermediate ProfileCallsites
? i.e. how about we just populate ProfileAnchors
directly? When we encounter a second callee for an anchor location, change the callee to UnknownIndirectCallee
on the fly.
std::vector<int32_t> V(2 * MaxDepth + 1, -1); | ||
V[Index(1)] = 0; | ||
// Trace is used to backtrack the SES result. | ||
std::vector<std::vector<int32_t>> Trace; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The size of Trace
is known, so we can reserve capacity to avoid growing Trace
and reallocation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trace is "unknown"(not always the MaxDepth) here, the size of the Trace is the shortest Depth(LCS) to find.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes. Looking good overall and getting really close now.
I just wonder now that we rely on sequence match, do we still need to exclude indirect calls?
if (R != MatchedAnchors.end()) { | ||
const auto &Candidate = R->second; | ||
InsertMatching(Loc, Candidate); | ||
LLVM_DEBUG(dbgs() << "Callsite with callee:" << CalleeName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: fold IR.second.stringRef()
to use here so you don't have to use [[maybe_unused]]
to suppress warning for non-debug builds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, Good catch!
[[maybe_unused]] const auto &Callees = I.second; | ||
assert(!Callees.empty() && "Callees should not be empty"); | ||
[[maybe_unused]] StringRef CalleeName = I.second.stringRef(); | ||
assert(!CalleeName.empty() && "Callees should not be empty"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: CalleeName
is only used in assertion, why not fold it into the assertion?
// Filter the non-callsite from IRAnchors. | ||
for (const auto &I : IRAnchors) { | ||
if (I.second.stringRef().empty()) | ||
continue; | ||
FilteredIRAnchorsList.emplace_back(I); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually for IRAnchors
, why does it have to be a AnchorMap
? I didn't find any usage that needs on map::find
? Can we use AnchorList
type from the beginning to avoid the conversion from map to vector here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also wondering for matching sequence, do we have to exclude indirect call on both sides?
Like what if we just allow indirect calls on both sides to match as part of sequence? Would that simplify the implementation more by removing the filtering, but without hurting the effectiveness?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually for IRAnchors, why does it have to be a AnchorMap? I didn't find any usage that needs on map::find? Can we use AnchorList type from the beginning to avoid the conversion from map to vector here?
AnchorMap
is used here to sort and deduplicate the IR location(the map key)(see the findIRAnchors
), as there could have BB/call duplication optimization.
I'm also wondering for matching sequence, do we have to exclude indirect call on both sides?
Like what if we just allow indirect calls on both sides to match as part of sequence? Would that simplify the implementation more by removing the filtering, but without hurting the effectiveness?
We do include the indirect call for the matching, I added one test for this case (see the Run stale profile matching for test_indirect_call...
, it basically uses the dummy name(unknown.indirect.callee
) as anchor which now both sides can have.
To clarify, here the filtering is to filter the non-call inst/location, recalling that in findIRAnchors
, we use the empty name to represent the non-call inst. then diff alg needs to work on call only(anchor).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok. can you remind me why we need non-call site to be in the anchor map? I thought we only consider call sites as anchors.
What I'm really trying get at is whether it's possible to make AnchorMap and AnchorList really close, and maybe even merge them. Maybe it's not impossible, but wanted to understand all the reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the beginning we used separate two map for call and non-call, then we refined them into one to make it cleaner(no need to pass two map as parameters). Another reason is later for the non-call anchor matching, the non-call matching is based on call anchor(by the lexical order), so we regardlessly need to sort the non-call/call location in one container, then it's easy to use the map(sorted at the beginning).
Yeah, it's probably hard to merge them, in summary, map is used for deduplication and sorting, and list is used for random access and filtering the non-call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the changes and working through the feedbacks.
The algorithm added by PR llvm#87375 can be potentially quadratic in the number of anchors. This is almost never a problem because normally functions have a reasonable number of function calls. However, in some rare cases of auto-generated code we observed very large functions that trigger quadratic behaviour here (resulting in >130GB of peak heap memory usage for clang). Let's add a knob for controlling the max number of callsites in a function above which stale profile matching won't be performed.
The algorithm added by PR #87375 can be potentially quadratic in the number of anchors. This is almost never a problem because normally functions have a reasonable number of function calls. However, in some rare cases of auto-generated code we observed very large functions that trigger quadratic behaviour here (resulting in >130GB of peak heap memory usage for clang). Let's add a knob for controlling the max number of callsites in a function above which stale profile matching won't be performed.
This change improves the matching algorithm by using the diff algorithm, the current matching algorithm only processes the callsites grouped by the same name functions, it doesn't consider the order relationships between different name functions, this sometimes fails to handle this ambiguous anchor case. For example. (
Foo:1
means a calliste[callee_name: callsite_location])The
foo:1
is matched to the 2ndfoo:5
and using the diff algorithm(finding longest common subsequence ) can help on this issue. One well-known diff algorithm is the Myers diff algorithm(paper "An O(ND) Difference Algorithm and Its Variations∗" Eugene W. Myers), its variations have been implemented and used in many famous tools, like the GNU diff or git diff. It provides an efficient way to find the longest common subsequence or the shortest edit script through graph searching.There are several variations/refinements for the algorithm, but as in our case, the num of function callsites is usually very small, so we implemented the basic greedy version in this change which should be good enough.
We observed better matchings and positive perf improvement on our internal services.