Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SampleFDO] Improve stale profile matching by diff algorithm #87375

Merged
merged 9 commits into from
May 13, 2024

Conversation

wlei-llvm
Copy link
Contributor

@wlei-llvm wlei-llvm commented Apr 2, 2024

This change improves the matching algorithm by using the diff algorithm, the current matching algorithm only processes the callsites grouped by the same name functions, it doesn't consider the order relationships between different name functions, this sometimes fails to handle this ambiguous anchor case. For example. (Foo:1 means a calliste[callee_name: callsite_location])

IR :      foo:1  bar:2  foo:4  bar:5 
Profile :        bar:3  foo:5  bar:6

The foo:1 is matched to the 2nd foo:5 and using the diff algorithm(finding longest common subsequence ) can help on this issue. One well-known diff algorithm is the Myers diff algorithm(paper "An O(ND) Difference Algorithm and Its Variations∗" Eugene W. Myers), its variations have been implemented and used in many famous tools, like the GNU diff or git diff. It provides an efficient way to find the longest common subsequence or the shortest edit script through graph searching.
There are several variations/refinements for the algorithm, but as in our case, the num of function callsites is usually very small, so we implemented the basic greedy version in this change which should be good enough.
We observed better matchings and positive perf improvement on our internal services.

@david-xl david-xl requested a review from huangjd April 2, 2024 18:37
llvm/include/llvm/Transforms/IPO/SampleProfileMatcher.h Outdated Show resolved Hide resolved

// The basic greedy version of Myers's algorithm. Refer to page 6 of the
// original paper.
DiffResult shortestEdit(const std::vector<Anchor> &A,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: perhaps name it 'longestCommonSequence' to convey the intention? 'shortestEdit' is implementation detail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I was thinking to use shortestEdit to match the return type which is a set of "edit"(insertion, deletion...), but agreed that LCS is clearer in the context of stale profile matching.

@llvmbot
Copy link
Member

llvmbot commented Apr 26, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Lei Wang (wlei-llvm)

Changes

This change improves the matching algorithm by using the diff algorithm, the current matching algorithm only processes the callsites grouped by the same name functions, it doesn't consider the order relationships between different name functions, this sometimes fails to handle this ambiguous anchor case. For example.

IR :      foo:1  bar:2  foo:4  bar:5 
Profile :        bar:3  foo:5  bar:6

The foo:1 is matched to the 2nd foo:5 and using the diff algorithm(finding longest common subsequence ) can help on this issue. One well-known diff algorithm is the Myers diff algorithm(paper "An O(ND) Difference Algorithm and Its Variations∗" Eugene W. Myers), its variations have been implemented and used in many famous tools, like the GNU diff or git diff. It provides an efficient way to find the longest common subsequence or the shortest edit script through graph searching.
There are several variations/refinements for the algorithm, but as in our case, the num of function callsites is usually very small, so we implemented the basic greedy version in this change which should be good enough.
We observed better matchings and positive perf improvement on our internal services.


Full diff: https://github.com/llvm/llvm-project/pull/87375.diff

4 Files Affected:

  • (modified) llvm/include/llvm/Transforms/IPO/SampleProfileMatcher.h (+57-2)
  • (modified) llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp (+166-63)
  • (modified) llvm/unittests/Transforms/IPO/CMakeLists.txt (+2)
  • (added) llvm/unittests/Transforms/IPO/SampleProfileMatcherTests.cpp (+150)
diff --git a/llvm/include/llvm/Transforms/IPO/SampleProfileMatcher.h b/llvm/include/llvm/Transforms/IPO/SampleProfileMatcher.h
index 7ae6194da7c9cc..5b56638c13344b 100644
--- a/llvm/include/llvm/Transforms/IPO/SampleProfileMatcher.h
+++ b/llvm/include/llvm/Transforms/IPO/SampleProfileMatcher.h
@@ -19,6 +19,57 @@
 
 namespace llvm {
 
+// Callsite location based matching anchor.
+struct Anchor {
+  LineLocation Loc;
+  FunctionId FuncId;
+
+  Anchor(const LineLocation &Loc, const FunctionId &FuncId)
+      : Loc(Loc), FuncId(FuncId) {}
+  bool operator==(const Anchor &Other) const {
+    return this->FuncId == Other.FuncId;
+  }
+};
+
+// This class implements the Myers diff algorithm used for stale profile
+// matching. The algorithm provides a simple and efficient way to find the
+// Longest Common Subsequence(LCS) or the Shortest Edit Script(SES) of two
+// sequences. For more details, refer to the paper 'An O(ND) Difference
+// Algorithm and Its Variations' by Eugene W. Myers.
+// In the scenario of profile fuzzy matching, the two sequences are the IR
+// callsite anchors and profile callsite anchors. The subsequence equivalent
+// parts from the resulting SES are used to remap the IR locations to the
+// profile locations.
+class MyersDiff {
+public:
+  struct DiffResult {
+    LocToLocMap EqualLocations;
+#ifndef NDEBUG
+    // New IR locations that are inserted in the new version.
+    std::vector<LineLocation> Insertions;
+    // Old Profile locations that are deleted in the new version.
+    std::vector<LineLocation> Deletions;
+#endif
+    void addEqualLocations(const LineLocation &IRLoc,
+                           const LineLocation &ProfLoc) {
+      EqualLocations.insert({IRLoc, ProfLoc});
+    }
+#ifndef NDEBUG
+    void addInsertion(const LineLocation &IRLoc) {
+      Insertions.push_back(IRLoc);
+    }
+    void addDeletion(const LineLocation &ProfLoc) {
+      Deletions.push_back(ProfLoc);
+    }
+#endif
+  };
+
+  // The basic greedy version of Myers's algorithm. Refer to page 6 of the
+  // original paper.
+  DiffResult longestCommonSequence(const std::vector<Anchor> &A,
+                                   const std::vector<Anchor> &B) const;
+};
+
 // Sample profile matching - fuzzy match.
 class SampleProfileMatcher {
   Module &M;
@@ -27,8 +78,8 @@ class SampleProfileMatcher {
   const ThinOrFullLTOPhase LTOPhase;
   SampleProfileMap FlattenedProfiles;
   // For each function, the matcher generates a map, of which each entry is a
-  // mapping from the source location of current build to the source location in
-  // the profile.
+  // mapping from the source location of current build to the source location
+  // in the profile.
   StringMap<LocToLocMap> FuncMappings;
 
   // Match state for an anchor/callsite.
@@ -143,6 +194,10 @@ class SampleProfileMatcher {
   }
   void distributeIRToProfileLocationMap();
   void distributeIRToProfileLocationMap(FunctionSamples &FS);
+  void matchNonAnchorAndWriteResults(
+      const LocToLocMap &AnchorMatchings,
+      const std::map<LineLocation, StringRef> &IRAnchors,
+      LocToLocMap &IRToProfileLocationMap);
   void runStaleProfileMatching(
       const Function &F, const std::map<LineLocation, StringRef> &IRAnchors,
       const std::map<LineLocation, std::unordered_set<FunctionId>>
diff --git a/llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp b/llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp
index 1ca89e0091daff..de81450f0da252 100644
--- a/llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp
+++ b/llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp
@@ -122,15 +122,153 @@ void SampleProfileMatcher::findProfileAnchors(
   }
 }
 
+MyersDiff::DiffResult
+MyersDiff::longestCommonSequence(const std::vector<Anchor> &A,
+                                 const std::vector<Anchor> &B) const {
+  int32_t N = A.size(), M = B.size(), Max = N + M;
+  auto Index = [&](int32_t I) { return I + Max; };
+
+  DiffResult Diff;
+  if (Max == 0)
+    return Diff;
+
+  // Backtrack the SES result.
+  auto Backtrack = [&](const std::vector<std::vector<int32_t>> &Trace,
+                       const std::vector<Anchor> &A,
+                       const std::vector<Anchor> &B) {
+    int32_t X = N, Y = M;
+    for (int32_t D = Trace.size() - 1; X > 0 || Y > 0; D--) {
+      const auto &P = Trace[D];
+      int32_t K = X - Y;
+      int32_t PrevK = K;
+      if (K == -D || (K != D && P[Index(K - 1)] < P[Index(K + 1)]))
+        PrevK = K + 1;
+      else
+        PrevK = K - 1;
+
+      int32_t PrevX = P[Index(PrevK)];
+      int32_t PrevY = PrevX - PrevK;
+      while (X > PrevX && Y > PrevY) {
+        X--;
+        Y--;
+        Diff.addEqualLocations(A[X].Loc, B[Y].Loc);
+      }
+
+      if (D == 0)
+        break;
+
+      if (Y == PrevY) {
+        X--;
+#ifndef NDEBUG
+        Diff.addInsertion(A[X].Loc);
+#endif
+      } else if (X == PrevX) {
+        Y--;
+#ifndef NDEBUG
+        Diff.addDeletion(B[Y].Loc);
+#endif
+      }
+      X = PrevX;
+      Y = PrevY;
+    }
+  };
+
+  // The greedy LCS/SES algorithm.
+  std::vector<int32_t> V(2 * Max + 1, -1);
+  V[Index(1)] = 0;
+  std::vector<std::vector<int32_t>> Trace;
+  for (int32_t D = 0; D <= Max; D++) {
+    Trace.push_back(V);
+    for (int32_t K = -D; K <= D; K += 2) {
+      int32_t X = 0, Y = 0;
+      if (K == -D || (K != D && V[Index(K - 1)] < V[Index(K + 1)]))
+        X = V[Index(K + 1)];
+      else
+        X = V[Index(K - 1)] + 1;
+      Y = X - K;
+      while (X < N && Y < M && A[X] == B[Y])
+        X++, Y++;
+
+      V[Index(K)] = X;
+
+      if (X >= N && Y >= M) {
+        // Length of an SES is D.
+        Backtrack(Trace, A, B);
+        return Diff;
+      }
+    }
+  }
+  // Length of an SES is greater than Max.
+  return Diff;
+}
+
+void SampleProfileMatcher::matchNonAnchorAndWriteResults(
+    const LocToLocMap &AnchorMatchings,
+    const std::map<LineLocation, StringRef> &IRAnchors,
+    LocToLocMap &IRToProfileLocationMap) {
+  auto InsertMatching = [&](const LineLocation &From, const LineLocation &To) {
+    // Skip the unchanged location mapping to save memory.
+    if (From != To)
+      IRToProfileLocationMap.insert({From, To});
+  };
+
+  // Use function's beginning location as the initial anchor.
+  int32_t LocationDelta = 0;
+  SmallVector<LineLocation> LastMatchedNonAnchors;
+  for (const auto &IR : IRAnchors) {
+    const auto &Loc = IR.first;
+    [[maybe_unused]] StringRef CalleeName = IR.second;
+    bool IsMatchedAnchor = false;
+
+    // Match the anchor location in lexical order.
+    auto R = AnchorMatchings.find(Loc);
+    if (R != AnchorMatchings.end()) {
+      const auto &Candidate = R->second;
+      InsertMatching(Loc, Candidate);
+      LLVM_DEBUG(dbgs() << "Callsite with callee:" << CalleeName
+                        << " is matched from " << Loc << " to " << Candidate
+                        << "\n");
+      LocationDelta = Candidate.LineOffset - Loc.LineOffset;
+
+      // Match backwards for non-anchor locations.
+      // The locations in LastMatchedNonAnchors have been matched forwards
+      // based on the previous anchor, spilt it evenly and overwrite the
+      // second half based on the current anchor.
+      for (size_t I = (LastMatchedNonAnchors.size() + 1) / 2;
+           I < LastMatchedNonAnchors.size(); I++) {
+        const auto &L = LastMatchedNonAnchors[I];
+        uint32_t CandidateLineOffset = L.LineOffset + LocationDelta;
+        LineLocation Candidate(CandidateLineOffset, L.Discriminator);
+        InsertMatching(L, Candidate);
+        LLVM_DEBUG(dbgs() << "Location is rematched backwards from " << L
+                          << " to " << Candidate << "\n");
+      }
+
+      IsMatchedAnchor = true;
+      LastMatchedNonAnchors.clear();
+    }
+
+    // Match forwards for non-anchor locations.
+    if (!IsMatchedAnchor) {
+      uint32_t CandidateLineOffset = Loc.LineOffset + LocationDelta;
+      LineLocation Candidate(CandidateLineOffset, Loc.Discriminator);
+      InsertMatching(Loc, Candidate);
+      LLVM_DEBUG(dbgs() << "Location is matched from " << Loc << " to "
+                        << Candidate << "\n");
+      LastMatchedNonAnchors.emplace_back(Loc);
+    }
+  }
+}
+
 // Call target name anchor based profile fuzzy matching.
 // Input:
 // For IR locations, the anchor is the callee name of direct callsite; For
 // profile locations, it's the call target name for BodySamples or inlinee's
 // profile name for CallsiteSamples.
 // Matching heuristic:
-// First match all the anchors in lexical order, then split the non-anchor
-// locations between the two anchors evenly, first half are matched based on the
-// start anchor, second half are matched based on the end anchor.
+// First match all the anchors using the diff algorithm, then split the
+// non-anchor locations between the two anchors evenly, first half are matched
+// based on the start anchor, second half are matched based on the end anchor.
 // For example, given:
 // IR locations:      [1, 2(foo), 3, 5, 6(bar), 7]
 // Profile locations: [1, 2, 3(foo), 4, 7, 8(bar), 9]
@@ -149,77 +287,42 @@ void SampleProfileMatcher::runStaleProfileMatching(
   assert(IRToProfileLocationMap.empty() &&
          "Run stale profile matching only once per function");
 
-  std::unordered_map<FunctionId, std::set<LineLocation>> CalleeToCallsitesMap;
+  std::vector<Anchor> ProfileCallsiteAnchors;
   for (const auto &I : ProfileAnchors) {
     const auto &Loc = I.first;
     const auto &Callees = I.second;
     // Filter out possible indirect calls, use direct callee name as anchor.
     if (Callees.size() == 1) {
-      FunctionId CalleeName = *Callees.begin();
-      const auto &Candidates = CalleeToCallsitesMap.try_emplace(
-          CalleeName, std::set<LineLocation>());
-      Candidates.first->second.insert(Loc);
+      auto CalleeName = *Callees.begin();
+      ProfileCallsiteAnchors.emplace_back(Loc, CalleeName);
+    } else if (Callees.size() > 1) {
+      ProfileCallsiteAnchors.emplace_back(Loc,
+                                          FunctionId(UnknownIndirectCallee));
     }
   }
 
-  auto InsertMatching = [&](const LineLocation &From, const LineLocation &To) {
-    // Skip the unchanged location mapping to save memory.
-    if (From != To)
-      IRToProfileLocationMap.insert({From, To});
-  };
-
-  // Use function's beginning location as the initial anchor.
-  int32_t LocationDelta = 0;
-  SmallVector<LineLocation> LastMatchedNonAnchors;
+  std::vector<Anchor> IRCallsiteAnchors;
+  for (const auto &I : IRAnchors) {
+    const auto &Loc = I.first;
+    const auto &CalleeName = I.second;
+    if (CalleeName.empty())
+      continue;
+    IRCallsiteAnchors.emplace_back(Loc, FunctionId(CalleeName));
+  }
 
-  for (const auto &IR : IRAnchors) {
-    const auto &Loc = IR.first;
-    auto CalleeName = IR.second;
-    bool IsMatchedAnchor = false;
-    // Match the anchor location in lexical order.
-    if (!CalleeName.empty()) {
-      auto CandidateAnchors =
-          CalleeToCallsitesMap.find(getRepInFormat(CalleeName));
-      if (CandidateAnchors != CalleeToCallsitesMap.end() &&
-          !CandidateAnchors->second.empty()) {
-        auto CI = CandidateAnchors->second.begin();
-        const auto Candidate = *CI;
-        CandidateAnchors->second.erase(CI);
-        InsertMatching(Loc, Candidate);
-        LLVM_DEBUG(dbgs() << "Callsite with callee:" << CalleeName
-                          << " is matched from " << Loc << " to " << Candidate
-                          << "\n");
-        LocationDelta = Candidate.LineOffset - Loc.LineOffset;
-
-        // Match backwards for non-anchor locations.
-        // The locations in LastMatchedNonAnchors have been matched forwards
-        // based on the previous anchor, spilt it evenly and overwrite the
-        // second half based on the current anchor.
-        for (size_t I = (LastMatchedNonAnchors.size() + 1) / 2;
-             I < LastMatchedNonAnchors.size(); I++) {
-          const auto &L = LastMatchedNonAnchors[I];
-          uint32_t CandidateLineOffset = L.LineOffset + LocationDelta;
-          LineLocation Candidate(CandidateLineOffset, L.Discriminator);
-          InsertMatching(L, Candidate);
-          LLVM_DEBUG(dbgs() << "Location is rematched backwards from " << L
-                            << " to " << Candidate << "\n");
-        }
+  if (IRCallsiteAnchors.empty() || ProfileCallsiteAnchors.empty())
+    return;
 
-        IsMatchedAnchor = true;
-        LastMatchedNonAnchors.clear();
-      }
-    }
+  // Use the diff algorithm to find the LCS/SES, the resulting equal locations
+  // from IR to Profile are used as anchor to match other locations. Note that
+  // here use IR anchor as base(A) to align with the order of
+  // IRToProfileLocationMap.
+  MyersDiff Diff;
+  auto DiffRes =
+      Diff.longestCommonSequence(IRCallsiteAnchors, ProfileCallsiteAnchors);
 
-    // Match forwards for non-anchor locations.
-    if (!IsMatchedAnchor) {
-      uint32_t CandidateLineOffset = Loc.LineOffset + LocationDelta;
-      LineLocation Candidate(CandidateLineOffset, Loc.Discriminator);
-      InsertMatching(Loc, Candidate);
-      LLVM_DEBUG(dbgs() << "Location is matched from " << Loc << " to "
-                        << Candidate << "\n");
-      LastMatchedNonAnchors.emplace_back(Loc);
-    }
-  }
+  matchNonAnchorAndWriteResults(DiffRes.EqualLocations, IRAnchors,
+                                IRToProfileLocationMap);
 }
 
 void SampleProfileMatcher::runOnFunction(Function &F) {
diff --git a/llvm/unittests/Transforms/IPO/CMakeLists.txt b/llvm/unittests/Transforms/IPO/CMakeLists.txt
index 4e4372179b46c7..80d0eadf0cce09 100644
--- a/llvm/unittests/Transforms/IPO/CMakeLists.txt
+++ b/llvm/unittests/Transforms/IPO/CMakeLists.txt
@@ -3,6 +3,7 @@ set(LLVM_LINK_COMPONENTS
   AsmParser
   Core
   IPO
+  ProfileData
   Support
   TargetParser
   TransformUtils
@@ -13,6 +14,7 @@ add_llvm_unittest(IPOTests
   WholeProgramDevirt.cpp
   AttributorTest.cpp
   FunctionSpecializationTest.cpp
+  SampleProfileMatcherTests.cpp
   )
 
 set_property(TARGET IPOTests PROPERTY FOLDER "Tests/UnitTests/TransformsTests")
diff --git a/llvm/unittests/Transforms/IPO/SampleProfileMatcherTests.cpp b/llvm/unittests/Transforms/IPO/SampleProfileMatcherTests.cpp
new file mode 100644
index 00000000000000..25d5c053d3ccbd
--- /dev/null
+++ b/llvm/unittests/Transforms/IPO/SampleProfileMatcherTests.cpp
@@ -0,0 +1,150 @@
+//===- SampleProfileMatcherTests.cpp - SampleProfileMatcher Unit Tests -----==//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Transforms/IPO/SampleProfileMatcher.h"
+#include "gtest/gtest.h"
+
+using namespace llvm;
+
+MyersDiff Diff;
+
+std::vector<Anchor>
+createAnchorsFromStrings(const std::vector<std::string> &SV) {
+  std::vector<Anchor> Anchors;
+  for (uint64_t I = 0; I < SV.size(); I++) {
+    Anchors.push_back(Anchor(LineLocation(I, 0), FunctionId(SV[I])));
+  }
+  return Anchors;
+}
+
+LocToLocMap
+createEqualLocations(const std::vector<std::pair<uint32_t, uint32_t>> &V) {
+  LocToLocMap LocMap;
+  for (auto P : V) {
+    LocMap.emplace(LineLocation(P.first, 0), LineLocation(P.second, 0));
+  }
+  return LocMap;
+}
+
+std::vector<LineLocation> createLocations(const std::vector<uint32_t> &V) {
+  std::vector<LineLocation> Locations;
+  for (auto I : V) {
+    Locations.emplace_back(LineLocation(I, 0));
+  }
+  return Locations;
+}
+
+TEST(SampleProfileMatcherTests, MyersDiffTest1) {
+
+  std::vector<Anchor> AnchorsA;
+  std::vector<Anchor> AnchorsB;
+  auto R = Diff.longestCommonSequence(AnchorsA, AnchorsB);
+  EXPECT_TRUE(R.EqualLocations.empty());
+#ifndef NDEBUG
+  EXPECT_TRUE(R.Deletions.empty());
+  EXPECT_TRUE(R.Insertions.empty());
+#endif
+}
+
+TEST(SampleProfileMatcherTests, MyersDiffTest2) {
+  std::vector<std::string> A({"a", "b", "c"});
+  std::vector<Anchor> AnchorsA = createAnchorsFromStrings(A);
+  std::vector<Anchor> AnchorsB;
+  auto R = Diff.longestCommonSequence(AnchorsA, AnchorsB);
+  EXPECT_TRUE(R.EqualLocations.empty());
+#ifndef NDEBUG
+  EXPECT_EQ(R.Insertions, createLocations(std::vector<uint32_t>({2, 1, 0})));
+  EXPECT_TRUE(R.Deletions.empty());
+#endif
+}
+
+TEST(SampleProfileMatcherTests, MyersDiffTest3) {
+
+  std::vector<Anchor> AnchorsA;
+  std::vector<std::string> B({"a", "b", "c"});
+  std::vector<Anchor> AnchorsB = createAnchorsFromStrings(B);
+  auto R = Diff.longestCommonSequence(AnchorsA, AnchorsB);
+  EXPECT_TRUE(R.EqualLocations.empty());
+#ifndef NDEBUG
+  EXPECT_TRUE(R.Insertions.empty());
+  EXPECT_EQ(R.Deletions, createLocations(std::vector<uint32_t>({2, 1, 0})));
+#endif
+}
+
+TEST(SampleProfileMatcherTests, MyersDiffTest4) {
+  std::vector<std::string> A({"a", "b", "c"});
+  std::vector<std::string> B({"a", "b", "c"});
+  std::vector<Anchor> AnchorsA = createAnchorsFromStrings(A);
+  std::vector<Anchor> AnchorsB = createAnchorsFromStrings(B);
+  LocToLocMap ExpectEqualLocations =
+      createEqualLocations({{0, 0}, {1, 1}, {2, 2}});
+  auto R = Diff.longestCommonSequence(AnchorsA, AnchorsB);
+  EXPECT_EQ(R.EqualLocations, ExpectEqualLocations);
+#ifndef NDEBUG
+  EXPECT_TRUE(R.Insertions.empty());
+  EXPECT_TRUE(R.Deletions.empty());
+#endif
+}
+
+TEST(SampleProfileMatcherTests, MyersDiffTest5) {
+  std::vector<std::string> A({"a", "b", "c"});
+  std::vector<std::string> B({"b", "c", "d"});
+  std::vector<Anchor> AnchorsA = createAnchorsFromStrings(A);
+  std::vector<Anchor> AnchorsB = createAnchorsFromStrings(B);
+  LocToLocMap ExpectEqualLocations = createEqualLocations({{1, 0}, {2, 1}});
+  auto R = Diff.longestCommonSequence(AnchorsA, AnchorsB);
+  EXPECT_EQ(R.EqualLocations, ExpectEqualLocations);
+#ifndef NDEBUG
+  EXPECT_EQ(R.Insertions, createLocations(std::vector<uint32_t>({0})));
+  EXPECT_EQ(R.Deletions, createLocations(std::vector<uint32_t>({2})));
+#endif
+}
+
+TEST(SampleProfileMatcherTests, MyersDiffTest6) {
+  std::vector<std::string> A({"a", "b", "d"});
+  std::vector<std::string> B({"a", "c", "d"});
+  std::vector<Anchor> AnchorsA = createAnchorsFromStrings(A);
+  std::vector<Anchor> AnchorsB = createAnchorsFromStrings(B);
+  LocToLocMap ExpectEqualLocations = createEqualLocations({{0, 0}, {2, 2}});
+  auto R = Diff.longestCommonSequence(AnchorsA, AnchorsB);
+  EXPECT_EQ(R.EqualLocations, ExpectEqualLocations);
+#ifndef NDEBUG
+  EXPECT_EQ(R.Insertions, createLocations(std::vector<uint32_t>({1})));
+  EXPECT_EQ(R.Deletions, createLocations(std::vector<uint32_t>({1})));
+#endif
+}
+
+TEST(SampleProfileMatcherTests, MyersDiffTest7) {
+  std::vector<std::string> A({"a", "b", "c", "a", "b", "b", "a"});
+  std::vector<std::string> B({"c", "b", "a", "b", "a", "c"});
+  std::vector<Anchor> AnchorsA = createAnchorsFromStrings(A);
+  std::vector<Anchor> AnchorsB = createAnchorsFromStrings(B);
+  LocToLocMap ExpectEqualLocations =
+      createEqualLocations({{2, 0}, {3, 2}, {4, 3}, {6, 4}});
+  auto R = Diff.longestCommonSequence(AnchorsA, AnchorsB);
+  EXPECT_EQ(R.EqualLocations, ExpectEqualLocations);
+#ifndef NDEBUG
+  EXPECT_EQ(R.Insertions, createLocations(std::vector<uint32_t>({5, 1, 0})));
+  EXPECT_EQ(R.Deletions, createLocations(std::vector<uint32_t>({5, 1})));
+#endif
+}
+
+TEST(SampleProfileMatcherTests, MyersDiffTest8) {
+  std::vector<std::string> A({"a", "c", "b", "c", "b", "d", "e"});
+  std::vector<std::string> B({"a", "b", "c", "a", "a", "b", "c", "c", "d"});
+  std::vector<Anchor> AnchorsA = createAnchorsFromStrings(A);
+  std::vector<Anchor> AnchorsB = createAnchorsFromStrings(B);
+  LocToLocMap ExpectEqualLocations =
+      createEqualLocations({{0, 0}, {2, 1}, {3, 2}, {4, 5}, {5, 8}});
+  auto R = Diff.longestCommonSequence(AnchorsA, AnchorsB);
+  EXPECT_EQ(R.EqualLocations, ExpectEqualLocations);
+#ifndef NDEBUG
+  EXPECT_EQ(R.Insertions, createLocations(std::vector<uint32_t>({6, 1})));
+  EXPECT_EQ(R.Deletions, createLocations(std::vector<uint32_t>({7, 6, 4, 3})));
+#endif
+}

MyersDiff::DiffResult
MyersDiff::longestCommonSequence(const std::vector<Anchor> &A,
const std::vector<Anchor> &B) const {
int32_t N = A.size(), M = B.size(), Max = N + M;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comment: can we replace A,B,M,N,X,Y,V names with something meaningful and more readable? Also Max is not really Max but total size of two sequence.

@@ -0,0 +1,150 @@
//===- SampleProfileMatcherTests.cpp - SampleProfileMatcher Unit Tests -----==//
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slight preference to test it with lit rather than unit test. I think we should be able to craft a test case that shows different matching from the new algorithm?

That way we can also remove the test/debug only APIs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking more about it, I feel we don't really need this level of unit test? There are a lot algorithms in the compiler, and when developing these algorithms it's important to test the core of it, but I don't know if we need to have this level of unit test committed in tree. Think about profi, or even layout algorithm, or many more, we usually test e2e, rather than creating tests for internals of the algorithms.

I hope we can turn some of this into lit tests and also remove the debug / test functions in diffing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, removed this unit test and the debug things.

MyersDiff::longestCommonSequence(const std::vector<Anchor> &A,
const std::vector<Anchor> &B) const {
int32_t N = A.size(), M = B.size(), Max = N + M;
auto Index = [&](int32_t I) { return I + Max; };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this simple +Max need a lambda function?

else
X = V[Index(K - 1)] + 1;
Y = X - K;
while (X < N && Y < M && A[X] == B[Y])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the reason you convert ProfileAnchors and IRAnchors into std::vector<Anchor> is to remove indirect calls and allow random access by indexing. Looking at how std::vector<Anchor>, I don't really see real random access (it's always sequential through X++ and Y++). Can you simply use ProfileAnchors and IRAnchors directly but through an iterator/access that skips indirect call locations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

convert ProfileAnchors and IRAnchors into std::vector is to remove indirect calls

This is to remove the non-anchor/non-callsite location, (maybe the original name is confusing), so before the IRAnchors does contains all the locations, calls and non-calls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, I was trying to use the iterator things, but found that the X, Y is not always ++ or --, it is updated by the previous value X = V[Index(K + 1)]; which can be random, so perhaps it's easier just use the index to access.

#endif
};

// The basic greedy version of Myers's algorithm. Refer to page 6 of the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Refer to page 6 ..." either mention that together in the comment above where you actually mentioned the paper, or remove it.


// The basic greedy version of Myers's algorithm. Refer to page 6 of the
// original paper.
DiffResult longestCommonSequence(const std::vector<Anchor> &A,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would simpler if LCS just return a "CommonSequence" as represented by LocToLocMap.

The abstraction of DiffResult and MyersDiff seem a bit unnecessary. A standalone function LocToLocMap longestCommonSequence(..) would be cleaner I think.

// here use IR anchor as base(A) to align with the order of
// IRToProfileLocationMap.
MyersDiff Diff;
auto DiffRes =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: LocToLocMap MatchedAnchors = longestCommonSequence(...)

return Diff;
}

void SampleProfileMatcher::matchNonAnchorAndWriteResults(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the "WriteResults" part of this function? Suggest:

matchNonAnchorAndWriteResults -> matchNonAnchorLocations
AnchorMatchings -> MatchedAnchors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For "WriteResults", here it writes the matching results to IRToProfileLocationMap.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For "WriteResults", here it writes the matching results to IRToProfileLocationMap.

I mean.. of course the result will persistent somewhere, is it significant enough to include that in names?

Or should we say "readAnchorsAndMatchNonAnchorAndWriteResults" just because it reads anchors? :)

Copy link
Contributor Author

@wlei-llvm wlei-llvm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for the review!
Seems many suggestions are related to specialize the diff algorithm. Let me try to clarify them together here:

  • The algorithm is a basic version, the result is optimal, but there are still many space for time/space optimizations, decoupling it from the stale matching could make people easily extend it in future.
  • The paper is very classical(assuming it's rigorously designed and written), so I wanted to keep code/variables as much as the same as the paper to convey the intension this is really a "copy-paste" of the algorithm(rather than my maybe inaccurate interpretation).
  • In very soon, I want to use this diff algorithm for other purpose: function renaming matching, which use the edit scripts to compute a similarity of profile and IR, like similarity = equalSet / (equalSet + insertSet + deleteSet), so those insertSet and deleteSet will be used there.

can we replace A,B,M,N,X,Y,V names with something meaningful and more readable? Also Max is not really Max but total size of two sequence.

The original paper converts the problem(find the LCS) to find the shortest path in the edit graph, I guess those X,Y,M,N, are just common used notation in graph algorithm area. V is Vector?new defined in the paper, but I guess from "An array, V, contains the..."), the MAX is the upper bound(the longest path). So basically all the variables are the paper's original used/defined variables( reason #2).

It seems the confusing comes up after I renamed the function from "shortestEdit" to "longestCommonSequence", which the paper actually does find the shortestEdit in graph, I think I need to change it back to shortestEdit.

Does this simple +Max need a lambda function?

This is also for reason #2, The index in pseudo code of the paper can be negative(guess an old c-style), this is just to simply use V[Index(K - 1)] to looks closer to V[K - 1] than V[Max + K - 1]

It would simpler if LCS just return a "CommonSequence" as represented by LocToLocMap.
The abstraction of DiffResult and MyersDiff seem a bit unnecessary. A standalone function LocToLocMap longestCommonSequence(..) would be cleaner I think.

Similarly here, for reason #3, it will need the other fields(insert/delete).

Slight preference to test it with lit rather than unit test. I think we should be able to craft a test case that shows different matching from the new algorithm?
That way we can also remove the test/debug only APIs.

Similarly, I wanted to make it easy to extend to a "shared" library, so tried to test it independently.

return Diff;
}

void SampleProfileMatcher::matchNonAnchorAndWriteResults(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For "WriteResults", here it writes the matching results to IRToProfileLocationMap.

else
X = V[Index(K - 1)] + 1;
Y = X - K;
while (X < N && Y < M && A[X] == B[Y])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

convert ProfileAnchors and IRAnchors into std::vector is to remove indirect calls

This is to remove the non-anchor/non-callsite location, (maybe the original name is confusing), so before the IRAnchors does contains all the locations, calls and non-calls.

@wlei-llvm
Copy link
Contributor Author

Or is it more clear to just use longestCommonSequence as a wrapper, like

LocToLocMap SampleProfileMatcher::longestCommonSequence(..) {
   ...shortestEditScript(...);
  return SES.EqualLocations;
}

So for profile matching, we know the intention is to find the LCS , and for the implementation, it still keep the shortestEditScript to align with the paper.

@wlei-llvm
Copy link
Contributor Author

In the new change:

  • Update the IRAnchors, ProfileAnchors to use Anchor as map value.
  • Some variable renaming.

assert(IRToProfileLocationMap.empty() &&
"Run stale profile matching only once per function");

std::vector<Anchor> ProfileCallsiteAnchors;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By definition, anchors are call sites. The key difference between ProfileCallsiteAnchors and ProfileAnchors is not callsites, but sequence vs map, plus filtering?

How about
ProfileCallsiteAnchors -> FilteredProfileAnchorList
IRAnchors -> FilteredIRAnchorsList

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

Anchor(const LineLocation &Loc, const FunctionId &FuncId)
: Loc(Loc), FuncId(FuncId) {}
Anchor(const LineLocation &Loc, StringRef &FName) : Loc(Loc), FuncId(FName) {}
bool operator==(const Anchor &Other) const {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we not compare Loc as well?

Except for this special operator== that excludes Loc, this is essentially just a typedef pair<LineLocation, FunctionId> Anchor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, changed to using Anchor = std::pair<LineLocation, FunctionId>; and removed this struct.

Comment on lines 45 to 47
class MyersDiff {
public:
struct DiffResult {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope we can simplify this. The abstractions here seem unnecessary -- 1) MyersDiff is entirely stateless, so no need for class / instances. 2) DiffResult is really just a LocToLocMap if we remove all the test/debug related stuff.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, removed this class

@@ -0,0 +1,150 @@
//===- SampleProfileMatcherTests.cpp - SampleProfileMatcher Unit Tests -----==//
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking more about it, I feel we don't really need this level of unit test? There are a lot algorithms in the compiler, and when developing these algorithms it's important to test the core of it, but I don't know if we need to have this level of unit test committed in tree. Think about profi, or even layout algorithm, or many more, we usually test e2e, rather than creating tests for internals of the algorithms.

I hope we can turn some of this into lit tests and also remove the debug / test functions in diffing.

const std::vector<Anchor> &B) const;
};

using AnchorMap = std::map<LineLocation, Anchor>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this AnchorMap duplicates its LineLocation (key) as part of Anchor (value)? Is that necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the duplicated location, now the value is only the FunctionId

@WenleiHe
Copy link
Member

WenleiHe commented May 9, 2024

In very soon, I want to use this diff algorithm for other purpose: function renaming matching, which use the edit scripts to compute a similarity of profile and IR, like similarity = equalSet / (equalSet + insertSet + deleteSet), so those insertSet and deleteSet will be used there.

We don't actually need the insert/delete set, instead we only care about the counts/ratios. Can we simply do something like |equalSet| * 2 / (|seq1| + |seq2|)

Copy link
Contributor Author

@wlei-llvm wlei-llvm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all good suggestions! addressing in the new version

Anchor(const LineLocation &Loc, const FunctionId &FuncId)
: Loc(Loc), FuncId(FuncId) {}
Anchor(const LineLocation &Loc, StringRef &FName) : Loc(Loc), FuncId(FName) {}
bool operator==(const Anchor &Other) const {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, changed to using Anchor = std::pair<LineLocation, FunctionId>; and removed this struct.

Comment on lines 45 to 47
class MyersDiff {
public:
struct DiffResult {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, removed this class

const std::vector<Anchor> &B) const;
};

using AnchorMap = std::map<LineLocation, Anchor>;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the duplicated location, now the value is only the FunctionId

assert(IRToProfileLocationMap.empty() &&
"Run stale profile matching only once per function");

std::vector<Anchor> ProfileCallsiteAnchors;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

@@ -0,0 +1,150 @@
//===- SampleProfileMatcherTests.cpp - SampleProfileMatcher Unit Tests -----==//
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, removed this unit test and the debug things.

@WenleiHe
Copy link
Member

WenleiHe commented May 9, 2024

Can we add a lit test case to verify the new diffing? something that will yield different results with new diffing algorithm.


void SampleProfileMatcher::matchNonCallsiteLocsAndWriteResults(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might have fell through the cracks..

For "WriteResults", here it writes the matching results to IRToProfileLocationMap.

I mean.. of course the result will persistent somewhere, is it significant enough to include that in names?

Similarly, we don't say "readAnchorsAndMatchNonAnchorAndWriteResults" just because it reads anchors, right? :)

@wlei-llvm
Copy link
Contributor Author

Can we add a lit test case to verify the new diffing? something that will yield different results with new diffing algorithm.

The test change in the new version can verify the algorithm(yield different results).

the original one is
Profile: bar foo bar foo
IR : bar foo bar foo
Now I remove the 1st bar in the profile, the old algorithm will match the 2nd bar to the 1st bar in the IR, but diff algorithm will keep the same match.

Though it's simple and a bit tricky, I can add a complicated case.

@@ -19,6 +19,9 @@

namespace llvm {

using Anchor = std::pair<LineLocation, FunctionId>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: instead of defining Anchor which is not used by AnchorMap, how about using AnchorList = std::vector<std::pair<LineLocation, FunctionId>> and not defining a common Anchor type?

auto isInvalidLineOffset = [](uint32_t LineOffset) {
return LineOffset & 0x8000;
};

std::map<LineLocation, std::unordered_set<FunctionId>> ProfileCallsites;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid the intermediate ProfileCallsites? i.e. how about we just populate ProfileAnchors directly? When we encounter a second callee for an anchor location, change the callee to UnknownIndirectCallee on the fly.

std::vector<int32_t> V(2 * MaxDepth + 1, -1);
V[Index(1)] = 0;
// Trace is used to backtrack the SES result.
std::vector<std::vector<int32_t>> Trace;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The size of Trace is known, so we can reserve capacity to avoid growing Trace and reallocation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trace is "unknown"(not always the MaxDepth) here, the size of the Trace is the shortest Depth(LCS) to find.

Copy link
Member

@WenleiHe WenleiHe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes. Looking good overall and getting really close now.

I just wonder now that we rely on sequence match, do we still need to exclude indirect calls?

if (R != MatchedAnchors.end()) {
const auto &Candidate = R->second;
InsertMatching(Loc, Candidate);
LLVM_DEBUG(dbgs() << "Callsite with callee:" << CalleeName
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: fold IR.second.stringRef() to use here so you don't have to use [[maybe_unused]] to suppress warning for non-debug builds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, Good catch!

[[maybe_unused]] const auto &Callees = I.second;
assert(!Callees.empty() && "Callees should not be empty");
[[maybe_unused]] StringRef CalleeName = I.second.stringRef();
assert(!CalleeName.empty() && "Callees should not be empty");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: CalleeName is only used in assertion, why not fold it into the assertion?

Comment on lines +293 to +298
// Filter the non-callsite from IRAnchors.
for (const auto &I : IRAnchors) {
if (I.second.stringRef().empty())
continue;
FilteredIRAnchorsList.emplace_back(I);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually for IRAnchors, why does it have to be a AnchorMap? I didn't find any usage that needs on map::find? Can we use AnchorList type from the beginning to avoid the conversion from map to vector here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also wondering for matching sequence, do we have to exclude indirect call on both sides?

Like what if we just allow indirect calls on both sides to match as part of sequence? Would that simplify the implementation more by removing the filtering, but without hurting the effectiveness?

Copy link
Contributor Author

@wlei-llvm wlei-llvm May 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually for IRAnchors, why does it have to be a AnchorMap? I didn't find any usage that needs on map::find? Can we use AnchorList type from the beginning to avoid the conversion from map to vector here?

AnchorMap is used here to sort and deduplicate the IR location(the map key)(see the findIRAnchors), as there could have BB/call duplication optimization.

I'm also wondering for matching sequence, do we have to exclude indirect call on both sides?
Like what if we just allow indirect calls on both sides to match as part of sequence? Would that simplify the implementation more by removing the filtering, but without hurting the effectiveness?

We do include the indirect call for the matching, I added one test for this case (see the Run stale profile matching for test_indirect_call..., it basically uses the dummy name(unknown.indirect.callee) as anchor which now both sides can have.

To clarify, here the filtering is to filter the non-call inst/location, recalling that in findIRAnchors, we use the empty name to represent the non-call inst. then diff alg needs to work on call only(anchor).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok. can you remind me why we need non-call site to be in the anchor map? I thought we only consider call sites as anchors.

What I'm really trying get at is whether it's possible to make AnchorMap and AnchorList really close, and maybe even merge them. Maybe it's not impossible, but wanted to understand all the reasons.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the beginning we used separate two map for call and non-call, then we refined them into one to make it cleaner(no need to pass two map as parameters). Another reason is later for the non-call anchor matching, the non-call matching is based on call anchor(by the lexical order), so we regardlessly need to sort the non-call/call location in one container, then it's easy to use the map(sorted at the beginning).

Yeah, it's probably hard to merge them, in summary, map is used for deduplication and sorting, and list is used for random access and filtering the non-call.

Copy link
Member

@WenleiHe WenleiHe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the changes and working through the feedbacks.

@wlei-llvm wlei-llvm merged commit 5b6f151 into llvm:main May 13, 2024
4 checks passed
amharc added a commit to amharc/llvm-project that referenced this pull request Jun 14, 2024
The algorithm added by PR llvm#87375 can be potentially quadratic in the
number of anchors. This is almost never a problem because normally
functions have a reasonable number of function calls.

However, in some rare cases of auto-generated code we observed very
large functions that trigger quadratic behaviour here (resulting in
>130GB of peak heap memory usage for clang). Let's add a knob for
controlling the max number of callsites in a function above which stale
profile matching won't be performed.
amharc added a commit that referenced this pull request Jun 17, 2024
The algorithm added by PR #87375 can be potentially quadratic in the
number of anchors. This is almost never a problem because normally
functions have a reasonable number of function calls.

However, in some rare cases of auto-generated code we observed very
large functions that trigger quadratic behaviour here (resulting in
>130GB of peak heap memory usage for clang). Let's add a knob for
controlling the max number of callsites in a function above which stale
profile matching won't be performed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants