Aggregate errors from llvm-dwarfdump --verify #79648

kevinfrei · 2024-01-26T21:13:19Z

The amount and format of output from llvm-dwarfdump --verify makes it quite difficult to know if a change to a tool that produces or modifies DWARF is causing new problems, or is fixing existing problems. This diff adds a categorized summary of issues found by the DWARF verifier, on by default, at the bottom of the error output.

The change includes a new --error-display option with 4 settings:

--error-display=quiet: Only display if errors occurred, but no details or summary are printed.
--error-display=summary: Only display the aggregated summary of errors with no error detail.
--error-display=details: Only display the detailed error messages with no summary (previous behavior)
--error-display=full: Display both the detailed error messages and the aggregated summary of errors (the default)

I changed a handful of tests that were failing due to new output, adding the flag to use the old behavior for all but a couple. For those two I added the new aggregated output to the expected output of the test.

The OutputCategoryAggregator is a pretty simple little class that @clayborg suggested to allow code to only be run to dump detail if it's enabled, while still collating counts of the category. Knowing that the lambda passed in is only conditionally executed is pretty important (handling errors has to be done outside the lambda). I'm happy to move this somewhere else (and change/improve it) to be more broadly useful if folks would like.

github-actions · 2024-01-26T21:15:44Z

✅ With the latest revision this PR passed the C/C++ code formatter.

kevinfrei · 2024-01-26T21:15:59Z

How does one assign a reviewer if you don't have write access? I'm 98% certain I'd like @dwblaikie to see this one...

dwblaikie · 2024-01-26T23:30:34Z

How does one assign a reviewer if you don't have write access? I'm 98% certain I'd like @dwblaikie to see this one...

mentioning will do just fine :)

llvmbot · 2024-01-26T23:31:17Z

@llvm/pr-subscribers-debuginfo

Author: Kevin Frei (kevinfrei)

Changes

The amount and format of output from llvm-dwarfutil --verify makes it quite difficult to know if a change to a tool that produces or modifies DWARF is causing new problems, or is fixing existing problems. This diff adds a categorized summary of issues found by the DWARF verifier, on by default, at the bottom of the error output.

The change includes two new options: --only-aggregate-errors and --no-aggregate-errors both of which default to 'off'. The aggregated error count is, by default, displayed at the bottom of the verification output (this is a change to previous output from the tool). If you want the previous output, use --no-aggregate-errors. If you only want the aggregated information, use --only-aggregate-errors.

I changed a handful of tests that were failing due to new output, adding the flag to use the old behavior for all but a couple. For those two I added the new aggregated output to the expected output of the test.

The OutputCategoryAggregator is a pretty simple little class that @clayborg suggested to allow code to only be run to dump detail if it's enabled, while still collating counts of the category. Knowing that the lambda passed in is only conditionally executed is pretty important (handling errors has to be done outside the lambda). I'm happy to move this somewhere else (and change/improve it) to be more broadly useful if folks would like.

Patch is 66.22 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/79648.diff

13 Files Affected:

(modified) llvm/include/llvm/DebugInfo/DIContext.h (+1)
(modified) llvm/include/llvm/DebugInfo/DWARF/DWARFVerifier.h (+18)
(modified) llvm/lib/DebugInfo/DWARF/DWARFContext.cpp (+1)
(modified) llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp (+482-277)
(modified) llvm/test/DebugInfo/X86/skeleton-unit-verify.s (+1-3)
(modified) llvm/test/DebugInfo/dwarfdump-accel.test (+1-1)
(modified) llvm/test/tools/llvm-dwarfdump/X86/verify_attr_file_indexes.yaml (+1-1)
(modified) llvm/test/tools/llvm-dwarfdump/X86/verify_attr_file_indexes_no_files.yaml (+1-1)
(modified) llvm/test/tools/llvm-dwarfdump/X86/verify_file_encoding.yaml (+5-1)
(modified) llvm/test/tools/llvm-dwarfdump/X86/verify_overlapping_cu_ranges.yaml (+1-1)
(modified) llvm/test/tools/llvm-dwarfdump/X86/verify_parent_zero_length.yaml (+1-1)
(modified) llvm/test/tools/llvm-dwarfdump/X86/verify_split_cu.s (+5)
(modified) llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp (+19-1)

diff --git a/llvm/include/llvm/DebugInfo/DIContext.h b/llvm/include/llvm/DebugInfo/DIContext.h
index 78ac34e5f0d26c..288ddf77bdfda7 100644
--- a/llvm/include/llvm/DebugInfo/DIContext.h
+++ b/llvm/include/llvm/DebugInfo/DIContext.h
@@ -205,6 +205,7 @@ struct DIDumpOptions {
   bool DisplayRawContents = false;
   bool IsEH = false;
   bool DumpNonSkeleton = false;
+  bool ShowAggregateErrors = false;
   std::function<llvm::StringRef(uint64_t DwarfRegNum, bool IsEH)>
       GetNameForDWARFReg;
 
diff --git a/llvm/include/llvm/DebugInfo/DWARF/DWARFVerifier.h b/llvm/include/llvm/DebugInfo/DWARF/DWARFVerifier.h
index e56d3781e824f3..ad50fa92665ce5 100644
--- a/llvm/include/llvm/DebugInfo/DWARF/DWARFVerifier.h
+++ b/llvm/include/llvm/DebugInfo/DWARF/DWARFVerifier.h
@@ -30,6 +30,20 @@ class DWARFDebugAbbrev;
 class DataExtractor;
 struct DWARFSection;
 
+class OutputCategoryAggregator {
+private:
+  std::map<std::string, unsigned> Aggregation;
+  bool IncludeDetail;
+
+public:
+  OutputCategoryAggregator(bool includeDetail = false)
+      : IncludeDetail(includeDetail) {}
+  void EnableDetail() { IncludeDetail = true; }
+  void DisableDetail() { IncludeDetail = false; }
+  void Report(StringRef s, std::function<void()> detailCallback);
+  void EnumerateResults(std::function<void(StringRef, unsigned)> handleCounts);
+};
+
 /// A class that verifies DWARF debug information given a DWARF Context.
 class DWARFVerifier {
 public:
@@ -81,6 +95,7 @@ class DWARFVerifier {
   DWARFContext &DCtx;
   DIDumpOptions DumpOpts;
   uint32_t NumDebugLineErrors = 0;
+  OutputCategoryAggregator ErrorCategory;
   // Used to relax some checks that do not currently work portably
   bool IsObjectFile;
   bool IsMachOObject;
@@ -348,6 +363,9 @@ class DWARFVerifier {
   bool verifyDebugStrOffsets(
       StringRef SectionName, const DWARFSection &Section, StringRef StrData,
       void (DWARFObject::*)(function_ref<void(const DWARFSection &)>) const);
+
+  /// Emits any aggregate information collected, depending on the dump options
+  void summarize(bool Success);
 };
 
 static inline bool operator<(const DWARFVerifier::DieRangeInfo &LHS,
diff --git a/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp b/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
index 792df53d304aa7..32a7fec0860be3 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
@@ -1408,6 +1408,7 @@ bool DWARFContext::verify(raw_ostream &OS, DIDumpOptions DumpOpts) {
   if (DumpOpts.DumpType & DIDT_DebugStrOffsets)
     Success &= verifier.handleDebugStrOffsets();
   Success &= verifier.handleAccelTables();
+  verifier.summarize(Success);
   return Success;
 }
 
diff --git a/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp b/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
index c4c14f5e2c9d36..a1f32c3e838d3c 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
@@ -167,20 +167,24 @@ bool DWARFVerifier::verifyUnitHeader(const DWARFDataExtractor DebugInfoData,
   if (!ValidLength || !ValidVersion || !ValidAddrSize || !ValidAbbrevOffset ||
       !ValidType) {
     Success = false;
-    error() << format("Units[%d] - start offset: 0x%08" PRIx64 " \n", UnitIndex,
-                      OffsetStart);
-    if (!ValidLength)
-      note() << "The length for this unit is too "
-                "large for the .debug_info provided.\n";
-    if (!ValidVersion)
-      note() << "The 16 bit unit header version is not valid.\n";
-    if (!ValidType)
-      note() << "The unit type encoding is not valid.\n";
-    if (!ValidAbbrevOffset)
-      note() << "The offset into the .debug_abbrev section is "
-                "not valid.\n";
-    if (!ValidAddrSize)
-      note() << "The address size is unsupported.\n";
+    ErrorCategory.Report("Invalid Length in Unit Header", [&]() {
+      error() << format("Units[%d] - start offset: 0x%08" PRIx64 " \n",
+                        UnitIndex, OffsetStart);
+      if (!ValidLength)
+        note() << "The length for this unit is too "
+                  "large for the .debug_info provided.\n";
+      if (!ValidVersion) {
+        note() << "The 16 bit unit header version is not valid.\n";
+      }
+      if (!ValidType) {
+        note() << "The unit type encoding is not valid.\n";
+      }
+      if (!ValidAbbrevOffset)
+        note() << "The offset into the .debug_abbrev section is "
+                  "not valid.\n";
+      if (!ValidAddrSize)
+        note() << "The address size is unsupported.\n";
+    });
   }
   *Offset = OffsetStart + Length + (isUnitDWARF64 ? 12 : 4);
   return Success;
@@ -198,12 +202,16 @@ bool DWARFVerifier::verifyName(const DWARFDie &Die) {
   if (OriginalFullName.empty() || OriginalFullName == ReconstructedName)
     return false;
 
-  error() << "Simplified template DW_AT_name could not be reconstituted:\n"
-          << formatv("         original: {0}\n"
-                     "    reconstituted: {1}\n",
-                     OriginalFullName, ReconstructedName);
-  dump(Die) << '\n';
-  dump(Die.getDwarfUnit()->getUnitDIE()) << '\n';
+  ErrorCategory.Report(
+      "Simplified template DW_AT_name could not be reconstituted", [&]() {
+        error()
+            << "Simplified template DW_AT_name could not be reconstituted:\n"
+            << formatv("         original: {0}\n"
+                       "    reconstituted: {1}\n",
+                       OriginalFullName, ReconstructedName);
+        dump(Die) << '\n';
+        dump(Die.getDwarfUnit()->getUnitDIE()) << '\n';
+      });
   return true;
 }
 
@@ -240,22 +248,28 @@ unsigned DWARFVerifier::verifyUnitContents(DWARFUnit &Unit,
 
   DWARFDie Die = Unit.getUnitDIE(/* ExtractUnitDIEOnly = */ false);
   if (!Die) {
-    error() << "Compilation unit without DIE.\n";
+    ErrorCategory.Report("Compilation unit missing DIE", [&]() {
+      error() << "Compilation unit without DIE.\n";
+    });
     NumUnitErrors++;
     return NumUnitErrors;
   }
 
   if (!dwarf::isUnitType(Die.getTag())) {
-    error() << "Compilation unit root DIE is not a unit DIE: "
-            << dwarf::TagString(Die.getTag()) << ".\n";
+    ErrorCategory.Report("Compilation unit root DIE is not a unit DIE", [&]() {
+      error() << "Compilation unit root DIE is not a unit DIE: "
+              << dwarf::TagString(Die.getTag()) << ".\n";
+    });
     NumUnitErrors++;
   }
 
   uint8_t UnitType = Unit.getUnitType();
   if (!DWARFUnit::isMatchingUnitTypeAndTag(UnitType, Die.getTag())) {
-    error() << "Compilation unit type (" << dwarf::UnitTypeString(UnitType)
-            << ") and root DIE (" << dwarf::TagString(Die.getTag())
-            << ") do not match.\n";
+    ErrorCategory.Report("Mismatched unit type", [&]() {
+      error() << "Compilation unit type (" << dwarf::UnitTypeString(UnitType)
+              << ") and root DIE (" << dwarf::TagString(Die.getTag())
+              << ") do not match.\n";
+    });
     NumUnitErrors++;
   }
 
@@ -263,7 +277,9 @@ unsigned DWARFVerifier::verifyUnitContents(DWARFUnit &Unit,
   //  3.1.2 Skeleton Compilation Unit Entries:
   //  "A skeleton compilation unit has no children."
   if (Die.getTag() == dwarf::DW_TAG_skeleton_unit && Die.hasChildren()) {
-    error() << "Skeleton compilation unit has children.\n";
+    ErrorCategory.Report("Skeleton CU has children", [&]() {
+      error() << "Skeleton compilation unit has children.\n";
+    });
     NumUnitErrors++;
   }
 
@@ -280,15 +296,21 @@ unsigned DWARFVerifier::verifyDebugInfoCallSite(const DWARFDie &Die) {
   DWARFDie Curr = Die.getParent();
   for (; Curr.isValid() && !Curr.isSubprogramDIE(); Curr = Die.getParent()) {
     if (Curr.getTag() == DW_TAG_inlined_subroutine) {
-      error() << "Call site entry nested within inlined subroutine:";
-      Curr.dump(OS);
+      ErrorCategory.Report(
+          "Call site nested entry within inlined subroutine", [&]() {
+            error() << "Call site entry nested within inlined subroutine:";
+            Curr.dump(OS);
+          });
       return 1;
     }
   }
 
   if (!Curr.isValid()) {
-    error() << "Call site entry not nested within a valid subprogram:";
-    Die.dump(OS);
+    ErrorCategory.Report(
+        "Call site entry not nested within valid subprogram", [&]() {
+          error() << "Call site entry not nested within a valid subprogram:";
+          Die.dump(OS);
+        });
     return 1;
   }
 
@@ -297,9 +319,13 @@ unsigned DWARFVerifier::verifyDebugInfoCallSite(const DWARFDie &Die) {
        DW_AT_call_all_tail_calls, DW_AT_GNU_all_call_sites,
        DW_AT_GNU_all_source_call_sites, DW_AT_GNU_all_tail_call_sites});
   if (!CallAttr) {
-    error() << "Subprogram with call site entry has no DW_AT_call attribute:";
-    Curr.dump(OS);
-    Die.dump(OS, /*indent*/ 1);
+    ErrorCategory.Report(
+        "Subprogram with call site entry has no DW_AT_call attribute", [&]() {
+          error()
+              << "Subprogram with call site entry has no DW_AT_call attribute:";
+          Curr.dump(OS);
+          Die.dump(OS, /*indent*/ 1);
+        });
     return 1;
   }
 
@@ -313,7 +339,9 @@ unsigned DWARFVerifier::verifyAbbrevSection(const DWARFDebugAbbrev *Abbrev) {
   Expected<const DWARFAbbreviationDeclarationSet *> AbbrDeclsOrErr =
       Abbrev->getAbbreviationDeclarationSet(0);
   if (!AbbrDeclsOrErr) {
-    error() << toString(AbbrDeclsOrErr.takeError()) << "\n";
+    std::string ErrMsg = toString(AbbrDeclsOrErr.takeError());
+    ErrorCategory.Report("Abbreviation Declaration error",
+                         [&]() { error() << ErrMsg << "\n"; });
     return 1;
   }
 
@@ -324,9 +352,12 @@ unsigned DWARFVerifier::verifyAbbrevSection(const DWARFDebugAbbrev *Abbrev) {
     for (auto Attribute : AbbrDecl.attributes()) {
       auto Result = AttributeSet.insert(Attribute.Attr);
       if (!Result.second) {
-        error() << "Abbreviation declaration contains multiple "
-                << AttributeString(Attribute.Attr) << " attributes.\n";
-        AbbrDecl.dump(OS);
+        ErrorCategory.Report(
+            "Abbreviation declartion contains multiple attributes", [&]() {
+              error() << "Abbreviation declaration contains multiple "
+                      << AttributeString(Attribute.Attr) << " attributes.\n";
+              AbbrDecl.dump(OS);
+            });
         ++NumErrors;
       }
     }
@@ -440,10 +471,15 @@ unsigned DWARFVerifier::verifyIndex(StringRef Name,
       auto &M = *Sections[Col];
       auto I = M.find(SC.getOffset());
       if (I != M.end() && I.start() < (SC.getOffset() + SC.getLength())) {
-        error() << llvm::formatv(
-            "overlapping index entries for entries {0:x16} "
-            "and {1:x16} for column {2}\n",
-            *I, Sig, toString(Index.getColumnKinds()[Col]));
+        StringRef Category = InfoColumnKind == DWARFSectionKind::DW_SECT_INFO
+                                 ? "Overlapping CU index entries"
+                                 : "Overlapping TU index entries";
+        ErrorCategory.Report(Category, [&]() {
+          error() << llvm::formatv(
+              "overlapping index entries for entries {0:x16} "
+              "and {1:x16} for column {2}\n",
+              *I, Sig, toString(Index.getColumnKinds()[Col]));
+        });
         return 1;
       }
       M.insert(SC.getOffset(), SC.getOffset() + SC.getLength() - 1, Sig);
@@ -532,8 +568,10 @@ unsigned DWARFVerifier::verifyDieRanges(const DWARFDie &Die,
     for (const auto &Range : Ranges) {
       if (!Range.valid()) {
         ++NumErrors;
-        error() << "Invalid address range " << Range << "\n";
-        DumpDieAfterError = true;
+        ErrorCategory.Report("Invalid address range", [&]() {
+          error() << "Invalid address range " << Range << "\n";
+          DumpDieAfterError = true;
+        });
         continue;
       }
 
@@ -545,9 +583,11 @@ unsigned DWARFVerifier::verifyDieRanges(const DWARFDie &Die,
       // address: 0 or -1.
       if (auto PrevRange = RI.insert(Range)) {
         ++NumErrors;
-        error() << "DIE has overlapping ranges in DW_AT_ranges attribute: "
-                << *PrevRange << " and " << Range << '\n';
-        DumpDieAfterError = true;
+        ErrorCategory.Report("DIE has overlapping DW_AT_ranges", [&]() {
+          error() << "DIE has overlapping ranges in DW_AT_ranges attribute: "
+                  << *PrevRange << " and " << Range << '\n';
+          DumpDieAfterError = true;
+        });
       }
     }
     if (DumpDieAfterError)
@@ -558,9 +598,11 @@ unsigned DWARFVerifier::verifyDieRanges(const DWARFDie &Die,
   const auto IntersectingChild = ParentRI.insert(RI);
   if (IntersectingChild != ParentRI.Children.end()) {
     ++NumErrors;
-    error() << "DIEs have overlapping address ranges:";
-    dump(Die);
-    dump(IntersectingChild->Die) << '\n';
+    ErrorCategory.Report("DIEs have overlapping address ranges", [&]() {
+      error() << "DIEs have overlapping address ranges:";
+      dump(Die);
+      dump(IntersectingChild->Die) << '\n';
+    });
   }
 
   // Verify that ranges are contained within their parent.
@@ -569,9 +611,13 @@ unsigned DWARFVerifier::verifyDieRanges(const DWARFDie &Die,
                              ParentRI.Die.getTag() == DW_TAG_subprogram);
   if (ShouldBeContained && !ParentRI.contains(RI)) {
     ++NumErrors;
-    error() << "DIE address ranges are not contained in its parent's ranges:";
-    dump(ParentRI.Die);
-    dump(Die, 2) << '\n';
+    ErrorCategory.Report(
+        "DIE address ranges are not contained by parent ranges", [&]() {
+          error()
+              << "DIE address ranges are not contained in its parent's ranges:";
+          dump(ParentRI.Die);
+          dump(Die, 2) << '\n';
+        });
   }
 
   // Recursively check children.
@@ -584,10 +630,12 @@ unsigned DWARFVerifier::verifyDieRanges(const DWARFDie &Die,
 unsigned DWARFVerifier::verifyDebugInfoAttribute(const DWARFDie &Die,
                                                  DWARFAttribute &AttrValue) {
   unsigned NumErrors = 0;
-  auto ReportError = [&](const Twine &TitleMsg) {
+  auto ReportError = [&](StringRef category, const Twine &TitleMsg) {
     ++NumErrors;
-    error() << TitleMsg << '\n';
-    dump(Die) << '\n';
+    ErrorCategory.Report(category, [&]() {
+      error() << TitleMsg << '\n';
+      dump(Die) << '\n';
+    });
   };
 
   const DWARFObject &DObj = DCtx.getDWARFObj();
@@ -604,23 +652,27 @@ unsigned DWARFVerifier::verifyDebugInfoAttribute(const DWARFDie &Die,
       if (U->isDWOUnit() && RangeSection.Data.empty())
         break;
       if (*SectionOffset >= RangeSection.Data.size())
-        ReportError(
-            "DW_AT_ranges offset is beyond " +
-            StringRef(DwarfVersion < 5 ? ".debug_ranges" : ".debug_rnglists") +
-            " bounds: " + llvm::formatv("{0:x8}", *SectionOffset));
+        ReportError("DW_AT_ranges offset out of bounds",
+                    "DW_AT_ranges offset is beyond " +
+                        StringRef(DwarfVersion < 5 ? ".debug_ranges"
+                                                   : ".debug_rnglists") +
+                        " bounds: " + llvm::formatv("{0:x8}", *SectionOffset));
       break;
     }
-    ReportError("DIE has invalid DW_AT_ranges encoding:");
+    ReportError("Invalid DW_AT_ranges encoding",
+                "DIE has invalid DW_AT_ranges encoding:");
     break;
   case DW_AT_stmt_list:
     // Make sure the offset in the DW_AT_stmt_list attribute is valid.
     if (auto SectionOffset = AttrValue.Value.getAsSectionOffset()) {
       if (*SectionOffset >= U->getLineSection().Data.size())
-        ReportError("DW_AT_stmt_list offset is beyond .debug_line bounds: " +
-                    llvm::formatv("{0:x8}", *SectionOffset));
+        ReportError("DW_AT_stmt_list offset out of bounds",
+                    "DW_AT_stmt_list offset is beyond .debug_line bounds: " +
+                        llvm::formatv("{0:x8}", *SectionOffset));
       break;
     }
-    ReportError("DIE has invalid DW_AT_stmt_list encoding:");
+    ReportError("Invalid DW_AT_stmt_list encoding",
+                "DIE has invalid DW_AT_stmt_list encoding:");
     break;
   case DW_AT_location: {
     // FIXME: It might be nice if there's a way to walk location expressions
@@ -644,14 +696,15 @@ unsigned DWARFVerifier::verifyDebugInfoAttribute(const DWARFDie &Die,
               return Op.isError();
             });
         if (Error || !Expression.verify(U))
-          ReportError("DIE contains invalid DWARF expression:");
+          ReportError("Invalid DWARF expressions",
+                      "DIE contains invalid DWARF expression:");
       }
     } else if (Error Err = handleErrors(
                    Loc.takeError(), [&](std::unique_ptr<ResolverError> E) {
                      return U->isDWOUnit() ? Error::success()
                                            : Error(std::move(E));
                    }))
-      ReportError(toString(std::move(Err)));
+      ReportError("Invalid DW_AT_location", toString(std::move(Err)));
     break;
   }
   case DW_AT_specification:
@@ -668,19 +721,21 @@ unsigned DWARFVerifier::verifyDebugInfoAttribute(const DWARFDie &Die,
       // This might be reference to a function declaration.
       if (DieTag == DW_TAG_GNU_call_site && RefTag == DW_TAG_subprogram)
         break;
-      ReportError("DIE with tag " + TagString(DieTag) + " has " +
-                  AttributeString(Attr) +
-                  " that points to DIE with "
-                  "incompatible tag " +
-                  TagString(RefTag));
+      ReportError("Incompatible DW_AT_abstract_origin tag reference",
+                  "DIE with tag " + TagString(DieTag) + " has " +
+                      AttributeString(Attr) +
+                      " that points to DIE with "
+                      "incompatible tag " +
+                      TagString(RefTag));
     }
     break;
   }
   case DW_AT_type: {
     DWARFDie TypeDie = Die.getAttributeValueAsReferencedDie(DW_AT_type);
     if (TypeDie && !isType(TypeDie.getTag())) {
-      ReportError("DIE has " + AttributeString(Attr) +
-                  " with incompatible tag " + TagString(TypeDie.getTag()));
+      ReportError("Incompatible DW_AT_type attribute tag",
+                  "DIE has " + AttributeString(Attr) +
+                      " with incompatible tag " + TagString(TypeDie.getTag()));
     }
     break;
   }
@@ -695,35 +750,43 @@ unsigned DWARFVerifier::verifyDebugInfoAttribute(const DWARFDie &Die,
           bool IsZeroIndexed = LT->Prologue.getVersion() >= 5;
           if (std::optional<uint64_t> LastFileIdx =
                   LT->getLastValidFileIndex()) {
-            ReportError("DIE has " + AttributeString(Attr) +
-                        " with an invalid file index " +
-                        llvm::formatv("{0}", *FileIdx) +
-                        " (valid values are [" + (IsZeroIndexed ? "0-" : "1-") +
-                        llvm::formatv("{0}", *LastFileIdx) + "])");
+            ReportError("Invalid file index in DW_AT_decl_file",
+                        "DIE has " + AttributeString(Attr) +
+                            " with an invalid file index " +
+                            llvm::formatv("{0}", *FileIdx) +
+                            " (valid values are [" +
+                            (IsZeroIndexed ? "0-" : "1-") +
+                            llvm::formatv("{0}", *LastFileIdx) + "])");
           } else {
-            ReportError("DIE has " + AttributeString(Attr) +
-                        " with an invalid file index " +
-                        llvm::formatv("{0}", *FileIdx) +
-                        " (the file table in the prologue is empty)");
+            ReportError("Invalid file index in DW_AT_decl_file",
+                        "DIE has " + AttributeString(Attr) +
+                            " with an invalid file index " +
+                            llvm::formatv("{0}", *FileIdx) +
+                            " (the file table in the prologue is empty)");
           }
         }
       } else {
-        ReportError("DIE has " + AttributeString(Attr) +
-                    " that references a file with index " +
-                    llvm::formatv("{0}", *FileIdx) +
-                    " and the compile unit has no line table");
+        ReportError(
+            "File index in DW_AT_decl_file reference CU with no line table",
+            "DIE has " + ...
[truncated]

dwblaikie

General idea I'm more or less OK with (though, like, clang/compilers in general don't have this sort of feature - so I'm not sure it's especially strongly motivated for DWARF either) - though it starts to look more like compiler diagnostics, with diagnostic groups & at that point I wonder whether it should use syntax that's more similar to the way compiler diagnostics and diagnostic groups are rendered - and maybe even share some implementation (well, layering might be difficult there, llvm can't depend on clang - but maybe some of the diagnostics stuff got refactored out to be used by flang and so might be in some usable place... )

dwblaikie · 2024-01-26T23:17:45Z

llvm/include/llvm/DebugInfo/DWARF/DWARFVerifier.h

@@ -30,6 +30,20 @@ class DWARFDebugAbbrev;
 class DataExtractor;
 struct DWARFSection;

+class OutputCategoryAggregator {
+private:
+  std::map<std::string, unsigned> Aggregation;


llvm::StringMap?

I reran tests with this change and then remembered why I went with std::map. It's sorted. llvm::StringMap isn't (hash-based). Add a single new category to the report and everything (might) get shuffled around, so I'd prefer to keep it a std::map.

llvm/include/llvm/DebugInfo/DWARF/DWARFVerifier.h

llvm/test/tools/llvm-dwarfdump/X86/verify_file_encoding.yaml

llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp

kevinfrei · 2024-01-29T16:16:33Z

General idea I'm more or less OK with (though, like, clang/compilers in general don't have this sort of feature - so I'm not sure it's especially strongly motivated for DWARF either) - though it starts to look more like compiler diagnostics, with diagnostic groups & at that point I wonder whether it should use syntax that's more similar to the way compiler diagnostics and diagnostic groups are rendered - and maybe even share some implementation (well, layering might be difficult there, llvm can't depend on clang - but maybe some of the diagnostics stuff got refactored out to be used by flang and so might be in some usable place... )

Errors here are a little different than Clang/LLVM errors, IMO: There's no reasonable way to narrow down your output, and to make them useful, they contain a dramatic amount of detail that makes aggregating through simple scripts much harder. In general, the diagnostics output in this tool also feels pretty inconsistent (I wanted to aggregate before changing output, just for cleanliness of the work itself).

I do feel like aggregation is quite useful here, and much easier to do properly than in Clang/LLVM. This is (mostly) a very "one-to-one" usage scenario: Something has produced DWARF data in some binary somewhere. Validate that it's okay. (The output currently still requires you to know which problems are catastrophic, which can be ignore safely, etc...)

llvm/include/llvm/DebugInfo/DWARF/DWARFVerifier.h

llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp

llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp

WenleiHe · 2024-01-30T17:11:10Z

Thanks for working on it -- this feels like a long missing feature. Perhaps one day --only-aggregate-errors would be the default behavior (unless higher verbosity is explicitly requested).

JDevlieghere · 2024-01-30T18:34:15Z

The amount and format of output from llvm-dwarfutil --verify makes it quite difficult to know if a change to a tool that produces or modifies DWARF is causing new problems, or is fixing existing problems.

The way dsymutil solves this problem is by having different verification modes.

  --verify-dwarf <verification mode>
                          Run the DWARF verifier on the input and/or output. Valid options are 'none, 'input', 'output', 'all' or 'auto' which runs the output verifier only if input verification passed.

Most of the time, invalid DWARF in means invalid DWARF out.

This diff adds a categorized summary of issues found by the DWARF verifier, on by default, at the bottom of the error output.

I didn't look at the patch, but from the description I'm not sure how aggregated error output solves this problem.

Speaking from experience with dsymutil, I don't think you want to have the verifier on by default. It's pretty slow and trades off performance to have better error reporting. For dsymutil, I made the auto verification mode the default in assert builds. It's way too slow to turn on in production.

kevinfrei · 2024-01-30T22:02:42Z

The amount and format of output from llvm-dwarfutil --verify makes it quite difficult to know if a change to a tool that produces or modifies DWARF is causing new problems, or is fixing existing problems.

The way dsymutil solves this problem is by having different verification modes.
  --verify-dwarf <verification mode>
                          Run the DWARF verifier on the input and/or output. Valid options are 'none, 'input', 'output', 'all' or 'auto' which runs the output verifier only if input verification passed.
Most of the time, invalid DWARF in means invalid DWARF out.

This diff adds a categorized summary of issues found by the DWARF verifier, on by default, at the bottom of the error output.

I didn't look at the patch, but from the description I'm not sure how aggregated error output solves this problem.

Speaking from experience with dsymutil, I don't think you want to have the verifier on by default. It's pretty slow and trades off performance to have better error reporting. For dsymutil, I made the auto verification mode the default in assert builds. It's way too slow to turn on in production.

dsymutil is a DWARF processor. dwarfdump -verify only does validation. The problem is that DWARF data appears almost always slightly bad. For example, running llvm-dwarfdump -verify on llvm-dwarfdump on my linux development box spits out over 160MB of log. The error information is both overwhelming, and due to the details necessary for diagnosis, nearly impossible to "diff". So, for a tool like Bolt, they can't easily know if they've done something horribly wrong with DWARF data, or if they just moved the errors around. With an error summary, they can figure out what things they might have made worse, what new errors they may have caused, and potentially also identified problems that came from the compiler.

BTW: Here's the bottom of that --verify log. 123 ranges aren't contained by parent ranges. I hope to improve the diagnostics so the output is a bit more opinionated/informative. I believe that overlapping DW_AT_ranges, for example, just aren't that big of a deal...

error: Aggregated error counts:
error: DIE address ranges are not contained by parent ranges occurred 123 time(s).
error: DIE has overlapping DW_AT_ranges occurred 119212 time(s).
error: DIEs have overlapping address ranges occurred 118323 time(s).
Errors detected.

dwblaikie · 2024-01-30T23:04:17Z

The amount and format of output from llvm-dwarfutil --verify makes it quite difficult to know if a change to a tool that produces or modifies DWARF is causing new problems, or is fixing existing problems.

The way dsymutil solves this problem is by having different verification modes.
  --verify-dwarf <verification mode>
                          Run the DWARF verifier on the input and/or output. Valid options are 'none, 'input', 'output', 'all' or 'auto' which runs the output verifier only if input verification passed.
Most of the time, invalid DWARF in means invalid DWARF out.

This diff adds a categorized summary of issues found by the DWARF verifier, on by default, at the bottom of the error output.

I didn't look at the patch, but from the description I'm not sure how aggregated error output solves this problem.
Speaking from experience with dsymutil, I don't think you want to have the verifier on by default. It's pretty slow and trades off performance to have better error reporting. For dsymutil, I made the auto verification mode the default in assert builds. It's way too slow to turn on in production.
dsymutil is a DWARF processor. dwarfdump -verify only does validation. The problem is that DWARF data appears almost always slightly bad. For example, running llvm-dwarfdump -verify on llvm-dwarfdump on my linux development box spits out over 160MB of log. The error information is both overwhelming, and due to the details necessary for diagnosis, nearly impossible to "diff". So, for a tool like Bolt, they can't easily know if they've done something horribly wrong with DWARF data, or if they just moved the errors around. With an error summary, they can figure out what things they might have made worse, what new errors they may have caused, and potentially also identified problems that came from the compiler.

BTW: Here's the bottom of that --verify log. 123 ranges aren't contained by parent ranges. I hope to improve the diagnostics so the output is a bit more opinionated/informative. I believe that overlapping DW_AT_ranges, for example, just aren't that big of a deal...
error: Aggregated error counts:
error: DIE address ranges are not contained by parent ranges occurred 123 time(s).
error: DIE has overlapping DW_AT_ranges occurred 119212 time(s).
error: DIEs have overlapping address ranges occurred 118323 time(s).
Errors detected.

These are mostly bugs that should be fixed in the verifier related to address tombstoning. (three CUs with a copy of an inline function - one gets picked, the other two get marked as "dead" usually by setting the starting address to 0 (depends on the linker) and then you have two CUs with functions that start at 0, so they're "overlapping" - or, if the inline functions happen to be truly identical, both CUs might point to the one copy, again, overlapping)

It might just not be the most useful tool to use for what you're trying to do without further work to fix these false positives.

JDevlieghere · 2024-01-31T00:48:53Z

dsymutil is a DWARF processor. dwarfdump -verify only does validation.

Ah, I think there's some confusion here. Your PR descriptions says llvm-dwarfutil but based on your reply you were talking about llvm-dwarfdump. The former is the equivalent of dsymutil for ELF and I thought it was using the DWARF verifier as a library (like dsymutil does) and all my comments were related to that. However you're trying to improve the output of llvm-dwarfdump (which is great!) so that explains why it wasn't clear to me how that would work :-)

clayborg · 2024-01-31T02:14:54Z

BTW: Here's the bottom of that --verify log. 123 ranges aren't contained by parent ranges. I hope to improve the diagnostics so the output is a bit more opinionated/informative. I believe that overlapping DW_AT_ranges, for example, just aren't that big of a deal...
error: Aggregated error counts:
error: DIE address ranges are not contained by parent ranges occurred 123 time(s).
error: DIE has overlapping DW_AT_ranges occurred 119212 time(s).
error: DIEs have overlapping address ranges occurred 118323 time(s).
Errors detected.
These are mostly bugs that should be fixed in the verifier related to address tombstoning. (three CUs with a copy of an inline function - one gets picked, the other two get marked as "dead" usually by setting the starting address to 0 (depends on the linker) and then you have two CUs with functions that start at 0, so they're "overlapping" - or, if the inline functions happen to be truly identical, both CUs might point to the one copy, again, overlapping)

We should fix the DWARFVerifier to do what llvm-gsymutil does: it gets the list of valid ".text" ranges. It does this by scanning the sections and making a set of address ranges that have read + execute permissions, and only checking addresses for any DIEs if they fall into these ranges. This keeps the GSYM file from having extra invalid addresses in the output file. Code is easy:

  AddressRanges TextRanges;
  for (const object::SectionRef &Sect : Obj.sections()) {
    if (!Sect.isText())
      continue;
    const uint64_t Size = Sect.getSize();
    if (Size == 0)
      continue;
    const uint64_t StartAddr = Sect.getAddress();
    TextRanges.insert(AddressRange(StartAddr, StartAddr + Size));
  }

It might just not be the most useful tool to use for what you're trying to do without further work to fix these false positives.

Yes, I can work with Kevin to fix these if you are ok with the above text ranges solution? I works really well for GSYM.

kevinfrei · 2024-01-31T16:45:34Z

Your PR descriptions says llvm-dwarfutil but based on your reply you were talking about llvm-dwarfdump

ZOMG, you are correct: I typed "llvm-dwarfutil" when I meant "llvm-dwarfdump". Fixed 🤡

kevinfrei · 2024-01-31T18:29:06Z

I also updated the PR description to include the changed flags that @clayborg suggested.

kevinfrei · 2024-02-01T00:38:02Z

Ping @JDevlieghere or @dwblaikie for an approval and clicking ye-old-purple-button-of-squashing-and-rebasing. I'm hoping to get something similar into llvm-gsymutil next (while procrastinating migration of Debuginfod tests from shell to api in LLDB🥇)

clayborg

LGTM, but I will let other LLVM folks give the final OK.

JDevlieghere

LGTM!

dwblaikie · 2024-02-01T16:17:12Z

I'm ok with @JDevlieghere's sign off here.

@clayborg

The amount and format of output from `llvm-dwarfdump --verify` makes it quite difficult to know if a change to a tool that produces or modifies DWARF is causing new problems, or is fixing existing problems. This diff adds a categorized summary of issues found by the DWARF verifier, on by default, at the bottom of the error output. The change includes a new `--error-display` option with 4 settings: * `--error-display=quiet`: Only display if errors occurred, but no details or summary are printed. * `--error-display=summary`: Only display the aggregated summary of errors with no error detail. * `--error-display=details`: Only display the detailed error messages with no summary (previous behavior) * `--error-display=full`: Display both the detailed error messages and the aggregated summary of errors (the default) I changed a handful of tests that were failing due to new output, adding the flag to use the old behavior for all but a couple. For those two I added the new aggregated output to the expected output of the test. The `OutputCategoryAggregator` is a pretty simple little class that @clayborg suggested to allow code to only be run to dump detail if it's enabled, while still collating counts of the category. Knowing that the lambda passed in is only conditionally executed is pretty important (handling errors has to be done *outside* the lambda). I'm happy to move this somewhere else (and change/improve it) to be more broadly useful if folks would like. --------- Co-authored-by: Kevin Frei <freik@meta.com>

@clayborg

The amount and format of output from `llvm-dwarfdump --verify` makes it quite difficult to know if a change to a tool that produces or modifies DWARF is causing new problems, or is fixing existing problems. This diff adds a categorized summary of issues found by the DWARF verifier, on by default, at the bottom of the error output. The change includes a new `--error-display` option with 4 settings: * `--error-display=quiet`: Only display if errors occurred, but no details or summary are printed. * `--error-display=summary`: Only display the aggregated summary of errors with no error detail. * `--error-display=details`: Only display the detailed error messages with no summary (previous behavior) * `--error-display=full`: Display both the detailed error messages and the aggregated summary of errors (the default) I changed a handful of tests that were failing due to new output, adding the flag to use the old behavior for all but a couple. For those two I added the new aggregated output to the expected output of the test. The `OutputCategoryAggregator` is a pretty simple little class that @clayborg suggested to allow code to only be run to dump detail if it's enabled, while still collating counts of the category. Knowing that the lambda passed in is only conditionally executed is pretty important (handling errors has to be done *outside* the lambda). I'm happy to move this somewhere else (and change/improve it) to be more broadly useful if folks would like. --------- Co-authored-by: Kevin Frei <freik@meta.com>

@clayborg

The amount and format of output from `llvm-dwarfdump --verify` makes it quite difficult to know if a change to a tool that produces or modifies DWARF is causing new problems, or is fixing existing problems. This diff adds a categorized summary of issues found by the DWARF verifier, on by default, at the bottom of the error output. The change includes a new `--error-display` option with 4 settings: * `--error-display=quiet`: Only display if errors occurred, but no details or summary are printed. * `--error-display=summary`: Only display the aggregated summary of errors with no error detail. * `--error-display=details`: Only display the detailed error messages with no summary (previous behavior) * `--error-display=full`: Display both the detailed error messages and the aggregated summary of errors (the default) I changed a handful of tests that were failing due to new output, adding the flag to use the old behavior for all but a couple. For those two I added the new aggregated output to the expected output of the test. The `OutputCategoryAggregator` is a pretty simple little class that @clayborg suggested to allow code to only be run to dump detail if it's enabled, while still collating counts of the category. Knowing that the lambda passed in is only conditionally executed is pretty important (handling errors has to be done *outside* the lambda). I'm happy to move this somewhere else (and change/improve it) to be more broadly useful if folks would like. --------- Co-authored-by: Kevin Frei <freik@meta.com>

GsymUtil, like DwarfDump --verify, spews a *lot* of data necessary to understand/diagnose issues with DWARF data. The trouble is that the kind of information necessary to make the messages useful also makes them nearly impossible to easily categorize. I put together a similar output categorizer (#79648) that will emit a summary of issues identified at the bottom of the (very verbose) output, enabling easier tracking of issues as they arise or are addressed. There's a single output change, where a message "warning: Unable to retrieve DWO .debug_info section for some object files. (Remove the --quiet flag for full output)" was being dumped the first time it was encountered (in what looks like an attempt to make something easily grep-able), but rather than keep the output in the same order, that message is now a 'category' so gets emitted at the end of the output. The test 'tools/llvm-gsymutil/X86/elf-dwo.yaml' was changed to reflect this difference. --------- Co-authored-by: Kevin Frei <freik@meta.com>

This adds documentation for --error-display, see: #79648

This adds documentation for --error-display, see: llvm#79648

kevinfrei force-pushed the aggregate-errors branch from 1d07b21 to 1b726a3 Compare January 26, 2024 21:15

kevinfrei marked this pull request as ready for review January 26, 2024 21:15

kevinfrei force-pushed the aggregate-errors branch from 1b726a3 to 1517df0 Compare January 26, 2024 21:18

dwblaikie self-requested a review January 26, 2024 22:13

dwblaikie added the debuginfo label Jan 26, 2024

dwblaikie reviewed Jan 26, 2024

View reviewed changes

clayborg reviewed Jan 30, 2024

View reviewed changes

JDevlieghere self-requested a review January 30, 2024 18:34

kevinfrei force-pushed the aggregate-errors branch from 303109b to 15290cf Compare January 30, 2024 22:14

kevinfrei changed the title ~~Aggregate errors from llvm-dwarfutil --verify~~ Aggregate errors from llvm-dwarfdump --verify Jan 31, 2024

Kevin Frei added 8 commits January 31, 2024 08:47

Not formatted, mostly to keep the current model easy to see

3733d88

Added usage comments

2d717a7

Finished first pass over the error()'s (inserted some TODO's)

01c5227

Updated with Greg's suggestion for the reporter

b33a3c6

Cleaned up: Non-verbose output is downright succinct!

2bd08dc

Set up the flags in a reasonable way

a6613ac

A few final touches before tests

c171a95

Adding flags to make output the same as before for tests

e3176b4

Kevin Frei added 5 commits January 31, 2024 08:47

A couple minor tweaks to keep the previous behavior 'available'

064bef5

Added checking for aggregate counts for a dwarfdump test

c09abdc

Added a warning for the new flags

5a7d23e

First chunk of PR feedback addressed

5269710

Response to PR feedback #2

617670b

kevinfrei force-pushed the aggregate-errors branch from 15290cf to 617670b Compare January 31, 2024 16:47

clayborg reviewed Feb 1, 2024

View reviewed changes

JDevlieghere approved these changes Feb 1, 2024

View reviewed changes

JDevlieghere merged commit bfdd782 into llvm:main Feb 1, 2024
4 checks passed

kevinfrei deleted the aggregate-errors branch February 1, 2024 17:08

kevinfrei mentioned this pull request Feb 8, 2024

Gsymutil aggregation similar to DwarfDump --verify #81154

Merged

bd1976bris added a commit that referenced this pull request Oct 4, 2024

[doc] Add llvm --error-display to dwarfdump documentation (#110922)

23c02a4

This adds documentation for --error-display, see: #79648

xgupta pushed a commit to xgupta/llvm-project that referenced this pull request Oct 4, 2024

[doc] Add llvm --error-display to dwarfdump documentation (llvm#110922)

d058feb

This adds documentation for --error-display, see: llvm#79648

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Aggregate errors from llvm-dwarfdump --verify #79648

Aggregate errors from llvm-dwarfdump --verify #79648

kevinfrei commented Jan 26, 2024 •

edited

Loading

github-actions bot commented Jan 26, 2024 •

edited

Loading

kevinfrei commented Jan 26, 2024

dwblaikie commented Jan 26, 2024

llvmbot commented Jan 26, 2024

dwblaikie left a comment

dwblaikie Jan 26, 2024

kevinfrei Jan 29, 2024

kevinfrei commented Jan 29, 2024 •

edited

Loading

WenleiHe commented Jan 30, 2024

JDevlieghere commented Jan 30, 2024

kevinfrei commented Jan 30, 2024

dwblaikie commented Jan 30, 2024

JDevlieghere commented Jan 31, 2024

clayborg commented Jan 31, 2024

kevinfrei commented Jan 31, 2024

kevinfrei commented Jan 31, 2024

kevinfrei commented Feb 1, 2024

clayborg left a comment

JDevlieghere left a comment

dwblaikie commented Feb 1, 2024

Aggregate errors from llvm-dwarfdump --verify #79648

Aggregate errors from llvm-dwarfdump --verify #79648

Conversation

kevinfrei commented Jan 26, 2024 • edited Loading

github-actions bot commented Jan 26, 2024 • edited Loading

kevinfrei commented Jan 26, 2024

dwblaikie commented Jan 26, 2024

llvmbot commented Jan 26, 2024

dwblaikie left a comment

Choose a reason for hiding this comment

dwblaikie Jan 26, 2024

Choose a reason for hiding this comment

kevinfrei Jan 29, 2024

Choose a reason for hiding this comment

kevinfrei commented Jan 29, 2024 • edited Loading

WenleiHe commented Jan 30, 2024

JDevlieghere commented Jan 30, 2024

kevinfrei commented Jan 30, 2024

dwblaikie commented Jan 30, 2024

JDevlieghere commented Jan 31, 2024

clayborg commented Jan 31, 2024

kevinfrei commented Jan 31, 2024

kevinfrei commented Jan 31, 2024

kevinfrei commented Feb 1, 2024

clayborg left a comment

Choose a reason for hiding this comment

JDevlieghere left a comment

Choose a reason for hiding this comment

dwblaikie commented Feb 1, 2024

kevinfrei commented Jan 26, 2024 •

edited

Loading

github-actions bot commented Jan 26, 2024 •

edited

Loading

kevinfrei commented Jan 29, 2024 •

edited

Loading