Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RemoveDIs] Read/write DbgRecords directly from/to bitcode #83251

Merged
merged 2 commits into from
Mar 15, 2024

Conversation

OCHyams
Copy link
Contributor

@OCHyams OCHyams commented Feb 28, 2024

If --write-experimental-debuginfo-iterators-to-bitcode is true (default true) and --expermental-debuginfo-iterators is also true then the new debug info format (non-instruction records) is written to bitcode directly.

Added the following records:

FUNC_CODE_DEBUG_RECORD_LABEL
FUNC_CODE_DEBUG_RECORD_VALUE
FUNC_CODE_DEBUG_RECORD_DECLARE
FUNC_CODE_DEBUG_RECORD_ASSIGN
FUNC_CODE_DEBUG_RECORD_VALUE_SIMPLE

The last one has an abbrev in FUNCTION_BLOCK BLOCK_INFO. Incidentally, this uses the last value available without widening the code-length for FUNCTION_BLOCK from 4 to 5 bits.

Records are formatted as follows:

All DbgRecord start with:
  1. DILocation

  FUNC_CODE_DEBUG_RECORD_LABEL
    2. DILabel

  DPValues then share common fields:
    2. DILocalVariable
    3. DIExpression

    FUNC_CODE_DEBUG_RECORD_VALUE
      4. Location Metadata

    FUNC_CODE_DEBUG_RECORD_DECLARE
      4. Location Metadata

    FUNC_CODE_DEBUG_RECORD_VALUE_SIMPLE
      4. Location Value (single)

    FUNC_CODE_DEBUG_RECORD_ASSIGN
      4. Location Metadata
      5. DIAssignID
      6. DIExpression (address)
      7. Location Metadata (address)

Encoding the DILocation metadata reference directly appeared to yield smaller bitcode files than encoding the operands seperately (as is done with instruction DILocations).

FUNC_CODE_DEBUG_RECORD_VALUE_SIMPLE is by far the most common DbgRecord record in optimized code (order of 5x-10x over other kinds). Unoptimized code should only contain FUNC_CODE_DEBUG_RECORD_DECLARE.


The bitcode format is converted back to the old debug mode (intrinsics) after loading as not all tools support RemoveDIs yet. This mirrors the approach taken for textual IR. We will eventually remove this back-and-forth conversion, but it's necessary to be able to incrementally update tools.


This change introduces a decent file size reduction for optimized-code bitcode files -- see the stage1-ReleaseLTO-g section in compile time tracker per-file sizes, savings ranging from 5-30%. LTO with debug goes a tiny bit faster too (I imagine as a result of the IO savings) (compile time tracker), even with the conversion roundtrip after reading.

I experimented with a few different schemas and this seemed to work best. Using the unwrapped-value-reference trick used in FUNC_CODE_DEBUG_RECORD_VALUE_SIMPLE (thanks @jmorse) doesn't have the same payoff for dbg.declares in an unoptimized bitcode file as it does for dbg.values in an optimized one (additional ~1% savings over the intrinsic version instead of an additional ~3%, for one test case, which is obviously only anecdotal, however this makes sense based on the nature of dbg.values/declares too; one dbg.declare is used per variable, which may be replaced with many dbg.values).

@OCHyams OCHyams requested review from nikic, jmorse and SLTozer February 28, 2024 11:42
@llvmbot llvmbot added LTO Link time optimization (regular/full LTO or ThinLTO) llvm:ir llvm:transforms labels Feb 28, 2024
@llvmbot
Copy link
Member

llvmbot commented Feb 28, 2024

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-lto

Author: Orlando Cazalet-Hyams (OCHyams)

Changes

If --write-experimental-debuginfo-iterators-to-bitcode is true (default true) and --expermental-debuginfo-iterators is also true then the new debug info format (non-instruction records) is written to bitcode directly.

Added the following records:

FUNC_CODE_DEBUG_RECORD_LABEL
FUNC_CODE_DEBUG_RECORD_VALUE
FUNC_CODE_DEBUG_RECORD_DECLARE
FUNC_CODE_DEBUG_RECORD_ASSIGN
FUNC_CODE_DEBUG_RECORD_VALUE_SIMPLE

The last one has an abbrev in FUNCTION_BLOCK BLOCK_INFO. Incidentally, this uses the last value available without widening the code-length for FUNCTION_BLOCK from 4 to 5 bits.

Records are formatted as follows:

All DbgRecord start with:
  1. DILocation

  FUNC_CODE_DEBUG_RECORD_LABEL
    2. DILabel

  DPValues then share common fields:
    2. DILocalVariable
    3. DIExpression

    FUNC_CODE_DEBUG_RECORD_VALUE
      4. Location Metadata

    FUNC_CODE_DEBUG_RECORD_DECLARE
      4. Location Metadata

    FUNC_CODE_DEBUG_RECORD_VALUE_SIMPLE
      4. Location Value (single)

    FUNC_CODE_DEBUG_RECORD_ASSIGN
      4. Location Metadata
      5. DIAssignID
      6. DIExpression (address)
      7. Location Metadata (address)

Encoding the DILocation metadata reference directly appeared to yield smaller bitcode files than encoding the operands seperately (as is done with instruction DILocations).

FUNC_CODE_DEBUG_RECORD_VALUE_SIMPLE is by far the most common DbgRecord record in optimized code (order of 5x-10x over other kinds). Unoptimized code should only contain FUNC_CODE_DEBUG_RECORD_DECLARE.


The bitcode format is converted back to the old debug mode (intrinsics) after loading as not all tools support RemoveDIs yet. This mirrors the approach taken for textual IR. We will eventually remove this back-and-forth conversion, but it's necessary to be able to incrementally update tools.


I experimented with a few different schemas and this seemed to work best. Using the unwrapped-value-reference trick used in FUNC_CODE_DEBUG_RECORD_VALUE_SIMPLE (thanks @jmorse) doesn't have the same payoff for dbg.declares in an unoptimized bitcode file as it does for dbg.values in an optimized one (~1% savings over the intrinsic version instead of ~3%, for one test case, which is obviously only anecdotal, however this makes sense based on the nature of dbg.values/declares too; one dbg.declare is used per variable, which may be replaced with many dbg.values).


Patch is 40.90 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/83251.diff

13 Files Affected:

  • (modified) llvm/include/llvm/Bitcode/LLVMBitCodes.h (+11)
  • (modified) llvm/lib/Bitcode/Reader/BitcodeAnalyzer.cpp (+4)
  • (modified) llvm/lib/Bitcode/Reader/BitcodeReader.cpp (+86-3)
  • (modified) llvm/lib/Bitcode/Writer/BitcodeWriter.cpp (+100-18)
  • (modified) llvm/lib/Bitcode/Writer/BitcodeWriterPass.cpp (+10-10)
  • (modified) llvm/lib/Bitcode/Writer/ValueEnumerator.cpp (+100-49)
  • (modified) llvm/lib/IR/BasicBlock.cpp (+5)
  • (modified) llvm/lib/IR/Verifier.cpp (+11)
  • (modified) llvm/lib/Linker/IRMover.cpp (+20-3)
  • (modified) llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp (+5-4)
  • (added) llvm/test/Bitcode/dbg-record-roundtrip.ll (+127)
  • (modified) llvm/tools/llvm-as/llvm-as.cpp (+7)
  • (modified) llvm/tools/verify-uselistorder/verify-uselistorder.cpp (+11)
diff --git a/llvm/include/llvm/Bitcode/LLVMBitCodes.h b/llvm/include/llvm/Bitcode/LLVMBitCodes.h
index c6f0ddf29a6da8..528a6baa4870ee 100644
--- a/llvm/include/llvm/Bitcode/LLVMBitCodes.h
+++ b/llvm/include/llvm/Bitcode/LLVMBitCodes.h
@@ -624,6 +624,17 @@ enum FunctionCodes {
                                   //             operation, align, vol,
                                   //             ordering, synchscope]
   FUNC_CODE_BLOCKADDR_USERS = 60, // BLOCKADDR_USERS: [value...]
+
+  FUNC_CODE_DEBUG_RECORD_VALUE =
+      61, // [DILocation, DILocalVariable, DIExpression, ValueAsMetadata]
+  FUNC_CODE_DEBUG_RECORD_DECLARE =
+      62, // [DILocation, DILocalVariable, DIExpression, ValueAsMetadata]
+  FUNC_CODE_DEBUG_RECORD_ASSIGN =
+      63, // [DILocation, DILocalVariable, DIExpression, ValueAsMetadata,
+          //  DIAssignID, DIExpression (addr), ValueAsMetadata (addr)]
+  FUNC_CODE_DEBUG_RECORD_VALUE_SIMPLE =
+      64,  // [DILocation, DILocalVariable, DIExpression, Value]
+  FUNC_CODE_DEBUG_RECORD_LABEL = 65, // DPVALUE: [DILocation, DILabel]
 };
 
 enum UseListCodes {
diff --git a/llvm/lib/Bitcode/Reader/BitcodeAnalyzer.cpp b/llvm/lib/Bitcode/Reader/BitcodeAnalyzer.cpp
index 7005011980ebc9..81579a8469ddb0 100644
--- a/llvm/lib/Bitcode/Reader/BitcodeAnalyzer.cpp
+++ b/llvm/lib/Bitcode/Reader/BitcodeAnalyzer.cpp
@@ -270,6 +270,10 @@ GetCodeName(unsigned CodeID, unsigned BlockID,
       STRINGIFY_CODE(FUNC_CODE, INST_CMPXCHG)
       STRINGIFY_CODE(FUNC_CODE, INST_CALLBR)
       STRINGIFY_CODE(FUNC_CODE, BLOCKADDR_USERS)
+      STRINGIFY_CODE(FUNC_CODE, DEBUG_RECORD_DECLARE)
+      STRINGIFY_CODE(FUNC_CODE, DEBUG_RECORD_VALUE)
+      STRINGIFY_CODE(FUNC_CODE, DEBUG_RECORD_ASSIGN)
+      STRINGIFY_CODE(FUNC_CODE, DEBUG_RECORD_VALUE_SIMPLE)
     }
   case bitc::VALUE_SYMTAB_BLOCK_ID:
     switch (CodeID) {
diff --git a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
index 832907a3f53f5f..f223864a4a6605 100644
--- a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
+++ b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
@@ -100,9 +100,6 @@ static cl::opt<bool> ExpandConstantExprs(
     cl::desc(
         "Expand constant expressions to instructions for testing purposes"));
 
-// Declare external flag for whether we're using the new debug-info format.
-extern llvm::cl::opt<bool> UseNewDbgInfoFormat;
-
 namespace {
 
 enum {
@@ -6357,6 +6354,81 @@ Error BitcodeReader::parseFunctionBody(Function *F) {
       InstructionList.push_back(I);
       break;
     }
+    case bitc::FUNC_CODE_DEBUG_RECORD_LABEL: {
+      // DPLabels are placed after the Instructions that they are attached to.
+      Instruction *Inst = getLastInstruction();
+      if (!Inst)
+        return error("Invalid record");
+      DILocation *DIL = cast<DILocation>(getFnMetadataByID(Record[0]));
+      DILabel *Label = cast<DILabel>(getFnMetadataByID(Record[1]));
+      Inst->getParent()->insertDPValueBefore(new DPLabel(Label, DIL),
+                                             Inst->getIterator());
+      continue; // This isn't an instruction.
+    }
+    case bitc::FUNC_CODE_DEBUG_RECORD_VALUE_SIMPLE:
+    case bitc::FUNC_CODE_DEBUG_RECORD_VALUE:
+    case bitc::FUNC_CODE_DEBUG_RECORD_DECLARE:
+    case bitc::FUNC_CODE_DEBUG_RECORD_ASSIGN: {
+      // DPValues are placed after the Instructions that they are attached to.
+      Instruction *Inst = getLastInstruction();
+      if (!Inst)
+        return error("Invalid record");
+
+      // First 3 fields are common to all kinds:
+      //   DILocation, DILocalVariable, DIExpression
+      // dbg_value (FUNC_CODE_DEBUG_RECORD_VALUE)
+      //   ..., LocationMetadata
+      // dbg_value (FUNC_CODE_DEBUG_RECORD_VALUE_SIMPLE - abbrev'd)
+      //   ..., Value
+      // dbg_declare (FUNC_CODE_DEBUG_RECORD_DECLARE)
+      //   ..., LocationMetadata
+      // dbg_assign (FUNC_CODE_DEBUG_RECORD_ASSIGN)
+      //   ..., LocationMetadata, DIAssignID, DIExpression, LocationMetadata
+      unsigned Slot = 0;
+      // Common fields (0-2).
+      DILocation *DIL = cast<DILocation>(getFnMetadataByID(Record[Slot++]));
+      DILocalVariable *Var =
+          cast<DILocalVariable>(getFnMetadataByID(Record[Slot++]));
+      DIExpression *Expr =
+          cast<DIExpression>(getFnMetadataByID(Record[Slot++]));
+
+      // Union field (3: LocationMetadata | Value).
+      Metadata *RawLocation = nullptr;
+      if (BitCode == bitc::FUNC_CODE_DEBUG_RECORD_VALUE_SIMPLE) {
+        Value *V = nullptr;
+        unsigned TyID = 0;
+        if (getValueTypePair(Record, Slot, NextValueNo, V, TyID, CurBB))
+          return error("Invalid record");
+        RawLocation = ValueAsMetadata::get(V);
+      } else {
+        RawLocation = getFnMetadataByID(Record[Slot++]);
+      }
+
+      DPValue *DPV = nullptr;
+      switch (BitCode) {
+      case bitc::FUNC_CODE_DEBUG_RECORD_VALUE:
+      case bitc::FUNC_CODE_DEBUG_RECORD_VALUE_SIMPLE:
+        DPV = new DPValue(RawLocation, Var, Expr, DIL,
+                          DPValue::LocationType::Value);
+        break;
+      case bitc::FUNC_CODE_DEBUG_RECORD_DECLARE:
+        DPV = new DPValue(RawLocation, Var, Expr, DIL,
+                          DPValue::LocationType::Declare);
+        break;
+      case bitc::FUNC_CODE_DEBUG_RECORD_ASSIGN: {
+        DIAssignID *ID = cast<DIAssignID>(getFnMetadataByID(Record[Slot++]));
+        DIExpression *AddrExpr =
+            cast<DIExpression>(getFnMetadataByID(Record[Slot++]));
+        Metadata *Addr = getFnMetadataByID(Record[Slot++]);
+        DPV = new DPValue(RawLocation, Var, Expr, ID, Addr, AddrExpr, DIL);
+        break;
+      }
+      default:
+        llvm_unreachable("Unknown DPValue bitcode");
+      }
+      Inst->getParent()->insertDPValueBefore(DPV, Inst->getIterator());
+      continue; // This isn't an instruction.
+    }
     case bitc::FUNC_CODE_INST_CALL: {
       // CALL: [paramattrs, cc, fmf, fnty, fnid, arg0, arg1...]
       if (Record.size() < 3)
@@ -6636,10 +6708,21 @@ Error BitcodeReader::materialize(GlobalValue *GV) {
   // Move the bit stream to the saved position of the deferred function body.
   if (Error JumpFailed = Stream.JumpToBit(DFII->second))
     return JumpFailed;
+
+  // Set the debug info mode to "new", forcing a mismatch between
+  // module and function debug modes. This is okay because we'll convert
+  // everything back to the old mode after parsing.
+  // FIXME: Remove this once all tools support RemoveDIs.
+  F->IsNewDbgInfoFormat = true;
+
   if (Error Err = parseFunctionBody(F))
     return Err;
   F->setIsMaterializable(false);
 
+  // Convert new debug info records into intrinsics.
+  // FIXME: Remove this once all tools support RemoveDIs.
+  F->convertFromNewDbgValues();
+
   if (StripDebugInfo)
     stripDebugInfo(*F);
 
diff --git a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
index 656f2a6ce870f5..8bd5913c7ea79e 100644
--- a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
+++ b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
@@ -99,6 +99,9 @@ namespace llvm {
 extern FunctionSummary::ForceSummaryHotnessType ForceSummaryEdgesCold;
 }
 
+extern bool WriteNewDbgInfoFormatToBitcode;
+extern llvm::cl::opt<bool> UseNewDbgInfoFormat;
+
 namespace {
 
 /// These are manifest constants used by the bitcode writer. They do not need to
@@ -128,6 +131,7 @@ enum {
   FUNCTION_INST_RET_VAL_ABBREV,
   FUNCTION_INST_UNREACHABLE_ABBREV,
   FUNCTION_INST_GEP_ABBREV,
+  FUNCTION_DEBUG_RECORD_VALUE_ABBREV,
 };
 
 /// Abstract class to manage the bitcode writing, subclassed for each bitcode
@@ -3491,25 +3495,93 @@ void ModuleBitcodeWriter::writeFunction(
       NeedsMetadataAttachment |= I.hasMetadataOtherThanDebugLoc();
 
       // If the instruction has a debug location, emit it.
-      DILocation *DL = I.getDebugLoc();
-      if (!DL)
-        continue;
-
-      if (DL == LastDL) {
-        // Just repeat the same debug loc as last time.
-        Stream.EmitRecord(bitc::FUNC_CODE_DEBUG_LOC_AGAIN, Vals);
-        continue;
+      if (DILocation *DL = I.getDebugLoc()) {
+        if (DL == LastDL) {
+          // Just repeat the same debug loc as last time.
+          Stream.EmitRecord(bitc::FUNC_CODE_DEBUG_LOC_AGAIN, Vals);
+        } else {
+          Vals.push_back(DL->getLine());
+          Vals.push_back(DL->getColumn());
+          Vals.push_back(VE.getMetadataOrNullID(DL->getScope()));
+          Vals.push_back(VE.getMetadataOrNullID(DL->getInlinedAt()));
+          Vals.push_back(DL->isImplicitCode());
+          Stream.EmitRecord(bitc::FUNC_CODE_DEBUG_LOC, Vals);
+          Vals.clear();
+          LastDL = DL;
+        }
       }
 
-      Vals.push_back(DL->getLine());
-      Vals.push_back(DL->getColumn());
-      Vals.push_back(VE.getMetadataOrNullID(DL->getScope()));
-      Vals.push_back(VE.getMetadataOrNullID(DL->getInlinedAt()));
-      Vals.push_back(DL->isImplicitCode());
-      Stream.EmitRecord(bitc::FUNC_CODE_DEBUG_LOC, Vals);
-      Vals.clear();
-
-      LastDL = DL;
+      // If the instruction has DPValues attached to it, emit them. Note that
+      // they come after the instruction so that it's easy to attach them again
+      // when reading the bitcode, even though conceptually the debug locations
+      // start "before" the instruction.
+      if (I.hasDbgValues() && WriteNewDbgInfoFormatToBitcode) {
+        /// Try to push the value only (unwrapped), otherwise push the
+        /// metadata wrapped value. Returns true if the value was pushed
+        /// without the ValueAsMetadata wrapper.
+        auto PushValueOrMetadata = [&Vals, InstID, this](Metadata *RawLocation) {
+          assert(RawLocation && "RawLocation unexpectedly null in DPValue");
+          if (ValueAsMetadata *VAM = dyn_cast<ValueAsMetadata>(RawLocation)) {
+            SmallVector<unsigned, 2> ValAndType;
+            // If the value is a fwd-ref the type is also pushed. We don't
+            // want the type, so fwd-refs are kept wrapped (pushValueAndType
+            // returns false if the value is pushed without type).
+            if (!pushValueAndType(VAM->getValue(), InstID, ValAndType)) {
+              Vals.push_back(ValAndType[0]);
+              return true;
+            }
+          }
+          // The metadata is a DIArgList, or ValueAsMetadata wrapping a
+          // fwd-ref. Push the metadata ID.
+          Vals.push_back(VE.getMetadataID(RawLocation));
+          return false;
+        };
+
+        // Write out non-instruction debug information attached to this
+        // instruction. Write it after the instruction so that it's easy to
+        // re-attach to the instruction reading the records in.
+        for (DbgRecord &DR : I.DbgMarker->getDbgValueRange()) {
+          if (DPLabel *DPL = dyn_cast<DPLabel>(&DR)) {
+            Vals.push_back(VE.getMetadataID(&*DPL->getDebugLoc()));
+            Vals.push_back(VE.getMetadataID(DPL->getLabel()));
+            Stream.EmitRecord(bitc::FUNC_CODE_DEBUG_RECORD_LABEL, Vals);
+            Vals.clear();
+            continue;
+          }
+
+          // First 3 fields are common to all kinds:
+          //   DILocation, DILocalVariable, DIExpression
+          // dbg_value (FUNC_CODE_DEBUG_RECORD_VALUE)
+          //   ..., LocationMetadata
+          // dbg_value (FUNC_CODE_DEBUG_RECORD_VALUE_SIMPLE - abbrev'd)
+          //   ..., Value
+          // dbg_declare (FUNC_CODE_DEBUG_RECORD_DECLARE)
+          //   ..., LocationMetadata
+          // dbg_assign (FUNC_CODE_DEBUG_RECORD_ASSIGN)
+          //   ..., LocationMetadata, DIAssignID, DIExpression, LocationMetadata
+          DPValue &DPV = cast<DPValue>(DR);
+          Vals.push_back(VE.getMetadataID(&*DPV.getDebugLoc()));
+          Vals.push_back(VE.getMetadataID(DPV.getVariable()));
+          Vals.push_back(VE.getMetadataID(DPV.getExpression()));
+          if (DPV.isDbgValue()) {
+            if (PushValueOrMetadata(DPV.getRawLocation()))
+              Stream.EmitRecord(bitc::FUNC_CODE_DEBUG_RECORD_VALUE_SIMPLE, Vals, FUNCTION_DEBUG_RECORD_VALUE_ABBREV);
+            else
+              Stream.EmitRecord(bitc::FUNC_CODE_DEBUG_RECORD_VALUE, Vals);
+          } else if (DPV.isDbgDeclare()) {
+            Vals.push_back(VE.getMetadataID(DPV.getRawLocation()));
+            Stream.EmitRecord(bitc::FUNC_CODE_DEBUG_RECORD_DECLARE, Vals);
+          } else {
+            assert(DPV.isDbgAssign() && "Unexpected DbgRecord kind");
+            Vals.push_back(VE.getMetadataID(DPV.getRawLocation()));
+            Vals.push_back(VE.getMetadataID(DPV.getAssignID()));
+            Vals.push_back(VE.getMetadataID(DPV.getAddressExpression()));
+            Vals.push_back(VE.getMetadataID(DPV.getRawAddress()));
+            Stream.EmitRecord(bitc::FUNC_CODE_DEBUG_RECORD_ASSIGN, Vals);
+          }
+          Vals.clear();
+        }
+      }
     }
 
     if (BlockAddress *BA = BlockAddress::lookup(&BB)) {
@@ -3750,7 +3822,17 @@ void ModuleBitcodeWriter::writeBlockInfo() {
         FUNCTION_INST_GEP_ABBREV)
       llvm_unreachable("Unexpected abbrev ordering!");
   }
-
+  {
+    auto Abbv = std::make_shared<BitCodeAbbrev>();
+    Abbv->Add(BitCodeAbbrevOp(bitc::FUNC_CODE_DEBUG_RECORD_VALUE_SIMPLE));
+    Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 7)); // dbgloc
+    Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 7)); // var
+    Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 7)); // expr
+    Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 6)); // val
+    if (Stream.EmitBlockInfoAbbrev(bitc::FUNCTION_BLOCK_ID, Abbv) !=
+        FUNCTION_DEBUG_RECORD_VALUE_ABBREV)
+      llvm_unreachable("Unexpected abbrev ordering! 1");
+  }
   Stream.ExitBlock();
 }
 
diff --git a/llvm/lib/Bitcode/Writer/BitcodeWriterPass.cpp b/llvm/lib/Bitcode/Writer/BitcodeWriterPass.cpp
index 93fb2a821dee74..de2396f31f6669 100644
--- a/llvm/lib/Bitcode/Writer/BitcodeWriterPass.cpp
+++ b/llvm/lib/Bitcode/Writer/BitcodeWriterPass.cpp
@@ -18,11 +18,12 @@
 #include "llvm/Pass.h"
 using namespace llvm;
 
+extern bool WriteNewDbgInfoFormatToBitcode;
+
 PreservedAnalyses BitcodeWriterPass::run(Module &M, ModuleAnalysisManager &AM) {
-  // RemoveDIs: there's no bitcode representation of the DPValue debug-info,
-  // convert to dbg.values before writing out.
-  bool IsNewDbgInfoFormat = M.IsNewDbgInfoFormat;
-  if (IsNewDbgInfoFormat)
+  bool ConvertToOldDbgFormatForWrite =
+      M.IsNewDbgInfoFormat && !WriteNewDbgInfoFormatToBitcode;
+  if (ConvertToOldDbgFormatForWrite)
     M.convertFromNewDbgValues();
 
   const ModuleSummaryIndex *Index =
@@ -30,7 +31,7 @@ PreservedAnalyses BitcodeWriterPass::run(Module &M, ModuleAnalysisManager &AM) {
                        : nullptr;
   WriteBitcodeToFile(M, OS, ShouldPreserveUseListOrder, Index, EmitModuleHash);
 
-  if (IsNewDbgInfoFormat)
+  if (ConvertToOldDbgFormatForWrite)
     M.convertToNewDbgValues();
 
   return PreservedAnalyses::all();
@@ -56,16 +57,15 @@ namespace {
     StringRef getPassName() const override { return "Bitcode Writer"; }
 
     bool runOnModule(Module &M) override {
-      // RemoveDIs: there's no bitcode representation of the DPValue debug-info,
-      // convert to dbg.values before writing out.
-      bool IsNewDbgInfoFormat = M.IsNewDbgInfoFormat;
-      if (IsNewDbgInfoFormat)
+      bool ConvertToOldDbgFormatForWrite =
+          M.IsNewDbgInfoFormat && !WriteNewDbgInfoFormatToBitcode;
+      if (ConvertToOldDbgFormatForWrite)
         M.convertFromNewDbgValues();
 
       WriteBitcodeToFile(M, OS, ShouldPreserveUseListOrder, /*Index=*/nullptr,
                          /*EmitModuleHash=*/false);
 
-      if (IsNewDbgInfoFormat)
+      if (ConvertToOldDbgFormatForWrite)
         M.convertToNewDbgValues();
       return false;
     }
diff --git a/llvm/lib/Bitcode/Writer/ValueEnumerator.cpp b/llvm/lib/Bitcode/Writer/ValueEnumerator.cpp
index fccb2a606f7ed9..b2edb75eec8c6e 100644
--- a/llvm/lib/Bitcode/Writer/ValueEnumerator.cpp
+++ b/llvm/lib/Bitcode/Writer/ValueEnumerator.cpp
@@ -134,20 +134,28 @@ static OrderMap orderModule(const Module &M) {
     // Metadata used by instructions is decoded before the actual instructions,
     // so visit any constants used by it beforehand.
     for (const BasicBlock &BB : F)
-      for (const Instruction &I : BB)
-        for (const Value *V : I.operands()) {
-          if (const auto *MAV = dyn_cast<MetadataAsValue>(V)) {
-            if (const auto *VAM =
-                    dyn_cast<ValueAsMetadata>(MAV->getMetadata())) {
+      for (const Instruction &I : BB) {
+        auto OrderConstantFromMetadata = [&](Metadata *MD) {
+          if (const auto *VAM = dyn_cast<ValueAsMetadata>(MD)) {
+            orderConstantValue(VAM->getValue());
+          } else if (const auto *AL = dyn_cast<DIArgList>(MD)) {
+            for (const auto *VAM : AL->getArgs())
               orderConstantValue(VAM->getValue());
-            } else if (const auto *AL =
-                           dyn_cast<DIArgList>(MAV->getMetadata())) {
-              for (const auto *VAM : AL->getArgs())
-                orderConstantValue(VAM->getValue());
-            }
           }
+        };
+
+        for (DPValue &DPV : DPValue::filter(I.getDbgValueRange())) {
+          OrderConstantFromMetadata(DPV.getRawLocation());
+          if (DPV.isDbgAssign())
+            OrderConstantFromMetadata(DPV.getRawAddress());
         }
 
+        for (const Value *V : I.operands()) {
+          if (const auto *MAV = dyn_cast<MetadataAsValue>(V))
+            OrderConstantFromMetadata(MAV->getMetadata());
+        }
+      }
+
     for (const Argument &A : F.args())
       orderValue(&A, OM);
     for (const BasicBlock &BB : F)
@@ -261,33 +269,39 @@ static UseListOrderStack predictUseListOrder(const Module &M) {
   // constants in the last Function they're used in.  Module-level constants
   // have already been visited above.
   for (const Function &F : llvm::reverse(M)) {
+    auto PredictValueOrderFromMetadata = [&](Metadata *MD) {
+      if (const auto *VAM = dyn_cast<ValueAsMetadata>(MD)) {
+        predictValueUseListOrder(VAM->getValue(), &F, OM, Stack);
+      } else if (const auto *AL = dyn_cast<DIArgList>(MD)) {
+        for (const auto *VAM : AL->getArgs())
+          predictValueUseListOrder(VAM->getValue(), &F, OM, Stack);
+      }
+    };
     if (F.isDeclaration())
       continue;
     for (const BasicBlock &BB : F)
       predictValueUseListOrder(&BB, &F, OM, Stack);
     for (const Argument &A : F.args())
       predictValueUseListOrder(&A, &F, OM, Stack);
-    for (const BasicBlock &BB : F)
+    for (const BasicBlock &BB : F) {
       for (const Instruction &I : BB) {
+        for (DPValue &DPV : DPValue::filter(I.getDbgValueRange())) {
+          PredictValueOrderFromMetadata(DPV.getRawLocation());
+          if (DPV.isDbgAssign())
+            PredictValueOrderFromMetadata(DPV.getRawAddress());
+        }
         for (const Value *Op : I.operands()) {
           if (isa<Constant>(*Op) || isa<InlineAsm>(*Op)) // Visit GlobalValues.
             predictValueUseListOrder(Op, &F, OM, Stack);
-          if (const auto *MAV = dyn_cast<MetadataAsValue>(Op)) {
-            if (const auto *VAM =
-                    dyn_cast<ValueAsMetadata>(MAV->getMetadata())) {
-              predictValueUseListOrder(VAM->getValue(), &F, OM, Stack);
-            } else if (const auto *AL =
-                           dyn_cast<DIArgList>(MAV->getMetadata())) {
-              for (const auto *VAM : AL->getArgs())
-                predictValueUseListOrder(VAM->getValue(), &F, OM, Stack);
-            }
-          }
+          if (const auto *MAV = dyn_cast<MetadataAsValue>(Op))
+            PredictValueOrderFromMetadata(MAV->getMetadata());
         }
         if (auto *SVI = dyn_cast<ShuffleVectorInst>(&I))
           predictValueUseListOrder(SVI->getShuffleMaskForBitcode(), &F, OM,
                                    Stack);
         predictValueUseListOrder(&I, &F, OM, Stack);
       }
+    }
   }
 
   // Visit globals last, since the module-level use-list block will be seen
@@ -409,6 +423,41 @@ ValueEnumerator::ValueEnumerator(const Module &M,
 
     for (const BasicBlock &BB : F)
       for (const Instruction &I : BB) {
+        // Local metadata is enumerated during function-incorporation, but
+        // any ConstantAsMetadata arguments in a DIArgList should be examined
+        // now.
+        auto EnumerateNonLocalValuesFromMetadata = [&](Metadata *MD) {
+   ...
[truncated]

Copy link

github-actions bot commented Feb 28, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@OCHyams OCHyams changed the title [RemoveDIs] Add bitcode format [RemoveDIs] Read/write DbgRecords directly to/from bitcode Feb 28, 2024
@OCHyams OCHyams changed the title [RemoveDIs] Read/write DbgRecords directly to/from bitcode [RemoveDIs] Read/write DbgRecords directly from/to bitcode Feb 28, 2024
@teresajohnson
Copy link
Contributor

Great to hear that this will be a size and performance improvement for LTO! I have added @dwblaikie as a reviewer, as he is way more familiar with debug info internals and probably a better person to take a look at the changes.

Copy link
Member

@jmorse jmorse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done some review of the structure and implementation here -- higher level thoughts such as "should we be using this number-space" and so forth are beyond me.

IMO: this patch should be landed in a default-off state for the write-out-removedis-bitcode flag, with the test tweaked to enable it on the command line. You can then turn the whole thing on-by-default in a follow up patch. The benefit of doing this is that it's very clear to downstream projects (which could even include us) how to turn it off if there's some difficulty in their own code. Given the bug reports we've had, a couple of projects are working through issues with the rest of RemoveDIs, and making it easy to turn off will have helped them.

The space savings here are great -- the metadata reference sizes in the abbrev seem small though, is metadata packed in around each function? (I'd always imagined it was one massive block with an average node-count of several thousands).

The test coverage seems good. I don't know what else we'd do to test, as examining the exact record layout seems like overkill.

// DPLabels are placed after the Instructions that they are attached to.
Instruction *Inst = getLastInstruction();
if (!Inst)
return error("Invalid record");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More enlightening error messages appreciated; it'll save us misery in the future. (Same further down).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, wdyt?

Comment on lines 6400 to 6455
if (getValueTypePair(Record, Slot, NextValueNo, V, TyID, CurBB))
return error("Invalid record");
RawLocation = ValueAsMetadata::get(V);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth recording in a comment that we never expect to see a fwd-dec out of getValueTypePair as we never emit them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


// Set the debug info mode to "new", forcing a mismatch between
// module and function debug modes. This is okay because we'll convert
// everything back to the old mode after parsing.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it alright to feed intrinsic-mode code into the convert-from-removedis-mode utilities? I can see how it'd work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, convertFromNewDbgValues doesn't do any checking. It just converts DbgRecords to intrinsics. It doesn't do anything / doesn't care if it sees existing intrinsics.

Comment on lines +3526 to +3551
// If the value is a fwd-ref the type is also pushed. We don't
// want the type, so fwd-refs are kept wrapped (pushValueAndType
// returns false if the value is pushed without type).
if (!pushValueAndType(VAM->getValue(), InstID, ValAndType)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading this right, we'll only encode a Value reference if it wouldn't have the type, and so isn't a fwd-dec? Seems alright to me; IMO we have the ability to choose how these things get encoded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's right. It saves us needing to encode the fwd-ref type, which would result in an extra field in the abbrev that is nonzero for < 1% of records.

Comment on lines +272 to +279
auto PredictValueOrderFromMetadata = [&](Metadata *MD) {
if (const auto *VAM = dyn_cast<ValueAsMetadata>(MD)) {
predictValueUseListOrder(VAM->getValue(), &F, OM, Stack);
} else if (const auto *AL = dyn_cast<DIArgList>(MD)) {
for (const auto *VAM : AL->getArgs())
predictValueUseListOrder(VAM->getValue(), &F, OM, Stack);
}
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate on why this is needed -- as we have our own use-list situation for RemoveDIs, and debug-info has never appeared in normal Value use-lists, is this needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good question. I added it because we do it for debug intrinsics already (see below, peering through ValueAsMetadatas). I think you might be right, that it's not needed. But I'm not 100% certain. I'm going to push the rest of the changes for now and dig deeper into this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have thought/looked a bit closer, I don't think the existing behaviour makes a lot of sense. The uses of the MetadataAsValue(ValueAsMetadata(V)) wrapped values looked at by the existing code (line 275 in the old file) don't get added to the use list order map because predictValueUseListOrderImpl is only concerned with Value::uses (these wrapped values don't show up there).

I think @nikic added this code - do you remember why it's necessary to look through MetadataAsValue(ValueAsMetadata(V)) uses here?

I agree with @jmorse that the new code isn't needed. However, I would prefer to keep the RemoveDIs/existing code in sync (removing or keeping both). wdyt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that this is flushing out the order of the use list /inside/ the MAV/VAM pairing? I think it's technically possible for that to become visible, if you findDbgUsers on a value then move all of it's users to a position, you enumerate them in the use-list order. It's not clear that that's what this code preserves though (perhaps nothing preserves it).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah I don't think so. The MAV/VAM is unwrapped, and predictValueUseListOrder only maps uses() of the Value, so the MAV/VAM "use" isn't considered at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping things in-sync makes sense to me; avoids any unexpected differences.

Comment on lines -419 to -431
// Local metadata is enumerated during function-incorporation, but
// any ConstantAsMetadata arguments in a DIArgList should be examined
// now.
if (isa<LocalAsMetadata>(MD->getMetadata()))
continue;
if (auto *AL = dyn_cast<DIArgList>(MD->getMetadata())) {
for (auto *VAM : AL->getArgs())
if (isa<ConstantAsMetadata>(VAM))
EnumerateMetadata(&F, VAM);
continue;
}

EnumerateMetadata(&F, MD->getMetadata());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleting this hunk is going to knacker anyone not running in RemoveDIs mode right now, yes? IMO it's worth stick a comment in saying it's scheduled for removal instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not deleted, it's been lambda-ified; EnumerateNonLocalValuesFromMetadata, called on line 468.

if (auto *MD = dyn_cast<MetadataAsValue>(&OI))
AddFnLocalMetadata(MD->getMetadata());
}
/// RemoteDIs: Add non-instruction function-local metadata uses.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RemoveDIs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 2982 to 2987
Check(BB.getParent()->IsNewDbgInfoFormat ==
BB.getParent()->getParent()->IsNewDbgInfoFormat,
"Fn debug format (new=" +
std::to_string(BB.getParent()->IsNewDbgInfoFormat) +
") should match parent " + BB.getParent()->getName() + " in " +
BB.getParent()->getParent()->getName());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More appropriate in the Function visitor than the block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done - I found #84292with this, which makes me worried that this is going to cause some friction when landing. I think it makes sense to move the verifier change to a separate patch -- #84308

Comment on lines 619 to 620
if (auto *OldFunc = dyn_cast<Function>(SGV))
F->IsNewDbgInfoFormat = OldFunc->IsNewDbgInfoFormat;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had this hunk floating around in my worktrees for a while and over time I've grown worried about it -- AFAIUI this function hits the bitcode materializer, presumably that's the correct point to work out what the correct RemoveDIs/DDD mode we should be in, rather than here? This is part of the wider problem of drawing a line through LLVM that we want to loop-in to using RemoveDIs, perhaps there's just no good solution at this stage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was left over from moving the "line through llvm" around a bit. It's not needed anymore, deleted. Good catch!

Comment on lines 17 to 18
; RUN: llvm-link %s --experimental-debuginfo-iterators=false < llvm-as %s --experimental-debuginfo-iterators=true
; RUN: llvm-link %s --experimental-debuginfo-iterators=true < llvm-as %s --experimental-debuginfo-iterators=false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pipe rather than '<'?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

OCHyams added a commit to OCHyams/llvm-project that referenced this pull request Mar 7, 2024
OCHyams added a commit that referenced this pull request Mar 7, 2024
…4292)

This trips the verifier changes added in #83251

Stimulated by llvm/test/MC/WebAssembly/extern-functype-intrinsic.ll
Copy link
Contributor Author

@OCHyams OCHyams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The metadata reference sizes in the abbrev seem small though, is metadata packed in around each function? (I'd always imagined it was one massive block with an average node-count of several thousands).

VBR 7 yielded the best result for the samples I was looking, over a few different sizes that I tried. I've pushed some different VBR widths to OCHyams/perf/ddd-bitcode2 on the compiletime tracker, which agrees that 6 or 7 is best for CTMark (look at the per-file size comparisons).

IMO: this patch should be landed in a default-off state for the write-out-removedis-bitcode flag

SGTM, done

I've done some review of the structure and implementation here -- higher level thoughts such as "should we be using this number-space" and so forth are beyond me.

@dblaikie do you have any strong feelings about this? Is it alright to add 5 record types and a function block-info abbrev?


// Set the debug info mode to "new", forcing a mismatch between
// module and function debug modes. This is okay because we'll convert
// everything back to the old mode after parsing.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, convertFromNewDbgValues doesn't do any checking. It just converts DbgRecords to intrinsics. It doesn't do anything / doesn't care if it sees existing intrinsics.

Comment on lines +3526 to +3551
// If the value is a fwd-ref the type is also pushed. We don't
// want the type, so fwd-refs are kept wrapped (pushValueAndType
// returns false if the value is pushed without type).
if (!pushValueAndType(VAM->getValue(), InstID, ValAndType)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's right. It saves us needing to encode the fwd-ref type, which would result in an extra field in the abbrev that is nonzero for < 1% of records.

Comment on lines -419 to -431
// Local metadata is enumerated during function-incorporation, but
// any ConstantAsMetadata arguments in a DIArgList should be examined
// now.
if (isa<LocalAsMetadata>(MD->getMetadata()))
continue;
if (auto *AL = dyn_cast<DIArgList>(MD->getMetadata())) {
for (auto *VAM : AL->getArgs())
if (isa<ConstantAsMetadata>(VAM))
EnumerateMetadata(&F, VAM);
continue;
}

EnumerateMetadata(&F, MD->getMetadata());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not deleted, it's been lambda-ified; EnumerateNonLocalValuesFromMetadata, called on line 468.

// DPLabels are placed after the Instructions that they are attached to.
Instruction *Inst = getLastInstruction();
if (!Inst)
return error("Invalid record");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, wdyt?

Comment on lines 6400 to 6455
if (getValueTypePair(Record, Slot, NextValueNo, V, TyID, CurBB))
return error("Invalid record");
RawLocation = ValueAsMetadata::get(V);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

if (auto *MD = dyn_cast<MetadataAsValue>(&OI))
AddFnLocalMetadata(MD->getMetadata());
}
/// RemoteDIs: Add non-instruction function-local metadata uses.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 2982 to 2987
Check(BB.getParent()->IsNewDbgInfoFormat ==
BB.getParent()->getParent()->IsNewDbgInfoFormat,
"Fn debug format (new=" +
std::to_string(BB.getParent()->IsNewDbgInfoFormat) +
") should match parent " + BB.getParent()->getName() + " in " +
BB.getParent()->getParent()->getName());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done - I found #84292with this, which makes me worried that this is going to cause some friction when landing. I think it makes sense to move the verifier change to a separate patch -- #84308

Comment on lines 17 to 18
; RUN: llvm-link %s --experimental-debuginfo-iterators=false < llvm-as %s --experimental-debuginfo-iterators=true
; RUN: llvm-link %s --experimental-debuginfo-iterators=true < llvm-as %s --experimental-debuginfo-iterators=false
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 619 to 620
if (auto *OldFunc = dyn_cast<Function>(SGV))
F->IsNewDbgInfoFormat = OldFunc->IsNewDbgInfoFormat;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was left over from moving the "line through llvm" around a bit. It's not needed anymore, deleted. Good catch!

Comment on lines +272 to +279
auto PredictValueOrderFromMetadata = [&](Metadata *MD) {
if (const auto *VAM = dyn_cast<ValueAsMetadata>(MD)) {
predictValueUseListOrder(VAM->getValue(), &F, OM, Stack);
} else if (const auto *AL = dyn_cast<DIArgList>(MD)) {
for (const auto *VAM : AL->getArgs())
predictValueUseListOrder(VAM->getValue(), &F, OM, Stack);
}
};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good question. I added it because we do it for debug intrinsics already (see below, peering through ValueAsMetadatas). I think you might be right, that it's not needed. But I'm not 100% certain. I'm going to push the rest of the changes for now and dig deeper into this.

OCHyams added a commit that referenced this pull request Mar 11, 2024
Copy link
Member

@jmorse jmorse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all sounds good now; LGTM. I still worry about allocating parts of the number-space without getting any further input, but possibly the fact that there aren't any warnings / processes / people scanning for this patch indicates that it's alright. It might be worth waiting a couple of days before landing to see if there's anyone else out there, and/or putting a dedicated announcement on discourse (one imagines not everyone had read the ten or so updates we've done in the same topic).

Comment on lines +272 to +279
auto PredictValueOrderFromMetadata = [&](Metadata *MD) {
if (const auto *VAM = dyn_cast<ValueAsMetadata>(MD)) {
predictValueUseListOrder(VAM->getValue(), &F, OM, Stack);
} else if (const auto *AL = dyn_cast<DIArgList>(MD)) {
for (const auto *VAM : AL->getArgs())
predictValueUseListOrder(VAM->getValue(), &F, OM, Stack);
}
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping things in-sync makes sense to me; avoids any unexpected differences.

@OCHyams
Copy link
Contributor Author

OCHyams commented Mar 13, 2024

Thanks! And good idea, I've asked for additional opinions on discourse here.

@OCHyams
Copy link
Contributor Author

OCHyams commented Mar 15, 2024

Squashed and formatted. I'll rebase then merge this (note that this now lands disabled by default so that we can have a more easily revertible capstone patch which just changes the default -- I'll follow up with that patch shortly). Thanks all!

OCHyams added 2 commits March 15, 2024 09:52
If --write-experimental-debuginfo-iterators-to-bitcode is true (default false)
and --expermental-debuginfo-iterators is also true then the new debug info
format (non-instruction records) is written to bitcode directly.

Added the following records:

    FUNC_CODE_DEBUG_RECORD_LABEL
    FUNC_CODE_DEBUG_RECORD_VALUE
    FUNC_CODE_DEBUG_RECORD_DECLARE
    FUNC_CODE_DEBUG_RECORD_ASSIGN
    FUNC_CODE_DEBUG_RECORD_VALUE_SIMPLE

The last one has an abbrev in FUNCTION_BLOCK BLOCK_INFO. Incidentally, this uses
the last value available without widening the code-length for FUNCTION_BLOCK
from 4 to 5 bits.

Records are formatted as follows:

    All DbgRecord start with:
      1. DILocation

      FUNC_CODE_DEBUG_RECORD_LABEL
        2. DILabel

      DPValues then share common fields:
        2. DILocalVariable
        3. DIExpression

        FUNC_CODE_DEBUG_RECORD_VALUE
          4. Location Metadata

        FUNC_CODE_DEBUG_RECORD_DECLARE
          4. Location Metadata

        FUNC_CODE_DEBUG_RECORD_VALUE_SIMPLE
	  4. Location Value (single)

        FUNC_CODE_DEBUG_RECORD_ASSIGN
	  4. Location Metadata
	  5. DIAssignID
	  6. DIExpression (address)
	  7. Location Metadata (address)

Encoding the DILocation metadata reference directly appeared to yield smaller
bitcode files than encoding the operands seperately (as is done with instruction
DILocations).

FUNC_CODE_DEBUG_RECORD_VALUE_SIMPLE is by far the most common DbgRecord record
in optimized code (order of 5x-10x over other kinds). Unoptimized code should
only contain FUNC_CODE_DEBUG_RECORD_DECLARE.
@OCHyams OCHyams merged commit d6d3d96 into llvm:main Mar 15, 2024
3 of 4 checks passed
OCHyams added a commit that referenced this pull request Mar 15, 2024
OCHyams added a commit that referenced this pull request Mar 15, 2024
…83251)

Reaplying after revert in #85382 (861ebe6).
Fixed intermittent test failure by avoiding piping output in some RUN lines.

If --write-experimental-debuginfo-iterators-to-bitcode is true (default false)
and --expermental-debuginfo-iterators is also true then the new debug info
format (non-instruction records) is written to bitcode directly.

Added the following records:

    FUNC_CODE_DEBUG_RECORD_LABEL
    FUNC_CODE_DEBUG_RECORD_VALUE
    FUNC_CODE_DEBUG_RECORD_DECLARE
    FUNC_CODE_DEBUG_RECORD_ASSIGN
    FUNC_CODE_DEBUG_RECORD_VALUE_SIMPLE

The last one has an abbrev in FUNCTION_BLOCK BLOCK_INFO. Incidentally, this uses
the last value available without widening the code-length for FUNCTION_BLOCK
from 4 to 5 bits.

Records are formatted as follows:

    All DbgRecord start with:
      1. DILocation

      FUNC_CODE_DEBUG_RECORD_LABEL
        2. DILabel

      DPValues then share common fields:
        2. DILocalVariable
        3. DIExpression

        FUNC_CODE_DEBUG_RECORD_VALUE
          4. Location Metadata

        FUNC_CODE_DEBUG_RECORD_DECLARE
          4. Location Metadata

        FUNC_CODE_DEBUG_RECORD_VALUE_SIMPLE
	  4. Location Value (single)

        FUNC_CODE_DEBUG_RECORD_ASSIGN
	  4. Location Metadata
	  5. DIAssignID
	  6. DIExpression (address)
	  7. Location Metadata (address)

Encoding the DILocation metadata reference directly appeared to yield smaller
bitcode files than encoding the operands seperately (as is done with instruction
DILocations).

FUNC_CODE_DEBUG_RECORD_VALUE_SIMPLE is by far the most common DbgRecord record
in optimized code (order of 5x-10x over other kinds). Unoptimized code should
only contain FUNC_CODE_DEBUG_RECORD_DECLARE.
OCHyams added a commit that referenced this pull request Mar 15, 2024
Follow on from #83251. This patch simply enables the behaviour by default in
order to provide an easily revertible capstone.
OCHyams added a commit that referenced this pull request Mar 15, 2024
OCHyams added a commit that referenced this pull request Mar 15, 2024
Note: This wasn't the cause of the strange behaviour mentioned in the NOTE
comment in the test.
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 16, 2024
Reverts:  breaks comgr test
435d4c1 Reapply [RemoveDIs] Read/write DbgRecords directly from/to bitcode (llvm#83251)

Change-Id: Ic1423e133e553e0ae22db56b25773f76c8d72bcb
@antmox
Copy link
Contributor

antmox commented Mar 18, 2024

Hello @OCHyams,

it looks like commit e419084
broke clang-aarch64-full-2stage bot: https://lab.llvm.org/buildbot/#/builders/179/builds/9629

Several (20) cfi tests now fail with this error:

ld.lld: error: Invalid value (Producer: 'LLVM19.0.0git' Reader: 'LLVM 17.0.6')

Any idea how to fix this?

@OCHyams
Copy link
Contributor Author

OCHyams commented Mar 18, 2024

Hi @antmox, thanks for the heads up. I'm not at my desk right now (its after work hours here). Did you bisect the issue to this patch? This one adds some new behaviour but keeps it off by default. I can't check right now but I suspect it's e419084 that is causing the issue, which turns on the new behaviour by default.

If you've got the bandwidth, please can you revert e419084 (or try adding the -mllvm option write-experimental-debuginfo-iterators-to-bitcode=false to those tests). Otherwise I will be able to come back and do it in an hour or so.

@OCHyams
Copy link
Contributor Author

OCHyams commented Mar 18, 2024

I reverted the commit mentioned above in 9a5c0d6. Hopefully that gets the bots green again.

@antmox
Copy link
Contributor

antmox commented Mar 19, 2024

Thanks @OCHyams, the bots are green again

@OCHyams
Copy link
Contributor Author

OCHyams commented Mar 19, 2024

I think this is actually a bot config issue - see llvm/llvm-zorg#139

searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 21, 2024
…lvm#83251)

RL: rebased on top of latest amd-staging

Reaplying after revert in llvm#85382 (861ebe6).
Fixed intermittent test failure by avoiding piping output in some RUN lines.

If --write-experimental-debuginfo-iterators-to-bitcode is true (default false)
and --expermental-debuginfo-iterators is also true then the new debug info
format (non-instruction records) is written to bitcode directly.

Added the following records:

    FUNC_CODE_DEBUG_RECORD_LABEL
    FUNC_CODE_DEBUG_RECORD_VALUE
    FUNC_CODE_DEBUG_RECORD_DECLARE
    FUNC_CODE_DEBUG_RECORD_ASSIGN
    FUNC_CODE_DEBUG_RECORD_VALUE_SIMPLE

The last one has an abbrev in FUNCTION_BLOCK BLOCK_INFO. Incidentally, this uses
the last value available without widening the code-length for FUNCTION_BLOCK
from 4 to 5 bits.

Records are formatted as follows:

    All DbgRecord start with:
      1. DILocation

      FUNC_CODE_DEBUG_RECORD_LABEL
        2. DILabel

      DPValues then share common fields:
        2. DILocalVariable
        3. DIExpression

        FUNC_CODE_DEBUG_RECORD_VALUE
          4. Location Metadata

        FUNC_CODE_DEBUG_RECORD_DECLARE
          4. Location Metadata

        FUNC_CODE_DEBUG_RECORD_VALUE_SIMPLE
	  4. Location Value (single)

        FUNC_CODE_DEBUG_RECORD_ASSIGN
	  4. Location Metadata
	  5. DIAssignID
	  6. DIExpression (address)
	  7. Location Metadata (address)

Encoding the DILocation metadata reference directly appeared to yield smaller
bitcode files than encoding the operands seperately (as is done with instruction
DILocations).

FUNC_CODE_DEBUG_RECORD_VALUE_SIMPLE is by far the most common DbgRecord record
in optimized code (order of 5x-10x over other kinds). Unoptimized code should
only contain FUNC_CODE_DEBUG_RECORD_DECLARE.

Change-Id: I1bbd2237f9d6a80eeedfaf155922b972249dc78a
OCHyams added a commit that referenced this pull request Mar 25, 2024
Follow on from #83251. This patch simply enables the behaviour by default in
order to provide an easily revertible capstone.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:ir llvm:transforms LTO Link time optimization (regular/full LTO or ThinLTO)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants