Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DebugInfo][InstCombine] Do not overwrite prior DILocation for new Insts #108565

Merged
merged 2 commits into from
Oct 3, 2024

Conversation

SLTozer
Copy link
Contributor

@SLTozer SLTozer commented Sep 13, 2024

When InstCombine replaces an old instruction with a new instruction, it copies !dbg and !annotation metadata from old to new. For some InstCombine patterns we set a specific DILocation on the new instruction prior to insertion, however, which more accurately reflects the new instruction. This more specific DILocation may be overwritten on insertion by a less appropriate one, resulting in a less correct line mapping. This patch changes this behaviour to only copy the DILocation from old to new if the new instruction has no existing DILocation (which will always be the case for a new instruction unless InstCombine has specifically set one).

This isn't a perfect solution. In the test case given, we create a load instruction and give it the merged location of the two existing load instructions it replaces. Prior to this patch, if the PHI that is being RAUWd has a DILocation, it will overwrite the merged location on the new load, which is bad. However even with this patch, if either of the load instructions has no DILocation, then the new load won't have one either, and it will thus receive the DILocation from the PHI. This is probably bad - the PHI could have a valid DILocation corresponding to a different line, meaning that if the load results in an invalid memory access then we will attribute the resulting crash to the PHI's line, which could be worse than giving no attribution at all. I don't see any easy way to solve this problem however without fundamentally changing how InstCombine assigns DILocations; the most likely fix would be to exclusively assign DILocations within the InstCombine patterns, or to otherwise allow them to specify whether or not a DILocation ought to be copied from the original instruction or not.

@llvmbot
Copy link
Member

llvmbot commented Sep 13, 2024

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-debuginfo

Author: Stephen Tozer (SLTozer)

Changes

When InstCombine replaces an old instruction with a new instruction, it copies !dbg and !annotation metadata from old to new. For some InstCombine patterns we set a specific DILocation on the new instruction prior to insertion, however, which more accurately reflects the new instruction. This more specific DILocation may be overwritten on insertion by a less appropriate one, resulting in a less correct line mapping. This patch changes this behaviour to only copy the DILocation from old to new if the new instruction has no existing DILocation (which will always be the case for a new instruction unless InstCombine has specifically set one).

This isn't a perfect solution. In the test case given, we create a load instruction and give it the merged location of the two existing load instructions it replaces. Prior to this patch, if the PHI that is being RAUWd has a DILocation, it will overwrite the merged location on the new load, which is bad. However even with this patch, if either of the load instructions has no DILocation, then the new load won't have one either, and it will thus receive the DILocation from the PHI. This is probably bad - the PHI could have a valid DILocation corresponding to a different line, meaning that if the load results in an invalid memory access then we will attribute the resulting crash to the PHI's line, which could be worse than giving no attribution at all. I don't see any easy way to solve this problem however without fundamentally changing how InstCombine assigns DILocations; the most likely fix would be to exclusively assign DILocations within the InstCombine patterns, or to otherwise allow them to specify whether or not a DILocation ought to be copied from the original instruction or not.


Full diff: https://github.com/llvm/llvm-project/pull/108565.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstructionCombining.cpp (+7-2)
  • (added) llvm/test/Transforms/InstCombine/new-inst-dbgloc-overwrite.ll (+77)
diff --git a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
index 8195e0539305cc..bd9af1ded431e9 100644
--- a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
@@ -5163,8 +5163,13 @@ bool InstCombinerImpl::run() {
         LLVM_DEBUG(dbgs() << "IC: Old = " << *I << '\n'
                           << "    New = " << *Result << '\n');
 
-        Result->copyMetadata(*I,
-                             {LLVMContext::MD_dbg, LLVMContext::MD_annotation});
+        // We copy the old instruction's DebugLoc to the new instruction, unless
+        // InstCombine already assigned a DebugLoc to it, in which case we
+        // should trust the more specifically selected DebugLoc.
+        if (!Result->getDebugLoc())
+          Result->setDebugLoc(I->getDebugLoc());
+        // We also copy annotation metadata to the new instruction.
+        Result->copyMetadata(*I, LLVMContext::MD_annotation);
         // Everything uses the new instruction now.
         I->replaceAllUsesWith(Result);
 
diff --git a/llvm/test/Transforms/InstCombine/new-inst-dbgloc-overwrite.ll b/llvm/test/Transforms/InstCombine/new-inst-dbgloc-overwrite.ll
new file mode 100644
index 00000000000000..c8ce8d3b433f4d
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/new-inst-dbgloc-overwrite.ll
@@ -0,0 +1,77 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals smart
+;; Tests that when InstCombine sets a DILocation on the new instruction when it
+;; is created, we do not try to overwrite that DILocation when we later insert
+;; the new instruction.
+;; In this test, InstCombine replaces two loads joined by a PHI into a PHI of
+;; the addresses followed by a load, and gives the new load a merge of the two
+;; incoming load DILocations. This test verifies that the new load keeps that
+;; DILocation, and doesn't have it overwritten by the DILocation of the original
+;; PHI. It will, however, receive the !annotation attached to the original PHI.
+; RUN: opt < %s -passes=instcombine -S | FileCheck %s
+
+define void @test(ptr %xfA, ptr %xfB, i1 %cmp5) {
+; CHECK-LABEL: @test(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br i1 [[CMP5:%.*]], label [[IF_ELSE:%.*]], label [[IF_THEN6:%.*]]
+; CHECK:       if.then6:
+; CHECK-NEXT:    br label [[IF_END11:%.*]]
+; CHECK:       if.else:
+; CHECK-NEXT:    br label [[IF_END11]]
+; CHECK:       if.end11:
+; CHECK-NEXT:    [[XFA_PN:%.*]] = phi ptr [ [[XFA:%.*]], [[IF_ELSE]] ], [ [[XFB:%.*]], [[IF_THEN6]] ]
+; CHECK-NEXT:    [[XF1_SROA_8_0_IN:%.*]] = getelementptr i8, ptr [[XFA_PN]], i64 4
+; CHECK-NEXT:    [[XF1_SROA_8_0:%.*]] = load float, ptr [[XF1_SROA_8_0_IN]], align 4, !dbg [[DBG3:![0-9]+]], !annotation [[META7:![0-9]+]]
+; CHECK-NEXT:    [[CMP_I:%.*]] = fcmp ugt float [[XF1_SROA_8_0]], 0.000000e+00
+; CHECK-NEXT:    br i1 [[CMP_I]], label [[IF_END_I:%.*]], label [[IF_THEN_I:%.*]]
+; CHECK:       if.then.i:
+; CHECK-NEXT:    br label [[IF_END_I]]
+; CHECK:       if.end.i:
+; CHECK-NEXT:    ret void
+;
+entry:
+  br i1 %cmp5, label %if.else, label %if.then6
+
+if.then6:                                         ; preds = %entry
+  %xf1.sroa.8.0.xfB.sroa_idx = getelementptr i8, ptr %xfB, i64 4
+  %xf1.sroa.8.0.copyload = load float, ptr %xf1.sroa.8.0.xfB.sroa_idx, align 4, !dbg !3
+  br label %if.end11
+
+if.else:                                          ; preds = %entry
+  %xf1.sroa.8.0.xfA.sroa_idx = getelementptr i8, ptr %xfA, i64 4
+  %xf1.sroa.8.0.copyload494 = load float, ptr %xf1.sroa.8.0.xfA.sroa_idx, align 4, !dbg !7
+  br label %if.end11
+
+if.end11:                                         ; preds = %if.else, %if.then6
+  %xf1.sroa.8.0 = phi float [ %xf1.sroa.8.0.copyload494, %if.else ], [ %xf1.sroa.8.0.copyload, %if.then6 ], !dbg !8, !annotation !9
+  %cmp.i = fcmp ugt float %xf1.sroa.8.0, 0.000000e+00
+  br i1 %cmp.i, label %if.end.i, label %if.then.i
+
+if.then.i:                                        ; preds = %if.end11
+  br label %if.end.i
+
+if.end.i:                                         ; preds = %if.then.i, %if.end11
+  ret void
+}
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!2}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !1, producer: "clang version 20.0.0git", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug)
+!1 = !DIFile(filename: "test.cpp", directory: "/tmp")
+!2 = !{i32 2, !"Debug Info Version", i32 3}
+!3 = !DILocation(line: 63, column: 12, scope: !4)
+!4 = distinct !DISubprogram(name: "operator=", linkageName: "_ZN11btMatrix3x3aSERKS_", scope: null, file: !1, line: 61, type: !5, scopeLine: 62, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !6)
+!5 = distinct !DISubroutineType(types: !6)
+!6 = !{}
+!7 = !DILocation(line: 63, column: 15, scope: !4)
+!8 = !DILocation(line: 64, scope: !4)
+!9 = !{!"Test String"}
+;.
+; CHECK: [[META0:![0-9]+]] = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: [[META1:![0-9]+]], producer: "{{.*}}clang version {{.*}}", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug)
+; CHECK: [[META1]] = !DIFile(filename: "test.cpp", directory: {{.*}})
+; CHECK: [[DBG3]] = !DILocation(line: 64, scope: [[META4:![0-9]+]])
+; CHECK: [[META4]] = distinct !DISubprogram(name: "operator=", linkageName: "_ZN11btMatrix3x3aSERKS_", scope: null, file: [[META1]], line: 61, type: [[META5:![0-9]+]], scopeLine: 62, spFlags: DISPFlagDefinition, unit: [[META0]], retainedNodes: [[META6:![0-9]+]])
+; CHECK: [[META5]] = distinct !DISubroutineType(types: [[META6]])
+; CHECK: [[META6]] = !{}
+; CHECK: [[META7]] = !{!"Test String"}
+;.

@SLTozer SLTozer force-pushed the instcombine-preserve-debugloc branch from 28e2b39 to aa961ec Compare September 13, 2024 14:10
@pogo59
Copy link
Collaborator

pogo59 commented Sep 13, 2024

However even with this patch, if either of the load instructions has no DILocation, then the new load won't have one either,

If we merge locations on two instructions, but only one has a location, the merged result is no location?

How often do we get a load with no DILocation in practice? It seems like that would be unusual.

@SLTozer
Copy link
Contributor Author

SLTozer commented Sep 13, 2024

If we merge locations on two instructions, but only one has a location, the merged result is no location?

Correct; when we merge locations, we choose the innermost scope that contains both; if either has no location whatsoever (i.e. has no scope), then we can't assign any meaningful location that isn't lying about one of them.

How often do we get a load with no DILocation in practice? It seems like that would be unusual.

Hard to say, but I uploaded another patch earlier today that fixes a case where we generate loads without a location (#108531); there are also all sorts of ways that DILocations can be dropped right now, and although I'm trying to eliminate as many as possible right now it seems safer to be defensive about the possibility. As a small and not-totally-reliable data point, I found ~45 loads generated (at some point) without a DILocation while building bullet. Edit: And also, in at least one place we explicitly drop locations for a load instruction (hoisting a load into the loop header in LICM).

@jmorse
Copy link
Member

jmorse commented Oct 2, 2024

(the "unread" list in my inbox is looong,)

I'm not fully understanding the second paragraph about the bad case: in one mode the DILocation of a PHI is given to the new load instruction, and that's considered bad. In the second mode, there's a risk the new load "...will thus receive the DILocation from the PHI" -- so the same location as before, so there's no regression?

In principle I think this is a good patch that tries that keep source locations as specific as possible, I just want to understand the use case more.

@SLTozer
Copy link
Contributor Author

SLTozer commented Oct 3, 2024

so the same location as before, so there's no regression?

That's correct - there's no regression from this patch, I'm simply noting that it's not a perfect solution. A more perfect solution would be one where we could state explicitly in this case, "Do not copy the original instruction's DebugLoc to the replacement instruction", but that would require making more intrusive changes to InstCombine with probably a large diff footprint, hence I'm considering this a worthwhile partial solution.

Copy link
Member

@jmorse jmorse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but with the dumbest nit: it looks like the test is checking we get line number 63, instead of line number 64, which is fine. Could we replace the line numbers with something much more obvious, such as line number 1 to line number 55555 perhaps: IMO it makes the test much more readable and the relevant parts stand out.

…w Insts

When InstCombine replaces an old instruction with a new instruction, it
copies !dbg and !annotation metadata from old to new. For some InstCombine
patterns we set a specific DILocation on the new instruction prior to
insertion, however, which more accurately reflects the new instruction.
This more specific DILocation may be overwritten on insertion by a less
appropriate one, resulting in a less correct line mapping. This patch
changes this behaviour to only copy the DILocation from old to new if
the new instruction has no existing DILocation (which will always be the
case for a new instruction unless InstCombine has specifically set one).
@SLTozer SLTozer force-pushed the instcombine-preserve-debugloc branch from aa961ec to 1860431 Compare October 3, 2024 16:08
@SLTozer SLTozer merged commit caa265e into llvm:main Oct 3, 2024
5 of 6 checks passed
xgupta pushed a commit to xgupta/llvm-project that referenced this pull request Oct 4, 2024
…sts (llvm#108565)

When InstCombine replaces an old instruction with a new instruction, it
copies !dbg and !annotation metadata from old to new. For some
InstCombine patterns we set a specific DILocation on the new instruction
prior to insertion, however, which more accurately reflects the new
instruction. This more specific DILocation may be overwritten on
insertion by a less appropriate one, resulting in a less correct line
mapping. This patch changes this behaviour to only copy the DILocation
from old to new if the new instruction has no existing DILocation (which
will always be the case for a new instruction unless InstCombine has
specifically set one).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants