[lldb/aarch64] Fix unwinding when signal interrupts a leaf function #91321
A leaf function may not store the link register to the stack, but it can still end up being a non-zero frame if it gets interrupted by a signal. Previously, we were unable to unwind past such a function because we could not read the link register value.

To make this work, this patch:
- changes the function-entry unwind plan to include the `fp|lr = <same>` rules. This in turn necessitated an adjustment in the generic instruction emulation logic to ensure that `lr=[sp-X]` can override the `<same>` rule.
- allows the `<same>` rule for pc and lr in all `m_all_registers_available` frames (and not just frame zero).

The test verifies that we can unwind in a situation like this, and that the backtrace matches the one we computed before getting a signal.
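The interaction between the two rule kinds in the first bullet can be sketched with a toy model. This is not lldb's `UnwindPlan::Row`; `ToyRow`, `RegLoc`, and `Kind` are hypothetical names, and the meaning given to the `must_replace` and `can_replace` flags here is an assumption made for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy model only: NOT lldb's UnwindPlan::Row. All names are hypothetical.
enum class Kind { Same, AtCFAPlusOffset };

struct RegLoc {
  Kind kind;
  int32_t offset; // only meaningful for Kind::AtCFAPlusOffset
};

class ToyRow {
public:
  // Record "reg = <same>". Assumption: must_replace == true means the call
  // only succeeds when a rule for reg already exists.
  bool SetSame(int reg, bool must_replace) {
    if (must_replace && m_regs.count(reg) == 0)
      return false;
    m_regs[reg] = {Kind::Same, 0};
    return true;
  }

  // Record "reg = [CFA + offset]". Assumption: can_replace == false means
  // an existing rule (such as <same>) is kept and the call fails.
  bool SetAtCFAPlusOffset(int reg, int32_t offset, bool can_replace) {
    if (!can_replace && m_regs.count(reg) != 0)
      return false;
    m_regs[reg] = {Kind::AtCFAPlusOffset, offset};
    return true;
  }

  // Returns the rule for reg, or nullptr if the row has none.
  const RegLoc *Get(int reg) const {
    auto it = m_regs.find(reg);
    return it == m_regs.end() ? nullptr : &it->second;
  }

private:
  std::map<int, RegLoc> m_regs;
};
```

In this model, a row that starts with `lr = <same>` keeps that rule when a later store is recorded with `can_replace=false`, which is why the emulation logic had to switch to `can_replace=true` once the entry row began carrying `<same>` rules.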
@llvm/pr-subscribers-lldb Author: Pavel Labath (labath)
Full diff: https://github.com/llvm/llvm-project/pull/91321.diff 6 Files Affected:
diff --git a/lldb/source/Plugins/Instruction/ARM64/EmulateInstructionARM64.cpp b/lldb/source/Plugins/Instruction/ARM64/EmulateInstructionARM64.cpp
index 6ca4fb052457e..62ecac3e0831d 100644
--- a/lldb/source/Plugins/Instruction/ARM64/EmulateInstructionARM64.cpp
+++ b/lldb/source/Plugins/Instruction/ARM64/EmulateInstructionARM64.cpp
@@ -444,6 +444,8 @@ bool EmulateInstructionARM64::CreateFunctionEntryUnwind(
// Our previous Call Frame Address is the stack pointer
row->GetCFAValue().SetIsRegisterPlusOffset(gpr_sp_arm64, 0);
+ row->SetRegisterLocationToSame(gpr_lr_arm64, /*must_replace=*/false);
+ row->SetRegisterLocationToSame(gpr_fp_arm64, /*must_replace=*/false);
unwind_plan.AppendRow(row);
unwind_plan.SetSourceName("EmulateInstructionARM64");
diff --git a/lldb/source/Plugins/UnwindAssembly/InstEmulation/UnwindAssemblyInstEmulation.cpp b/lldb/source/Plugins/UnwindAssembly/InstEmulation/UnwindAssemblyInstEmulation.cpp
index c4a171ec7d01b..49edd40544e32 100644
--- a/lldb/source/Plugins/UnwindAssembly/InstEmulation/UnwindAssemblyInstEmulation.cpp
+++ b/lldb/source/Plugins/UnwindAssembly/InstEmulation/UnwindAssemblyInstEmulation.cpp
@@ -424,8 +424,6 @@ size_t UnwindAssemblyInstEmulation::WriteMemory(
log->PutString(strm.GetString());
}
- const bool cant_replace = false;
-
switch (context.type) {
default:
case EmulateInstruction::eContextInvalid:
@@ -467,7 +465,7 @@ size_t UnwindAssemblyInstEmulation::WriteMemory(
m_pushed_regs[reg_num] = addr;
const int32_t offset = addr - m_initial_sp;
m_curr_row->SetRegisterLocationToAtCFAPlusOffset(reg_num, offset,
- cant_replace);
+ /*can_replace=*/true);
m_curr_row_modified = true;
}
}
diff --git a/lldb/source/Target/RegisterContextUnwind.cpp b/lldb/source/Target/RegisterContextUnwind.cpp
index 13e101413a477..e2d712cb72eae 100644
--- a/lldb/source/Target/RegisterContextUnwind.cpp
+++ b/lldb/source/Target/RegisterContextUnwind.cpp
@@ -1555,12 +1555,12 @@ RegisterContextUnwind::SavedLocationForRegister(
}
if (unwindplan_regloc.IsSame()) {
- if (!IsFrameZero() &&
+ if (!m_all_registers_available &&
(regnum.GetAsKind(eRegisterKindGeneric) == LLDB_REGNUM_GENERIC_PC ||
regnum.GetAsKind(eRegisterKindGeneric) == LLDB_REGNUM_GENERIC_RA)) {
UnwindLogMsg("register %s (%d) is marked as 'IsSame' - it is a pc or "
- "return address reg on a non-zero frame -- treat as if we "
- "have no information",
+ "return address reg on a frame which does not have all "
+ "registers available -- treat as if we have no information",
regnum.GetName(), regnum.GetAsKind(eRegisterKindLLDB));
return UnwindLLDB::RegisterSearchResult::eRegisterNotFound;
} else {
diff --git a/lldb/test/Shell/Unwind/Inputs/signal-in-leaf-function-aarch64.c b/lldb/test/Shell/Unwind/Inputs/signal-in-leaf-function-aarch64.c
new file mode 100644
index 0000000000000..9a751330623f4
--- /dev/null
+++ b/lldb/test/Shell/Unwind/Inputs/signal-in-leaf-function-aarch64.c
@@ -0,0 +1,15 @@
+#include <signal.h>
+#include <unistd.h>
+
+int __attribute__((naked)) signal_generating_add(int a, int b) {
+ asm("add w0, w1, w0\n\t"
+ "udf #0xdead\n\t"
+ "ret");
+}
+
+void sigill_handler(int) { _exit(0); }
+
+int main() {
+ signal(SIGILL, sigill_handler);
+ return signal_generating_add(42, 47);
+}
diff --git a/lldb/test/Shell/Unwind/signal-in-leaf-function-aarch64.test b/lldb/test/Shell/Unwind/signal-in-leaf-function-aarch64.test
new file mode 100644
index 0000000000000..0580d0cf734ae
--- /dev/null
+++ b/lldb/test/Shell/Unwind/signal-in-leaf-function-aarch64.test
@@ -0,0 +1,24 @@
+# REQUIRES: target-aarch64 && native
+# UNSUPPORTED: system-windows
+
+# RUN: %clang_host %S/Inputs/signal-in-leaf-function-aarch64.c -o %t
+# RUN: %lldb -s %s -o exit %t | FileCheck %s
+
+breakpoint set -n sigill_handler
+# CHECK: Breakpoint 1: where = {{.*}}`sigill_handler
+
+run
+# CHECK: thread #1, {{.*}} stop reason = signal SIGILL
+
+thread backtrace
+# CHECK: frame #0: [[ADD:0x[0-9a-fA-F]*]] {{.*}}`signal_generating_add
+# CHECK: frame #1: [[MAIN:0x[0-9a-fA-F]*]] {{.*}}`main
+
+continue
+# CHECK: thread #1, {{.*}} stop reason = breakpoint 1
+
+thread backtrace
+# CHECK: frame #0: {{.*}}`sigill_handler
+# Unknown number of signal trampoline frames
+# CHECK: frame #{{[0-9]+}}: [[ADD]] {{.*}}`signal_generating_add
+# CHECK: frame #{{[0-9]+}}: [[MAIN]] {{.*}}`main
diff --git a/lldb/unittests/UnwindAssembly/ARM64/TestArm64InstEmulation.cpp b/lldb/unittests/UnwindAssembly/ARM64/TestArm64InstEmulation.cpp
index 80abeb8fae9e5..9303d6f5f3c6e 100644
--- a/lldb/unittests/UnwindAssembly/ARM64/TestArm64InstEmulation.cpp
+++ b/lldb/unittests/UnwindAssembly/ARM64/TestArm64InstEmulation.cpp
@@ -77,7 +77,7 @@ TEST_F(TestArm64InstEmulation, TestSimpleDarwinFunction) {
// UnwindPlan we expect:
- // row[0]: 0: CFA=sp +0 =>
+ // row[0]: 0: CFA=sp +0 => fp= <same> lr= <same>
// row[1]: 4: CFA=sp+16 => fp=[CFA-16] lr=[CFA-8]
// row[2]: 8: CFA=fp+16 => fp=[CFA-16] lr=[CFA-8]
// row[2]: 16: CFA=sp+16 => fp=[CFA-16] lr=[CFA-8]
@@ -88,13 +88,19 @@ TEST_F(TestArm64InstEmulation, TestSimpleDarwinFunction) {
EXPECT_TRUE(engine->GetNonCallSiteUnwindPlanFromAssembly(
sample_range, data, sizeof(data), unwind_plan));
- // CFA=sp +0
+ // CFA=sp +0 => fp= <same> lr= <same>
row_sp = unwind_plan.GetRowForFunctionOffset(0);
EXPECT_EQ(0ull, row_sp->GetOffset());
EXPECT_TRUE(row_sp->GetCFAValue().GetRegisterNumber() == gpr_sp_arm64);
EXPECT_TRUE(row_sp->GetCFAValue().IsRegisterPlusOffset() == true);
EXPECT_EQ(0, row_sp->GetCFAValue().GetOffset());
+ EXPECT_TRUE(row_sp->GetRegisterInfo(gpr_fp_arm64, regloc));
+ EXPECT_TRUE(regloc.IsSame());
+
+ EXPECT_TRUE(row_sp->GetRegisterInfo(gpr_lr_arm64, regloc));
+ EXPECT_TRUE(regloc.IsSame());
+
// CFA=sp+16 => fp=[CFA-16] lr=[CFA-8]
row_sp = unwind_plan.GetRowForFunctionOffset(4);
EXPECT_EQ(4ull, row_sp->GetOffset());
@@ -146,6 +152,12 @@ TEST_F(TestArm64InstEmulation, TestSimpleDarwinFunction) {
EXPECT_TRUE(row_sp->GetCFAValue().GetRegisterNumber() == gpr_sp_arm64);
EXPECT_TRUE(row_sp->GetCFAValue().IsRegisterPlusOffset() == true);
EXPECT_EQ(0, row_sp->GetCFAValue().GetOffset());
+
+ EXPECT_TRUE(row_sp->GetRegisterInfo(gpr_fp_arm64, regloc));
+ EXPECT_TRUE(regloc.IsSame());
+
+ EXPECT_TRUE(row_sp->GetRegisterInfo(gpr_lr_arm64, regloc));
+ EXPECT_TRUE(regloc.IsSame());
}
TEST_F(TestArm64InstEmulation, TestMediumDarwinFunction) {
@@ -381,8 +393,12 @@ TEST_F(TestArm64InstEmulation, TestFramelessThreeEpilogueFunction) {
EXPECT_FALSE(row_sp->GetRegisterInfo(gpr_x26_arm64, regloc));
EXPECT_FALSE(row_sp->GetRegisterInfo(gpr_x27_arm64, regloc));
EXPECT_FALSE(row_sp->GetRegisterInfo(gpr_x28_arm64, regloc));
- EXPECT_FALSE(row_sp->GetRegisterInfo(gpr_fp_arm64, regloc));
- EXPECT_FALSE(row_sp->GetRegisterInfo(gpr_lr_arm64, regloc));
+
+ EXPECT_TRUE(row_sp->GetRegisterInfo(gpr_fp_arm64, regloc));
+ EXPECT_TRUE(regloc.IsSame());
+
+ EXPECT_TRUE(row_sp->GetRegisterInfo(gpr_lr_arm64, regloc));
+ EXPECT_TRUE(regloc.IsSame());
row_sp = unwind_plan.GetRowForFunctionOffset(36);
EXPECT_TRUE(row_sp->GetCFAValue().GetRegisterNumber() == gpr_sp_arm64);
@@ -467,7 +465,7 @@ size_t UnwindAssemblyInstEmulation::WriteMemory(
m_pushed_regs[reg_num] = addr;
const int32_t offset = addr - m_initial_sp;
m_curr_row->SetRegisterLocationToAtCFAPlusOffset(reg_num, offset,
- cant_replace);
+ /*can_replace=*/true);
This could be narrowed down so that it only overwrites the `<same>` rules, but I'm not sure it's necessary given that lines 464&465 ensure that the register can get pushed only once.
@@ -1555,12 +1555,12 @@ RegisterContextUnwind::SavedLocationForRegister(
}
if (unwindplan_regloc.IsSame()) {
- if (!IsFrameZero() &&
+ if (!m_all_registers_available &&
(regnum.GetAsKind(eRegisterKindGeneric) == LLDB_REGNUM_GENERIC_PC ||
`pc=<same>` is probably still only valid for real frame zero, so we could make the `m_all_registers_available` check `lr`-only.
.. or drop the lr check entirely, since some non-ABI-respecting functions could actually preserve the value of lr even if they are not leaf functions.
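The change to the condition in this hunk can be illustrated with a small standalone sketch. The `GenericReg` enum and both function names below are hypothetical stand-ins rather than lldb APIs; only the shape of the boolean condition is taken from the diff.

```cpp
#include <cassert>

// Hypothetical stand-in for lldb's generic register kinds.
enum class GenericReg { PC, RA, Other };

// Pre-patch shape of the check: a <same> rule for pc or the return address
// register is treated as "no information" on every frame except frame zero.
bool OldRejectsSameRule(bool is_frame_zero, GenericReg reg) {
  return !is_frame_zero && (reg == GenericReg::PC || reg == GenericReg::RA);
}

// Post-patch shape of the check: the rule is rejected only on frames that do
// not have the full register state available.
bool NewRejectsSameRule(bool all_registers_available, GenericReg reg) {
  return !all_registers_available &&
         (reg == GenericReg::PC || reg == GenericReg::RA);
}
```

For a leaf function interrupted by a signal, the frame is not frame zero but its full register state is available, so the old condition rejects `lr=<same>` while the new one accepts it.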
This looks good to me, I know there are other codepaths that handle this correctly, where we can backtrace out of a frameless function that faults into a trap handler and we have the entire register state available in the trap handler.
Looking at this, I'm a little uncertain why we have `m_behaves_like_zeroth_frame` and `m_all_registers_available`, which are both set to true under the same conditions, and then we sometimes use `m_behaves_like_zeroth_frame`, sometimes `m_all_registers_available`, and sometimes call `RegisterContextUnwind::BehavesLikeZerothFrame`, which has a redundant check if the frame number is 0, sigh. Looks like some accumulated nonsense that you shouldn't have to deal with in this patch.
LGTM.
Thanks for the review. I see that the new test is failing on green dragon https://green.lab.llvm.org/job/llvm.org/view/LLDB/job/as-lldb-cmake/3596/testReport/lldb-shell/Unwind/signal_in_leaf_function_aarch64_test/, and from the looks of it, it fails because lldb cannot unwind in the test scenario (I don't have aarch64 mac hardware to confirm). What do you want me to do about this? Disable the test on mac and file a bug?
I think that would be a good start. Also tagging @jasonmolenda for advice.
Will debug this today.
Ah, so the problem is,
The bad instruction in When I've done these kinds of test cases in the past, I usually add a signal handler and then send the signal to the inferior, e.g.
(the test explicitly checks the stack for the function names that should appear above the signal handler)
I have fixed/worked around the mach exception issue in a followup commit with a
Oh, that's a clever idea, I forgot about that setting. If you add
Ah, I misunderstood what the nature of the failure was. I tried running the shell test, and it's failing for different reasons. I almost never touch shell tests, I find them really hard to debug so I'm not sure what the problem is. If I run it by hand,
which all looks good to me, but the shell test fails with
maybe the shell test is building without debug info, I am surprised to see assembly there. If I build it like that and run it by hand,
@labath this seems to have broken lldb-aarch64-windows bot with TestInterruptBacktrace.py failing on
The test stops the process before and after injecting a signal and verifies that the relevant parts of backtrace (i.e., calls to
^ This is the most important part of the message. Btw, at the top of the failure message lit will print the commands executed as a part of this test. For most tests (this one included), you can just yank the lldb command out of there and run it in a terminal to debug.
…nction (#91321)" This reverts commit fd1bd53. TestInterruptBacktrace was broken on AArch64/Windows as a result of this change. See lldb-aarch64-windows buildbot here: https://lab.llvm.org/buildbot/#/builders/219/builds/11261
LLDB became unresponsive on windows when a
Could you please give me some more information about the problem? I don't have access to a windows arm machine, and the failure message doesn't give me much to go on.
Ah, thanks, I missed that! Let me debug it and comment further
Ah, so the problem here is that we're missing the eh_frame instructions for _sigtramp on arm64 with macOS 14.
I will try to debug and get back with more information. If you need specific information or logs please let me know.
(and it turns out the reason we don't have eh_frame is because _sigtramp on arm64 is written in C, and I'm not sure how I'm going to track which callee-saved register the argument is copied into, so this is definitely not something I can get fixed quickly.)
Thanks. It's hard to say what exactly will help, because I don't have any idea what could be going wrong. However, since this is an unwinding problem, the I know this is quite a lot, but I'm kinda shooting in the dark here. :)
Makes sense. I'll leave the test XFAILed then (once the windows issue is sorted out). Thanks for looking into this, I'm glad the test proved to be useful.
…unction (llvm#91321)"
This reapplies fd1bd53, which was reverted due to a test failure on aarch64/windows. The failure was caused by a combination of several factors:
- clang targeting aarch64-windows (unlike msvc, and unlike clang targeting other aarch64 platforms) defaults to -fomit-frame-pointer
- lldb's code for looking up register values for `<same>` unwind rules is recursive
- the test binary creates a very long chain of fp-less function frames (it manages to fit about 22k frames before it blows its stack)

Together, these things have caused lldb to recreate the same deep recursion when unwinding through this, and blow its own stack as well. Since lldb frames are larger, about 4k frames like this were sufficient to trigger the stack overflow.

This version of the patch works around this problem by increasing the frame size of the test binary, thereby causing it to blow its stack sooner. This doesn't fix the issue -- the same problem can occur with a real binary -- but it's not very likely, as it requires an infinite recursion in a simple (so it doesn't use the frame pointer) function with a very small frame (so you can fit a lot of them on the stack). A more principled fix would be to make lldb's lookup code non-recursive, but I believe that's out of scope for this patch.

The original patch description follows:

A leaf function may not store the link register to the stack, but it can still end up being a non-zero frame if it gets interrupted by a signal. Previously, we were unable to unwind past such a function because we could not read the link register value.

To make this work, this patch:
- changes the function-entry unwind plan to include the `fp|lr = <same>` rules. This in turn necessitated an adjustment in the generic instruction emulation logic to ensure that `lr=[sp-X]` can override the `<same>` rule.
- allows the `<same>` rule for pc and lr in all `m_all_registers_available` frames (and not just frame zero).

The test verifies that we can unwind in a situation like this, and that the backtrace matches the one we computed before getting a signal.
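The recursion problem described in the reapply message, and the non-recursive alternative it mentions as out of scope, can be sketched with a toy model. This is not lldb's lookup code: `ToyFrame`, `LookupRecursive`, and `LookupIterative` are hypothetical, and the model assumes that a `<same>` rule in one frame simply defers to the next-younger frame until frame zero supplies the live register value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only. frames[0] is the youngest frame.
struct ToyFrame {
  bool lr_is_same;   // true: "lr = <same>", defer to the next-younger frame
  uint64_t saved_lr; // used only when lr_is_same is false
};

// Recursive lookup: one native stack frame per unwound frame, so a long
// chain of <same> rules produces equally deep recursion in the debugger.
uint64_t LookupRecursive(const std::vector<ToyFrame> &frames, size_t idx,
                         uint64_t live_lr) {
  if (!frames[idx].lr_is_same)
    return frames[idx].saved_lr;
  if (idx == 0)
    return live_lr;
  return LookupRecursive(frames, idx - 1, live_lr);
}

// Iterative lookup: same answer, constant native stack usage.
uint64_t LookupIterative(const std::vector<ToyFrame> &frames, size_t idx,
                         uint64_t live_lr) {
  for (size_t i = idx + 1; i-- > 0;) {
    if (!frames[i].lr_is_same)
      return frames[i].saved_lr;
  }
  return live_lr;
}
```

The iterative form walks the same chain of frames but does not grow the native stack with the depth of the chain, which is the property the message refers to when it calls a non-recursive lookup the more principled fix.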
So, it took some effort, but I managed to reproduce the problem and figure out what's going on. You can find my analysis on #92503
#92503) …unction (#91321)"
llvm#92503) …unction (llvm#91321)" (cherry picked from commit bbd54e0)