Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libspl/backtrace: dump registers in libunwind backtraces #16653

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

robn
Copy link
Member

@robn robn commented Oct 15, 2024

Motivation and Context

In postmortem debugging, often all we have is the binary. Having the registers makes it a least possible to follow the disassembly and try to guess how we got here.

Description

libunwind knows how to fish all the registers out of the stack frame, not just IP. Loop over 'em and spit 'em out.

I've made no effort to do the same in the non-libunwind builds, because libc doesn't typically give you the tools to do it well/at all.

I'm not yet fully satisfied with this. I think I can do better by capturing the register state at the point the assert is tripped, but it will be a lot more invasive. I'll keep working on this, but I think this is still very useful right now.

How Has This Been Tested?

Induced a crash, enjoyed the output:

ASSERT at module/zfs/spa.c:5791:spa_open_common()
"oh no" == 0 (0x7fe60100db1b == 0)
  PID: 870661    COMM: zdb
  TID: 870661    NAME: zdb
Registers:
  RAX: 0x0000000000000037  RDX: 0x0000000000000000  RCX: 0x0000000000000000
  RBX: 0x0000000000000002  RSI: 0x00007ffe7eac2e60  RDI: 0x00007ffe7eac4770
  RBP: 0x00007ffe7eac4f70  RSP: 0x00007ffe7eac46c0   R8: 0x0000000000000000
   R9: 0x0000000000000073  R10: 0x00007ffe7eac4d30  R11: 0x0000000000000021
  R12: 0x00007ffe7eac46f0  R13: 0x0000000000000000  R14: 0x00007ffe7eac4770
  R15: 0x00007ffe7eac4b40  RIP: 0x00007fe600fd8d26
Call trace:
  [0x00007fe600fd8c45] libspl_assertf+0x135 (in /home/robn/code/zfs/.libs/libzpool.so.6.0.0 +0x2ccc45)
  [0x00007fe600e66bc7] spa_open_common+0x57 (in /home/robn/code/zfs/.libs/libzpool.so.6.0.0 +0x15abc7)
  [0x00007fe600e2648f] dsl_pool_hold+0x1f (in /home/robn/code/zfs/.libs/libzpool.so.6.0.0 +0x11a48f)
  [0x00007fe600dd93c7] dmu_objset_hold_flags+0x27 (in /home/robn/code/zfs/.libs/libzpool.so.6.0.0 +0xcd3c7)
  [0x0000562e1ced46cc] open_objset+0x10c (in /home/robn/code/zfs/.libs/zdb +0x126cc)
  [0x0000562e1ceccbfb] main+0x149b (in /home/robn/code/zfs/.libs/zdb +0xabfb)
  [0x00007fe60068724a] __libc_start_call_main+0x7a (in /usr/lib/x86_64-linux-gnu/libc.so.6 +0x2724a)
  [0x00007fe600687305] __libc_start_main_alias_2+0x85 (in /usr/lib/x86_64-linux-gnu/libc.so.6 +0x27305)
  [0x0000562e1cecccf1] _start+0x21 (in /home/robn/code/zfs/.libs/zdb +0xacf1)

I haven't tested on non-amd64 architectures, but the libunwind documentation doesn't say anything about particular facilities not being available. At worst, I'd expect it to show nothing, or maybe only the IP.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

More useful stuff, especially when trying to follow a disassembly.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. We can continue to incrementally improve this as needed.

@behlendorf behlendorf added the Status: Accepted Ready to integrate (reviewed, tested) label Oct 16, 2024
@robn
Copy link
Member Author

robn commented Oct 16, 2024

FYI, I managed to try this on a RPi4b this morning:

ASSERT at cmd/zdb/zdb.c:693:usage()
"ohno" == 0 (0x5555d71550 == 0)
  PID: 62086     COMM: zdb
  TID: 62086     NAME: zdb
Registers:
   x0: 0x0000007fc44ce5c8   x1: 0x0000007f9c0bfb90   x2: 0x0000007fc44ce510
   x3: 0x0000000000000000   x4: 0x0000007f9b5a89f0   x5: 0x0000000000000073
   x6: 0x00000000fffffffd   x7: 0x00000000ffffffff   x8: 0x0000000000000040
   x9: 0x0000007fc44cf150  x10: 0x0000000000000000  x11: 0x0000000000000000
  x12: 0x000000000000000a  x13: 0x00000000010ec6b3  x14: 0x0000000000001c30
  x15: 0x00000000010ec6b3  ip0: 0x0000007f9bdff590  ip1: 0x0000007f9bf20040
  x18: 0x0000000000001d90  x19: 0x000000000000f286  x20: 0x0000007f9c0753d0
  x21: 0x0000000000000002  x22: 0x0000007f9b6d14e0  x23: 0x0000007fc44d16a1
  x24: 0x00000000000002b5  x25: 0x0000005555d77490  x26: 0x0000007f9c075000
  x27: 0xffffffffffffffff  x28: 0x0000000000000000   fp: 0x0000007fc44ce410
   lr: 0x0000007f9bdff610   sp: 0x0000007fc44ce410   pc: 0x0000007f9bdff610
  ???: 0x0000000000000000  ???: 0x0000000000000000  ???: 0x0000000000000000
  ???: 0x0000000000000000  ???: 0x0000000000000000  ???: 0x0000000000000000
  ???: 0x0000000000000000  ???: 0x0000000000000000  ???: 0x0000000000000000
  ???: 0x0000000000000000  ???: 0x0000000000000000  ???: 0x0000000000000000
  ???: 0x0000000000000000  ???: 0x0000000000000000  ???: 0x0000000000000000
  ???: 0x0000000000000000  ???: 0x0000000000000000  ???: 0x0000000000000000
  ???: 0x0000000000000000  ???: 0x0000000000000000  ???: 0x0000000000000000
  ???: 0x0000000000000000  ???: 0x0000000000000000  ???: 0x0000000000000000
  ???: 0x0000000000000000  ???: 0x0000000000000000  ???: 0x0000000000000000
  ???: 0x0000000000000000  ???: 0x0000000000000000  ???: 0x0000000000000000
  ???: 0x0000000000000000   v0: 0x0000000000000000   v1: 0x0000000000000000
   v2: 0x0000000000000000   v3: 0x0000000000000000   v4: 0x0000000000000000
   v5: 0x0000000000000000   v6: 0x0000000000000000   v7: 0x0000000000000000
   v8: 0x0000000000000000   v9: 0x0000000000000000  v10: 0x0000000000000000
  v11: 0x0000000000000000  v12: 0x0000000000000000  v13: 0x0000000000000000
  v14: 0x0000000000000000  v15: 0x0000000000000000  v16: 0x0000000000000000
  v17: 0x0000000000000000  v18: 0x0000000000000000  v19: 0x0000000000000000
  v20: 0x0000000000000000  v21: 0x0000000000000000  v22: 0x0000000000000000
  v23: 0x0000000000000000  v24: 0x0000000000000000  v25: 0x0000000000000000
  v26: 0x0000000000000000  v27: 0x0000000000000000  v28: 0x0000000000000000
  v29: 0x0000000000000000  v30: 0x0000000000000000  v31: 0x0000000000000000
  fpsr: 0x0000000000000000  fpcr: 0x0000000000000000
Call trace:
  [0x0000007f9bdff4bc] libspl_assertf+0x11c
  [0x0000005555d60f4c] usage+0x4c
  [0x0000005555d59cbc] main+0x5bc
  [0x0000007f9b557740] __libc_init_first+0x80
  [0x0000007f9b557818] __libc_start_main+0x98
  [0x0000005555d5a9b0] _start+0x30

So I suspect I'm holding something slightly wrong, but also I may not have a chance to get back to this for a few days. It's not the worst thing, so probably still nbd to merge but if I get a chance before you do I'll look into it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants