Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libunwind gives incorrect line number #123

Closed
WSUFan opened this issue May 19, 2024 · 10 comments
Closed

libunwind gives incorrect line number #123

WSUFan opened this issue May 19, 2024 · 10 comments
Labels

Comments

@WSUFan
Copy link
Contributor

WSUFan commented May 19, 2024

I have tried the signal_safe demo and enabled libunwind. However, the line information in the stack trace appears to be incorrect. Here is my test main program:

#include <csignal>
#include <cstddef>
#include <cstring>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cpptrace/cpptrace.hpp>

// This is just a utility I like, it makes the pipe API more expressive.
struct pipe_t {
  union {
    struct {
      int read_end;
      int write_end;
    };
    int data[2];
  };
};

void do_signal_safe_trace(cpptrace::frame_ptr *buffer, std::size_t size) {
  // Setup pipe and spawn child
  pipe_t input_pipe;
  pipe(input_pipe.data);
  const pid_t pid = fork();
  if (pid == -1) {
    return; /* Some error occurred */
  }
  if (pid == 0) { // child
    dup2(input_pipe.read_end, STDIN_FILENO);
    close(input_pipe.read_end);
    close(input_pipe.write_end);
    execl("tracer", "tracer", nullptr);
    const char *exec_failure_message = "exec(tracer) failed: Make sure "
                                       "the tracer executable is in "
                                       "the current working directory and the "
                                       "binary's permissions are correct.\n";
    write(STDERR_FILENO, exec_failure_message, strlen(exec_failure_message));
    _exit(1);
  }
  // Resolve to safe_object_frames and write those to the pipe
  for (std::size_t i = 0; i < size; i++) {
    cpptrace::safe_object_frame frame;
    cpptrace::get_safe_object_frame(buffer[i], &frame);
    write(input_pipe.write_end, &frame, sizeof(frame));
  }
  close(input_pipe.read_end);
  close(input_pipe.write_end);
  // Wait for child
  waitpid(pid, nullptr, 0);
}

void handler(int signo, siginfo_t *info, void *context) {
  // Print basic message
  const char *message = "SIGSEGV occurred:\n";
  write(STDERR_FILENO, message, strlen(message));
  // Generate trace
  constexpr std::size_t N = 100;
  cpptrace::frame_ptr buffer[N];
  std::size_t count = cpptrace::safe_generate_raw_trace(buffer, N);
  do_signal_safe_trace(buffer, N);
  // Up to you if you want to exit or continue or whatever
  _exit(1);
}

void warmup_cpptrace() {
  // This is done for any dynamic-loading shenanigans
  cpptrace::frame_ptr buffer[10];
  std::size_t count = cpptrace::safe_generate_raw_trace(buffer, 10);
  cpptrace::safe_object_frame frame;
  cpptrace::get_safe_object_frame(buffer[0], &frame);
}
void bad() { // ------> this is line 73
  int *p = nullptr;
  //*p = 10;
  std::vector<int> v;
  // v.at(10);
  *p = 10; // ----------> this is line 78
}

int main() {
  warmup_cpptrace();
  // Setup signal handler
  struct sigaction action = {0};
  action.sa_flags = 0;
  action.sa_sigaction = &handler;
  if (sigaction(SIGSEGV, &action, NULL) == -1) {
    perror("sigaction");
  }
  if (sigaction(SIGABRT, &action, NULL) == -1) {
    perror("sigaction");
  }
  bad();
}

The stacktrace generated with libunwind is

#0 0x7dd924ad851f0000 at /lib/x86_64-linux-gnu/libc.so.6
#1 0x55a13aa3df840000 in bad() at /src/s3-cache/test.cc:73:12 <---------------wrong location, should be 78
#2 0x55a13aa3e0100000 in main at /src/s3-cache/test.cc:93:6
#3 0x7dd924abfd8f0000 at /lib/x86_64-linux-gnu/libc.so.6
#4 0x7dd924abfe3f0000 at /lib/x86_64-linux-gnu/libc.so.6
#5 0x55a13aa3dcb40000 at /src/s3-cache/test

Howerver, the expected result is

#0 0x7dd924ad851f0000 at /lib/x86_64-linux-gnu/libc.so.6
#1 0x55a13aa3df840000 in bad() at /src/s3-cache/test.cc:78:12 
#2 0x55a13aa3e0100000 in main at /src/s3-cache/test.cc:93:6
#3 0x7dd924abfd8f0000 at /lib/x86_64-linux-gnu/libc.so.6
#4 0x7dd924abfe3f0000 at /lib/x86_64-linux-gnu/libc.so.6
#5 0x55a13aa3dcb40000 at /src/s3-cache/test

Any idea why? Is the problem with libunwind?
My compile command was g++ -o test test.cc -g1 -O2 -lcpptrace -ldwarf -lz -lzstd -lunwind -llzma -w -std=c++17 -fpic
If I do O0 then the information was correct. Also, if I used clang, I got nothing

#0  0x74e948ab551f0000 at /lib/x86_64-linux-gnu/libc.so.6
#1  0x74e948a9cd8e0000 at /lib/x86_64-linux-gnu/libc.so.6
#2  0x602bd13da9980000 in _ULx86_64_step at /root/libunwind/src/x86_64/Gstep.c:78:15
#3  0x602bd13560a00000 at /src/s3-cache/test
#4  0x602bd1347cb50000 at /src/s3-cache/test
#5  0x74e948c8f1900000 at /lib/x86_64-linux-gnu/libc.so.6
#6  0x74e9490580400000 at /lib64/ld-linux-x86-64.so.2
#7  0x602bd135601c0000 at /src/s3-cache/test
#8  0x74e94902937c0000 at /lib64/ld-linux-x86-64.so.2
#9  0x74e948a77ad00000 at /lib/x86_64-linux-gnu/libc.so.6
#10 0x74e948a76b280000 at /lib/x86_64-linux-gnu/libc.so.6
#11 0x74e9490297e90000 at /lib64/ld-linux-x86-64.so.2
#12 0x74e948a896500000 at /lib/x86_64-linux-gnu/libc.so.6
#13 0x74e948a896500000 at /lib/x86_64-linux-gnu/libc.so.6
#14 0x602bd13423870000 at /src/s3-cache/test
#15 0x602bd1340d580000 at /src/s3-cache/test
#16 0x74e94902937c0000 at /lib64/ld-linux-x86-64.so.2
#17 0x602bd13e30a20000 at /src/s3-cache/test
#18 0x74e948af16980000 at /lib/x86_64-linux-gnu/libc.so.6
#19 0x74e948c4f4980000 at /lib/x86_64-linux-gnu/libc.so.6
@WSUFan
Copy link
Contributor Author

WSUFan commented May 19, 2024

Of course, with gcc's unwind, the line number was correct

@jeremy-rifkin
Copy link
Owner

Hi, thanks for the report. In general it's not entirely surprising that a line number could be off on -O2, however there does seem to be a small cpptrace bug here :)

In this case gcc detects the UB in bad and optimizes it to https://godbolt.org/z/4Yo75eo1n

bad():
        mov     DWORD PTR ds:0, 0
        ud2

If I try plugging this into my local test setup, modified to print object addresses instead of runtime addresses, I see the same behavior

#0 0x000000000004251f at /lib/x86_64-linux-gnu/libc.so.6
#1 0x0000000000008573 in bad() at /mnt/c/Users/rifkin/home/projects/cpptrace/test/signal_demo.cpp:26:12
#2 0x0000000000007ba3 in main at /mnt/c/Users/rifkin/home/projects/cpptrace/test/signal_demo.cpp:99:8
#3 0x0000000000029d8f in __libc_start_call_main at ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#4 0x0000000000029e3f in __libc_start_main_impl at ./csu/../csu/libc-start.c:392:3
#5 0x0000000000008344 at /mnt/c/Users/rifkin/home/projects/cpptrace/build/signal_demo

(where void bad() { is line 26)

Looking at an objectdump

0000000000008570 <_Z3badv>:
void bad() { // ------> this is line 73
    8570:	f3 0f 1e fa          	endbr64 
  *p = 10; // ----------> this is line 78
    8574:	c7 04 25 00 00 00 00 	mov    DWORD PTR ds:0x0,0x0
    857b:	00 00 00 00 
    857f:	0f 0b                	ud2    
    8581:	66 66 2e 0f 1f 84 00 	data16 cs nop WORD PTR [rax+rax*1+0x0]
    8588:	00 00 00 00 
    858c:	0f 1f 40 00          	nop    DWORD PTR [rax+0x0]

The address returned from libunwind would have corresponded to 0x0000000000008574, the mov DWORD PTR ds:0x0,0x0, but currently the library unconditionally subtracts 1 from the pointer

buffer[i] = to_frame_ptr(pc) - 1;

The reason why is that when you call in x86 the instruction pointer written to the stack frame is for the instruction after the call and subtracting 1 gets the instruction pointer back into the call, but I guess for a signal frame the address is just the pointer of the instruction where things went wrong, which makes sense to me.

As to why clang wouldn't print anything, that's weird but it could be a general dwarf symbol resolution issue I haven't encountered yet. Does it happen when a stack trace is generated outside a signal tracing setup?

@jeremy-rifkin
Copy link
Owner

I think the core of that issue should now be fixed on dev.

When I tested again locally that frame was resolving to the vector construction which seems to be another edge case but this time in how the inlined subroutine is encoded in dwarf

0x00007bf7:   DW_TAG_subprogram
                DW_AT_external	(true)
                DW_AT_name	("bad")
                DW_AT_decl_file	("/mnt/c/Users/rifkin/home/projects/cpptrace/test/signal_demo.cpp")
                DW_AT_decl_line	(26)
                DW_AT_decl_column	(0x06)
                DW_AT_linkage_name	("_Z3badv")
                DW_AT_low_pc	(0x0000000000008570)
                DW_AT_high_pc	(0x0000000000008581)
                DW_AT_frame_base	(DW_OP_call_frame_cfa)
                DW_AT_GNU_all_call_sites	(true)
                DW_AT_sibling	(0x00007d53)

                ...

0x00007c31:     DW_TAG_inlined_subroutine
                  DW_AT_abstract_origin	(0x00007d53 "_ZNSt6vectorIiSaIiEEC4Ev")
                  DW_AT_entry_pc	(0x0000000000008574)
                  DW_AT_GNU_entry_view	(0x02)
                  DW_AT_low_pc	(0x0000000000008574)
                  DW_AT_high_pc	(0x0000000000008574)
                  DW_AT_call_file	("/mnt/c/Users/rifkin/home/projects/cpptrace/test/signal_demo.cpp")
                  DW_AT_call_line	(29)
                  DW_AT_call_column	(0x14)

jeremy-rifkin added a commit that referenced this issue May 19, 2024
…e gun and not check the high-pc. I think I erroneously added this in c94b03a. Discovered as part of #123.
@jeremy-rifkin jeremy-rifkin added the resolved in next release Resolved in dev label May 19, 2024
@jeremy-rifkin
Copy link
Owner

jeremy-rifkin commented May 19, 2024

I am able to reproduce clang not printing symbols as well, it may be due to

Terminate called after throwing an instance of cpptrace::detail::internal_error: Cpptrace internal error: Cpptrace assertion failed at /mnt/c/Users/rifkin/home/projects/cpptrace/src/symbols/symbols_with_libdwarf.cpp:610: void cpptrace::detail::libdwarf::dwarf_resolver::retrieve_symbol(const die_object &, Dwarf_Addr, Dwarf_Half, stacktrace_frame &, std::vector<stacktrace_frame> &): Vec should be empty?
    ASSERT(vec.size() == 0);

@WSUFan
Copy link
Contributor Author

WSUFan commented May 19, 2024

Thank you for your response. I have checked out the dev branch, and I was able to get the correct line numbers with gcc. However, with Clang, I tried -O0 and it works, -O2 does not yield any information at all.
Anyway, at least gcc works for now👍

@jeremy-rifkin
Copy link
Owner

Thanks for your patience. What version of clang are you using? I thought I had been able to reproduce however I am no longer able to, using clang 18.

@WSUFan
Copy link
Contributor Author

WSUFan commented May 22, 2024

I'm using clang 14 on Ubuntu 22. I also tried clang 17 on Ubuntu 22. Neither of them work.

@jeremy-rifkin
Copy link
Owner

Thanks for the additional information. Unfortunately I have been able to reproduce on either ubuntu 22 or 24 with clang 14, 17, or 18 on debug or release. There is a quirk of optimization surrounding bad()'s undefined behavior on clang that causes the failure path for sigaction to always run printing sigaction: Resource temporarily unavailable or similar, but that should only affect a test program like yours and only in release.

I'm going to go ahead and close this issue for now since the core issues have been solved, however I'm very interested in hunting down any clang issues. If there is any more info you can provide to help reproduce I would be grateful. My guess would be some discrepancy in how the build is being done affecting where debug symbols are being placed.

@jeremy-rifkin
Copy link
Owner

I've released v0.6.0 which includes the fix for this

@WSUFan
Copy link
Contributor Author

WSUFan commented May 30, 2024

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants