-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Debuginfo-based panic locations #2154
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,115 @@ | ||
- Feature Name: debuginfo_based_panic | ||
- Start Date: 2017-09-15 | ||
- RFC PR: (leave this empty) | ||
- Rust Issue: (leave this empty) | ||
|
||
# Summary | ||
[summary]: #summary | ||
|
||
Instead of generating location strings for panic information, read it at runtime from debug information. To make this work in release builds, introduce symbol files (external debug symbols) which are built in release mode by default. | ||
|
||
# Motivation | ||
[motivation]: #motivation | ||
|
||
By convention, recoverable errors are usually handled in Rust with the `Result` type while unrecoverable errors (usually bugs) cause a panic. When that happens, it is thus often necessary to figure out exactly what happened so the cause can be fixed. | ||
|
||
Today, `panic!()` attempts to help with that by printing a message along with the source code location where it happened: | ||
|
||
``` | ||
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', /checkout/src/libcore/option.rs:335 | ||
note: Run with `RUST_BACKTRACE=1` for a backtrace. | ||
``` | ||
|
||
However, this is not very useful. A panic's root cause is rarely located within the panicking function - most commonly, it happened because a function's *preconditions were violated*. | ||
|
||
The textbook example for this is of course `unwrap()` (on both `Option` and `Result`) as well as similar interfaces like `Index::index`/`IndexMut::index_mut`. The above example refers to the location in `option.rs` where the panic happened, but the `unwrap()` implementation is just doing its job here - it was really caused by a bug that caused the user application to believe an `Option` has to be `Some` in a case where it's not. When debugging this, the given location information is utterly useless - the `unwrap()` function panicked so of course `panic!()` was called from there! | ||
|
||
When dealing with issues like this, one usually enables `RUST_BACKTRACE=1`. This generates a full stack trace with information about where the bug may have originated. When even that isn't enough, a debugger can provide valuable insights into what's really going on. | ||
|
||
The main purpose of panic location information as well as backtraces and debuggers is **debugging**: Finding and fixing the bug. Debugging code written in a compiled systems language is a hard problem but Rust is not unique here - the C ecosystem was facing exactly the same problem and debug info (in a platform-specific format like DWARF or PDB) exists to solve it. Of course, Rust is already leveraging this to support both backtraces and debuggers. But from this perspective it certainly seems very redundant to compile dedicated panic location strings when this information already exists in a binary's debug sections. | ||
|
||
Furthermore, there is no automated solution right now that could remove panic location strings from binaries. This is a problem for developers of closed-source software as it can, in some circumstances, faciliate reverse engineering. | ||
|
||
Because including debug info in release binaries is not viable for various reasons (most notably debug info is often *larger* than code), many platforms already store it in a separate location (`*.pdb` on Windows, `*.dSYM` on MacOS). On Linux and several other unixoid platforms, this approach is still quite new and uncommon but nonetheless perfectly possible. | ||
|
||
# Guide-level explanation | ||
[guide-level-explanation]: #guide-level-explanation | ||
|
||
* The Rust compiler simply continues to generate platform-compatible debug information. It gains a new switch that causes it to generate *external* debug info. What this means is that the binary itself is stripped - all debug info is written to a separate **symbol file** (in a platform-specific format). | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What does this switch do on platforms that don't support external debug info? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As specified in the referencel-level explanation, this is not a flag but an additional value for the existing AFAIK all supported platforms are either DWARF or PDB. As DWARF can do both but PDB is always external, this basically means that |
||
* Cargo's release profile enables external debug info. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is in fact one of the most common roadblocks people have on |
||
* Panic messages now always include a backtrace (instead of the old location string). | ||
* This means that the root cause of panics from `unwrap()` and similar interfaces are now much clearer. | ||
* It's simple and obvious as most modern languages print backtraces for unhandled errors. | ||
* The backtrace is generated from debug information. Without debug symbols, memory addresses are printed instead. The standard `addr2line` utility can be used to manually obtain location information once debug info is available. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What would the equivalent utility be for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The cross-platform equivalent of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do addr2line and llvm-symbolizer work on position independent executables? I'm not sure how it would be able to based off of just a the binary and an address but I could be missing something. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, perhaps I should have specified this: Naturally, these symbolization tools always work with addresses into the binary, not runtime memory addresses. You simply query the base/load address of each loaded object and subtract (my proof of concept assumes one single statically-linked binaries, for anything more complex you would of course have multiple baseptrs and figure out which one to subtract) that from your addresses before printing them. You're right, without this step randomized addresses would be meaningless. |
||
* User perspectives: | ||
* You're developing an application. You build in the `dev` profile. You have debug symbols and thus complete backtraces for every panic. | ||
* You're testing your application (think QA). It's a `release` build. You have debug symbols and thus complete backtraces for every panic. | ||
* You ship your application to an end user. You ship only the binary, not the symbol file. The end user runs into a panic and receives a raw stack trace (of memory addresses). The end user now reports the bug to you. Using `addr2line` you obtain a complete backtrace. | ||
* Alternatively, you ship the symbol file to the user. The user can now debug your application without recompiling and - obviously - gets complete backtraces for every panic. | ||
|
||
# Reference-level explanation | ||
[reference-level-explanation]: #reference-level-explanation | ||
|
||
**Symbol files** already exist (and work) on Windows and MacOS. On Linux, it's as simple as writing the main binary to one ELF image and all the debug sections to another. The existing `-C debuginfo` compiler options gains a third level "3 = full debug info in an external symbol file". The cargo release profile sets `debuginfo=3` (instead of `debuginfo=0`). | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The location of the debuginfo is orthogonal to its detail. Why couldn't I have out-of-binary level 1 debuginfo? It seems like it would make more sense for this to be a separate flag. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Rationale here: #2154 (comment) I'm not very happy with this either, I don't care about it. The only reason this isn't a new compiler flag is that I was too lazy to write up the whole "new flag, -Z/-C, stabilization, Cargo.toml" boilerplate. |
||
|
||
The default panic handler no longer uses the location information supplied by `panic!()`. Instead, it always displays a backtrace. If there is no debug info in the binary itself, it looks for symbol files and loads those (this already works on Windows and on MacOS since [rust-lang/rust#44251](https://github.com/rust-lang/rust/pull/44251), still missing for Linux though). | ||
|
||
It does so through an interface in `std::panic` by first obtaining a raw stack trace (basically `Iterator<usize>`) and then requesting symbols for those stack frames. This interface can be very useful for projects like error-chain as well, especially since *obtaining* the raw stack trace is very *cheap*. | ||
|
||
# Drawbacks | ||
[drawbacks]: #drawbacks | ||
|
||
The single major drawback here is implementation complexity. Walking the stack is usually quite simple but parsing debug info to obtain symbols is not. On the other hand, backtraces already work on most platforms and are even essential to debugging for many users, so most of this is either already solved or worth the effort on its own. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Debuginfo is commonly 3-5x the size of the main binary. Relying on that much stuff to get line information seems like a drawback to me. For reference, we build a service at work with level 1 debuginfo to have symbolicated stack traces. The binary is 53 MiB, of which 41 MiB is debuginfo. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the other hand debuginfo can easily be left out, whereas line information hardcoded into the binary is always there no matter what you do wasting space. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How much space do those hardcoded lines take up? Has anyone proposed a flag to turn off There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This RFC proposes to replace the I'm aware that debuginfo can be huge, I just don't see why that's a problem:
So what does change?
|
||
|
||
# Rationale and Alternatives | ||
[alternatives]: #alternatives | ||
|
||
This is, quite obviously, a counter-proposal to [#2091](https://github.com/rust-lang/rfcs/pull/2091#issuecomment-329148747): Implicit caller location. | ||
|
||
Where #2091 is specifically geared towards small helper functions that always want to blame panics on their caller, this proposal focuses on a much broader class of problems. In reality, the lines are blurred: Some functions obviously are small helpers that should be annotated with `#[blame_caller]` and others obviously aren't. But there definitely is a gap where it's not entirely clear where the error should be reported. The approach's big weakness is that it focuses entirely on finding a *single stack frame that takes all the blame*. In the exemplary case of `unwrap()`, printing the location of the call to `unwrap()` instead of the code inside the implementation that ends up panicking is of course so much more useful - but it still doesn't come close to the picture you get from a full backtrace. | ||
|
||
While #2091 does acknowledge that relying on debug info is an alternative, its current draft argues: | ||
|
||
>Programmatic access to the stack backtrace is often used in interpreted or runtime-heavy languages like Python and Java. However, the stack backtrace is not suitable as the only solution for systems languages like Rust because optimization often collapses multiple levels of function calls. In some embedded systems, the backtrace may even be unavailable! | ||
|
||
It is of course possible for the compiler to annotate code from inlined functions with their original source information. In fact, rustc is already capable of doing this today! Missing inlined frames are a bug in the backtracing implementation, not an argument against backtracing in general. | ||
|
||
>The debug information is usually not provided in release mode. | ||
|
||
Not relevant either as this RFC changes that (with no downsides except maybe slightly longer compile times). | ||
|
||
>Even if this is generated, the debug symbols are generally not distributed to end-users, which means the error reports will only contain numerical addresses. This can be seen as a benefit, as the implementation detail won't be exposed, but how to submit/analyze an error report would be out-of-scope for this RFC. | ||
|
||
Again, the C ecosystem has solved this exact same issue many years ago: | ||
|
||
``` | ||
$ addr2line -Cipf -e target/release/tyrion -a 0x7e9e 0x7fe3 0x6ec5 0x20381 0x6b54 0x6d84 0x50b0b | ||
0x0000000000007e9e: backtrace::backtrace::libunwind::trace at /home/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/backtrace-0.3.2/src/backtrace/libunwind.rs:53 | ||
(inlined by) backtrace::backtrace::trace<closure> at /home/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/backtrace-0.3.2/src/backtrace/mod.rs:42 | ||
0x0000000000007fe3: backtrace::capture::{{impl}}::new at /home/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/backtrace-0.3.2/src/capture.rs:64 | ||
0x0000000000006ec5: tyrion::main::{{closure}} at /overtime/git/tyrion/src/main.rs:41 | ||
0x0000000000020381: ?? ??:0 | ||
0x0000000000006b54: std::panicking::begin_panic<&str> at /build/rust/src/rustc-1.19.0-src/src/libstd/panicking.rs:511 | ||
0x0000000000006d84: tyrion::test at /overtime/git/tyrion/src/main.rs:12 | ||
(inlined by) tyrion::main at /overtime/git/tyrion/src/main.rs:81 | ||
0x0000000000050b0b: malloc_usable_size at ??:? | ||
``` | ||
|
||
It **is** a benefit, no matter how you look at it. | ||
|
||
>There are multiple issues preventing us from relying on debug info nowadays. | ||
> | ||
>[...] | ||
> | ||
>These signal that debuginfo support is not reliable enough if we want to solve the unwrap/expect issue now. | ||
|
||
Implementation complexity (which includes fixing critical issues) is obviously this RFC's big drawback. But #2091 is very complex as well! Programmers would have to remember to annotate every little helper function *everywhere* with `#[blame_caller]`, at the price of sacrificing debuggability *inside* those functions. All of that - for what? Again, all of these practical conerns are directly caused by the focus on printing just *one single stack frame* instead of a trace. | ||
|
||
Dedicated panic location strings are not a zero-cost abstraction in Rust's traditional sense - they redundantly reproduce information that already has a much better home somewhere else: In the debug sections. | ||
|
||
|
||
# Unresolved questions | ||
[unresolved]: #unresolved-questions | ||
|
||
* What exactly should the new API in `std::panic` look like? | ||
* What about the `no_std` world? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this RFC deal with this issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea here is that if the standard panic handler just discards
file!()
andline!()
, I would expect the optimizer to remove them (see #2070 (comment)). This is nice because it avoids having to deal with reworking all of those panic internals (although doing that is probably a cleaner and better solution).Just like the design of the compiler flag, this is a detail that I wanted to gloss over until later rounds of feedback (it already took me a while to find the time to write up this first draft, no point in delaying it further for things that don't really matter when I'm just looking for feedback on the basic idea).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is
std::panic::set_hook
going to work?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quoting from
PanicInfo::location()
:Payload still exists of course. And location can be resolved from debuginfo or just return
None
when that fails. Adding a method toPanicInfo
that returns the entire stack trace seems like a good idea too.