-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Detailed debug-info (DWARF) support in new backends (initially x64). #2565
Conversation
@yurydelendik, I think I will need your help on this -- it appears that you wrote all of the debug support initially? Two questions:
Any help from anyone who is passionate about debuginfo would be appreciated -- this is one of the last pieces needed to enable the switch to the new backend by default in wasmtime, and I've dumped a bunch of time into this and need to context-switch for a little bit :-) |
0a103d7
to
f0c68e2
Compare
They are marked as "ignored" to not run them on developer's computer: the tests require lldb/gdb/llvm-dwarfdump install. These tests are run on the CI. See wasmtime/.github/workflows/main.yml Lines 287 to 293 in cacebfb
I'll try to be useful, yes.
To keep vmctx register/value valid as long as possible is a desired functionality, as well as knowing it during stack unwinding to provide Instance information. We talked with @sunfishcode about having fixed location for vmctx some time back. If vmctx is not available, there is no way to get a variable value stored in the instances memory and shadow stack. The variables are marked as optimized out in this case. Which is acceptable solution from debugging of optimized code point of view. Of course we have to minimize such cases to make live of developers better. |
Makes sense; I'll think about ways to do this. Perhaps the simplest would be to just add a hack that emits a A question: I was struggling to work out the invariants in the code that converts range-associated data. What do There's something a little more broken going on in this PR right now, as the vmctx value is dead at a certain point, but DWARF info related to its former register is being generated. So I suspect that I'm misunderstanding some of the invariants related to ranges. Thanks for the help! |
f0c68e2
to
1572492
Compare
@yurydelendik with the vmctx-always-alive hack, this seems to pass the gdb DWARF test locally on the new backend now (and I'm watching CI); thanks! Question above about ranges still stands; I want to make sure I got the conversions right. Otherwise, though, I think this is ready for review. |
It would be nice to have a more general way of keeping a value alive somewhere for debuginfo purposes. |
Yes, I agree. I actually thought of a really simple/ugly (but maybe more robust) way of doing this -- allocate slots on the stack for all This would horribly pessimize code performance, though maybe not too much more than non-optimized debug code already is; but it would be much simpler and more likely to be correct in all sorts of edge cases. I see this as sort of in the same spirit as the idea proposed in #2459 for reftypes tracing: in both cases we instrument the CLIF to provide needed functionality so that the core compiler remains simpler and easier to verify. It is a little unconventional, though, so perhaps for another time. |
1572492
to
c1e0be1
Compare
Sure, I can review. |
This PR propagates "value labels" all the way from CLIF to DWARF metadata on the emitted machine code. The key idea is as follows: - Translate value-label metadata on the input into "value_label" pseudo-instructions when lowering into VCode. These pseudo-instructions take a register as input, denote a value label, and semantically are like a "move into value label" -- i.e., they update the current value (as seen by debugging tools) of the given local. These pseudo-instructions emit no machine code. - Perform a dataflow analysis *at the machine-code level*, tracking value-labels that propagate into registers and into [SP+constant] stack storage. This is a forward dataflow fixpoint analysis where each storage location can contain a *set* of value labels, and each value label can reside in a *set* of storage locations. (Meet function is pairwise intersection by storage location.) This analysis traces value labels symbolically through loads and stores and reg-to-reg moves, so it will naturally handle spills and reloads without knowing anything special about them. - When this analysis converges, we have, at each machine-code offset, a mapping from value labels to some number of storage locations; for each offset for each label, we choose the best location (prefer registers). Note that we can choose any location, as the symbolic dataflow analysis is sound and guarantees that the value at the value_label instruction propagates to all of the named locations. - Then we can convert this mapping into a format that the DWARF generation code (wasmtime's debug crate) can use. This PR also adds the new-backend variant to the gdb tests on CI.
59eb476
to
7e12abc
Compare
Thanks! I went over this again quickly and noticed a few typos, and decided to rebase while I was here. Also note there's a ridealong change to remove an outdated doc-comment about the backend pipeline (I can split that out if you'd like). |
@@ -207,6 +225,31 @@ fn append_memory_deref( | |||
return Ok(false); | |||
} | |||
} | |||
LabelValueLoc::Reg(r) => { | |||
let reg = isa.map_regalloc_reg_to_dwarf(r)? as u8; | |||
writer.write_u8(gimli::constants::DW_OP_breg0.0 + reg)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ExpressionWriter
has write_op_breg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good, and I really appreciated the top level overview comment but, as a reader, I would have appreciated some more function/method/struct level comments so I didn't have to go searching through the code to figure out how a struct member was used and things like that. A few inline comments below. r=me with them addressed and/or questions answered, etc.
Thanks!
I had missed that the CI config didn't actually run the tests, because (I think) `matrix.target` is not set by default (?). All of our hosts are native x86-64, so we can just gate on OS (Ubuntu) instead. I also discovered that while I had been testing with the gdb tests locally, when *all* `debug::*` tests are run, there are two that do not pass on the new backend because of specific differences in compiled code. One is a value-lifetime issue (the value is "optimized out" at the point the breakpoint is set) and the other has to do with basic-block order (it is trying to match against hardcoded machine-code offsets which have changed).
@fitzgen I addressed your comments, but while double-checking the CI results one last time, realized the debug tests weren't actually running on CI (GitHub Actions config issue). On fixing that, discovered that the tests I was running locally were a subset of all debug tests, and two of the others were failing. These failures were due to (IMHO) not-super-important differences in codegen, so I added directives to ignore; but let me know if this is still r+ or if you'd like me to dig deeper. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
This PR propagates "value labels" all the way from CLIF to DWARF
metadata on the emitted machine code. The key idea is as follows:
Translate value-label metadata on the input into "value_label"
pseudo-instructions when lowering into VCode. These
pseudo-instructions take a register as input, denote a value label,
and semantically are like a "move into value label" -- i.e., they
update the current value (as seen by debugging tools) of the given
local. These pseudo-instructions emit no machine code.
Perform a dataflow analysis at the machine-code level, tracking
value-labels that propagate into registers and into [SP+constant]
stack storage. This is a forward dataflow fixpoint analysis where each
storage location can contain a set of value labels, and each value
label can reside in a set of storage locations. (Meet function is
pairwise intersection by storage location.)
This analysis traces value labels symbolically through loads and
stores and reg-to-reg moves, so it will naturally handle spills and
reloads without knowing anything special about them.
When this analysis converges, we have, at each machine-code offset, a
mapping from value labels to some number of storage locations; for
each offset for each label, we choose the best location (prefer
registers). Note that we can choose any location, as the symbolic
dataflow analysis is sound and guarantees that the value at the
value_label instruction propagates to all of the named locations.
Then we can convert this mapping into a format that the DWARF
generation code (wasmtime's debug crate) can use.
This PR also adds the new-backend variant to the gdb tests on CI.