Post-mortem debugging support inside V8/Node.js #227
Comments
I like the approach if it does not introduce any overhead.
Ok, here are the results for the first prototype: Binary size growth was ~6% (~2.5 MB), less than I was expecting, which is good.
Benchmarks
Not sure if we can optimize this solution more than it already is (the prototype is using V8_UNLIKELY and V8_INLINE to check if postmortem mode is enabled with as few instructions as possible). If anyone is interested, the code is here: nodejs/node@4ce744a...mmarchini:v8-postmortem-analyzer-api I want to play with other ideas next week and I'm open to suggestions :)
In the document, @hashseed recommended using a config flag.
Yeah, but then we need a custom build just to execute postmortem tools built using this API (which would raise the barrier to using these tools). I'll try @hashseed's suggestion to load the heap spaces at the same addresses where they were originally allocated, but other suggestions would be appreciated as well.
@mmarchini How would that new API be consumed by debugging tools? I don't think I understand all the subtleties of that. Do you have code somewhere (maybe as changes to llnode?) that does that?
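To illustrate the kind of check described above, here is a minimal sketch of a branch-hinted, force-inlined test of a postmortem-mode flag. It is not the prototype's code: `PM_UNLIKELY`, `PM_INLINE`, `g_postmortem_mode`, `g_read_word_from_dump` and `ReadWord` are hypothetical names standing in for V8's real macros and for whatever hook the prototype actually uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for V8's V8_UNLIKELY / V8_INLINE macros. The real
// definitions live in V8's headers and may differ per compiler.
#if defined(__GNUC__) || defined(__clang__)
#define PM_UNLIKELY(cond) (__builtin_expect(!!(cond), 0))
#define PM_INLINE inline __attribute__((always_inline))
#else
#define PM_UNLIKELY(cond) (cond)
#define PM_INLINE inline
#endif

// Hypothetical global flag: true only while a tool is inspecting a core dump.
static bool g_postmortem_mode = false;

// Hypothetical hook a tool would install to read one word from the dump.
using ReadWordFn = std::uintptr_t (*)(std::uintptr_t address);
static ReadWordFn g_read_word_from_dump = nullptr;

// Reads one word. In a live process this is a plain load; in postmortem mode
// the read is redirected to the dump reader. The branch hint tells the
// compiler to lay out the live-process path as the fall-through case.
PM_INLINE std::uintptr_t ReadWord(const std::uintptr_t* slot) {
  if (PM_UNLIKELY(g_postmortem_mode)) {
    assert(g_read_word_from_dump != nullptr);
    return g_read_word_from_dump(reinterpret_cast<std::uintptr_t>(slot));
  }
  return *slot;
}
```

Even with the hint, every guarded read still pays for one load of the flag and one predictable branch, which is consistent with the concern that this design cannot be made entirely free.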
This is exciting! @bcantrill and I spoke with several members of the V8 team years ago about built-in VM support for postmortem debugging, but it didn't seem like there was enough interest to build and maintain that support. One thing I don't fully understand: in what context does this API execute? Does a debugger (e.g., lldb) somehow load a copy of V8 and execute API calls into it? (If so, there are lots of follow-up questions, but I'm not sure how else this is intended to work.)
@misterdjules @davepacheco the code I've been using to test the API is a Node.js native module which uses the LLDB API to load a core dump and read it. The current proposal wouldn't run as an LLDB plugin, but we would be able to implement a REPL using JavaScript, and it would lower the barrier for contributions in the front-end as well as provide a JavaScript API by default for automated post-mortem analysis. Here's an example of how to use the API: https://gist.github.com/mmarchini/647720e08468b8b96a7922f79c20c87e Just to emphasize: I'm open to any suggestions which would help us maintain post-mortem tools in the long term. If we come up with something that works and is entirely different from the proposed solution, I'll be happy to implement it :)
I tried @hashseed's suggestion of loading the dump content into memory at the same addresses, and it works nicely once everything is initialized properly. It requires only minor changes in V8, and introduces no runtime overhead. I wrote a design document describing the approach; feedback is welcome!
This looks amazing @sethbrenith! Thank you for working on this! I will try to implement an lldb extension using this API next week.
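The core idea of restoring dump content at its original address can be sketched as follows. This is a simplified illustration for Linux/POSIX, not the code from the design document: `RestoreRegionAt` is a hypothetical helper, and a real tool would take the addresses and contents from the dump's memory map rather than from a caller-supplied buffer.

```cpp
#include <sys/mman.h>

#include <cassert>
#include <cstddef>
#include <cstring>

// Tries to recreate a memory region at the address it had in the crashed
// process, then fills it with the bytes saved in the dump. Returns false if
// the address range is not available in the current process.
bool RestoreRegionAt(void* original_address, const void* content,
                     std::size_t size) {
  assert(content != nullptr && size > 0);
  // Without MAP_FIXED the address is only a hint, so an existing mapping in
  // this process is never silently replaced.
  void* mapped = mmap(original_address, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mapped == MAP_FAILED) return false;
  if (mapped != original_address) {
    // The kernel placed the mapping elsewhere: pointers stored inside the
    // dumped heap would no longer be valid, so give up on this region.
    munmap(mapped, size);
    return false;
  }
  std::memcpy(mapped, content, size);
  return true;
}
```

Because pointers inside the restored pages keep their original values, code that follows them works unchanged, which is why this approach needs no runtime check in the live process.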
@sethbrenith very impressive! Looks like we currently have two options: Both approaches considered, (A) adds a lot more API surface, and is a bigger cross-cutting change. The V8 team is currently working on pointer compression and possibly defining object layout in a DSL. Both would cause conflicts with (A). It's probably no surprise that I like (B) better. It also offers value to V8 and Chromium, which is why the chances that it is continuously maintained are much higher.
@sethbrenith looks like you've made some good progress. I'll agree with @hashseed that something that has the best likelihood of ongoing maintenance is important. One question I have is around the second version of V8 that needs to be loaded. That sounds like something that may take some effort to manage, in that we'd have to make those builds available and find the matching version for a given core dump. @hashseed how much overhead would it be to include the extra code needed for this approach so that it's in the core? I understand it might not be possible to get just that with the current flags/options, but I wonder if it might be something we'd want to do if the subset we need has a small enough overhead.
Thanks so much for sharing this @sethbrenith! As I said, I'm really excited about the possibility of building postmortem debugging into V8 itself. (The only reason we went the way we did in the first place with mdb_v8 was that we felt we had no other option at the time.) I think I don't fully understand the approach here, so I apologize in advance if these questions or comments aren't applicable. One downside of this approach is that by calling into V8 to do the work, there are classes of problems (mostly in V8 itself) that likely cannot be debugged using this approach. Namely, if V8 itself crashes (e.g., because some internal data structure is invalid, because some pointer is NULL or something), you can use mdb_v8 to debug that problem. On the other hand, if you load another copy of V8 and use it to inspect its state, it seems likely the debugger would crash for the same reason as the original program. A more important case may be where the original program crashed because it ran out of memory. Would it not be likely that the debugger would also run out of memory when trying to, say, print objects? This case seems pretty important, though maybe I've misunderstood or there's some way to work around it. Aside from running out of memory, the next most common case might be debugging issues involving add-ons, which might also corrupt C/C++-level structures. Relatedly, how does memory management work in the postmortem host? Does V8 continue allocating from its heap? Does it run garbage collection?
It seems like the main advantage of this approach is to eliminate the brittleness of putting all the implementation details in the debugger. Are these problems easier?
Thanks everybody for the feedback, I really appreciate it. In response to @davepacheco's questions:
Yes, this is a risk. I don't know how often this problem would arise because I haven't done extensive testing on real-world crash dumps, but Jaroslav mentioned in a comment on the document that it is sometimes a problem with the existing gdb tools. I hope that this approach would still be able to provide some value in that case (for example, maybe you can't print a single corrupt object, but you can print the stack and some related objects that help you understand how you got to the current execution point).
I haven't actually tried it, but I'll speculate anyway. If the dump was from a 32-bit process, then it might have run up against the total addressable memory limit, so moving its content into a new process isn't going to help anything. Perhaps the postmortem host could choose some pages to not map so it has enough room for its own execution. In a 64-bit process, on the other hand, the address space is huge so any OOM crash was probably the process running out of physical memory. In that case, the postmortem host could map the memory from a file using copy-on-write to avoid requiring as much physical memory (or it could just run on a machine with more memory or page file available).
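The copy-on-write mapping mentioned above could look roughly like this on a POSIX system. It is a sketch under stated assumptions: `DumpMapping` and `MapDumpCopyOnWrite` are hypothetical names, and the whole file is mapped as one region, whereas a real core dump would be mapped segment by segment.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>

// Result of mapping a dump file. `data` stays nullptr on failure.
struct DumpMapping {
  void* data = nullptr;
  std::size_t size = 0;
};

// Maps a file with MAP_PRIVATE so that pages are loaded lazily from the file
// and only pages the host actually writes to get a private in-memory copy.
// The file on disk is never modified.
DumpMapping MapDumpCopyOnWrite(const char* path) {
  assert(path != nullptr);
  DumpMapping result;
  int fd = open(path, O_RDONLY);
  if (fd < 0) return result;
  struct stat st;
  if (fstat(fd, &st) == 0 && st.st_size > 0) {
    const std::size_t size = static_cast<std::size_t>(st.st_size);
    void* mapped =
        mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (mapped != MAP_FAILED) {
      result.data = mapped;
      result.size = size;
    }
  }
  close(fd);  // The mapping remains valid after the descriptor is closed.
  return result;
}
```

Untouched pages can be dropped and re-read from the file under memory pressure, so the host's physical memory use tracks what it inspects rather than the size of the dump.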
Isolate::PrintStack executes within a DisallowHeapAllocation block, so it doesn't allocate anything or run GC. I haven't investigated printing for every object type, so I'm less sure about them, but I'd be surprised if they allocated things in the JS heap or triggered GC.
I hope so. I like the idea that fixing a problem in this tool is likely just finding some variable that needs to be initialized and adding it to a list, rather than substantially rewriting code that has to match the internal logic of V8.
I have a somewhat working lldb extension which uses the API proposed by @sethbrenith to debug Node.js core dumps. It's still extremely rough around the edges though; I'll share it after doing some cleanup. Overall I like the API: it's simple to use and gives you the results you want. I left a few suggestions (along with possible implementations) in the CL. My main concern for now is regarding the
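For readers unfamiliar with the pattern behind a block like DisallowHeapAllocation, the general technique is an RAII scope that flips a "no allocation" state for its lifetime. The following toy model shows the idea only; it is not V8's implementation, and `ToyHeap`, `NoAllocationScope` and `TryAllocate` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Toy heap that only tracks how many bytes were requested.
class ToyHeap {
 public:
  // While at least one scope object is alive, allocation is refused.
  class NoAllocationScope {
   public:
    explicit NoAllocationScope(ToyHeap& heap) : heap_(heap) {
      ++heap_.no_allocation_depth_;
    }
    ~NoAllocationScope() {
      assert(heap_.no_allocation_depth_ > 0);
      --heap_.no_allocation_depth_;
    }
    NoAllocationScope(const NoAllocationScope&) = delete;
    NoAllocationScope& operator=(const NoAllocationScope&) = delete;

   private:
    ToyHeap& heap_;
  };

  bool allocation_allowed() const { return no_allocation_depth_ == 0; }

  // Returns false instead of allocating while a scope is active. A debug
  // build of a real VM would more likely abort here to expose the bug.
  bool TryAllocate(std::size_t bytes) {
    if (!allocation_allowed()) return false;
    allocated_bytes_ += bytes;
    return true;
  }

  std::size_t allocated_bytes() const { return allocated_bytes_; }

 private:
  int no_allocation_depth_ = 0;  // Nesting depth of active scopes.
  std::size_t allocated_bytes_ = 0;
};
```

A postmortem host could rely on the same kind of guard to confirm that a printing routine never tries to allocate in the restored heap or trigger GC.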
In my current prototype the postmortem host code is included in the |
@sethbrenith feel free to join our bi-weekly working group meeting to discuss this topic if you want. We'll have one today at 09:30 PST (#274). We'll also have a summit early in March (#203) in Munich if you want to discuss it in person :)
I would imagine that we would have a very limited set of supported features that we can perform with the debugging host:
I don't actually expect any continued execution, nor GC. In fact, that would be a security risk. Regarding testing, I was hoping to have code in V8 upstream, and also include tests to ensure it works.
Could we have an option to print the object's content and stack traces in a JSON format? A JSON format would make it easier for debuggers to extend functionality without requiring changes to V8. For example, a debugger could get the JSON formatted stack trace from V8 and interpolate it with the native stack trace to output a complete stack trace with all C++ and JavaScript frames (I don't think we can get this complete stack trace today with the gdb macros).
Thanks @mmarchini for the prototyping work! It's really valuable to know that thread local storage is a problem area on *nix debuggers. That said, I'd recommend holding off on further coding for now because the discussion in the doc is still pretty far from consensus on whether this is the right approach. I agree that an option for machine-parseable output would be useful; my WinDbg prototype was using regex matching on the output to find things that look like object pointers and make them clickable, which is pretty delightful but also error-prone. In the long term, when printing is based on Torque metadata and not hand-coded, it should be relatively easy to support two separate printing format options. Unfortunately I can't make it to Munich this time, but I'll call in to this morning's meeting. Thanks for the invitations!
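The regex-based pointer detection described above can be sketched like this. The pattern and the `FindPointerLikeTokens` helper are assumptions made for illustration, not the WinDbg prototype's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <regex>
#include <string>
#include <vector>

// Returns every token in `text` that looks like a hex address: "0x" followed
// by 8 to 16 hex digits. Shorter values such as "0x10" are skipped because
// they are more likely to be lengths or flags than pointers.
std::vector<std::uint64_t> FindPointerLikeTokens(const std::string& text) {
  static const std::regex kPointerPattern(R"(\b0x[0-9a-fA-F]{8,16}\b)");
  std::vector<std::uint64_t> addresses;
  for (std::sregex_iterator it(text.begin(), text.end(), kPointerPattern), end;
       it != end; ++it) {
    // Base 16 makes std::stoull accept the leading "0x".
    addresses.push_back(std::stoull(it->str(), nullptr, 16));
  }
  return addresses;
}
```

The error-prone part is visible in the heuristic itself: a string constant or a large integer that happens to be printed in hex matches just as well as a real object pointer, which is the argument for structured output.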
@hashseed wrote:
I agree that we probably don't want that, but when I asked that, I wasn't clear on what would prevent that. I'm still not sure how we know that attempting to print an object or one of these other operations wouldn't try to allocate memory from the VM's heap. It seems like an intrinsic challenge with using the VM code itself inside the debugger without the VM really knowing that it's running in a restricted context? @mmarchini wrote:
I think this would be great. It's trickier than it might seem for objects, since they can contain cycles and other complex structures. A few years ago we proposed a JSON format that I believe could represent heap objects, stacks, and other metadata, but I don't think anybody's implemented most of it.
If anyone is interested, I implemented a proof-of-concept lldb plugin for Node.js using the API described here. The plugin is intended for analyzing Linux core dumps on Linux environments. The code is available here and implementation details as well as challenges and limitations are described here.
Based on the meeting we had last Thursday, the most likely path moving forward is to use Torque to generate meta description of objects, and then use those meta descriptions to navigate objects using debuggers. It might take some time to get there though, because Torquification of heap objects in V8 is still in its early stages.
This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.
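One common way to make a cyclic object graph representable in JSON is to emit a flat list of objects and refer to them by id instead of nesting them. The sketch below shows that technique only; it is not the JSON format proposed earlier, and `HeapObject` and `ToFlatJson` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical heap object: a type name plus named references to others.
struct HeapObject {
  std::string type;
  std::vector<std::pair<std::string, const HeapObject*>> fields;
};

// Emits one JSON entry per reachable object. Fields point at other objects
// through {"ref": id}, so a cycle becomes a back-reference rather than
// infinite nesting. Strings are not escaped, to keep the sketch short.
std::string ToFlatJson(const HeapObject* root) {
  assert(root != nullptr);
  std::map<const HeapObject*, int> ids;   // Object -> assigned id.
  std::vector<const HeapObject*> order;   // Objects by id; also the work list.
  ids[root] = 0;
  order.push_back(root);
  std::ostringstream out;
  out << "[";
  for (std::size_t i = 0; i < order.size(); ++i) {
    const HeapObject* obj = order[i];
    if (i > 0) out << ",";
    out << "{\"id\":" << i << ",\"type\":\"" << obj->type << "\",\"fields\":{";
    bool first = true;
    for (const auto& field : obj->fields) {
      assert(field.second != nullptr);
      // Assign a fresh id the first time an object is seen.
      auto inserted =
          ids.emplace(field.second, static_cast<int>(order.size()));
      if (inserted.second) order.push_back(field.second);
      if (!first) out << ",";
      first = false;
      out << "\"" << field.first << "\":{\"ref\":" << inserted.first->second
          << "}";
    }
    out << "}}";
  }
  out << "]";
  return out.str();
}
```

Two objects that point at each other serialize to two entries whose fields reference ids 1 and 0 respectively, so a consumer can rebuild the graph without the producer ever recursing.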
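To make the metadata-driven idea concrete, here is a sketch of how a debugger might use a generated layout description to read fields from raw dump memory. Everything in it is an assumption for illustration: `FieldDescriptor`, `TypeDescriptor` and `ReadField` are hypothetical, and the offsets used in any example are invented, not V8's real layouts or Torque's actual output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical generated metadata: one entry per field of an object layout.
struct FieldDescriptor {
  std::string name;
  std::size_t offset;  // Byte offset from the start of the object.
  std::size_t size;    // Field width in bytes (at most 8 in this sketch).
};

struct TypeDescriptor {
  std::string name;
  std::vector<FieldDescriptor> fields;
};

// Reads the named field of an object that starts at `object_start` inside
// `memory` (a byte buffer standing in for dump contents). Returns false if
// the field is unknown, too wide, or falls outside the buffer.
bool ReadField(const TypeDescriptor& type,
               const std::vector<std::uint8_t>& memory,
               std::size_t object_start, const std::string& field_name,
               std::uint64_t* value) {
  assert(value != nullptr);
  for (const FieldDescriptor& field : type.fields) {
    if (field.name != field_name) continue;
    if (field.size > sizeof(*value)) return false;
    const std::size_t begin = object_start + field.offset;
    if (begin > memory.size() || field.size > memory.size() - begin) {
      return false;
    }
    *value = 0;
    // Assumes the dump and the debugger host are both little-endian.
    std::memcpy(value, memory.data() + begin, field.size);
    return true;
  }
  return false;
}
```

With this shape, a layout change in V8 would only require regenerating the descriptor table; the debugger's navigation code would stay the same.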
I'm working on a new approach to post-mortem debugging to make it easier for tools to keep up-to-date with V8 changes. As discussed in #202, keeping llnode up-to-date today is a tortuous and tricky process, and sometimes llnode might not work for the latest Node.js releases (for example, it's not working for Node.js v10.x at the moment).
This new approach relies on V8 exposing an API to make it easier to navigate core dumps. V8 doesn't need to know how to handle core dumps as long as it provides a way for tools to navigate the heap without reimplementing everything and relying on several heuristics.
I created a design document for this proposal and I'm working on a prototype:
Would love to get some feedback from @nodejs/v8 and @nodejs/post-mortem on this proposal :)