Discuss: State of llnode #202

mmarchini · 2018-06-06T16:36:32Z

Follow up from the Diagnostics Session in the Collaborators Summit:

llnode is a project under the Node.js org, and since it is a diagnostic tool it should be of interest to us as a Working Group. Right now the project is fairly stable, but it has some issues:

The first one is the lack of visibility for the project: it seems like llnode is mostly known and used by Core collaborators and a handful of enterprises with large deployments. There's little content about llnode, and I couldn't find any content about llnode on nodejs.org, Node.js Medium or Node.js YouTube Channel.
As a consequence, we only have a handful of people working on the project, leading to slow progress as well as PRs staying open forever (examples: "JS API: initial implementation" llnode#147 and "src: add commands to inspect the workqueue" llnode#122). Since we need to update llnode for almost every V8 release, this can also lead to llnode not working with the latest version of Node.js.

I'm bringing this up as a discussion related to #157. llnode is a tool worth being 1st tier supported, based on its importance for Core and Native Module developers as well as enterprises, but since it is a lesser-known feature with a low bus factor, it's too risky today to have llnode as a potential blocker for releases.

Would love to discuss some ideas on how to improve the current scenario of the project.


cjihrig · 2018-06-06T18:19:06Z

I couldn't find any content about llnode on nodejs.org, Node.js Medium or Node.js YouTube Channel

I wrote this a few months back, but it's outside of any official channels.

llnode is a tool worth being 1st tier supported

I want to agree with you here, but it's extremely impractical without some type of commitment or collaboration with V8. (Also, what about ChakraCore?)

we only have a handful of people working on the project

An unscientific Twitter poll I ran a while back said that most people thought postmortem debugging was too hard, didn't think it was worth it, or didn't know how. I can quote at least one TSC member as saying that postmortem debugging should just be a thing of the past. As promises are used more and more, without a great solution for them, I think fewer people will care. Without many users, there will be even fewer people willing to maintain a tool with such a steep learning curve (in some cases you need to reverse engineer what V8 is doing).

I'm not trying to be negative here - I'm a big fan of llnode, but those are the challenges that I see.

mmarchini · 2018-06-06T18:30:21Z

@cjihrig I agree with everything you said.

people thought postmortem debugging was too hard

I think we'll be able to improve llnode's usability once the JS API lands. For example, we could have a GUI or our own REPL.

I can quote at least one TSC member as saying that postmortem debugging should just be a thing of the past

Have they suggested any alternatives? V8 Snapshots are too expensive to be used in production, and they don't provide all the information we can gather with core dumps (the complete call stack, inspecting JS contexts and scripts, etc.).

As promises are used more and more, without a great solution for them, I think fewer people will care

Agreed, but to move forward on how to handle promises we need more people contributing (which is hard).

I want to agree with you here, but it's extremely impractical without some type of commitment or collaboration with V8. (Also, what about ChakraCore?)

For some time now I've wanted to try using V8's API (or even N-API) to write something similar to llnode, which wouldn't require so many hacks and wouldn't rely on postmortem metadata to inspect a core dump, but I haven't had the time to work on it yet.

joyeecheung · 2018-06-06T20:53:25Z

I want to agree with you here, but it's extremely impractical without some type of commitment or collaboration with V8. (Also, what about ChakraCore?)

I have to agree with @cjihrig on this for now, if llnode being in tier 1 means breakage of llnode would block V8 updates in core.

I think we'll be able to improve llnode's usability once the JS API lands. For example, we could have a GUI or our own REPL.

On one hand I am optimistic about this. On the other hand, based on a GUI prototype that I've developed before (I cannot release it because I worked on it at my last job) and on observations from using llnode with a lot of in-production core dumps (frequent crashes caused by lldb itself, lldb being unable to deal with a lot of core dumps that gdb handles fine, etc.), I don't think the tool is ready, especially after seeing this post from a year ago on the lldb mailing list. For example, sometimes I have to use this hack in order to load core dumps with lldb, but it does not always work, because there are deeper issues: lldb just crashes when it encounters things that it does not support, or just doesn't try hard enough.

It's nice to have this tool usable when I need to debug C++ and JavaScript together, and it's kind of painful when I can't, but given the level of support in lldb itself I think that's where we will be for some time. The JS API does not solve the lldb issues, and a crash of the background process of a GUI tool is very confusing, even if I know it's not unfixable. You just get lazy when you think about the llvm development process (svn, mailing list, giant repo, etc.).

mike-kaufman · 2018-06-07T16:20:00Z

I also agree w/ Colin here. Not sure what the solution is - any debugger extension working over a core dump is going to require detailed understanding of the host + VM internals, and those internals will change.

I also agree that the right approach here is to push all knowledge of VM internals down onto the VM, and land on a stable API that debugger extensions can take a dep on. Perhaps this API becomes part of n-api. Need to get the VMs to sign up here to support this. Not sure what V8 already has in place here. Chakra has a windbg extension that understands externals, but unless someone volunteers, I don't see that being ported to any other platforms.

Also, I agree w/ above that this is a scenario/tool w/ narrow appeal. For most JS devs, they only care about the JS stack + JS heap. The question is, is a JS stack + heap sufficient to debug a core-dump in a meaningful way? Or, is it the common case that by the time you have a core dump, the failure is pretty deep & the JS stack + heap is useless?

That said, in terms of design, it would be nice if there was a system like this:

--------------
| UI Tool    |  ---->  let people have a variety of tools/UX here.
--------------         UI tools can work cross-plat, cross-node-version, & cross-VM
     /\
      |      ------------> some crdp-like protocol
      |
--------------------------
| Node Core Dump Adapter |  ---->  this is going to be platform-dependent,
--------------------------         VM-dependent & node-version-dependent.
     /\                            May leverage host-specific tools like lldb to read &
      |                            interact w/ dump.
      |
-------------
| core dump |
-------------
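
One way to read the layering in the diagram is as a pair of small interfaces. The sketch below is only an illustration in TypeScript: `CoreDumpAdapter`, the two request methods, and `FakeAdapter` with its canned data are all hypothetical names made up here, not part of llnode, lldb, or any existing protocol. The point it tries to show is that the UI side depends only on the protocol shapes, while everything platform-, VM-, and version-specific would sit behind the adapter.

```typescript
// Hypothetical request in an imagined "crdp-like" protocol.
interface DumpRequest {
  id: number;
  method: "getStackTrace" | "getHeapSummary";
}

interface StackFrame {
  functionName: string;
  kind: "js" | "native";
}

interface HeapTypeSummary {
  typeName: string;
  instanceCount: number;
  totalSize: number; // bytes
}

// Responses are tagged with the method they answer.
type DumpResponse =
  | { id: number; method: "getStackTrace"; frames: StackFrame[] }
  | { id: number; method: "getHeapSummary"; types: HeapTypeSummary[] };

// The adapter is the platform-, VM-, and Node-version-dependent layer.
// A real one might drive lldb or gdb; that part is not sketched here.
interface CoreDumpAdapter {
  handle(request: DumpRequest): DumpResponse;
}

// In-memory stand-in that returns canned data instead of reading a dump.
class FakeAdapter implements CoreDumpAdapter {
  handle(request: DumpRequest): DumpResponse {
    if (request.method === "getStackTrace") {
      return {
        id: request.id,
        method: "getStackTrace",
        frames: [
          { functionName: "writeLog", kind: "js" },
          { functionName: "uv_run", kind: "native" },
        ],
      };
    }
    return {
      id: request.id,
      method: "getHeapSummary",
      types: [{ typeName: "Object", instanceCount: 10, totalSize: 560 }],
    };
  }
}

// A "UI tool" that only knows the protocol, not the adapter's internals.
function renderStack(adapter: CoreDumpAdapter): string[] {
  const response = adapter.handle({ id: 1, method: "getStackTrace" });
  if (response.method !== "getStackTrace") {
    throw new Error("unexpected response type");
  }
  return response.frames.map((f) => `[${f.kind}] ${f.functionName}`);
}
```

Under this assumption, swapping `FakeAdapter` for an adapter that targets a different VM or Node.js version would not require changes to `renderStack`.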

joyeecheung · 2018-06-07T21:20:58Z

The question is, is a JS stack + heap sufficient to debug a core-dump in a meaningful way?

Having used llnode to debug many in-production core dumps, my answer is yes. If someone's app is stuck in an infinite loop (which is not that uncommon, I am afraid, especially in buggy loggers), it's possible to use gcore to trigger a core dump and use that to see the call stack. Sometimes it helps with OOM if the objects causing it are obvious from the object count/total size (you'll be able to trace the references with llnode), or if the app happens to be running the code that causes the OOM, like some code that tries to join a huge array; then you'll be able to see where that array is coming from. Sometimes it helps with segfaults if you need the JS stack for reproduction.

But the core dump is obviously not a silver bullet. Also, lldb itself does not work as well as gdb with a lot of core dumps.
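
For the OOM case mentioned above, the "obvious from the object count/total size" step amounts to ranking per-type statistics. The following is a minimal sketch of that ranking only. The `TypeStats` shape, the `topBySize` helper, and the sample numbers are hypothetical and are not parsed from real llnode output.

```typescript
// Hypothetical per-type statistics, e.g. as a heap summary might report them.
interface TypeStats {
  typeName: string;
  instances: number;
  totalSize: number; // bytes
}

// Return the `limit` types with the largest total size, without
// modifying the input array.
function topBySize(stats: TypeStats[], limit: number): TypeStats[] {
  return stats
    .slice()
    .sort((a, b) => b.totalSize - a.totalSize)
    .slice(0, limit);
}

// Made-up sample data: one type clearly dominates the heap.
const sample: TypeStats[] = [
  { typeName: "Object", instances: 1200, totalSize: 67200 },
  { typeName: "LogEntry", instances: 500000, totalSize: 28000000 },
  { typeName: "Array", instances: 900, totalSize: 28800 },
];

const suspects = topBySize(sample, 2);
```

With data like this, the dominant type is the natural place to start tracing references back to the code that retains those objects.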

Or, is it the common case that by the time you have a core dump, the failure is pretty deep & the JS stack + heap is useless?

It's very rare that the JS stack or heap is completely useless. It's just hard to know where to look if you are not familiar with the code base and there are a lot of node_modules functions on the stack and a lot of objects created by code that you are not familiar with. But I would say it's the same situation with other VM-aware tools like heap snapshots.

github-actions · 2020-07-17T00:37:24Z

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

mmarchini · 2020-07-17T03:49:16Z

This spun into other issues so I believe it can be closed. Feel free to reopen if more discussion is needed.

mmarchini closed this as completed Jun 6, 2018

mmarchini reopened this Jun 6, 2018

mmarchini mentioned this issue Jul 19, 2018

doc: initial cut at support tiers for diag tools nodejs/node#21870

Closed

3 tasks

mmarchini mentioned this issue Aug 30, 2018

Post-mortem debugging support inside V8/Node.js #227

Closed

github-actions bot added the stale label Jul 17, 2020

mmarchini closed this as completed Jul 17, 2020
