Dissect dead node service core dumps with mdb via a SmartOS VM
There is a Unix-like operating system called SmartOS whose ancestry represents a strong investment in low-level introspection tools (such as DTrace).
One such tool is mdb, a high-quality modular debugger that ships with SmartOS; it can be used to inspect execution context from the kernel to the application layer.
Some rather clever people wrote a debugging module called mdb_v8 that allows introspection of node core dumps from a high level (e.g. inspecting closure scope) down to a low level (e.g. memory addresses).
It turns out that mdb can analyse Linux core files as well as SmartOS core files. We just have to provide the core file and the node binary that was running when the core dump was generated.
autopsy installs a SmartOS VM and then acts as a stdio proxy to mdb.
For details on using mdb, see the mdb reference docs.
Requirements:
- VirtualBox
- 2GB of RAM for the VM (the VM runs entirely in RAM and mdb can also be memory intensive, so 2GB is a safe bet)
Install autopsy from npm:
npm install -g autopsy
Once installed, the following commands are available:
- autopsy setup - runs setup
- autopsy start - starts the VM
- autopsy stop - stops the VM
- autopsy status - gets the VM status
- autopsy remove - removes the VM
- autopsy ssh - use exactly as you would use ssh; additionally provides a tunnel from a remote server back to the locally installed VM
- autopsy - provides the CLI proxy to mdb in the VM
Next, set up the VM:
autopsy setup
This installs autopsy on the system, downloads the SmartOS virtual machine assets and sets up a SmartOS VM in VirtualBox.
The VM assets are about 150MB and are downloaded from S3.
If setup is interrupted for any reason (including a network failure during the asset download), simply try again. Partial downloads will be resumed.
Before we can perform an autopsy, the VM needs to be running. Simply run:
autopsy start
On first run, autopsy takes a snapshot of the initial VM state to optimize subsequent boots, so the first autopsy start takes the longest.
The VM runs SmartOS completely in RAM (there are no zones). This means the VM state is immutable.
The autopsy command takes the following arguments:
autopsy [node-binary] core-file
On OS X the node binary is required. On Linux, if it is not supplied, the currently installed node binary is used.
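To make the argument rule concrete, here is a small bash sketch of how the arguments could be resolved. The function name resolve_autopsy_args and its leading OS parameter are hypothetical illustrations, not autopsy's actual code; the sketch only encodes the two rules stated above: an explicit binary is always used as given, and on Linux a missing binary falls back to the node found on PATH.

```shell
# Hypothetical sketch of the `autopsy [node-binary] core-file` argument rule.
# $1 is the host OS as printed by `uname -s`; the rest is what the user typed.
resolve_autopsy_args() {
  local os="$1"
  shift
  if [ "$#" -eq 2 ]; then
    # Explicit node binary and core file: use them as given.
    printf '%s %s\n' "$1" "$2"
  elif [ "$#" -eq 1 ] && [ "$os" = "Linux" ]; then
    # Linux only: fall back to whichever node is first on PATH.
    local node_bin
    node_bin="$(command -v node)" || { echo "no node binary on PATH" >&2; return 1; }
    printf '%s %s\n' "$node_bin" "$1"
  else
    # OS X (or anything else) without a node binary: refuse.
    echo "usage: autopsy [node-binary] core-file" >&2
    return 1
  fi
}
```

For example, resolve_autopsy_args "$(uname -s)" example/core prints the pair that would be handed to mdb on Linux and fails with a usage message on OS X. Whichever binary ends up being used, it should be the same node build that produced the core.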
When this command is run, the following occurs:
- copy the node binary and core file into the VM (in a purpose-built SmartOS zone)
- SSH into the VM and log in to the relevant SmartOS zone
- run mdb with the copied core dump and node binary files
- inject ::load v8 to load the V8-related debugging commands
- pipe the host machine's process.stdin to the mdb interactive environment, and pipe mdb's stdout to the host machine's stdout
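These steps can be approximated by hand, which may help when debugging the setup. This is a rough bash sketch under several assumptions that autopsy does not state: the VM's SSH server is reachable on localhost:2222 (the port mapping described further down), root login works, and /var/tmp is a usable destination. It skips the purpose-built zone entirely, and the names mdb_input and manual_autopsy are hypothetical.

```shell
# Rough, hypothetical manual equivalent of `autopsy node core` (not autopsy's code).
# Assumptions: VM sshd on localhost:2222, root login allowed, /var/tmp writable.
# Unlike autopsy, this runs mdb directly instead of inside a dedicated zone.
VM_PORT=2222
VM_USER=root

# Emit `::load v8` first, then pass through whatever arrives on stdin,
# mirroring how autopsy injects the command before proxying user input.
mdb_input() {
  printf '::load v8\n'
  cat
}

manual_autopsy() {
  local node_bin="$1" core_file="$2"
  # Copy both files into the VM.
  scp -P "$VM_PORT" "$node_bin" "$core_file" "$VM_USER@localhost:/var/tmp/" || return 1
  # Run mdb in the VM; our stdin feeds mdb, and mdb's stdout comes back to us.
  mdb_input | ssh -p "$VM_PORT" "$VM_USER@localhost" \
    mdb "/var/tmp/$(basename "$node_bin")" "/var/tmp/$(basename "$core_file")"
}
```

Running manual_autopsy example/node example/core and then typing ::jsstack should behave much like the autopsy proxy, minus the zone isolation.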
When we're done, we may wish to free memory by stopping the VM:
autopsy stop
The example folder contains a core file, generated by running the die.js file, and the node binary that ran it.
You can try out autopsy with these two files (on OS X and Linux) by running the following from the same folder as this readme:
autopsy example/node example/core
Once the mdb console appears, try this for starters:
> ::jsstack
Then, if you want to get fancy, find the objects that have a given property and print one of them by its address:
> ::findjsobjects -p myproperty
137289672551
> 137289672551::jsprint
EC2 (and other VPS-type solutions) run "machines" in virtualized containers. It's very tricky to make a virtual machine run on a virtual machine, and even where it is possible there is either an insufferable performance cost, or certain low-level features must be enabled, which risks introducing security issues. That aside, copying node and a core file and running mdb, all in a RAM-only VM, is memory intensive: not something we want to do (or maybe even can do) on a production server.
But autopsy provides a way to do seamless postmortems on an EC2 server or any kind of Linux VM, by setting up an SSH tunnel from the server back through the local machine and into the SmartOS VM running locally.
This can be achieved in a few easy steps:
- install autopsy on the local system and run autopsy setup; ensure it's working locally
- install autopsy within a VM or on an EC2 server (etc.)
- ssh into the server in the same way as always, except prefix the command with autopsy, like so:
autopsy ssh -i myKey.pem user@example.com
Use whatever ssh flags you normally would, and autopsy will additionally set up the tunnelling (for the curious: we inject the -R flag, mapping the VM's port through to the same port on the server).
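For the curious, the tunnel can be reproduced with plain ssh. This bash sketch assumes, as the port note further down describes, that the VM's SSH port is available on the local machine as port 2222; tunnel_spec and ssh_with_vm_tunnel are hypothetical helper names, and the flag autopsy actually injects may differ in detail.

```shell
# Hypothetical sketch: plain-ssh equivalent of `autopsy ssh ...`.
# Assumes the SmartOS VM's sshd is reachable locally on port 2222.
VM_PORT=2222

# ssh -R spec: <port on the server>:<host, as seen locally>:<local port>
tunnel_spec() {
  printf '%s:localhost:%s\n' "$1" "$1"
}

# Pass every argument straight through to ssh, adding the reverse tunnel.
ssh_with_vm_tunnel() {
  ssh -R "$(tunnel_spec "$VM_PORT")" "$@"
}
```

Usage mirrors the autopsy form: ssh_with_vm_tunnel -i myKey.pem user@example.com. Once connected, something like ssh -p 2222 root@localhost run on the server should reach the SmartOS VM on your machine (the root login is an assumption).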
In production, if we run our node processes with --abort-on-uncaught-exception, we will always get a core dump when a process dies from an uncaught exception (that is, as long as our Linux environment is set up correctly).
You can also manually generate a core file using process.abort().
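A quick way to see both approaches in action is the bash sketch below. write_die_js produces a hypothetical stand-in for the example folder's die.js (the real file may differ): it keeps an object with a myproperty key reachable, so that ::findjsobjects -p myproperty has something to find, and then throws. crash_node and abort_node are illustrative names. The exact fatal signal can vary by platform and node build (SIGABRT is typical).

```shell
# Hypothetical stand-in for example/die.js; the real file may differ.
write_die_js() {
  cat > "${1:-die.js}" <<'EOF'
// Keep an object with a `myproperty` key reachable, then crash.
var holder = { myproperty: 'look for me in mdb' };
setTimeout(function () {
  throw new Error('died while holding ' + holder.myproperty);
}, 10);
EOF
}

# Uncaught exception + --abort-on-uncaught-exception => node aborts,
# so the kernel writes a core file (subject to `ulimit -c`).
crash_node() {
  node --abort-on-uncaught-exception "$1"
}

# Explicit abort from JavaScript, with the same core-dump behaviour.
abort_node() {
  node -e 'process.abort()'
}
```

Running write_die_js and then crash_node die.js in a shell where ulimit -c unlimited is in effect should leave a core file wherever the kernel's core pattern points.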
Finally, a core file can be obtained from a running process by attaching gdb to it and executing generate-core-file.
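With gdb installed, the attach-and-dump approach looks roughly like this bash sketch (core_from_pid is a hypothetical name). It assumes you have ptrace permission on the target process, which usually means running as root or having kernel.yama.ptrace_scope set to 0. In batch mode gdb detaches from an attached process when it quits, so the process keeps running. gdb also ships a gcore script that does the same job.

```shell
# Hypothetical helper: write a core for a live process without killing it.
# Needs gdb and ptrace rights over the target (root, or ptrace_scope=0).
core_from_pid() {
  local pid="$1"
  local out="${2:-core.$1}"
  # -batch: run the -ex commands, then quit; quitting detaches from the
  # attached process, which keeps running afterwards.
  gdb -p "$pid" -batch -ex "generate-core-file $out"
}
```

For instance, core_from_pid "$(pgrep -n node)" would dump the most recently started node process to core.<pid> in the current directory.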
If you're using an Ubuntu server (and possibly other Debian-based distributions), you may have apport installed. It intercepts core files, so we need to get rid of it:
sudo apt-get purge apport
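Next, make sure Linux is configured to allow core files to be written (the default core size limit on many systems is 0), like so:
ulimit -c unlimited
This setting applies only to the current shell and the processes it starts, so it must be in effect in whatever shell or service script launches node.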
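Both conditions can be checked before relying on crash dumps. This bash sketch reads /proc/sys/kernel/core_pattern (where apport registers itself as a pipe to /usr/share/apport/apport) and the soft limit reported by ulimit -c. The function names are hypothetical, and other pipe-based handlers such as systemd-coredump also intercept cores, which this check does not flag.

```shell
# Hypothetical pre-flight checks for Linux core dumps.

# Succeeds if a core_pattern value pipes cores to apport.
pattern_uses_apport() {
  case "$1" in
    '|'*apport*) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds if a core size limit (default: this shell's `ulimit -c`) lets
# cores be written at all.
core_limit_ok() {
  local limit="${1:-$(ulimit -c)}"
  [ "$limit" = "unlimited" ] || [ "$limit" -gt 0 ] 2>/dev/null
}

# Print warnings for anything that would stop a core file being written.
check_core_setup() {
  local pattern
  pattern="$(cat /proc/sys/kernel/core_pattern 2>/dev/null)" || pattern=""
  if pattern_uses_apport "$pattern"; then
    echo "warning: apport intercepts cores (core_pattern: $pattern)" >&2
  fi
  if ! core_limit_ok; then
    echo "warning: ulimit -c is $(ulimit -c); run: ulimit -c unlimited" >&2
  fi
}
```

Calling check_core_setup in the same shell that will launch node is the meaningful test, since the limit is per shell.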
The VM currently maps host port 2222 to its port 22 (SSH). This is not yet configurable, so port 2222 must be free on the host system to use autopsy.
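One way to confirm that nothing else is listening on port 2222 before starting the VM is bash's /dev/tcp pseudo-device, as in this sketch (port_in_use is a hypothetical name; tools such as lsof or ss would work just as well). Run it while the autopsy VM is stopped, otherwise the VM's own mapping answers.

```shell
# Hypothetical check: succeeds if something already listens on 127.0.0.1:<port>.
# Relies on bash's /dev/tcp pseudo-device, so run it with bash, not sh.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

if port_in_use 2222; then
  echo "port 2222 is already in use; autopsy's VM needs it" >&2
fi
```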
Currently there's no command for removing the VM; follow these steps, in order:
- open VirtualBox, right-click the VM, click Remove, then click "Delete all files"
- remove the assets folder from the autopsy module folder: rm -r "$(npm get prefix)/lib/node_modules/autopsy/assets"
- make sure there isn't a smartos folder left in the VirtualBox virtual machines folder (~/VirtualBox\ VMs)
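The same removal steps can be scripted. This bash sketch assumes the VirtualBox VM is named smartos (check VBoxManage list vms first, since autopsy's real VM name may differ) and that npm's global prefix is where autopsy was installed; autopsy_assets_dir and remove_autopsy_vm are hypothetical names.

```shell
# Hypothetical script for the manual removal steps. Run `autopsy stop` first:
# VBoxManage refuses to unregister a VM that is still running.

# Where autopsy keeps its downloaded VM assets under an npm prefix.
autopsy_assets_dir() {
  printf '%s/lib/node_modules/autopsy/assets\n' "${1:-$(npm get prefix)}"
}

remove_autopsy_vm() {
  local vm_name="${1:-smartos}"   # assumption: check `VBoxManage list vms`
  # Step 1: unregister the VM and delete its files (like "Delete all files").
  VBoxManage unregistervm "$vm_name" --delete || return 1
  # Step 2: remove the downloaded assets folder.
  rm -r "$(autopsy_assets_dir)"
  # Step 3: report (rather than delete) anything VirtualBox left behind.
  if [ -d "$HOME/VirtualBox VMs/$vm_name" ]; then
    echo "leftover folder: $HOME/VirtualBox VMs/$vm_name" >&2
  fi
}
```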
We recommend installing globally, since there can (currently) only be one SmartOS VM.
If the smartos.iso file or any parent folder is moved or renamed, the VM will fail to start because VirtualBox won't be able to locate the ISO. In this case you would need to manually update the paths in VirtualBox.
For troubleshooting (or for the curious), debug output can be turned on like so:
DEBUG=autopsy:* <cmd>
At present the following commands have debug output:
- autopsy
- autopsy setup
- autopsy start