
## Node.js application crash diagnostics: Best Practices series #1

This is the first in a series of best practices and useful tips for those
running Node.js in large-scale production systems.

## Introduction

Typical production systems do not enjoy the benefits of development
and staging systems in many respects:

- they are isolated from the public internet
- they are not loaded with development and debug tools
- they are configured with the most robust and secure
configurations possible at the OS level
- in certain deployment scenarios (such as cloud) they
operate in headless mode (no ssh)
- in certain deployment scenarios (such as cloud) they
operate in stateless mode (no persistent disk)

The net effect of these constraints is that your production systems
need to be manually `prepared` in advance to enable crash diagnostic
data generation on the first failure itself, without losing vital data.
The rest of this document illustrates these preparation steps.

## Available disk space
Ensure that there is enough disk space available for the core file
to be written:

- A maximum of 4 GB for a 32-bit process.
- Much larger for a 64-bit process (the common case). To know the precise
requirement, measure the peak-load memory usage of your application and
add 10% to that to accommodate core metadata. If you are using
common monitoring tools, one of the graphs should reveal the peak
memory. If not, you can measure it directly on the system. For example,
an application that peaks at 6 GB of resident memory needs roughly
6.6 GB of free disk space for its core file.

On Linux variants, you can use `top -p <pid>` to see the instantaneous
memory usage of the process:

```
$ top -p 106916
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
106916 user 20 0 600404 54500 15572 R 109.7 0.0 81098:54 node
```

On Darwin, the flag is `-pid`. On AIX, the command is `topas`, and on
FreeBSD it is `top`; on both AIX and FreeBSD there is no flag to show
per-process details. On Windows, you can use the Task Manager window
and view the process attributes visually.
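For quick reference, the equivalent invocations on the other UNIX
variants look like this (reusing the PID from the Linux example above):

```
# Darwin (macOS): per-process view
$ top -pid 106916
# AIX: system-wide monitor; locate the node process in the output
$ topas
# FreeBSD: system-wide monitor; locate the node process in the output
$ top
```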

Insufficient file system space will result in truncated core files,
and can severely hamper the ability to diagnose the problem.

Figure out how much free space is available in the file system:
`df -k` can be used uniformly across UNIX platforms.
On Windows, Windows Explorer, when pointed at a disk partition,
provides a view of the available space in that partition.
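For example (the mount point and figures shown are illustrative):

```
# free space, in 1K blocks, on the file system that will receive the core
$ df -k /var/cores
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/sda2      103179564 24680912  73217056  26% /var/cores
```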

## Core file location and name

By default, a core file is generated on a crash event and, on most
UNIX variants, is written to the current working directory - the
location from which the node process was started. On Darwin, it
appears in the `/cores` directory.

By default, core files from node processes on Linux are named
`core` or `core.<pid>`, where `<pid>` is the node process id.
On AIX and Darwin, they are named `core`. On FreeBSD, they are
named `%N.core`, where `%N` is the name of the crashed process.

However, the superuser (root) can control and change these defaults.

On Linux, `sysctl kernel.core_pattern` shows the current core file pattern.

Modify the pattern using `sysctl -w kernel.core_pattern=pattern` as root.
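For example (the target directory is illustrative; `%e` expands to the
executable name and `%p` to the pid):

```
$ sysctl kernel.core_pattern
kernel.core_pattern = core
# write cores to a dedicated directory, tagged with program name and pid
$ sudo sysctl -w kernel.core_pattern=/var/cores/core.%e.%p
```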

On AIX, `lscore` shows the current core file settings.

Enable full core dump generation using `chdev -l sys0 -a fullcore=true`,
and modify the current settings using `chcore -p on -n on -l /path/to/coredumps`.
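Put together, a typical root session might look like this (the target
directory is illustrative):

```
# inspect the current settings, enable full cores, redirect the location
$ lscore
$ chdev -l sys0 -a fullcore=true
$ chcore -p on -n on -l /var/cores
```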

On Darwin and FreeBSD, `sysctl kern.corefile` shows the current core file pattern.

Modify the current pattern using `sysctl -w kern.corefile=newpattern` as root.
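For example, on FreeBSD (the output and target directory are illustrative):

```
$ sysctl kern.corefile
kern.corefile: %N.core
# keep the process-name pattern, but collect cores in one directory
$ sudo sysctl -w kern.corefile=/var/cores/%N.core
```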

To obtain full core files, set the following ulimit options across
UNIX variants (a launch-script sketch follows this list):

- `ulimit -c unlimited` - turn on core file generation with unlimited size
- `ulimit -d unlimited` - set the user data segment limit to unlimited
- `ulimit -f unlimited` - set the maximum file size limit to unlimited
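Since these limits apply per shell, a common approach is to set them in
the script that launches the node process (a minimal sketch; `server.js`
is a placeholder for your application entry point):

```
#!/bin/sh
# run-node.sh: make sure full cores can be written before starting the app
ulimit -c unlimited
ulimit -d unlimited
ulimit -f unlimited
exec node server.js
```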

The current ulimit settings can be displayed using:

`ulimit -a`

However, these are `soft` limits and are enforced per user, per
shell environment. Note that these values are themselves
constrained by the system-wide `hard` limits set by the
system administrator. System administrators (with superuser privileges)
may display, set, or change the hard limits by adding the `-H` flag to
the standard set of ulimit commands.
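For example, to compare the soft and hard limits for the core file size
in the current shell:

```
# soft limit in effect for this shell
$ ulimit -c
unlimited
# corresponding hard limit (only root can raise it)
$ ulimit -Hc
unlimited
```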

## Manual dump generation

Under certain circumstances you may want to collect a core dump
manually; follow these steps:

On Linux, use `gcore [-a] [-o filename] pid`, where `-a`
specifies to dump everything.
On AIX, use `gencore [pid] [filename]`.
On FreeBSD and Darwin, use `gcore [-s] [executable] pid`.
On Windows, you can use the Task Manager window: right-click on the
node process and select the `Create dump file` option.
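For example, on Linux (the PID and output path are illustrative):

```
# locate the node process
$ pgrep -f node
106916
# dump everything; gcore appends the pid, producing node-manual.106916
$ sudo gcore -a -o /var/cores/node-manual 106916
```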

A special note on Ubuntu systems with a Yama-hardened kernel:

the Yama security policy inhibits a second process from attaching to
and dumping a running process, practically rendering `gcore` unusable.
One workaround is to grant the required ptrace capability to `gdb`
(which `gcore` relies on), as root:

```
setcap cap_sys_ptrace=+ep `which gdb`
```
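Alternatively (a standard Yama control, worth verifying on your
distribution), the policy can be relaxed system-wide while the dump is
collected:

```
# 0 restores classic ptrace behaviour; run as root and revert afterwards
$ sysctl -w kernel.yama.ptrace_scope=0
```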


These steps make sure that when your Node.js application crashes in
production, a valid, full core dump is generated at a known location.
Such a dump can then be loaded into debuggers that understand Node.js
internals to diagnose the issue. The next article in this series will
focus on that part.
