Debugging and troubleshooting

If you're developing or debugging Tempesta's code (thanks for that, by the way!), there are several steps that simplify the process and can even save your VM in a cloud. Tempesta FW is a Linux kernel extension, so the steps below are typical for Linux kernel development and debugging.

Automatic reboot to safe kernel

If a kernel crash occurs, the system may hang, so it makes sense to set up automatic reboot. Your Linux system may already be configured to reboot automatically on a kernel panic. You can check it with

    # cat /proc/sys/kernel/panic
    1

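The value is the number of seconds to wait before rebooting after a panic (1 second here); 0 means that the system never reboots on its own. The same setting is available as the kernel.panic sysctl. Next, you can emulate a panic (this crashes the running kernel immediately) with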

    # echo c > /proc/sysrq-trigger

and see that the system reboots in 1 second. Another important setting is the kernel.panic_on_oops sysctl (also available as /proc/sys/kernel/panic_on_oops). It may already be set to 1 on your system, which means that any kernel fault ("Oops") causes a panic and, therefore, a reboot. Setting it isn't necessary, but you may prefer to use it.
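To make these settings survive a reboot, you can put them into a sysctl drop-in file. The sketch below is one way to do it; the function name write_panic_conf and the file name 90-panic.conf are hypothetical choices, not part of Tempesta FW or of any distribution.

```shell
# write_panic_conf [DIR] [PANIC_TIMEOUT] [PANIC_ON_OOPS]
# Hypothetical helper: write the panic settings discussed above to
# DIR/90-panic.conf (DIR defaults to /etc/sysctl.d).
write_panic_conf() {
    local dir=${1:-/etc/sysctl.d} timeout=${2:-1} oops=${3:-1}
    cat > "$dir/90-panic.conf" <<EOF
# Reboot $timeout second(s) after a kernel panic (0 disables the reboot).
kernel.panic = $timeout
# 1 turns every kernel Oops into a panic, so it triggers the reboot too.
kernel.panic_on_oops = $oops
EOF
}
```

As root, write_panic_conf && sysctl --system writes the file and loads it without a reboot.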

Next, you need to boot the testing kernel, but make the system automatically reboot into a safe kernel if the testing one crashes. You can do this by setting

    GRUB_DEFAULT=saved
    GRUB_CMDLINE_LINUX_DEFAULT="panic=1"

in /etc/default/grub. GRUB_DEFAULT=saved makes GRUB boot the entry saved with grub-set-default, so you can mark a kernel as safe, i.e. booted by default. panic=1 adds the kernel parameter that reboots the system 1 second after a panic; the parameter sets the kernel.panic sysctl automatically. To apply the changes on Debian or Ubuntu, run

    # update-grub

Now let's list all the kernels installed in the system. On Debian or Ubuntu you might see the following:

    # grep 'menuentry\>' /boot/grub/grub.cfg
    menuentry 'Ubuntu' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-cfda0544-9803-41e7-badb-43563085ff3a' {
        menuentry 'Ubuntu, with Linux 4.8.15+' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-4.8.15+-advanced-cfda0544-9803-41e7-badb-43563085ff3a' {
    	menuentry 'Ubuntu, with Linux 4.8.15+ (recovery mode)' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-4.8.15+-recovery-cfda0544-9803-41e7-badb-43563085ff3a' {
        menuentry 'Ubuntu, with Linux 4.4.0-75-generic' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-4.4.0-75-generic-advanced-cfda0544-9803-41e7-badb-43563085ff3a' {
    	menuentry 'Ubuntu, with Linux 4.4.0-75-generic (recovery mode)' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-4.4.0-75-generic-recovery-cfda0544-9803-41e7-badb-43563085ff3a' {
        menuentry 'Ubuntu, with Linux 4.4.0-45-generic' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-4.4.0-45-generic-advanced-cfda0544-9803-41e7-badb-43563085ff3a' {
    	menuentry 'Ubuntu, with Linux 4.4.0-45-generic (recovery mode)' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-4.4.0-45-generic-recovery-cfda0544-9803-41e7-badb-43563085ff3a' {

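I.e. the kernels are listed in a submenu (the indented entries). Let's choose the first one, 4.8.15+, as the testing kernel and the second one, 4.4.0-75-generic, as the safe kernel:

    # grub-set-default 'Ubuntu, with Linux 4.4.0-75-generic'
    # grub-reboot 'Ubuntu, with Linux 4.8.15+'

grub-reboot sets the entry for the next boot only, so any later reboot, including one after a panic, falls back to the saved default. After the reboot, the system boots into the 4.8.15+ kernel, but if it crashes, the system automatically reboots into the 4.4.0-75-generic kernel.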
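Picking the right titles out of grub.cfg by hand is error-prone. GRUB addresses an entry inside a submenu by its full path, the submenu title and the entry title joined with '>'; if a bare title such as the ones above doesn't select the intended kernel with your GRUB version, try the full path instead. A minimal sketch that prints every entry in that form (list_grub_entries is a hypothetical helper; it assumes the usual grub-mkconfig layout, where each block opens with '{' on its menuentry or submenu line):

```shell
# list_grub_entries [GRUB_CFG]
# Hypothetical helper: print every menuentry title in a grub.cfg, prefixing
# entries that live inside a submenu with "SUBMENU>". Brace counting assumes
# that braces other than block delimiters are balanced within a line,
# e.g. "${var}".
list_grub_entries() {
    awk '
        # Return the first single-quoted string in s.
        function title(s) {
            sub(/^[^'\'']*'\''/, "", s)
            sub(/'\''.*$/, "", s)
            return s
        }
        /^[ \t]*submenu[ \t]/ { sub_title = title($0); sub_depth = depth }
        /^[ \t]*menuentry[ \t]/ {
            prefix = ""
            if (sub_title != "") prefix = sub_title ">"
            print prefix title($0)
        }
        {
            # Track the nesting depth to notice where the submenu block ends.
            line = $0
            depth += gsub(/[{]/, "", line) - gsub(/[}]/, "", line)
            if (sub_title != "" && depth <= sub_depth) sub_title = ""
        }
    ' "${1:-/boot/grub/grub.cfg}"
}
```

On Ubuntu the submenu is typically titled 'Advanced options for Ubuntu', so the full-path form of the testing entry would be grub-reboot 'Advanced options for Ubuntu>Ubuntu, with Linux 4.8.15+'.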

Crash dumps

It's useful to set up automatic saving of kernel crash dumps. There is plenty of good documentation about kdump and the crash utility; please explore the links in the references.

Here are some settings for a quick start and fixes for known issues:

kernel.panic=60 in /etc/sysctl.conf: we've found that kdump sometimes doesn't manage to create a memory image in time, and this setting fixes the problem.
Use as small a tempesta_dbmem kernel parameter as possible, e.g. tempesta_dbmem=4. The less memory TempestaDB takes, the more is left for kdump.
Set GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=2048M-:512M" in /etc/default/grub.d/kdump-tools.default. On machines with at least 2048 MB of RAM, this reserves 512 MB for the crash kernel.
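After rebooting with these settings, you can check that kdump is armed before you start testing. This is a sketch assuming your kernel exposes /sys/kernel/kexec_crash_loaded, which reads 1 once a crash kernel is loaded; the helper name kdump_ready is hypothetical, and both file paths can be overridden, e.g. for testing.

```shell
# kdump_ready [CMDLINE_FILE] [CRASH_LOADED_FILE]
# Hypothetical helper: succeed if the kernel was booted with a crashkernel=
# reservation and a crash kernel is loaded.
kdump_ready() {
    local cmdline=${1:-/proc/cmdline}
    local loaded=${2:-/sys/kernel/kexec_crash_loaded}
    if ! grep -qE '(^|[[:space:]])crashkernel=' "$cmdline"; then
        echo "no crashkernel= reservation on the kernel command line" >&2
        return 1
    fi
    # The file reads 1 once a crash kernel has been loaded.
    if [ "$(cat "$loaded" 2>/dev/null)" != 1 ]; then
        echo "crash kernel is not loaded (is the kdump service running?)" >&2
        return 1
    fi
    echo "kdump is ready"
}
```

On Debian and Ubuntu, kdump-config show from kdump-tools gives a more detailed report.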

References

Serial console

Getting the serial console output is crucial for kernel debugging. It carries the kernel log (what dmesg shows), including call stacks and the system messages printed just before a problem occurred, and, unlike dmesg, it stays available even if the system hangs or reboots.

Add printk.synchronous=1 to the kernel command line to get all printk() messages. If you see messages like

    ** 5029 printk messages dropped **

in dmesg, you typically need to set this option. This is extremely important for debug builds, which print massively.
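To check whether a log is affected, you can add up the drop notices. A minimal sketch; the helper name count_dropped_printk is hypothetical, and it relies only on the message format shown above.

```shell
# count_dropped_printk [LOG]
# Hypothetical helper: sum the "N printk messages dropped" notices in a
# kernel log (stdin if LOG is omitted). Non-zero means messages were lost.
count_dropped_printk() {
    # Extract the "N printk messages dropped" fragments and add up the N values.
    grep -oE '[0-9]+ printk messages dropped' "${1:--}" |
        awk '{ n += $1 } END { print n + 0 }'
}
```

For example, dmesg | count_dropped_printk or count_dropped_printk serial.txt.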

Virtual machines

If you use a KVM virtual machine, enabling serial console output is straightforward:

Add the -serial file:serial.txt option to the qemu-system-x86_64 command line, so that QEMU writes the serial console output to the serial.txt file.
Inside the VM, add console=tty0 console=ttyS0,115200n8 to the GRUB_CMDLINE_LINUX line in /etc/default/grub to make the kernel send its console output to the serial port as well.
Update GRUB as described above and reboot the VM.
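The GRUB step inside the VM can be scripted. The sketch below, a hypothetical helper named add_serial_console, prepends the console parameters to the GRUB_CMDLINE_LINUX value in /etc/default/grub (or in the file given as the first argument) and does nothing if ttyS0 is already configured there. It assumes GNU sed for the in-place -i edit and a double-quoted value, as in the default file.

```shell
# add_serial_console [GRUB_DEFAULTS_FILE]
# Hypothetical helper: add the serial console parameters to GRUB_CMDLINE_LINUX.
add_serial_console() {
    local grub=${1:-/etc/default/grub}
    local params='console=tty0 console=ttyS0,115200n8'
    # Already configured: leave the file untouched.
    if grep -q '^GRUB_CMDLINE_LINUX=.*console=ttyS0' "$grub"; then
        return 0
    fi
    # Insert the parameters right after the opening quote of the value;
    # GRUB_CMDLINE_LINUX_DEFAULT doesn't match because of the '=' anchor.
    sed -i "s/^GRUB_CMDLINE_LINUX=\"/&$params /" "$grub"
}
```

Run it as root inside the VM, then update-grub and reboot; on the host, tail -f serial.txt follows the console output.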

Hardware servers

If you run Tempesta FW on a hardware server and don't have access to IPMI, you can use netconsole to get the console output over the network. You can use the Ubuntu guide on how to set it up.
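netconsole takes its configuration as a single parameter in the format [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr], as described in the kernel's netconsole documentation. The hypothetical helper below only assembles that string, defaulting to the standard ports 6665 (source) and 6666 (target); all the addresses in the usage example are placeholders.

```shell
# netconsole_param SRC_IP DEV TGT_IP TGT_MAC [SRC_PORT] [TGT_PORT]
# Hypothetical helper: build the netconsole= module option.
netconsole_param() {
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' \
        "${5:-6665}" "$1" "$2" "${6:-6666}" "$3" "$4"
}
```

For example, modprobe netconsole "$(netconsole_param 192.0.2.10 eth0 192.0.2.20 00:11:22:33:44:55)" on the server, with a UDP listener such as nc -u -l 6666 on the receiving host (the exact nc flags depend on the netcat variant).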

Decoding crash dump and netconsole call traces

Call traces are dumped on kernel crashes and on kernel warnings. A typical trace consists of entries that look like

    do_syscall_64+0x33/0x80

Each entry contains:

a function name (do_syscall_64 in the example),
an offset from the function beginning (0x33 in the example),
the function length (0x80 in the example).
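Splitting an entry into these parts is plain string manipulation. A minimal POSIX shell sketch (the helper name parse_trace_entry is hypothetical) that expects the entry itself, without a timestamp, and also handles the " [module]" suffix printed for functions in loadable modules and the leading "? " that marks unreliable frames:

```shell
# parse_trace_entry ENTRY
# Hypothetical helper: split a call-trace entry such as
#   do_syscall_64+0x33/0x80
#   ? some_func+0x1a2/0x2f0 [tempesta_fw]
# into "FUNCTION OFFSET LENGTH MODULE"; MODULE is "vmlinux" for functions
# built into the kernel.
parse_trace_entry() {
    local entry module rest
    entry=${1#\? }                  # drop the "? " unreliable-frame marker
    module=vmlinux
    case $entry in
        *\[*\]*)                    # "[module]" suffix of a module frame
            module=${entry#*\[}
            module=${module%%\]*}
            ;;
    esac
    entry=${entry%% *}              # keep only "function+offset/length"
    rest=${entry#*+}
    printf '%s %s %s %s\n' "${entry%%+*}" "${rest%%/*}" "${rest#*/}" "$module"
}
```

The MODULE field tells you which file to pass to addr2line: vmlinux or the matching .ko.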

The addr2line utility converts this information into a source file name and line number. Call it as

    addr2line -e /PATH/TO/MODULE -i HEX_ADDRESS

Here:

/PATH/TO/MODULE is the path to the module containing the function. For functions built into the kernel, use the path to the kernel image (vmlinux). Note that the kernel image must be uncompressed and must not have its debug info stripped.
HEX_ADDRESS is the address of the instruction; the '0x' prefix is required. With -i, addr2line also reports the enclosing functions if the address belongs to an inlined function.

To convert the function name and the offset into a plain hex address, you can combine nm, grep, perl and awk, so the overall command looks like

    addr2line -e /PATH/TO/MODULE -i $(perl -e 'printf("0x%x\n", 0x'$(nm /PATH/TO/MODULE | grep '\<FUNCTION_NAME$' | awk '{print $1}')'+HEX_OFFSET)')

For the example above, it is

    addr2line -e vmlinux -i $(perl -e 'printf("0x%x\n", 0x'$(nm vmlinux | grep '\<do_syscall_64$' | awk '{print $1}')'+0x33)')

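The Linux kernel source tree contains scripts that automate this task. scripts/decode_stacktrace.sh decodes all the call traces in a log:

    linux-kernel-source/scripts/decode_stacktrace.sh linux-kernel-source/vmlinux linux-kernel-source tempesta-source/fw/tempesta_fw.ko < dmesg.log

For a single entry, scripts/faddr2line accepts it in the printed form, e.g. scripts/faddr2line vmlinux do_syscall_64+0x33/0x80.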
