Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to get a Backtrace from ROS2 tutorial #58

Merged
merged 8 commits into from
Aug 5, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
218 changes: 218 additions & 0 deletions tutorials/docs/get_backtrace.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,218 @@
.. _get_backtrace:

Get Backtrace in ROS2 / Nav2
****************************

- `Overview`_
- `Preliminaries`_
- `From a Node`_
- `From a Launch File`_
- `From Navigation2 Bringup`_

Overview
========

This document explains one set of methods for getting backtraces for ROS 2 and Navigation2.
There are many ways to accomplish this, but this is a good starting point for new C++ developers without GDB experience.

The following steps show ROS 2 users how to modify the navigation2 stack to get traces from specific servers when they encounter a problem.
This tutorial applies to both simulated and physical robots.

This will cover how to get a backtrace from a specific node using ``ros2 run``, from a launch file representing a single node using ``ros2 launch``, and from a more complex orchestration of nodes.
By the end of this tutorial, you should be able to get a backtrace when you notice a server crashing in ROS2.

Preliminaries
=============

GDB is the most popular debugger for C++ on Unix systems.
It can be used to determine the reason for a crash and track threads.
It may also be used to add breakpoints in your code to check values in memory a particular points in your software.

Using GDB is a critical skill for all software developers working on C/C++.
Many IDEs will have some kind of debugger or profiler built in, but with ROS, there are few IDEs to choose.
Therefore it's important to understand how to use these raw tools you have available rather than relying on an IDE to provide them.
Further, understanding these tools is a fundamental skill of C/C++ development and leaving it up to your IDE can be problematic if you change roles and no longer have access to it or are doing development on the fly through an ssh session to a remote asset.

Using GDB luckily is fairly simple after you have the basics under your belt.
The first step is to add ``-g`` to your compiler flags for the ROS package you want to profile / debug.
This flag builds debug symbols that GDB and valgrind can read to tell you specific lines of code in your project are failing and why.
If you do not set this flag, you can still get backtraces but it will not provide line numbers for failures.
Be sure to remove this flag after debugging, it will slow down performance at run-time.

Adding the following line to your ``CMakeLists.txt`` for your project should do the trick.
If your project already has a ``add_compile_options()``, you can simply add ``-g`` to it.
Then simply rebuild your workspace with this package ``colcon build --packages-select <package-name>``.
It may take a little longer than usual to compile.

.. code-block:: cmake

add_compile_options(-g)

Now you're ready to debug your code!
If this was a non-ROS project, at this point you might do something like below.
Here we're launching a GDB session and telling our program to immediately run.
Once your program crashes, it will return a gdb session prompt denoted by ``(gdb)``.
At this prompt you can access the information you're interested in.
However, since this is a ROS project with lots of node configurations and other things going on, this isn't a great option for beginners or those that don't like tons of commandline work and understanding the filesystem.

.. code-block:: bash

gdb ex run --args /path/to/exe/program

Below are sections to describe the 3 major situations you could run into with ROS2-based systems.
Read the section that best describes the problem you're attempting to solve.

From a Node
===========

Just as in our non-ROS example, we need to setup a GDB session before launching our ROS2 node.
While we could set this up through the commandline with some knowledge of the ROS 2 file system, we can instead use the launch ``--prefix`` option the kind folks at Open Robotics provided for us.

``--prefix`` will execute some bits of code before our ``ros2`` command allowing us to insert some information.
If you attempted to do ``gdb ex run --args ros2 run <pkg> <node>`` as analog to our example in the preliminaries, you'd find that it couldn't find the ``ros2`` command.
If you're even more clever, you'd find that trying to source your workspace would also fail for similar reasons.

Rather than having to revert to finding the install path of the executable and typing it all out, we can instead use ``--prefix``.
This allows us to use the same ``ros2 run`` syntax you're used to without having to worry about some of the GDB details.

.. code-block:: bash

ros2 run --prefix 'gdb -ex run --args' <pkg> <node> --all-other-launch arguments

Just as before, this prefix will launch a GDB session and run the node you requested with all the additional commandline arguments.
You should now have your node running and should be chugging along with some debug printing.

Once your server crashes, you'll see a prompt like below. At this point you can now get a backtrace.

.. code-block:: bash

(gdb)

In this session, type ``backtrace`` and it will provide you with a backtrace.
Copy this for your needs.
For example:

.. code-block:: bash

(gdb) backtrace
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007ffff79cc859 in __GI_abort () at abort.c:79
#2 0x00007ffff7c52951 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007ffff7c5e47c in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007ffff7c5e4e7 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007ffff7c5e799 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007ffff7c553eb in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7 0x000055555555936c in std::vector<int, std::allocator<int> >::_M_range_check (
this=0x5555555cfdb0, __n=100) at /usr/include/c++/9/bits/stl_vector.h:1070
#8 0x0000555555558e1d in std::vector<int, std::allocator<int> >::at (this=0x5555555cfdb0,
__n=100) at /usr/include/c++/9/bits/stl_vector.h:1091
#9 0x000055555555828b in GDBTester::VectorCrash (this=0x5555555cfb40)
at /home/steve/Documents/nav2_ws/src/gdb_test_pkg/src/gdb_test_node.cpp:44
#10 0x0000555555559cfc in main (argc=1, argv=0x7fffffffc108)
at /home/steve/Documents/nav2_ws/src/gdb_test_pkg/src/main.cpp:25

In this example you should read this in the following way, starting at the bottom:

- In the main function, on line 25 we call a function VectorCrash.

- In VectorCrash, on line 44, we crashed in the Vector's ``at()`` method with input ``100``.

- It crashed in ``at()`` on STL vector line 1091 after throwing an exception from a range check failure.

These traces take some time to get used to reading, but in general, start at the bottom and follow it up the stack until you see the line it crashed on.
Then you can deduce why it crashed.
When you are done with GDB, type ``quit`` and it will exit the session and kill any processes still up.
It may ask you if you want to kill some threads at the end, say yes.

From a Launch File
==================

Just as in our non-ROS example, we need to setup a GDB session before launching our ROS2 launch file.
While we could set this up through the commandline, we can instead make use of the same mechanics that we did in the ``ros2 run`` node example, now using a launch file.

In your launch file, find the node that you're interested in debugging.
For this section, we assume that your launch file contains only a single node (and potentially other information as well).
The ``Node`` function used in the ``launch_ros`` package will take in a field ``prefix`` taking a list of prefix arguments.
We will insert the same GDB snippet here as used in our node example.
See below for an example debugging SLAM Toolbox.

.. code-block:: python

start_sync_slam_toolbox_node = Node(
parameters=[
get_package_share_directory("slam_toolbox") + '/config/mapper_params_online_sync.yaml',
{'use_sim_time': use_sim_time}
],
package='slam_toolbox',
executable='sync_slam_toolbox_node',
name='slam_toolbox',
prefix=['gdb -ex run --args'],
output='screen')

Just as before, this prefix will launch a GDB session and run the launch file you requested with all the additional launch arguments defined.

.. note::
As of ROS2 Foxy, there may be an issue with ``prefix`` in launch files not returning ``gdb`` prompts after a crash. See `this ticket <https://github.com/ros2/launch_ros/issues/165>`_ for more information.

Once your server crashes, you'll see a prompt like below. At this point you can now get a backtrace.

.. code-block:: bash

(gdb)

In this session, type ``backtrace`` and it will provide you with a backtrace.
Copy this for your needs.
See the example trace in the section above for an example.

These traces take some time to get used to reading, but in general, start at the bottom and follow it up the stack until you see the line it crashed on.
Then you can deduce why it crashed.
When you are done with GDB, type ``quit`` and it will exit the session and kill any processes still up.
It may ask you if you want to kill some threads at the end, say yes.

From Navigation2 Bringup
========================

Working with launch files with multiple nodes is a little different so you can interact with your GDB session without being bogged down by other logging in the same terminal.
For this reason, when working with larger launch files, its good to pull out the specific server you're interested in and launching it seperately.

As such, for this case, when you see a crash you'd like to investigate, its beneficial to separate this server from the others.

If your server of interest is being launched from a nested launch file (e.g. an included launch file) you may want to do the following:

- Comment out the launch file inclusion from the parent launch file

- Recompile the package of interest with ``-g`` flag for debug symbols

- Launch the parent launch file in a terminal

- Launch the server's launch file in another terminal following the instructions in `From a Launch File`_.

Alternatively, if you server of interest is being launched in these files directly (e.g. you see a ``Node``, ``LifecycleNode``, or inside a ``ComponentContainer``), you will need to seperate this from the others:

- Comment out the node's inclusion from the parent launch file

- Recompile the package of interest with ``-g`` flag for debug symbols

- Launch the parent launch file in a terminal

- Launch the server's node in another terminal following the instructions in `From a Node`_.

.. note::
Note that in this case, you may need to remap or provide parameter files to this node if it was previously provided by the launch file. Using ``--ros-args`` you can give it the path to the new parameters file, remaps, or names. See `this ROS2 tutorial <https://index.ros.org/doc/ros2/Tutorials/Node-arguments/>`_ for the commandline arguments required.

We understand this can be a pain, so it might encourage you to rather have each node possible as a separately included launch file to make debugging easier. An example set of arguments might be ``--ros-args -r __node:=<node_name> --params-file /absolute/path/to/params.yaml`` (as a template).

Once your server crashes, you'll see a prompt like below in the specific server's terminal. At this point you can now get a backtrace.

.. code-block:: bash

(gdb)

In this session, type ``backtrace`` and it will provide you with a backtrace.
Copy this for your needs.
See the example trace in the section above for an example.

These traces take some time to get used to reading, but in general, start at the bottom and follow it up the stack until you see the line it crashed on.
Then you can deduce why it crashed.
When you are done with GDB, type ``quit`` and it will exit the session and kill any processes still up.
It may ask you if you want to kill some threads at the end, say yes.
1 change: 1 addition & 0 deletions tutorials/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ Navigation2 Tutorials
:maxdepth: 1

docs/camera_calibration.rst
docs/get_backtrace.rst
docs/navigation2_on_real_turtlebot3.rst
docs/navigation2_with_slam.rst
docs/navigation2_with_stvl.rst
Expand Down