Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rclcpp::Node::make_shared() or destructor does not appear to be threadsafe #447

Open
matt-attack opened this issue Mar 2, 2018 · 3 comments
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@matt-attack
Copy link
Contributor

matt-attack commented Mar 2, 2018

Trying to create then destroy many nodes at the same time causes a wide variety of different crashes. It seems that either creating or destroying a node is not thread safe.

Here's a minimal example that crashes almost every time for me:

#include <chrono>
#include <iostream>
#include <memory>
#include <string>

#include "rclcpp/rclcpp.hpp"

#include "example_interfaces/srv/add_two_ints.hpp"

using namespace std::chrono_literals;

void fun(std::string id)
{
  std::shared_ptr<rclcpp::Node> node = rclcpp::Node::make_shared("add_two_ints_client"+id);
}

int main(int argc, char ** argv)
{
  // Force flush of the stdout buffer.
  setvbuf(stdout, NULL, _IONBF, BUFSIZ);

  rclcpp::init(argc, argv);

  int num_threads = 20;

  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; i++)
  {
    threads.push_back(std::thread(std::bind(fun, std::to_string(i))));
  }

  for (int i = 0; i < num_threads; i++)
  {
    threads[i].join();
    printf("Thread %d Finished\n", i);
  }
  
  printf("Finished");

  rclcpp::shutdown();
  return 0;
}

Often I will get

*** Error in `/home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client': free(): invalid pointer: 0x00007fce4c447ca8 ***
*** Error in `/home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client': free(): invalid pointer: 0x00007fce4c447ca8 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fce4e80f7e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7fce4e81837a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7fce4e81c53c]
/opt/ros/ardent/lib/libfastrtps.so.1(_ZN8eprosima8fastrtps4rtps10RTPSDomain21removeRTPSParticipantEPNS1_15RTPSParticipantE+0xdb)[0x7fce4c10cb9b]
/opt/ros/ardent/lib/libfastrtps.so.1(_ZN8eprosima8fastrtps15ParticipantImplD1Ev+0xa8)[0x7fce4c114b98]
/opt/ros/ardent/lib/libfastrtps.so.1(_ZN8eprosima8fastrtps15ParticipantImplD0Ev+0x9)[0x7fce4c114e39]
/opt/ros/ardent/lib/libfastrtps.so.1(_ZN8eprosima8fastrtps6Domain17removeParticipantEPNS0_11ParticipantE+0x8a)[0x7fce4c10ef3a]
/opt/ros/ardent/lib/librmw_fastrtps_cpp.so(rmw_destroy_node+0xff)[0x7fce4c66d90f]
/opt/ros/ardent/lib/librcl.so(rcl_node_fini+0x96)[0x7fce4db61f36]
/opt/ros/ardent/lib/librclcpp.so(+0x8329d)[0x7fce4f17d29d]
/opt/ros/ardent/lib/librclcpp.so(_ZN6rclcpp15node_interfaces8NodeBaseD2Ev+0x2a5)[0x7fce4f17d6c5]
/opt/ros/ardent/lib/librclcpp.so(_ZN6rclcpp15node_interfaces8NodeBaseD0Ev+0x9)[0x7fce4f17d6f9]
/opt/ros/ardent/lib/librclcpp.so(_ZN6rclcpp4NodeD2Ev+0x3a2)[0x7fce4f17b6a2]
/home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client[0x4061a9]
/home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client[0x4060fb]
/home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client[0x405f8b]
/home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client(_ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_releaseEv+0x42)[0x4034da]
/home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client[0x402fdd]
/home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client[0x402cde]
/home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client[0x402cfa]
/home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client[0x40274f]
/home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client[0x406304]
/home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client[0x406243]
/home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client[0x406208]
/home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client[0x40612a]
/home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client[0x40607a]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80)[0x7fce4ee30c80]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7fce4f3ef6ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fce4e89f41d]
======= Memory map: ========
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fce4e80f7e5]
00400000-0040a000 r-xp 00000000 103:07 8793911                           /home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client
00609000-0060a000 r--p 00009000 103:07 8793911                           /home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client
0060a000-0060b000 rw-p 0000a000 103:07 8793911                           /home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client
0184d000-0198b000 rw-p 00000000 00:00 0                                  [heap]
7fcd0dfd4000-7fcd0dfd5000 ---p 00000000 00:00 0 
7fcd0dfd5000-7fcd0e7d5000 rw-p 00000000 00:00 0 
7fcd0e7d5000-7fcd0e7d6000 ---p 00000000 00:00 0 
7fcd0e7d6000-7fcd0efd6000 rw-p 00000000 00:00 0 
7fcd0efd6000-7fcd0efd7000 ---p 00000000 00:00 0 
7fcd0efd7000-7fcd0f7d7000 rw-p 00000000 00:00 0 
7fcd0f7d7000-7fcd0f7d8000 ---p 00000000 00:00 0 
7fcd0f7d8000-7fcd0ffd8000 rw-p 00000000 00:00 0 
7fcd0ffd8000-7fcd0ffd9000 ---p 00000000 00:00 0 
7fcd0ffd9000-7/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7fce4e81837a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7fce4e81c53c]
/opt/ros/ardent/lib/libfastrtps.so.1(_ZN8eprosima8fastrtps4rtps10RTPSDomain21removeRTPSParticipantEPNS1_15RTPSParticipantE+0xdb)[0x7fce4c10cb9b]
fcd107d9000 rw-p 00000000 00:00 0 
7fcd107d9000-7fcd107da000 ---p 00000000 00:00 0 
7fcd107da000-7fcd10fda000 rw-p 00000000 00:00 0 
7fcd10fda000-7fcd10fdb000 ---p 00000000 00:00 0 
7fcd10fdb000-7fcd117db000 rw-p 00000000 00:00 0 
7fcd117db000-7fcd117dc000 ---p 00000000 00:00 0 
7fcd117dc000-7fcd11fdc000 rw-p 00000000 00:00 0 
7fcd11fdc000-7fcd11fdd000 ---p 00000000 00:00 0 
7fcd11fdd000-7fcd127dd000 rw-p 00000000 00:00 0 
7fcd127dd000-7fcd127de000 ---p 00000000 00:00 0 
7fcd127de000-7fcd12fde000 rw-p 00000000 00:00 0 
7fcd12fde000-7fcd12fdf000 ---p 00000000 00:00 0 
7fcd12fdf000-7fcd137df000 rw-p 00000000 00:00 0 
7fcd137df000-7fcd137e0000 ---p 00000000 00:00 0 
7fcd137e0000-7fcd13fe0000 rw-p 00000000 00:00 0 
7fcd13fe0000-7fcd13fe1000 ---p 00000000 00:00 0 
7fcd13fe1000-7fcd147e1000 rw-p 00000000 00:00 0 
7fcd147e1000-7fcd147e2000 ---p 00000000 00:00 0 
7fcd147e2000-7fcd14fe2000 rw-p 00000000 00:00 0 
7fcd14fe2000-7fcd14fe3000 ---p 00000000 00:00 0 
7fcd14fe3000-7fcd157e3000 rw-p 00000000 00:00 0 
7fcd157e3/opt/ros/ardent/lib/libfastrtps.so.1(_ZN8eprosima8fastrtps15ParticipantImplD0Ev+0x9)[0x7fce4c114e39]
/opt/ros/ardent/lib/libfastrtps.so.1(_ZN8eprosima8fastrtps6Domain17removeParticipantEPNS0_11ParticipantE+0x8a)[0x7fce4c10ef3a]
/opt/ros/ardent/lib/librmw_fastrtps_cpp.so(rmw_destroy_node+0xff)[0x7fce4c66d90f]
/opt/ros/ardent/lib/librcl.so(rcl_node_fini+0x96)[0x7fce4db61f36]
/opt/ros/ardent/lib/librclcpp.so(+0x8329d)[0x7fce4f17d29d]
/opt/ros/ardent/lib/librclcpp.so(_ZN6rclcpp15node_interfaces8NodeBaseD2Ev+0x2a5)[0x7fce4f17d6c5]

Sometimes I get:

>>> [rcutils|error_handling.c:155] rcutils_set_error_state()
This error state is being overwritten:

  'string_map already initialized, at /tmp/binarydeb/ros-ardent-rcutils-0.4.0/src/string_map.c:56'

with this new error message:

  'Failed to initialize map for logger severities [string_map already initialized, at /tmp/binarydeb/ros-ardent-rcutils-0.4.0/src/string_map.c:56]. Severities will not be configurable., at /tmp/binarydeb/ros-ardent-rcutils-0.4.0/src/logging.c:108'

rcutils_reset_error() should be called after error handling to avoid this.
<<<
[rcutils|/tmp/binarydeb/ros-ardent-rcl-0.4.0/src/rcl/node.c:112] error initializing logging: Failed to initialize map for logger severities [string_map already initialized, at /tmp/binarydeb/ros-ardent-rcutils-0.4.0/src/string_map.c:56]. Severities will not be configurable., at /tmp/binarydeb/ros-ardent-rcutils-0.4.0/src/logging.c:108

Other times:

terminate called after throwing an instance of 'rclcpp::exceptions::RCLError'
  what():  could not create subscription: create_subscriber() could not create subscriber, at /tmp/binarydeb/ros-ardent-rmw-fastrtps-cpp-0.4.0/src/rmw_subscription.cpp:140, at /tmp/binarydeb/ros-ardent-rcl-0.4.0/src/rcl/subscription.c:149

I have been seeing lots of bugs/crashes that seem related to a race condition when there are multiple nodes in a process, but this is the most basic and reproducible example.

@wjwwood
Copy link
Member

wjwwood commented Mar 3, 2018

The short answer is that I would not expect it to be thread-safe. Unfortunately rclcpp hasn't gone through an API review yet and still lacks clarity on a lot of these kinds of details and therefore also lacks documentation on these points.

However, if you look into the functions that rclcpp is ultimately calling in rcl, there are several which are explicitly marked as not thread-safe, e.g. rcl_guard_condition_init/rcl_guard_condition_fini (especially if rcl_trigger_guard_condition is called during rcl_guard_condition_fini) or even just rcl_node_init.

Now, in each case, those rcl functions may be thread-safe in certain circumstances so the solution may not be as drastic a putting a global mutex lock around all the rclcpp versions of those functions, but the time has not yet been taken to figure out where it is thread-safe and where it is not and what to do about it in rclcpp to ensure that it used in a safe way.

There's some work coming up where we will audit the memory and thread behavior of these functions and the underlying rmw functions too, so we'll hopefully be able to provide better behavior and documentation soon. In the meantime we'd certainly like help fixing issues narrowly where ever you find it convenient.

@wjwwood wjwwood added bug Something isn't working help wanted Extra attention is needed labels Mar 3, 2018
@sagniknitr
Copy link
Contributor

@wjwwood Is there any chance it is caused due to deadlock in Fast-RTPS threads ? It seems there is a issue on Fast RTPS (eProsima/Fast-DDS#190) which may relate to this.

@matt-attack
Copy link
Contributor Author

@sagniknitr I have seen that issue come up as well, so it's very possible its at least related

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

3 participants