Nodelets not tear down when nodelet manager receives SIGINT #55

mikaelarguedas · 2017-03-09T13:29:46Z

Reposting this as a standalone issue here. (Thanks @dseifert for reporting this.)

To test this, prepare the following:

Add a destructor to nodelet_core/test_nodelet/src/plus.cpp that logs at ROS_INFO level (or higher).

Create this launch file:
<launch>
 <node pkg="nodelet" type="nodelet" name="nodelet_manager"  args="manager" output="screen"/>

 <node pkg="nodelet" type="nodelet" name="n1" output="screen" args="load test_nodelet/Plus nodelet_manager" />
</launch>
Now, perform these tests:

Test 1: Run the launch file. Use ps xa | grep nodelet to find the process ID of the nodelet load command and kill -SIGINT it. You will see that the destructor is called.

Test 2: Run the launch file. Use ps xa | grep nodelet to find the process ID of the nodelet manager command and kill -SIGINT it. You will see that the destructor is NOT called.


andviane · 2019-02-04T16:00:49Z

Confirmed. We have a server thread that must be properly shut down in the destructor (this seems to be the recommended practice). By placing a log statement there, we discovered that the destructor does not run when the application is killed with Ctrl-C.
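The pattern described above (a worker thread that is stopped and joined in the destructor, with a log statement to confirm it ran) can be sketched roughly as follows. This is a minimal illustration using only the C++ standard library, not code from nodelet_core or from any of the projects mentioned in this thread. The class name ServerOwner and the log vector are hypothetical stand-ins; in a real nodelet the log line would be a ROS_INFO call.

```cpp
#include <atomic>
#include <chrono>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a nodelet that owns a server thread.
// The log vector replaces ROS_INFO so the sketch has no ROS dependency.
class ServerOwner {
public:
  explicit ServerOwner(std::vector<std::string>& log)
      : log_(log), running_(true), thread_(&ServerOwner::serve, this) {}

  // Stop the worker and wait for it; this only happens if the
  // destructor actually runs, which is the point of this issue.
  ~ServerOwner() {
    running_ = false;
    if (thread_.joinable()) {
      thread_.join();
    }
    log_.push_back("destructor ran, server thread joined");
  }

private:
  // Placeholder server loop: idles until asked to stop.
  void serve() {
    while (running_) {
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
  }

  // Declaration order matters: thread_ must be initialized last so
  // that running_ is already set when serve() starts.
  std::vector<std::string>& log_;
  std::atomic<bool> running_;
  std::thread thread_;
};
```

If an object like this is never destroyed (for example because the process is terminated before its owner unloads it), the log entry is never written and the thread is never joined, which matches the symptom reported here.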

jpapon · 2019-09-21T22:42:41Z

I'm experiencing what I believe is a related issue in ros-melodic.
With various camera drivers (realsense-ros, libuvc_ros, avt_camera, etc.), when using nodelet managers with many nodelets in them, I am unable to relaunch pipelines on an already started roscore.
The following sequence does not work:

1. Start roscore.
2. In a second terminal session, roslaunch my pipeline, which has 10-20 nodelets in the nodelet manager, including a camera nodelet.
3. Press Ctrl-C.
4. roslaunch the pipeline a second time.
The second roslaunch always results in a hang or a crash (depending on the camera/pipeline). This seems to be because things aren't unloading properly.
Killing and restarting the roscore allows me to run the pipeline again (once).

On the other hand, if I don't have a separate roscore running, I can kill/relaunch the pipeline without issue.

Is anyone aware of a workaround other than not having a separate roscore? Having to restart roscore can be problematic in larger distributed systems.

tompe17 · 2019-12-02T16:19:27Z

I see a similar issue to what jpapon describes.

I have a launch file that starts 350 nodes or nodelets. If I do not restart the roscore before re-running the launch file, I get problems with nodes (or nodelets) being killed because of name duplication. But the different names should be in different namespaces, so that should not be an issue.

If I let the launch file start the roscore then it works without any problems.

rosnode list

does not show any nodes left after I have killed the launch file, and ps does not show any hanging processes. rosnode cleanup did not help.

doronhi · 2020-01-01T08:31:01Z

I have the same issue with realsense2_camera (realsense-ros).
After running roslaunch realsense2_camera rs_camera.launch,
the command rosnode kill /camera/realsense2_camera triggers the nodelet's destructor, while rosnode kill /camera/realsense2_camera_manager and Ctrl-C do not.

YoshuaNava · 2020-07-29T11:25:23Z

Hi all,
In my view, this could be happening because, when trying to shut down the nodelet:

1. We might be breaking the bond with ROS before the components are unloaded.
2. There is no ros::Time available to leave waits like the one for service calls to shutdown: https://github.com/ros/nodelet_core/blob/indigo-devel/nodelet/src/nodelet.cpp#L192

I think the second point would be easy to verify by adding a ros::Time::shutdown(); call to ensure that ros::Time users stop waiting.

Further explanation

Using this as reference: http://docs.ros.org/diamondback/api/roscpp/html/init_8cpp_source.html

Request shutdown sets a global variable inside the node class that should smoothly shut down the node in the next iteration of spin (line 140). That request is serviced later, asynchronously, by the node's Poll Manager object.

Shutdown actually shuts down the queues, time loggers and connections of a node (line 519).

However, shutdown might not get as far as shutting down time, which is required to stop nodes that use ros::Rate and use custom signal handlers and threads. In this case you have to shut down the time tracking instance manually.

In both cases, recursive mutexes prevent multiple shutdown calls from crashing the node.
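To make the "request now, service later" idea concrete, here is a rough, generic sketch of a deferred shutdown: the signal handler only sets a flag, and the spin loop notices it on its next iteration and returns normally, so local objects are destroyed. This is not roscpp or nodelet manager code. All names (g_shutdown_requested, on_sigint, install_shutdown_handler, Component, spin) are hypothetical, and the sketch assumes nothing about where the actual nodelet manager shutdown path goes wrong.

```cpp
#include <chrono>
#include <csignal>
#include <string>
#include <thread>
#include <vector>

// Hypothetical flag, set from the signal handler and polled by the loop.
volatile std::sig_atomic_t g_shutdown_requested = 0;

// The handler only records the request; the real work happens later,
// outside signal context.
void on_sigint(int) { g_shutdown_requested = 1; }

void install_shutdown_handler() { std::signal(SIGINT, on_sigint); }

// Stand-in for a loaded component whose destructor should run.
struct Component {
  explicit Component(std::vector<std::string>& log) : log_(log) {}
  ~Component() { log_.push_back("component destroyed"); }
  std::vector<std::string>& log_;
};

// Spins until a shutdown has been requested, then returns normally,
// which lets `component` go out of scope and run its destructor.
void spin(std::vector<std::string>& log) {
  Component component(log);
  while (!g_shutdown_requested) {
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }
  log.push_back("spin loop exited");
}
```

With the handler installed, a SIGINT makes spin() return on its next check and the component is destroyed afterwards. A loop that blocks on something other than this flag (for example a wait that depends on time that never advances) would never reach the check, which is the kind of situation described above.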

LucasWaelti · 2021-02-15T10:15:09Z

A possible workaround would be to kill the nodelet manager. For instance, assume the following nodes are running but need to be restarted:

/stereo/stereo_cam_nodelet
/stereo/stereo_nodelet_manager

By running the command:

rosnode kill /stereo/stereo_nodelet_manager

all associated nodelets are killed and can then be restarted. However, killing /stereo/stereo_cam_nodelet first prevents the nodelets from being restarted.

mikaelarguedas added the bug label Mar 9, 2017

mikaelarguedas mentioned this issue Mar 9, 2017

Nodelet unloading broken #50

Closed
