This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Connext is very slow to shutdown #325

Closed
clalancette opened this issue Dec 14, 2018 · 7 comments · Fixed by #473
Assignees
Labels
enhancement New feature or request

Comments

@clalancette
Contributor

Bug report

Required Info:

  • Operating System:
    • Ubuntu 16.04 AMD64
  • Installation type:
    • Source
  • Version or commit hash:
  • DDS implementation:
    • RTI Connext
  • Client library (if applicable):
    • rclpy-ish

Steps to reproduce issue

RMW_IMPLEMENTATION=rmw_connext_cpp ros2 run demo_nodes_cpp talker
(hit Ctrl-C here)

Expected behavior

Talker starts up, publishes some data, then quickly goes away when the user hits Ctrl-C.

Actual behavior

Talker starts up, publishes some data, then takes at least 3 seconds (sometimes longer) to go away after the user hits Ctrl-C.

Additional information

This seems to get worse with the number of nodes in the process. For the composition demos, for instance, it almost seems like it takes 3-5 seconds for each node loaded into the process.
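For anyone who wants to quantify this, here is a minimal sketch (not part of the original report) that creates several nodes in one process and times the destruction of each one; it should make the per-node shutdown cost visible when run with RMW_IMPLEMENTATION=rmw_connext_cpp. The node names and the node count are arbitrary.

```cpp
#include <chrono>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

#include "rclcpp/rclcpp.hpp"

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);

  // Create a handful of nodes in the same process, mirroring the composition
  // scenario described above.
  std::vector<std::shared_ptr<rclcpp::Node>> nodes;
  for (int i = 0; i < 5; ++i) {
    nodes.push_back(std::make_shared<rclcpp::Node>("timing_node_" + std::to_string(i)));
  }

  // Time how long destroying each node takes.
  while (!nodes.empty()) {
    const auto start = std::chrono::steady_clock::now();
    nodes.pop_back();  // drops the last shared_ptr, destroying the node
    const auto stop = std::chrono::steady_clock::now();
    std::cout << "node destruction took "
              << std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count()
              << " ms" << std::endl;
  }

  rclcpp::shutdown();
  return 0;
}
```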

@jwillemsen
Contributor

We have also seen this when testing with AXCIOMA; deleting a domain participant can take some time with RTI Connext DDS.

@clalancette clalancette added the enhancement New feature or request label Dec 19, 2018
@clalancette
Contributor Author

From @brawner:

The rclcpp unit tests take significantly longer for rmw_connext_cpp than for other middleware implementations.

Compare the Connext DDS rclcpp unit tests (11 minutes):
http://build.ros2.org/view/Rci/job/Rci__nightly-connext_ubuntu_focal_amd64/32/testReport/rclcpp/

vs FastRTPS (1 min 30 s):
http://build.ros2.org/view/Rci/job/Rci__nightly-fastrtps_ubuntu_focal_amd64/34/testReport/rclcpp/

I've narrowed this down to the destruction of the rclcpp::Node; a call that destroys a waitable takes the vast majority of this time. I believe this is happening in AllocatorMemoryStrategy, but since it uses shared_ptrs, the actual destruction may happen elsewhere.
If I put steady_clock measurements around this line, or even if I iterate through the vector and call reset explicitly, there is one waitable that takes a significant amount of time to delete. It isn't all waitables, just one default waitable whose origin I can't pin down.
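As a hedged illustration of the measurement described above: the vector name `waitable_handles` below is hypothetical and stands in for whatever container AllocatorMemoryStrategy uses to hold its waitables; the point is only the steady_clock timing around each explicit reset.

```cpp
#include <chrono>
#include <iostream>
#include <memory>
#include <vector>

#include "rclcpp/waitable.hpp"

// Times the destruction triggered by resetting each waitable shared_ptr.
void time_waitable_destruction(std::vector<std::shared_ptr<rclcpp::Waitable>> & waitable_handles)
{
  for (auto & waitable : waitable_handles) {
    const auto start = std::chrono::steady_clock::now();
    waitable.reset();  // if this shared_ptr is the last owner, the waitable is destroyed here
    const auto stop = std::chrono::steady_clock::now();
    std::cout << "waitable reset took "
              << std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count()
              << " ms" << std::endl;
  }
}
```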

@clalancette
Contributor Author

@brawner I've assigned you to this for any follow-up work to determine exactly where the destruction slowness is coming from. If it turns out to be in Connext code, and there is nothing we can do about it, then I'll suggest opening a Known Issue on https://index.ros.org/doc/ros2/Releases/Release-Rolling-Ridley/ (and maybe the other distributions as well).

@brawner

brawner commented Oct 21, 2020

It appears deleting the participant is the main hangup here. This line accounts for all 3.18 seconds of node destruction:

DDS::ReturnCode_t local_ret = dpf_->delete_participant(participant);

Here are a couple of forum posts, but I'm not sure they help me understand the issue any better:
https://community.rti.com/forum-topic/ddstheparticipantfactory-deleteparticipant-hangs
https://community.rti.com/forum-topic/deleteparticipant-never-returns

@clalancette
Contributor Author

Here are a couple of forum posts, but I'm not sure they help me understand the issue any better:
https://community.rti.com/forum-topic/ddstheparticipantfactory-deleteparticipant-hangs
https://community.rti.com/forum-topic/deleteparticipant-never-returns

There's one thing in there that is potentially helpful: it seems the delays occur because delete_participant is waiting for internal threads to complete. If we can figure out how to make those threads finish faster, shutdown could be sped up.

However, since we don't have the source to Connext, figuring that out may be tricky.

@neil-rti
Contributor

Hi Chris - I'm not certain whether this has already been explored, but there is a QoS setting for the shutdown period. The thread at
https://community.rti.com/forum-topic/3-seconds-shutdown-domainparticipant
has the Q&A and links to the QoS documentation.
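For anyone who wants to experiment, a rough sketch of what lowering that setting might look like with the classic Connext C++ API used by rmw_connext_cpp follows. The `database.shutdown_cleanup_period` field name comes from the linked forum thread and should be verified against the RTI QoS documentation; the 10 ms value is arbitrary.

```cpp
#include <ndds/ndds_namespace_cpp.h>  // classic Connext C++ API (namespaced); header name assumed

// Sketch only: lowers the default participant shutdown cleanup period before
// any participants are created, so delete_participant() does not spend several
// cleanup cycles waiting. Field names taken from the linked RTI forum thread.
bool shorten_shutdown_period()
{
  DDS::DomainParticipantFactory * dpf = DDS::DomainParticipantFactory::get_instance();
  if (dpf == nullptr) {
    return false;
  }

  DDS::DomainParticipantQos participant_qos;
  if (dpf->get_default_participant_qos(participant_qos) != DDS::RETCODE_OK) {
    return false;
  }

  participant_qos.database.shutdown_cleanup_period.sec = 0;
  participant_qos.database.shutdown_cleanup_period.nanosec = 10 * 1000 * 1000;  // 10 ms

  return dpf->set_default_participant_qos(participant_qos) == DDS::RETCODE_OK;
}
```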

jacobperron added a commit to ros2/launch_ros that referenced this issue Oct 28, 2020
Resolves #184.

With some RMW implementations, like rmw_connext_cpp, it takes a bit longer to initialize or shut down ROS nodes.

Related issue: ros2/rmw_connext#325

Signed-off-by: Jacob Perron <jacob@openrobotics.org>
@clalancette
Contributor Author

Hi Chris - I'm not certain whether this has already been explored, but there is a QoS setting for the shutdown period. The thread at
https://community.rti.com/forum-topic/3-seconds-shutdown-domainparticipant
has the Q&A and links to the QoS documentation.

I didn't know about that one. Thanks for the heads-up, I'll give it a try and see if it makes a difference to this problem.

jacobperron added a commit to ros2/launch_ros that referenced this issue Nov 20, 2020
Resolves #184.

With some RMW implementations, like rmw_connext_cpp, it takes a bit longer to initialize or shut down ROS nodes.

Related issue: ros2/rmw_connext#325

Signed-off-by: Jacob Perron <jacob@openrobotics.org>