Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

it takes too long to terminate when using component_container_isolated #2083

Closed
dujeong opened this issue Jan 16, 2023 · 7 comments · Fixed by #2085
Closed

it takes too long to terminate when using component_container_isolated #2083

dujeong opened this issue Jan 16, 2023 · 7 comments · Fixed by #2085

Comments

@dujeong
Copy link

dujeong commented Jan 16, 2023

Bug report

Required Info:

  • Operating System:
    • Ubuntu 22.04
  • Installation type:
    • binaries (humble)
  • Version or commit hash:
  • DDS implementation:
    • cyclonedds
  • Client library (if applicable):
    • rclcpp

Steps to reproduce issue

I set use_composition to "True" to run the navigation stack nodes in one process and it worked.

after that, I typed ctrl+c to exit the process. but it takes too long to terminate(more than 10 seconds), and it does not terminate properly sometimes.

Below is part of the launch file I ran.

......

nav2_container = Node(
        condition=IfCondition(use_composition),
        package='rclcpp_components',
        executable='component_container_isolated',
        name=container_name,
        output='screen',
        respawn=use_respawn,
        respawn_delay=2.0,
        parameters=[configured_params_for_compositon, {'autostart': autostart}],
        remappings=remappings),
            
......

load_composable_nodes = LoadComposableNodes(
        condition=IfCondition(use_composition),
        target_container=container_name, #nav2_container
        composable_node_descriptions=[
            ComposableNode(
                package='nav2_controller',
                plugin='nav2_controller::ControllerServer',
                name='controller_server',
                parameters=[configured_params],
                remappings=[('cmd_vel', 'cmd_vel_nav')] + remappings),

            ComposableNode(
                package='nav2_smoother',
                plugin='nav2_smoother::SmootherServer',
                name='smoother_server',
                parameters=[configured_params],
                remappings=remappings),

            ComposableNode(
                package='nav2_planner',
                plugin='nav2_planner::PlannerServer',
                name='planner_server',
                parameters=[configured_params],
                remappings=remappings),

            ComposableNode(
                package='nav2_behaviors',
                plugin='behavior_server::BehaviorServer',
                name='behavior_server',
                parameters=[configured_params],
                remappings=remappings),

            ComposableNode(
                package='nav2_waypoint_follower',
                plugin='nav2_waypoint_follower::WaypointFollower',
                name='waypoint_follower',
                parameters=[configured_params],
                remappings=remappings),

            ComposableNode(
                package='nav2_bt_navigator',
                plugin='nav2_bt_navigator::BtNavigator',
                name='bt_navigator',
                parameters=[configured_params],
                remappings=remappings),

            ComposableNode(
                package='nav2_velocity_smoother',
                plugin='nav2_velocity_smoother::VelocitySmoother',
                name='velocity_smoother',
                parameters=[configured_params],
                remappings=[('cmd_vel', 'cmd_vel_nav'),
                            ('cmd_vel_smoothed', 'cmd_vel')] + remappings),

            ComposableNode(
                package='nav2_lifecycle_manager',
                plugin='nav2_lifecycle_manager::LifecycleManager',
                name='lifecycle_manager_navigation',
                parameters=[{'use_sim_time': use_sim_time,
                            'autostart': autostart,
                            'node_names': lifecycle_nodes}],
                remappings=remappings),
        ],
    )
    
.......

Below is the log when component container ends. (actually I use two container node, one is for navigation stack, and the other is for my custom node)

[ERROR] [component_container_isolated-13]: process[component_container_isolated-13] failed to terminate '5' seconds after receiving 'SIGINT', escalating to 'SIGTERM'
[ERROR] [component_container_isolated-7]: process[component_container_isolated-7] failed to terminate '5' seconds after receiving 'SIGINT', escalating to 'SIGTERM'
[INFO] [component_container_isolated-13]: sending signal 'SIGTERM' to process[component_container_isolated-13]
[INFO] [component_container_isolated-7]: sending signal 'SIGTERM' to process[component_container_isolated-7]
[component_container_isolated-13] [INFO 1673835779.167809326] [rclcpp]: signal_handler(signum=15) (./src/rclcpp/signal_handler.cpp:71)
[component_container_isolated-7] [INFO 1673835779.169543075] [rclcpp]: signal_handler(signum=15) (./src/rclcpp/signal_handler.cpp:71)
[ERROR] [component_container_isolated-13]: process[component_container_isolated-13] failed to terminate '10.0' seconds after receiving 'SIGTERM', escalating to 'SIGKILL'
[ERROR] [component_container_isolated-7]: process[component_container_isolated-7] failed to terminate '10.0' seconds after receiving 'SIGTERM', escalating to 'SIGKILL'
[INFO] [component_container_isolated-13]: sending signal 'SIGKILL' to process[component_container_isolated-13]
[INFO] [component_container_isolated-7]: sending signal 'SIGKILL' to process[component_container_isolated-7]
[ERROR] [component_container_isolated-7]: process has died [pid 22074, exit code -9, cmd '/opt/ros/humble/lib/rclcpp_components/component_container_isolated --ros-args -r __node:=perception_container -r __ns:=/'].
[ERROR] [component_container_isolated-13]: process has died [pid 22086, exit code -9, cmd '/opt/ros/humble/lib/rclcpp_components/component_container_isolated --ros-args -r __node:=nav_container -r __ns:=/ --params-file /tmp/tmpf4tgn9jt --params-file /tmp/launch_params_ei4e54qx -r /tf:=tf -r /tf_static:=tf_static'].

On the other hand, when using component_container instead of component_container_isolated, it is almost finished immediately.

Is it normal to take a long time to end when using component_container_isolated?

@SteveMacenski
Copy link
Collaborator

SteveMacenski commented Jan 17, 2023

Is this a problem with the container or Nav2? Do you have this issue if you use different components in the component_container_isolated container not in Nav2? I agree that I also see this issue in Nav2 and just haven't had the cycles yet to address it, but its been coming up more often as a cause of potentially other subtle problems so its quickly coming up on my priority list.

@mjcarroll
Copy link
Member

I actually believe this may be an issue with component_container_isolated as I have seen it in a non-navigation2 related test case

Since it is related to something I'm working on, I'll see if I can get a cleaner reproduction.

@dujeong
Copy link
Author

dujeong commented Jan 18, 2023

I also had the same problem when using non-Nav2 component. I am grateful that you are concerned about this matter.
I'll continue to watch the progress with interest. Thanks.

@mjcarroll
Copy link
Member

mjcarroll commented Jan 18, 2023

I now have a minimal reproduction example using a component with a single timer and publication (https://github.com/safe-ros/reference_system/blob/437dd76e4e61d9cb446bd9a55955d12a808be8ba/src/publisher_nodes.cpp#L22-L27)

The trace with component_container_isolated is:

[INFO] [1674053025.958086651] [rclcpp]: signal_handler(signum=2)
[DEBUG] [1674053025.958145773] [rclcpp]: signal_handler(): notifying deferred signal handler
[DEBUG] [1674053025.958158777] [rclcpp]: deferred_signal_handler(): woken up due to SIGINT/SIGTERM or uninstall
[DEBUG] [1674053025.958171037] [rclcpp]: deferred_signal_handler(): shutting down
[DEBUG] [1674053025.958294236] [rclcpp]: deferred_signal_handler(): shutting down rclcpp::Context @ 0x5555555e1670, because it had shutdown_
on_signal == true
[DEBUG] [1674053025.958301675] [rcl]: Calling timer
[DEBUG] [1674053025.958304635] [rcl]: Shutting down ROS client library, for context at address: 0x5555555733b0
[DEBUG] [1674053025.958354320] [rcl]: Finalizing publisher
[Thread 0x7fffd3a25640 (LWP 2029602) exited]
[DEBUG] [1674053025.958698169] [rcl]: Publisher finalized
[DEBUG] [1674053025.958706397] [rcl]: Finalizing publisher
[DEBUG] [1674053025.958852691] [rcl]: Publisher finalized
[DEBUG] [1674053025.958884440] [rclcpp]: deferred_signal_handler(): waiting for SIGINT/SIGTERM or uninstall

Where component_container is:

[DEBUG] [1674052861.825278151] [rclcpp]: signal_handler(): notifying deferred signal handler
[DEBUG] [1674052861.825376426] [rclcpp]: deferred_signal_handler(): woken up due to SIGINT/SIGTERM or uninstall
[DEBUG] [1674052861.825459703] [rclcpp]: deferred_signal_handler(): shutting down
[DEBUG] [1674052861.825417176] [rcl]: Calling timer
[DEBUG] [1674052861.825647042] [rclcpp]: deferred_signal_handler(): shutting down rclcpp::Context @ 0x5555555da670, because it had shutdown_
on_signal == true
[DEBUG] [1674052861.825721537] [rcl]: Shutting down ROS client library, for context at address: 0x55555556c3b0
[DEBUG] [1674052861.825793344] [rcl]: Finalizing publisher
[DEBUG] [1674052861.825798820] [ComponentManager]: Removing components from executor
[DEBUG] [1674052861.825851324] [rcl]: Finalizing service
[DEBUG] [1674052861.826746300] [rcl]: Publisher finalized
[DEBUG] [1674052861.826788352] [rcl]: Finalizing publisher
[DEBUG] [1674052861.828644147] [rcl]: Service finalized
[DEBUG] [1674052861.828686944] [rcl]: Finalizing service
[DEBUG] [1674052861.829173980] [rcl]: Publisher finalized
[DEBUG] [1674052861.829281071] [rclcpp]: deferred_signal_handler(): waiting for SIGINT/SIGTERM or uninstall
[DEBUG] [1674052861.830552697] [rcl]: Service finalized
[DEBUG] [1674052861.830589553] [rcl]: Finalizing service
[DEBUG] [1674052861.832177938] [rcl]: Service finalized
[DEBUG] [1674052861.832232938] [rcl]: Timer canceled
[DEBUG] [1674052861.832260656] [rcl]: Timer canceled
[DEBUG] [1674052861.832318331] [rcl]: Finalizing event
[DEBUG] [1674052861.832344074] [rcl]: Event finalized
[DEBUG] [1674052861.832366763] [rcl]: Finalizing publisher
[DEBUG] [1674052861.833195265] [rcl]: Publisher finalized
[DEBUG] [1674052861.833258483] [rcl]: Finalizing event
[DEBUG] [1674052861.833280894] [rcl]: Event finalized
[DEBUG] [1674052861.833311577] [rcl]: Finalizing subscription
[DEBUG] [1674052861.834148656] [rcl]: Subscription finalized
[DEBUG] [1674052861.834202086] [rcl]: Finalizing service
[DEBUG] [1674052861.837573573] [rcl]: Service finalized                                                                             
[DEBUG] [1674052861.837610288] [rcl]: Finalizing service
[DEBUG] [1674052861.838504391] [rcl]: Service finalized
[DEBUG] [1674052861.838539905] [rcl]: Finalizing service
[DEBUG] [1674052861.839156223] [rcl]: Service finalized
[DEBUG] [1674052861.839190487] [rcl]: Finalizing service
[DEBUG] [1674052861.839621913] [rcl]: Service finalized
[DEBUG] [1674052861.839661069] [rcl]: Finalizing event
[DEBUG] [1674052861.839691382] [rcl]: Event finalized
[DEBUG] [1674052861.839720585] [rcl]: Finalizing publisher
[DEBUG] [1674052861.840022292] [rcl]: Publisher finalized
[DEBUG] [1674052861.840115628] [rcl]: Finalizing node
[DEBUG] [1674052861.840257508] [rcl]: Node finalized
[DEBUG] [1674052861.906052625] [rcl]: Finalizing event
[DEBUG] [1674052861.906064873] [rcl]: Event finalized
[DEBUG] [1674052861.906071833] [rcl]: Finalizing subscription
[DEBUG] [1674052861.906225742] [rcl]: Subscription finalized
[DEBUG] [1674052861.906245334] [rcl]: Finalizing node
[Thread 0x7ffff160c640 (LWP 2029272) exited]
[Thread 0x7ffff4779640 (LWP 2029266) exited]
[Thread 0x7ffff3f78640 (LWP 2029267) exited]
[Thread 0x7ffff4f7a640 (LWP 2029265) exited]
[Thread 0x7ffff3777640 (LWP 2029268) exited]
[Thread 0x7ffff2775640 (LWP 2029270) exited]
[Thread 0x7ffff2f76640 (LWP 2029269) exited]
[Thread 0x7ffff1e94640 (LWP 2029271) exited]
[DEBUG] [1674052861.907536170] [rcl]: Node finalized
[Thread 0x7ffff5802640 (LWP 2029264) exited]
[DEBUG] [1674052862.008138456] [rclcpp]: SignalHandler::uninstall(): notifying deferred signal handler
[DEBUG] [1674052862.008265901] [rclcpp]: deferred_signal_handler(): woken up due to SIGINT/SIGTERM or uninstall
[DEBUG] [1674052862.008305800] [rclcpp]: deferred_signal_handler(): signal handling uninstalled
[DEBUG] [1674052862.008501813] [rclcpp]: signal handler uninstalled
[Thread 0x7ffff6003640 (LWP 2029263) exited]
[Thread 0x7ffff6804640 (LWP 2029262) exited]
[Inferior 1 (process 2029259) exited normally]

@fujitatomoya
Copy link
Collaborator

@iuhilnehc-ynos can you take a look at this, if you have time? (CC: @Barry-Xu-2018 )

@Barry-Xu-2018
Copy link
Collaborator

@fujitatomoya We will enter Chinese New Year holiday from tomorrow. So we will continue to investigate it after holiday.

@mjcarroll
Copy link
Member

With a bit more debugging, I believe I have isolated it to a difference in the way that the deferred_signal_handler works between the isolated and non-isolated component containers.

In the isolated case, we end up blocking on a sem_wait and the semaphore never gets posted to because Signalhandler::uninstall never seems to get called.

A trace with more points added: https://gist.github.com/mjcarroll/a83c9b8428d198a2a7b753b19446f593

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants