Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

High CPU usage even at idle #1015

Closed
ruffsl opened this issue Aug 7, 2019 · 23 comments
Closed

High CPU usage even at idle #1015

ruffsl opened this issue Aug 7, 2019 · 23 comments
Assignees
Labels
1 - High High Priority performance This is a performance problem or enhancement
Milestone

Comments

@ruffsl
Copy link
Member

ruffsl commented Aug 7, 2019

Bug report

The security working group has been creating a SROS2 demo targeting the turtlebot 3 with a running navigation stack. However our meager mortal laptops @mikaelarguedas and I have (~2017 Dell XPS i7 32GB) are struggling to run the tb3 demo while maintaining a usable desktop, as about about 40% of our CPU is under load even when the robot is not planning or moving:

image

I fear many participants in the workshop with lesser machines will struggle to run the demo with added security cipher overhead or unavoidable virtual machines on mac/windows as workshops happen to do when linux is involved. Is there anything we can do to lighten this up, like rate limit spinning in dwb_controler, amcl, map_model, or patches we can pull into our .repo file targeting dashing?

https://github.com/ros-swg/turtlebot3_demo

Required Info:

Steps to reproduce issue

See current tutorial README.md as of writing:
https://github.com/ros-swg/turtlebot3_demo/tree/ccc63b8462a714f6f870db97ae8797d167522463

Expected behavior

Ideal load for the navigation stack when not moving would be low like in ROS1, mostly dominated by gzviewer and rviz if not running in headless mode.

Actual behavior

The processes for dwb_controler, amcl, map_model take the lion's share of CPU, even with the robot is localized and not moving. Note that this is even without security or gzclient enabled.

image

Additional information

The above is just from launching these two launch files

ros2 launch turtlebot3_gazebo turtlebot3_world.launch.py
ros2 launch turtlebot3_navigation2 navigation2.launch.py use_sim_time:=true

Where gzclient can be disabled in navigation2.launch.py via:

-            cmd=['gazebo', '--verbose', world, '-s', 'libgazebo_ros_init.so'],
+            cmd=['gzserver', '--verbose', world, '-s', 'libgazebo_ros_init.so'],
@SteveMacenski
Copy link
Member

SteveMacenski commented Aug 7, 2019

@ruffsl are you using any custom config files or just the defaults?

If I'm reading this right: you're seeing 12% CPU usage for the world model (global costmap), DWB controller (which also is running a costmap), and AMCL (woah, that's shouldn't be even remotely that high). Frankly 12% CPU seems about right for the costmap items since that's spinning regardless if anyone's using it. AMCL shouldn't be 12% ever.

Any other errors/warnings like inability to transform laser scans? Particularly coming from AMCL?

Also is this on dashing or bleed?

Is this running with the encryption going or just normal?

@mikaelarguedas
Copy link
Contributor

Also is this on dashing or bleed?

We're currently targeting dashing (with some tiny config changes to avoid errors in RViz dashing-devel...mikaelarguedas:mikael/dashing-sync).

Any other errors/warnings like inability to transform laser scans? Particularly coming from AMCL?

I am seeing a bunch of dropped message and tf errors still:

[amcl-3] [INFO] [amcl_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 67.401 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 71.601 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap]: Timed out waiting for transform from base_link to map to become available, tf error: . canTransform returned after 1.001 timeout was 1.

Is it a common set of error messages or are we doing something bad somewhere?

Currently looking at ros2/geometry2#146 and ros2/rviz#375 to see if these improve things.

Is this running with the encryption going or just normal?

I believe these results are just normal, with encryption CPU loads gets significantly higher on my machine

are you using any custom config files or just the defaults?

We're using these parameters: https://github.com/ROBOTIS-GIT/turtlebot3/blob/ros2/turtlebot3_navigation2/param/burger_params.yaml

The diff from the ones from navigation2@master:

$ diff burger_params.yaml nav2_params.yaml 
60c60
<     max_vel_x: 0.22
---
>     max_vel_x: 0.26
64c64
<     max_speed_xy: 0.22
---
>     max_speed_xy: 0.26
86,87c86,87
<     PathAlign.scale: 32.0
<     GoalAlign.scale: 24.0
---
>     PathAlign.scale: 0.0
>     GoalAlign.scale: 0.0
103c103
<       robot_radius: 0.105
---
>       robot_radius: 0.22
125c125
<       robot_radius: 0.105
---
>       robot_radius: 0.22
145c145
<     yaml_filename: "map.yaml"
---
>     yaml_filename: "turtlebot3_world.yaml"

@ruffsl
Copy link
Member Author

ruffsl commented Aug 7, 2019

@ruffsl are you using any custom config files or just the defaults?

You can see our repos file we are using in the Version or commit hash bullet above, as we have a few commits optop of 9cbda50 for changing use_sim_time, unreliable transport, and tweaking the rviz config view. Currently: https://github.com/mikaelarguedas/navigation2/commits/mikael/dashing-sync

Frankly 12% CPU seems about right for the costmap items since that's spinning regardless if anyone's using it.

I'd still argue even that's aggressive, could that be toned down as well by shrinking the local cost map area?

Any other errors/warnings like inability to transform laser scans? Particularly coming from AMCL?

We see lot of info messages from world_model:

[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 179.203 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 179.402 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 179.603 for reason(0)

Here is an capture from navigation2.launch.py, while successfully localizing and navigating:

[INFO] [launch]: All log files can be found below /home/ruffsl/.ros/log/2019-08-07-14-33-32-485199-dox-5447
[INFO] [launch]: Default logging verbosity is set to INFO
[INFO] [lifecycle_manager-1]: process started with pid [5466]
[INFO] [map_server-2]: process started with pid [5467]
[INFO] [amcl-3]: process started with pid [5468]
[INFO] [world_model-4]: process started with pid [5469]
[INFO] [dwb_controller-5]: process started with pid [5470]
[INFO] [navfn_planner-6]: process started with pid [5477]
[INFO] [recoveries_node-7]: process started with pid [5484]
[INFO] [bt_navigator-8]: process started with pid [5499]
[INFO] [rviz2-9]: process started with pid [5511]
[lifecycle_manager-1] [INFO] [lifecycle_manager]: Creating
[map_server-2] [INFO] [map_server]: Creating
[world_model-4] [INFO] [world_model]: Creating
[lifecycle_manager-1] [INFO] [lifecycle_manager]: �[34m�[1mStarting the system bringup...�[0m�[0m
[lifecycle_manager-1] [INFO] [lifecycle_manager]: �[34m�[1mCreating and initializing lifecycle service clients�[0m�[0m
[amcl-3] [INFO] [amcl]: Creating
[lifecycle_manager-1] [INFO] [lifecycle_manager]: �[34m�[1mConfiguring and activating map_server�[0m�[0m
[dwb_controller-5] [INFO] [dwb_controller]: Creating
[dwb_controller-5] [INFO] [local_costmap.local_costmap]: Creating
[world_model-4] [INFO] [global_costmap.global_costmap]: Creating
[recoveries_node-7] [INFO] [recoveries]: Configuring Spin
[recoveries_node-7] [WARN] [rcl.logging_rosout]: Publisher already registered for provided node name. If this is due to multiple nodes with the same name then all logs for that logger name will go out over the existing publisher. As soon as any node with that name is destructed it will unregister the publisher, preventing any further logs for that name from being published on the rosout topic.
[recoveries_node-7] [INFO] [recoveries]: Configuring BackUp
[navfn_planner-6] [INFO] [navfn_planner]: Creating
[bt_navigator-8] [INFO] [bt_navigator]: Creating
[map_server-2] [INFO] [map_server]: Configuring
[map_server-2] [INFO] [map_server]: OccGridLoader: Creating
[map_server-2] [INFO] [map_server]: OccGridLoader: Configuring
[map_server-2] [WARN] [map_server]: Mode parameter not set, using default value (trinary): bad conversion
[map_server-2] [INFO] [map_server]: Activating
[map_server-2] [INFO] [map_server]: OccGridLoader: Activating
[lifecycle_manager-1] [INFO] [lifecycle_manager]: �[34m�[1mConfiguring and activating amcl�[0m�[0m
[amcl-3] [INFO] [amcl]: Configuring
[amcl-3] [INFO] [amcl]: initTransforms
[amcl-3] [WARN] [rcl.logging_rosout]: Publisher already registered for provided node name. If this is due to multiple nodes with the same name then all logs for that logger name will go out over the existing publisher. As soon as any node with that name is destructed it will unregister the publisher, preventing any further logs for that name from being published on the rosout topic.
[amcl-3] [INFO] [amcl]: initPubSub
[amcl-3] [INFO] [amcl]: Subscribed to map topic.
[amcl-3] [INFO] [amcl]: Activating
[lifecycle_manager-1] [INFO] [lifecycle_manager]: �[34m�[1mConfiguring and activating world_model�[0m�[0m
[world_model-4] [INFO] [world_model]: Configuring
[world_model-4] [INFO] [global_costmap.global_costmap]: Configuring
[world_model-4] [INFO] [global_costmap.global_costmap]: Using plugin "static_layer"
[world_model-4] [INFO] [global_costmap.global_costmap]: Using plugin "obstacle_layer"
[world_model-4] [INFO] [global_costmap.global_costmap]: Subscribed to Topics: scan
[world_model-4] [INFO] [global_costmap.global_costmap]: Using plugin "inflation_layer"
[world_model-4] [INFO] [global_costmap.global_costmap]: StaticLayer: Resizing costmap to 384 X 384 at 0.050000 m/pix
[world_model-4] [INFO] [world_model]: Activating
[world_model-4] [INFO] [global_costmap.global_costmap]: Activating
[world_model-4] [INFO] [global_costmap.global_costmap]: Checking transform
[rviz2-9] [INFO] [rviz2]: Stereo is NOT SUPPORTED
[rviz2-9] [INFO] [rviz2]: OpenGl version: 4.6 (GLSL 4.6)
[world_model-4] [INFO] [global_costmap.global_costmap]: Timed out waiting for transform from base_link to map to become available, tf error: . canTransform returned after 1.048 timeout was 1.
[rviz2-9] [INFO] [rviz2]: Stereo is NOT SUPPORTED
[rviz2-9] [WARN] [rcl.logging_rosout]: Publisher already registered for provided node name. If this is due to multiple nodes with the same name then all logs for that logger name will go out over the existing publisher. As soon as any node with that name is destructed it will unregister the publisher, preventing any further logs for that name from being published on the rosout topic.
[rviz2-9] Parsing robot urdf xml string.
[amcl-3] [WARN] [amcl]: Waiting for map....
[world_model-4] [INFO] [global_costmap.global_costmap]: Timed out waiting for transform from base_link to map to become available, tf error: . canTransform returned after 1.089 timeout was 1.
[world_model-4] [INFO] [global_costmap.global_costmap]: Timed out waiting for transform from base_link to map to become available, tf error: . canTransform returned after 1.003 timeout was 1.
[rviz2-9] [INFO] [rviz2]: Creating 1 swatches_
[rviz2-9] [ERROR] [rviz2]: InvalidParametersException: Stream size does not match calculated image size in Image::loadRawData at /tmp/binarydeb/ros-dashing-rviz-ogre-vendor-6.1.2/obj-x86_64-linux-gnu/ogre-master-ca665a6-prefix/src/ogre-master-ca665a6/OgreMain/src/OgreImage.cpp (line 283)
[rviz2-9] [ERROR] [rviz2]: Failed to create 1 swatches_
[rviz2-9] [INFO] [rviz2]: Creating 2 swatches_
[rviz2-9] [ERROR] [rviz2]: InvalidParametersException: Stream size does not match calculated image size in Image::loadRawData at /tmp/binarydeb/ros-dashing-rviz-ogre-vendor-6.1.2/obj-x86_64-linux-gnu/ogre-master-ca665a6-prefix/src/ogre-master-ca665a6/OgreMain/src/OgreImage.cpp (line 283)
[rviz2-9] [ERROR] [rviz2]: Failed to create 2 swatches_
[rviz2-9] [INFO] [rviz2]: Creating 4 swatches_
[amcl-3] [WARN] [amcl]: Waiting for map....
[world_model-4] [INFO] [global_costmap.global_costmap]: Timed out waiting for transform from base_link to map to become available, tf error: . canTransform returned after 1.003 timeout was 1.
[world_model-4] [INFO] [global_costmap.global_costmap]: Timed out waiting for transform from base_link to map to become available, tf error: . canTransform returned after 1.002 timeout was 1.
[amcl-3] [WARN] [amcl]: Waiting for map....
[world_model-4] [INFO] [global_costmap.global_costmap]: Timed out waiting for transform from base_link to map to become available, tf error: . canTransform returned after 1.005 timeout was 1.
[world_model-4] [INFO] [global_costmap.global_costmap]: Timed out waiting for transform from base_link to map to become available, tf error: . canTransform returned after 1.003 timeout was 1.
[amcl-3] [WARN] [amcl]: Waiting for map....
[world_model-4] [INFO] [global_costmap.global_costmap]: Timed out waiting for transform from base_link to map to become available, tf error: . canTransform returned after 1.005 timeout was 1.
[rviz2-9] [INFO] [rviz2]: Setting goal: Frame:map, Position(-2.01168, -0.475417, 0), Orientation(0, 0, 0.000398123, 1) = Angle: 0.000796245
[amcl-3] [INFO] [amcl]: initialPoseReceived
[amcl-3] [INFO] [amcl]: Setting pose (52.589000): -2.012 -0.475 0.001
[world_model-4] [INFO] [global_costmap.global_costmap]: Timed out waiting for transform from base_link to map to become available, tf error: . canTransform returned after 1.004 timeout was 1.
[amcl-3] [INFO] [amcl]: Received a 384 X 384 map @ 0.050 m/pix
[amcl-3] 
[amcl-3] [INFO] [amcl]: Setting pose (53.594000): -2.012 -0.475 0.001
[amcl-3] [INFO] [amcl]: createLaserObject
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 44.001 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap]: Timed out waiting for transform from base_link to map to become available, tf error: Lookup would require extrapolation into the past.  Requested time 54.09400 but the earliest data is at time 54.80200, when looking up transform from frame [base_link] to frame [map]. canTransform returned after 1.006 timeout was 1.
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 44.201 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 44.401 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 44.601 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 44.801 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap]: start
[lifecycle_manager-1] [INFO] [lifecycle_manager]: �[34m�[1mConfiguring and activating dwb_controller�[0m�[0m
[dwb_controller-5] [INFO] [dwb_controller]: Configuring
[dwb_controller-5] [INFO] [local_costmap.local_costmap]: Configuring
[rviz2-9] [INFO] [rviz2]: Creating 1 swatches_
[rviz2-9] [ERROR] [rviz2]: InvalidParametersException: Stream size does not match calculated image size in Image::loadRawData at /tmp/binarydeb/ros-dashing-rviz-ogre-vendor-6.1.2/obj-x86_64-linux-gnu/ogre-master-ca665a6-prefix/src/ogre-master-ca665a6/OgreMain/src/OgreImage.cpp (line 283)
[rviz2-9] [ERROR] [rviz2]: Failed to create 1 swatches_
[rviz2-9] [INFO] [rviz2]: Creating 2 swatches_
[rviz2-9] [ERROR] [rviz2]: InvalidParametersException: Stream size does not match calculated image size in Image::loadRawData at /tmp/binarydeb/ros-dashing-rviz-ogre-vendor-6.1.2/obj-x86_64-linux-gnu/ogre-master-ca665a6-prefix/src/ogre-master-ca665a6/OgreMain/src/OgreImage.cpp (line 283)
[rviz2-9] [ERROR] [rviz2]: Failed to create 2 swatches_
[rviz2-9] [INFO] [rviz2]: Creating 4 swatches_
[dwb_controller-5] [INFO] [local_costmap.local_costmap]: Using plugin "obstacle_layer"
[dwb_controller-5] [INFO] [local_costmap.local_costmap]: Subscribed to Topics: scan
[dwb_controller-5] [INFO] [local_costmap.local_costmap]: Using plugin "inflation_layer"
[dwb_controller-5] [INFO] [dwb_controller]: Setting transform_tolerance to 0.200000
[dwb_controller-5] [INFO] [dwb_controller]: Using critic "RotateToGoal" (dwb_critics::RotateToGoalCritic)
[dwb_controller-5] [INFO] [dwb_controller]: Critic plugin initialized
[dwb_controller-5] [INFO] [dwb_controller]: Using critic "Oscillation" (dwb_critics::OscillationCritic)
[dwb_controller-5] [INFO] [dwb_controller]: Critic plugin initialized
[dwb_controller-5] [INFO] [dwb_controller]: Using critic "BaseObstacle" (dwb_critics::BaseObstacleCritic)
[dwb_controller-5] [INFO] [dwb_controller]: Critic plugin initialized
[dwb_controller-5] [INFO] [dwb_controller]: Using critic "GoalAlign" (dwb_critics::GoalAlignCritic)
[dwb_controller-5] [INFO] [dwb_controller]: Critic plugin initialized
[dwb_controller-5] [INFO] [dwb_controller]: Using critic "PathAlign" (dwb_critics::PathAlignCritic)
[dwb_controller-5] [INFO] [dwb_controller]: Critic plugin initialized
[dwb_controller-5] [INFO] [dwb_controller]: Using critic "PathDist" (dwb_critics::PathDistCritic)
[dwb_controller-5] [INFO] [dwb_controller]: Critic plugin initialized
[dwb_controller-5] [INFO] [dwb_controller]: Using critic "GoalDist" (dwb_critics::GoalDistCritic)
[dwb_controller-5] [INFO] [dwb_controller]: Critic plugin initialized
[dwb_controller-5] [INFO] [dwb_controller]: Controller frequency set to 20.0000Hz
[dwb_controller-5] [INFO] [dwb_controller]: Activating
[dwb_controller-5] [INFO] [local_costmap.local_costmap]: Activating
[dwb_controller-5] [INFO] [local_costmap.local_costmap]: Checking transform
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 45.002 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 45.201 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 45.401 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 45.602 for reason(0)
[dwb_controller-5] [INFO] [local_costmap.local_costmap]: start
[lifecycle_manager-1] [INFO] [lifecycle_manager]: �[34m�[1mConfiguring and activating navfn_planner�[0m�[0m
[navfn_planner-6] [INFO] [navfn_planner]: Configuring
[navfn_planner-6] [INFO] [navfn_planner]: Activating
[lifecycle_manager-1] [INFO] [lifecycle_manager]: �[34m�[1mConfiguring and activating bt_navigator�[0m�[0m
[bt_navigator-8] [INFO] [bt_navigator]: Configuring
[rviz2-9] [INFO] [rviz2]: Creating 1 swatches_
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 45.802 for reason(0)
[bt_navigator-8] [INFO] [bt_navigator_client_node]: Waiting for "ComputePathToPose" action server
[bt_navigator-8] [INFO] [bt_navigator_client_node]: "ComputePathToPose" BtActionNode initialized
[bt_navigator-8] [INFO] [bt_navigator_client_node]: Waiting for "FollowPath" action server
[bt_navigator-8] [INFO] [bt_navigator_client_node]: "FollowPath" BtActionNode initialized
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 46.002 for reason(0)
[bt_navigator-8] [INFO] [bt_navigator_client_node]: Waiting for "Spin" action server
[bt_navigator-8] [INFO] [bt_navigator_client_node]: "Spin" BtActionNode initialized
[bt_navigator-8] [INFO] [bt_navigator]: Activating
[lifecycle_manager-1] [INFO] [lifecycle_manager]: �[34m�[1mThe system is active�[0m�[0m
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 46.201 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 46.401 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 46.602 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 46.802 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 47.002 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 47.201 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 47.401 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 47.601 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 47.801 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 48.001 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 48.202 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 48.402 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 48.601 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 48.801 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 49.001 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 49.201 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 49.401 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 49.601 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 49.802 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 50.001 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 50.201 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 50.401 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 50.601 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 50.801 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 51.001 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 51.202 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 51.401 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 51.601 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 51.801 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 52.001 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 52.201 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 52.402 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 52.601 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 52.802 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 53.001 for reason(0)
[bt_navigator-8] [INFO] [bt_navigator]: Begin navigating from current location to (1.81, -0.41)
[dwb_controller-5] [INFO] [dwb_controller]: Received a goal, begin following path
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 53.201 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 53.401 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 53.602 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 53.802 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 54.002 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 54.201 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 54.402 for reason(0)
[world_model-4] [INFO] [global_costmap.global_costmap_rclcpp_node]: [signalFailure] Drop message: frame 'base_scan' at time 54.602 for reason(0)
[dwb_controller-5] [INFO] [dwb_controller]: Reached the goal!
[bt_navigator-8] [INFO] [bt_navigator]: Navigation succeeded

@SteveMacenski
Copy link
Member

SteveMacenski commented Aug 7, 2019

To be frank, I've been mostly focused on capability and not yet performance, but it looks like I may refocus some of that effort to performance. If you're running in simulation you can drop the controller rate to 5hz and the global costmap update rate to 1hz. Set those *Align configs to 0. That should help a bit.

The CPU 12% for the costmaps seems about right* given all the problems (TF dropping messages, processing a costmap by itself was never a super light weight thing, is there a depth camera involved there or just laser?), but I'm not saying that there isn't something wrong, just from hand wavey background knowledge of getting really deep into costmap 2D in ROS1 navigation 12% is like the lowest I've ever had especially when publishing that costmap at rate, which right now the Nav2 version doesnt have subscriber detection, but I havent run with only a laser.

AMCL and the costmaps all running at 12% seem suspect, given they're all laser scan based and there's so many TF things happening.

@orduno
Copy link
Contributor

orduno commented Aug 8, 2019

@mjeronimo do these numbers match what you've found so far from your performance analysis?

@orduno
Copy link
Contributor

orduno commented Aug 8, 2019

In addition to what has been proposed, consider reducing the global costmap resolution and the local costmap window size. This will certainly have some impact on navigation quality, but it's worth experimenting. We want to try that at some point.

Also, for simulation you can use nav2_simulation_launch with the argument simulator:=gzserver, this will launch everything you need on a single command.

ros2 launch nav2_bringup nav2_simulation_launch.py simulator:=gzserver

@SteveMacenski
Copy link
Member

I think the curent global costmap resolution is 5cm, that's a pretty good number, maintaining it shouldn't be that hard if the ranges are setup for the laser to be small. The RPlidar is pretty short range and low resolution, I don't know that lowering the global costmap resolution will have much impact given the small amount of data being processed.

I'm more suspicious that potentially the global costmap is being updated at 5hz in the default configs (will look today), that would add some extra "umph" when our replanning rate I think its 1hz default.

I'm generally suspect of AMCL if those numbers can be reproduced. These items are on my queue to think about tomorrow or next week depending on time availability.

@SteveMacenski SteveMacenski self-assigned this Aug 8, 2019
@SteveMacenski
Copy link
Member

I'm going to assign myself to start on this, but help from other folks is appreciated.

@crdelsey
Copy link
Contributor

crdelsey commented Aug 8, 2019

My first guess as to the problem would be ros2/geometry2#119. As in, you need this PR integrated into whatever you are running. I think it is integrated in master now. Probably not dashing.

All the nodes mentioned have TF listeners, so the above would be a problem for them.

@crdelsey crdelsey added the 1 - High High Priority label Aug 8, 2019
@crdelsey crdelsey added this to the Aug 2019 milestone Aug 8, 2019
@mjeronimo
Copy link

@orduno In my recent performance analysis, on the NUC I've been using I see about 3% CPU to spin a node. After running a navigation, the nodes should fall back to this level. One can compare a sample ROS2 application, such as a simple pub/sub example, for reference.

Since the three modules mentioned are the ones that have the TF listener, as Carl mentions, I would suspect that the additional CPU utilization is coming from there. There have been recent changes TF listener and the latest ROS2 code should have addressed this.

@SteveMacenski
Copy link
Member

3% a node

jeeze. AMCL used to take ~3% running the particle filter

@SteveMacenski
Copy link
Member

Another thing I just recalled is that we dont use the costmap updates topics yet since RVIZ doesn't support them as of today. Sending the full costmap every time isn't a cheap operation.

@SteveMacenski
Copy link
Member

I'm seeing crazy costmap update rates while debugging another issue

[dwb_controller-5] [INFO] [local_costmap.local_costmap]: Map update time: 0.000919024
[dwb_controller-5] [INFO] [local_costmap.local_costmap]: Cycle: 0.000919024

@crdelsey
Copy link
Contributor

@SteveMacenski Are you still seeing "crazy" costmap updates? If so, can you open a separate issue for that?

@ruffsl Were you able to retest with the fix in ros2/geometry2#119?

@ruffsl
Copy link
Member Author

ruffsl commented Aug 28, 2019

@ruffsl Were you able to retest with the fix in ros2/geometry2#119?

I just changed our dockerfile to build FROM rosplanning/navigation2:master.release and used rocker to spin up and it looks a lot better. However, rviz is glitching out in rendering the map and nav markers.
Here is a snapshot as the robot is planning/moving on its own across the map:

image

@crdelsey
Copy link
Contributor

The rviz thing has been reported upstream ros2/rviz#453. Looks like a workaround is to keep adding the map topic display till it finally looks right.

@mlherd mlherd added the performance This is a performance problem or enhancement label Sep 5, 2019
@ruffsl
Copy link
Member Author

ruffsl commented Sep 6, 2019

I've updated our demo for https://github.com/ros-swg/turtlebot3_demo , so you can check if you're able to reproduce the same performance usage changes. As a side note, perhaps maintainers here could help us triage a QoS/tf issue I think we might be having with navigation + sros2 . ros-swg/turtlebot3_demo#12

@crdelsey crdelsey modified the milestones: Aug 2019, September Sep 9, 2019
@crdelsey
Copy link
Contributor

crdelsey commented Oct 7, 2019

@ruffsl Is this still an issue?

@ruffsl
Copy link
Member Author

ruffsl commented Oct 7, 2019

I'll check back again after the improvements to fast rtps QoS are sync'ed into dashing:
https://discourse.ros.org/t/preparing-for-dashing-sync-and-patch-release-2019-10-10/10898

@ruffsl
Copy link
Member Author

ruffsl commented Oct 18, 2019

I tested this again using the latest recent dashing sync, and everything seems to be much better. The lion's share of the CPU usage remaining seems to be just from the gazebo and rviz GUIs:

image

@crdelsey , feel free to close this, however we are still having issues enabling security for navigation2:
See latest comment here: ros-swg/turtlebot3_demo#12 (comment)

@SteveMacenski
Copy link
Member

SteveMacenski commented Oct 18, 2019

@ruffsl dear god, what computer are you running on? Gazebo takes 3% CPU?

If there's some action items for the security stuff feel free to file tickets and we'll take a look

@ruffsl
Copy link
Member Author

ruffsl commented Oct 18, 2019

what computer are you running on? Gazebo takes 3% CPU?

It's about a 3yo CPU in a workstation. I was thinking perhaps the GPU ray tracing is working in my favor, but it doesn't seem like the turtlebot3 is using gpu_ray , so perhaps its my CPU core count.

Linux ubuntu 4.20.4-042004-generic #201901222207 SMP Tue Jan 22 22:09:25 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

image

@SteveMacenski
Copy link
Member

I need to get off this peewee laptop game then. My computer nearly catches fire when I run gazebo with a depth sensor for longer than 20 minutes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1 - High High Priority performance This is a performance problem or enhancement
Projects
None yet
Development

No branches or pull requests

7 participants