High CPU usage even at idle #1015
Comments
@ruffsl are you using any custom config files or just the defaults? If I'm reading this right: you're seeing 12% CPU usage for the world model (global costmap), the DWB controller (which is also running a costmap), and AMCL (woah, that shouldn't be even remotely that high). Frankly, 12% CPU seems about right for the costmap items, since they spin regardless of whether anyone's using them. AMCL shouldn't be at 12% ever. Any other errors/warnings, like an inability to transform laser scans? Particularly coming from AMCL? Also, is this on dashing or bleed? Is this running with the encryption going or just normal?
We're currently targeting dashing (with some tiny config changes to avoid errors in RViz dashing-devel...mikaelarguedas:mikael/dashing-sync).
I am still seeing a bunch of dropped messages and tf errors:
Is it a common set of error messages or are we doing something bad somewhere? Currently looking at ros2/geometry2#146 and ros2/rviz#375 to see if these improve things.
I believe these results are just normal; with encryption, the CPU load gets significantly higher on my machine.
We're using these parameters: https://github.com/ROBOTIS-GIT/turtlebot3/blob/ros2/turtlebot3_navigation2/param/burger_params.yaml
The diff against the ones from navigation2@master:
$ diff burger_params.yaml nav2_params.yaml
60c60
< max_vel_x: 0.22
---
> max_vel_x: 0.26
64c64
< max_speed_xy: 0.22
---
> max_speed_xy: 0.26
86,87c86,87
< PathAlign.scale: 32.0
< GoalAlign.scale: 24.0
---
> PathAlign.scale: 0.0
> GoalAlign.scale: 0.0
103c103
< robot_radius: 0.105
---
> robot_radius: 0.22
125c125
< robot_radius: 0.105
---
> robot_radius: 0.22
145c145
< yaml_filename: "map.yaml"
---
> yaml_filename: "turtlebot3_world.yaml"
You can see our repos file we are using in the
I'd still argue even that's aggressive, could that be toned down as well by shrinking the local cost map area?
We see lot of info messages from
Here is a capture from
To be frank, I've been mostly focused on capability and not yet on performance, but it looks like I may refocus some of that effort on performance. If you're running in simulation you can drop the controller rate to 5 Hz and the global costmap update rate to 1 Hz. Set those
The 12% CPU for the costmaps seems about right* given all the problems (TF dropping messages; processing a costmap by itself was never a super lightweight thing; is there a depth camera involved there or just a laser?). I'm not saying that there isn't something wrong. It's just that, from hand-wavy background knowledge of getting really deep into costmap 2D in ROS1 navigation, 12% is about the lowest I've ever had, especially when publishing that costmap at rate (right now the Nav2 version doesn't have subscriber detection). That said, I haven't run with only a laser. AMCL and the costmaps all running at 12% seems suspect, given they're all laser-scan based and there are so many TF things happening.
@mjeronimo do these numbers match what you've found so far from your performance analysis? |
In addition to what has been proposed, consider reducing the global costmap resolution and the local costmap window size. This will certainly have some impact on navigation quality, but it's worth experimenting. We want to try that at some point. Also, for simulation you can use
I think the current global costmap resolution is 5 cm. That's a pretty good number, and maintaining it shouldn't be that hard if the laser ranges are set up to be small. The RPLidar is pretty short range and low resolution, so I don't know that lowering the global costmap resolution will have much impact given the small amount of data being processed. I'm more suspicious that the global costmap is potentially being updated at 5 Hz in the default configs (will look today); that would add some extra "umph" when our replanning rate is, I think, 1 Hz by default. I'm generally suspicious of AMCL if those numbers can be reproduced. These items are on my queue to think about tomorrow or next week, depending on time availability.
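For a rough sense of how resolution and window size drive the amount of costmap data, here is a minimal sketch. Only the 5 cm resolution comes from this thread; the 3 m window and the function name are illustrative assumptions, not values or APIs from navigation2.

```python
def costmap_cells(width_m: float, height_m: float, resolution_m: float) -> int:
    """Number of grid cells needed to cover a width x height area.

    round() guards against floating-point noise in the division.
    """
    if resolution_m <= 0:
        raise ValueError("resolution_m must be positive")
    cols = round(width_m / resolution_m)
    rows = round(height_m / resolution_m)
    return cols * rows


# Hypothetical 3 m x 3 m window at the 5 cm resolution mentioned above.
fine = costmap_cells(3.0, 3.0, 0.05)
# Same window at 10 cm resolution.
coarse = costmap_cells(3.0, 3.0, 0.10)
print(fine, coarse, fine / coarse)
```

Doubling the cell size quarters the cell count, and halving the window side length does the same, which is why both knobs were suggested as experiments.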
I'm going to assign myself to start on this, but help from other folks is appreciated. |
My first guess as to the problem would be ros2/geometry2#119. As in, you need this PR integrated into whatever you are running. I think it is integrated in master now. Probably not dashing. All the nodes mentioned have TF listeners, so the above would be a problem for them. |
@orduno In my recent performance analysis, on the NUC I've been using, I see about 3% CPU to spin a node. After running a navigation, the nodes should fall back to this level. One can compare against a sample ROS2 application, such as a simple pub/sub example, for reference. Since the three modules mentioned are the ones that have a TF listener, as Carl mentions, I would suspect that the additional CPU utilization is coming from there. There have been recent changes to the TF listener, and the latest ROS2 code should have addressed this.
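As a way to get a comparable baseline number, here is a minimal sketch of measuring the CPU utilization of an otherwise idle periodic loop. It is plain Python, not a ROS2 node: the sleep is only a stand-in for a process waiting on its executor, and both function names are hypothetical.

```python
import time


def cpu_utilization(cpu_seconds: float, wall_seconds: float) -> float:
    """Percent of one core used over the measured interval."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return 100.0 * cpu_seconds / wall_seconds


def measure_idle_loop(rate_hz: float, duration_s: float) -> float:
    """Run an empty timer-style loop and report this process's CPU utilization."""
    period = 1.0 / rate_hz
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    while time.perf_counter() - wall_start < duration_s:
        time.sleep(period)  # stand-in for blocking until the next timer tick
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_utilization(cpu_used, wall_used)


if __name__ == "__main__":
    print(f"idle loop at 20 Hz: {measure_idle_loop(20.0, 2.0):.2f}% of one core")
```

Comparing a figure like this against what top reports for an idle navigation process gives a sense of how much of the load is overhead beyond simply waking up at a fixed rate.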
jeeze. AMCL used to take ~3% running the particle filter |
Another thing I just recalled is that we don't use the costmap updates topics yet, since RViz doesn't support them as of today. Sending the full costmap every time isn't a cheap operation.
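To put a rough number on that, here is a minimal sketch of the payload cost of republishing a whole grid versus only a changed patch. It assumes one byte per cell (the data field of nav_msgs/OccupancyGrid is an int8 array) and ignores headers and serialization overhead; the 384 x 384 grid, the 60 x 60 patch, and the function name are illustrative assumptions.

```python
def grid_bytes_per_second(cols: int, rows: int, publish_hz: float) -> float:
    """Payload bytes per second when a cols x rows grid is sent at publish_hz.

    Assumes one byte per cell; message headers are not counted.
    """
    return cols * rows * publish_hz


# Hypothetical 384 x 384 cell map republished in full at 5 Hz.
full = grid_bytes_per_second(384, 384, 5.0)
# Hypothetical 60 x 60 cell patch sent at the same rate instead.
patch = grid_bytes_per_second(60, 60, 5.0)
print(full, patch)
```

Under these assumptions the full grid costs about 737 kB/s per subscriber versus 18 kB/s for the patch, before any per-message overhead.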
I'm seeing crazy costmap update rates while debugging another issue
@SteveMacenski Are you still seeing "crazy" costmap updates? If so, can you open a separate issue for that? @ruffsl Were you able to retest with the fix in ros2/geometry2#119? |
I just changed our dockerfile to build |
The rviz thing has been reported upstream ros2/rviz#453. Looks like a workaround is to keep adding the map topic display till it finally looks right. |
I've updated our demo for https://github.com/ros-swg/turtlebot3_demo , so you can check if you're able to reproduce the same performance usage changes. As a side note, perhaps maintainers here could help us triage a QoS/tf issue I think we might be having with navigation + sros2 . ros-swg/turtlebot3_demo#12 |
@ruffsl Is this still an issue? |
I'll check back again after the improvements to fast rtps QoS are sync'ed into dashing: |
I tested this again using the latest dashing sync, and everything seems to be much better. The lion's share of the remaining CPU usage seems to be just from the gazebo and rviz GUIs: @crdelsey, feel free to close this; however, we are still having issues enabling security for navigation2:
@ruffsl dear god, what computer are you running on? Gazebo takes 3% CPU? If there's some action items for the security stuff feel free to file tickets and we'll take a look |
It's about a 3yo CPU in a workstation. I was thinking perhaps the GPU ray tracing is working in my favor, but it doesn't seem like the turtlebot3 is using
I need to get off this peewee laptop game then. My computer nearly catches fire when I run gazebo with a depth sensor for longer than 20 minutes |
Bug report
The security working group has been creating an SROS2 demo targeting the turtlebot 3 with a running navigation stack. However, the meager mortal laptops that @mikaelarguedas and I have (~2017 Dell XPS i7 32GB) are struggling to run the tb3 demo while maintaining a usable desktop, as about 40% of our CPU is under load even when the robot is not planning or moving:
I fear many participants in the workshop with lesser machines will struggle to run the demo with the added security cipher overhead, or with the unavoidable virtual machines on mac/windows that workshops tend to involve when linux is required. Is there anything we can do to lighten this up, like rate limiting the spinning in dwb_controler, amcl, map_model, or patches we can pull into our .repo file targeting dashing?
https://github.com/ros-swg/turtlebot3_demo
Required Info:
Steps to reproduce issue
See current tutorial README.md as of writing:
https://github.com/ros-swg/turtlebot3_demo/tree/ccc63b8462a714f6f870db97ae8797d167522463
Expected behavior
Ideal load for the navigation stack when not moving would be low like in ROS1, mostly dominated by gzviewer and rviz if not running in headless mode.
Actual behavior
The processes for dwb_controler, amcl, map_model take the lion's share of CPU, even when the robot is localized and not moving. Note that this is even without security or gzclient enabled.
Additional information
The above is just from launching these two launch files
Where gzclient can be disabled in navigation2.launch.py via: