Recording CPU load decrease through spin_some() #743
Signed-off-by: Adam Dabrowski <adam.dabrowski@robotec.ai>
I would like to understand why this difference is so big before merging it. Is the standard Executor used in Would this difference show on a development machine using the rosbag2_performance benchmarks? The output of that suite would also help make this review easier.
Of course, let me explain why the difference is so big. I tried to do so in the rclcpp issue linked in the description, notably this part:
Basically, this change causes the executables "cache" to be actually used, so that
The standard executor has spin_once, spin_some, spin_all, and spin, so it already offers several ways to process executables. spin() consumes wait set items as promptly as possible (without any delay), while the other methods offer different strengths. So what we are doing here is just using the standard executor in a different, but fully supported, manner.
CPU measurements before and after, exactly the same setup:
The bag file played to get this comparison (play into record) contains some non-public messages, and I agree that it would be great to get something that is easy to replicate. I will find some time to do this. Rough bag file profile: 3.7 GB in 452k messages over 100 seconds, with about 200 topics. This should be fully reproducible with benchmarking publishers.
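To put rough numbers on the batching argument, here is a toy model in Python (not rclcpp code) built on the bag profile quoted above. It assumes uniformly spaced arrivals, a worst case in which every message triggers its own wake-up, and a hypothetical 10 ms pause between batch drains. The function names and the period are illustrative assumptions, and the model counts wake-ups only, not actual CPU cost.

```python
# Toy model: compare the number of executor wake-ups when waking once per
# message against draining everything that is ready once per fixed period.
# This is a sketch under stated assumptions, not a model of rclcpp internals.

NUM_MESSAGES = 452_000   # from the bag profile quoted above
DURATION_S = 100.0       # from the bag profile quoted above
PERIOD_S = 0.010         # ASSUMPTION: hypothetical pause between batch drains


def message_rate(num_messages: int, duration_s: float) -> float:
    """Average number of messages arriving per second."""
    return num_messages / duration_s


def wakeups_per_item(num_messages: int) -> int:
    """Worst case: one wake-up for every single message."""
    return num_messages


def wakeups_batched(duration_s: float, period_s: float) -> int:
    """One wake-up per period, regardless of how many messages are ready."""
    # round() guards against floating-point error in the division.
    return round(duration_s / period_s)


def mean_batch_size(num_messages: int, duration_s: float, period_s: float) -> float:
    """Average number of messages handled per batched wake-up."""
    return num_messages / wakeups_batched(duration_s, period_s)


if __name__ == "__main__":
    print("messages per second:", message_rate(NUM_MESSAGES, DURATION_S))
    print("wake-ups, one per message:", wakeups_per_item(NUM_MESSAGES))
    print("wake-ups, batched:", wakeups_batched(DURATION_S, PERIOD_S))
    print("mean batch size:", mean_batch_size(NUM_MESSAGES, DURATION_S, PERIOD_S))
```

Under these assumptions the batched variant wakes up far less often, at the price of up to one period of added latency per message. Whether that translates into lower CPU load depends on the per-wake-up cost in the real executor, which this model does not capture.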
Running the benchmarks did not confirm that this gain generalizes, and in some cases (lower executables/sec count) CPU load apparently even increases with this patch. So I am not a fan of this change anymore. I am investigating further. What is so crucially different between these scenarios? These results don't make much sense to me yet.
Thanks for taking a deep dive into this. It is reassuring to see this level of rigor applied to an important issue. I would be interested in a profiling result, to see exactly where in the code the 50% CPU usage difference comes from in your sample case. A function profile will probably tell us a lot.
@emersonknapp As far as I understand from @christophebedard's analysis, the main overhead in the executor is in Introducing sleep between I don't see an easy fix for this problem now. While executors have a good abstraction and an easy-to-use API, they have a lot of overhead inside.
Perhaps using ros2/design#305 the "Events Executor" instead of a
I will dive deeper into this.
@adamdbrw is this PR still relevant, or should we close it as stale?
Closing as stale.
The rationale behind this PR is explained here: #737 and here: ros2/rclcpp#1637.
In short, it cuts the CPU load of ros2 bag record from 110% to 55% in a certain use case on a certain platform, without apparent drawbacks. Notably, the use case is a fairly representative automotive one, and the platform is also unexceptional. I believe this extrapolates well into reducing CPU use overall when the number of executables (e.g. subscriptions) to process each second is high. Some additional testing might be needed to build more confidence in the broader conclusion.
Update: I have no confidence in the generalized claim anymore, see comment below.