Reduce costmap_2d_ros nodes #2489
Conversation
This seems good, but are there no changes that need to be made to the lower level costmap layers / subscribers / executors? I almost can't believe this just 'works' with all of those layers creating subscribers.

Keep in mind when developing on costmap_2d that this is a vector of plugins and each is running in parallel with background threads to process data. Think of it like there are 100+ of these layers in use. That is obviously ridiculously large, but it should help you think about how to isolate them so that it can support many layers. Having 5-10 layers running is totally not out of the question. I don't have hard data on this point, but in my experience having ~7 running is pretty typical, and usually half or more of those are processing high bandwidth data like 30fps depth images or laser scans. This is in contrast to something like a planner.

If/when you make changes to the costmap layers, make sure to update any on the lists, as these are the 'supported' / mature layers that get a lot of use. |
Yes, I don't create a new callback group, executor, or thread in the costmap. I'm not sure whether these costmap layers or filters are independent of each other (i.e. a layer won't wait for another layer). At least in the turtlebot3 navigation demo there's no deadlock, and we can run navigation successfully.

If we create many layers with high bandwidth data, we may see delays (refer to #2456 (comment)) because we only have a single-threaded executor to handle these callbacks. If we create a single-threaded executor for each layer, that's a lot of overhead if we have 100+ layers. How about using a multi-threaded executor for the costmap? Some layers may be idle while others are busy. |
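To illustrate the delay concern, here is a minimal generic rclcpp sketch (not Nav2 code; the node, topic names, and the 100 ms delay are invented): with a single-threaded executor, one slow callback holds up every other callback on the node.

```cpp
// Illustration only (not Nav2 code): with a single-threaded executor, one
// slow callback delays every other callback on the same node.
#include <chrono>
#include <memory>
#include <thread>
#include <rclcpp/rclcpp.hpp>
#include <std_msgs/msg/empty.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("single_thread_demo");

  // Pretend this is a heavy layer callback (e.g. processing a depth image).
  auto heavy = node->create_subscription<std_msgs::msg::Empty>(
    "heavy_topic", 10, [](std_msgs::msg::Empty::SharedPtr) {
      std::this_thread::sleep_for(std::chrono::milliseconds(100));
    });

  // A cheap callback for another layer: it still waits behind the heavy one.
  auto cheap = node->create_subscription<std_msgs::msg::Empty>(
    "cheap_topic", 10, [](std_msgs::msg::Empty::SharedPtr) {
      // buffer the message and return quickly
    });

  rclcpp::spin(node);  // single-threaded: callbacks run one at a time
  rclcpp::shutdown();
  return 0;
}
```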
I think it might be good if you did a quick analysis of the costmap layers in the stack (+ STVL) to see how they used the input rclcpp node / client nodes before, as they were given, and what new issues might arise with these changes. I think at least one of those nodes was exclusively reserved for the layers, and now the single node is also being shared with the Costmap2DROS implementation, TF, and potentially more.

I am slightly surprised not to see any updates to the static layer; that also has a subscriber. Maybe it used the normal node, but I didn't look into it.

@wjwwood would you recommend having each costmap layer have a separate executor, or having a single executor shared among all of them? In the BT, we made each node have its own separate one and that seems like the cleanest way from a designer's perspective. But I'm uncertain about the amount of overhead that might contribute. Do you have a sense about this? |
I looked at the source code of the costmap layers:

Costmap filters:
Got it, that's exactly what I meant. I think we should pause for @wjwwood to give us his thoughts on the executors since that will influence the direction we should go. I'm leaning towards each layer having its own executor to manage its own subscriptions with this model, but only if the overhead wouldn't be super high. If it is, maybe we have them all share a single callback group / executor like we do now and address it if it becomes a problem in the future.

There's been a sync with rolling, retriggering CI. |
I think the case is a bit different from the BT, where each BT node has its own executor.

I don't know which layers need a separate callback group / executor. Anyway, I think we only need a separate callback group / executor with a dedicated thread when there are deadlocks or delays in our node (currently I couldn't find a deadlock in Costmap2DROS).

I need to wait for the docker image. |
I don't think the overhead of each executor should be too significant, or at least it shouldn't be exceptionally more than just the overhead of a thread. That being said, I have not tried to measure the overhead of the executor in terms of memory. There will be some CPU overhead, most likely, but again it's hard for me to say if it approaches a concerning amount or not.

Ideally, you wouldn't have more threads than cores on the CPU, since at the end of the day the threads share the CPU cores. Using a thread should be to block on I/O or to keep long running tasks isolated. If you designed your callbacks to be as quick as possible (good idea) then a thread pool is likely the best option. The thread pool could be just a multi-threaded executor and N costmap layers, each with their own callback group. This approach works well if the callbacks in the costmap are decoupled, i.e. they don't access a shared (and therefore synchronized) resource. Things that access a shared resource should ideally be in the same mutually exclusive callback group, because otherwise they will just be sitting in their own executor thread waiting for the resource. That makes that waiting thread basically all overhead.

Sorry, I'm not sure if I'm explaining it well, but basically I'd say: more executors is not insanely expensive (probably), but if multiple callbacks are using the same shared resource they should absolutely be in the same callback group (and therefore the same executor) for maximum efficiency. |
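A sketch of that grouping rule in rclcpp (illustrative class, timer periods, and counter are invented, not Nav2 code): the two callbacks that touch a shared counter share one mutually exclusive group, while an unrelated callback gets its own group so the multi-threaded executor can run it in parallel.

```cpp
#include <chrono>
#include <memory>
#include <rclcpp/rclcpp.hpp>

class GroupingDemo : public rclcpp::Node
{
public:
  GroupingDemo() : Node("grouping_demo")
  {
    // Both timers write to shared_counter_, so they belong in the same
    // mutually exclusive group: the executor never runs them concurrently,
    // and no thread sits blocked waiting on the resource.
    shared_group_ = create_callback_group(rclcpp::CallbackGroupType::MutuallyExclusive);
    timer_a_ = create_wall_timer(
      std::chrono::milliseconds(50), [this]() {shared_counter_++;}, shared_group_);
    timer_b_ = create_wall_timer(
      std::chrono::milliseconds(75), [this]() {shared_counter_--;}, shared_group_);

    // This callback is independent, so it gets its own group and can run
    // in parallel with the two above.
    independent_group_ = create_callback_group(rclcpp::CallbackGroupType::MutuallyExclusive);
    timer_c_ = create_wall_timer(
      std::chrono::milliseconds(10), []() {/* do unrelated work */}, independent_group_);
  }

private:
  rclcpp::CallbackGroup::SharedPtr shared_group_, independent_group_;
  rclcpp::TimerBase::SharedPtr timer_a_, timer_b_, timer_c_;
  int shared_counter_{0};
};

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  rclcpp::executors::MultiThreadedExecutor executor;  // thread pool
  auto node = std::make_shared<GroupingDemo>();
  executor.add_node(node);
  executor.spin();
  rclcpp::shutdown();
  return 0;
}
```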
I think you explained it well. I would be interested to know what the overhead of the executor is in terms of memory, but if the CPU overhead is just a thread, that element's a good trade-off. If that information isn't readily available, I think we can proceed optimistically. It's never been my intent to focus too much on the embedded use-cases for Nav2 since compute resources are only getting faster and with more {cores, ram, etc}. The usual 'raspberry pi with a lidar' educational demo won't need 5+ costmap layers, so they should be served fine either way as the most simple case.

None of the callbacks should be accessing any shared resources. In costmap_2d, they mostly just buffer the data for the update loop. Since I like to know my options, I think from this discussion we have the following options:

Generally I like "complete" solutions: things that we can decide on now that bulletproof any situation someone could throw at them, so that we don't just kick the can down the road (even at the expense of some performance). To me, that leaves options 3 and 4, but I'd be open to 2 just given the small number of plugins we have in the costmap_2d package. I would default to William's suggestion. That generally sounds like the best answer to me; I just stopped thinking about multithreaded executors when we decided against that route for other PRs in this project. But I think costmap_2d is a pretty unique case and that makes a lot of sense to me. That would also give examples of all the major ways of working with executors in Nav2: multithreaded, a bunch of independent executors, single server with single node, which would make a nice reference for the future. |
Actually, I would use a multi-threaded executor and let it set the number of threads, which defaults to the number of cores on your CPU. Not one thread per costmap. Even if you have five things that could be running at the same time (and therefore five threads), if you only have four cores then one of them will be waiting anyway. The only reason to have more threads than cores is if you have long running tasks that run in parallel and sync with each other. In this case we have relatively short lived tasks which are not syncing with each other, so using fewer threads in a thread pool is generally better. |
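For reference, a minimal sketch of the two ways the pool can be sized (generic rclcpp usage, not Nav2 code; the explicit count of 4 is just an example):

```cpp
#include <memory>
#include <rclcpp/rclcpp.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);

  // Default construction: rclcpp picks the thread count, typically based on
  // the number of CPU cores reported by the system.
  rclcpp::executors::MultiThreadedExecutor pool_sized_executor;

  // If you really want to pin the pool size, it can be passed explicitly.
  rclcpp::executors::MultiThreadedExecutor four_thread_executor(
    rclcpp::ExecutorOptions(), 4 /* threads */);

  auto node = std::make_shared<rclcpp::Node>("thread_count_demo");
  pool_sized_executor.add_node(node);
  pool_sized_executor.spin();

  rclcpp::shutdown();
  return 0;
}
```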
Got it, sounds good, thanks for the clarification. @gezp do you like this direction? |
I like this option. The developers of a costmap layer needn't consider whether or not to use a separate thread; the multi-threaded executor schedules them automatically. And a separate callback group is needed for each layer, because they don't access a shared resource in their callbacks. I will update the PR later. |
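At the rclcpp level, that per-layer callback group pattern looks roughly like the sketch below. The DemoLayer class, its initialize signature, and the topic name are invented for illustration; this is not the actual nav2_costmap_2d Layer API.

```cpp
#include <memory>
#include <rclcpp/rclcpp.hpp>
#include <sensor_msgs/msg/laser_scan.hpp>

class DemoLayer
{
public:
  void initialize(const rclcpp::Node::SharedPtr & node)
  {
    // One mutually exclusive group owned by this layer.
    group_ = node->create_callback_group(rclcpp::CallbackGroupType::MutuallyExclusive);

    // The group is attached to the subscription via SubscriptionOptions.
    rclcpp::SubscriptionOptions options;
    options.callback_group = group_;

    // The callback only buffers data, so it stays short.
    sub_ = node->create_subscription<sensor_msgs::msg::LaserScan>(
      "scan", rclcpp::SensorDataQoS(),
      [this](sensor_msgs::msg::LaserScan::SharedPtr msg) {last_scan_ = msg;},
      options);
  }

private:
  rclcpp::CallbackGroup::SharedPtr group_;
  rclcpp::Subscription<sensor_msgs::msg::LaserScan>::SharedPtr sub_;
  sensor_msgs::msg::LaserScan::SharedPtr last_scan_;
};

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("layer_demo");
  DemoLayer layer;
  layer.initialize(node);

  // A multi-threaded executor then schedules each layer's group independently.
  rclcpp::executors::MultiThreadedExecutor executor;
  executor.add_node(node);
  executor.spin();
  rclcpp::shutdown();
  return 0;
}
```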
Let us know when we should take a look and review!
I have updated this PR, please review again.
(force-pushed from cda4d69 to 5d87b5c)
I'm putting this on the backburner until we come to a conclusion about the AMCL PR since they have some related elements and I don't want to distribute the conversation across multiple PRs.
This pull request is in conflict. Could you fix it @gezp?
@gezp where are we on this? Can you ping the ticket / PR that is blocking this work from being completed? I really don't want this to continue sitting around when it could be resolved.
(force-pushed from 5d87b5c to b590666)
The description of this PR was updated, and I created a dedicated callback group for the costmap plugins and filters. |
Now that all of them are being serviced by the same callback group, I think this is what you mention about the potential problem in the PR description.

Alternatively, we could make each costmap layer have its own executor. That feels like it might be approaching excessive, but I'm not sure what else we could do to guarantee that data from LayerA isn't being blocked by use of LayersC-F. I suppose, though, that all of the layers (in open source) just buffer data in their callbacks, except for the static layer, so that shouldn't be blocked too badly. Thinking about it, ROS 1 would have only had a single thread as well and I don't recall that being problematic (right @DLu?) with many subscribers / costmap layers. So sharing one executor for everything might be fine and I'm overdesigning. |
Yes, the behavior from before is not changed.

There may be a potential deadlock problem when we use numerous layers which create numerous subscribers, but anyway, it works well now. To solve this potential problem, we could use an isolated executor for each layer or a multi-threaded executor. I think this is a new problem we should discuss in another ticket; it's not easy to solve.

Yes, I'm also not sure about this problem. |
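For concreteness, the "isolated executor for each layer" alternative mentioned above could look roughly like this at the rclcpp level (a generic sketch with invented names, not what this PR implements): one callback group is handed to its own single-threaded executor, spun on a dedicated thread, so that group's callbacks can never be starved by the rest of the node.

```cpp
#include <memory>
#include <thread>
#include <rclcpp/rclcpp.hpp>
#include <sensor_msgs/msg/laser_scan.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("isolated_group_demo");

  // Group created with automatically_add_to_executor_with_node = false so the
  // node's main executor leaves it alone.
  auto isolated_group = node->create_callback_group(
    rclcpp::CallbackGroupType::MutuallyExclusive, false);

  rclcpp::SubscriptionOptions options;
  options.callback_group = isolated_group;
  auto sub = node->create_subscription<sensor_msgs::msg::LaserScan>(
    "scan", rclcpp::SensorDataQoS(),
    [](sensor_msgs::msg::LaserScan::SharedPtr) {/* buffer the scan */},
    options);

  // Dedicated executor + thread that services only the isolated group.
  rclcpp::executors::SingleThreadedExecutor group_executor;
  group_executor.add_callback_group(
    isolated_group, node->get_node_base_interface());
  std::thread group_thread([&group_executor]() {group_executor.spin();});

  // Everything else on the node is serviced by the usual executor.
  rclcpp::spin(node);

  group_executor.cancel();
  group_thread.join();
  rclcpp::shutdown();
  return 0;
}
```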
So if we say that we don't think having all of the layers on the same executor is a problem (which I think it's not, given how current layers are implemented, plus ROS 1 had everything on a single thread), is there anything blocking this PR? It sounds like we're going to kick adding individual callback groups / executors per-layer down the road until there's an actual issue that needs to be resolved. I suspect it'll come up sometime in the future, but I don't think we need to proactively over-design for something that a user would have experienced before this PR and in ROS 1 anyway. |
Nothing, this PR is complete.

I agree, we can ignore it for now and solve this problem when it occurs. |
Please pull in main / rebase, there are tests failing that I want to see gone before merging.
(force-pushed from fe68af7 to 972c433)
The test is failing. Edit: Still failing. |
(force-pushed from eb3c794 to d7ab41c)
Sorry, this is my mistake. I reproduced this failed result on my local computer by running the test multiple times. When I compared the modified code with the code from before, I found I had added some unnecessary modifications. After updating, I tested again locally. |
No worries!
Again, huge thanks for your help here. This is really fantastic, your work here has massively improved my own understanding of callback groups and executors, and it was a great opportunity to further improve the code quality of Nav2 toward best practices for all ROS 2 code. This is really nice to finally have in, and thanks very much for sticking with it and contributing to rclcpp and TF! You never know the places you go in open source 😉 |
I think the last bit here is to remove the option for creating an extra
This PR seems to break drive on heading, back up, and assisted teleop. Specifically, it seems to break the CostmapTopicCollisionChecker. I will look into this tomorrow and hopefully provide some more information! :)
Sorry, I tested this on Ubuntu 20.04 with some modifications and it worked well. Then I updated my local environment and tested on Ubuntu 22.04 locally, and there were some weird problems, but I'm not sure why. I will spend some time to handle it. @jwallace42 |
* use a dedicated callback group for costmap plugin and filter instead of inner node
  Signed-off-by: zhenpeng ge <zhenpeng.ge@qq.com>
  update comment
  Signed-off-by: zhenpeng ge <zhenpeng.ge@qq.com>
* remove unnecessary modifications
  Signed-off-by: zhenpeng ge <zhenpeng.ge@qq.com>
Nice work! Just came across this PR as I am in the process of switching to Humble.

@gezp
Signed-off-by: zhenpeng ge zhenpeng.ge@qq.com
Basic Info
Description of contribution in a few bullet points
As described in #816, `rclcpp_node_` and `client_node_` in `class Costmap2DROS` need to be removed; you can find more details (other nodes which need to be removed) in #816 (comment).

- `client_node_` isn't used, so just remove it.
- `rclcpp_node_` is used for `TransformListener` and `MessageFilter`, the same as Reduce amcl_node nodes #2483, so just use `shared_from_this()` instead of `rclcpp_node_`.
- `rclcpp_node_` is also used by costmap plugins and filters (only `ObstacleLayer`). `Layer::initialize()` is changed: remove the argument `client_node`, which is unused, and replace `rclcpp_node` with `callback_group`.
- Use the dedicated `callback_group` to create ROS subscribers in all costmap plugins and filters.

Test

- colcon test
- navigation demo test (turtlebot3 in gazebo)

Future work that may be required in bullet points

The way the costmap plugins and filters work is not changed, and all costmap plugins and filters only use a single dedicated `callback_group` (before this PR, the default callback group of `rclcpp_node_` was used) for their subscribers, but there may be a potential problem with this. Different approaches can be considered to allow parallel subscribers and avoid potential deadlock in costmap plugins and filters:

- one `callback_group` with a single-threaded executor
- one `callback_group` with a multi-threaded executor
- one `callback_group` with a single-threaded executor for each costmap plugin and filter (isolated)

For Maintainers: