Reduce amcl_node nodes #2483
@ruffsl it looks like the underlay failed to build and use the message filters package listed in the underlay.repos file. I see in the CI job that it recognizes it's there, but I don't see any logging that it was built.
The underlay appears to have built correctly, but it seems that a dependency was added without also amending the respective package.xml and CMake files. Also, I think the message filters package already ships with Rolling, so you could leave it commented out in the repos file and let rosdep install it from apt for the overlay workspace.
AMCL does have the callback for the subscriber. It looks like the TF buffer has a constructor argument for a node that defaults to creating a new node just to provide a getFrames service. This doesn't have as clear a solution, and it feels out of place. Shouldn't that be in the TF listener instead? It seems odd to me that the buffer would have a ROS interface at all. It should have the utility functions to get the set of frames, but the actual service should reside in the listener that uses the buffer. That would also remove the need for a node in the TF buffer entirely. The TF listener, in turn, creates a node internally to spin and work with. This has a clearer solution, described below.
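A minimal stdlib-only C++ sketch of the split proposed above: the buffer exposes only a pure utility for listing frames, and a listener-side wrapper owns the service. FrameBuffer, FrameService and their methods are hypothetical stand-ins, not the tf2_ros API; a real service callback would fill a response message rather than return a string.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical buffer: stores transforms and offers utility queries,
// but owns no node, publisher, or service.
class FrameBuffer {
 public:
  void setTransform(const std::string& child, const std::string& parent) {
    parent_of_[child] = parent;
  }

  // Pure utility: all known child frames, sorted because std::map is ordered.
  std::vector<std::string> allFrames() const {
    std::vector<std::string> frames;
    for (const auto& entry : parent_of_) {
      frames.push_back(entry.first);
    }
    return frames;
  }

 private:
  std::map<std::string, std::string> parent_of_;
};

// Hypothetical listener-side wrapper: the listener already has a node to spin,
// so it is the natural owner of a "get frames" service built on the buffer.
class FrameService {
 public:
  explicit FrameService(std::shared_ptr<const FrameBuffer> buffer)
  : buffer_(std::move(buffer)) {}

  // Stands in for the service callback.
  std::string handleGetFrames() const {
    std::string out;
    for (const auto& frame : buffer_->allFrames()) {
      out += frame + "\n";
    }
    return out;
  }

 private:
  std::shared_ptr<const FrameBuffer> buffer_;
};
```

With this split, only FrameService would ever need a node; FrameBuffer can be used and tested without any ROS interfaces.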
Agreed. For now, I think it's worth letting TF do its own thing and control its own node until we find a solution on that front. I looked over these files https://github.com/ros2/geometry2/blob/ros2/tf2_ros/src/transform_listener.cpp#L68-L91 and I think there's a really reasonable change that can be made here to solve our problems, at least as it relates to the listener.
@ruffsl What do you mean? No new packages were added. We just want to use a newer
@ruffsl could you give me more details? I don't know how to fix the CI error.
Ah, OK. I'm on my phone, so I didn't browse the code outside of the PR changes, and just assumed that the package added to the repos file was still undeclared in the manifests. Nevertheless, the underlay seems to be building fine:
@gezp, what distro or source are you developing this PR with locally? Are you using the latest Rolling release, or are you building everything from ros2 master? Check whether the Nav2 project Dockerfile builds fine locally for your PR:
I don't know if tf2 has been updated yet to make use of the fact that you can add callback groups to dedicated executors. You might want to bring it up on the geometry2 issue tracker, or provide a pull request to that effect (rather than it having its own node). I don't use tf2 often enough to answer your architectural question about the buffer interface and how it should or shouldn't depend on ROS.
OK @gezp, can you open the PR to implement what I describe above for the listener and open a ticket on the listener / buffer node usages? We can continue the discussion with Tully over there. For now, let's leave TF alone to do its own thing and fix it over there instead. I'd hold off on moving the getFrames service; they might have a good reason for it, and we should take their feedback and come up with a compromise that meets everyone's core needs.
@SteveMacenski I opened a ticket, ros2/geometry2#440, and a PR, ros2/geometry2#442, for the listener.
@ruffsl Thank you, I will wait for the PR (ros/rosdistro#30350) to be merged so that I can use the feature I need on the latest Rolling release.
@gezp those don't get released to Rolling immediately; it requires a sync. The last sync was only a couple of weeks ago, so it might be more than another month before we get another one. We should add the master branch to the underlay and get this building so we can keep moving forward. In parallel, we should also keep nudging the TF issue / PR forward. For now, we should list every place that needs to be addressed in the ticket's bullet point list and move on to non-TF problems in Nav2, while working on the TF problems in TF itself at the same time.
I guess the … I installed ROS 2 Galactic from binaries on my local computer and deleted … @ruffsl I can't wait for the next ROS Rolling release, and I don't know how to get CI to pass. Could you give me some suggestions?
I'm not sure I understand why the message filters package can't be overlaid in this particular case. Being a core package shouldn't by itself prevent this. One can override an existing package by including it in a workspace; upon sourcing the setup files, the package from the top-most overlay should take environmental precedence over any duplicates in the underlay workspaces. Complications can occur when linking to dependencies that were compiled against a different version of the common dependency from a deeper underlay, but in this case, from how you described your current development (monkey patching Galactic, with the overlay confined to navigation2), that doesn't seem to be an issue. Is there anything else you're including in your underlay besides the master branch of the message filters package, such as the master branches of its own dependencies?
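The precedence rule described above can be modelled as an ordered search in which the top-most workspace that provides a package wins. This is a simplified sketch under that assumption: Workspace and resolvePackage are hypothetical names, and in practice the order comes from environment variables such as AMENT_PREFIX_PATH set by the sourced setup scripts.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical description of one workspace in the overlay chain.
struct Workspace {
  std::string prefix;              // install prefix, e.g. "/opt/overlay_ws/install"
  std::set<std::string> packages;  // packages installed under that prefix
};

// chain[0] is the top-most overlay; later entries are deeper underlays.
// Returns the prefix that provides `package`, or std::nullopt if none does.
std::optional<std::string> resolvePackage(const std::vector<Workspace>& chain,
                                          const std::string& package) {
  for (const auto& ws : chain) {
    if (ws.packages.count(package) > 0) {
      return ws.prefix;  // first match wins: the overlay shadows the underlays
    }
  }
  return std::nullopt;
}
```

The model only decides which copy is found first; it says nothing about which library a binary built against a deeper underlay was compiled and linked against, which is where the complications mentioned above come from.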
I only put … currently. I tried to use Docker to build Nav2. I added …:

```
--- stderr: nav2_rviz_plugins
CMake Warning at CMakeLists.txt:50 (add_library):
  Cannot generate a safe runtime search path for target nav2_rviz_plugins
  because there is a cycle in the constraint graph:

    dir 0 is [/opt/overlay_ws/install/nav2_msgs/lib]
    dir 1 is [/opt/ros/rolling/lib]
      dir 7 must precede it due to runtime library [libmessage_filters.so]
    dir 2 is [/opt/overlay_ws/install/nav2_lifecycle_manager/lib]
    dir 3 is [/opt/overlay_ws/install/nav2_util/lib]
    dir 4 is [/opt/ros/rolling/opt/yaml_cpp_vendor/lib]
    dir 5 is [/opt/ros/rolling/opt/rviz_ogre_vendor/lib]
    dir 6 is [/opt/ros/rolling/lib/x86_64-linux-gnu]
    dir 7 is [/opt/underlay_ws/install/message_filters/lib]
      dir 1 must precede it due to runtime library [libmessage_filters.so]

  Some of these libraries may not be found correctly.
---
```

Then I modified the Dockerfile and added some commands before line 70:

```
# before . /opt/ros/$ROS_DISTRO/setup.sh
# remove message_filters
RUN rm -rf /opt/ros/rolling/include/message_filters && \
    rm -rf /opt/ros/rolling/share/message_filters && \
    rm -rf /opt/ros/rolling/libmessage_filters.so
```

The above warning still exists, but I can build the Dockerfile successfully this way. So do I need to change CI, or do you have a better way? @ruffsl
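The warning means CMake could not order the runtime search path: one constraint wants dir 1 (/opt/ros/rolling/lib) after dir 7 (the underlay's message_filters), another wants the reverse, because both directories contain libmessage_filters.so. A small sketch of why no valid order exists, modelled as cycle detection in a "must precede" graph; this is an illustration, not CMake's actual algorithm, and PrecedenceGraph and hasCycle are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <vector>

// Edge a -> b means "directory a must precede directory b in the search path".
using PrecedenceGraph = std::map<int, std::vector<int>>;

// Returns true if the constraints contain a cycle, i.e. no valid order exists.
bool hasCycle(const PrecedenceGraph& graph) {
  // 0 = unvisited, 1 = on the current DFS path, 2 = finished
  std::map<int, int> state;
  std::function<bool(int)> visit = [&](int node) {
    state[node] = 1;
    auto it = graph.find(node);
    if (it != graph.end()) {
      for (int next : it->second) {
        if (state[next] == 1) return true;  // back edge: cycle found
        if (state[next] == 0 && visit(next)) return true;
      }
    }
    state[node] = 2;
    return false;
  };
  for (const auto& entry : graph) {
    if (state[entry.first] == 0 && visit(entry.first)) return true;
  }
  return false;
}
```

With both copies present the two constraints form the cycle 1 -> 7 -> 1; if only one copy of the library were visible, only one of the edges would remain and an order would exist.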
Mhm, that is complicated. I think the only real solution would be building it all from absolute scratch and replacing message_filters with our branch at that point. That's probably not realistic for us to do in CI, so the only realistic option I see is to wait for a new Rolling sync on this PR and just move on to other topics in the meantime.
https://discourse.ros.org/t/preparing-for-rolling-sync-2021-08-05/21701 Good news! A sync is happening tomorrow.
The new release of ROS Rolling is available, ros rolling/2021-08-06, but the Docker image
The CI image should be kept up to date using this action that checks for apt upgrades.
https://github.com/ros-planning/navigation2/runs/3277796714?check_suite_focus=true
I'm not sure why this hasn't triggered the image rebuild in the next step; I suspect there might still be a bash syntax issue or a GitHub Actions error in the config file.
Update: this should help to update the CI image tonight #2500
This is building now, but I think we are still blocked here with respect to TF, or not?
I don't think we should wait for TF. Since TF should spin in its own thread (to avoid delays or blocking), we needn't consider how TF implements that (an internal node, or a callback group + executor). If TF is updated, we just need a small change in Nav2 (using another constructor of the TF listener).
Wouldn't the services / map / initial pose callbacks block the laser scan thread as this PR stands? I think I need to revisit the TF code. I'm not concerned about giving the buffer the clock from this node, and the TF broadcaster just makes a publisher with the node, but I do want to understand what the create timer ROS function is doing with the timer / base interfaces.
There is no blocking in the services / map / initial pose callbacks
subscriptions:
Yeah, it was an issue before, but it's different now because the services are involved. If you followed my example, my main concern was longer-running things (services) blocking long enough to cause a flood of message filter warnings / errors. The TF / laser scan issue is also there, but it isn't concerning to me because, as you mentioned, it's been like that for about 2 years. The no-update service is pretty trivial and not an issue, but the global relocalization one is more of one. Can we separate the services onto another executor? Do you think that's a rational move (it would then probably need some shared resource protection)? It's not the best solution in my mind, though. The best solution to me is to isolate TF from the rest of the node's interfaces so that user services or data callbacks can't interfere with it. If that were the case, we shouldn't have an issue with message filters being unable to transform, and we'd just be dropping laser scans during long-running services, which is the right behavior.
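To see why a long-running service matters, here is a deliberately simplified model of a single-threaded executor, assuming ready callbacks simply run one at a time in the order they became ready. SingleThreadedModel and timerLatencyBehindService are hypothetical, and this is not rclcpp's scheduling logic; it only shows that a TF / message filter timer queued behind a slow service cannot start until the service returns.

```cpp
#include <cassert>
#include <chrono>
#include <deque>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical model: ready callbacks run to completion, one after another,
// on the thread that calls spinSome().
class SingleThreadedModel {
 public:
  void post(std::function<void()> callback) { ready_.push_back(std::move(callback)); }

  void spinSome() {
    while (!ready_.empty()) {
      std::function<void()> callback = std::move(ready_.front());
      ready_.pop_front();
      callback();
    }
  }

 private:
  std::deque<std::function<void()>> ready_;
};

// Returns how long a timer callback waited behind a service callback that
// takes `service_time` to finish.
std::chrono::milliseconds timerLatencyBehindService(std::chrono::milliseconds service_time) {
  using Clock = std::chrono::steady_clock;
  SingleThreadedModel executor;
  Clock::time_point timer_ran_at;
  const Clock::time_point timer_ready_at = Clock::now();
  // The service became ready first, e.g. a global relocalization request.
  executor.post([service_time] { std::this_thread::sleep_for(service_time); });
  // The TF / message filter timer became ready right after it.
  executor.post([&timer_ran_at] { timer_ran_at = Clock::now(); });
  executor.spinSome();
  return std::chrono::duration_cast<std::chrono::milliseconds>(timer_ran_at - timer_ready_at);
}
```

In this model the timer's latency is at least the service's run time, regardless of how short the timer callback itself is.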
Which services do I need to separate? Do you mean
Yes, though as I said below, I think the best solution is sufficiently isolating TF.
Do you mean
It doesn't have to do with that; it has to do with the services blocking the message filter from doing its thing, via the TF timer sharing execution with the services / topics / other interfaces, if my understanding of the situation is correct. This will be a similar issue with Costmap2D's PR, which is why I recommend looking for a solution in TF / message filters instead, so that we have a general solution to the core concern rather than patching it for this particular situation. TFs
Sorry, TF's message filter is difficult for me to understand. In my understanding, when a laser_scan message is received:
If you want to isolate the TF timer, I have an idea for how to do it:
In my opinion, we needn't isolate the TF timer, because the timer is only used to cancel TransformableRequests (I may be wrong), so cancelling with a delay isn't an issue.
Thus, a service blocking the executor could easily disrupt the message filter: it would throw exceptions from timeouts and emit lots of errors / warnings to the logs whenever the transform is not immediately available, which is quite possible, even probable, at times when a message filter has to wait a bit for a valid transform.
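One way to picture this failure mode: a message filter keeps incoming scans pending until their transform is available and drops them once they have waited past a timeout. The sketch assumes expiry is only checked when a timer callback gets to run; PendingScans is a hypothetical class, and the real tf2_ros::MessageFilter logic is more involved. It shows that a delayed check turns steady handling into a burst of drops.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical model of a message filter's pending queue.
class PendingScans {
 public:
  explicit PendingScans(std::int64_t timeout_ms) : timeout_ms_(timeout_ms) {}

  // A scan arrives and waits for its transform.
  void add(int scan_id, std::int64_t arrival_ms) { pending_.push_back({scan_id, arrival_ms}); }

  // Timer callback: removes and returns the ids of scans that have waited at
  // least timeout_ms (the real filter would log a warning for each one).
  std::vector<int> expire(std::int64_t now_ms) {
    std::vector<int> dropped;
    std::vector<Entry> kept;
    for (const auto& entry : pending_) {
      if (now_ms - entry.arrival_ms >= timeout_ms_) {
        dropped.push_back(entry.id);
      } else {
        kept.push_back(entry);
      }
    }
    pending_ = std::move(kept);
    return dropped;
  }

  std::size_t size() const { return pending_.size(); }

 private:
  struct Entry {
    int id;
    std::int64_t arrival_ms;
  };
  std::int64_t timeout_ms_;
  std::vector<Entry> pending_;
};
```

For example, with a 100 ms timeout and scans arriving every 20 ms, a timer check at 50 ms drops nothing, while a check delayed to 300 ms by a busy executor drops all five scans at once.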
I have updated the PR to isolate the TF timer:
Please review again. @SteveMacenski
What you suggested is not unreasonable, but I'd like to focus on ros2/geometry2#446 to fix the underlying issue. I'd be curious whether you could find a clean way to fix this in TF itself (maybe take in a node and put the timer interfaces in their own callback group / executor on a background thread), like we did with the other TF PR. That would have a significantly larger long-term impact on the community, preventing issues like this from reappearing elsewhere. I would much rather solve this problem once and for all than have to rethink it every time we add a new server to Nav2 or build applications. It doesn't seem very clean to give TF the "main" executor for the node and relegate the actual application code to another executor to keep TF happy.
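The direction suggested here, keeping TF's timer and subscription callbacks on their own executor thread so application callbacks cannot starve them, can be sketched with the standard library alone. BackgroundExecutor and serviceSeesTfProgress are hypothetical; in rclcpp the analogous pieces would be a callback group created with create_callback_group(..., false) (so it is not automatically added to the node's executor), added to a separate executor with add_callback_group(), and spun in its own thread.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical model of an executor running its callbacks on its own
// background thread. Not the rclcpp implementation.
class BackgroundExecutor {
 public:
  BackgroundExecutor() : worker_([this] { run(); }) {}

  ~BackgroundExecutor() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cv_.notify_one();
    worker_.join();  // remaining queued callbacks still run before the join returns
  }

  void post(std::function<void()> callback) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push_back(std::move(callback));
    }
    cv_.notify_one();
  }

 private:
  void run() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (true) {
      cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
      if (queue_.empty()) {
        return;  // stopping, and nothing left to run
      }
      std::function<void()> callback = std::move(queue_.front());
      queue_.pop_front();
      lock.unlock();  // run the callback without holding the lock
      callback();
      lock.lock();
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> queue_;
  bool stopping_ = false;
  std::thread worker_;  // declared last so the members above exist before run() starts
};

// The calling thread plays the application executor, busy inside a service
// callback that waits for evidence that a TF-side callback has run.
bool serviceSeesTfProgress(std::chrono::milliseconds max_wait) {
  std::promise<void> tf_ran;  // declared before the executor so it outlives it
  std::future<void> tf_ran_future = tf_ran.get_future();
  BackgroundExecutor tf_executor;
  tf_executor.post([&tf_ran] { tf_ran.set_value(); });  // e.g. a TF timer or subscription callback
  return tf_ran_future.wait_for(max_wait) == std::future_status::ready;
}
```

If the TF callback were instead queued on the same single thread that is busy in the service, the wait would simply time out: that is the starvation the thread is trying to rule out.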
This pull request is in conflict. Could you fix it @gezp?
@gezp where are we on this? Can you ping the ticket / PR that is blocking this work from being completed? I really don't want this to continue sitting around when it could be resolved.
@SteveMacenski, it's blocked by ros2/geometry2#447.
OK, I poked them and it should be merged within a day.
Merged!
@gezp, your PR has failed to build. Please check CI outputs and resolve issues.
CI failed; we need to wait for a new Rolling sync to get the feature from ros2/geometry2#447.
Signed-off-by: zhenpeng ge <zhenpeng.ge@qq.com>
add callback group and executor
Signed-off-by: zhenpeng ge <zhenpeng.ge@qq.com>
update
Signed-off-by: zhenpeng ge <zhenpeng.ge@qq.com>
CI passed; please review this PR again. @SteveMacenski
Thanks! Nice to get this in finally 😄 I think there's just 1 more pending to complete this all. Thanks for being so on top of it once we got back on Rolling!
Basic Info
Description of contribution in a few bullet points
- As described in #816, rclcpp_node_ in class AmclNode needs to be removed; you can find more details (other nodes which need to be removed) in #816 (comment).
- … message_filters::Subscriber in LifecycleNode, and update tools/underlay.repos to sync the feature.
- … AmclNode, so just use shared_from_this() instead of rclcpp_node_ for Transform and MessageFilter.
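The last bullet works because the lifecycle node is already owned by a std::shared_ptr, so it can pass itself to TF and the message filter with shared_from_this() rather than keeping a second rclcpp_node_. A stdlib-only sketch of that pattern follows; LifecycleNodeModel and FilterModel are hypothetical stand-ins (rclcpp_lifecycle::LifecycleNode itself derives from std::enable_shared_from_this). The collaborator holds a std::weak_ptr to avoid an ownership cycle, which is a choice of this sketch, not necessarily what Nav2 or message_filters do.

```cpp
#include <cassert>
#include <memory>
#include <utility>

class LifecycleNodeModel;

// Stand-in for a collaborator such as a TF buffer or message filter that
// needs a handle to the node it works for.
class FilterModel {
 public:
  explicit FilterModel(std::weak_ptr<LifecycleNodeModel> node) : node_(std::move(node)) {}
  std::shared_ptr<LifecycleNodeModel> node() const { return node_.lock(); }

 private:
  std::weak_ptr<LifecycleNodeModel> node_;  // weak: the node owns the filter
};

class LifecycleNodeModel : public std::enable_shared_from_this<LifecycleNodeModel> {
 public:
  // Called after construction (like on_configure): shared_from_this() is only
  // valid once a std::shared_ptr already owns this object.
  void configure() { filter_ = std::make_unique<FilterModel>(shared_from_this()); }
  const FilterModel* filter() const { return filter_.get(); }

 private:
  std::unique_ptr<FilterModel> filter_;
};
```

Calling configure() on an object not owned by a std::shared_ptr would throw std::bad_weak_ptr (since C++17), which is one reason this kind of wiring usually happens in a lifecycle transition such as on_configure rather than in the constructor.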
Test
- colcon test
- navigation demo test (turtlebot3 in gazebo)
Future work that may be required in bullet points
- TransformListener has an argument spin_thread with default value true, which creates an internal node to subscribe to the tf topic and a dedicated thread to spin this node. … ros2/geometry to discuss these.