Fast-RTPS services and network discovery regression (Local costmap not appearing or clear costmap not called) #1772
Comments
Is this the same rviz window or a new one on a new navigation launch? Can you verify that if you toggle the rviz display types for the costmap (or relaunch rviz) that it appears? I think what you're seeing has nothing to do with navigation but rather a failure in the visualization tools. Even 0s would show up here with a boundary because of the changes in transparency between the 2 costmap settings. I think you would see 0s on the costmap in your pictures if the costmap were actually being shown. Buster is also not a Tier 1 supported OS, so it may be that the DDS vendors / RMW layers don't do detection properly on that or something. Not sure it's related, but it could certainly be. I was 4/4 in launching them just now, so you might want to take a second look and make sure what's happening is what you think is happening. |
I face this issue too on Ubuntu 18.04 so it shouldn't be related to Buster. |
I looked into this a bit more: the local costmap is set to all 0s for some reason. As @dkuenster said, this happens more than 50% of the time when starting up the simulation. When this happens, only the static layer seems to be working for the global costmap as well. The local costmap also has non-zero values if a static layer is added to it, and it then shows up every time. I'm not completely sure what exactly the error with the rest of the layers is; I'm trying to look into it. |
Your image makes it seem like the laser is up and running given that I see some red in the center pole that is off the map (from robot localization quality and laserscans). But I get your point if that happens but this isn't a good example of that. Let me know what you find out. |
It was a new window with a new navigation launch. When switching the visualization, I get the same result that can be seen in the screenshot of @naiveHobo. |
I also found that each time the Local Costmap doesn't appear, the Controller only gets 0 as initial velocity in the twist message of the computeVelocityCommand, despite the robot moving and the odom topic containing the correct velocities. On a start where the Local Costmap starts correctly on the other hand the actual current velocity gets passed to the controller. The pose parameter however works correctly in both cases. |
While echo constantly shows msgs on the "odom" topic in both cases, the OdomSubscriber in the Controller gets messages on some starts and on others the callback method never gets called. Each time it doesn't get messages, we also get the problem with the local costmap plugins, as soon as we set the initial pose. |
Same problem with the LaserScanSubscriber in the Obstacle Layer. On the starts where the OdomSubscriber callback never gets called, the callback in the LaserScan subscriber also doesn't get called despite echo showing messages on "scan". |
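The symptom described above (echo shows messages, but a node's callback never fires) can be checked outside the navigation stack with a small probe node. This is a minimal sketch, not code from navigation2: the node name `subscription_probe` and the helper names are hypothetical, the topic names `odom` and `scan` are taken from the comments above, and the sensor-data QoS profile is an assumption chosen because a best-effort subscriber matches both reliable and best-effort publishers.

```python
import time
from collections import Counter


class CallbackCounter:
    """Counts how many times each topic's callback has fired."""

    def __init__(self):
        self.counts = Counter()

    def callback_for(self, topic):
        # Return a callback that records one hit for `topic` per message.
        def _on_message(_msg):
            self.counts[topic] += 1
        return _on_message

    def silent_topics(self, topics):
        # Topics whose callback never fired during the observation window.
        return [t for t in topics if self.counts[t] == 0]


def probe(duration_sec=5.0):
    """Subscribe to odom and scan for a while and return the silent topics.

    Needs a sourced ROS 2 environment. The ROS imports are local so the
    counter above stays usable (and testable) without ROS installed.
    """
    import rclpy
    from rclpy.qos import qos_profile_sensor_data
    from nav_msgs.msg import Odometry
    from sensor_msgs.msg import LaserScan

    counter = CallbackCounter()
    rclpy.init()
    node = rclpy.create_node("subscription_probe")  # hypothetical node name
    node.create_subscription(
        Odometry, "odom", counter.callback_for("odom"), qos_profile_sensor_data)
    node.create_subscription(
        LaserScan, "scan", counter.callback_for("scan"), qos_profile_sensor_data)

    # Spin for a fixed window so discovery has time to complete.
    deadline = time.monotonic() + duration_sec
    while time.monotonic() < deadline:
        rclpy.spin_once(node, timeout_sec=0.1)

    node.destroy_node()
    rclpy.shutdown()
    return counter.silent_topics(["odom", "scan"])
```

Calling `probe()` while the simulation is running and getting back a non-empty list, while echo still prints messages on those topics, would point at discovery rather than at the costmap plugins.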
Just to verify, what you're describing are specific instances of topics that are being published that have not yet connected to the costmaps, correct? Can you try seeing if switching DDS vendors to Cyclone DDS resolves those issues? I'm wondering if there was a regression or an issue with the local discovery with Fast-RTPS. What version of ROS2 are you on right now (eloquent, master, foxy, etc)? |
Yes.
Switching to Cyclone DDS indeed solves this problem. |
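For anyone repeating the vendor comparison, the switch is made through the `RMW_IMPLEMENTATION` environment variable. Below is a minimal sketch that only builds the command and environment. The function name is hypothetical, and `nav2_bringup` as the package providing `tb3_simulation_launch.py` is an assumption, since the report names only the launch file. The chosen RMW package has to be installed for the switch to take effect.

```python
import os


def launch_with_rmw(rmw, base_env=None):
    """Return (command, env) for launching the TB3 simulation with a given RMW."""
    # Copy the environment so the caller's process environment is untouched.
    env = dict(os.environ if base_env is None else base_env)
    # RMW_IMPLEMENTATION selects the middleware for processes started with this env.
    env["RMW_IMPLEMENTATION"] = rmw
    # Assumption: the launch file from the report lives in nav2_bringup.
    command = ["ros2", "launch", "nav2_bringup", "tb3_simulation_launch.py"]
    return command, env
```

Passing the result to `subprocess.run(command, env=env)`, once with `rmw_cyclonedds_cpp` and once with `rmw_fastrtps_cpp`, gives two otherwise identical runs to compare.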
Ah ok, yeah that appears to be the same issue at #1788 and ros2/ros2#931. Can you quickly verify that the commit eProsima/Fast-DDS@a9bd1a9 is the offender? If so, we can merge these 2 tickets together and track them. |
Yes, it works right until commit eProsima/Fast-DDS@d5c9d6b (the commit right before eProsima/Fast-DDS@a9bd1a9) and then breaks on eProsima/Fast-DDS@a9bd1a9 |
I'm rolling in the scope of #1788 into this one so we have 1 ticket per issue and renaming this issue to |
I checked this using commit 69977cd + current ros2 master, and running several experiments. For each experiment I followed this procedure:
As I work with Windows, I ran the experiments using VirtualBox to run Ubuntu Focal on a virtual machine. I have checked with rmw_cyclonedds_cpp and rmw_fastrtps_cpp. For the latter, I have checked with eProsima/Fast-DDS@b710b1f (current head of 2.0.x branch) as well as with eProsima/Fast-DDS@d5c9d6b. I have never been able to see the expected image. Sometimes rviz crashed. Other times I could correctly navigate, but the local costmap was not shown. A summary of the results so far...
My impression is that now that both implementations have workarounds to make services more reliable, this issue is always reproduced, so maybe there is something wrong in navigation2 that is now reproducibly failing. NB: It would be nice if someone could check this with RTI connext |
For rviz crashing, I can't help you on that unless it's a result of the navigation2 plugins, but I don't think that's the case. If you run with debug symbols and it's our fault, I'll look into it, but I think that's rviz. Keep in mind it's not just about the costmap showing up; the issue we're talking about is services, which those experiments don't do anything to measure. Services can be trivially tested without the navigation stack with some simple call-response nodes. @daisukes thoughts? I'm not read up on or tracking fast-rtps commits, so those hashes or the specific changes don't mean much to me (I'm an expert in robotics, not DDS/networking). Have you reproduced the service problem at all from the reports? That's the best starting point; I have also experienced it and we still see it in the navigation2 CI. Once you've reproduced the problem, I think it will be clearer to show that those changes actually fix the underlying problem. |
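A call-response check of the kind suggested above could look roughly like this. It is a sketch under stated assumptions, not the test used in navigation2 CI: the service name `probe_service`, the node name, and both helper names are hypothetical, `std_srvs/Trigger` is picked only because it needs no request fields, and a matching Trigger server has to be started separately.

```python
def success_rate(call_once, attempts):
    """Run `call_once` repeatedly; it returns True when a response arrived."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    successes = sum(1 for _ in range(attempts) if call_once())
    return successes / attempts


def make_trigger_call(service_name="probe_service", timeout_sec=2.0):
    """Build a `call_once` function backed by a std_srvs/Trigger client.

    Needs a sourced ROS 2 environment and a Trigger server on
    `service_name` (hypothetical name). ROS imports are local so
    `success_rate` stays usable without ROS installed.
    """
    import rclpy
    from std_srvs.srv import Trigger

    rclpy.init()
    node = rclpy.create_node("service_probe_client")  # hypothetical node name
    client = node.create_client(Trigger, service_name)

    def call_once():
        # A server that is never discovered counts as a failed call.
        if not client.wait_for_service(timeout_sec=timeout_sec):
            return False
        future = client.call_async(Trigger.Request())
        rclpy.spin_until_future_complete(node, future, timeout_sec=timeout_sec)
        # A request whose response never arrives also counts as a failure.
        return future.done() and future.result() is not None

    return call_once
```

Running `success_rate(make_trigger_call(), 100)` under each RMW implementation gives a single number per vendor, which measures the service behavior directly instead of inferring it from whether the costmap appears.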
As I investigated the commits of Fast-DDS, it worked fine until this commit.
We also had rviz2 crash if we use the latest binary (after June 25th), so we use the source build with rviz2 v8.1.1 not v8.2.0. |
Can you file a ticket on rviz2 for that if one doesn't exist? Make sure someone knows there's a problem. Thanks for the experiment and specification. That will definitely help clear things up. |
FYI: I made a ticket ros2/rviz#574 |
@SteveMacenski @daisukes It seems we found the issue. Could you give a try to eProsima/Fast-DDS#1295 ? |
@MiguelCompany I have built the branch and confirmed that the service_test works well and also my own simulation works well with |
@SteveMacenski As eProsima/Fast-DDS#1295 has been merged, and @daisukes checked correct behavior, I think this issue can be closed? |
@MiguelCompany has it been released into foxy? |
I don't think so, but I think we should ask @jacobperron about it. |
@naiveHobo there's been a foxy sync so this might be OK now |
Fast-DDS 2.0.0 is currently the version in Foxy. Once a 2.0.1 tag exists, we can make a new release containing eProsima/Fast-DDS#1295. |
@jacobperron |
@SteveMacenski @daisukes |
I confirmed it's been released now - closing. |
Bug report
Required Info:
Steps to reproduce issue
Use tb3_simulation_launch.py to start the gazebo simulation and nav2 stack. Then use "2D Pose Estimate" to localize the robot.
Expected behavior
Local Costmap should show every time, e.g:
Actual behavior
In more than 50% of all initializations the Local Costmap doesn't show, e.g.:
Echoing /local_costmap/costmap shows the costmap contains all zeroes despite the robot being in the same position as in the working case, where it contains actual values.
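Scanning the echoed output by eye is error-prone for a large grid, so the all-zero check can be done on the message's flat `data` array instead. A minimal sketch, assuming the topic carries a `nav_msgs/OccupancyGrid` and that its `data` field has already been obtained as a sequence of integers; the function name is hypothetical.

```python
from collections import Counter


def summarize_costmap(data):
    """Summarize the flat `data` array of a nav_msgs/OccupancyGrid message."""
    histogram = Counter(data)
    zero_cells = histogram.get(0, 0)
    return {
        "cells": len(data),
        "nonzero": len(data) - zero_cells,
        # An empty grid is reported as not all-zero so it is not mistaken
        # for the failure case described in this report.
        "all_zero": len(data) > 0 and zero_cells == len(data),
    }
```

For example, `summarize_costmap(msg.data)` inside a subscriber callback would report `all_zero` as `True` in the failing case and `False` in the working one.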
Rviz doesn't report any issues with the topics.
Additional information
console_output_empty_costmap.txt
console_output_working_costmap.txt
I can't find any differences or errors in the console output. Can anyone reproduce the issue or has any idea what is happening?