Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reconfiguring planner server fails #3768

Closed
Aposhian opened this issue Aug 21, 2023 · 10 comments · Fixed by #3786
Closed

Reconfiguring planner server fails #3768

Aposhian opened this issue Aug 21, 2023 · 10 comments · Fixed by #3786
Labels
bug Something isn't working

Comments

@Aposhian
Copy link
Contributor

Bug report

Required Info:

  • Operating System:
    • Ubuntu 22.04
  • ROS2 Version:
    • Humble
  • Version or commit hash:
    • 1.1.9
  • DDS implementation:
    • CycloneDDS

Steps to reproduce issue

ros2 launch nav2_bringup navigation.launch.py
ros2 lifecycle set /planner_server deactivate
ros2 lifecycle set /planner_server cleanup
ros2 lifecycle set /planner_server configure

Expected behavior

Planner server reconfigures with no issue

Actual behavior

Planner server throws an exception and crashes.

[planner_server-3] [INFO] [1692651875.999218146] [planner_server]: Configuring
[planner_server-3] [INFO] [1692651875.999248989] [global_costmap.global_costmap]: Configuring
[planner_server-3] [INFO] [1692651876.000645715] [global_costmap.global_costmap]: Using plugin "static_layer"
[planner_server-3] [INFO] [1692651876.000701514] [global_costmap.global_costmap]: Subscribing to the map topic (/map) with transient local durability
[planner_server-3] [INFO] [1692651876.000898492] [global_costmap.global_costmap]: Initialized plugin "static_layer"
[planner_server-3] [INFO] [1692651876.000919229] [global_costmap.global_costmap]: Using plugin "obstacle_layer"
[planner_server-3] [INFO] [1692651876.000967905] [global_costmap.global_costmap]: Subscribed to Topics: scan
[planner_server-3] [INFO] [1692651876.001283202] [global_costmap.global_costmap]: Initialized plugin "obstacle_layer"
[planner_server-3] [INFO] [1692651876.001304907] [global_costmap.global_costmap]: Using plugin "inflation_layer"
[planner_server-3] [INFO] [1692651876.001413722] [global_costmap.global_costmap]: Initialized plugin "inflation_layer"
[planner_server-3] terminate called after throwing an instance of 'std::runtime_error'
[planner_server-3]   what():  Node '/global_costmap/global_costmap' has already been added to an executor.
[planner_server-3] [INFO] [1692651876.003720405] [planner_server]: Created global planner plugin GridBased of type nav2_navfn_planner/NavfnPlanner

Additional information

I think the underlying reason for this might be that planner server runs costmap_ros in its own thread, while costmap_ros already creates its own thread.
https://github.com/ros-planning/navigation2/blob/6ef3d7bfd0f617bd0751cdbe7a58c3f9f66fe882/nav2_planner/src/planner_server.cpp#L88
https://github.com/ros-planning/navigation2/blob/6ef3d7bfd0f617bd0751cdbe7a58c3f9f66fe882/nav2_costmap_2d/src/costmap_2d_ros.cpp#L253

@SteveMacenski
Copy link
Member

SteveMacenski commented Aug 22, 2023

Thanks for the re-report that this is still an issue (I closed the original report of this since I thought the Humble sync's change resolved this).

while costmap_ros already creates its own thread

That thread is for the TF2 buffer updates and the plugins themselves, not the core costmap which the planner server version entails. You can note that by the callback group that it is given. The thread established in the planner/controller server deals with things like the footprint subscription/publication, the costmap publication, and clear costmap service.

While its a little messy, I was concerned that long-updates in the costmap utils would block updating of sensor data to the individual layers causing data to be dropped (which is bad for the costmap to do!). We might be able to re-add this all into one executor again and make costmap layers each themselves handle their own executor, but I decided at the time that N executors wasn't a great option in terms of compute overhead.

Anyway... its been like that for a long time now and I don't think that the fact that there are the 2 executors is the cause of any problem or else we'd see this happen on the first startup as well. The issue we have is on resetting only, so I think something's not being shutdown / reset that needs to be.

The update from 1.1.9 is that we moved the planner/controller server node threads to be in configure not the constructor so that resetting could work #3548 @BriceRenaudeau do you see this problem?

Perhaps we need to have the costmap configure happen after the costmap_thread_ creation in on_configure of the planner/controller server? Or there's something not being reset properly (maybe an explicit executor_ / callback_group_ reset?).

@SteveMacenski SteveMacenski added the bug Something isn't working label Aug 22, 2023
@BriceRenaudeau
Copy link
Contributor

We moved to Iron and we don't have this issue.

We are using the nav2_lifecycle_manager and nav2_lifecycle_manager_client to activate/reset the navigation nodes.

@SteveMacenski
Copy link
Member

@Aposhian can you confirm you don't see this on Iron? If so, that gives me a relatively easy diff to look at for the solution. I have Jury Duty next week (ugh) so I should have some downtime to look at this between events

@Aposhian
Copy link
Contributor Author

I can check on iron

@Aposhian
Copy link
Contributor Author

Aposhian commented Aug 24, 2023

Yes I can confirm this issue doesn't show up on iron, but it does on humble. I confirmed with the following for both humble and iron:

docker run -it --rm --name nav2-test ros:$ROS_DISTRO
apt update && apt install ros-$ROS_DISTRO-navigation2 ros-$ROS_DISTRO-nav2* ros-$ROS_DISTRO-rmw-cyclonedds-cpp
export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
ros2 launch nav2_bringup navigation_launch.py
docker exec -it nav2-test /ros_entrypoint.sh bash
ros2 lifecycle set /planner_server cleanup
ros2 lifecycle set /planner_server configure

@SteveMacenski
Copy link
Member

Did this not used to happen in Humble or is this just something new you're experiencing because you're trying something new (e.g. checking if a regression or just uncovering an issue)?

If uncovering, I suspect why Iron works is due to 4aaefad fixing some lifecycle transition items which wasn't backported to Humble due to ABI breaking changes. Its possible however that we could find another way to do it in Humble with less major changes.

If its a regression, I'm at a bit of a loss unless it has nothing to do with Nav2 at all but rclcpp. In which case, I'd ask you to try the second-to-last Humble release to see if its still an issue to show that its not a change we made but something lower-level. At that point, a trace would be the only way to see what line is specifically triggering and try to backtrack from there

@Aposhian
Copy link
Contributor Author

I'm not sure if this is something new to the latest humble release. This was triggered by controller_server segfaulting unexpectedly and triggering a lifecycle reset of everything. I'll file an issue on that separate cause when I have more info.

@SteveMacenski
Copy link
Member

I'm going to play with this this afternoon and figure something out

@SteveMacenski
Copy link
Member

@Aposhian try this on for size, seems to work for me: #3786

@SteveMacenski
Copy link
Member

Closing, issue resolved

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants