Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch nav2_map_server to use modern CMake idioms. #4557

Merged

Conversation

clalancette
Copy link
Contributor


Basic Info

Info Please fill out this column
Ticket(s) this addresses Follow up to #4357
Primary OS tested on Ubuntu 24.04
Robotic platform tested on N/A
Does this PR contain AI generated software? No

Description of contribution in a few bullet points

This commit does a number of things:

  1. Switches to using target_link_libraries in nav2_map_server, which gives us finer-grained control over what dependencies are exported to downstream as public or private.
  2. Moves the include directory down one level to include/${PROJECT_NAME}, which is best practice in ROS 2 since Humble.
  3. Makes sure to export nav2_map_server as a CMake target, so downstream users of it can use that target.
  4. Adds in lifecycle_msgs as a dependency.
  5. Possibly most controversially, changes one test. While running the tests I noted some odd output from test_map_saver_cli.cpp. In particular, it was complaining that __node:=map_saver_test_node was deprecated. That is true, but adding in --ros-args --remap didn't work, and that is because a space was missing from the front of that. With that change now in place, I also had to update the return value from the command, since it now returns 256 instead of 65280.

Description of documentation updates required from your changes

None needed


Future work that may be required in bullet points

The rest of the packages in this repository need a similar treatment (PRs will be upcoming).

For Maintainers:

  • Check that any new parameters added are updated in docs.nav2.org
  • Check that any significant change is added to the migration guide
  • Check that any new features OR changes to existing behaviors are reflected in the tuning guide
  • Check that any new functions have Doxygen added
  • Check that any new features have test coverage
  • Check that any new plugins is added to the plugins page
  • If BT Node, Additionally: add to BT's XML index of nodes for groot, BT package's readme table, and BT library lists

Signed-off-by: Chris Lalancette <clalancette@gmail.com>
nav2_map_server/CMakeLists.txt Show resolved Hide resolved
return_code = system(command.c_str());
EXPECT_EQ(return_code, 65280);
EXPECT_EQ(return_code, 256);
Copy link
Member

@SteveMacenski SteveMacenski Jul 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No note assuming CI test is happy, but noting this for myself to check. Map server package seems to have failed its CI tests, but still running so I can't see where yet

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it looks like there are bunch of failing external symbols in the tests. That means that something is wrong with this PR. I'll take a closer look and see if I can reproduce locally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I can't reproduce, and looking at the code again there is nothing obviously wrong jumping out at me.

I'm not that familiar with CircleCI, but in the job below it never seems to run the "Setup Workspace" or "Build Workspace" steps that are listed in https://github.com/ros-navigation/navigation2/blob/main/.circleci/config.yml . Looking at those logs would be helpful for me to try and reproduce.

@SteveMacenski Do you know why those steps aren't present in that run, and/or how I might get access to those logs?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The release_build job below does the building of the main software stack (system_build does the system tests separately). The job that's failing (release_test) is only doing the colcon test bits on those built and cached artifacts from the builds.

You can find the build artifacts here: https://app.circleci.com/pipelines/github/ros-navigation/navigation2/12291/workflows/ec12a613-f542-4479-9c2b-640744e3fbaf/jobs/37306/artifacts
You can find the test artifacts here: https://app.circleci.com/pipelines/github/ros-navigation/navigation2/12291/workflows/ec12a613-f542-4479-9c2b-640744e3fbaf/jobs/37308/artifacts

I'm also retriggering the build from the start to validate its not a fluke, but it does seem like its a problem

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something else worth trying is to update v26 to v27 in the 3x places in the circle config file to break the cache which may contain the previous works and rebuild from scratch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks for pointing out where it was built. Following that, I was able to see that the CircleCI job uses a number of mixins, and compiling locally with those mixins I can reproduce. Now to see what is going on here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All right, I found the issue. 66e300b should fix it.

@SteveMacenski
Copy link
Member

SteveMacenski commented Jul 24, 2024

@clalancette unrelated to this PR, but wanted to point something out since it relates to the issues we reported during the Jazzy migration:

The CI test failure we see in goal_updater_node that you remarked on in the last PR seems to be caused by spin-ing not working the first time its called. When we duplicate the spin_all at https://github.com/ros-navigation/navigation2/blob/main/nav2_behavior_tree/plugins/decorator/goal_updater_node.cpp#L60, it works again. This is aligned with what we had to do in some tests during the migration to get them to work on Jazzy. Perhaps its just spin_all with a timeout that's broken? Not sure, but a note in case this rhythms with other reports in rclcpp.

Blaming Executor::spin_some_impl(std::chrono::nanoseconds max_duration, bool exhaustive), I see there's been alot of poking around there recently in the last few months, and a PR to fix some other regression in May, which makes me think maybe there's another regression lurking in that corner.

Test A:

  callback_group_executor_.spin_all(std::chrono::milliseconds(200));

Pushing the 50ms up to 200ms still doesn't work

Test B:

  callback_group_executor_.spin_all(std::chrono::milliseconds(50));
  callback_group_executor_.spin_all(std::chrono::milliseconds(1));

This works

Test C:

   callback_group_executor_.spin_some(std::chrono::milliseconds(50));

Attempting to use spin some instead of spin all also fails

Test D:

Lets try ticking the node multiple times while having the single spin_all(50ms) (which would cause the spin to be called multiple times)

  tree_->rootNode()->executeTick();
  tree_->rootNode()->executeTick();
  tree_->rootNode()->executeTick();

This works


Thus, I can change it to (#4558):

  callback_group_executor_.spin_all(std::chrono::milliseconds(1));
  callback_group_executor_.spin_all(std::chrono::milliseconds(49));

to keep the same behavior and get it working again, but this is worth noting to someone on the ROS 2 core side

It requires it for proper operation.

Signed-off-by: Chris Lalancette <clalancette@gmail.com>
@clalancette
Copy link
Contributor Author

Blaming Executor::spin_some_impl(std::chrono::nanoseconds max_duration, bool exhaustive), I see there's been alot of poking around there recently in the last few months, and a PR to fix some other regression in May, which makes me think maybe there's another regression lurking in that corner.

Yes, there has been a lot of poking at that recently.

Something that is new-ish here is ros2/rclcpp#2517, which was merged back on May 2nd, but may or may not have been released until recently (it is a little hard to tell exactly when something was merged). If you go back to rclcpp version 28.2.0, does that make a difference?

@SteveMacenski
Copy link
Member

SteveMacenski commented Jul 24, 2024

I tested and no double-spin is required for this test with 28.2.0 (i.e. patch in #4558 to fix our CI is not required). So that's a new change in behavior from rclcpp causing the issue in our CI.

Note that we've seen some 'warm up spins' required since the initial Jazzy migration to 24.04 #4298 (late May) which introduced a double spin or 'while not received -> spin some' logic. Rolling's release on April 30 had 28.2.0 https://discourse.ros.org/t/new-packages-for-ros-2-rolling-ridley-2024-04-30/37485 so we had also seen it then in some other cases - but perhaps those were fixed by the later changes and no longer required. I'm investigating that now.

@SteveMacenski SteveMacenski merged commit e03013b into ros-navigation:main Jul 24, 2024
9 of 10 checks passed
@SteveMacenski
Copy link
Member

Your changes fixed it (minus the test we're discussing)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants