Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace ros2/rmw_connext with rticommunity/rmw_connextdds #9

Closed
asorbini opened this issue Mar 5, 2021 · 51 comments
Closed

Replace ros2/rmw_connext with rticommunity/rmw_connextdds #9

asorbini opened this issue Mar 5, 2021 · 51 comments
Assignees

Comments

@asorbini
Copy link
Collaborator

asorbini commented Mar 5, 2021

This issue tracks several PRs aimed at switching the RMW implementation for RTI Connext DDS included in the ROS2 source tree from the one provided by ros2/rmw_connext (rmw_connext_cpp) to the one provided by rticommunity/rmw_connextdds (rmw_connextdds) in preparation for the ROS2 Galactic release.

This ros2.repos can be used to build the source tree using forks of all affected repositories.

@emersonknapp
Copy link
Contributor

Based on this ticket - are you also planning to remove rmw_connext from the default ros2.repos here https://github.com/ros2/ros2/blob/master/ros2.repos#L290 ?

@wjwwood
Copy link
Member

wjwwood commented Mar 5, 2021

@emersonknapp Once we get everything in place, I think that's the plan. We still have to review these pull requests of course.

@emersonknapp
Copy link
Contributor

👍 just trying to help make sure that list is complete - https://github.com/ros2/ci will need updates too

@asorbini
Copy link
Collaborator Author

asorbini commented Mar 5, 2021

@emersonknapp I plan on creating a PR for ros2/ci next. I've already gone through the exercise of finding all required changes, but I haven't yet got around to applying them in a fork.

Once I create that PR, I'll likely need some guidance to make sure things look good, since I'm not sure how we will approach these updates (especially when it comes to making sure nothing breaks catastrophically :) ).

In general you are right that all these changes will need to be coordinated with a timely update of the default ros2.repos (for which I will also create a PR soon), and all the necessary updates to ci.ros2.org.

I also modified ros2.repos to build the source tree with all the forked repos, which I've been using to test the changes using ci_launcher.

@asorbini asorbini added the enhancement New feature or request label Mar 6, 2021
@asorbini asorbini self-assigned this Mar 6, 2021
@asorbini asorbini removed the enhancement New feature or request label Mar 6, 2021
@clalancette
Copy link
Contributor

By the way, I realized one other thing. Once we merge all of these PRs and get a successful run on https://ci.ros2.org, we are also going to have to do a bloom release and get it onto https://build.ros2.org right away. That's because all Rpr jobs are going to fail until we get a successful rmw_connextdds release done via bloom.

@ivanpauno
Copy link
Member

I think that some of the PRs above are needed to make CI pass with rmw_connextdds, but others are only removing support for rmw_connext_cpp.

I would first only merge the ones in the first group, which is going to make things easier IMO.

PRs required to make rmw_connextdds pass CI:

Nice to merge:

Only removing support for rmw_connext_cpp/rmw_connext_dynamic_cpp:

@emersonknapp
Copy link
Contributor

emersonknapp commented Mar 8, 2021

we are also going to have to do a bloom release and get it onto https://build.ros2.org right away

For sequencing - I would expect ^ to be the first step once it has been determined to work correctly. It seems to me that there is definitely an order of operations that results in nothing being broken, and without having to synchronize any of the events. From my perspective, that looks something like:

  • Release rmw_connextdds
  • Add ci.ros2.org support for rmw_connextdds
  • Remove skipped tests for rmw_connext in various repositories (definitely not critical to adding the new implementation!)
  • Remove rmw_connext

Much of the above consideration seems to be concerned with atomically replacing, rather than the above approach of adding the new thing, stabilizing, then removing the old thing.

@asorbini
Copy link
Collaborator Author

asorbini commented Mar 8, 2021

Thank you all for your input. I'd like to combine your suggestions, in that I think we could rank the PRs in order of priority, and do staggered merges, while also proceeding with a more incremental policy.

Indeed all these PRs try to replace rmw_connext_cpp with rmw_connextdds "atomically", and we might be making the transition more complicated than it needs to be.

From your suggestions, I'd like to propose the following plan:

  • (@asorbini) Test rmw_connextdds along side rmw_connext_cpp on ci.ros2.org.
    • So far, I was able to get successful test runs for ci_linux, ci_osx, ci_windows, and ci_windows (Debug), using a ros2.repos that built only rmw_connextdds (and the fork of each repository).
    • If we want to proceed incrementally, I should first verify that similar results can be obtained with a ros2.repos that only adds rmw_connextdds without removing (or switching) any of the other repositories.
    • Tests (run with this ros2.repos) pass on ci_linux (two failures in a ros2param test run with rmw_cyclonedds_cpp), ci_osx (similar failures as Linux), ci_windows (a timer test in rcl failed, first time I see it fail, it passed when ran again).
    • (@asorbini) Run tests again to get green status.
  • (@asorbini) Release rmw_connextdds (0.1.0) with bloom
  • (@asorbini) New PR for ros2/ci to replace rmw_connext_cpp with rmw_connextdds on ci.ros2.org
    • Or should this change also be incremental?
    • Opened ros2/ci#547 to add rmw_connextdds while disabling rmw_connext_cpp,
  • (@asorbini) New PR for ros2/ros2 that adds rmw_connextdds to the list (but keeps rmw_connext_cpp).
    • Updated ros2/ros2#1099 to include both ros2/rmw_connext and rticommunity/rmw_connextdds.
  • Merge PRs required to make rmw_connextdds pass all tests:
    • ros2/sros2#253
      • PR updated to add rmw_connextdds next to rmw_connext_cpp.
    • ros2/rcl#895
      • The fix is required to make the test logic work, otherwise calls to rcl_get_node_names() after the first one will not call the RMW implementation.
      • I could split the fix for the test into a separate PR.
      • PR updated to add rmw_connextdds next to rmw_connext_cpp.
    • ros2/rmw_implementation#182
      • I could create a new PR that adds rmw_connextdds and doesn't remove rmw_connext_cpp.
      • PR updated to add rmw_connextdds next to rmw_connext_cpp.
    • (@asorbini) update this list after re-running tests with both rmw_connext_cpp and rmw_connextdds in the source tree.
    • ros2/system_tests#463
      • This PR is required to restore some test exceptions for rmw_connext_cpp, but most importantly, add some to prevent testing between rmw_connext_cpp and rmw_connextdds, since they do not interoperate OOTB (because of incompatible types propagated via discovery).
    • ros2/rosbag2#671
    • ros2/rclcpp#1574
      • These PRs are not strictly required, but they can be merged, since I've restored some test exceptions for rmw_connext_cpp.
    • ros2/rcl#900
      • This PR is needed to fix a test in rcl_action and make it pass with rmw_connextdds.
  • Merge PRs for ros2/ros2 and ros2/ci.
  • Merge other currently open PRs which will:

Does this sound reasonable and along the lines of what you had in mind?

Please feel free to edit/add to the list.

@ivanpauno
Copy link
Member

(@asorbini) New PR for ros2/ci to replace rmw_connext_cpp with rmw_connextdds on ci.ros2.org
Or should this change also be incremental?

IMO, removing the old implementations from CI directly is fine.

Does this sound reasonable and along the lines of what you had in mind?

The plan sounds perfect to me, @clalancette can you double check if this is fine?

@clalancette
Copy link
Contributor

IMO, removing the old implementations from CI directly is fine.

Yes, agreed. I think we should aim to completely replace rmw_connext_cpp, I just want to make sure we aren't causing problems for everyone else while we are doing it.

Two additional items:

  1. As we've discussed elsewhere, we need to change the cmake WARNING message in rmw_connextddsmicro to a STATUS message (so our CI will go green).
  2. In order for us to automatically branch from Rolling -> Galactic, the release repository needs to be under ros2-gbp. So I've gone ahead and created a release repository for you there (https://github.com/ros2-gbp/rmw_connextdds-release). If you need help with the bloom-release, please feel free to reach out to us; it can be a bit tricky the first time.

Other than that, the above plan looks great.

@asorbini
Copy link
Collaborator Author

I have made progress on several tasks (as reflected in the edited comment above).

@clalancette can you help me complete the "blooming" and releasing of rmw_connextdds? I also need help in figuring out how to make the proposed changes to ci.ros2.org (and if what I found is enough to disable rmw_connext_cpp and enable rmw_connextdds of course).

@ivanpauno can you help me progress the different PRs towards merging?

Thanks!

@clalancette
Copy link
Contributor

@clalancette can you help me complete the "blooming" and releasing of rmw_connextdds?

I think you've figured this out, since you successfully opened the PR. I've now fixed the name of the release repository under ros2-gbp, so you should be able to re-release there. Ping me again once you've re-opened the PR, I'll take a look at the release repository.

I also need help in figuring out how to make the proposed changes to ci.ros2.org (and if what I found is enough to disable rmw_connext_cpp and enable rmw_connextdds of course).

Essentially you can change the code on https://github.com/ros2/ci , and then there is a field in https://ci.ros2.org/job/ci_launcher/build?delay=0sec where you can specify the CI branch to use. Unfortunately, that only works for branches pushed directly to https://github.com/ros2/ci. I've gone ahead and created a new branch https://github.com/ros2/ci/tree/asorbini/rmw_connextdds ; when you think you have the fixes you want to the CI code, open a PR against that branch and I'll merge it immediately. Then you can run the CI job and test it out. Once we've figured out all of the changes necessary, we can squash them all together and do a proper review.

@asorbini
Copy link
Collaborator Author

asorbini commented Mar 11, 2021

Ping me again once you've re-opened the PR, I'll take a look at the release repository.

I bloomed a new release (0.2.1), and opened a new PR with the ros2-gbp/rmw_connextdds-release repository (ros/rosdistro#28572).

Essentially you can change the code on https://github.com/ros2/ci , and then there is a field in https://ci.ros2.org/job/ci_launcher/build?delay=0sec where you can specify the CI branch to use. Unfortunately, that only works for branches pushed directly to https://github.com/ros2/ci. I've gone ahead and created a new branch https://github.com/ros2/ci/tree/asorbini/rmw_connextdds ; when you think you have the fixes you want to the CI code, open a PR against that branch and I'll merge it immediately. Then you can run the CI job and test it out. Once we've figured out all of the changes necessary, we can squash them all together and do a proper review.

Thank you for the setup work, I'll try my hand at running with these changes and report back!

@clalancette
Copy link
Contributor

Do you think it might have something to do with the failure from debhelper to find nddsc.so?

Another difference is that rmw_connextdds links nddsc with the full path instead of just as -lnddsc:

Certainly one of those two things could be the problem. I don't really know since I wasn't involved in setting up rmw_connext_cpp originally.

Finally, is there a way for me to make changes to the repository and test changes locally without first blooming a new release and have its PR merged?

You can either do the "by-hand" method that I pointed out, or the docker method that @nuclearsandwich pointed out. Once you have things reasonably working there, I don't know of a further way to do testing except "live" on the buildfarm.

So my suggestion is that you make things work for you locally, then submit a PR and merge it. Then open the rosdistro PR. The cycle time to test things out is a little slow, but numbers are cheap so it doesn't matter if it takes a few iterations to get it right.

@ivanpauno
Copy link
Member

After some investigation, I noticed that the build for rmw_connext_shared_cpp and other packages loading Connext via connext_cmake_module and its CMake find script, will link nddsc and its dependencies after -Wl,--no-as-needed (e.g. -Wl,--no-as-needed -lnddsc -lnddscore -lnddscpp -lrticonnextmsgcpp -lpthread -ldl -Wl,--as-needed).

I tracked down why that was the case, and adding that flag seems to be recommended in the rti Connext 5.3.1 platform notes:

For i86Linux3gcc4.8.2, x64Linux3gcc4.8.2, i86Linux3gcc5.4.0, i86Linux3.xgcc4.6.3,
x64Linux3gcc5.4.0, and x64Linux3.xgcc4.6.3, when running on Ubuntu CPU for dynamic
release and dynamic debug libraries, also use the following:
-Wl,--no-as-needed

We're using newer gcc versions, so IDK if that's still needed or not.
The current error doesn't seem related to the lack of that flag anyway, that was causing undefined references to some symbols (see ros2/rcl#55 for historical examples).

@asorbini
Copy link
Collaborator Author

asorbini commented Mar 15, 2021

You can either do the "by-hand" method that I pointed out, or the docker method that @nuclearsandwich pointed out. Once you have things reasonably working there, I don't know of a further way to do testing except "live" on the buildfarm.

So my suggestion is that you make things work for you locally, then submit a PR and merge it. Then open the rosdistro PR. The cycle time to test things out is a little slow, but numbers are cheap so it doesn't matter if it takes a few iterations to get it right.

Yep, I was able to reproduce locally with @nuclearsandwich's instructions. I was thinking more along the lines of "is there a way for me to test the release process with a custom branch/tag of the repository so that I may make changes to the source repo and not have to regenerate the *.orig.tar.gz by hand?" (because I "suspect" the Docker container creates everything from the info in rolling/distributions.yaml and the release repo).

I think I'll do some tests with the "manual approach" that you suggested, and recreate an upstream tarball with modifications as needed.

@asorbini
Copy link
Collaborator Author

asorbini commented Mar 15, 2021

I think I found the issue!

When building with Connext 5.3.1, the CMake helper function rti_find_connextpro() defines an imported target RTIConnextDDS::c_api to mimic the behavior of the FindRTIConnextDDS.cmake script in 6.0.1. Unfortunately, it looks like the imported library is currently missing property IMPORTED_SO_NAME, which is causing CMake to use the full path of nddsc when linking rmw_connextdds_common.so, instead of -lnddsc.

Setting IMPORTED_SO_NAME true seems to fix the issue, and allows dh-shlibdeps to successfully find nddsc.

I will push the change and re-bloom the repository to validate the fix (also re-run some CI tests to make sure the property works across platforms).

EDIT: new PR for 0.3.1-1

@clalancette
Copy link
Contributor

It looks like it built! Woohoo!

So here is a follow-up list:

  1. It looks like the package https://build.ros2.org/job/Rbin_uF64__rmw_connextddsmicro__ubuntu_focal_amd64__binary/ is an empty package. That makes sense, since we don't currently have Connext Micro installed anywhere, and so the build just doesn't do anything. Given that that is the case, I'll highly suggest that we remove that package, so we don't give users the wrong impression on what is supported. The easiest way to do this is to add a file to the release repository that contains a list of the packages that you want ignored. See https://github.com/ros2-gbp/joystick_drivers-release/blob/master/rolling.ignored for an example. I'll suggest you create a "rolling.ignored" file containing "rmw_connextddsmicro", and then re-bloom.
  2. At this point you should be able to point your apt installation at the testing repository and install the packages. Something like echo "deb http://packages.ros.org/ros2-testing/ubuntu focal main" > /etc/apt/sources.list.d/ros2-latest.list, followed by sudo apt-get install ros-rolling-rmw-connextdds should do it. I'll suggest you do that and do some basic testing to make sure things work for you.
  3. Open up a PR based on https://github.com/ros2/ci/tree/asorbini/rmw_connextdds . We'll need to review and merge that.
  4. Merge Add support for rmw_connextdds rmw_implementation#182 .
  5. Merge Add rmw_connextdds ros2#1099 .
  6. Wait one overnight to make sure CI is happy.
  7. Assuming CI is happy, then we can start looking at removing the old RMW from ros2.repos and elsewhere.

We're getting close here!

@asorbini
Copy link
Collaborator Author

  1. It looks like the package https://build.ros2.org/job/Rbin_uF64__rmw_connextddsmicro__ubuntu_focal_amd64__binary/ is an empty package. That makes sense, since we don't currently have Connext Micro installed anywhere, and so the build just doesn't do anything. Given that that is the case, I'll highly suggest that we remove that package, so we don't give users the wrong impression on what is supported. The easiest way to do this is to add a file to the release repository that contains a list of the packages that you want ignored. See https://github.com/ros2-gbp/joystick_drivers-release/blob/master/rolling.ignored for an example. I'll suggest you create a "rolling.ignored" file containing "rmw_connextddsmicro", and then re-bloom.

Good idea! I had noticed the <release>.ignored in other release repositories, and I was already considering adding one. I added rmw_connextddsmicro to the ignored list and re-bloomed the repository.

  1. At this point you should be able to point your apt installation at the testing repository and install the packages. Something like echo "deb http://packages.ros.org/ros2-testing/ubuntu focal main" > /etc/apt/sources.list.d/ros2-latest.list, followed by sudo apt-get install ros-rolling-rmw-connextdds should do it. I'll suggest you do that and do some basic testing to make sure things work for you.

I did a local test, and the packages installed and ran a hello world talker/listener with no issue (including some basic command line tools, like ros2 node, ros2 param).

  1. Open up a PR based on https://github.com/ros2/ci/tree/asorbini/rmw_connextdds . We'll need to review and merge that.

I opened a PR on ros2/ci.

  1. Merge ros2/rmw_implementation#182 .

  2. Merge ros2/ros2#1099 .

  3. Wait one overnight to make sure CI is happy.

  4. Assuming CI is happy, then we can start looking at removing the old RMW from ros2.repos and elsewhere.

We're getting close here!

Definitely starting to see "the light at the end of the tunnel merge" 🎉

One final note on the CI tests: I re-ran the plans (with my CI branch), and it seems like the failure in test_tutorial_parameter_events__rmw_connextdds is consistent (see my previous comment in this thread).

Here are the results from my latest runs:

@ivanpauno
Copy link
Member

test_tutorial_parameter_events__rmw_connextdds

That one is new and started happening also with rmw_connext_cpp, I have opened a PR to fix it: ros2/demos#491.

@clalancette
Copy link
Contributor

clalancette commented Mar 23, 2021

All right, I've merged and deployed the changes to CI. Here's a job with no current changes, just to make sure things are working as I expect:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

Here's a set of jobs with ros2/rmw_implementation#182 and ros2/ros2#1099 in place:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@clalancette
Copy link
Contributor

clalancette commented Mar 24, 2021

OK, we have a lot of this in now. Here's what I think still needs to be done:

Before freeze (April 5):

Ideally before freeze, but can be done after:

@clalancette
Copy link
Contributor

Here's a CI run with all of the open PRs to see where we are:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@clalancette
Copy link
Contributor

All of the failures in uncrustify were fixed by my latest commits in rclcpp, so that should go away. The only other failure was on Windows, but I think @asorbini knows about that one and is looking into it.

I'm going to go back through the open PRs and respond to comments now.

@asorbini
Copy link
Collaborator Author

The failure is slightly different from what I've seen before, but it still seems related to "clock jitter" on Windows (so to speak). I'll investigate the specific test case further to confirm it, but I would assess the failure as "minor" for now.

@asorbini
Copy link
Collaborator Author

@clalancette thank you for transferring the repository to the ros2 org 🎉

Here's some "follow up" items:

Q: should I run bloom again to generate a new release? (0.3.1-3, I still have a few things to verify before going to 0.4.0)

@clalancette
Copy link
Contributor

One more CI with all of the things that have been merged, plus the changes in ros2/rcl#903, ros2/rclpy#698, and ros2/rclcpp#1595:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@clalancette
Copy link
Contributor

Q: should I run bloom again to generate a new release? (0.3.1-3, I still have a few things to verify before going to 0.4.0)

No, it shouldn't be necessary. It will all just be in place for when the next release gets done.

@clalancette
Copy link
Contributor

All right, all things on the various lists here have been completed. In my opinion, this issue can be closed. @asorbini I'll let you do the honors.

@asorbini
Copy link
Collaborator Author

Haha, thank you @clalancette, appreciate the opportunity. I wouldn't have taken it personally though, I'm just glad we achieved this milestone 🎉 🎉 🎉

cwecht pushed a commit to cwecht/rmw_connextdds that referenced this issue Mar 2, 2023
Signed-off-by: Michael Carroll <michael@openrobotics.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants