Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unique network flows #304

Merged
merged 20 commits into from
May 27, 2021
Merged

Unique network flows #304

merged 20 commits into from
May 27, 2021

Conversation

anamud
Copy link
Contributor

@anamud anamud commented Oct 16, 2020

This is a proposal for unique network flow identifiers for publishers and subscribers in communicating nodes. Unique network flow identifiers enable publishers and subscribers to obtain required QoS unambiguously from machine-machine networks such as 5G.

Signed-off-by: Ananya Muddukrishna ananya.x.muddukrishna@ericsson.com

@anamud
Copy link
Contributor Author

anamud commented Oct 16, 2020

@wjwwood Please have a look at this proposal. I would like to present it to the Middleware working group on a suitable occasion.

@anamud
Copy link
Contributor Author

anamud commented Nov 5, 2020

This PR was presented to the middleware working group on 04-11-2020 (minutes).

Copy link
Collaborator

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anamud

thanks for the proposal. I got a few questions, I might be mistaken on some points but could you kindly enlighten me a bit?

articles/unique_network_flows.md Outdated Show resolved Hide resolved
articles/unique_network_flows.md Outdated Show resolved Hide resolved
articles/unique_network_flows.md Outdated Show resolved Hide resolved
Update to latest upstream
@ros-discourse
Copy link

This pull request has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2020-11-19/17570/1

@ros-discourse
Copy link

This pull request has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/ros-2-native-5g-support-from-ericsson-and-eprosima/17705/2

@anamud
Copy link
Contributor Author

anamud commented Dec 21, 2020

Implementation pull-requests are now submitted to relevant repositories.

Initial contributions stem from Ericsson and eProsima.

@eboasson
Copy link

@anamud I'm sorry I'm responding so late, that seems to happen more often than I'd like ...

What is the envisioned behaviour of a system where I have one publisher and two subscribers, and both subscribers request a unique network flow? Does that imply a multicast must be transformed into two unicast messages? And if those two subscribers happen to be in the same node (or DDS domain participant, so the same RMW context), is it then the intent that the same data is sent twice to the single subscribing process?

The implication from the design document seems to be that this is the case, but I don't think that's wise: in a pub/sub system, it is perfectly feasible to identify "unique flows" on the publisher side, but doing so at the subscriber side gets you into trouble.

I would say that assigning a unique flow id at the publisher side is meaningful. I would equally say that requesting separate data path on the both sides (i.e., socket, threads, buffers, the works) for processing data is also reasonable, but those are different concepts.

Furthermore, as discussed during the discussion in the MW working group some time back, I am not generally in favour of sprinkling such low-level details in application code at the level of readers and writers: the QoS needs are really determined at system level, and mostly bound to topics. So side-loading it would generally be better in my view, and I would then like being able to set them on topics.

Also mentioned in that call was that the point of abstracting the network with a pub/sub system is allowing you to avoid dealing with network protocols, addresses, port numbers, how many bits are available for uniquely identifying flows, &c. Here you explicitly introduce the IP model and protocol suite in the interface, but that means any other networking technology will have to fake things. For example, the LightFleet system introduced recently in a WG meeting gives you hardware reliable pub/sub, without IP addresses ...

The cleaner (to me at least) but admittedly much harder model is to let the middleware map transport-independent QoS settings in whichever way it sees fit to provide the requested QoS. I.e., DDS natively has QoS settings for transport priority and latency budget, and one could conceivably add something about bit error rate (or likelihood of message loss). How exactly that should be mapped to IP QoS settings/network flows is a difficult but interesting subject. I wouldn't be surprised if that requires side-loading mapping tables, and that, incidentally, fits well with how TSN is designed to operate.

@anamud anamud force-pushed the unique_network_flows branch from e698688 to 770d1ee Compare February 16, 2021 10:24
@anamud
Copy link
Contributor Author

anamud commented Feb 16, 2021

The implication from the design document seems to be that this is the case, but I don't think that's wise: in a pub/sub system, it is perfectly feasible to identify "unique flows" on the publisher side, but doing so at the subscriber side gets you into trouble.

@eboasson, thanks for the comment. I would like to clarify that the design proposes interfaces to create unique network flows for subscriptions and publishers. RMW implementations can decide to support either or both. Specifically, RMW implementations can decide not to support subscription-side unique flows if it complicates their design. Similarly, the rather odd case that two or more subscriptions to the same topic in the same node requesting for unique flows can be easily recognized and refused by RMW implementations.

@anamud
Copy link
Contributor Author

anamud commented Feb 16, 2021

Also mentioned in that call was that the point of abstracting the network with a pub/sub system is allowing you to avoid dealing with network protocols, addresses, port numbers, how many bits are available for uniquely identifying flows, &c. Here you explicitly introduce the IP model and protocol suite in the interface, but that means any other networking technology will have to fake things. For example, the LightFleet system introduced recently in a WG meeting gives you hardware reliable pub/sub, without IP addresses ...

@eboasson, I agree that low-level concepts should not bleed into the upper layers and upset the separation of concerns. In our case, however, the introduction of low-level information is harmless and even sought after by ROS users. Just to clarify, according to our proposal, the only low-level information coming up to the RCLCPP layer are read-only details about network flows i.e., IP addresses, port numbers, flow labels, and protocol details. These are details that ROS programmers generally care about while debugging. For example, there are hundreds of questions on ROS Answers just about IP addresses.

Also, an RMW that uses special networking (say an RMW based on LightFleet) can easily extend our proposed implementation for get_network_flow() with its own specific flow details. A flow is essentially a tuple with source and destination addresses and is fundamental to every communication mechanism.

@gavanderhoorn
Copy link
Contributor

gavanderhoorn commented Feb 16, 2021

These are details that ROS programmers generally care about while debugging. For example, there are hundreds of questions on ROS Answers just about IP addresses.

as someone who was answered many of those "hundreds of questions" about IP addresses on ROS Answers: I don't think those are relevant here.

@eboasson is concerned about breaking the abstraction by leaking implementation details upwards into higher level layers. That concern seems justified.

Providing infrastructure to facilitate debugging is always good. Doing it by punching holes in abstractions must be done only after careful consideration.

A flow is essentially a tuple with source and destination addresses and is fundamental to every communication mechanism.

isn't the point raised by @eboasson that you're introducing that assumption here but it may not actually be true? Communication may have sources and destinations, but those are not always called addresses, nor does there necessarily have to be a 1-to-1 mapping of those to network concepts.

@anamud
Copy link
Contributor Author

anamud commented Feb 16, 2021

Furthermore, as discussed during the discussion in the MW working group some time back, I am not generally in favour of sprinkling such low-level details in application code at the level of readers and writers: the QoS needs are really determined at system level, and mostly bound to topics. So side-loading it would generally be better in my view, and I would then like being able to set them on topics.

The cleaner (to me at least) but admittedly much harder model is to let the middleware map transport-independent QoS settings in whichever way it sees fit to provide the requested QoS. I.e., DDS natively has QoS settings for transport priority and latency budget, and one could conceivably add something about bit error rate (or likelihood of message loss). How exactly that should be mapped to IP QoS settings/network flows is a difficult but interesting subject. I wouldn't be surprised if that requires side-loading mapping tables, and that, incidentally, fits well with how TSN is designed to operate.

@eboasson I like the ideas of building middleware that map DDS QoS to network QoS and accept mapping tables. That QoS should be not an application programmer concern but a system concern is a principle I take to heart as well. However, these thread of thoughts are a bit out of scope of our proposal. Our proposal is motivated by (5G) network QoS configuration and concretely suggests a minimal side-loading-friendly interface that enables ROS2 users to request for unique network flows and understand network flows. There is no suggestion in our proposal about how ROS should configure (5G) network QoS.

@anamud
Copy link
Contributor Author

anamud commented Feb 16, 2021

@anamud I'm sorry I'm responding so late, that seems to happen more often than I'd like ...

@eboasson No problem! I understand the delay. Your comments helped me understand that the design proposal text is long-winded and does not describe the implementation choice clearly. I have now updated the design proposal to reflect the latest implementation. I believe this update improves readability and clears up confusion to a good degree. Please have a look and let me know if things are unclear.

@anamud
Copy link
Contributor Author

anamud commented Feb 16, 2021

Providing infrastructure to facilitate debugging is always good. Doing it by punching holes in abstractions must be done only after careful consideration.

I agree. This is why the implementation accompanying the design proposal is submitted to the scrutiny of the middleware working group and the general ROS community on GitHub. Please provide comments on the implementation if you have time.

@anamud
Copy link
Contributor Author

anamud commented Feb 16, 2021

isn't the point raised by @eboasson that you're introducing that assumption here but it may not actually be true? Communication may have sources and destinations, but those are not always called addresses, nor does there necessarily have to be a 1-to-1 mapping of those to network concepts.

This is indeed a valid concern. I understand that the idea of a flow can be different across networking stacks. Our proposal provides RMWs with the flexibility to define their own idea of a flow. They need not invent the concept of an address if they do not have such.

@anamud
Copy link
Contributor Author

anamud commented Feb 17, 2021

@eboasson and @gavanderhoorn: Thanks for raising the concern about leaking abstractions in the proposed design. We have now updated the proposal to reduce the leak. Before, we had introduced IP and transport layer specific data structures into the RMW. Now these are removed and replaced by a byte-array/string that RMW implementations can populate in their own custom manner, similar to the existing get_gid() method in rclcpp. Please have a look and let us know your thoughts.

@eboasson
Copy link

@anamud thanks for clarifying various points and updating the PR to address these concerns about leaky abstractions. I am unconvinced by the solution, but that touches upon a larger point that perhaps needs a discussion in a wider audience.

This proposal introduces a "unique flow" on subscriptions and publishers in the ROS 2 API, but in my current understanding, the meaning of both "unique" and "flow" are left unspecified, allowing them to be different between the two sides, not requiring them to do anything at all, and with any configuration information returned in an unspecified format in the operations for querying them.

When I write it down like this, I have a feeling that I may be coming across as a bit negative about it. There is an aspect that I am indeed negative about (the extent to which things are unspecified), but at the same time I really do think that what you are trying to do (being able to set network-level QoS) is very important.

I know there is a trade-off between trying to specify things tightly with the risk of being unable to express what you want or being unable to map it to the lower layers, and giving so much freedom in the interface that in order to use it effectively, you have to have knowledge of the specific lower layer (e.g., DDS over 5G) in the application for it to be useful. In my current view, this sits firmly on the side of too much freedom.

From your comment:

I like the ideas of building middleware that map DDS QoS to network QoS and accept mapping tables. That QoS should be not an application programmer concern but a system concern is a principle I take to heart as well. However, these thread of thoughts are a bit out of scope of our proposal. Our proposal is motivated by (5G) network QoS configuration and concretely suggests a minimal side-loading-friendly interface that enables ROS2 users to request for unique network flows and understand network flows. There is no suggestion in our proposal about how ROS should configure (5G) network QoS.

I take it that we're not really far apart but just have a different idea of the right way to get there.

I see adding this proposed interface as polluting the ROS 2 API with low-level details. Everything else that's there today tries to define clean abstractions and tries to limit itself to what is understood well enough to be confident that it will stand the test of time. (No doubt with some mistakes.) For example the pub/sub model with these QoS are (a subset of) an established model for abstracting communications; and executors that define how and when callbacks will fire avoids locking oneself into a single model that then turns out to be a bad choice (an area where things are changing right now).

To stick with the way the ROS interface is defined, I think one must try to express the applications needs in terms of fairly generic QoS settings, and then map those to the underlying network. The current QoS of ROS are insufficient, but if one adds priority and latency to them, you are much closer to being able to define a mapping to flows. Perhaps those two alone will be enough, perhaps not.

It will make things harder for the RMW implementations and the middleware, without a doubt. So be it.

@wjwwood
Copy link
Member

wjwwood commented Apr 2, 2021

Looks like there are some linter issues (uncrustify and cpplint) too.

@anamud
Copy link
Contributor Author

anamud commented Apr 2, 2021

CI is failing on rmw_connextdds for some reason, @anamud is this something you can investigate in the next 24 hours or so? If not, I can try, but I'm not sure I'll have time.

@wjwwood I have solved the error and updated the PR. Please see here. I would appreciate it if you start a new CI flow.

@anamud
Copy link
Contributor Author

anamud commented Apr 2, 2021

Looks like there are some linter issues (uncrustify and cpplint) too.

Will fix soon. Thanks!

@EduPonz
Copy link

EduPonz commented Apr 2, 2021

Hi @anamud @wjwwood , I think I've taken care of all the uncrustify and cpplint issues in ros2/rmw_fastrtps@f8d4dab

@wjwwood
Copy link
Member

wjwwood commented Apr 2, 2021

Here's new CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

The linkage issues on Windows look real though (different from the connextdds linkage issues). I think that will likely come up in the above CI again:

23:10:53 C:\ci\ws\src\ros2\rclcpp\rclcpp\include\rclcpp/network_flow_endpoint.hpp(96,6): error C2375: 'rclcpp::operator ==': redefinition; different linkage (compiling source file C:\ci\ws\src\ros2\rclcpp\rclcpp\src\rclcpp\time_source.cpp) [C:\ci\ws\build\rclcpp\rclcpp.vcxproj]

23:10:53 C:\ci\ws\src\ros2\rclcpp\rclcpp\include\rclcpp/network_flow_endpoint.hpp(77): message : see declaration of 'rclcpp::operator ==' (compiling source file C:\ci\ws\src\ros2\rclcpp\rclcpp\src\rclcpp\time_source.cpp) [C:\ci\ws\build\rclcpp\rclcpp.vcxproj]

23:10:53 C:\ci\ws\src\ros2\rclcpp\rclcpp\include\rclcpp/network_flow_endpoint.hpp(100,6): error C2375: 'rclcpp::operator !=': redefinition; different linkage (compiling source file C:\ci\ws\src\ros2\rclcpp\rclcpp\src\rclcpp\time_source.cpp) [C:\ci\ws\build\rclcpp\rclcpp.vcxproj]

23:10:53 C:\ci\ws\src\ros2\rclcpp\rclcpp\include\rclcpp/network_flow_endpoint.hpp(78): message : see declaration of 'rclcpp::operator !=' (compiling source file C:\ci\ws\src\ros2\rclcpp\rclcpp\src\rclcpp\time_source.cpp) [C:\ci\ws\build\rclcpp\rclcpp.vcxproj]

23:10:53 C:\ci\ws\src\ros2\rclcpp\rclcpp\include\rclcpp/network_flow_endpoint.hpp(104,16): error C2375: 'rclcpp::operator <<': redefinition; different linkage (compiling source file C:\ci\ws\src\ros2\rclcpp\rclcpp\src\rclcpp\time_source.cpp) [C:\ci\ws\build\rclcpp\rclcpp.vcxproj]

23:10:53 C:\ci\ws\src\ros2\rclcpp\rclcpp\include\rclcpp/network_flow_endpoint.hpp(81): message : see declaration of 'rclcpp::operator <<' (compiling source file C:\ci\ws\src\ros2\rclcpp\rclcpp\src\rclcpp\time_source.cpp) [C:\ci\ws\build\rclcpp\rclcpp.vcxproj]

@wjwwood
Copy link
Member

wjwwood commented Apr 3, 2021

I'm still seeing quite a few failures from ament_cpplint and ament_uncrustify.

This job did test the commit from @EduPonz:

16:14:54 === src/ros2/rmw_fastrtps (git) ===
16:14:54 commit f8d4dab0e2b5aff971fd73106b916a352b7ac596 (HEAD -> feature/unique_network_flows, origin/feature/unique_network_flows)
16:14:54 Author: EduPonz eduardoponz@eprosima.com
16:14:54
16:14:54 Fix linting

https://ci.ros2.org/job/ci_linux/14224/consoleFull

But still had test failures:

https://ci.ros2.org/job/ci_linux/14224/testReport/(root)/projectroot/cpplint_2/
https://ci.ros2.org/job/ci_linux/14224/testReport/(root)/projectroot/uncrustify/

There were also failures in rmw_connextdds_common:

https://ci.ros2.org/job/ci_linux/14224/testReport/junit/(root)/projectroot/cpplint/

And cyclonedds:

https://ci.ros2.org/job/ci_linux/14224/testReport/junit/(root)/projectroot/cpplint_3/
https://ci.ros2.org/job/ci_linux/14224/testReport/junit/(root)/projectroot/uncrustify_2/


Also, the windows compiler error I expected is still there:

16:59:21 C:\ci\ws\src\ros2\rclcpp\rclcpp\include\rclcpp/network_flow_endpoint.hpp(96,6): error C2375: 'rclcpp::operator ==': redefinition; different linkage (compiling source file C:\ci\ws\src\ros2\rclcpp\rclcpp\src\rclcpp\time_source.cpp) [C:\ci\ws\build\rclcpp\rclcpp.vcxproj]

16:59:21 C:\ci\ws\src\ros2\rclcpp\rclcpp\include\rclcpp/network_flow_endpoint.hpp(77): message : see declaration of 'rclcpp::operator ==' (compiling source file C:\ci\ws\src\ros2\rclcpp\rclcpp\src\rclcpp\time_source.cpp) [C:\ci\ws\build\rclcpp\rclcpp.vcxproj]

16:59:21 C:\ci\ws\src\ros2\rclcpp\rclcpp\include\rclcpp/network_flow_endpoint.hpp(100,6): error C2375: 'rclcpp::operator !=': redefinition; different linkage (compiling source file C:\ci\ws\src\ros2\rclcpp\rclcpp\src\rclcpp\time_source.cpp) [C:\ci\ws\build\rclcpp\rclcpp.vcxproj]

16:59:21 C:\ci\ws\src\ros2\rclcpp\rclcpp\include\rclcpp/network_flow_endpoint.hpp(78): message : see declaration of 'rclcpp::operator !=' (compiling source file C:\ci\ws\src\ros2\rclcpp\rclcpp\src\rclcpp\time_source.cpp) [C:\ci\ws\build\rclcpp\rclcpp.vcxproj]

16:59:21 C:\ci\ws\src\ros2\rclcpp\rclcpp\include\rclcpp/network_flow_endpoint.hpp(104,16): error C2375: 'rclcpp::operator <<': redefinition; different linkage (compiling source file C:\ci\ws\src\ros2\rclcpp\rclcpp\src\rclcpp\time_source.cpp) [C:\ci\ws\build\rclcpp\rclcpp.vcxproj]

16:59:21 C:\ci\ws\src\ros2\rclcpp\rclcpp\include\rclcpp/network_flow_endpoint.hpp(81): message : see declaration of 'rclcpp::operator <<' (compiling source file C:\ci\ws\src\ros2\rclcpp\rclcpp\src\rclcpp\time_source.cpp) [C:\ci\ws\build\rclcpp\rclcpp.vcxproj]

@asorbini
Copy link

asorbini commented Apr 3, 2021

I spawned some more CI to validate these changes based on @anamud's request in rmw_connextdds#13, because I failed to notice that @wjwwood had already done so.

Linking them here for reference, feel free to stop them if they won't provide any new information:

  • Linux: Build Status
  • macOS: Build Status
  • Windows: Build Status

@EduPonz
Copy link

EduPonz commented Apr 3, 2021

I've gone over the longer issues related with rwm_fastrtps again. Unfortunately, that's all that I can do, I think the rest is up to @anamud , who has write access to the affected respositories. Thanks guys again!

@anamud
Copy link
Contributor Author

anamud commented Apr 3, 2021

@wjwwood @EduPonz @asorbini I have fixed the latest errors pointed by the CI system and addressed review comments. Please have a look and start a new CI flow if feasible. Appreciate your help with CI runs.

@wjwwood
Copy link
Member

wjwwood commented Apr 3, 2021

CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@anamud
Copy link
Contributor Author

anamud commented Apr 4, 2021

@wjwwood The Windows build in the last CI flow failed due to errors rooted at missing forward declarations and incorrect scoping of friend functions. I have fixed the errors now and tested for correct working on a local Windows compilation setup.

Please start a new CI flow.

@wjwwood
Copy link
Member

wjwwood commented Apr 4, 2021

New CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@wjwwood
Copy link
Member

wjwwood commented Apr 5, 2021

There are some compiler warnings from Windows related to strncpy:

https://ci.ros2.org/job/ci_windows/14335/msbuild/new/

I think you could use https://github.com/ros2/rcutils/blob/5d3cecc8af601c10561f7c17d76eab740bb540ce/include/rcutils/snprintf.h#L59 instead of strncpy to avoid this warning without doing your own "if win32" kind of logic.

@MiguelCompany
Copy link

@wjwwood @anamud Just fixed the two warnings on rmw_fastrtps

@anamud
Copy link
Contributor Author

anamud commented Apr 5, 2021

@wjwwood @MiguelCompany I handled the remaining three warnings (all in rmw). See here. Thanks for the suggestion to use rcutils_snprintf().

Please start a new CI flow.

@wjwwood
Copy link
Member

wjwwood commented Apr 5, 2021

Thanks! New CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status (known flaky tests)
  • Windows Build Status (known cmake warning from cyclonedds)

@anamud
Copy link
Contributor Author

anamud commented Apr 5, 2021

Thanks for the new CI run.

@wjwwood @MiguelCompany @EduPonz Please let me know how to proceed. I am unable to resolve the two new unstable markers. They are unrelated to unique network flows.

One of them is being discussed as a flaky test here.

The other is a CMake warning about policy CMP0115 in src/eclipse-cyclonedds/cyclonedds/src/idl/CMakeLists.txt at line 41.

@wjwwood
Copy link
Member

wjwwood commented Apr 5, 2021

I think both of those are known, so I believe this CI is good!

Thanks for iterating and for making such a neat and organized set of pull requests!

@wjwwood
Copy link
Member

wjwwood commented Apr 5, 2021

I've merged everything but this design document. I'll try to pick that up sometime soon, but would love someone else to review and merge it if they have time.

@anamud
Copy link
Contributor Author

anamud commented Apr 5, 2021

Excellent work, everyone! Thanks for the timely support, that too during the long Easter weekend.

Signed-off-by: Ananya Muddukrishna <ananya.x.muddukrishna@ericsson.com>
Copy link
Contributor

@hidmic hidmic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to pick that up sometime soon, but would love someone else to review and merge it if they have time.

LGTM too.

articles/unique_network_flows.md Outdated Show resolved Hide resolved
Ananya Muddukrishna added 2 commits May 21, 2021 15:55
Signed-off-by: Ananya Muddukrishna <ananya.x.muddukrishna@ericsson.com>
Signed-off-by: Ananya Muddukrishna <ananya.x.muddukrishna@ericsson.com>
Copy link
Member

@wjwwood wjwwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@wjwwood
Copy link
Member

wjwwood commented May 27, 2021

Thanks to everyone for the discussion and thanks @anamud for the contributions!

@wjwwood wjwwood merged commit 4c9b374 into ros2:gh-pages May 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants