-
Notifications
You must be signed in to change notification settings - Fork 193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unique network flows #304
Unique network flows #304
Conversation
@wjwwood Please have a look at this proposal. I would like to present it to the Middleware working group on a suitable occasion. |
This PR was presented to the middleware working group on 04-11-2020 (minutes). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the proposal. I got a few questions, I might be mistaken on some points but could you kindly enlighten me a bit?
Update to latest upstream
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2020-11-19/17570/1 |
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros-2-native-5g-support-from-ericsson-and-eprosima/17705/2 |
4a998c7
to
e698688
Compare
Implementation pull-requests are now submitted to relevant repositories.
Initial contributions stem from Ericsson and eProsima. |
@anamud I'm sorry I'm responding so late, that seems to happen more often than I'd like ... What is the envisioned behaviour of a system where I have one publisher and two subscribers, and both subscribers request a unique network flow? Does that imply a multicast must be transformed into two unicast messages? And if those two subscribers happen to be in the same node (or DDS domain participant, so the same RMW context), is it then the intent that the same data is sent twice to the single subscribing process? The implication from the design document seems to be that this is the case, but I don't think that's wise: in a pub/sub system, it is perfectly feasible to identify "unique flows" on the publisher side, but doing so at the subscriber side gets you into trouble. I would say that assigning a unique flow id at the publisher side is meaningful. I would equally say that requesting separate data path on the both sides (i.e., socket, threads, buffers, the works) for processing data is also reasonable, but those are different concepts. Furthermore, as discussed during the discussion in the MW working group some time back, I am not generally in favour of sprinkling such low-level details in application code at the level of readers and writers: the QoS needs are really determined at system level, and mostly bound to topics. So side-loading it would generally be better in my view, and I would then like being able to set them on topics. Also mentioned in that call was that the point of abstracting the network with a pub/sub system is allowing you to avoid dealing with network protocols, addresses, port numbers, how many bits are available for uniquely identifying flows, &c. Here you explicitly introduce the IP model and protocol suite in the interface, but that means any other networking technology will have to fake things. For example, the LightFleet system introduced recently in a WG meeting gives you hardware reliable pub/sub, without IP addresses ... The cleaner (to me at least) but admittedly much harder model is to let the middleware map transport-independent QoS settings in whichever way it sees fit to provide the requested QoS. I.e., DDS natively has QoS settings for transport priority and latency budget, and one could conceivably add something about bit error rate (or likelihood of message loss). How exactly that should be mapped to IP QoS settings/network flows is a difficult but interesting subject. I wouldn't be surprised if that requires side-loading mapping tables, and that, incidentally, fits well with how TSN is designed to operate. |
e698688
to
770d1ee
Compare
@eboasson, thanks for the comment. I would like to clarify that the design proposes interfaces to create unique network flows for subscriptions and publishers. RMW implementations can decide to support either or both. Specifically, RMW implementations can decide not to support subscription-side unique flows if it complicates their design. Similarly, the rather odd case that two or more subscriptions to the same topic in the same node requesting for unique flows can be easily recognized and refused by RMW implementations. |
@eboasson, I agree that low-level concepts should not bleed into the upper layers and upset the separation of concerns. In our case, however, the introduction of low-level information is harmless and even sought after by ROS users. Just to clarify, according to our proposal, the only low-level information coming up to the RCLCPP layer are read-only details about network flows i.e., IP addresses, port numbers, flow labels, and protocol details. These are details that ROS programmers generally care about while debugging. For example, there are hundreds of questions on ROS Answers just about IP addresses. Also, an RMW that uses special networking (say an RMW based on LightFleet) can easily extend our proposed implementation for |
as someone who was answered many of those "hundreds of questions" about IP addresses on ROS Answers: I don't think those are relevant here. @eboasson is concerned about breaking the abstraction by leaking implementation details upwards into higher level layers. That concern seems justified. Providing infrastructure to facilitate debugging is always good. Doing it by punching holes in abstractions must be done only after careful consideration.
isn't the point raised by @eboasson that you're introducing that assumption here but it may not actually be true? Communication may have sources and destinations, but those are not always called addresses, nor does there necessarily have to be a 1-to-1 mapping of those to network concepts. |
@eboasson I like the ideas of building middleware that map DDS QoS to network QoS and accept mapping tables. That QoS should be not an application programmer concern but a system concern is a principle I take to heart as well. However, these thread of thoughts are a bit out of scope of our proposal. Our proposal is motivated by (5G) network QoS configuration and concretely suggests a minimal side-loading-friendly interface that enables ROS2 users to request for unique network flows and understand network flows. There is no suggestion in our proposal about how ROS should configure (5G) network QoS. |
@eboasson No problem! I understand the delay. Your comments helped me understand that the design proposal text is long-winded and does not describe the implementation choice clearly. I have now updated the design proposal to reflect the latest implementation. I believe this update improves readability and clears up confusion to a good degree. Please have a look and let me know if things are unclear. |
I agree. This is why the implementation accompanying the design proposal is submitted to the scrutiny of the middleware working group and the general ROS community on GitHub. Please provide comments on the implementation if you have time. |
This is indeed a valid concern. I understand that the idea of a flow can be different across networking stacks. Our proposal provides RMWs with the flexibility to define their own idea of a flow. They need not invent the concept of an address if they do not have such. |
@eboasson and @gavanderhoorn: Thanks for raising the concern about leaking abstractions in the proposed design. We have now updated the proposal to reduce the leak. Before, we had introduced IP and transport layer specific data structures into the RMW. Now these are removed and replaced by a byte-array/string that RMW implementations can populate in their own custom manner, similar to the existing |
@anamud thanks for clarifying various points and updating the PR to address these concerns about leaky abstractions. I am unconvinced by the solution, but that touches upon a larger point that perhaps needs a discussion in a wider audience. This proposal introduces a "unique flow" on subscriptions and publishers in the ROS 2 API, but in my current understanding, the meaning of both "unique" and "flow" are left unspecified, allowing them to be different between the two sides, not requiring them to do anything at all, and with any configuration information returned in an unspecified format in the operations for querying them. When I write it down like this, I have a feeling that I may be coming across as a bit negative about it. There is an aspect that I am indeed negative about (the extent to which things are unspecified), but at the same time I really do think that what you are trying to do (being able to set network-level QoS) is very important. I know there is a trade-off between trying to specify things tightly with the risk of being unable to express what you want or being unable to map it to the lower layers, and giving so much freedom in the interface that in order to use it effectively, you have to have knowledge of the specific lower layer (e.g., DDS over 5G) in the application for it to be useful. In my current view, this sits firmly on the side of too much freedom. From your comment:
I take it that we're not really far apart but just have a different idea of the right way to get there. I see adding this proposed interface as polluting the ROS 2 API with low-level details. Everything else that's there today tries to define clean abstractions and tries to limit itself to what is understood well enough to be confident that it will stand the test of time. (No doubt with some mistakes.) For example the pub/sub model with these QoS are (a subset of) an established model for abstracting communications; and executors that define how and when callbacks will fire avoids locking oneself into a single model that then turns out to be a bad choice (an area where things are changing right now). To stick with the way the ROS interface is defined, I think one must try to express the applications needs in terms of fairly generic QoS settings, and then map those to the underlying network. The current QoS of ROS are insufficient, but if one adds priority and latency to them, you are much closer to being able to define a mapping to flows. Perhaps those two alone will be enough, perhaps not. It will make things harder for the RMW implementations and the middleware, without a doubt. So be it. |
Looks like there are some linter issues ( |
Will fix soon. Thanks! |
Hi @anamud @wjwwood , I think I've taken care of all the uncrustify and cpplint issues in ros2/rmw_fastrtps@f8d4dab |
Here's new CI: The linkage issues on Windows look real though (different from the connextdds linkage issues). I think that will likely come up in the above CI again:
|
I'm still seeing quite a few failures from This job did test the commit from @EduPonz:
https://ci.ros2.org/job/ci_linux/14224/consoleFull But still had test failures: https://ci.ros2.org/job/ci_linux/14224/testReport/(root)/projectroot/cpplint_2/ There were also failures in https://ci.ros2.org/job/ci_linux/14224/testReport/junit/(root)/projectroot/cpplint/ And cyclonedds: https://ci.ros2.org/job/ci_linux/14224/testReport/junit/(root)/projectroot/cpplint_3/ Also, the windows compiler error I expected is still there:
|
I spawned some more CI to validate these changes based on @anamud's request in rmw_connextdds#13, because I failed to notice that @wjwwood had already done so. Linking them here for reference, feel free to stop them if they won't provide any new information: |
I've gone over the longer issues related with rwm_fastrtps again. Unfortunately, that's all that I can do, I think the rest is up to @anamud , who has write access to the affected respositories. Thanks guys again! |
@wjwwood The Windows build in the last CI flow failed due to errors rooted at missing forward declarations and incorrect scoping of friend functions. I have fixed the errors now and tested for correct working on a local Windows compilation setup. Please start a new CI flow. |
There are some compiler warnings from Windows related to https://ci.ros2.org/job/ci_windows/14335/msbuild/new/ I think you could use |
@wjwwood @MiguelCompany I handled the remaining three warnings (all in Please start a new CI flow. |
Thanks for the new CI run. @wjwwood @MiguelCompany @EduPonz Please let me know how to proceed. I am unable to resolve the two new unstable markers. They are unrelated to unique network flows. One of them is being discussed as a flaky test here. The other is a CMake warning about policy CMP0115 in |
I think both of those are known, so I believe this CI is good! Thanks for iterating and for making such a neat and organized set of pull requests! |
I've merged everything but this design document. I'll try to pick that up sometime soon, but would love someone else to review and merge it if they have time. |
Excellent work, everyone! Thanks for the timely support, that too during the long Easter weekend. |
Signed-off-by: Ananya Muddukrishna <ananya.x.muddukrishna@ericsson.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to pick that up sometime soon, but would love someone else to review and merge it if they have time.
LGTM too.
Signed-off-by: Ananya Muddukrishna <ananya.x.muddukrishna@ericsson.com>
Signed-off-by: Ananya Muddukrishna <ananya.x.muddukrishna@ericsson.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Thanks to everyone for the discussion and thanks @anamud for the contributions! |
This is a proposal for unique network flow identifiers for publishers and subscribers in communicating nodes. Unique network flow identifiers enable publishers and subscribers to obtain required QoS unambiguously from machine-machine networks such as 5G.
Signed-off-by: Ananya Muddukrishna ananya.x.muddukrishna@ericsson.com