Embedded ROS 2 Design Page #197

Open
wants to merge 58 commits into
base: gh-pages

iluetkeb

Start of the design page for the ROS 2 Embedded effort.

Includes contributions by all of the OFERA consortium partners.

ralph-lange and others added 30 commits October 9, 2018 10:34
Fixes ARM processor name.
Updates architecture diagram and fixes text typo.
Call it "wishlist" instead of requirements.
Split 'MCU-SPEC'.
Fix ANDROID->ARDUINO
Refs #2616. Fixes deliverable link.
@iluetkeb
Author

@clalancette I see that the action's PR is on a branch in this repo. Maybe more people can add commits that way? If yes, maybe we could re-target this to a branch, merge it there without discussion, and then do a PR from that branch to gh-pages and re-commence the discussion on that? Just let me know.

@iluetkeb
Author

@clalancette I saw that it is easily possible to re-target the PR to a different branch. If you guys would be willing to give me access to the design repo, I'd very much prefer creating an "embedded_ROS2" branch there and continue the discussion in that.

@clalancette
Contributor

> @clalancette I saw that it is easily possible to re-target the PR to a different branch. If you guys would be willing to give me access to the design repo, I'd very much prefer creating an "embedded_ROS2" branch there and continue the discussion in that.

I'll check what our policy with that is. For now, I'll just pull this PR and push directly to a branch on this repo.

@clalancette mentioned this pull request Oct 29, 2018
@clalancette
Contributor

All right, closing this in favor of #198

@clalancette removed the "in review" label (Waiting for review, Kanban column) Oct 29, 2018
@gbiggs
Member

gbiggs commented Oct 31, 2018

It looks like the discussion is going to move back here, so I'll leave my comments on this pull request.

First, some general comments. I read both this document and the linked OFERA report.

The biggest problem I have is that there is an inherent assumption right from the start that something else needs to be created to support small-scale embedded devices, i.e. rmw/rcl cannot be used directly even with modifications. I haven't seen any evidence to support this assumption and without seeing some I can't agree with the direction this document proposes for how ROS2 will support small-scale devices. I would prefer to see rmw/rcl used directly and modified where necessary, unless it can be shown that making them work at a small scale will compromise them for medium and large scales.

Not using rcl directly does two things that I don't like:

  • It creates a separate implementation of core ROS2 functionality, the urcl library, which undermines the concept that rcl provides the core functionality and all other client libraries just wrap rcl, meaning every client library gets exactly the same behaviour and receives any changes and fixes together.
  • It requires a bridge to talk between "normal" ROS2 and embedded ROS2 nodes.

For the linked report's statements and requirements, I have some additional comments relevant to the above:

  • The statement that micro-ROS will split functionality into separate libraries to enable developers to save resources "by picking only those features they really need" describes something that would be useful at any scale, and I see no reason rmw/rcl cannot enable this in some way (changes in library structure, or using compile-time flags to enable/disable features, for example). rcl already has separate libraries for lifecycle nodes and actions, and while I don't like the current layout of the APIs with regard to this separation, structurally this is clearly possible.
  • The micro-ROS APIs for things like life cycles, predictable scheduling and system modes are apparently going to be much richer than those in rcl due to "advanced concepts specific to micro-ROS". What are these advanced concepts and why are they specific to tiny embedded devices? For example, I think that rich control over predictable scheduling is something we will be very interested in for a real time system at any scale. I'm sure Dejan and his people at Apex.AI would agree with me and they are working at the level of autonomous cars.
  • There are many statements saying things like "create a generic framework in the spirit of ROS" and "be ROS2 compatible". These imply that the OFERA goal is to create something compatible with ROS2, which makes me think "then why is this design document going on ROS2's design site?" It's probably just wording for a project report, but I still think that it sounds weird.
  • All the things listed in the table in 4.2, such as allowing static node and topic layouts and support for sleep states are things that are desirable in rcl as well.
  • All of the performance requirements listed in section 5.2 are desirable at any scale. Even in a large system, I still want things like rapid start-up times, no-copy communication within the same MCU, and minimal power usage. It therefore makes more sense to me that rmw/rcl be improved to satisfy these requirements rather than putting that effort into a separate implementation. Some of the requirements are already satisfied by rcl.
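
As an illustration of the compile-time flag idea mentioned in the list above, here is a minimal sketch in C. The macro names (`DEMO_ENABLE_LIFECYCLE`, `DEMO_ENABLE_ACTIONS`) and all `demo_` types and functions are hypothetical; they are not rcl options or rcl APIs, and rcl may well structure this differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature switches, normally set by the build system
 * (e.g. -DDEMO_ENABLE_ACTIONS=1). These are NOT real rcl options. */
#ifndef DEMO_ENABLE_LIFECYCLE
#define DEMO_ENABLE_LIFECYCLE 1
#endif
#ifndef DEMO_ENABLE_ACTIONS
#define DEMO_ENABLE_ACTIONS 0
#endif

enum {
  DEMO_FEATURE_CORE      = 1 << 0,
  DEMO_FEATURE_LIFECYCLE = 1 << 1,
  DEMO_FEATURE_ACTIONS   = 1 << 2
};

/* Optional fields only cost RAM when their feature is compiled in. */
typedef struct demo_node {
  const char *name;
#if DEMO_ENABLE_LIFECYCLE
  uint8_t lifecycle_state;
#endif
#if DEMO_ENABLE_ACTIONS
  uint8_t pending_goals;
#endif
} demo_node_t;

/* Compile-time sanity check: the core part of the node is always present. */
static_assert(sizeof(demo_node_t) >= sizeof(const char *),
              "demo_node_t must at least hold the node name");

/* Reports which optional features were compiled into this build. */
static uint32_t demo_compiled_features(void)
{
  uint32_t features = DEMO_FEATURE_CORE;
#if DEMO_ENABLE_LIFECYCLE
  features |= DEMO_FEATURE_LIFECYCLE;
#endif
#if DEMO_ENABLE_ACTIONS
  features |= DEMO_FEATURE_ACTIONS;
#endif
  return features;
}
```

With the defaults above, a build contains the core and lifecycle parts but no action support; the same source serves a large system by flipping the flags rather than by maintaining a second implementation.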

So to sum up, if someone can show a compelling argument why rmw/rcl cannot satisfy the very small scale devices use case that this document targets, then I can agree with the need for a separate implementation. But so far I have only seen this as an unsupported assumption, and the listed requirements seem reasonable for rmw/rcl. Similarly, the wishlist items are all things that I would want at larger scales as well.

@iluetkeb
Author

iluetkeb commented Oct 31, 2018

@gbiggs I think we got off a bit on the wrong foot here, most likely because you recall an earlier discussion on Discourse about whether rcl could be re-used or whether there's going to be a ucrlc. Because, really, it has not been my intention to convey that there is a decision to do a ucrlc. If there's text in this document that says otherwise, please point it out (and I've taken note of the sentence with "specialized" above).

That said, there is observable evidence from existing embedded implementations for small and tiny devices that should give us pause on whether barging ahead and porting rmw/rcl at all costs is the right approach. That's why this document is using fuzzy language that postpones such a decision and instead commits to measuring it.

FWIW, right now, as part of the OFERA project, eProsima has provided an rmw implementation for Micro XRCE-DDS. They also ported rcl, and there is ongoing work to port rclcpp. We have identified several issues with that (e.g., related to 64-bit atomics) and are talking to the OSRF about it. @BorjaOuterelo can probably provide more info.
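
The comment does not spell out the exact atomics problem. One common difficulty on 32-bit MCUs is that 64-bit atomic operations are not lock-free, which C11 lets a program detect at compile time through the standard `ATOMIC_LLONG_LOCK_FREE` macro. The sketch below assumes that is the kind of issue meant; the `demo_` names are hypothetical and not taken from rcl or rmw. It falls back to a 32-bit counter when 64-bit atomics are not always lock-free:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* C11 defines ATOMIC_LLONG_LOCK_FREE as 0 (never lock-free),
 * 1 (sometimes) or 2 (always), and it is usable in #if. */
#if ATOMIC_LLONG_LOCK_FREE == 2
typedef atomic_uint_least64_t demo_counter_t;       /* hypothetical type */
typedef uint_least64_t demo_counter_value_t;
#else
typedef atomic_uint_least32_t demo_counter_t;       /* narrower fallback */
typedef uint_least32_t demo_counter_value_t;
#endif

/* Atomically increments the counter and returns the new value.
 * atomic_fetch_add returns the value held before the addition. */
static demo_counter_value_t demo_counter_increment(demo_counter_t *counter)
{
  return (demo_counter_value_t)(atomic_fetch_add(counter, 1) + 1u);
}
```

The trade-off of the fallback is a smaller value range, so whether it is acceptable depends on what the counter is used for.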

In addition to that, several people, including Robotis, Amazon, and myself, have also pursued alternatives to that, for various reasons. You may not like it, but I would argue that you actually should, because these alternatives provide concrete examples of how to do things differently, which IMHO is a much better foundation for discussion than "what if's".
Update: this also allows us to show something desirable without having to figure out how to reconcile it with sometimes conflicting current design decisions in rmw/rcl. For example, static allocation is not only about allocating all the memory at the beginning, but also about being able to assume that structures remain valid without having to check them over and over. That's not going to fit in very well with the way rcl does things currently.

This is ongoing work, and benchmarking results will trickle in, so that we get more evidence for decision-making.

btw, let me also note that this is not only about resource use, but also about how to get rid of the plethora of checks in rcl and rmw on whether a pointer is still valid ,-) I very much hope that at least some of that will get into rcl proper :-)
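
To make the static-allocation argument concrete, here is a minimal sketch in C, assuming a fixed pool sized at compile time. All names (`demo_publisher_t`, `DEMO_MAX_PUBLISHERS`, and so on) are hypothetical and do not reflect how rcl or rmw are actually structured. Arguments are validated once during setup; the hot path relies on the pool entries never being freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_PUBLISHERS 4   /* hypothetical compile-time limit */

typedef struct demo_publisher {
  const char *topic;
  uint32_t sent;
} demo_publisher_t;

/* All storage is reserved statically; nothing is allocated or freed at runtime. */
static demo_publisher_t demo_publishers[DEMO_MAX_PUBLISHERS];
static size_t demo_publisher_count = 0;

/* Setup phase: the only place where arguments are checked.
 * Returns NULL if the topic is missing or the pool is exhausted. */
static demo_publisher_t *demo_create_publisher(const char *topic)
{
  if (topic == NULL || demo_publisher_count >= DEMO_MAX_PUBLISHERS) {
    return NULL;
  }
  demo_publisher_t *pub = &demo_publishers[demo_publisher_count++];
  pub->topic = topic;
  pub->sent = 0;
  return pub;
}

/* Hot path: a publisher from the static pool stays valid for the whole
 * program, so there is no runtime validity check. The assert documents
 * the contract in debug builds and compiles away with NDEBUG. */
static void demo_publish(demo_publisher_t *pub)
{
  assert(pub != NULL);
  pub->sent++;
}
```

The design choice is that failure can only happen in `demo_create_publisher`, which runs at start-up, so the per-message path has no error branch to handle.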

@dirk-thomas reopened this Oct 31, 2018
@dirk-thomas added the "in progress" label (Actively being worked on, Kanban column) Oct 31, 2018
@dirk-thomas mentioned this pull request Oct 31, 2018
@gbiggs
Member

gbiggs commented Nov 1, 2018

> it has not been my intention to convey that there is a decision to do a ucrlc. If there's text in this document that says otherwise please point it out (and I've taken note on the sentence with "specialized" above).

This document is more ambiguous about the intended approach. The linked OFERA report is not. It seems quite clear in that report that the intention is to create a separate software stack that is ROS2-compatible to meet the requirements. There are statements like:

  • "The next layer is the micro-ROS client library (urcl) analogously to the ROS client library (rcl) in ROS 2."
  • "Since the micro-ROS aims to create a generic framework for robotics in the spirit of ROS"

and even a requirement:

  • "Micro-ROS bridge: functionalities that are specific to the bridge between ROS2 and micro-ROS."

I don't know how I can read these any other way than that OFERA has decided from the outset that a separate software stack will be created. It's making this decision without any evidence for why it is necessary that bugs me. Of course I have no input into OFERA, so if that project wants to do that, then it can go right ahead. However, when it comes to how ROS2 is going to define its tiny-scale embedded support, I think more evidence needs to be provided to support such a decision. The problem comes from using that report as support for this design document, and from this document also being written in a way that implies creating a separate software stack is the goal, due to statements like the one near the very start that talks about creating a ROS2-interoperable stack. Perhaps such statements could be rephrased to include reusing the existing client libraries as much as possible, and making modifications where possible, and using as little custom code as possible?

> several people, including Robotis, Amazon, and myself, have also pursued alternatives to that, for various reasons. You may not like it, but I would argue that you actually should, because these alternatives provide concrete examples of how to do things differently, which IMHO is a much better foundation for discussion than "what if's".

I don't think I said I don't like that work being done. If it seems like I did, then that was not my intention and I apologise for the confusion. I want to see these approaches tried and I want to see using rmw/rcl as much as possible tried, so we have numbers that can be used to make a decision. I even said as much in one of my previous comments.

@iluetkeb
Author

iluetkeb commented Nov 1, 2018

Update: I had some more text here earlier, but it's really beside the point.

In the OFERA project, we're hedging our bets with regard to how much we can re-use, because there's some technical and organizational uncertainty. Please do not overinterpret this, and instead let's move on to the technical discussion. If we didn't want to play by the community's terms, we wouldn't be here.

> "Micro-ROS bridge: functionalities that are specific to the bridge between ROS2 and micro-ROS."

That's mainly the XRCE-DDS agent. Because we're not using the same middleware, we have to bridge. People have tried to bring DDS to small devices, and it didn't work. That's not just a matter of memory, it's also about enabling power-saving etc. (which doesn't go well with a protocol that assumes you're always listening).

Things which might distinguish this bridge from the "plain" XRCE DDS agent include ROS-specific things, such as TF filtering, etc.

@gbiggs
Member

gbiggs commented Nov 2, 2018

I think that the changes made in 2b7cb6c address my concerns about the planned direction for the work.

@smorita-esol

smorita-esol commented Nov 2, 2018

Why don't you discuss rmw (or rcl) plugging in multiple middlewares?

I think embedded ROS aims to support non-DDS and non-XRCE-DDS middlewares.
There are also cases where multiple middlewares exist in one system (e.g. standard DDS and MQTT).
In those cases, at least one node straddling multiple middlewares is needed to subscribe to a topic on one side (standard DDS), process it, and publish the processed topic on the other side (MQTT), or vice versa.

This function is clearly needed in embedded systems, where suitable middlewares and/or wire protocols tend to vary by purpose.
But it is also beneficial for non-embedded systems. For example, SONY stated that they are considering whether their middleware is suitable for ROS2 (https://roscon.ros.org/2018/presentations/ROSCon2018_Aibo.pdf, p29). ROS2 developers like SONY will probably implement the function above to harmonize their middleware with standard DDS.

@iluetkeb
Author

iluetkeb commented Nov 2, 2018

> Why don't you discuss rmw (or rcl) plugging in multiple middlewares?

This would be an RCL topic in general, wouldn't it?

In my experience, for MCUs, this is decided at deployment time. Do you have use cases where it's decided at runtime?

> In those cases, at least one node straddling multiple middlewares is needed to subscribe to a topic on one side (standard DDS), process it, and publish the processed topic on the other side (MQTT), or vice versa.

In our systems, we have an agent that does such things. It is running on the Linux side. This is not something we burden the MCU with.

Maybe I should add a system sketch early on in the document, to show the overall system architecture and eco-system.

@smorita-esol

Thanks for your comments.

> This would be an RCL topic in general, wouldn't it?

Currently, I don't have an opinion on which layer (rmw or rcl) is the better place to add the function.

> In my experience, for MCUs, this is decided at deployment time. Do you have use cases where it's decided at runtime?

To avoid misunderstanding, I'd like to mention that those multiple middlewares have to be used concurrently (not exclusively) at runtime. Of course, the set of middlewares to be used at runtime has to be decided by deployment time.

> In our systems, we have an agent that does such things. It is running on the Linux side. This is not something we burden the MCU with.

That may be true if we choose the combination of standard DDS and XRCE-DDS. But there are some middlewares that have no agent for publishing/subscribing to standard DDS topics.
If we accept those middlewares, we should provide the function to bridge between different middlewares without an agent.
Once we provide the function, we can also choose a system configuration that has no DDS middleware at all (e.g. a combination of ZeroMQ and EtherCAT).
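
The bridging function described here can be sketched in C as a node that holds two transports behind one small interface and forwards messages between them. This is only an illustration under stated assumptions: the `demo_transport_t` interface and every `demo_` function are hypothetical, and the in-memory "transport" merely stands in for real middlewares such as DDS, MQTT, or ZeroMQ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MSG_MAX 32   /* hypothetical maximum message size in bytes */

/* Hypothetical minimal transport interface; each real middleware would
 * supply its own take/publish functions behind it. */
typedef struct demo_transport {
  /* Copies one pending message into buf; returns its length, or 0 if none. */
  size_t (*take)(struct demo_transport *self, char *buf, size_t cap);
  /* Sends len bytes from buf. */
  void (*publish)(struct demo_transport *self, const char *buf, size_t len);
  char slot[DEMO_MSG_MAX];  /* in-memory stand-in for the wire */
  size_t slot_len;          /* 0 means the slot is empty */
} demo_transport_t;

static size_t demo_mem_take(demo_transport_t *self, char *buf, size_t cap)
{
  size_t n = self->slot_len;
  if (n == 0 || n > cap) {
    return 0;
  }
  memcpy(buf, self->slot, n);
  self->slot_len = 0;
  return n;
}

static void demo_mem_publish(demo_transport_t *self, const char *buf, size_t len)
{
  if (len > DEMO_MSG_MAX) {
    len = DEMO_MSG_MAX;   /* this sketch truncates oversized messages */
  }
  memcpy(self->slot, buf, len);
  self->slot_len = len;
}

static void demo_transport_init(demo_transport_t *t)
{
  t->take = demo_mem_take;
  t->publish = demo_mem_publish;
  t->slot_len = 0;
}

/* One bridge step: take a message from one middleware and republish it
 * unchanged on the other. Returns the number of bytes forwarded. */
static size_t demo_bridge_spin_once(demo_transport_t *from, demo_transport_t *to)
{
  char buf[DEMO_MSG_MAX];
  assert(from != NULL && to != NULL);
  size_t n = from->take(from, buf, sizeof buf);
  if (n > 0) {
    to->publish(to, buf, n);
  }
  return n;
}
```

Calling `demo_bridge_spin_once` in both directions gives the "or vice versa" case; any per-message processing would go between the take and the publish.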

@iluetkeb
Author

@clalancette What's your view on this? Can we merge?

AFAICT, we resolved all major issues with this version of the document. We also had a SIG telco a while ago where we agreed on the further direction (i.e., sticking with stock rcl/rmw).

Of course, as design documents go, it definitely needs further work and we're going to further develop it as we go along. However, since there have been no more issues identified, I think this version could go in as a baseline for now.

@iluetkeb
Author

@gbiggs would you agree that we have addressed your points? You made one comment in this direction, but I'm not sure whether the PR as a whole has been marked valid.

## Important Differences / Assumptions

One of the major differences for the embedded stack is to be able to
run not (only) on Linux, but also on Real-Time Operating Systems, or
Member

I don't see a reason why the existing ROS 2 code couldn't run on a Real-Time Operating System. I guess this is aiming for a very specific sub-category of OSes. Maybe it makes sense to state them here explicitly.

Author

@iluetkeb iluetkeb Dec 10, 2018

The answer to this depends on what you mean by "existing ROS 2". See our design diagram for the layers we expect to be able to re-use.

In more detail:

  • We expect that RCL and RMW can run on RTOSes with POSIX APIs when targeting devices with at least 32kB of RAM. Some modifications might be useful to make them run better, but that's a different topic.
  • DDS cannot run on most target devices, as is well known, hence the use of XRCE-DDS. This then also excludes the corresponding rmw_ and type_support_ layer implementations. We did rmw_ and type_support_ implementations based on Micro-XRCE-DDS, however, so in principle this runs.
  • rclcpp is tricky because of the C++ standard library, which is large and not trivial to port. Borja from eProsima did some experiments with libcxx and it doesn't look too bad, but there are also a few issues. Moreover, it might take so much flash and RAM that most people want to avoid it. This is something where we still need more benchmarks. An alternative might be rclc.

After that, we get into the realm of libraries above rclcpp.

  • TF is important, and I've already done a bit of work on that, with more to come in early 2019. Most likely I will produce a stripped-down C version, so that we don't require rclcpp just for TF.
  • Executors could also benefit from specific support, as mentioned above.

Regarding TF on embedded systems, have you checked the https://github.com/ESROCOS/tools-transformer library from the ESROCOS (https://www.h2020-esrocos.eu/) European project?

Contributor

@traversaro: that seems to be C++ based?

Author

@traversaro thanks for the hint, I wasn't aware of that but will check it out. are you involved with the project, so that I could ask you about it?

@gavanderhoorn C++ is not strictly impossible, most embedded compilers support it. what's more problematic is the standard library. but yeah, it's always something to look critically at, unfortunately (I usually prefer C++ over 'C' due to its better consistency features, but have had to accept that it can be an issue).


@traversaro traversaro Dec 11, 2018

> @traversaro thanks for the hint, I wasn't aware of that but will check it out. are you involved with the project, so that I could ask you about it?

I am not involved with the project at all, but I learned a lot about it by reading the deliverables, which are detailed but readable documents: https://www.h2020-esrocos.eu/publications/deliverable-documents/ .

> @traversaro: that seems to be C++ based?

It is indeed C++, but if I remember correctly from the deliverable, it is meant to run on top of RTEMS or a similar OS, so I think it is still worth checking out.


To cover the broad range of use-cases for MCUs in robotics, the embedded
ROS 2 stack shall be usable with the default ROS 2 middleware standard
DDS, simple (custom) serial communication protocols just as common
Member

IMO the statement "usable with" is not specific enough in terms of goals. In the sense of "compatible on the wire" there is simply no other choice than DDS. In the sense of "compatible using e.g. a bridge" I don't see much value in mentioning it; that part is explicitly covered by W-SEAMLESS. Technically, even ROS 1 is compatible in that sense using the ros1_bridge.

Author

What's meant here is "usable" in the same way that you can exchange rmw_-implementations now.

Not sure how to rephrase that so it becomes clearer. Maybe just mentioning rmw would do the trick? Other suggestions welcome.

@gbiggs
Member

gbiggs commented Dec 11, 2018

> @gbiggs would you agree that we have addressed your points? You made one comment in this direction, but I'm not sure whether the PR as a whole has been marked valid.

Yeah, I guess so. From your recent updates it sounds like you are going in the direction I prefer.
