Improvements to rmw for deterministic execution #259
Tagging @wjwwood @mjcarroll @nburek for feedback.
@iluetkeb Thanks for starting the discussion on making the scheduling more deterministic and for the details in the paper. One clarifying question: is the execution of callbacks out of order for a single subscription on a topic, or only out of order across multiple topics/subscriptions?

To me it makes more sense for the RMW layer to provide the information needed for the rcl/rclcpp layers to make the scheduling decision, as opposed to allowing the rmw to fully dictate which are the highest priority. If you're entrusting the rmw layer to make decisions about priority instead of providing just metadata, then it becomes harder to make behavior guarantees in rcl/rclcpp because it depends on the implementation; and if you're going to dictate that every rmw layer implement something the same way so that it is consistent, then it makes sense to me to just move that up the stack. Do you have a separate design issue related to the changes that would be made in rcl/rclcpp to enable the deterministic scheduling?

Have you considered alternatives to arrival-priority scheduling? It's not unreasonable to me to imagine that ROS will eventually support two or more schedulers, one more fair and one more hard-real-time that would starve some topics to guarantee others. As the paper points out, the priority inversion caused by the way the current design polls for information and then handles all the events before polling for more would also need to be solved to be more deterministic in other scenarios. I bring all this up because I think it's worth thinking now about the long-term requirements needed for rcl/rclcpp to have different schedulers and how to enable all these future use cases now. That may be as simple as providing the additional metadata, like arrival time, when you poll for messages, or it may require some other changes to how events are polled.
The paper discusses the ordering of callbacks across multiple topics/services. Within a single topic, the paper assumes that messages are processed in arrival order (which, to the best of my knowledge, is the default setting for the DDS middleware).
I definitely agree. Generally, allowing more experimentation and domain-specific adjustments is a good thing. The main question is how expensive this is in terms of performance, and how this interacts with DDS QoS policies.
It is definitely possible that we end up with multiple schedulers, although I still hope that we are able to implement an efficient earliest-deadline-first scheduler (which would give us a single scheduler that at the same time provides strong real-time guarantees for users who take the time to configure it while behaving just like FIFO in the default state). Independent of whether that works out or not, I'm pretty sure we're going to want some kind of arrival-based scheduler if at all possible, just because they are easy to understand, are what people probably expect from the outset, and are starvation-free. I think it is very valuable to have at least a prototype implementation of that as soon as possible, even if the long-term requirements are not clear yet.
As a preface let me say that, all other things being equal, I generally prefer to put functionality in the place where the necessary information is. In this case, the middleware has all the information, so it would be fairly natural to do that. btw, please note that this includes more than just the timestamp: the middleware also has QoS settings and it has information about where a message came from. However, I get the feeling that you guys think the executor is something we have under control, but that we don't have the middleware under control, and that is why you want to put it into the executor? Is that right? If yes, please read on below.
I think I see your point, but the middleware can always make decisions in a way we cannot override, through QoS settings. It will drop messages when buffers are full, it will prioritize messages when QoS says so, etc. One of my goals is to keep DDS QoS mechanisms working. If that wasn't a goal, then the easiest way to get a deterministic executor would be to have a thread that just polls the middleware as fast as it can and buffers everything in its own queue.

If we accept this in general, then it might not be such a bad idea to have the middleware do the prioritization, because we can influence how it does it through QoS settings. Also, please note that we can always choose to ignore the ordering suggested by the middleware.

Of course, it could be that QoS settings are too complex, or not powerful enough to do what we need. At the moment, I can't really judge that, because I have not looked at them in detail and, even more importantly, I don't have all the use cases in mind. If we want to do arrival ordering, it would be trivial, of course. If you have other things in mind, please let me know.

What I have looked at in detail is rmw_fastrtps and the relevant bits of Fast-RTPS, and they make me lean towards option 2. In any case, I would think that the executor can still achieve some measure of control by waiting on just some of the communication objects, instead of all of them.
Not yet. However, I think it can be fairly easily implemented by having a new memory_strategy with the current executor, and by treating timers differently. The main part seems to be getting the info.
Well, using topics with differing priorities in the same executor could probably be called a mixed-criticality setting (@tobiasblass correct me if I get the terms wrong...). This is in general hard to get right. For the moment, I would be happy with having real-time-safe nodes at all, assuming that they only consume information of (roughly) equal importance. This would basically partition the system into real-time and non-real-time parts. btw, some reasons why mixed criticality is hard:
Absolutely agreed. At the moment, the only thing I can think of to address this is that we have to ask the middleware for an update after every callback. Unfortunately, in the current implementation of both the executor and the rmw-implementation, this approach is prohibitively expensive, because the overhead is so huge. My ideal approach would be that the rmw-implementation does not have to completely re-create the wait_set on every call to rmw_wait, but that it could just update it incrementally with things that have changed since the last call.
I think the main issue is that ROS 2 supports many different middlewares, so it's hard to make assumptions about what the middleware supports. Anything that is part of the executor is the same across all ROS 2 configurations.
I agree with you, but let me point out that this is not necessarily about QoS settings but about QoS settings that are provided by the user without ROS knowing about it. It is at least conceivable that a deterministic executor could pick the appropriate middleware QoS settings, which would not suffer from this problem.
At least in the real-time community, mixed-criticality is a fixed term for something different (roughly speaking, it is about changing your workload and scheduling parameters to a safer state with fewer features if you notice that, e.g., some task overruns its expected execution time). What you are describing would just be called real-time scheduling on the executor level.
True, although this problem does not appear on the executor but on the process level. As far as I know, the inbound and outbound queues are shared between all executors in the same process.
I think that is a separate decision. It is in some sense more difficult than in the preemptive case, but people have successfully built real-time systems based on non-preemptive schedulers. The main difficulty is that your longest-running callback determines the reaction time of everything else, but if you account for that this works just fine.
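To make that reasoning step concrete, here is a tiny worked example (the numbers are invented, not taken from the thread): under non-preemptive execution, the worst-case response time of even the most urgent callback includes the longest other callback that may already be running.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Illustrative numbers only. Under non-preemptive scheduling, even the most
// urgent callback can be blocked for the full duration of the longest other
// callback that started just before it arrived.
constexpr std::array<int64_t, 3> other_callback_wcet_ms{2, 5, 20};
constexpr int64_t urgent_callback_wcet_ms = 1;

constexpr int64_t worst_case_blocking_ms =
  *std::max_element(other_callback_wcet_ms.begin(), other_callback_wcet_ms.end());
constexpr int64_t worst_case_response_ms =
  worst_case_blocking_ms + urgent_callback_wcet_ms;  // 20 + 1 = 21 ms

static_assert(worst_case_response_ms == 21, "bound is driven by the longest callback");
```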
Sure. But expecting the middleware to be able to sort messages in arrival order is not asking much, I would say.
Well, I think I mean something slightly different. What I mean is that we're already using the middleware to make some decisions for us. For example, when configuring queue sizes -- that decision has an impact on scheduling. The middleware is the layer that implements this choice.
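As a minimal illustration of that point, here is a sketch in current rclcpp (node and topic names are made up): the history depth picked here is enforced entirely by the middleware, which drops the oldest samples once the queue is full, and that already shapes which callbacks ever get scheduled.

```cpp
#include <rclcpp/rclcpp.hpp>
#include <std_msgs/msg/string.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("qos_example");

  // KeepLast(5): the middleware keeps at most the 5 newest samples for this
  // subscription and silently drops older ones -- a scheduling-relevant
  // decision that is already delegated to the middleware via QoS.
  auto sub = node->create_subscription<std_msgs::msg::String>(
    "chatter", rclcpp::QoS(rclcpp::KeepLast(5)).reliable(),
    [](const std_msgs::msg::String::SharedPtr msg) {
      RCLCPP_INFO(rclcpp::get_logger("qos_example"), "got: %s", msg->data.c_str());
    });

  rclcpp::spin(node);
  rclcpp::shutdown();
  return 0;
}
```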
I'll check that. In any case, I would certainly like to consider this case, but it goes far beyond what we can achieve simply by looking at the rmw API.
This requires guarantees on all callbacks, though, right? If we have a system where messages on some topics have deadlines, but messages on other topics don't, it might be that we cannot give this guarantee either. In that case, I would suggest partitioning differently.
I would agree with you in the case where it makes sense for the component with the information to be making those evaluations. In this case, it feels like we are pushing too much of the business logic of scheduling execution down into the messaging middleware layer.
From the perspective of someone writing a node that will be used by other people (think MoveIt), the only guarantees that you have are those provided by the rcl/rclcpp layer that you're programming against (and, by extension, the guarantees provided to them by the rmw API). Right now the scheduling is handled in rclcpp such that they get some level of behavioral guarantees regardless of which rmw implementation is loaded by the end user at runtime. Right now nothing in the rmw API enforces a contract for how anything is scheduled. So if we do go down the route of allowing the rmw layer to dictate scheduling priorities, I personally would want to see it defined in the API contract. However, doing that starts to add a lot of burden on the rmw maintainers to implement all the supported priority management. It will also lead to a lot of duplication of logic in every rmw implementation (of which I think there are at least 4-5 that people actively worked on in the last 6 months, two of which aren't DDS based). The reason I would push toward having the business logic of the scheduling in the rcl/rclcpp layer is that it would continue to give some level of guarantee to the node authors while not pushing a lot of additional burden down into the rmw layers. In reality, the final solution might be some combination of the two.
The reason I was asking if you had thought about any alternative scheduling policies besides just first arrival of all messages was because I was wondering if we can make a case for improving the rmw interface. If we have a set of requirements for schedulers in general and the current rmw layer isn't meeting them or is too expensive, then I think we can evaluate if there are changes that would not only enable arrival-based scheduling, which I totally agree is a very valuable first step, but also make it possible to write other scheduling policies. It's hard to know we're making the right decision on changing the rmw layer with future proofing in mind if we don't think a little more broadly.
Yes, I was thinking about different executors in different processes (or entire systems) wanting different types of scheduling, but I suppose it would also be possible to have a mixture of executors with different policies in the same process. Though I don't know if it's a real use case to have multiple executors working on a shared object, so I don't think we need to worry about those complexities.
I don't think it is, but nothing in the existing API contract enforces it and so you will start to have greater disparity between RMW layers.
@nburek I see your points, and I generally agree, particularly about the need for explicit contracts. What might be a relevant observation here is that just providing timestamps to the upper layers is already non-trivial, and also puts expectations on the middleware. At least that's the impression I got from reading through rmw_fastrtps and Fast-RTPS source code. Firstly, it requires reading more data than is currently done before taking the message. Secondly, what we really want is a source timestamp, which DDS has, but not all middlewares do. Therefore, telling the middleware "give me N messages in arrival order" might actually be easier for the rmw-implementors than having them provide timestamps for all currently available messages. It would also be a very explicit contract. But I agree that it is a very specific contract as well, and that we would do well to consider other use cases before embarking on changing the rmw interface.
As @tobiasblass said, a major scheduling approach we are looking at is "earliest deadline first" (EDF) scheduling. btw, note that DDS also has a concept of a deadline, which can be set on both the Writer and the Reader. At the moment I'm not sure whether this is actually used for prioritization, or only to remove data that has expired, but it is certainly useful to consider information provided by the sender as well as the reader. Anyway, a simple approach would be to define the deadline as "source timestamp + acceptable delay" (which may be different per topic, thus giving us prioritization). In case we don't have a source timestamp, we would substitute arrival time, and then end up with arrival-based ordering.

Of course, incoming data is not the only source of work. We also need to look at periodic or explicitly invoked callbacks from within the process. This should fit into that approach as well. The second major approach, which we have already implemented, is static scheduling. That's simple, and has very few requirements on the rmw interface.

Regarding the rmw API in general: I think that right now, the rmw_wait call is very inefficiently implemented, at least by rmw_fastrtps, and so are the interface-related parts of the executor and its memory_strategies. It is probably possible to improve upon that within the constraints of the current API, but I also have some ideas on a better information exchange interface. That is future work, though.
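A minimal sketch of the deadline computation described above, using only standard C++ and a hypothetical bookkeeping type (none of this is existing rmw/rclcpp API): deadline = source timestamp + per-topic acceptable delay, falling back to arrival time, which degenerates to plain arrival-order scheduling.

```cpp
#include <chrono>
#include <optional>
#include <queue>
#include <string>
#include <vector>

using Time = std::chrono::time_point<std::chrono::steady_clock, std::chrono::nanoseconds>;
using Duration = std::chrono::nanoseconds;

// Hypothetical bookkeeping entry -- not an existing rmw/rclcpp type.
struct ReadyCallback
{
  std::string topic;
  std::optional<Time> source_stamp;  // only if the middleware provides one
  Time arrival;                      // local reception time
  Duration acceptable_delay;         // per-topic, user-configured

  Time deadline() const
  {
    // EDF key: source timestamp + acceptable delay; without a source
    // timestamp we fall back to arrival time, which degenerates to plain
    // arrival-order (FIFO) scheduling.
    return source_stamp.value_or(arrival) + acceptable_delay;
  }
};

struct ByDeadline
{
  bool operator()(const ReadyCallback & a, const ReadyCallback & b) const
  {
    return a.deadline() > b.deadline();  // min-heap: earliest deadline on top
  }
};

// The executor would pop() from this queue to pick the next callback to run.
using EdfQueue =
  std::priority_queue<ReadyCallback, std::vector<ReadyCallback>, ByDeadline>;
```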
btw, I asked Arne about mixed-criticality, and he explained that it's about a mixture of Safety Integrity Levels on the same device. This has some connection to scheduling, of course, but its origin is elsewhere.
Mixed-criticality is fun because you have to prove that the lower-SIL aspects are not going to interfere with the higher-SIL aspects: things like not having priority inversions, not allowing a lower-SIL process to suck up all the CPU time or memory, no deadlocks, etc. Hypervisors and partitioning are often used for this sort of thing. I think in ROS you would be unlikely to put different safety levels in a single node. It would make your safety-critical aspects harder to prove correct. My feeling is that you would want to separate out your safety-critical architecture from the rest of your application. However, you still need a way to interact between the two, so having priorities within a single node may still be necessary.
Priorities should be configurable; furthermore, currently executing callbacks need to be interrupted should a higher-priority task come in.
DDS allows ordering by reception, sender, or custom timestamp (destination order QoS), where a custom timestamp would usually be the sensor data timestamp or similar. Note that using a custom timestamp is the only deterministic option that leads to the same outcome on the same data set in successive runs of an application.
I think copying messages is a no-go for most modern robotics applications such as autonomous driving.
We implement real-time nodes using See the following diagram on how we do this in detail: The above architecture allows:
I think implementing a true real-time executor in C++ currently fails on the fact that functions cannot be suspended externally (even C++20 coroutines can only yield from inside the function itself, AFAIK).
Fully agree!
In our internal RMW implementation, we do exactly that and only apply diffs to our waitset. Still, the current design of RMW, especially the waitset, forces such an inefficient implementation that we are forced to sidestep the RMW abstraction to achieve satisfying performance.
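For readers unfamiliar with the idea, a minimal sketch of such diff-based wait-set maintenance (my own illustration, not the internal implementation mentioned above):

```cpp
#include <set>
#include <vector>

// Remember which handles are already attached to the middleware wait set and
// only attach/detach what changed since the last rmw_wait-style call.
using Handle = const void *;

class IncrementalWaitSet
{
public:
  void update(const std::set<Handle> & requested)
  {
    std::vector<Handle> to_attach;
    std::vector<Handle> to_detach;
    for (Handle h : attached_) {
      if (requested.count(h) == 0) {to_detach.push_back(h);}
    }
    for (Handle h : requested) {
      if (attached_.count(h) == 0) {to_attach.push_back(h);}
    }
    // ... detach(to_detach) and attach(to_attach) on the underlying
    // middleware wait set; in the common case both lists are empty ...
    attached_ = requested;
  }

private:
  std::set<Handle> attached_;
};
```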
@deeplearningrobotics sorry for the late reply, we've had a holiday here in Germany.
I see two answers to this: One is, as @tobiasblass mentioned above, that you can give RT guarantees without preemption. They might not be as tight, but still. The other answer is that you can already have pre-emption by using multiple executors. However, in both cases, we have a problem, and that is that we don't have deadline information right now. RMW's current API doesn't give that information to us. That's mainly what this issue is about.
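A sketch of the "pre-emption by using multiple executors" idea, assuming a POSIX system; the node names, the priority value, and the split of work between the two nodes are all illustrative:

```cpp
#include <pthread.h>
#include <sched.h>
#include <thread>
#include <rclcpp/rclcpp.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto rt_node = std::make_shared<rclcpp::Node>("rt_part");
  auto other_node = std::make_shared<rclcpp::Node>("non_rt_part");

  // One single-threaded executor per criticality level, each in its own thread.
  rclcpp::executors::SingleThreadedExecutor rt_exec;
  rclcpp::executors::SingleThreadedExecutor other_exec;
  rt_exec.add_node(rt_node);
  other_exec.add_node(other_node);

  std::thread rt_thread([&rt_exec]() {rt_exec.spin();});
  std::thread other_thread([&other_exec]() {other_exec.spin();});

  // Give the "real-time" executor thread a higher OS priority; the kernel
  // then preempts the other executor whenever this one has work.
  // SCHED_FIFO requires appropriate privileges.
  sched_param param{};
  param.sched_priority = 80;
  pthread_setschedparam(rt_thread.native_handle(), SCHED_FIFO, &param);

  rt_thread.join();
  other_thread.join();
  rclcpp::shutdown();
  return 0;
}
```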
Yeah, but not by RMW. In any case, if we use the middleware functionality for this (which I like), that would be option 2 of my original list. If we pass on deadlines to the higher layers, that would be option 1. We're currently trying to decide which option is better. Any input on that would be much appreciated.
I wasn't aware of #256 so far, thanks for the link. However, as far as I can discern, that mainly targets zero-copy. It does not seem to address how to figure out whether, and if yes how much, data is available. This is what rmw_wait is currently used for, but its API is problematic.
Exactly! That's what we're trying to improve here. Since you seem to have done this already: Do you have performance data on the current approach vs. yours that you'd be willing to share?
Hey guys, sorry for not replying here, but I have been following it with interest. I hope to discuss a lot of this with you guys at ROSCon and then follow up here, but in the meantime I'll just make a few comments on what I re-read today.
I actually think this is non-trivial. Even in DDS, ordering them across multiple subscriptions will require reading the data for the sample infos and manually tracking which you should take from next. Unless you use a Listener, but that's more or less equivalent to an Executor in that you don't have much control over ordering among multiple data readers. But I could be wrong.
I think it is only used to notify the user when data was not received on time. The QoS "lifespan" is usually what is used to remove data that is "too old". Neither controls presentation order as far as I know, but that is a separate QoS in DDS.
I think this was addressed, but I'll just re-iterate that there's no way to do this without the code to be preempted offering preemption points periodically (for example, see ...).
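A minimal illustration of such a cooperative preemption point (my own sketch, not the example referenced above):

```cpp
#include <atomic>
#include <thread>

// A long-running callback periodically checks a flag and yields voluntarily;
// nothing here can suspend the callback from the outside.
std::atomic<bool> urgent_work_pending{false};

void long_running_callback()
{
  for (int chunk = 0; chunk < 1000; ++chunk) {
    // ... process one bounded chunk of work ...
    if (urgent_work_pending.load(std::memory_order_relaxed)) {
      // Cooperative preemption point: give other threads (e.g. a
      // higher-priority executor thread) a chance to run before continuing.
      std::this_thread::yield();
    }
  }
}
```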
It looks like, from the discussion so far, the callback execution order would generally be scheduled based on a timestamp and an explicit priority of a topic/subscription that would be specified by the user. We shouldn't forget that the callbacks of timers need to be scheduled as well. Given this, I think it makes more sense to bubble the timestamps up to the rcl/rclcpp layer. Regarding the discussion for option 1, I think the data size for keeping track of message timestamps would be bounded by the depth of the history QoS policy (which could mean unbounded).
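A small sketch of what a common ordering key for timers and messages could look like; the arrival timestamp for messages is exactly the information this issue asks the rmw layer to provide, while the timer side uses existing rclcpp API:

```cpp
#include <chrono>
#include <rclcpp/rclcpp.hpp>

using SchedClock = std::chrono::steady_clock;

// Key for a timer: its next expiry. time_until_trigger() is existing rclcpp
// API; a negative value means the timer is already overdue.
SchedClock::time_point timer_key(const rclcpp::TimerBase::SharedPtr & timer)
{
  return std::chrono::time_point_cast<SchedClock::duration>(
    SchedClock::now() + timer->time_until_trigger());
}

// Key for a message: its (hypothetical) arrival timestamp as reported by rmw.
SchedClock::time_point message_key(SchedClock::time_point arrival_stamp)
{
  return arrival_stamp;  // plain arrival order for subscriptions
}

// A scheduler would then always execute the ready entity with the smallest
// key, interleaving timer and subscription callbacks deterministically.
```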
I thought preemption was a bit out of scope for this issue, but anyway... There is probably no way to interrupt a callback that is currently executing, but we can at least rearrange the callback execution order while a callback is executing, and update which callback is to be executed next before the current callback has finished.
While this is a common approach, it is not as easy to get right as one might think. Therefore, we're favoring a deadline-based approach instead of a priority-based one. btw, I just learned at ROSCon that our notion of a deadline is different from the DDS one. We consider a deadline to be the latest time by which the data has to be handled by a user callback. Nevertheless, the deadline can also be user-specified information, so I agree with your general argument that the ordering will have to take information from various sources into account. That said, I am increasingly coming to the view that my original question raised in this ticket cannot be answered without a more fine-grained design for both options, where we have considered all the implications fully.
Okay, I see. That just means that the callback execution will be ordered based on a
Right. Your notion of deadline sounds more like the |
In principle, it should be
Yes, the definition of that sounds right. btw, I would expect that a real-time capable middleware implementation, when running on a real-time capable transport, could use this information to configure the transport appropriately.
See @iluetkeb's PRs ros2/rcl#591 and ros2/rmw#200
I strongly object to the changes in ros2/rmw#200, ros2/rcl#591
Hi @rotu, thank you for your comments. I'm sorry you strongly object, but I hope I can address your concerns, since we have actually considered many of the same points in our design discussions.
I'm not sure what this refers to, but not reinventing what is provided by DDS already has been a major concern during the design.
We do not ignore it. In fact, the SampleInfo is used by my reference implementation, ros2/rmw_fastrtps#359, for all types (subscriptions, clients and services) and we can switch to that as soon as Fast-RTPS 1.10 is included. Until then, the version for Fast-RTPS 1.9.x, ros2/rmw_fastrtps#363, uses SampleInfo for clients and services. It does not yet use it for subscriptions, because a method is lacking that allows us to retrieve SampleInfo without read or take (using either of those would change semantics). For clients and services, requests and responses are always taken immediately, so we can avoid this. btw, the version for 1.10 does not use the source timestamp anymore, but the newly introduced receive timestamp, also from the SampleInfo. We have discussed which timestamp is more appropriate and currently believe it is the receive timestamp. If you object, we can discuss this, of course.
I'm not sure which ambiguity you refer to. In any case, the timestamp we use is always the one that matches the time the message has been received originally. As said above, this is retrieved from the SampleInfo.
Agreed. The intention is not to run an event as soon as something happens. What we aim for is that, if messages from multiple sources are available at the same time, their arrival order is available for consideration by the executor.
I share the view that the current execution pipeline has many inefficiencies, and am actively involved in efforts to change this. However, I believe this PR does not make the situation worse. Yes, it allocates more memory to hold the timestamps, but this memory can be allocated once, just like the rest of the wait_set. Nothing in the rcl_wait API requires that you dynamically re-allocate it, unless the size of the wait-set changes (which does not happen often in most applications). It is possible that some executors do this, but that can (and should) be changed independently. Lastly, in comparison to how much memory we spend in other places, I think the list of timestamps is small fry.
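A sketch of that allocation pattern at the rcl level (return codes unchecked, entity counts made up): the wait set, including any per-entity metadata such as timestamps, is sized once and then reused across iterations.

```cpp
#include <rcl/rcl.h>
#include <rcl/time.h>
#include <rcl/wait.h>

void wait_loop(rcl_context_t * context, const rcl_subscription_t * sub)
{
  // Allocate once, sized for the entities we expect to wait on.
  rcl_wait_set_t wait_set = rcl_get_zero_initialized_wait_set();
  rcl_wait_set_init(
    &wait_set,
    1, 0, 0, 0, 0, 0,  // subscriptions, guard conditions, timers, clients, services, events
    context, rcl_get_default_allocator());

  for (int i = 0; i < 100; ++i) {
    rcl_wait_set_clear(&wait_set);                         // reset flags, keep storage
    rcl_wait_set_add_subscription(&wait_set, sub, nullptr);
    rcl_wait(&wait_set, RCL_MS_TO_NS(10));                 // block for up to 10 ms
    // ... take and dispatch whatever became ready ...
  }
  rcl_wait_set_fini(&wait_set);
}
```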
I don't understand this part. Maybe you can look at what I wrote above, regarding our use of the SampleInfo and the reference implementation, and tell me if this comment is still valid?
This information is only available after you take the message. However, we want to take messages only when we actually intend to process them immediately, because otherwise we would interfere with the history QoS. This is one of those things where we have not opted for the "simple" solution, precisely because we don't want to reimplement what DDS already provides. That means we need the timestamp first, and then we can decide what to take. That is important because it may well be that, by the time we're done processing the selected message, the one that would have been second has already been replaced by new data.
DDS already has a concept of waitset which the An entity in a DDS wait set becomes "ready" when its status matches some given mask. This change means that when processing the wait set, the application now has to not simply wait for the status flag to be true, it has to look into that sample and expose information that should be available through other means. Worse, if there is any message history, this forces you to call
I mean we don't expose the DDS SampleInfo. In the DDS spec,
The ambiguity I'm referring to is that the condition in the wait set is broader than what the timestamp represents. A subscription should be ready when the ReadCondition becomes true, but this means that the timestamp should now change when a sample is received or taken, even if the ReadCondition is unchanged.
It can be made faster, but it's disingenuous to expect it will be.
Yes, still valid. If the timestamp is supposed to refer to when the ReadCondition in the waitset becomes true, it is the wrong timestamp. If it doesn't, the waitset is the wrong level of abstraction for it.
If single ticks of the executor are routinely picking up enough work to stall out the system long enough to evict history significantly, then no amount of prioritizing here will fix that; it might even be desirable to read data rather than drop it on the floor, or else topics with shorter histories might never get work done if backlogs are serious. If you need to borrow the data temporarily before seeing whether you want to actually deserialize and use it, you can use a DDS
OK, ros2/rclpy#542 has the first attempt at Python support.
@rotu are you going to be looking at rmw_cyclonedds for this, or do you know if somebody else is doing that?
I’m on it
@rotu @wjwwood I just pushed some more changes related to clients and services. For that, I also added timestamps to rmw_request_id_t. While doing so, I noticed that the timestamp which is currently copied over from the sample-info in rmw_fastrtps isn't quite what I expected. That's why the message_info test in rcl is currently failing. I am not currently sure why that is -- maybe the sample-info has a different notion of what the timestamp should contain, or maybe I'm doing the copying wrong. I'll work on that tomorrow. @rotu please be aware that this is currently enabled for Fast-RTPS only. If you want to check it on cyclone-dds, please modify the condition in the test/CMakeLists.txt for test_message_info.
Okay, here we go for services:
This is the service part of #259. Update: Added rmw_cyclonedds. And here is a repos file to get all of those at once.
And the message-info changes are in
Edit: I've now provided my own PR for rmw_cyclonedds which works with the new timestamp storage.
Well. The message-info changes should be ready. The service-info changes are mostly complete, but they lack a dedicated test. Such a test could be easily added to test_service and test_client in rcl, in the same way that I added the timestamp test to test_subscription. However, it's 1am here and I think if I continue, I will make more of a mess than help. @wjwwood if you want to pick this up, it would be very much appreciated!
I'll do my best.
* Add timestamps to message info
  See ros2/design#259
  Signed-off-by: Luetkebohle Ingo (CR/AEX3) <ingo.luetkebohle@de.bosch.com>
* Add initializer for message_info_t
  Signed-off-by: Luetkebohle Ingo (CR/AEX3) <ingo.luetkebohle@de.bosch.com>
* Use rmw_time_point_value_t instead of rmw_time_t
  Signed-off-by: Luetkebohle Ingo (CR/AEX3) <ingo.luetkebohle@de.bosch.com>
* Fill message_info timestamp
  Implements ros2/design#259
  Signed-off-by: Luetkebohle Ingo (CR/AEX3) <ingo.luetkebohle@de.bosch.com>
* Use rmw_time_point_value_t instead of rmw_time_t
  Signed-off-by: Luetkebohle Ingo (CR/AEX3) <ingo.luetkebohle@de.bosch.com>
Alright, firstly, @wjwwood thanks for merging the message info changes! Happy we got that in! @jacobperron the service timestamp changes are ready from my side. Hope CI is okay. Since the PRs come from an organizational fork, maintainers cannot directly edit them, but if there is a need for further modifications, @Karsten1987 has write permissions on all the repos.
* replace rmw_request_id_t with rmw_service_info_t on select functions for ros2/design#259
  Signed-off-by: Luetkebohle Ingo (CR/AEX3) <ingo.luetkebohle@de.bosch.com>
* use rmw_time_point_value_t instead of rmw_time_t
  Signed-off-by: Luetkebohle Ingo (CR/AEX3) <ingo.luetkebohle@de.bosch.com>
* Remove duplicate typedef
  Signed-off-by: Luetkebohle Ingo (CR/AEX3) <ingo.luetkebohle@de.bosch.com>
* remember the sample info on requests and responses
  Signed-off-by: Luetkebohle Ingo (CR/AEX3) <ingo.luetkebohle@de.bosch.com>
* This implements services timestamps
  Requires ros2/rmw#217
  Realizes the 2nd part of ros2/design#259
  Signed-off-by: Luetkebohle Ingo (CR/AEX3) <ingo.luetkebohle@de.bosch.com>
* use rmw_time_point_value_t instead of rmw_time_t
  Signed-off-by: Luetkebohle Ingo (CR/AEX3) <ingo.luetkebohle@de.bosch.com>
* snake_case
  Signed-off-by: Luetkebohle Ingo (CR/AEX3) <ingo.luetkebohle@de.bosch.com>
Alright, the work set out in this ticket is largely done: we've merged API changes for acquiring timestamps for both topics and services, and we have two implementations of those already (at varying degrees of completeness, but both sufficient for now). This has been quite a bit more work than expected, but I'm happy we've got a good solution now.

A big "thank you" to everybody who participated, particularly @wjwwood, @rotu, @eboasson and @dirk-thomas for sticking with it, and getting it into Foxy. @rotu and @eboasson, thanks for your objections and explanations -- I think it has made a big difference to the result. Hopefully for the better, but of course, as they say, all remaining errors are mine ;-) To those participating in the earlier part of the discussion: thanks for the reference information on your respective solutions to this.

The work on the executor to actually make use of all this will keep us busy for a while. I'm keeping this ticket open for the moment -- I would like to merge some documentation at least, summarizing all that we have discussed here. Since I had to push other work away to complete this, it will take a moment until I can return, but feel free to ping me if you think it takes too long ;-)
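For reference, a sketch of how user code can consume the merged timestamps through rclcpp (the exact set of supported callback signatures depends on the rclcpp version; node and topic names are made up):

```cpp
#include <cinttypes>
#include <rclcpp/rclcpp.hpp>
#include <std_msgs/msg/string.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("stamp_listener");

  // The message-info callback variant exposes the rmw_message_info_t that now
  // carries source_timestamp and received_timestamp (in nanoseconds).
  auto sub = node->create_subscription<std_msgs::msg::String>(
    "chatter", 10,
    [](const std_msgs::msg::String::SharedPtr msg, const rclcpp::MessageInfo & info) {
      const rmw_message_info_t & rmw_info = info.get_rmw_message_info();
      RCLCPP_INFO(
        rclcpp::get_logger("stamp_listener"),
        "'%s' sent at %" PRId64 " ns, received at %" PRId64 " ns",
        msg->data.c_str(), rmw_info.source_timestamp, rmw_info.received_timestamp);
    });

  rclcpp::spin(node);
  rclcpp::shutdown();
  return 0;
}
```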
This issue has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros2-middleware-change-proposal/15863/18
@iluetkeb friendly ping; what's the status of this ticket?
Background
In Casini et al., 2019, it is shown that execution of callbacks can happen in a different order than the messages came in (see the paper; we also have an example for proof). This is usually not what is expected, and when it does happen, it happens only by accident.
Goal
We want to execute messages in importance order, which -- in the absence of other priorities -- is usually message arrival order.
Problem
To execute in message-arrival order, the executor needs ordering information. This is currently not possible in the rmw API, because the rmw_wait_set only has binary information: either data is available on a topic or not. We don't know how old it is. Moreover, a topic may have more than one message waiting, where some or all may be older than a message for a different topic. These other messages will currently be skipped until the next invocation of rmw_wait.

Options
I see two general approaches to address this:

1) Keep the current rmw design, but add more data to the wait result (e.g. arrival timestamps for the messages that are waiting).
2) Change rmw_wait, or add a new function, so that the middleware itself hands back the waiting work in arrival order.

Q: Can anybody think of different options?
Discussion
Option 1) keeps the current rmw design, but adds more data. This appears more straightforward at first, but since there may be multiple messages waiting for each object, the data size is unpredictable. Also, it is not trivial to obtain this information from the middleware: the naive implementation has to deserialize all messages to get the SampleInfo. Alternatively, we could keep a listener attached at all times and use it to determine arrival time. Or, we could modify the middleware to maintain this information for us without having to deserialize.

Option 2) either changes rmw_wait, or adds a new function with these new semantics. This will likely require more modifications in the rmw-implementations, but it would likely provide better options for the rmw-implementations to optimize obtaining this information. It would also limit the data size, and could even make use of QoS information on the middleware layer.
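To make the two options a bit more concrete, here is a purely hypothetical sketch; neither the struct nor the function below exists in the rmw API.

```cpp
#include <cstddef>
#include <cstdint>

// Option 1: keep rmw_wait as-is, but let each ready subscription also report
// the arrival time of its oldest pending message next to the ready flag.
struct wait_set_with_stamps
{
  void ** subscriptions;          // as today: non-null entry == data available
  int64_t * arrival_timestamps;   // new: oldest pending sample, in nanoseconds
  size_t subscription_count;
};

// Option 2: ask the middleware directly which entities to service next, in
// the order it considers appropriate (arrival order by default).
// Invented signature, shown as a comment only:
// rmw_ret_t rmw_wait_next_in_order(
//   void ** out_handles, size_t capacity, size_t * out_count, int64_t timeout_ns);
```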