
Improvements to rmw for deterministic execution #259

Closed
iluetkeb opened this issue Sep 24, 2019 · 53 comments
Labels
enhancement New feature or request

Comments

@iluetkeb

iluetkeb commented Sep 24, 2019

Background

In Casini et al., 2019, it is shown that callbacks can execute in a different order than the corresponding messages come in. We also have an example for proof. This is usually not what is expected, and it also only happens by accident.

Goal

We want to execute messages in importance order, which -- in the absence of other priorities -- is usually message arrival order.

Problem

To execute in message-arrival order, the executor needs ordering information. This is currently not possible in the rmw API, because the rmw_wait_set only has binary information: Either data is available on a topic or not. We don't know how old it is. Moreover, a topic may have more than one message waiting, where some or all may be older than a message for a different topic. These other messages will currently be skipped until the next invocation of rmw_wait.

Options

I see two general approaches to address this:

  1. We ask the middleware for timestamps on all waiting messages and perform ordering on the executor level.
  2. We ask the middleware "which object(s), of the ones in the wait_set, should we handle next?" where "next" is typically decided by "has the oldest data".

Q: Can anybody think of different options?

Discussion

Option 1) keeps the current rmw design, but adds more data. This appears more straightforward at first, but since there may be multiple messages waiting for each object, the data size is unpredictable. Also, it is not trivial to obtain this information from the middleware. The naive implementation has to deserialize all messages to get the SampleInfo. Alternatively, we could keep a listener attached at all times, and use it to determine arrival time. Or, we could modify the middleware to maintain this information for us without having to deserialize.

Option 2) either changes rmw_wait or adds a new function with these new semantics. This will likely require more modifications in the rmw implementations, but it gives them better options to optimize how this information is obtained. It would also limit the data size, and could even make use of QoS information on the middleware layer.
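To make the two options a bit more concrete, here is a rough sketch. None of these declarations exist in the current rmw API; the new type and function names are purely hypothetical, and only the existing rmw types are real.

```cpp
#include <rmw/rmw.h>

// Option 1 (hypothetical): alongside the binary "ready" flags, rmw_wait also reports
// per-subscription arrival information, and the executor does the ordering itself.
typedef struct rmw_subscription_arrival_info_s
{
  size_t waiting_count;                        // number of samples currently queued
  rmw_time_point_value_t oldest_arrival_time;  // arrival time of the oldest queued sample
} rmw_subscription_arrival_info_t;

// Option 2 (hypothetical): the middleware itself answers "which object should be handled
// next?", e.g. by returning wait-set indices ordered oldest-data-first.
rmw_ret_t
rmw_wait_ordered(
  rmw_subscriptions_t * subscriptions,
  rmw_guard_conditions_t * guard_conditions,
  rmw_services_t * services,
  rmw_clients_t * clients,
  rmw_events_t * events,
  rmw_wait_set_t * wait_set,
  const rmw_time_t * wait_timeout,
  size_t * handle_next_indices,    // out: wait-set indices, most urgent first
  size_t * handle_next_count);     // out: number of valid entries in handle_next_indices
```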

@iluetkeb
Author

Tagging @wjwwood @mjcarroll @nburek for feedback.

@iluetkeb iluetkeb changed the title rmw API does not permit deterministic execution rmw API and deterministic execution Sep 24, 2019
@iluetkeb iluetkeb changed the title rmw API and deterministic execution Improvements to rmw for deterministic execution Sep 24, 2019
@nburek
Contributor

nburek commented Sep 24, 2019

@iluetkeb Thanks for starting the discussion on making the scheduling more deterministic and for the details in the paper.

One clarifying question, is the execution of callbacks out of order for a single subscription on a topic or are they just out of order across multiple topics/subscriptions?

To me it makes more sense for the RMW layer to be providing the information needed for the rcl/rclcpp layers to be making the scheduling decision, as opposed to allowing the rmw to fully dictate which are the highest priority. If you're entrusting the rmw layer to make decisions about priority instead of providing just metadata, then it becomes harder to make behavior guarantees in rcl/rclcpp, because it depends on the implementation. And if you're going to dictate that every rmw layer implement something the same way so that it is consistent, then it makes sense to me to just move that up the stack.

Do you have a separate design issue related to the changes that would be made in rcl/rclcpp to enable the deterministic scheduling? Have you considered alternatives to arrival priority scheduling? It's not unreasonable to me to imagine that ROS will eventually support two or more schedulers, one being fairer and one being a harder real-time scheduler that would starve some topics to guarantee others. As the paper points out, the priority inversion caused by the way the current design polls for information and then handles all the events before polling for more would also need to be solved to be more deterministic in other scenarios. I bring all this up because I think it's worth thinking now about the long-term requirements needed for rcl/rclcpp to have different schedulers and how to enable all these future use cases now. That may be as simple as providing the additional metadata, like arrival time, when you poll for messages, or it may require some other changes to how events are polled.

@tobiasblass

@iluetkeb Thanks for starting the discussion on making the scheduling more deterministic and for the details in the paper.

One clarifying question, is the execution of callbacks out of order for a single subscription on a topic or are they just out of order across multiple topics/subscriptions?

The paper discusses the ordering of callbacks across multiple topics/services. Within a single topic, the paper assumes that messages are processed in arrival order (which, to the best of my knowledge, is the default setting for the DDS middleware).

To me it makes more sense for the RMW layer to be providing the information needed for the rcl/rclcpp layers to be making the scheduling decision, as opposed to allowing the rmw to fully dictate which are the highest priority. If you're entrusting the rmw layer to make decisions about priority instead of providing just metadata, then it becomes harder to make behavior guarantees in rcl/rclcpp, because it depends on the implementation. And if you're going to dictate that every rmw layer implement something the same way so that it is consistent, then it makes sense to me to just move that up the stack.

I definitely agree. Generally, allowing more experimentation and domain-specific adjustments is a good thing. The main question is how expensive this is in terms of performance, and how this interacts with DDS QoS policies.
The most straightforward way to implement rcl-level scheduling policies, for example, is to move all messages from the DDS to the rcl layer as soon as possible. However, this means that the messages are considered to be successfully delivered by the DDS middleware, which breaks some DDS QoS policies (e.g., deadline enforcement). On the other hand, if we leave all the messages at the DDS layer and copy them over for scheduling, we double the memory consumption.
The holy grail is to find some middle ground between these two extreme options, where we get enough information about the available messages without breaking DDS QoS options.

Do you have a separate design issue related to the changes that would be made in rcl/rclcpp to enable the deterministic scheduling? Have you considered alternatives to arrival priority scheduling? It's not unreasonable to me to imagine that ROS will eventually support two or more schedulers, one being fairer and one being a harder real-time scheduler that would starve some topics to guarantee others. As the paper points out, the priority inversion caused by the way the current design polls for information and then handles all the events before polling for more would also need to be solved to be more deterministic in other scenarios. I bring all this up because I think it's worth thinking now about the long-term requirements needed for rcl/rclcpp to have different schedulers and how to enable all these future use cases now. That may be as simple as providing the additional metadata, like arrival time, when you poll for messages, or it may require some other changes to how events are polled.

It is definitely possible that we end up with multiple schedulers, although I still hope that we are able to implement an efficient earliest-deadline-first scheduler (which would give us a single scheduler that at the same time provides strong real-time guarantees for users who take the time to configure it while behaving just like FIFO in the default state). Independent of whether that works out or not, I'm pretty sure we're going to want some kind of arrival-based scheduler if at all possible, just because they are easy to understand, are what people probably expect from the outset, and are starvation-free. I think it is very valuable to have at least a prototype implementation of that as soon as possible, even if the long-term requirements are not clear yet.

@iluetkeb
Author

iluetkeb commented Sep 25, 2019

As a preface let me say that, all other things being equal, I generally prefer to put functionality in the place where the necessary information is.

In this case, the middleware has all the information, so it would be fairly natural to do that. btw, please note that this includes more than just timestamp: The middleware also has QoS settings and it has information about where a message came from.

However, I get the feeling that you guys think that the executor is something we have under control, but that we don't have the middleware under control, and that is why you want to put it into the executor? Is that right? If yes, please read on below.

To me it makes more sense for the RMW layer to be providing the information needed for the rcl/rclcpp layers to be making the scheduling decision, as opposed to allowing the rmw to fully dictate which are the highest priority. If you're entrusting the rmw layer to make decisions about priority instead of providing just metadata, then it becomes harder to make behavior guarantees in rcl/rclcpp, because it depends on the implementation. And if you're going to dictate that every rmw layer implement something the same way so that it is consistent, then it makes sense to me to just move that up the stack.

I think I see your point, but the middleware can always make decisions in a way we cannot override, through QoS settings. It will drop messages when buffers are full, it will prioritize messages when QoS says so, etc.

One of my goals is to keep DDS QoS mechanisms working. If that wasn't a goal, then the easiest way to get a deterministic executor would be to have a thread that just polls the middleware as fast as it can and put_back everything into a shared queue, for a single thread to pop from. That is basically how execution works in ROS 1.

If we accept this in general, then it might not be such a bad idea to have the middleware do the prioritization, because we can influence how it does it through QoS settings. Also, please note that we can always choose to ignore the ordering suggested by the middleware.

Of course, it could be that QoS settings are too complex, or not powerful enough to do what we need. At the moment, I can't really judge that, because I have not looked at them in detail, and, even more importantly, I don't have all the use cases in mind. If we want to do arrival ordering, it would be trivial, of course. If you have other things in mind, please let me know.

What I have looked at in detail is rmw_fastrtps and the relevant bits of Fast-RTPS, and they make me lean towards option 2.

In any case, I would think that the executor can still achieve some measure of control by waiting on just some of the communication objects, instead of all of them.

Do you have a separate design issue related to the changes that would be made in rcl/rclcpp to enable the deterministic scheduling?

Not yet. However, I think it can be fairly easily implemented by having a new memory_strategy with the current executor, and by treating timers differently. The main part seems to be getting the info.

Have you considered alternatives to arrival priority scheduling? It's not unreasonable to me to imagine that ROS will eventually support two or more schedulers, one being fairer and one being a harder real-time scheduler that would starve some topics to guarantee others.

Well, using topics with differing priorities in the same executor could probably be called a mixed-criticality setting (@tobiasblass correct me if I get the terms wrong...)

This is in general hard to get right. For the moment, I would be happy to have real-time-safe nodes at all, assuming that they only consume information of (roughly) equal importance. This would basically partition the system into real-time and non-real-time parts.

btw, some reasons why mixed criticality is hard:

  • you also need to ensure that delivery and reception of important messages is prioritized over less important messages. Otherwise it could happen (and does happen) that your important message is just getting stuck in some buffer while the unimportant bulk data transfer is hogging the line...
  • we might even need pre-emption, because when an important message comes in while we are busy processing some other data, we might want to execute it immediately.

As the paper points out, the priority inversion caused by the way the current design polls for information and then handles it all the events before polling for more would also need to be solved to be more deterministic in other scenarios.

Absolutely agreed. At the moment, the only thing I can think of to address this is that we have to ask the middleware for an update after every callback.

Unfortunately, in the current implementation of both the executor and the rmw-implementation, this approach is prohibitively expensive, because the overhead is so huge.

My ideal approach would be that the rmw-implementation does not have to completely re-create the wait_set on every call to rmw_wait, but that it could just update it incrementally with things that have changed since the last call.
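As a rough illustration, an incremental interface could look something like the sketch below. The function names are hypothetical and do not exist in the rmw API today; only the existing rmw types are real. Entities get attached once, and the wait call then reuses the middleware-side bookkeeping instead of rebuilding it on every invocation.

```cpp
#include <rmw/rmw.h>

// Hypothetical incremental wait-set API, loosely modeled on DDS attach/detach semantics.
rmw_ret_t
rmw_wait_set_attach_subscription(rmw_wait_set_t * wait_set, rmw_subscription_t * subscription);

rmw_ret_t
rmw_wait_set_detach_subscription(rmw_wait_set_t * wait_set, rmw_subscription_t * subscription);

// Blocks until at least one attached entity is ready; nothing is re-created per call,
// only the set of ready entities is reported back.
rmw_ret_t
rmw_wait_set_wait(rmw_wait_set_t * wait_set, const rmw_time_t * wait_timeout);
```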

@tobiasblass

However, I get the feeling that you guys think that the executor is something we have under control, but that we don't have the middleware under control, and that is why you want to put it into the executor? Is that right? If yes, please read on below.

I think the main issue is that ROS 2 supports many different middlewares, so it's hard to make assumptions about what the middleware supports. Anything that is part of the executor is the same across all ROS 2 configurations.

To me it makes more sense for the RMW layer to be providing the information needed for the rcl/rclcpp layers to be making the scheduling decision, as opposed to allowing the rmw to fully dictate which are the highest priority. If you're entrusting the rmw layer to make decisions about priority instead of providing just metadata, then it becomes harder to make behavior guarantees in rcl/rclcpp, because it depends on the implementation. And if you're going to dictate that every rmw layer implement something the same way so that it is consistent, then it makes sense to me to just move that up the stack.

I think I see your point, but the middleware can always make decisions in a way we cannot override, through QoS settings. It will drop messages when buffers are full, it will prioritize messages when QoS says so, etc.

I agree with you, but let me point out that this is not necessarily about QoS settings but about QoS settings that are provided by the user without ROS knowing about it. It is at least conceivable that a deterministic executor could pick the appropriate middleware QoS settings, which would not suffer from this problem.
Of course, if we want to allow users to pick any QoS settings they like, I fully agree with your argument.

Have you considered alternatives to arrival priority scheduling? It's not unreasonable to me to imagine that ROS will eventually support two or more schedulers, one being fairer and one being a harder real-time scheduler that would starve some topics to guarantee others.

Well, using topics with differing priorities in the same executor could probably be called a mixed-criticality setting (@tobiasblass correct me if I get the terms wrong...)

This is in general hard to get right. For the moment, I would be happy to have real-time-safe nodes at all, assuming that they only consume information of (roughly) equal importance. This would basically partition the system into real-time and non-real-time parts.

At least in the real-time community, mixed-criticality is a fixed term for something different (roughly speaking, this is about changing your workload and scheduling parameters to a safer state with fewer features if you notice that, e.g., some task overruns its expected execution time). What you are describing would just be called real-time scheduling on the executor level.
However, I think the point that @nburek was making is that you could choose one of these schedulers per executor, so you would still have the separation. Please correct me if I'm wrong, though :-).

btw, some reasons why [real-time scheduling] is hard:

you also need to ensure that delivery and reception of important messages is prioritized over less important messages. Otherwise it could happen (and does happen) that your important message is just getting stuck in some buffer while the unimportant bulk data transfer is hogging the line...

True, although this problem does not appear on the executor but on the process level. As far as I know, the inbound and outbound queues are shared between all executors in the same process.
One way to deal with that would be to give the middleware communication threads the highest priority, or pin them to a different CPU core than your ROS executors. This is definitely out of scope for this ticket, though :-).

we might even need pre-emption, because when an important message comes in while we are busy processing some other data, we might want to execute it immediately.

I think that is a separate decision. It is in some sense more difficult than in the preemptive case, but people have successfully built real-time systems based on non-preemptive schedulers. The main difficulty is that your longest-running callback determines the reaction time of everything else, but if you account for that this works just fine.

@iluetkeb
Author

iluetkeb commented Sep 25, 2019

I think the main issue is that ROS 2 supports many different middlewares, so it's hard to make assumptions about what the middleware supports. Anything that is part of the executor is the same across all ROS 2 configurations.

Sure. But expecting the middleware to be able to sort messages in arrival order is not asking much, I would say.

but let me point out that this is not necessarily about QoS settings but about QoS settings that are provided by the user without ROS knowing about it.

Well, I think I mean something slightly different. What I mean is that we're already using the middleware to make some decisions for us. For example, when configuring queue sizes -- that decision has an impact on scheduling. The middleware is the layer that implements this choice.

At least in the real-time community, mixed-criticality is a fixed term for something different (roughly speaking, this is about changing your workload and scheduling parameters to a safer state with fewer features if you notice that, e.g., some task overruns its expected execution time).

I'll check that.

In any case, I would certainly like to consider this case, but it goes far beyond what we can achieve simply by looking at the rmw API.

It is in some sense more difficult than in the preemptive case, but people have successfully built real-time systems based on non-preemptive schedulers.

This requires guarantees on all callbacks, though, right? If we have a system where messages on some topics have deadlines, but messages on other topics don't, it might be that we cannot give this guarantee either. In that case, I would suggest partitioning differently.

@nburek
Contributor

nburek commented Sep 25, 2019

As a preface let me say that, all other things being equal, I generally prefer to put functionality in the place where the necessary information is.

In this case, the middleware has all the information, so it would be fairly natural to do that. btw, please note that this includes more than just timestamp: The middleware also has QoS settings and it has information about where a message came from.

I would agree with you in the case where it makes sense for the component with the information to be making those evaluations. In this case, it feels like we are pushing too much of the business logic of scheduling execution down into the messaging middleware layer.

However, I get the feeling that you guys think that the executor is something we have under control, but that we don't have the middleware under control, and that is why you want to put it into the executor? Is that right? If yes, please read on below.

From the perspective of someone writing a node that will be used by other people (think MoveIt), the only guarantees that you have are those provided by the rcl/rclcpp layer that you're programming against (and by extension the guarantees provided to them by the rmw API). Right now the scheduling is handled in rclcpp such that they get some level of behavior guarantees regardless of which rmw implementation is loaded by the end user at runtime. Right now nothing in the rmw API enforces a contract for how anything is scheduled. So if we do go down the route of allowing the rmw layer to dictate scheduling priorities, I personally would want to see it defined in the API contract. However, doing that starts to add a lot of burden on the rmw maintainers to implement all the supported priority management. It will also lead to a lot of duplication of logic in every rmw implementation (of which I think there are at least 4-5 that people actively worked on in the last 6 months, two of which aren't DDS based).

The reason I would push toward having the business logic of the scheduling in the rcl/rclcpp layer would be because it would continue to give some level of guarantee to the node authors while not pushing a lot of additional burden down into the rmw layers. In reality, the final solution might be some combination of the two.

Unfortunately, in the current implementation of both the executor and the rmw-implementation, this approach is prohibitively expensive, because the overhead is so huge.

My ideal approach would be that the rmw-implementation does not have to completely re-create the wait_set on every call to rmw_wait, but that it could just update it incrementally with things that have changed since the last call.

The reason I was asking if you had thought about any alternative scheduling policies besides just first arrival of all messages was because I was wondering if we can make a case for improving the rmw interface. If we have a set of requirements for schedulers in general and the current rmw layer isn't meeting them or is too expensive, then I think we can evaluate if there are changes that would not only enable arrival based scheduling, which I totally agree is a very valuable first step, but also make it possible to write other scheduling policies. It's hard to know we're making the right decision on changing the rmw layer with future-proofing in mind if we don't think a little more broadly.

However, I think the point that @nburek was making is that you could choose one of these schedulers per executor, so you would still have the separation. Please correct me if I'm wrong, though :-).

Yes, I was thinking about different executors in different processes (or entire systems) wanting different types of scheduling, but I suppose it would also be possible to have a mixture of executors with different policies in the same process. Though I don't know if it's a real use case to have multiple executors working on a shared object, so I don't think we need to worry about those complexities.

Sure. But expecting the middleware to be able to sort messages in arrival order is not asking much, I would say.

I don't think it is, but nothing in the existing API contract enforces it and so you will start to have greater disparity between RMW layers.

@iluetkeb
Author

iluetkeb commented Sep 25, 2019

@nburek I see your points, and I generally agree, particularly about the need for explicit contracts.

What might be a relevant observation here is that just providing timestamps to the upper layers is already non-trivial, and also puts expectations on the middleware. At least that's the impression I got from reading through the rmw_fastrtps and Fast-RTPS source code.

Firstly, it requires reading more data than is currently done before taking the message. Secondly, what we really want is a source timestamp, which DDS has, but not all middlewares do. Therefore, telling the middleware "give me N messages in arrival order" might actually be easier for the rmw-implementors than having them provide timestamps for all currently available messages. It would also be a very explicit contract.

But I agree that it is a very specific contract as well, and that we would do well to consider other use cases before embarking on changing the rmw interface.

The reason I was asking if you had thought about any alternative scheduling policies besides just first arrival of all messages was because I was wondering if we can make a case for improving the rmw interface. If we have a set of requirements for schedulers in general and the current rmw layer isn't meeting them or is too expensive, then I think we can evaluate if there are changes that would not only enable arrival based scheduling, which I totally agree is a very valuable first step, but also make it possible to write other scheduling policies.

As @tobiasblass said, a major scheduling approach we are looking at is "earliest deadline first" (EDF) scheduling.

btw, note that DDS also has a concept of a deadline, which can be set in both the Writer and the Reader. At the moment I'm not sure whether this is actually used for prioritization, or only to remove data that has expired, but it is certainly useful to consider information provided by the sender as well as the reader.

Anyway, a simple approach would be to define the deadline as "source timestamp + acceptable delay" (which may be different per topic, thus giving us prioritization). In case we don't have a source timestamp, we would substitute arrival time, and then end up with arrival-based ordering.

Of course, incoming data is not the only source of work. We also need to look at periodic or explicitly invoked callbacks from within the process. This should fit into that approach as well.

The second major approach, which we have already implemented, is static scheduling. That's simple, and has very few requirements on the rmw interface.

Regarding the rmw API in general: I think that right now, the rmw_wait call is very inefficiently implemented, at least by rmw_fastrtps, and so are the interface-related parts of the executor and its memory_strategies. It is probably possible to improve upon that within the constraints of the current API, but I also have some ideas on a better information exchange interface. That is future work, though.

@iluetkeb
Author

btw, I asked Arne about mixed-criticality, and he explained that it's about a mixture of Safety Integrity Levels on the same device. This has some connection to scheduling, of course, but its origin is elsewhere.

@gbiggs
Member

gbiggs commented Sep 29, 2019

Mixed-criticality is fun because you have to prove that the lower SIL aspects are not going to interfere with the higher SIL aspects. Things like not having priority inversions, not allowing a lower SIL process to suck up all the CPU time or memory, no deadlocks, etc.

Hypervisors and partitioning are often used for this sort of thing.

I think in ROS you would be unlikely to put different safety levels in a single node. It would make your safety-critical aspects harder to prove correct. My feeling is that you would want to separate out your safety-critical architecture from the rest of your application. However you still need a way to interact between the two so having priorities within a single node may still be necessary.

@deeplearningrobotics

We want to execute messages in importance order, which -- in the absence of other priorities -- is usually message arrival order.

Priorities should be configurable, but furthermore, currently executing callbacks need to be interrupted should a higher-priority task come in.

We ask the middleware for timestamps on all waiting messages and perform ordering on the executor level.

With DDS read and take this is already provided on the middleware layer.

DDS allows ordering by reception, sender or custom timestamp (destination order QoS), where a custom timestamp usually would be the sensor data timestamp or similar. Note that having a custom timestamp is the only deterministic option that leads to the same outcome on the same data set in successive runs of an application.
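To illustrate the determinism point: a sort key built only from values that travel with the sample (a data-carried timestamp, writer id, sequence number) yields the same order on every replay of the same data set, whereas a reception timestamp depends on the local clock and scheduling noise. The types below are made up for illustration; they are not actual DDS or rmw types.

```cpp
#include <cstdint>
#include <tuple>

// Illustrative only: the fields are stand-ins, not actual DDS or rmw types.
struct SampleKey
{
  int64_t data_timestamp;    // e.g. the sensor timestamp carried inside the message
  uint64_t writer_id;        // stable identifier of the publishing writer
  uint64_t sequence_number;  // per-writer sequence number
};

// Strict weak ordering that is reproducible across runs for the same recorded data.
inline bool before(const SampleKey & a, const SampleKey & b)
{
  return std::tie(a.data_timestamp, a.writer_id, a.sequence_number) <
         std::tie(b.data_timestamp, b.writer_id, b.sequence_number);
}
```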

On the other hand, if we leave all the messages at the DDS layer and copy them over for scheduling, we double the memory consumption.

I think copying messages is a no-go for most modern robotics applications such as autonomous driving.

This is in general hard to get right. For the moment, I would be happy to have real-time-safe nodes at all, assuming that they only consume information of (roughly) equal importance. This would basically partition the system into real-time and non-real-time parts.

We implement real-time nodes using read and take, which is also currently being discussed and will hopefully be added to the next ROS 2 release (#256).

See the following diagram on how we do this in detail:

[Architecture diagram omitted]

The above architecture allows:

  • To give tasks/nodes different priorities.
  • To interrupt a lower priority task for a higher one.
  • Full control over data ordering, and further allows us to choose which topics to handle first.
  • Does not allocate, does not copy any data unnecessarily if the middleware itself does not allocate.
  • Reuse data if desired.

I think implementing a true real-time executor in C++ currently fails on the fact that functions cannot be suspended externally (even C++20 coroutines can only yield from inside the function itself AFAIK).

Unfortunately, in the current implementation of both the executor and the RMW implementation, this approach is prohibitively expensive, because the overhead is so huge.

Fully agree!

My ideal approach would be that the RMW implementation does not have to completely re-create the wait_set on every call to rmw_wait, but that it could just update it incrementally with things that have changed since the last call.

In our internal RMW implementation, we do exactly that and only apply diffs to our waitset. Still, the current design of RMW, especially the waitset, forces such an inefficient implementation that we have to sidestep the RMW abstraction to achieve satisfying performance.

@iluetkeb
Author

iluetkeb commented Oct 7, 2019

@deeplearningrobotics sorry for the late reply, we've had a holiday here in Germany.

Priorities should be configurable, but furthermore, currently executing callbacks need to be interrupted should a higher-priority task come in.

I see two answers to this: One is, as @tobiasblass mentioned above, that you can give RT guarantees without preemption. They might not be as tight, but still. The other answer is that you can already have pre-emption by using multiple executors.
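To sketch the second answer: two executors in the same process, each spinning on its own thread with a different OS scheduling priority, so callbacks handled by the higher-priority executor preempt the lower-priority one. The node names and priority values below are made up; SCHED_FIFO needs suitable privileges, and error handling is omitted.

```cpp
#include <pthread.h>
#include <sched.h>
#include <memory>
#include <thread>
#include <rclcpp/rclcpp.hpp>

// Best-effort: assign a SCHED_FIFO priority to an already-running std::thread.
static void set_fifo_priority(std::thread & t, int priority)
{
  sched_param param{};
  param.sched_priority = priority;
  pthread_setschedparam(t.native_handle(), SCHED_FIFO, &param);  // may fail without privileges
}

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto critical_node = std::make_shared<rclcpp::Node>("critical_node");        // made-up name
  auto best_effort_node = std::make_shared<rclcpp::Node>("best_effort_node");  // made-up name

  rclcpp::executors::SingleThreadedExecutor critical_exec;
  rclcpp::executors::SingleThreadedExecutor best_effort_exec;
  critical_exec.add_node(critical_node);
  best_effort_exec.add_node(best_effort_node);

  std::thread critical_thread([&]() {critical_exec.spin();});
  std::thread best_effort_thread([&]() {best_effort_exec.spin();});
  set_fifo_priority(critical_thread, 80);    // higher priority: its callbacks preempt
  set_fifo_priority(best_effort_thread, 10);

  critical_thread.join();
  best_effort_thread.join();
  rclcpp::shutdown();
  return 0;
}
```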

However, in both cases, we have a problem, and that is that we don't have deadline information right now. RMW's current API doesn't give that information to us. That's mainly what this issue is about.

We ask the middleware for timestamps on all waiting messages and perform ordering on the executor level.

With DDS read and take this is already provided on the middleware layer.

Yeah, but not by RMW. In any case, if we use the middleware functionality for this (which I like), that would be option 2 of my original list. If we pass on deadlines to the higher layers, that would be option 1.

We're currently trying to decide which option is better. Any input on that would be much appreciated.

We implement real-time nodes using read and take, which is also currently being discussed and will hopefully be added to the next ROS 2 release (#256).

I wasn't aware of #256 so far, thanks for the link. However, as far as I can discern, that mainly targets zero-copy. It does not seem to address how to figure out whether, and if yes how much, data is available. This is what rmw_wait is currently used for, but its API is problematic.

In our internal RMW implementation, we do exactly that and only apply diffs to our waitset. Still, the current design of RMW, especially the waitset, forces such an inefficient implementation that we have to sidestep the RMW abstraction to achieve satisfying performance.

Exactly! That's what we're trying to improve here.

Since you seem to have done this already: Do you have performance data on the current approach vs. yours that you'd be willing to share?

@wjwwood wjwwood added the question Further information is requested label Oct 24, 2019
@wjwwood
Member

wjwwood commented Oct 30, 2019

Hey guys, sorry for not replying here, but I have been following it with interest. I hope to discuss a lot of this with you guys at ROSCon and then follow up here, but in the meantime I'll just make a few comments on what I re-read today.


Sure. But expecting the middleware to be able to sort messages in arrival order is not asking much, I would say.

I actually think this is non-trivial. Even in DDS, ordering them across multiple subscriptions will require reading the data for the sample infos and manually tracking which you should take from next. Unless you use a Listener, but that's more or less equivalent to an Executor in that you don't have much control over ordering among multiple data readers. But I could be wrong.

At the moment I'm not sure whether this is actually used for prioritization, or only to remove data that has expired, but it is certainly useful to consider information provided by the sender as well as the reader.

I think it is only used to notify the user when data was not received on time. The QoS "lifespan" is used to remove data that is "too old" usually. Neither controls presentation order as far as I know, but that is a separate QoS in DDS.

Priorities should be configurable, but furthermore, currently executing callbacks need to be interrupted should a higher-priority task come in.

I think this was addressed, but I'll just re-iterate that there's no way to do this without the code to be preempted offering preemption points periodically. (for example see jthread where they use the phrase cooperatively interruptible: https://medium.com/@vgasparyan1995/a-new-thread-in-c-20-jthread-ebd121ae8906)
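For illustration, here is a small stand-alone example of such a cooperative preemption point using C++20 std::jthread. It is not ROS-specific, and the "work" is simulated.

```cpp
#include <chrono>
#include <iostream>
#include <thread>

// The long-running callback must itself offer preemption points by checking the stop token.
void long_running_callback(std::stop_token stop)
{
  for (int chunk = 0; chunk < 1000; ++chunk) {
    if (stop.stop_requested()) {   // cooperative preemption point
      std::cout << "interrupted at chunk " << chunk << std::endl;
      return;
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(1));  // stand-in for real work
  }
}

int main()
{
  std::jthread worker(long_running_callback);  // jthread passes its stop_token automatically
  std::this_thread::sleep_for(std::chrono::milliseconds(50));
  worker.request_stop();  // e.g. a higher-priority task arrived; ask the callback to yield
  // the jthread joins automatically on destruction
}
```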

@mm318
Member

mm318 commented Nov 3, 2019

It looks like from the discussion so far, the callback execution order would generally be scheduled based on a timestamp and an explicit priority of a topic/subscription that would be specified by the user. We shouldn't forget that the callbacks of timers (rcl_timer_t) and QoS events (rmw_events_t) should also be part of the execution order calculation. Therefore, the data required for determining callback execution order is coming from all the different layers: timestamps for messages and QoS events are coming from the middleware/rmw_implementation, timestamps for timers are coming from rcl, and the explicit priorities are coming from rclcpp since they are user specified.

Given this, I think it makes more sense to bubble the timestamps up to the rclcpp layer (option 1, and echoing what @nburek was saying) than to pass everything needed down to the rmw layer for rmw_wait() to make a scheduling decision (option 2), because for one the rmw layer is just completely unaware of timers (unless we introduce a rmw_timer_t).

Regarding the discussion for option 1, I think the data size for keeping track of message timestamps would be bounded by the depth of the history QoS policy (which could mean unbounded). rmw_wait() can be modified to indicate the number of times rcl_take() can be called on each member of the rmw_wait_set_t (let's call them Waitables ;) ), and the output of each rcl_take() call will provide timestamp information to the rclcpp layer. In effect, this is starting to sound like option 1 with a sprinkle of option 2.
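A rough sketch of what such an extended wait result could look like follows; all names below are hypothetical and not part of the current rmw API.

```cpp
#include <cstddef>

// Hypothetical: instead of a binary "ready" flag per waitable, the wait call reports how
// many takes can be performed without blocking, bounded by each entity's history depth.
struct rmw_waitable_ready_t
{
  size_t wait_set_index;  // which subscription/service/client/event in the wait set
  size_t ready_count;     // how many times take() may be called for it right now
};

struct rmw_wait_result_t
{
  rmw_waitable_ready_t * ready_entities;
  size_t ready_entities_count;
};

// The executor would then call rcl_take() up to ready_count times per entity and use the
// timestamp delivered with each message (via the message info) to order the callbacks.
```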

@mm318
Member

mm318 commented Nov 3, 2019

I thought preemption is a bit out of scope for this issue, but anyway...

There is probably no way to interrupt a callback that is currently executing, but we can at least rearrange the callback execution order while a callback is executing and update the next callback to be executed before the current callbacks are finished.

@iluetkeb
Author

iluetkeb commented Nov 4, 2019

It looks like from the discussion so far, the callback execution order would generally be scheduled based on a timestamp and an explicit priority of a topic/subscription that would be specified by the user.

While this is a common approach, it is not as easy to get right as one might think. Therefore, we're favoring a deadline-based approach instead of a priority-based one. btw, I just learned at ROSCon that our notion of a deadline is different from the DDS one. We consider a deadline to be the latest time by which the data has to be handled by a user callback.

Nevertheless, the deadline can also be user-specified information, so I agree with your general argument that the ordering will have to take information from various sources into account.

That said, I am increasingly coming to the view that my original question raised in this ticket cannot be answered without a more fine-grained design for both options, where we have considered all the implications fully.

@mm318
Member

mm318 commented Nov 5, 2019

Therefore, we're favoring a deadline-based approach instead of a priority-based one.

Okay, I see. That just means that the callback execution will be ordered based on a timestamp + user_deadline criteria (instead of being based on some function f(timestamp, user_priority) that may be unintuitive), right? Or is there some special treatment for when a deadline cannot be met?

btw, I just learned at ROSCon that our notion of a deadline is different from the DDS one.

Right. Your notion of deadline sounds more like the latency_budget DDS QoS policy, but according to the DDS spec it is also just a hint to the middleware and doesn't sound like there's any guarantee it will even have any effect at all. (Or like the lifespan QoS policy, if the special treatment for when a callback execution deadline cannot be met is to drop the execution altogether.)

@iluetkeb
Author

iluetkeb commented Nov 5, 2019

That just means that the callback execution will be ordered based on a timestamp + user_deadline criteria

In principle, it should be timestamp + deadline - worst-case-execution-time, such that the reaction is available by the deadline. Note that WCET is almost impossible to know exactly, so many people use the average execution time instead.
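As a small sketch of that ordering key (field names and units are made up; in practice the timestamp would come from the middleware and the delay and execution-time estimate from per-topic configuration):

```cpp
#include <cstdint>

// Illustrative only: "latest start time" for a pending callback, in nanoseconds.
struct CallbackInfo
{
  int64_t arrival_time;      // receive (or source) timestamp of the triggering message
  int64_t acceptable_delay;  // per-topic deadline relative to that timestamp
  int64_t execution_time;    // WCET if known, otherwise an average as a fallback
};

// Smaller value == more urgent: the callback must start by this time to finish in time.
inline int64_t latest_start_time(const CallbackInfo & cb)
{
  return cb.arrival_time + cb.acceptable_delay - cb.execution_time;
}
```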

Your notion of deadline sounds more like the latency_budget DDS QoS policy, but according to the DDS spec it is also just a hint to the middleware and doesn't sound like there's any guarantee it will even have any effect at all.

Yes, the definition of that sounds right. btw, I would expect that a real-time capable middleware implementation, when running on a real-time capable transport, could use this information to configure the transport appropriately.

@ralph-lange

See @iluetkeb's PRs ros2/rcl#591 and ros2/rmw#200

@rotu

rotu commented Apr 9, 2020

I strongly object to the changes in ros2/rmw#200, ros2/rcl#591

  1. This attempts to rewrite parts of the DDS DCPS standard to fix our misinterpretation of it.
    (1) The SampleInfo already contains a source timestamp, which we completely ignore.
    (2) Waitsets do not maintain the timestamp of certain events and this introduces ambiguity. If the status changes but still matches the StatusCondition mask, what is the correct timestamp?
    (3) If it is important to run an action when an event happens as opposed to waiting, there is already a mechanism to do so in DCPS: listeners.

  2. This doesn't even solve the problem:
    (1) Recreating waitsets every time the user calls rmw_wait is a measurably large source of overhead, largely due to the amount of dynamic memory management. This overhead contributes to the need for sequencing. Adding several new heap-allocated dynamic-length collections is going to make it slower.
    (2) A compliant implementation will still have a useless timestamp: the ReadCondition should stay true while samples are available (not become true on each incoming sample), so the timestamp wouldn't even allow proper sequencing of messages.

  3. There is a more obvious solution available. We've already added a destination timestamp to rmw_message_info_t, but the code is commented out https://github.com/ros2/rmw/blob/2f954d77a102042fa5c13540e2db300d549d7ad3/rmw/include/rmw/types.h#L335. If that's uncommented, it is a simple matter to use in rcl/rclcpp: pop available messages to a priority queue and have the executor process them in first-to-last order. Plus we don't need to spuriously create multiple wait sets when multiple messages come in fast succession.
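For illustration, a minimal sketch of that priority-queue dispatch. Only the idea of a reception timestamp per taken message comes from the (commented-out) rmw_message_info_t field; TakenMessage and the take step are made-up stand-ins.

```cpp
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

struct TakenMessage
{
  int64_t received_timestamp;          // reception time reported with the taken message
  std::function<void()> run_callback;  // subscription callback bound to the message
};

struct OldestFirst
{
  bool operator()(const TakenMessage & a, const TakenMessage & b) const
  {
    return a.received_timestamp > b.received_timestamp;  // makes the priority_queue a min-heap
  }
};

// Dispatch everything that was available when we polled, oldest reception time first.
void dispatch_oldest_first(std::vector<TakenMessage> available)
{
  std::priority_queue<TakenMessage, std::vector<TakenMessage>, OldestFirst> queue;
  for (auto & msg : available) {
    queue.push(std::move(msg));
  }
  while (!queue.empty()) {
    queue.top().run_callback();  // std::function::operator() is const, so calling it on top() is fine
    queue.pop();
  }
}
```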

@iluetkeb
Author

iluetkeb commented Apr 9, 2020

Hi @rotu, thank you for your comments. I'm sorry you strongly object, but I hope I can address your concerns, since we have actually considered many of the same points in our design discussions.

This attempts to rewrite parts of the DDS DCPS standard to fix our misinterpretation of it.

I'm not sure what this refers to, but not reinventing what is provided by DDS already has been a major concern during the design.

(1) The SampleInfo already contains a source timestamp, which we completely ignore.

We do not ignore it. In fact the SampleInfo is used by my reference implementation, ros2/rmw_fastrtps#359, for all types (subscriptions, clients and services) and we can switch to that as soon as Fast-RTPS 1.10 is included.

Until then, the version for Fast-RTPS 1.9.x, ros2/rmw_fastrtps#363, uses SampleInfo for clients and services. It does not yet use it for subscriptions, because a method is lacking that allows us to retrieve SampleInfo without read or take (using either of those would change semantics). For clients and services, they are always taken immediately, so we can avoid this.

btw, the version for 1.10 does not use the source timestamp anymore, but the newly introduced receive timestamp, also from the SampleInfo. We have discussed which timestamp is more appropriate and currently believe that the receive timestamp is the better choice. If you object, we can discuss this, of course.

(2) Waitsets do not maintain the timestamp of certain events and this introduces ambiguity. If the status changes but still matches the StatusCondition mask, what is the correct timestamp?

I'm not sure which ambiguity you refer to. In any case, the timestamp we use is always the one that matches the time the message has been received originally. As said above, this is retrieved from the SampleInfo.

(3) If it is important to run an action when an event happens as opposed to waiting, there is already a mechanism to do so in DCPS: listeners.

Agreed. It is not the intention to run an event as soon as things are happening. What we aim for is that, if you have messages available on multiple sources at the same time, their arrival order is available for consideration by the executor.

2 This doesn't even solve the problem:

  1. Recreating waitsets every time the user calls rmw_wait is a measurably large source of overhead, largely due to the amount of dynamic memory management.

I share the view that the current execution pipeline has many inefficiencies, and am actively involved in efforts to change this.

However, I believe this PR does not make the situation worse. Yes, it allocates more memory to hold the timestamps, but this memory can be allocated once, just like the rest of the wait_set. Nothing in the rcl_wait API requires that you dynamically re-allocate it, unless the size of the wait-set changes (which does not happen often in most applications). It is possible that some executors do this, but that can (and should) be changed independently. Lastly, in comparison to how much memory we spend in other places, I think the list of timestamps is small fry.

(2) A compliant implementation will still have a useless timestamp: the ReadCondition should stay true while samples are available (not become true on each incoming sample), so the timestamp wouldn't even allow proper sequencing of messages.

I don't understand this part. Maybe you can look at what I wrote above, regarding our use of the SampleInfo and the reference implementation, and tell me if this comment is still valid?

  1. There is a more obvious solution available. We've already added a destination timestamp to rmw_message_info_t, but the code is commented out

This information is only available after you take the message. However, we want to take messages only when we actually intend to process them immediately, because otherwise we would interfere with the history QoS. This is one of those things where we have not opted for the "simple" solution, precisely because we don't want to reimplement what DDS already provides.

That means, we need the timestamp first, and then we can decide what to take. That is important because it may well be that by the time we're done processing the selected message, the one that would have been second has already been replaced by new data.

@rotu

rotu commented Apr 9, 2020

Hi @rotu, thank you for your comments. I'm sorry you strongly object, but I hope I can address your concerns, since we have actually considered many of the same points in our design discussions.

This attempts to rewrite parts of the DDS DCPS standard to fix our misinterpretation of it.

I'm not sure what this refers to, but not reinventing what is provided by DDS already has been a major concern during the design.

DDS already has a concept of waitset which the rmw_wait_set_t is getting further away from with this design. In DDS the wait set has the following actions: attach_condition, detach_condition, wait, and get_conditions. By adding a timestamp, you force some weird design decisions on the RMW.

An entity in a DDS wait set becomes "ready" when its status matches some given mask. This change means that when processing the wait set, the application now has to not simply wait for the status flag to be true; it has to look into that sample and expose information that should be available through other means. Worse, if there is any message history, this forces you to call rmw_wait on the waitset multiple times to get the associated timestamps.

(1) The SampleInfo already contains a source timestamp, which we completely ignore.

We do not ignore it. In fact the SampleInfo is used by my reference implementation, ros2/rmw_fastrtps#359, for all types (subscriptions, clients and services) and we can switch to that as soon as Fast-RTPS 1.10 is included.

Until then, the version for Fast-RTPS 1.9.x, ros2/rmw_fastrtps#363, uses SampleInfo for clients and services. It does not yet use it for subscriptions, because a method is lacking that allows us to retrieve SampleInfo without read or take (using either of those would change semantics). For clients and services, they are always taken immediately, so we can avoid this.

btw, the version for 1.10 does not use the source timestamp anymore, but the newly introduced receive timestamp, also from the SampleInfo. We have discussed which timestamp is more appropriate and currently believe that the receive timestamp is the better choice. If you object, we can discuss this, of course.

I mean we don't expose the DDS SampleInfo source_timestamp, which is part of the DDS spec. The reception timestamp totally makes sense too as an extension of the spec, and I agree with using it.

In the DDS spec, read should already expose this sample info. So it would be much more compliant with the spec to keep it with the sample instead of stuffing it into the waitset.

(2) Waitsets do not maintain the timestamp of certain events and this introduces ambiguity. If the status changes but still matches the StatusCondition mask, what is the correct timestamp?

I'm not sure which ambiguity you refer to. In any case, the timestamp we use is always the one that matches the time the message has been received originally. As said above, this is retrieved from the SampleInfo.

The ambiguity I'm referring to is that the condition in the wait set is broader than what the timestamp represents. A subscription should be ready when the ReadCondition becomes true, but this means that the timestamp should now change when a sample is received or taken, even if the ReadCondition is unchanged.

(3) If it is important to run an action when an event happens as opposed to waiting, there is already a mechanism to do so in DCPS: listeners.

Agreed. It is not the intention to run an event as soon as things are happening. What we aim for is that, if you have messages available on multiple sources at the same time, their arrival order is available for consideration by the executor.

2 This doesn't even solve the problem:

  1. Recreating waitsets every time the user calls rmw_wait is a measurably large source of overhead, largely due to the amount of dynamic memory management.

I share the view that the current execution pipeline has many inefficiencies, and am actively involved in efforts to change this.
However, I believe this PR does not make the situation worse. Yes, it allocates more memory to hold the timestamps, but this memory can be allocated once, just like the rest of the wait_set. Nothing in the rcl_wait API requires that you dynamically re-allocate it, unless the size of the wait-set changes (which does not happen often in most applications). It is possible that some executors do this, but that can (and should) be changed independently. Lastly, in comparison to how much memory we spend in other places, I think the list of timestamps is small fry.

It can be made faster, but it's disingenuous to expect it will be. Executor::wait_for_work clears, resizes, and reallocates all dynamic data in the wait set every time. So in the short term, this is adding lots of memory allocation in a very active code path. This allocation/deallocation hit is doubled by this change.

(2) A compliant implementation will still have a useless timestamp: the ReadCondition should stay true while samples are available (not become true on each incoming sample), so the timestamp wouldn't even allow proper sequencing of messages.

I don't understand this part. Maybe you can look at what I wrote above, regarding our use of the SampleInfo and the reference implementation, and tell me if this comment is still valid?

Yes, still valid. If the timestamp is supposed to refer to when the ReadCondition in the waitset becomes true, it is the wrong timestamp. If it doesn't, the waitset is the wrong level of abstraction for it.

  1. There is a more obvious solution available. We've already added a destination timestamp to rmw_message_info_t, but the code is commented out

This information is only available after you take the message. However, we want to take messages only when we actually intend to process them immediately, because otherwise we would interfere with the history QoS. This is one of those things where we have not opted for the "simple" solution, precisely because we don't want to reimplement what DDS already provides.

That means, we need the timestamp first, and then we can decide what to take. That is important because it may well be that by the time we're done processing the selected message, the one that would have been second has already been replaced by new data.

If single ticks of the executor are routinely picking up enough work to stall out the system long enough to evict history significantly, then no amount of prioritizing here will fix that; it might even be desirable to read data rather than drop it on the floor or else topics with shorter histories might never get work done if backlogs are serious.

If you need to borrow the data temporarily before seeing whether you want to actually deserialize and use it, you can use a DDS read or take operation. These operations don't require deserialization at the same time.

@iluetkeb
Author

OK, ros2/rclpy#542 has the first attempt at Python support.

@iluetkeb
Author

@rotu are you going to be looking at rmw_cyclonedds for this, or do you know if somebody else is doing that?

@rotu

rotu commented Apr 16, 2020

I’m on it

@iluetkeb
Author

@rotu @wjwwood I just pushed some more changes related to clients and services. for that, I also added timestamps to rmw_request_id_t.

While doing so, I noticed that the timestamp which is currently copied over from the sample-info in rmw_fastrtps isn't quite what I expected. That's why the message_info test in rcl is currently failing. I am not currently sure why that is -- maybe the sample-info has a different notion of what the timestamp should contain, or maybe I'm doing the copying wrong. I'll work on that tomorrow.

@rotu please be aware that this is currently enabled for fast-rtps only. if you want to check it on cyclone-dds, please modify the condition in the test/cmakelists.txt for test_message_info.

@Karsten1987 Karsten1987 added enhancement New feature or request and removed question Further information is requested labels Apr 22, 2020
@iluetkeb
Author

iluetkeb commented Apr 22, 2020

Well. The message-info changes should be ready.

The service info changes are mostly complete, but they lack a dedicated test. Such a test could be easily added to test_service and test_client in rcl, in the same way that I added the timestamp test to test_subscription. However, it's 1am here and I think if I continue, I will make more of a mess than help. @wjwwood if you want to pick this up, it would be very much appreciated!

@wjwwood
Member

wjwwood commented Apr 22, 2020

I'll do my best.

wjwwood pushed a commit to ros2/rmw that referenced this issue Apr 23, 2020
* Add timestamps to message info
See ros2/design#259

Signed-off-by: Luetkebohle Ingo (CR/AEX3) <ingo.luetkebohle@de.bosch.com>

* Add initializer for message_info_t

Signed-off-by: Luetkebohle Ingo (CR/AEX3) <ingo.luetkebohle@de.bosch.com>

* Use rmw_time_point_value_t instead of rmw_time_t

Signed-off-by: Luetkebohle Ingo (CR/AEX3) <ingo.luetkebohle@de.bosch.com>
wjwwood pushed a commit to ros2/rmw_fastrtps that referenced this issue Apr 23, 2020
* Fill message_info timestamp
Implements ros2/design#259

Signed-off-by: Luetkebohle Ingo (CR/AEX3) <ingo.luetkebohle@de.bosch.com>

* Use rmw_time_point_value_t instead of rmw_time_t

Signed-off-by: Luetkebohle Ingo (CR/AEX3) <ingo.luetkebohle@de.bosch.com>
@iluetkeb
Author

iluetkeb commented Apr 23, 2020

Alright firstly, @wjwwood thanks for merging the message info changes! Happy we got that in!

@jacobperron the service timestamp changes are ready from my side. hope CI is okay.

Since the PRs come from an organizational fork, maintainers cannot directly edit them, but if there is a need for further modifications, @Karsten1987 has write permissions on all the repos.

wjwwood pushed a commit to ros2/rmw that referenced this issue Apr 24, 2020
* replace rmw_request_id_t with rmw_service_info_t on select functions for ros2/design#259

Signed-off-by: Luetkebohle Ingo (CR/AEX3) <ingo.luetkebohle@de.bosch.com>

* use rmw_time_point_value_t instead of rmw_time_t

Signed-off-by: Luetkebohle Ingo (CR/AEX3) <ingo.luetkebohle@de.bosch.com>

* Remove duplicate typedef

Signed-off-by: Luetkebohle Ingo (CR/AEX3) <ingo.luetkebohle@de.bosch.com>
wjwwood pushed a commit to ros2/rmw_fastrtps that referenced this issue Apr 24, 2020
* remember the sample info on requests and responses

Signed-off-by: Luetkebohle Ingo (CR/AEX3) <ingo.luetkebohle@de.bosch.com>

* This implements services timestamps

Requires ros2/rmw#217
Realizes the 2nd part of ros2/design#259

Signed-off-by: Luetkebohle Ingo (CR/AEX3) <ingo.luetkebohle@de.bosch.com>

* use rmw_time_point_value_t instead of rmw_time_t

Signed-off-by: Luetkebohle Ingo (CR/AEX3) <ingo.luetkebohle@de.bosch.com>

* snake_case

Signed-off-by: Luetkebohle Ingo (CR/AEX3) <ingo.luetkebohle@de.bosch.com>
@iluetkeb
Author

Alright, the work set out in this ticket is largely done: We've merged API changes for acquiring timestamps for both topics and services, and we already have two implementations of those (at varying degrees of completeness, but both sufficient for now).

This has been quite a bit more work than expected, but I'm happy we've got a good solution now. A big "thank you" to everybody who participated, particularly @wjwwood, @rotu, @eboasson and @dirk-thomas for sticking with it, and getting it into Foxy. @rotu and @eboasson, thanks for your objections and explanations -- I think it has made a big difference to the result. Hopefully for the better, but of course, as they say, all remaining errors are mine ;-)

To those participating in the earlier part of the discussion: Thanks for the reference information on your respective solutions to this. The work on the executor to actually make use of all this will keep us busy for a while.

I'm keeping this ticket open for the moment -- I would like to merge some documentation at least, summarizing all that we have discussed here. Since I had to push other work away to complete this, it will take a moment until I can return, but feel free to ping me if you think it takes too long ;-)

@ros-discourse

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/ros2-middleware-change-proposal/15863/18

@clalancette
Contributor

@iluetkeb friendly ping; what's the status of this ticket?

@iluetkeb iluetkeb closed this as completed Oct 8, 2020