Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Call dds_entity_status_set independently of enabled #699

Conversation

mauropasse
Copy link

Call dds_entity_status_set independently of enabled

Otherwise if an event listener callback is defined (invoke=true) dds_entity_status_set is not called, so the ROS2 waitset based executors can't identify when an event happened. This commit supports both events & waiset based ROS2 executors.

@eboasson
Copy link
Contributor

eboasson commented Mar 8, 2021

The interaction between listeners and waitsets is a bit of a mess in DDS, or more precisely, the spec tries to avoid a mess by making life complicated. It says somewhere that the listeners must be invoked prior to triggering waitsets. Typically, the listener does something that resets the condition, after which the waitset would not trigger anymore.

I agree that it looks like Cyclone currently doesn't even set the status when there is a listener to invoke, presumably on the (incorrect) assumption that the listener would reset it anyway and/or that no-one would be silly enough to actually try to combine listeners and waitsets for the same event.

Setting the status after invoking the listener is wrong, because then the listener can't handle it and prevent the waitset from triggering. If anything, it needs to be set prior to calling the listener, and then triggering the waitsets if the status is still set after the listener execution. But before making such a change to the behaviour, I'd better re-read the part of the spec that is about listener/waitset interaction to be certain that the status indeed must be set prior to calling the listener (it seems logical, but ...) and that I understand what it does to if multiple threads are it at the same time.

Regarding that final remark, suppose I have two threads, one sets status A and the other sets status B, and both A and B have listeners and both are waited on in a waitset. Then one wonders what "listeners must be invoked prior to triggering waitsets" means for the following schedule:

T1: set A
T1: invoke listener (resetting A)
T2: set B
T1: trigger waitsets
T3: wake up from wait()
T3: inspect status
(T2: invoke listener, &c.)

where I'm wondering of T3 may observe B in the status flags ...

And as a final remark: I'm quite pleased that a whole bunch of listener tests fails with this change 🙂

@mauropasse
Copy link
Author

Hi @eboasson, thanks for your comment!
Honestly I wasn't expecting this PR to fix the issue, and I also haven't thought about your proposed threads scenario since in ROS2 an entity which has been assigned to an executor can't be part of another one, i.e. we can' have both listeners (EventsExecutor) and be waited on in a waitset (SingleThreadedExecutor) for the same entity. But of course this situation may happen on CycloneDDS itself. I was more looking for guidance since the inners of DDS are mostly uncharted territory for me.
I'll see if I find something helpful on the DDS spec about listeners/waitset interaction. Please let me know if in the meantime you find a correct approach or workaround for this situation.

Otherwise if an event listener callback is defined
(invoke=true) dds_entity_status_set is not called,
so the ROS2 waitset based executors can't identify
when an event happened. This commit enables support
for events and waitset based executors.

Signed-off-by: Mauro Passerino <mpasserino@irobot.com>
@mauropasse mauropasse force-pushed the mauro/events-executor-listeners-support branch from ca3d06a to fbbb444 Compare April 8, 2021 15:11
This commit is not a solution, its purpose is just to
show where the issue is
@eboasson
Copy link
Contributor

@mauropasse thank you very much for your efforts trying make sense of the inner workings of listeners + waitset triggering and trying to fix this behaviour. (I'm writing "fix" here, I understand the logic behind your desire even though I have my reservations whether the spec'd behaviour really is something desirable.)

As I'm sure you've seen, it is rather tricky to do this, for example, there is this rule that says take/read reset the DATA_AVAILABLE flag, so if you set it after invoking the listener, a listener that reads the day would incorrectly leave the status set. If I am not mistaken, there is no way to do this without splitting dds_entity_status_set, so that the status can be set first, and the triggering of the waitset can be delayed.

That in turn opens up interesting possibilities for race conditions, TOCTTOU problems and whatnot, but I don't see an alternative just yet. I've tried doing that in https://github.com/eboasson/cyclonedds/tree/waitset-listener-interaction but I have my doubts about it. Still, with a more changes in tests that were failing because they explicitly checked that the status would be cleared if the listener got invoked, it does pass the CI. Perhaps you would like to have a look?

@mauropasse
Copy link
Author

Hi @eboasson, thanks for working on this!
I looked at your PR. The commit eboasson@4eebd96 does exactly what I think it should be done, and also the DDS specification states:

"if both mechanisms are enabled, then the listener mechanism is used first and then the WaitSet objects are signalled" (2.2.4.6)

So your commit still matches the DDS specification, correct me if I'm wrong:

  1. Sets the status, regardless of if there are listener callbacks for the entity:
    bool signal = dds_entity_status_set (&e->m_entity, status_mask);
    where signal is true if the new set status is different than the one it previously had.
  2. Invokes the callback (if any).
  3. Checks the entity status:
    signal = signal && (ddsrt_atomic_ld32 (&e->m_entity.m_status.m_status_and_mask) & status_mask);
    If the status was reset due the invocation of the callback, signal becomes false, and nothing else is done (that is, the wait set is not notified about any event).
    But if the listener callback did not modify the status (like if only had printed something) then the waiset is signaled:
    dds_entity_observers_signal (&e->m_entity, status_mask);

So you say that all tests pass (removing tests which checked status cleared after listener invocation). That's awesome!

About the race conditions and issues you mention, I think the function is not thread safe even with your new changes. Couldn't be all of this protected under a mutex?

@irobot-ros
Copy link

Hi @eboasson, I tested your branch and I confirm that with it the rclcpp tests succeeds

@mauropasse
Copy link
Author

Closing since #900 did the job

@mauropasse mauropasse closed this Aug 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants