-
Notifications
You must be signed in to change notification settings - Fork 430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added test cased for missing events & double call to take_data #2371
Conversation
Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
64e3d5e
to
58718f1
Compare
@mjcarroll here is the merge request for the tests. |
TYPED_TEST(TestExecutorsOnlyNode, missing_event) | ||
{ | ||
using ExecutorType = TypeParam; | ||
ExecutorType executor; | ||
|
||
rclcpp::Node::SharedPtr node(this->node); | ||
auto callback_group = node->create_callback_group( | ||
rclcpp::CallbackGroupType::MutuallyExclusive, | ||
true); | ||
|
||
auto waitable_interfaces = node->get_node_waitables_interface(); | ||
auto my_waitable = std::make_shared<TestWaitable>(); | ||
auto my_waitable2 = std::make_shared<TestWaitable>(); | ||
waitable_interfaces->add_waitable(my_waitable, callback_group); | ||
waitable_interfaces->add_waitable(my_waitable2, callback_group); | ||
executor.add_node(this->node); | ||
|
||
my_waitable->trigger(); | ||
my_waitable2->trigger(); | ||
|
||
// a node has some default subscribers, that need to get executed first, therefore the loop | ||
for (int i = 0; i < 10; i++) { | ||
executor.spin_once(std::chrono::milliseconds(10)); | ||
if (my_waitable->get_count() > 0) { | ||
// stop execution, after the first waitable has been executed | ||
break; | ||
} | ||
} | ||
|
||
EXPECT_EQ(1u, my_waitable->get_count()); | ||
EXPECT_EQ(0u, my_waitable2->get_count()); | ||
|
||
// block the callback group, this is something that may happen during multi threaded execution | ||
// This removes my_waitable2 from the list of ready events, and triggers a call to wait_for_work | ||
callback_group->can_be_taken_from().exchange(false); | ||
|
||
//now there should be no ready event | ||
executor.spin_once(std::chrono::milliseconds(10)); | ||
|
||
EXPECT_EQ(1u, my_waitable->get_count()); | ||
EXPECT_EQ(0u, my_waitable2->get_count()); | ||
|
||
//unblock the callback group | ||
callback_group->can_be_taken_from().exchange(true); | ||
|
||
//now the second waitable should get processed | ||
executor.spin_once(std::chrono::milliseconds(10)); | ||
|
||
EXPECT_EQ(1u, my_waitable->get_count()); | ||
EXPECT_EQ(1u, my_waitable2->get_count()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least on my machine, this test only fails for the static single threaded executor and then events executor:
% ~/ros2_ws/build/rclcpp/test/rclcpp/test_executors --gtest_filter=\*missing_event
Running main() from /Users/william/ros2_ws/install/src/gtest_vendor/src/gtest_main.cc
Note: Google Test filter = *missing_event
[==========] Running 4 tests from 4 test suites.
[----------] Global test environment set-up.
[----------] 1 test from TestExecutorsOnlyNode/SingleThreadedExecutor, where TypeParam = rclcpp::executors::SingleThreadedExecutor
[ RUN ] TestExecutorsOnlyNode/SingleThreadedExecutor.missing_event
[ OK ] TestExecutorsOnlyNode/SingleThreadedExecutor.missing_event (116 ms)
[----------] 1 test from TestExecutorsOnlyNode/SingleThreadedExecutor (116 ms total)
[----------] 1 test from TestExecutorsOnlyNode/MultiThreadedExecutor, where TypeParam = rclcpp::executors::MultiThreadedExecutor
[ RUN ] TestExecutorsOnlyNode/MultiThreadedExecutor.missing_event
[ OK ] TestExecutorsOnlyNode/MultiThreadedExecutor.missing_event (17 ms)
[----------] 1 test from TestExecutorsOnlyNode/MultiThreadedExecutor (17 ms total)
[----------] 1 test from TestExecutorsOnlyNode/StaticSingleThreadedExecutor, where TypeParam = rclcpp::executors::StaticSingleThreadedExecutor
[ RUN ] TestExecutorsOnlyNode/StaticSingleThreadedExecutor.missing_event
/Users/william/ros2_ws/src/ros2/rclcpp/rclcpp/test/rclcpp/executors/test_executors.cpp:692: Failure
Expected equality of these values:
1u
Which is: 1
my_waitable->get_count()
Which is: 0
/Users/william/ros2_ws/src/ros2/rclcpp/rclcpp/test/rclcpp/executors/test_executors.cpp:702: Failure
Expected equality of these values:
1u
Which is: 1
my_waitable->get_count()
Which is: 0
/Users/william/ros2_ws/src/ros2/rclcpp/rclcpp/test/rclcpp/executors/test_executors.cpp:711: Failure
Expected equality of these values:
1u
Which is: 1
my_waitable->get_count()
Which is: 0
/Users/william/ros2_ws/src/ros2/rclcpp/rclcpp/test/rclcpp/executors/test_executors.cpp:712: Failure
Expected equality of these values:
1u
Which is: 1
my_waitable2->get_count()
Which is: 0
[ FAILED ] TestExecutorsOnlyNode/StaticSingleThreadedExecutor.missing_event, where TypeParam = rclcpp::executors::StaticSingleThreadedExecutor (107 ms)
[----------] 1 test from TestExecutorsOnlyNode/StaticSingleThreadedExecutor (107 ms total)
[----------] 1 test from TestExecutorsOnlyNode/EventsExecutor, where TypeParam = rclcpp::experimental::executors::EventsExecutor
[ RUN ] TestExecutorsOnlyNode/EventsExecutor.missing_event
/Users/william/ros2_ws/src/ros2/rclcpp/rclcpp/test/rclcpp/executors/test_executors.cpp:703: Failure
Expected equality of these values:
0u
Which is: 0
my_waitable2->get_count()
Which is: 1
[ FAILED ] TestExecutorsOnlyNode/EventsExecutor.missing_event, where TypeParam = rclcpp::experimental::executors::EventsExecutor (30 ms)
[----------] 1 test from TestExecutorsOnlyNode/EventsExecutor (30 ms total)
[----------] Global test environment tear-down
[==========] 4 tests from 4 test suites ran. (272 ms total)
[ PASSED ] 2 tests.
[ FAILED ] 2 tests, listed below:
[ FAILED ] TestExecutorsOnlyNode/StaticSingleThreadedExecutor.missing_event, where TypeParam = rclcpp::executors::StaticSingleThreadedExecutor
[ FAILED ] TestExecutorsOnlyNode/EventsExecutor.missing_event, where TypeParam = rclcpp::experimental::executors::EventsExecutor
2 FAILED TESTS
Is that your experience too @jmachowinski?
Also, this is a bit more complicated than it needs to be, I'll open a pr (cellumation#1) to make it simpler (without breaking its purpose I think) by only adding the callback group with the waitables to the executor, which you can take or leave/discuss if you see an issue with it. With that change I do get a slightly different result:
% ~/ros2_ws/build/rclcpp/test/rclcpp/test_executors --gtest_filter=\*missing_event
Running main() from /Users/william/ros2_ws/install/src/gtest_vendor/src/gtest_main.cc
Note: Google Test filter = *missing_event
[==========] Running 4 tests from 4 test suites.
[----------] Global test environment set-up.
[----------] 1 test from TestExecutorsOnlyNode/SingleThreadedExecutor, where TypeParam = rclcpp::executors::SingleThreadedExecutor
[ RUN ] TestExecutorsOnlyNode/SingleThreadedExecutor.missing_event
[ OK ] TestExecutorsOnlyNode/SingleThreadedExecutor.missing_event (118 ms)
[----------] 1 test from TestExecutorsOnlyNode/SingleThreadedExecutor (118 ms total)
[----------] 1 test from TestExecutorsOnlyNode/MultiThreadedExecutor, where TypeParam = rclcpp::executors::MultiThreadedExecutor
[ RUN ] TestExecutorsOnlyNode/MultiThreadedExecutor.missing_event
[ OK ] TestExecutorsOnlyNode/MultiThreadedExecutor.missing_event (17 ms)
[----------] 1 test from TestExecutorsOnlyNode/MultiThreadedExecutor (17 ms total)
[----------] 1 test from TestExecutorsOnlyNode/StaticSingleThreadedExecutor, where TypeParam = rclcpp::executors::StaticSingleThreadedExecutor
[ RUN ] TestExecutorsOnlyNode/StaticSingleThreadedExecutor.missing_event
/Users/william/ros2_ws/src/ros2/rclcpp/rclcpp/test/rclcpp/executors/test_executors.cpp:705: Failure
Expected equality of these values:
1u
Which is: 1
my_waitable2->get_count()
Which is: 0
[ FAILED ] TestExecutorsOnlyNode/StaticSingleThreadedExecutor.missing_event, where TypeParam = rclcpp::executors::StaticSingleThreadedExecutor (47 ms)
[----------] 1 test from TestExecutorsOnlyNode/StaticSingleThreadedExecutor (47 ms total)
[----------] 1 test from TestExecutorsOnlyNode/EventsExecutor, where TypeParam = rclcpp::experimental::executors::EventsExecutor
[ RUN ] TestExecutorsOnlyNode/EventsExecutor.missing_event
/Users/william/ros2_ws/src/ros2/rclcpp/rclcpp/test/rclcpp/executors/test_executors.cpp:696: Failure
Expected equality of these values:
0u
Which is: 0
my_waitable2->get_count()
Which is: 1
[ FAILED ] TestExecutorsOnlyNode/EventsExecutor.missing_event, where TypeParam = rclcpp::experimental::executors::EventsExecutor (30 ms)
[----------] 1 test from TestExecutorsOnlyNode/EventsExecutor (30 ms total)
[----------] Global test environment tear-down
[==========] 4 tests from 4 test suites ran. (213 ms total)
[ PASSED ] 2 tests.
[ FAILED ] 2 tests, listed below:
[ FAILED ] TestExecutorsOnlyNode/StaticSingleThreadedExecutor.missing_event, where TypeParam = rclcpp::executors::StaticSingleThreadedExecutor
[ FAILED ] TestExecutorsOnlyNode/EventsExecutor.missing_event, where TypeParam = rclcpp::experimental::executors::EventsExecutor
2 FAILED TESTS
It is still the events executor and static single threaded executors which fail. Note also that they fail in different ways, i.e. the EventsExecutor is a bit too eager and it executes it when your test assumes it should not be, and the static single threaded executor never executes the second waitable.
I still need to figure out why that is, but at the same time, I think the assumptions in this test are a bit flawed. Specifically, I think that assuming that spin_once()
will ever execute the event you think it should next is a dangerous assumption and I don't think it should be a condition we should try to enforce on the spin variants. Instead I think a variant of this test which uses futures and spins in a loop until they are complete, or some timeout is probably better. After all, we care that the events eventually get called and not double "called", not that they happen in a specific order or number of spin_once calls.
I'll open a pr with the futures alternative soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the pr for using futures: cellumation#2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that your experience too @jmachowinski?
Yes, I missed this during my testing, as I set a filter...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm I had a quick look at the SingleStaticExecutor case, this looks like a lost wakeup to me.
@wjwwood This brings me back to a question I asked a while ago, if a lost wakeup would actually be the expected behavior for this test. At some point you mentioned, that an object using a guard condition must make sure that the signal does not get lost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This brings me back to a question I asked a while ago, if a lost wakeup would actually be the expected behavior for this test. At some point you mentioned, that an object using a guard condition must make sure that the signal does not get lost.
Right, that's a good question. I think that the "TestWaitable
" in this test file has fallen into the same issue we discussed elsewhere, which is that if a waitable is triggered but not handled, should that waitable re-trigger itself or should the executor handle the retriggering. I believe it should probably be up to the waitable to ensure that behavior, leaving room for waitables that for some reason it makes sense to not retrigger (that is to say, if the conditions that cause it to be triggered no longer apply, for example if data is no longer available in a subscription due to it exceeding its lifetime QoS setting). So part of this test is testing "if you have two things ready, then wait, then execute one, then wait again, will you execute the second?". I think that's valid test, but whether it passes or not comes down to how we ensure that the second waitable "stays ready" and ensures the second wait doesn't block, and that could either be the executor or the waitable, which is the point of the question you asked. The pr #2109 tried (perhaps naively) to fix this at the executor level, but only for some of the executors, but as we've been suggesting, perhaps that pr isn't correct.
Based on that, I set out to simplify the test further to avoid that particular issue, and test the "other" part of this test only, doing something like this:
- create a mutually exclusive callback group
- create a
TestWaitable
instance and add it to the callback group - create the executor and add the callback group
- trigger the waitable
- manually set the callback group's
can_be_taken_from
tofalse
- spin until a future is complete, where the future is set when the waitable is executed, expecting a timeout
- assert it timed out (i.e. the waitable was not executed)
- manually set the callback group's
can_be_taken_from
totrue
- spin until a future is complete, where the future is set when the waitable is executed
- assert the waitable was executed
This is essentially testing whether or not an executor is adhering to the callback group. Another, more contrived, version of this would involve two waitables in the callback group, and ensuring that one is being executed while spinning on the other, but that requires a multi-threaded executor (executing one waitable while spinning on the other implies at least two threads). So it might be "ok" for a single threaded executor to ignore the callback group's can_be_taken_from
. Ideally they would not ignore this, even if in normal practice it should never be set while spinning, but it's an under-defined part of the interface. It will, however, become very important when/if we ever get a multi-threaded version of the events executor.
I got knocked out by a cold, I will take a look at this when I 'm recovered. |
Signed-off-by: William Woodall <william@osrfoundation.org>
Signed-off-by: William Woodall <william@osrfoundation.org>
I reworked the double take data test, it works not on the 'standard' multithreaded executor, showing the race that was introduces with #2109 |
b625939
to
12bd3b4
Compare
Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
12bd3b4
to
fdb656d
Compare
Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
fdb656d
to
3ceaa9b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jmachowinski just checking if this is still go? or is there anything else to replace this one?
I think we ca drop this PR. We came to the conclusion, that loss events is intended, and its up the the waitables to deal with this. @wjwwood did a PR for this, so we should be fine on that front. As for the double take data, the test was very specific to the old implementation, as the timing needed to be right for the test to trigger. The bug has also been fixed in jazzy / rolling, so I think its save to drop the test as well. |
Added a test, if a trigger event for a waitable was missed.
Added a test, if take_data was called two times in a row, without calling is_ready in between.
This is related to #2250.