API Changes for Multithreaded Events Executor #2466
Force-pushed from 07eaf14 to 1404d45.
Looks good to me.
I would suggest adding unit tests for the Clock object.
Writing tests was a good suggestion. I screwed up the clock interface: it's not thread safe. I will redesign it.
Force-pushed from 1404d45 to 36061ed.
Redesigned the clock sleep interruption. Can the reviewers have a good look at the mutex/lock/wait construct? It's hard to get it right without data races.
I'll need to add a way to terminate the clock and all sleeps as soon as shutdown is called.
overall lgtm, a couple of minor comments.
Force-pushed from c118463 to cb3e4cc.
Rebased to rolling.
Waiting for this to get in before running CI: ros2/rosidl_typesupport_fastrtps#117
Force-pushed from cb3e4cc to 1795139.
Rebased and (hopefully) fixed the Windows compile error.
Force-pushed from 1795139 to eafb691.
Dropped the test refactoring.
@jmachowinski this needs a merge from |
Force-pushed from eafb691 to 30e7304.
Rebased. test_executors_timer_cancel_behavior is flaky on my machine. I hope this gets fixed by #2500.
lgtm with green CI
I'm highly suspect of the changes to the Executor API to make it so you don't need to inherit from |
Force-pushed from 30e7304 to 76c970b.
Rebased to rolling.
This is useful in case a second thread needs to wake up another thread that is sleeping using a clock. Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
This adds support for multiple threads waiting on the same clock while a shutdown is invoked. Signed-off-by: Janosch Machowinski <j.machowinski@cellumation.com>
Force-pushed from 76c970b to d50b888.
I reworked this PR. Everything in rclcpp::Executor is now virtual. I changed two functions that were templates, for no apparent reason, into normal functions. I reworked the template function spin_until_future_complete so that it can be fully overridden by a subclass. I added a constructor that will not touch anything in the system. @wjwwood can you do a review?
This commit makes every public function virtual and adds virtual impl functions for the existing template functions. The goal of this commit is to be able to fully control everything from a derived class. Signed-off-by: Janosch Machowinski <j.machowinski@cellumation.com>
Force-pushed from d50b888 to 34e2a80.
Pretty close, had a few small style things and a suggestion to scope down the changes.
Also, I think there might still be some style issues? Please run ament_uncrustify path/to/rclcpp --reformat if you haven't already (saw some if( that should be if ( ).
-  template<typename RepT = int64_t, typename T = std::milli>
-  void
+  RCLCPP_PUBLIC
+  virtual void
   spin_node_once(
-    rclcpp::node_interfaces::NodeBaseInterface::SharedPtr node,
-    std::chrono::duration<RepT, T> timeout = std::chrono::duration<RepT, T>(-1))
-  {
-    return spin_node_once_nanoseconds(
-      node,
-      std::chrono::duration_cast<std::chrono::nanoseconds>(timeout)
-    );
-  }
+    const rclcpp::node_interfaces::NodeBaseInterface::SharedPtr & node,
+    std::chrono::nanoseconds timeout = std::chrono::nanoseconds(-1));

   /// Convenience function which takes Node and forwards NodeBaseInterface.
-  template<typename NodeT = rclcpp::Node, typename RepT = int64_t, typename T = std::milli>
-  void
+  RCLCPP_PUBLIC
+  virtual void
   spin_node_once(
-    std::shared_ptr<NodeT> node,
-    std::chrono::duration<RepT, T> timeout = std::chrono::duration<RepT, T>(-1))
-  {
-    return spin_node_once_nanoseconds(
-      node->get_node_base_interface(),
-      std::chrono::duration_cast<std::chrono::nanoseconds>(timeout)
-    );
-  }
+    const std::shared_ptr<rclcpp::Node> & node,
+    std::chrono::nanoseconds timeout = std::chrono::nanoseconds(-1));
Do you absolutely need to make these non-template? I'm afraid existing code may not compile against this if certain types of durations are passed.
I don't really need to, but all other functions in the executor have the same signature and compile just fine. Any std::chrono::duration type should auto-convert to std::chrono::nanoseconds.
Arguably, the others taking only nanoseconds is probably a bug, see: https://stackoverflow.com/a/22362867
It seems like in C++20 we could have a function that takes std::chrono::duration<auto>, but we don't have that right now.
I'll remove this change in my fix up.
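The compatibility concern raised in this thread can be checked with the standard library alone. The sketch below is not rclcpp code: `to_nanoseconds` and `usage_example` are hypothetical names that stand in for the templated overloads being discussed. It shows that coarser integer durations convert implicitly to `std::chrono::nanoseconds`, while a floating-point duration does not and needs an explicit `duration_cast`, which is what the templated signature provides.

```cpp
#include <cassert>
#include <chrono>
#include <type_traits>

// Integer durations with a coarser period convert implicitly to nanoseconds,
// so callers passing milliseconds keep compiling against a nanoseconds-only API.
static_assert(
  std::is_convertible<std::chrono::milliseconds, std::chrono::nanoseconds>::value,
  "milliseconds -> nanoseconds is an implicit conversion");

// A floating-point duration has no implicit conversion to an integer-based
// duration, so such callers would stop compiling.
static_assert(
  !std::is_convertible<std::chrono::duration<double>, std::chrono::nanoseconds>::value,
  "duration<double> -> nanoseconds requires duration_cast");

// Hypothetical helper (an assumption, not part of rclcpp): a templated entry
// point that accepts any duration and narrows it explicitly.
template<typename RepT, typename PeriodT>
std::chrono::nanoseconds
to_nanoseconds(std::chrono::duration<RepT, PeriodT> timeout)
{
  return std::chrono::duration_cast<std::chrono::nanoseconds>(timeout);
}

// Hypothetical usage: both integer and floating-point durations are accepted.
inline void usage_example()
{
  assert(to_nanoseconds(std::chrono::milliseconds(2)) == std::chrono::nanoseconds(2000000));
  assert(
    to_nanoseconds(std::chrono::duration<double>(1.5)) ==
    std::chrono::nanoseconds(1500000000));
}
```

Under these assumptions, keeping the template keeps call sites such as `spin_node_once(node, std::chrono::duration<double>(0.5))` compiling, which is the case the reviewer was worried about.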
@@ -447,6 +403,10 @@ class Executor
     rclcpp::node_interfaces::NodeBaseInterface::SharedPtr node,
     std::chrono::nanoseconds timeout);

+  virtual FutureReturnCode spin_until_future_complete_impl(
-  virtual FutureReturnCode spin_until_future_complete_impl(
+  RCLCPP_PUBLIC
+  virtual FutureReturnCode
+  spin_until_future_complete_impl(
Also, needs at least a minimal docstring.
As I said above, I'd consider dropping this change, but if you keep it, consider the name spin_until_future_complete_nanoseconds(), since that would match other functions like spin_node_once() -> spin_node_once_nanoseconds().
We have two colliding styles here; there is also spin_once -> spin_once_impl and spin_some -> spin_some_impl.
Added docs
"We got two colliding styles here, there is also spin_once -> spin_once_impl and spin_some -> spin_some_impl"
Yeah, I'll leave it as-is, but I think this is related to the templated functions that take a duration versus those that just take nanoseconds. I still believe the templated approach is better, and in that case the narrowing to the non-template protected methods is truly just to cast the duration down to nanoseconds, so the naming makes sense in those cases. If I had noticed, I would likely have suggested that the newer functions follow that pattern, but it's not that important now.
rclcpp/include/rclcpp/executor.hpp
Outdated
     // TODO(wjwwood): does not work recursively; can't call spin_node_until_future_complete
     // inside a callback executed by an executor.

-    // Check the future before entering the while loop.
-    // If the future is already complete, don't try to spin.
-    std::future_status status = future.wait_for(std::chrono::seconds(0));
-    if (status == std::future_status::ready) {
-      return FutureReturnCode::SUCCESS;
-    }
-
-    auto end_time = std::chrono::steady_clock::now();
-    std::chrono::nanoseconds timeout_ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
-      timeout);
-    if (timeout_ns > std::chrono::nanoseconds::zero()) {
-      end_time += timeout_ns;
-    }
-    std::chrono::nanoseconds timeout_left = timeout_ns;
-
-    if (spinning.exchange(true)) {
-      throw std::runtime_error("spin_until_future_complete() called while already spinning");
+    return spin_until_future_complete_impl(std::chrono::duration_cast<std::chrono::nanoseconds>(
+        timeout), [&future] () {
+        return future.wait_for(std::chrono::seconds(0));
     }
-    RCPPUTILS_SCOPE_EXIT(this->spinning.store(false); );
-    while (rclcpp::ok(this->context_) && spinning.load()) {
-      // Do one item of work.
-      spin_once_impl(timeout_left);
-
-      // Check if the future is set, return SUCCESS if it is.
-      status = future.wait_for(std::chrono::seconds(0));
-      if (status == std::future_status::ready) {
-        return FutureReturnCode::SUCCESS;
-      }
-      // If the original timeout is < 0, then this is blocking, never TIMEOUT.
-      if (timeout_ns < std::chrono::nanoseconds::zero()) {
-        continue;
-      }
-      // Otherwise check if we still have time to wait, return TIMEOUT if not.
-      auto now = std::chrono::steady_clock::now();
-      if (now >= end_time) {
-        return FutureReturnCode::TIMEOUT;
-      }
-      // Subtract the elapsed time from the original timeout.
-      timeout_left = std::chrono::duration_cast<std::chrono::nanoseconds>(end_time - now);
-    }
-
-    // The future did not complete before ok() returned false, return INTERRUPTED.
-    return FutureReturnCode::INTERRUPTED;
+    );
I'd recommend dropping this change for now. It's likely to collide with #2475 if we merge that one and I don't think it's absolutely necessary (smaller changes are easier to get in when there's a lot of activity).
I would like to keep this change, as it allows me to use a second thread to wait for the future, instead of the current construct. But if you have a strong opinion on dropping it, I am open to it.
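The pattern under discussion, a public template that forwards to a virtual, non-template implementation, can be sketched without rclcpp. Everything below is a hypothetical stand-in: `SketchExecutor`, `CountingExecutor`, `do_one_item_of_work` and the `std::function` parameter type are assumptions for illustration, and the loop is far simpler than the real executor (no context check, no spinning guard). Only the shape of the call, nanoseconds plus a callable that polls the future, is taken from the diff above.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <future>
#include <thread>

// Hypothetical stand-in for rclcpp::FutureReturnCode.
enum class FutureReturnCode { SUCCESS, INTERRUPTED, TIMEOUT };

class SketchExecutor
{
public:
  virtual ~SketchExecutor() = default;

  // Public template: accepts any future and any duration type, then narrows
  // both to a type-erased form that a virtual function can take.
  template<typename FutureT, typename RepT = std::int64_t, typename PeriodT = std::milli>
  FutureReturnCode spin_until_future_complete(
    const FutureT & future,
    std::chrono::duration<RepT, PeriodT> timeout = std::chrono::duration<RepT, PeriodT>(-1))
  {
    return spin_until_future_complete_impl(
      std::chrono::duration_cast<std::chrono::nanoseconds>(timeout),
      [&future]() {return future.wait_for(std::chrono::seconds(0));});
  }

protected:
  // Virtual implementation: a subclass may replace the whole waiting strategy.
  // A negative timeout means "block until the future is ready".
  virtual FutureReturnCode spin_until_future_complete_impl(
    std::chrono::nanoseconds timeout,
    const std::function<std::future_status()> & poll_future)
  {
    const auto start = std::chrono::steady_clock::now();
    while (poll_future() != std::future_status::ready) {
      const auto elapsed = std::chrono::steady_clock::now() - start;
      if (timeout >= std::chrono::nanoseconds::zero() && elapsed >= timeout) {
        return FutureReturnCode::TIMEOUT;
      }
      do_one_item_of_work();
    }
    return FutureReturnCode::SUCCESS;
  }

  // Placeholder for executing one callback; here it only yields the CPU.
  virtual void do_one_item_of_work() {std::this_thread::yield();}
};

// Hypothetical subclass that intercepts the implementation hook.
class CountingExecutor : public SketchExecutor
{
public:
  int impl_calls = 0;

protected:
  FutureReturnCode spin_until_future_complete_impl(
    std::chrono::nanoseconds timeout,
    const std::function<std::future_status()> & poll_future) override
  {
    ++impl_calls;
    return SketchExecutor::spin_until_future_complete_impl(timeout, poll_future);
  }
};
```

Because the future is hidden behind a callable, a derived executor could, for example, hand the polling to a second thread, which is the use case given in the comment above. Note that a future created with `std::launch::deferred` never reports `ready` from `wait_for`, so this sketch assumes promise-backed or asynchronous futures.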
+Executor::Executor(const std::shared_ptr<rclcpp::Context> & context)
+: spinning(false),
+  entities_need_rebuild_(true),
+  collector_(nullptr),
+  wait_set_({}, {}, {}, {}, {}, {}, context)
+{
+}
This is better (as a workaround), it will be easier (in my opinion) to deprecate when we implement a base class after the release, as opposed to re-restricting the functions that currently require the executor base class.
+  // make sure the thread is already sleeping before we send the cancel
+  std::this_thread::sleep_for(std::chrono::milliseconds(100));
This is kind of a weak assumption (that the thread will start within 100ms). It would be better if we verified the thread had started before continuing, though there would still be a race between setting a bool in the thread and the next line of the thread, where sleep_until() is called. Ideally we'd be able to see when the clock is sleeping somehow, but I don't think that's possible. Not sure anything should change, just wanted to point it out.
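To make the race described above concrete, here is a minimal sketch of what "seeing when the sleep has started" could look like if the sleep primitive itself exposed it. This is not rclcpp::Clock and makes no claim about its internals: `CancellableSleep`, `wait_until_sleep_started` and `cancel_wakes_sleeper` are hypothetical names. The key assumption is that the "started" flag is set under the same mutex that the condition variable releases when it begins waiting, so an observer that sees the flag cannot run ahead of the sleeper.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical cancellable sleep, for illustration only.
class CancellableSleep
{
public:
  // Returns true if the full duration elapsed, false if cancel() woke it.
  bool sleep_for(std::chrono::milliseconds duration)
  {
    std::unique_lock<std::mutex> lock(mutex_);
    started_ = true;
    started_cv_.notify_all();
    // wait_for releases the mutex atomically while blocking, and the
    // predicate also covers a cancel() that arrived before the wait began.
    const bool cancelled = cancel_cv_.wait_for(lock, duration, [this]() {return cancelled_;});
    return !cancelled;
  }

  // Blocks until some thread has entered sleep_for().
  void wait_until_sleep_started()
  {
    std::unique_lock<std::mutex> lock(mutex_);
    started_cv_.wait(lock, [this]() {return started_;});
  }

  void cancel()
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      cancelled_ = true;
    }
    cancel_cv_.notify_all();
  }

private:
  std::mutex mutex_;
  std::condition_variable started_cv_;
  std::condition_variable cancel_cv_;
  bool started_{false};
  bool cancelled_{false};
};

// Test-style usage: no fixed 100 ms delay is needed before cancelling.
inline bool cancel_wakes_sleeper()
{
  CancellableSleep sleeper;
  bool completed = true;
  std::thread worker([&]() {completed = sleeper.sleep_for(std::chrono::seconds(10));});
  sleeper.wait_until_sleep_started();
  sleeper.cancel();
  worker.join();
  return !completed;
}
```

With a plain "thread started" boolean set outside the sleep primitive, the window between setting the flag and calling the sleep remains, which is the residual race the comment points out.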
Co-authored-by: William Woodall <wjwwood@gmail.com> Signed-off-by: jmachowinski <jmachowinski@users.noreply.github.com>
I did that, but uncrustify doesn't fix this (?). BTW, why are we not using clang-format? In my experience it works way better than the uncrustify / cpplint construct.
Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Force-pushed from c341f21 to 7804051.
This change allows a second thread to be used to wait for the future to become ready. Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Force-pushed from 7804051 to 163f727.
I did this and it found the issues (and auto-formatted the fix when I used
Are you running it correctly? |
We've had many long, detailed conversations about this, some as recently as a month ago. While I won't rehash all of the arguments, the short of it is:
We have many real problems in the code, this doesn't seem to be one of them (in my opinion). |
Closing in favor of #2506 |
Also, for what it's worth, I think |
Here is a paste from my system. It is clearly not working, but why?
Oh, you know, we've recently updated to a new version of uncrustify (0.78.1), that does things slightly differently than the old way. What version of uncrustify are you using? |
Recent installation, uncrustify_vendor installs 0.78.1f
I even patched ament_uncrustify to check whether it picked up the right version, and it also reports Uncrustify-0.78.1_f. I have an Uncrustify-0.72.0_f installed by apt, but even if I uninstall that version, my system still reports no style divergence.
These are the API changes needed in order to implement an external Multithreaded Events Executor.
@mjcarroll @alsora @fujitatomoya