-
Notifications
You must be signed in to change notification settings - Fork 127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add time source and clock API to nodes #325
Add time source and clock API to nodes #325
Conversation
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a quick review for now, I'll have to do a more thorough one when I get a bit more time. Still, hopefully this is at least slightly useful...
rclrs/src/clock.rs
Outdated
let mut clock = self._rcl_clock.lock().unwrap(); | ||
let mut time_point: i64 = 0; | ||
unsafe { | ||
// SAFETY: The function will only fail if the clock is not initialized |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for the clock to be uninitialized at this point? If so, there should be a guard.
Also, SAFETY isn't necessarily warning about what will cause a function to fail - it's about what needs to be done to prevent undefined behaviour or breaking Rust's memory-safety guarantees. The unsafe
blocks are the only places where this can occur, so the comments are our way of letting future programmers know what needs to be accounted for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. The clock cannot be uninitialized at this point with the "private default constructor" trick above, otherwise it could be if we implement a default constructor, so this function should never fail.
Still, even having an uninitialized clock would not be undefined behavior, just the rcl
function doing nothing and the above zero initialization being returned as the result. Changed the note 978fd2e
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Co-authored-by: jhdcs <48914066+jhdcs@users.noreply.github.com>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
rclrs/src/node.rs
Outdated
@@ -71,6 +71,9 @@ pub struct Node { | |||
pub(crate) guard_conditions_mtx: Mutex<Vec<Weak<GuardCondition>>>, | |||
pub(crate) services_mtx: Mutex<Vec<Weak<dyn ServiceBase>>>, | |||
pub(crate) subscriptions_mtx: Mutex<Vec<Weak<dyn SubscriptionBase>>>, | |||
_clock: Arc<Mutex<Clock>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the inner _rcl_clock
is an Arc<Mutex<rcl_clock_t>>
is it really necessary for this _clock
to also be wrapped in an Arc<Mutex<Clock>>
? Its only other field is a ClockType
which is a simple enum so it's definitely send
and sync
.
I would recommend making this field a simple Clock
and changing get_clock
to return &Clock
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am quite new at Arc
, Mutex
and Rust multithreading so I'll admit not being 100% sure of what I am doing.
The same Clock
instance has to be shared between the node (so users can do a node.get_clock().now()
) and the TimeSource
(so it can be driven by a /clock
topic) so shared ownership is a necessity and Arc
makes it a lot easier than having to deal with lifetimes.
However, what you say is true and since the inner _rcl_clock
is wrapped and thread safe, there is technically no need to have this as an Arc<Mutex<Clock>>
.
In fde0f35 I changed it from Arc<Mutex<Clock>>
to Arc<Clock>
, removing the cost of double locking and making the user face API a lot cleaner. I'm not 100% sure how to remove the need for Arc
because of the sharing I mentioned above but hopefully this is already a great win, thanks for catching it!
In the process of removing mutexes I also realized that we can just use atomic booleans instead of having a Mutex<bool>
and addressed that as well b808cc4, it was a fun investigation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanations, I'm understanding the situation better now.
I think we should tweak the way Clock
and TimeSource
are designed. The current approach feels to me like it has a fragile API, requiring the user to do the right things in the right way at the right time. I would suggest something like this:
/// Use the Clock to get time
pub struct Clock {
_type: ClockType,
// Use RwLock instead of Mutex because we don't want to block multiple readers
_rcl_clock: Arc<RwLock<rcl_clock_t>>,
}
/// Use the ClockSource to set the time of a clock
pub struct ClockSource {
_rcl_clock: Arc<RwLock<rcl_clock_t>>,
}
impl Clock {
pub fn system() -> Result<Self, RclrsError> {
Self::make(ClockType::SystemTime)
}
pub fn steady() -> Result<Self, RclrsError> {
Self::make(ClockType::SteadyTime)
}
pub fn with_source() -> Result<(Self, ClockSource), RclrsError> {
let clock = Self::make(ClockType::RosTime)?;
let source = ClockSource { _rcl_clock: clock._rcl_clock.clone() };
Ok((clock, source))
}
fn make(kind: ClockType) -> Result<Self, RclrsError> {
/* ... implemented the way new(ClockType) already is, but it's not public ... */
}
pub fn now(&self) -> Time { ... }
pub fn clock_type(&self) -> ClockType { ... }
}
impl ClockSource {
pub fn set_time(&self, nanoseconds: i64) {
/* ... lock mutex and use rcl_set_ros_time_override here ... */
}
}
This way we have a clean separation between the API that end users will access to receive clock times and the API that we'll use internally to set clock times based on the /clock
topic. We also don't need to put Clock
in any Mutex or Arc this way. We would give a ClockSource
to the TimeSourceBuilder
instead of giving a Clock
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 5df379f (and cleaned up a bit in following commits).
I tried to use the RwLock
but encountered poor results. In principle it should help with a many-readers pattern but in practice even the now
function requires mutable access because the C binding uses a mutable raw pointer so I couldn't get it to compile.
I left a new
function to avoid overcomplicating the Node's builder code, but can take it out if you think it clutters the API too much.
The main difference with the suggested implementation is in the fact that ClockSource
has both a function to set the time and one to enable / disable the time override. I believe the only ways to get rid of the enable / disable logic would be to encode it in the ClockSource
existence (i.e. automatically enable the override when a ClockSource
is constructed, disable when it is destructed) or diverge a bit from the other client library and have RosTime
only encode "Time as dictated by the /clock
topic", and have it fallback to SystemTime
if use_sim_time=false
.
/// Struct that implements a Clock and wraps `rcl_clock_t`. | ||
pub struct Clock { | ||
_type: ClockType, | ||
_rcl_clock: Arc<Mutex<rcl_clock_t>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe we ever need to change the value of the inner rcl_clock_t
. I would recommend removing the Mutex<>
wrapper and see if this code still compiles (we'll also need to get rid of the many .lock().unwrap()
calls). If it does compile then we don't need the Mutex<>
, and we'll avoid a lot of unnecessary locking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at it and I couldn't come up with a way to remove the Mutex
since the value in _rcl_clock
could be read and written from different threads at the same time. Still, the Arc
was unnecessary since an rcl_clock_t
instance is always owned by a single clock so at least I could remove that layer of indirection 19a73c3.
Specifically, the set_ros_time
function would modify the inner _rcl_clock
by setting its internal time, at the same time the now
function would read the internal clock and return it. I'm a bit afraid of what would happen if they were both called from concurrent threads, rcl
documentation specifies that even though the functions use atomics, concurrent calls on the same clock
object are not safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, I think it may be best to leave in the Mutex
for now, and re-visit this if locking performance becomes an issue. Our hands are a bit tied, since we have to use the rcl
internal clock functions to maintain compatibility with the rest of ROS 2... If locking mutexes becomes a bottleneck, and the underlying safety of rcl
turns out to be the issue preventing us from removing those mutexes, we will need some hard data to prove a change is needed.
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
One high level remark about the current state of the PR is it uses It's probably okay to merge this PR as-is if we intend to do another pass through the whole |
Agreed. The bulk of the unwraps here come from two sources: Mutex / RwLock lock unwrapping, it seems to be done extensively in this codebase and I didn't find any case of dealing with lock poisoning, so I just followed the way it seems to be done. Clock creation returning a Result. This
Edit: A third minor source of I tried to document the remaining ones to say why unwrapping is safe. |
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Thanks again for all the hard work, some replies to the pain points part:
I'd prefer if the
We tend to leave the
We still to vendorize messages because some of the |
Yes there is an option in
In the end I went with changing the node builder to create an
I'm not sure what the procedure here is. The code "seems to compile" but it is downloading |
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
rclrs/src/time_source.rs
Outdated
/// Attaches the given node to to the `TimeSource`, using its interface to read the | ||
/// `use_sim_time` parameter and create the clock subscription. | ||
pub fn attach_node(&mut self, node: Weak<Node>) { | ||
self._node = node; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps I missed something, but do we actually need to store the weak reference to the Node at all here? It seems like we only read the use_sim_time
parameter at the time of a TimeSource
structs creation. Unless we plan to re-use these TimeSource
's or if it actually does make sense to have them update when the parameter updates, I don't think we should store the weak reference.
Also this would make the relationship between Node and TimeSource Node -> TimeSource
, instead of Node <-> TimeSource
. Then Node
could just hold an Optional<TimeSource>
which would reduce the amount of Arc
creeping into the public API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed we only read the value of the parameter (and create the clock subscriber) once at creation time right now so we could technically just pass the node as an argument to the builder and just store an Option<Subscription>
instead.
However, this behavior is not the full implementation and all other client libraries do store the node in the time source. The reason for this is that the time source needs to be able to subscribe to parameter changes and, when the use_sim_time
parameter changes at runtime, be able to create and destroy subscriptions. This is not possible without holding a reference to the node.
If we just did the "bare minimum" now and didn't pass / store the node
we would have to change the API once parameters are implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder how much access an end user actually needs to TimeSource
. I can't remember ever needing to touch that as a user in any other client library.
Is it feasible to slap pub(crate)
on enough structs and methods that we can avoid API commitments until the full implementation is ready?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion! Indeed you are right and that part of the code is very rarely touched by users.
In 382f780 I made everything in time_source.rs
either private or pub(crate)
so it doesn't export anything as a public API. I also removed the use_clock_thread
builder argument since it is not implemented yet and sounds like might not be for some time.
This added another unwrap that however falls in the class of the ones I described above (only fail if we can't allocate a few bytes of memory) so it's mostly noise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when the
use_sim_time
parameter changes at runtime
This doesn't happen currently, I'm guessing its blocked by the lack of a parameter implementation in this PR?
I would prefer if we did something similar to the parameters impl where we deviate slightly from rclcpp. Forcing end users to wrap all nodes in Arc
is a large API change for a niche runtime configuration option IMO
But I'll defer to @esteve, @nnmm, and @jhdcs. I think this PR has been open long enough and we can always iterate :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Realistically I think it'll be close to impossible to implement a whole bunch of sophisticated features down the road without requiring that Node
s are wrapped in Arc
unless we take an approach where almost every field inside the Node
struct is wrapped in its own Arc
to hide this implementation detail.
But I don't think that's a great idea because it just hides important information from the user about the lifecycle and identity of a node. For example a user might mistakenly think that they can create a second node instance by cloning an existing one when in reality they would only be getting a new reference to the same node instance.
Putting the Arc on the outside explicitly communicates the lifecycle and shareability properties of the node.
I can imagine implementing a Rust client library that does not use Arc for nodes if it were implemented inside of an ECS framework, but I think such a library should be done as a separate project from this one.
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
@luca-della-vedova now that #332 has been merged, could you rebase this PR? Thanks. |
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Done and I also added a test that makes sure Edit: I stared at the code extensively but not sure how to solve the above issue without some major refactoring |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me for including in a 0.4 release. Sure, there may be some rough edges, but we can smooth those out in future more-narrowly-tailored PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@luca-della-vedova thank you so much for this contribution!
Progress towards #121
Hi! I'm opening this very much WIP PR to get some feedback on general direction and decisions, as well as getting some input on how to deal with some of the hurdles I've encountered.
This PR adds three modules:
time.rs
Includes a
Time
structure, currently quite bare and can just be constructed / converted. I intentionally left most of theDuration
, as well as different operations with times which could lead to overflows out of the PR to keep it simple.clock.rs
Adds a
ClockType
enum that wraps thercl_clock_type_t
enum but removes the uninitialized case, making all the variants valid. It also adds aClock
type that is meant to be the main source of truth for clock information.The clock mimics how
rclcpp::Clock
works by using the samercl
function calls. It can be set to either of the variants and it offers APIs to fetch its type, as well as getting the current time, enabling / disabling ros time override, and setting the current ros time (which usually comes from a/clock
topic).time_source.rs
Similar to how
rclcpp::TimeSource
/rclpy::TimeSource
works, it keeps a list of associated clocks and reads theuse_sim_time
parameter and drive the attached clocks from a/clock
callback if the parameter is set to true. However, parameter callbacks are not implemented so this can only be set once at node execution time.Test it!
If you run the minimal pub/sub example and a gazebo instance you can run the subscriber:
And you should see, depending on the parameter, the wall time or gazebo time being printed.
Docs are still a WIP and that will be my next step but I would be happy to have some input on some of the main pain points I found (as well as get any feedback on the high level direction).
Pain points
rosgraph_msgs
since this is the type of the/clock
topic. I believe this should be OK sincerclcpp
also depends on it. However, I couldn't extract the contained clock the wayrclcpp
does it because we vendorbuiltin_interfaces
and the type for the clock in the vendored package and the one included when adding arosgraph_msgs
dependency are different. I guess the solution to this would be vendoringrosgraph_msgs
? Or not vendoring anything at all?TimeSource
s need a reference to an existing node so they can create subscriptions for the/clock
topic. This means that the time source can only be attached to a node after it is created. Furthermore, we have a bunch of different ways to create nodes, the main node builder creates aNode
, utility functions create either aNode
or anArc<Node>
, so the attachment is a bit hacky. If the main node builder returned anArc<Node>
the code would be a fair bit cleaner since we could do all the creation and attachment at the builder level but not sure this would be a welcome change./clock
subscription so for now the callbacks and spinning happens in the main thread which is quite not ideal (with multiple callbacks queued when spinning, the clock would freeze until its turn to run comes). However I couldn't find an immediate way to spawn a second thread to spin only a set of topics since there seems to be no concept of callback groups inrclrs
.Time
instances compared to other client libraries. The current implementation has eachTime
contain a weak pointer to the clock that generated it and have an API that allows comparison and returns anOption
that isNone
if the two times are not comparable because they were generated from different clocks, even if the clocks are of the same type. This is a stronger constraint than other libraries but should also be safer.RosTime
slightly. In other rcl libraries it means "Transparently use either a simulated clock or the system time depending on theuse_sim_time
parameter value."This behavior however, creates issues when the parameter value is updated at runtime because users might suddenly try to compare a system time to a simulated time and have their code behave unexpectedly since the library would allow that.
An idea that floated is to have
RosTime
mean exclusively "simulation time", and the inner clock type changing betweenRosTime
andSystemTime
depending on theuse_sim_time
parameter. Now whenever the parameter value changes, the underlying node clock is recreated with a new variant, and all existingTime
objects will explicitly fail to be compared, giving the user a strong warning that they might be falling into undefined behavior.This is quite a change from other rcl libraries so I thought I'd check what the feeling is before going into it.
A last pain point I found is dealing with
TimeSource
and differentClock
types: Other libraries for examplerclcpp
here, throw exceptions if the user builds aTimeSource
with aClock
that is not set toRosTime
and theuse_sim_time
parameter is set totrue
. I tentatively made it anError
instead but I'm also not happy about this decision. The alternatives for the case of the user manually setting the node's clock type toSystemTime
and setting theuse_sim_time
parameter are:Panic
, consistent withrclcpp
andrclpy
, warns the user in the strongest possible way that something might be wrong with their configuration.Result
, leaves the user the responsibility of checking how each clock attaching operation went. Does not however deal with the case of attaching a nonRosTime
clock while theuse_sim_time
is set tofalse
, just for the parameter to be updated later in a parameter callback.rclcpp
warns about this with a parameter setting resultIgnore
, assume that if the user sets the clock toSystemTime
they want the system time regardless and are OK with ignoring theuse_sim_time
parameter, also a bit dangerous and a deviation from how other libraries do it.Following feedback in #325 (comment), the parameter is now not read if the clock is not
RosTime
so the current behavior isIgnore
.TODOs (on top of the above):
/clock
subscription, instead of using the main node's executor. I believe this is not currently possible.rcl
calls and make sure we return results when needed and ignore return values when not.cargo fmt
andcargo clippy
, as well as sanity check against the contribution guide