Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ros2topic] fix pub rate reduced issue #141

Merged
merged 1 commit into from
Sep 1, 2018

Conversation

yechun1
Copy link
Contributor

@yechun1 yechun1 commented Aug 29, 2018

when rate is 1000, time.sleep() in main loop will cost about 150ms every 1s, use time comparison instead of time.sleep(), save sleep cycle time lossing

Signed-off-by: Chris Ye chris.ye@intel.com

@tfoote tfoote added the in review Waiting for review (Kanban column) label Aug 29, 2018
Copy link
Member

@wjwwood wjwwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a good solution because it will use 100% of a CPU core waiting for the time to be expired. If you set the publish rate to 1 Hz, it will spend too much time just checking the time.

I would not merge this as-is.

@dirk-thomas
Copy link
Member

Please use a Timer instead - see here for an example.

@dirk-thomas dirk-thomas added requires-changes bug Something isn't working labels Aug 29, 2018
@yechun1
Copy link
Contributor Author

yechun1 commented Aug 30, 2018

I tried to use create_timer, the time_callback will also cost extra time exhaust. -r 1000 -p 1000, real time is about 1.280s.

I have no idea to fix the issue in pub command. Since it just call APIs to publish messages. Compare with ROS, it used rospy.Rate.sleep(). ref to timer.py. There is no rate loss. Could ROS2 rclpy be possible to support timer with Rate.sleep() function?

@dirk-thomas
Copy link
Member

I tried to use create_timer, the time_callback will also cost extra time exhaust. -r 1000 -p 1000, real time is about 1.280s.

Can you share that code somehow to review it?

@yechun1
Copy link
Contributor Author

yechun1 commented Aug 30, 2018

Here is code: https://github.com/yechun1/ros2cli/blob/fix_ros2topic_pub_issue2/ros2topic/ros2topic/verb/pub.py#L130

$ ros2 topic pub /image sensor_msgs/Image -r 1000 -p 1000
publisher: beginning loop
publishing #1000: sensor_msgs.msg.Image(header=std_msgs.msg.Header(stamp=builtin_interfaces.msg.Time(sec=0, nanosec=0), frame_id=''), height=0, width=0, encoding='', is_bigendian=0, step=0, data=[])

interval = 1.280345
publishing #2000: sensor_msgs.msg.Image(header=std_msgs.msg.Header(stamp=builtin_interfaces.msg.Time(sec=0, nanosec=0), frame_id=''), height=0, width=0, encoding='', is_bigendian=0, step=0, data=[])

interval = 1.302808

@yechun1 yechun1 force-pushed the fix_ros2topic_pub_issue branch from 04767a3 to 96a3d86 Compare August 30, 2018 05:26
@yechun1
Copy link
Contributor Author

yechun1 commented Aug 30, 2018

Compared with ROS1 which works very well, so I referenced to the code rospy.Rate.sleep(), since there is no rclpy.Rate.sleep() on ros2 currently, I implemented the code inside the pub.py, updated the patch, please help review, thanks.

@yechun1 yechun1 force-pushed the fix_ros2topic_pub_issue branch 2 times, most recently from 5c4808c to 4938813 Compare August 30, 2018 15:38
@dirk-thomas
Copy link
Member

dirk-thomas commented Aug 30, 2018

There are two separate issues here:

  1. The time each invocation of the publish method takes. That is a problem in / below rclpy (actually in C code below which performs the conversion to a C struct and then publishes the data through the RMW layer).

  2. The current logic in the loop doesn't maintain the desired rate but it adds the time the publish call takes to each cycle.

The second problem can be address - as suggested and recommended above - by using a timer. This works as long as the duration of the publish() call doesn't exceed the cycle time.

The solution using a timer will look like this:

import time
import rclpy

rclpy.init()
node = rclpy.create_node('test_timer')

def timer_callback():
    print(time.time())
    time.sleep(0.5)

timer = node.create_timer(1.0, timer_callback)

rclpy.spin(node)

node.destroy_timer(timer)
node.destroy_node()
rclpy.shutdown()

Please update this PR accordingly to avoid the manual calculation of the time to sleep.

The first problem will need to be addressed on a lower level. You can easily reproducing the problem with my code snippet by changing the sleep() call to e.g. 2.0s.

@yechun1
Copy link
Contributor Author

yechun1 commented Aug 31, 2018

I have tried timer but the result is not good:
1. the timer is good for 1hz, but not good for 1000hz. Because timer call cycle will exhaust time.
2. pub and pub call will cost time, but not major rate loss in this scenario (pub -r 1000 -p 1000)

Here is sample code

Below is result print every 0.001s * 1000. All the data has been added up on each 1ms cycle.
real_1s_interval: each expected 1s, then print real time.
cb_exec_time: callback execution time in 1s. (include pub_call_time)
timer_call_time: time to trigger callback, it is outside the callback.

real_1s_interval = 1.313601
cb_exec_time + timer_call_time = 1.313601
cb_exec_time:0.105953
timer_call_time:1.207648
pub_call_time:0.102473

Below is result that disabled pub(msg) code, timer_call_time cost much more time.
( #pub.publish(msg))

real_1s_interval = 1.300832
cb_exec_time + timer_call_time = 1.300832
cb_exec_time:0.003738
timer_call_time:1.297094
pub_call_time:0.000399

Should we report a bug on timer with 1ms 1000hz test? Or adopt manual calculation that adopt the sleep mechanism from ROS topic pub code?

@dirk-thomas
Copy link
Member

the timer is good for 1hz, but not good for 1000hz. Because timer call cycle will exhaust time

Can you clarify what you mean with this. Why is it "not good"? What does "exhaust time" mean?


On my system a callback triggered by a timer is on average about 0.2ms "late". For a rate of 1000 Hz that adds a significant delay. The current implementation of timers doesn't produce a "fixed cycle time" but accumulates these delays over time which for 1000Hz is pretty significant. I created ros2/rcl#289 to address this in the timer directly.

Therefore I would suggest to update this patch to use a timer which can then also be used to validate the future fix of the timer itself. I don't think implementing custom timing logic in this package as a temporary workaround is a good idea.

@yechun1 yechun1 force-pushed the fix_ros2topic_pub_issue branch from 4938813 to 2a20391 Compare September 1, 2018 10:54
time.sleep will add the time the publish call takes to each cycle. Use a timer to avoid pub rate loss.

Signed-off-by: Chris Ye <chris.ye@intel.com>
@yechun1
Copy link
Contributor Author

yechun1 commented Sep 1, 2018

understand, updated the code, please help review.
I have tested by using a timer the real period of 1ms count to 1000 is 1.3s. The expect 1000 pub rate is reduce to about 769 Hz (1000/1.3). Most time cost in timer delay. Maybe we should address the issue in the timer.

@yechun1
Copy link
Contributor Author

yechun1 commented Sep 1, 2018

Verified the timer patch based on (ros2/rcl#289]):

before change (timer without ros2/rcl#289):
$ ros2 topic pub /image sensor_msgs/Image -r 1000 -p 1000

publishing #307000: sensor_msgs.msg.Image(header=std_msgs.msg.Header(stamp=builtin_interfaces.msg.Time(sec=0, nanosec=0), frame_id=''), height=0, width=0, encoding='', is_bigendian=0, step=0, data=[])

$ ros2 topic hz /image (ref to #133)

average rate: 833.743
min: 0.000s max: 0.004s std dev: 0.00008s window: 10000

# after change: (timer based on [ros2/rcl#289])
ros2 topic pub /image sensor_msgs/Image -r 1000 -p 1000

publishing #103000: sensor_msgs.msg.Image(header=std_msgs.msg.Header(stamp=builtin_interfaces.msg.Time(sec=0, nanosec=0), frame_id=''), height=0, width=0, encoding='', is_bigendian=0, step=0, data=[])

ros2 topic hz /image

average rate: 1000.008
min: 0.000s max: 0.003s std dev: 0.00007s window: 10000
average rate: 999.991
min: 0.000s max: 0.003s std dev: 0.00007s window: 10000
average rate: 999.993

This is a significant improvement, the timer call delay issue should be fixed.
And one other issue is when pub and sub(by topic hz) for a while, the pub rate will randomly decrease, cpu consumption is then more than 100%. I have checked the publish(msg) cost much time, not related with timer call, the original code with time.sleep loop also has the same issue.

@dirk-thomas
Copy link
Member

@yechun1 Thank you for iterating with me on the patch.

@dirk-thomas dirk-thomas merged commit 62ee790 into ros2:master Sep 1, 2018
@dirk-thomas dirk-thomas removed the in review Waiting for review (Kanban column) label Sep 1, 2018
@yechun1 yechun1 deleted the fix_ros2topic_pub_issue branch September 2, 2018 14:28
esteve pushed a commit to esteve/ros2cli that referenced this pull request Dec 16, 2022
…e event (ros2#141)

* Add 'handle_once' property to EventHandler

This is useful for implementing "one shot" events.
The property can be set via the constructor or setter.
If 'handle_once' is set to True, then the EventHandler unregisters itself from the context after the first time it is handled.
All subclasses of EventHandler pass it kwargs so they can use the new property.
Tests included.

* Add a common interface for describing an EventHandler

A common pattern for describing event handlers has been moved to the parent class.
Subclasses should implement the 'handler_description' and 'matcher_description' properties rather than implementing a describe method.
esteve pushed a commit to esteve/ros2cli that referenced this pull request Dec 16, 2022
Signed-off-by: Dirk Thomas <dirk-thomas@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants