-
Notifications
You must be signed in to change notification settings - Fork 225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Publishing large data is 30x-100x slower than for rclcpp #763
Comments
Corresponding question on answers.ros.org: |
Thanks for the report. I'm able to reproduce with the latest foxy binaries and your example. I can reproduce it in this smaller example too. Just python MRE#!/usr/bin/python3
import array
import builtin_interfaces.msg
import rclpy
from rclpy.node import Node
from rclpy.qos import QoSProfile
from sensor_msgs.msg import PointCloud2
from sensor_msgs.msg import PointField
class SelfCycler(Node):
"""Publish Point cloud to itself endlessly."""
def __init__(self):
super().__init__("self_cycle_py")
pub_qos = QoSProfile(depth=10)
sub_qos = QoSProfile(depth=10)
self._pc_sub = self.create_subscription(PointCloud2, topic="pc_py", callback=self._handle_pc, qos_profile=sub_qos)
self._pc_pub = self.create_publisher(PointCloud2, topic="pc_py", qos_profile=pub_qos)
self._timer = self.create_timer(1.0, self.publish_msg)
def publish_msg(self):
msg = self.create_pc_msg()
t1 = self.get_clock().now().to_msg()
self._pc_pub.publish(msg)
t2 = self.get_clock().now().to_msg()
self.get_logger().info(f"publishing took {self.format_dt(t1, t2)}")
def create_pc_msg(self):
msg = PointCloud2()
msg.header.stamp = self.get_clock().now().to_msg()
msg.header.frame_id = "base_link"
# 10 MB
msg.height = 1
msg.width = 327680
x_field = PointField()
x_field.name = "x"
x_field.offset = 0
x_field.datatype = PointField.FLOAT32
x_field.count = 1
y_field = PointField()
y_field.name = "y"
y_field.offset = 4
y_field.datatype = PointField.FLOAT32
y_field.count = 1
z_field = PointField()
z_field.name = "z"
z_field.offset = 8
z_field.datatype = PointField.FLOAT32
z_field.count = 1
rgb_field = PointField()
rgb_field.name = "rgb"
rgb_field.offset = 16
rgb_field.datatype = PointField.FLOAT32
rgb_field.count = 1
msg.fields = [x_field, y_field, z_field, rgb_field]
msg.is_bigendian = False
msg.point_step = 32
msg.row_step = 10485760
msg.data = array.array('B', [123 for _ in range(10485760)])
msg.is_dense = True
return msg
def format_dt(self, t1: builtin_interfaces.msg.Time, t2: builtin_interfaces.msg.Time):
""" Helper for formatting the difference between two stamps in microseconds """
us1 = t1.sec * 1e6 + t1.nanosec // 1e3
us2 = t2.sec * 1e6 + t2.nanosec // 1e3
return f"{int(us2 - us1):5} [us]"
def _handle_pc(self, pc: PointCloud2):
""" Callback for the PC subscriber. """
self.get_logger().info("Got point cloud!")
def main(args=None):
rclpy.init()
node = SelfCycler()
try:
rclpy.spin(node)
finally:
node.destroy_node()
if __name__ == '__main__':
main() Command output using latest Foxy debians on Ubuntu Focal $ python3 self_cycle.py
[INFO] [1617994828.072710265] [self_cycle_py]: publishing took 85728 [us]
[INFO] [1617994828.081663188] [self_cycle_py]: Got point cloud!
[INFO] [1617994829.062116464] [self_cycle_py]: publishing took 80831 [us]
[INFO] [1617994829.073127008] [self_cycle_py]: Got point cloud!
[INFO] [1617994830.057854277] [self_cycle_py]: publishing took 82257 [us]
[INFO] [1617994830.069011414] [self_cycle_py]: Got point cloud!
[INFO] [1617994831.058229997] [self_cycle_py]: publishing took 83989 [us]
[INFO] [1617994831.069128751] [self_cycle_py]: Got point cloud! |
@sloretz Out of curiosity, how does Rolling/Galactic fair? |
Output using #...
def publish_msg(self):
msg = self.create_pc_msg()
with cProfile.Profile() as pr:
t1 = self.get_clock().now().to_msg()
self._pc_pub.publish(msg)
t2 = self.get_clock().now().to_msg()
pr.print_stats()
self.get_logger().info(f"publishing took {self.format_dt(t1, t2)}")
sys.exit()
#... cProfile output
Seems to say all significant time spent is inside of |
It's not the same comparison, but my Rolling workspace built in Debug mode is the same order of magnitude for Python. I haven't tried the example comparing it with C++.
|
Thanks. Then I'll leave this on the Galactic board to look at. |
Using @hidmic you called the location :) Template for generated code https://github.com/ros2/rosidl_python/blob/5b9fe9cad6876e877cbbcf17950c47a9e753c6e6/rosidl_generator_py/resource/_msg_support.c.em#L158 Generated code in file Generated code for above functionROSIDL_GENERATOR_C_EXPORT
bool sensor_msgs__msg__point_cloud2__convert_from_py(PyObject * _pymsg, void * _ros_message)
{
// check that the passed message is of the expected Python class
{
char full_classname_dest[42];
{
char * class_name = NULL;
char * module_name = NULL;
{
PyObject * class_attr = PyObject_GetAttrString(_pymsg, "__class__");
if (class_attr) {
PyObject * name_attr = PyObject_GetAttrString(class_attr, "__name__");
if (name_attr) {
class_name = (char *)PyUnicode_1BYTE_DATA(name_attr);
Py_DECREF(name_attr);
}
PyObject * module_attr = PyObject_GetAttrString(class_attr, "__module__");
if (module_attr) {
module_name = (char *)PyUnicode_1BYTE_DATA(module_attr);
Py_DECREF(module_attr);
}
Py_DECREF(class_attr);
}
}
if (!class_name || !module_name) {
return false;
}
snprintf(full_classname_dest, sizeof(full_classname_dest), "%s.%s", module_name, class_name);
}
assert(strncmp("sensor_msgs.msg._point_cloud2.PointCloud2", full_classname_dest, 41) == 0);
}
sensor_msgs__msg__PointCloud2 * ros_message = _ros_message;
{ // header
PyObject * field = PyObject_GetAttrString(_pymsg, "header");
if (!field) {
return false;
}
if (!std_msgs__msg__header__convert_from_py(field, &ros_message->header)) {
Py_DECREF(field);
return false;
}
Py_DECREF(field);
}
{ // height
PyObject * field = PyObject_GetAttrString(_pymsg, "height");
if (!field) {
return false;
}
assert(PyLong_Check(field));
ros_message->height = PyLong_AsUnsignedLong(field);
Py_DECREF(field);
}
{ // width
PyObject * field = PyObject_GetAttrString(_pymsg, "width");
if (!field) {
return false;
}
assert(PyLong_Check(field));
ros_message->width = PyLong_AsUnsignedLong(field);
Py_DECREF(field);
}
{ // fields
PyObject * field = PyObject_GetAttrString(_pymsg, "fields");
if (!field) {
return false;
}
PyObject * seq_field = PySequence_Fast(field, "expected a sequence in 'fields'");
if (!seq_field) {
Py_DECREF(field);
return false;
}
Py_ssize_t size = PySequence_Size(field);
if (-1 == size) {
Py_DECREF(seq_field);
Py_DECREF(field);
return false;
}
if (!sensor_msgs__msg__PointField__Sequence__init(&(ros_message->fields), size)) {
PyErr_SetString(PyExc_RuntimeError, "unable to create sensor_msgs__msg__PointField__Sequence ros_message");
Py_DECREF(seq_field);
Py_DECREF(field);
return false;
}
sensor_msgs__msg__PointField * dest = ros_message->fields.data;
for (Py_ssize_t i = 0; i < size; ++i) {
if (!sensor_msgs__msg__point_field__convert_from_py(PySequence_Fast_GET_ITEM(seq_field, i), &dest[i])) {
Py_DECREF(seq_field);
Py_DECREF(field);
return false;
}
}
Py_DECREF(seq_field);
Py_DECREF(field);
}
{ // is_bigendian
PyObject * field = PyObject_GetAttrString(_pymsg, "is_bigendian");
if (!field) {
return false;
}
assert(PyBool_Check(field));
ros_message->is_bigendian = (Py_True == field);
Py_DECREF(field);
}
{ // point_step
PyObject * field = PyObject_GetAttrString(_pymsg, "point_step");
if (!field) {
return false;
}
assert(PyLong_Check(field));
ros_message->point_step = PyLong_AsUnsignedLong(field);
Py_DECREF(field);
}
{ // row_step
PyObject * field = PyObject_GetAttrString(_pymsg, "row_step");
if (!field) {
return false;
}
assert(PyLong_Check(field));
ros_message->row_step = PyLong_AsUnsignedLong(field);
Py_DECREF(field);
}
{ // data
PyObject * field = PyObject_GetAttrString(_pymsg, "data");
if (!field) {
return false;
}
PyObject * seq_field = PySequence_Fast(field, "expected a sequence in 'data'");
if (!seq_field) {
Py_DECREF(field);
return false;
}
Py_ssize_t size = PySequence_Size(field);
if (-1 == size) {
Py_DECREF(seq_field);
Py_DECREF(field);
return false;
}
if (!rosidl_runtime_c__uint8__Sequence__init(&(ros_message->data), size)) {
PyErr_SetString(PyExc_RuntimeError, "unable to create uint8__Sequence ros_message");
Py_DECREF(seq_field);
Py_DECREF(field);
return false;
}
uint8_t * dest = ros_message->data.data;
for (Py_ssize_t i = 0; i < size; ++i) {
PyObject * item = PySequence_Fast_GET_ITEM(seq_field, i);
if (!item) {
Py_DECREF(seq_field);
Py_DECREF(field);
return false;
}
assert(PyLong_Check(item));
uint8_t tmp = (uint8_t)PyLong_AsUnsignedLong(item);
memcpy(&dest[i], &tmp, sizeof(uint8_t));
}
Py_DECREF(seq_field);
Py_DECREF(field);
}
{ // is_dense
PyObject * field = PyObject_GetAttrString(_pymsg, "is_dense");
if (!field) {
return false;
}
assert(PyBool_Check(field));
ros_message->is_dense = (Py_True == field);
Py_DECREF(field);
}
return true;
} |
Potentially https://docs.python.org/3/c-api/buffer.html , along with a single |
This comment has been minimized.
This comment has been minimized.
Hi, first of all thanks to @sloretz and @clalancette for narrowing down the problem so fast!! @gavanderhoorn I think the questions you are referring to also talk about performance problems, but with different reasons:
We really spent a lot of time searching for questions where the problem is really only the Python API. That's why we also spent the time trying out all solutions from the existing questions you mentioned (see "Tried Solutions" above) and isolating the issue as much as possible in the reproduction package. I would also say that is is not really relevant if the problem might be old, because it stays a problem if rclpy can't keep up with the ROS1 Python API in terms of performance. Looking at the section @sloretz found to be the issue, I think it seems that every byte of the 10MB data array is iterated over with casts and single |
This comment has been minimized.
This comment has been minimized.
I think this can be closed now that ros2/rosidl_python#129 has been merged! |
TL;DR: Publishing a 10MB Pointcloud takes only 2.8 ms in rclypp but 92 ms in rclpy.
We observed significant latency problems when publishing large data like images or pointclouds via rclpy. Investigating the problem, we found that publishing large data of any kind is 30-100x slower in Python than in C++.
We found the Python
Publisher.publish(msg)
call takes especially long compared to rclcpp. Publishing a 640x480 Pointcloud of size ~10MB with 15 FPS is not possible with rclpy even on our quite powerful machine (Ryzen 3900X), let alone our robot (Jetson Xavier NX). The time taken for both seems to be about linear with the data size.Reproduction: https://github.com/karl-schulz/ros2_latency
Bug report
Steps to reproduce issue
I created a package for easy reproduction:
https://github.com/karl-schulz/ros2_latency
Basically, it consists of:
source
publishing a random pointcloud of X MB size, X is controllable over a parameterrepeater
s for C++ and Python, respectively, whichmeasure
which subscribes to both topics and compares the timestamps with the current ROS timeExpected behavior
Actual behavior
Additional information
Tested on:
Tried Solutions
We were pretty desperate and already tried:
(however the good performance on rclcpp indicates that it is not a middleware problem)
https://index.ros.org/doc/ros2/Troubleshooting/DDS-tuning/
(but the same rationale applies)
Image
messages instead ofPointloud2
This problem is actually really critical for our application. In the worst case: Is this expected behavior? This would make the ROS2 Python API useless for high-framerate or low-latency image pipelines.
Help would be really appreciated!
The text was updated successfully, but these errors were encountered: