management of subscribers/publishers #2
Hi @BobDean, this is a good place to comment on something this specific to the current implementation. We return handles because that is the best way for a user to identify a publisher or subscription after creation. If a user wants to remove a subscription, then the handle is the way to do it. For rclcpp (which is C++ specific), the API currently returns scoped publishers and subscriptions as shared pointers.

In the future we may have some API sugar which allows for named publishers, subscriptions, and timers, which would be backed by some sort of map store for persistent ownership. Another option is always letting the node have ownership over the subscriptions, publishers, and timers, only giving the user an opaque handle with which it can interact with them through the node. This allows for create-it-and-forget-it subscriptions and timers.

Make sure you have a look again; I just merged a major refactoring of the rclcpp API today. I actually had a drafted email response to you, but I was waiting to get it merged. There are updated examples too.
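For illustration, here is a minimal sketch of what scoped ownership implies, using the shared-pointer style and a `create_publisher` call analogous to the `create_subscription` examples later in this thread; the exact signatures were still in flux at the time:

```cpp
#include <memory>

#include <rclcpp/rclcpp.hpp>

#include <simple_msgs/String.h>

int main(int argc, char * argv[])
{
  rclcpp::init(argc, argv);
  auto node = rclcpp::Node::make_shared("talker");
  {
    // The returned shared pointer is the "handle": it identifies the
    // publisher after creation and keeps it alive.
    auto pub = node->create_publisher<simple_msgs::String>("chatter", 10);
    auto msg = std::make_shared<simple_msgs::String>();
    msg->data = "hello world";
    pub->publish(msg);
  }  // pub goes out of scope here; dropping the last reference is how a
     // scoped publisher would be removed.
  return 0;
}
```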
For just a little more context on the need for some kind of handle or user-provided name, consider roscpp (ROS1), where publishers are singletons based on topic name (which implies the type). This led to this issue: ros/ros_comm#450

To summarize, this code produces the same publisher backed by the same queue:

```cpp
auto pub1 = node_handle.advertise<std_msgs::String>("/chatter", 10);
// pub1's queue size is 10
auto pub2 = node_handle.advertise<std_msgs::String>("/chatter", 20);
// pub2's queue size is also 10, not 20, because pub2 is the same
// singleton publisher as pub1
```

We don't see a reason to limit each topic to one message queue, especially in the case of composed components which have no knowledge of each other, i.e. they need to be able to create unique queues for potentially non-unique topics.
Hello,

[EDIT] There were some questions here about the necessity of publication vs. callback queues; however, having read the current roscpp code for the 20th time, I am quite confused. I may just need to call and have a voice chat about it to set it right.

If I understand you correctly, the current effort is to get DDS in as the comms solution, with parity to the existing design. That is understandable; however, my understanding of the 2.0 effort was to evaluate alternatives, which may be implemented later. So let me try another approach; this may take a couple of posts and discussions.

In one way I am questioning the ROS1 style for a C++ node. Currently, there are multiple methods and free functions required to initiate a roscpp node, and multiple classes to use. I ask why those functions cannot be included within a single class, with internal input queue(s) storing const shared_ptrs as needed, and with lifecycle status data automatically generated and published. A talker in a more object-y syntax could be:

```cpp
Node node("talker");
node.loopPeriod(0.2); // 5 Hz
if (node.createPublisher<simple_msgs::String>("chatter") == fail) exit(0); // return code > exceptions imo
while (flag) {
  msg->data = "hello world " + std::to_string(i++);
  node.publish(msg);
  node.spin(); // handles i/o and callbacks for this node, and sleeps appropriately
}
```

And a two-node example:

```cpp
Node node1("node1");
Node node2("node2"); // side effect: node2 is registered with the internal state
node1.createPublisher("chatter");
node2.createSubscriber("chatter", queueSize, subscriptionRate, on_message);
while (flag) {
  node1.publish(msg); // side effect: on_message() is eventually called for node2's subscription
  node1.sleep(); // the 2-node example only has a timing requirement for node1, so node2.sleep() is skipped
}
```

I'll leave discussing a Node as a baseclass for another time.
We are going to use DDS as the communications solution (discovery, pub/sub transport, serialization, and RPC once they ratify it). We believe that the current "don't wrap your main" style of API from ROS1 has value in certain situations. Using a baseclass (like Node) and shared ownership or even automatic management of subscriptions, timers, and publishers (ST&P's) is more like the component API, which we have discussed in the past.
This is something we have actually tried to explicitly avoid. In the examples I have given there is only a very minimal amount of global state, e.g. rclcpp::init registers a signal handler and processes command line arguments for the whole process. But Nodes, executors, contexts, groups, ST&P's: none of them are tracked globally. I think this is a much cleaner design than what you are suggesting here, and cleaner than roscpp (ROS1). I'm not really concerned about minimizing the lines of code for someone who is creating multiple nodes in the same main function, and even then you just have to create an executor, add the nodes to it, and then spin the executor.
We discussed the behavior of these scoped objects previously. Now, this line of reasoning has some issues, and I think there is a strong case to be made for having either optional (non-scoped) handles returned, or named ST&P's. My biggest issue with optional handles is that if you drop the handle then you cannot later remove the ST&P's you created. If you need to remove them you can simply keep the handle around, but most of the simple examples will not show storing the handle, so it isn't obvious to new users how this might be done. Finally, we could have both, but there is also disagreement about whether or not that is a good idea, and how it would be implemented.
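To make that trade-off concrete, here is a hypothetical sketch of the optional-handle variant; `remove_subscription` and the handle semantics are invented for illustration, not a settled API:

```cpp
// Hypothetical optional-handle API, assuming a chatterCallback free function
// like the one in the listener example further down this thread.
auto node = rclcpp::Node::make_shared("listener");

// Fire and forget: the return value is discarded, so this subscription can
// never be removed by the user.
node->create_subscription<simple_msgs::String>("chatter", 7, chatterCallback);

// Keep the handle around only if you may need to remove the subscription later.
auto handle = node->create_subscription<simple_msgs::String>("chatter", 7, chatterCallback);
node->remove_subscription(handle);  // impossible once the handle is dropped
```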
> To which topic is …
> What if …

This is basically what an Executor does in the current system; the equivalent would be:

```cpp
auto node1 = ...;  // e.g. created with rclcpp::Node::make_shared
auto node2 = ...;
rclcpp::executors::SingleThreadedExecutor executor;  // note: `executor()` would declare a function
executor.add_node(node1);
executor.add_node(node2);
executor.spin();  // blocks until shutdown; use a Timer to call `publish(msg)`
```

Or if you need to interleave with other stuff:

```cpp
while (flag) {
  executor.spin_some();
  // Do other stuff...
}
```
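For completeness, a sketch of the Timer-driven publishing mentioned in the comment above, assuming a `create_wall_timer` method taking a period and a callback; the exact signature is a guess:

```cpp
#include <chrono>
#include <memory>
#include <string>

#include <rclcpp/rclcpp.hpp>

#include <simple_msgs/String.h>

int main(int argc, char * argv[])
{
  rclcpp::init(argc, argv);
  auto node = rclcpp::Node::make_shared("talker");
  auto pub = node->create_publisher<simple_msgs::String>("chatter", 10);

  // Publish at 5 Hz from a timer callback instead of a manual sleep loop.
  int count = 0;
  auto timer = node->create_wall_timer(
    std::chrono::milliseconds(200),
    [&pub, &count]() {
      auto msg = std::make_shared<simple_msgs::String>();
      msg->data = "hello world " + std::to_string(count++);
      pub->publish(msg);
    });

  rclcpp::spin(node);  // the executor fires the timer and delivers callbacks
  return 0;
}
```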
My personal vision is that the Node class can be used as it is in the ROS1-like examples I linked above, or as a baseclass. The baseclass version, however, might be required for the component model of node development, i.e. where the user builds a library and it is loaded, instantiated, and added to a common executor by a container of sorts. For example, you could write a simple listener in both styles. First the ROS1-like, procedural API:

```cpp
#include <iostream>

#include <rclcpp/rclcpp.hpp>

#include <simple_msgs/String.h>

void chatterCallback(const simple_msgs::String::ConstPtr & msg)
{
  std::cout << "I heard: [" << msg->data << "]" << std::endl;
}

int main(int argc, char * argv[])
{
  rclcpp::init(argc, argv);
  auto node = rclcpp::Node::make_shared("listener");
  auto sub = node->create_subscription<simple_msgs::String>("chatter", 7, chatterCallback);
  rclcpp::spin(node);
  return 0;
}
```

As well as the component-model style API (this API hasn't been designed yet and I'm really just imagining what it might be):

```cpp
#include <iostream>

#include <rclcpp/rclcpp.hpp>

#include <simple_msgs/String.h>

namespace my_package {

class MyNode : public Node
{
public:
  MyNode() : Node("listener") {}

  void onInit()
  {
    this->sub_ = this->create_subscription<simple_msgs::String>(
      "chatter", 7, &MyNode::chatterCallback, this);
  }

  void chatterCallback(const simple_msgs::String::ConstPtr & msg)
  {
    std::cout << "I heard: [" << msg->data << "]" << std::endl;
  }

private:
  rclcpp::Subscription::SharedPtr sub_;
};

}  // namespace my_package

RCLCPP_REGISTER_NODE(my_package::MyNode);
```

The latter would be made into a library and loaded by a "container", which might look something like this:

```cpp
// Some includes...

namespace my_container_pkg {

class MyContainer : public Node
{
public:
  MyContainer()
  : Node("container"), executor_(rclcpp::executors::MultiThreadedExecutor::make_shared())
  {}

  void onInit()
  {
    add_node_by_name_service_ = this->create_service<some_pkg::AddNodeByName>(
      "add_node",
      &MyContainer::handle_add_node_by_name,
      this);
  }

  void spin() {executor_->spin();}

  void handle_add_node_by_name(const some_pkg::AddNodeByName::SharedPtr & srv)
  {
    auto node = load_node_by_srv_request(srv->request);
    node->onInit();
    nodes_[srv->request.node_name] = node;
    executor_->add_node(node);
  }

private:
  rclcpp::executors::Executor::SharedPtr executor_;
  rclcpp::Service::SharedPtr add_node_by_name_service_;
  std::map<std::string, rclcpp::Node::SharedPtr> nodes_;
};

}  // namespace my_container_pkg

int main(int argc, char ** argv)
{
  my_container_pkg::MyContainer my_container;
  my_container.onInit();  // a real container/loader would drive this life cycle
  my_container.spin();
  return 0;
}
```

Obviously this example is trite in some ways, but it illustrates what I mean by the Node class (as an API) being the only interface you need. Then again, we may decide this is not the way to go and I'll have to eat my shoe. Regardless, I don't think this part of the API really addresses what you were talking about in the comment just before this one.

Now, there is a lot you could quibble about in the Node API, like whether the node keeps track of the ST&P's, and whether creation returns a shared ownership object, a handle, or nothing. I think we can iterate on that behavior as we go along. And as much as I dislike having "more than one way to do things", I think there might be value in having both managed and scoped versions of ST&P's.
To a certain extent I am playing devil's advocate and questioning everything you put forward, as a third party. It will take me some time to learn how to ask questions in the most efficient manner. I think some disagreement occurred because I tried to use your terminology to explain concepts which differ from the current approach. I reread the C++ implementation, re-read some Stroustrup, and then re-read the C++ code again. Before I continue down the rabbit hole, here are some questions so I can direct my focus:
We use a component model, with some minimal global state; however, each module/node can be executed in the "don't wrap main" environment. An executor manages the threads, and a MessagingManager handles the client I/O for thread-to-thread comms and routing to the proper comms plugin (we can run ROS1, DDS, NML, and others simultaneously). So the major components are the same between what you and I are describing; the approach to how they are accessed differs. We have found that a component model greatly simplifies complexity, integration, and debugging through the use of common structure and life cycle management. Perhaps part of this discussion belongs with a life cycle article on the design site.
That's certainly useful and we appreciate you taking the time to have a look and ask questions.
This hasn't been fully designed yet. From an API perspective, the current plan is to use a "Context" to control which parts of a process will use intra-process comms when possible. The Context would be an optional argument to the construction of a Node, where the default is a global default Context instance.

As for the actual implementation, we have talked about several ways, but the leading idea is to use DDS DataWriters to send messages which contain pointers to shared, in-process memory for the messages, i.e. not OS IPC shared memory. The benefit of this approach is that we can set up the DDS DataReaders and DataWriters which ferry these pointer messages exactly like the DDS DataReaders and DataWriters which ferry the actual wire messages between processes. If we instead made our own queueing system, with global queues of messages pushed to by publisher.publish and drained into callbacks by a global thread, then we would need to emulate any QoS settings we expose in the user API. This includes things like queue size, reliable vs. best effort, and any other QoS we eventually want to expose. But we only have a prototype of this, and it will likely evolve as we start to try to implement this feature.
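As a very rough sketch of the pointer-passing idea, simplified to a single consumer per message; every name here is hypothetical, since the real design had not been settled:

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>

// In-process store that keeps published messages alive until an
// intra-process subscription takes them.
template<typename MessageT>
class IntraProcessStore {
public:
  // Store the message and hand back an ID small enough to send over DDS.
  uint64_t store(std::shared_ptr<const MessageT> msg) {
    std::lock_guard<std::mutex> lock(mutex_);
    uint64_t id = next_id_++;
    messages_[id] = std::move(msg);
    return id;
  }

  // Retrieve (and release) a message by ID on the subscription side.
  std::shared_ptr<const MessageT> take(uint64_t id) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = messages_.find(id);
    if (it == messages_.end()) return nullptr;
    auto msg = it->second;
    messages_.erase(it);
    return msg;
  }

private:
  std::mutex mutex_;
  uint64_t next_id_ = 0;
  std::map<uint64_t, std::shared_ptr<const MessageT>> messages_;
};

// The "wire" message actually sent through the DDS DataWriter: just the ID,
// so queue size, reliability, and other QoS apply to it like any other message.
struct PointerMessage { uint64_t message_id; };
```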
We haven't decided this either; there appear to be two general paths we can take. One is to store the serialized wire format of the messages as they come from DDS (a standardized format called CDR). This is how rosbag currently works, storing the serialized data form of the ROS1 messages. The other option is a different but common storage format which is not the same as the wire format. This has been done with ROS1, where the messages are received, deserialized, and then converted into JSON/BSON for storage in mongodb. There is also some work which stores ROS1 messages in HDF5 storage containers. There are obviously performance considerations with each of those options, but there is also the opportunity to reuse existing, popular formats. Like I said, we haven't done the requisite groundwork to make a decision on this yet, but we feel confident that a suitable solution exists.
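For the first option, the stored record could be as simple as an opaque CDR blob plus the metadata needed to decode it later; this struct is purely illustrative:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical on-disk record for the "store the wire format" option: the
// CDR payload is kept as raw bytes, exactly as received, alongside minimal metadata.
struct BagRecord {
  uint64_t timestamp_ns;             // receive time
  std::string topic_name;            // e.g. "/chatter"
  std::string type_name;             // e.g. "simple_msgs/String", needed to decode later
  std::vector<uint8_t> cdr_payload;  // serialized CDR bytes
};
```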
I'm not sure I understand this question; which scenarios are you thinking about? We haven't done extensive performance analysis of DDS vs. ROS, but I did go into some detail with the large message experiment. I basically came to the conclusion that ROS1 would be faster than DDS in the case where you are sending large messages over the wire to other processes on the same machine. This basically boils down to the fact that DDS makes many sendto calls, one for each UDP packet, whereas ROS1 makes a single TCP sendto for each message. However, most of the DDS implementations work around this by providing DDS over shared memory for processes on the same machine. Some of them also support DDS over TCP, but then you don't get access to multicast UDP. So for DDS vendors which have shared memory, I think it will be faster than ROS1 in almost all cases.

We haven't done extensive testing in tuning DDS, but we have seen enough examples of this with DDS that we are confident it is possible. The remaining challenge is how to expose these settings in a reasonable way to ROS users. Most DDS implementations have a notion of profiles, where you can have groups of settings which work well in particular scenarios. These settings can be set globally in a config file or programmatically.
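For example, with the standard DDS C++ API the relevant settings live in the DataWriter QoS; the exact type and constant spellings vary a little between vendors (RTI, OpenSplice, etc.), so treat this as a sketch:

```cpp
// Sketch of programmatic QoS tuning with the classic DDS C++ API, assuming
// `publisher` and `topic` were created from a DomainParticipant beforehand.
DDS::DataWriterQos writer_qos;
publisher->get_default_datawriter_qos(writer_qos);
writer_qos.reliability.kind = DDS::RELIABLE_RELIABILITY_QOS;  // vs. best effort
writer_qos.history.kind = DDS::KEEP_LAST_HISTORY_QOS;
writer_qos.history.depth = 10;  // roughly analogous to a ROS1 queue size
DDS::DataWriter * writer = publisher->create_datawriter(
  topic, writer_qos, NULL /* listener */, DDS::STATUS_MASK_NONE);
```

A profile would then be a named group of such settings, selectable from a config file instead of being hard-coded.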
In Indigo, nodes will collect and publish statistics about their topics on another topic. This will be how we do it for ROS 2 as well. It remains to be seen how that will be exposed through the API.
It has not been, since we just got dynamic arrays working a few days ago, but we will be doing more of these "problem" scenarios in the future to track our performance.
I think what we will recommend to users is the subclassing-from-Node approach. This will give them the flexibility to run their node in its own process or with other nodes in a common process at launch time. But sometimes explicit control over threading and execution is needed, and that should be supported. We want to have the component-style API built on top of the simpler "don't wrap your main" style API.
We agree; it also allows us to have verifiable systems, i.e. to answer "Is the system up/ready?"
I think that might be a good place, but even though the component model and life cycle are related, we believe they are actually two distinct features which are composable.
Sorry, wrong button.
@BobDean I'm going to close this since I think most of the questions were addressed. We can continue the conversation here or move it to the SIG NG mailing list.
Not sure if this is the way to make a comment or not, so please forgive.
Why should create_X return a handle for a publisher or subscriber when that information can be managed internally? What flexibility does it provide? Most topics are opened/closed at start/shutdown, with a single type on a single topic. With the current mechanism, there are 3 APIs: node, publish, and subscribe. If the Node managed the publishers and subscribers, there would be a single API handling create/read/write, fewer objects for the client to track, and potentially a simpler locking and management scheme in a multi-threaded environment.
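As a toy illustration of what that single-API model could look like (purely hypothetical names, with in-process delivery standing in for a real transport):

```cpp
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// The Node owns all publishers and subscribers internally; the caller only
// ever talks to the Node by topic name and never holds a handle.
class ManagedNode {
public:
  void createSubscriber(const std::string & topic,
                        std::function<void(const std::string &)> callback) {
    subscribers_[topic].push_back(std::move(callback));
  }

  void publish(const std::string & topic, const std::string & msg) {
    for (auto & cb : subscribers_[topic]) cb(msg);  // deliver in-process
  }

private:
  std::map<std::string, std::vector<std::function<void(const std::string &)>>> subscribers_;
};

int main() {
  ManagedNode node;
  node.createSubscriber("chatter", [](const std::string & msg) {
    std::cout << "I heard: [" << msg << "]" << std::endl;
  });
  node.publish("chatter", "hello world");  // single API call, no handle kept
  return 0;
}
```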