Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add design document for node name uniqueness #187

Open
wjwwood opened this issue Aug 21, 2018 · 12 comments
Open

add design document for node name uniqueness #187

wjwwood opened this issue Aug 21, 2018 · 12 comments
Labels
backlog enhancement New feature or request

Comments

@wjwwood
Copy link
Member

wjwwood commented Aug 21, 2018

Currently many design documents and features implicitly rely on node names being unique, but we have never sorted out the issues with how to enforce node name uniqueness in a distributed system and therefore we've never officially declared that node names need to be unique. We need a design document that states node names need to be unique (or not) with rationale why and if unique then how to enforce it and if not unique how to deal with existing features that assume and rely on unique node names.

@Myzhar
Copy link

Myzhar commented Feb 27, 2019

What about the fact that using composition the nodes in the same container get the same name?

For more information:
https://answers.ros.org/question/316870/ros2-composition-and-node-names-with-launch-files/

This is an example:

// /////////////////////////////////////////////////////////////////////////
//
// Copyright (c) 2018, STEREOLABS.
//
// All rights reserved.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// /////////////////////////////////////////////////////////////////////////

#include <stdlib.h>
#include <rclcpp/rclcpp.hpp>

#include "zed_component.hpp"
#include "zed_it_broadcaster.hpp"

#include <iostream>

int main(int argc, char* argv[]) {

    // force flush of the stdout buffer.
    // this ensures a correct sync of all prints
    // even when executed simultaneously within the launch file.
    setvbuf(stdout, NULL, _IONBF, BUFSIZ);

    rclcpp::init(argc, argv);
    rclcpp::executors::MultiThreadedExecutor exe;

    // namespace: zed - node_name: zed_node - intra-process communication: true
    std::string defNamespace = "zed";
    std::string defNodeName = "zed_node";
    bool intraProcComm = false;

    // ZED main component
    auto lc_node = std::make_shared<stereolabs::ZedCameraComponent>(defNodeName, defNamespace, intraProcComm);
    exe.add_node(lc_node->get_node_base_interface());

    // ZED Image Transport broadcaster
    // Note: this is required since `image_transport` stack in ROS Crystal Clemmys does not support
    //       Lifecycle nodes. The component subscribes to image and depth topics from the main component
    //       and re-publish them using `image_transport`
    auto it_node = std::make_shared<stereolabs::ZedItBroadcaster>(defNodeName + "_it", defNamespace, intraProcComm);
    exe.add_node(it_node->get_node_base_interface());

    exe.spin();

    rclcpp::shutdown();

    return 0;
}

Both the node gets the zed_node as set in the launch file:

zed_node = LifecycleNode(
        node_namespace = 'zed',        # must match the namespace in config -> YAML
        node_name = 'zed_node',        # must match the node name in config -> YAML
        package = 'stereolabs_zed',
        node_executable = 'zed_wrapper_node',
        output = 'screen',
        parameters = [
            config_common,  # Common parameters
            config_camera,  # Camera related parameters
        ]
    )

This is the result of the node list CLI command:

$ ros2 node list
/launch_ros
/zed/zed_node
/zed/zed_node
/zed/zed_node
/zed_state_publisher

@wjwwood
Copy link
Member Author

wjwwood commented Feb 27, 2019

There's no reason that you have to end up with nodes with the same fully qualified node name when doing composition. You can remap the node name and/or namespace when you construct it or if you launch it with launch.

The warning you're getting is letting you know that the logging feature doesn't work correctly if you don't make your node names unique. In the future we may enforce this some other way, or we may allow duplicate node names, that's what the design doc described in this issue would decide.

I would bet on node names needing to be unique, for two reasons, first there are some features we have implemented right now that assume it (like the logging) and for some of those I cannot think of a way to do it with non-unique node names in an async distributed system (no central master). Second, having unique node names is really nice for introspection, like if you get a console "[zed.zed_node] there was a problem", which node did that come from?

@thomas-moulard
Copy link
Contributor

Many systems use a two layer approach: a long ID for uniqueness purposes and a human readable label. E.g. IP vs host name, DDS Participant GUID vs DDS Participant Name.

Considering many use cases for non-unique names match what has been solved in the web server world (load balancing, redundancy...). I was wondering if this experience could be leveraged.

This change will also impact SROS as the X.509 certificate DN must match the participant name. However, I don't understand the DDS Security spec well enough to completely understand if node name uniqueness would be an improvement or not.

I guess the cons of node name uniqueness are:

  • hard to ensure globally unique node names for human readable strings
  • prevent launching e.g. 3 times the same node with the same name for reliability reason.

Are there any other use cases where having non-unique node names are useful?

@Myzhar
Copy link

Myzhar commented Feb 27, 2019

@wjwwood I can set different names creating the two classes and it works if I use "ros2 run", but in the launch file I must specify a node name (it is required for lifecycle nodes) that overwrites the names specified in the code

@wjwwood
Copy link
Member Author

wjwwood commented Feb 28, 2019

@wjwwood I can set different names creating the two classes and it works if I use "ros2 run", but in the launch file I must specify a node name (it is required for lifecycle nodes) that overwrites the names specified in the code

Yeah, those actions in launch cannot support a process with more than one node in it at a time. As I said in my answer to your question on answers.ros.org, actions to handle that case were considered, but not implemented yet, see also:


Many systems use a two layer approach: a long ID for uniqueness purposes and a human readable label. E.g. IP vs host name, DDS Participant GUID vs DDS Participant Name.

That's true, but for IP vs host name there's an important difference in that the IP address's uniqueness is made by a central authority (well sort of, in practical terms it is anyways), which isn't possible in ROS 2 since it's discovery is distributed. I'd actually say the IP no different than the Node name. Host names need to be unique as well, by the way.

For the DDS participant name, that's actually an extension, and not part of the DDS standard, see:

ros2/ros2#438

The DDS Participant GUID is something we could expose/use, but it's not something that is trivial to create, and would be a considerable burden for other rmw implementations which aren't based on RTPS, which is a secondary use case, but still something people are already doing.

Considering many use cases for non-unique names match what has been solved in the web server world (load balancing, redundancy...). I was wondering if this experience could be leveraged.

But they don't achieve this by non-unique IP or host name, instead they have proxy's. They can play tricks with DNS, but the result is never that I ask for google.com and get more than one IP back, which would be the case if I asked for the unique id of /zed/zed_node from the example above (there would be three results).

Instead, an equivalent setup in ROS (in my opinion) would either be a node which you call the service on and then relays the service call to one of a couple backends (each with their own name) or we could make service names namespaced by node name (which was a proposal we didn't follow through on) so that you'd do something like this:

node->wait_for_service('/do_something');
auto service_providers = node->get_service_servers('/do_something');
// service_provides -> ['/backup_node1/do_something', '/backup_node2/do_something']
node->create_service(
  service_providers[0],  // arbitrarily select the first, could be any logic here to select one
  ...);

And the above could be the implementation of a service server, maybe something like /feature_extractor/do_something, where feature_extractor is actually just a load balancing node.

For topics, it's easier because the node name actually just doesn't matter if you're doing load balancing or high availability. For high availability you just need to set the priority QoS (which we currently don't expose, but we could) or push the topic selection code into the user's application (maybe a library call).

This change will also impact SROS as the X.509 certificate DN must match the participant name. However, I don't understand the DDS Security spec well enough to completely understand if node name uniqueness would be an improvement or not.

I don't have any knowledge about that, but input from the security folks would be essential for this design document. Though I have to imagine that the configuration files (at least) would be a lot simpler if the node names were required to be unique.

* hard to ensure globally unique node names for human readable strings

Yeah, we'd basically need to expose the unique id everywhere (at least optionally) otherwise things like logs will become a nightmare if you're running lots of instances of reusable nodes. Basically even if we supported non-unique node names in the end, I'd still recommend having unique node names in your system for your own sanity's sake.

* prevent launching e.g. 3 times the same node with the same name for reliability reason.

You just need to create them with different names, which we need to make easier but still should be easy at some point, because as I said above in most cases the node name doesn't affect your ability to have backups for reliability.

Are there any other use cases where having non-unique node names are useful?

Personally, I have a hard time coming up with scenarios where non-unique node names is required and I can only come up with a few cases where it might be slightly more convenient to have duplicate node names. Most of the time it's just making the system more confusing.

The main reason I didn't just make it the case from the start of rclcpp is that enforcing it and what to do when node names collide isn't straightforward. That's the other half of this design doc: "if we decide to have unique node names, how and where do we enforce it?"

@gbiggs
Copy link
Member

gbiggs commented Feb 28, 2019

If we decide to adopt the dual approach proposed by @thomas-moulard, then giving every node a UUID, with the option of manually specifying it and potentially using a different type of label for systems that need that determinism, seems reasonable to me.

@Myzhar
Copy link

Myzhar commented Feb 28, 2019

I agree that having unique name for each node is the right way to take, for all the reasons you reported.
Here the problem is that multiple nodes in the same process are not correctly handled when using the launch system. Maybe that the process creation should handle the fact that it contains more than one node and that it must keep valid name coherency.
The dual approach proposed by @thomas-moulard seems interesting, but UUID and human readable label must be unique again.

@gavanderhoorn
Copy link
Contributor

@wjwwood wrote:

@thomas-moulard wrote:

Considering many use cases for non-unique names match what has been solved in the web server world (load balancing, redundancy...). I was wondering if this experience could be leveraged.

But they don't achieve this by non-unique IP or host name, instead they have proxy's. They can play tricks with DNS, but the result is never that I ask for google.com and get more than one IP back, which would be the case if I asked for the unique id of /zed/zed_node from the example above (there would be three results).

Just an observation: we could still classify it as a "trick", but round robin dns actually works exactly like that:

user@host:~$ dig google.com

; <<>> DiG 9.10.3-P4-Ubuntu <<>> google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7830
;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 4, ADDITIONAL: 5

[..]

;; ANSWER SECTION:
google.com.             15      IN      A       108.177.126.113
google.com.             15      IN      A       108.177.126.100
google.com.             15      IN      A       108.177.126.102
google.com.             15      IN      A       108.177.126.138
google.com.             15      IN      A       108.177.126.139
google.com.             15      IN      A       108.177.126.101

[..]

poor mans failover can be implemented like this (although not as efficient as with more explicit systems/implementations).

@dirk-thomas
Copy link
Member

dirk-thomas commented Dec 16, 2019

I would like to get back to the second part of Williams question:

That's the other half of this design doc: "if we decide to have unique node names, how and where do we enforce it?"

Some subsystems can probably more easily ensure that node names are unique. E.g. when launch starts N nodes it can check among those easily.

But how about scenarios where node with the same name discover each other later asynchronously. E.g. two robots roaming around and when they first time join the same WiFi network discover each other and happen to have duplicate node names.

What behaviors are we expecting in these situations? Logging a message, stopping either / all of the non-unique nodes, other options?

@wjwwood
Copy link
Member Author

wjwwood commented Jan 15, 2020

Just an observation: we could still classify it as a "trick", but round robin dns actually works exactly like that:

Yeah I thought when I was writing it that might be the case, but flawed as my analogy was I still think the IP address as host is a decent analogy. I suppose we could have a two tier system, where there are node names and node addresses (like IP address, but maybe GID or something) and a name look up system (to translate from node name to node address), but I don't really believe we need that.


What behaviors are we expecting in these situations? Logging a message, stopping either / all of the non-unique nodes, other options?

In my opinion, we should let that be configurable, as all of those cases might be appropriate.

In ROS 1 the older node would automatically shutdown, allowing you to replace just one node in a system by simply running a new instance of it (useful for development in some cases), but I think that would be surprising behavior for most people.

For a default (non-production) behavior I think having the newer node raise an exception makes sense. As that's the program output the developer is most likely to be watching when the collision is detected. In both cases, i.e. old closes or new closes, a warning could be logged on the console logging of the node that didn't shutdown for visibility. In the case of a tie (timestamp of node creation), which is unlikely unless manufactured, we can have some resolution (it's a common problem in shared decision making systems).

For production systems, I expect neither of the above strategies will be considered safe to allow. So most likely, ignore with a log message and/or have a callback to handle it programmatically is the right approach (this can be used to implement the other cases). I think this is similar to something like how in iOS you can get a callback just before your application is killed due to out-of-memory. It gives you an opportunity handle it yourself, but by default will have some reasonable behavior.

@Kaju-Bubanja
Copy link

Kaju-Bubanja commented Aug 25, 2021

Have there been any internal discussions in the ros team about this design document?
Adding my two cents I would favor node names being unique. Nodes need some form of unique ID to be able to talk to each other(at least in my mind) and regarding the load balancing case, that seems like it can be done too with unique IDs.

@RobotBramhana
Copy link

I suggest to have an option for enabling and disabling node name uniqueness through an environment variable. Thereby providing the downstream users to just turn on or turn off without much changes to their code. What are your thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backlog enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

9 participants