From 21dc8358a13242eff6bd0c15fe4dcd04040503a8 Mon Sep 17 00:00:00 2001
From: William Woodall <william@osrfoundation.org>
Date: Tue, 10 Dec 2013 16:03:16 -0800
Subject: [PATCH 1/2] data layer [graph] => communication graph

This change comes from a suggestion from
@dirk-thomas
---
 articles/discovery_and_negotiation.md | 58 +++++++++++++--------------
 1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/articles/discovery_and_negotiation.md b/articles/discovery_and_negotiation.md
index f403c699d..0a6aefc52 100644
--- a/articles/discovery_and_negotiation.md
+++ b/articles/discovery_and_negotiation.md
@@ -14,9 +14,9 @@ abstract:
 
 ROS systems tend to be implemented as a **computational graph**, where there are graph **node**'s connected by **topic**'s and **service**'s. The graph which models the computational nodes with their topics and services can be different from the graph which represents the physical groupings and connections which implement the behavior modeled in the topics and services graph.
 
-For the purposes of this document, the graph which defines the computational nodes and how they are connected using topics and services will be referred to as the **computational graph**. The graph which models processes around nodes and physical layers between nodes will be referred to as the **data layer graph**. A **node** is any addressable participant in the **computational graph**, it does not imply how nodes are organized into system processes, i.e. a node is neither necessarily a single process, nor does it necessarily share a process with other nodes.
+For the purposes of this document, the graph which defines the computational nodes and how they are connected using topics and services will be referred to as the **computational graph**. The graph which models processes around nodes and physical layers between nodes will be referred to as the **communication graph**. A **node** is any addressable participant in the **computational graph**, it does not imply how nodes are organized into system processes, i.e. a node is neither necessarily a single process, nor does it necessarily share a process with other nodes.
 
-To demonstrate the difference between the **computational graph** and the **data layer graph**, consider the following:
+To demonstrate the difference between the **computational graph** and the **communication graph**, consider the following:
 
 - A computational graph has nodes N1, N2, N3, and N4.
 - N1 is on machine M1 and in process P1.
@@ -30,7 +30,7 @@ For the purposes of this example, assume these computational nodes can optionall
 
 The above describes how the **computational graph** is organized from a conceptual point of view. The existence of a node, its host machine and host process are products of the graph's execution, which would be done by the user executing each in turn, or by a process management system. The topic publications and subscriptions are defined at runtime by the user code in each of the nodes, though for some use cases the topics might also have to be captured externally and statically.
 
-The above constraints do not at all describe the method by which messages over topics are delivered to various nodes of the graph, this is known as the **data layer graph**. It is the responsibility of some entity, or entities, to use this **computational graph** (and optionally the information about process and machine layout) to decide how the **data layer graph** should be implemented and then execute that implementation.
+The above constraints do not at all describe the method by which messages over topics are delivered to various nodes of the graph, this is known as the **communication graph**. It is the responsibility of some entity, or entities, to use this **computational graph** (and optionally the information about process and machine layout) to decide how the **communication graph** should be implemented and then execute that implementation.
 
 In the current ROS system, each node reports its address and configuration to a master process which coordinates the graph. The master is responsible for maintaining the graph state and notifying nodes of relevant changes to the graph, e.g. a new publisher of a topic which this node subscribes to was created. The nodes will then initiate connections to other nodes where appropriate, so in this sense the nodes are somewhat autonomous. One could argue that ROS should have multiple masters rather than sharing one master amongst multiple machines, or that there should be no master and all of the nodes should be completely autonomous. The point here is that there is no one-size fits all solution, therefore this paper will try to identify ways that these discovery and negotiation steps can be abstracted such that many different implementations may be supported.
 
@@ -40,19 +40,19 @@ For the proposed solution to this design space, this paper will build up the int
 
 ### Statically Configured Nodes
 
-The most basic use case is where each node is given a list of declarative instructions on how to connect to the data layer. These instructions would likely come in the form of url's, but would be of the notion "Connect to Topic `<topic>` of type `<msg_type>` via `<protocol>://<address>:<port>/` using `<transport>` and `<serialization>`". A more concrete example might be "Connect to Topic `/foo` of type `pkg/Foo` via `udpm://225.82.79.83:11311/` using `<zmq_pgm>` and `<protobuf>`".
+The most basic use case is where each node is given a list of declarative instructions on how to connect to the **communication graph**. These instructions would likely come in the form of url's, but would be of the notion "Connect to Topic `<topic>` of type `<msg_type>` via `<protocol>://<address>:<port>/` using `<transport>` and `<serialization>`". A more concrete example might be "Connect to Topic `/foo` of type `pkg/Foo` via `udpm://225.82.79.83:11311/` using `<zmq_pgm>` and `<protobuf>`".
 
 The basic building block that this requires is that nodes provide an API which allows something to instruct the node to establish some connection ("connect_to") and to map publishers and subscribers to a given connection ("map_to"). The program parsing the declarative instructions would just iterate over the instructions, calling this "connect_to" function for each url and then the "map_to" function on each publisher and subscriber.
 
 It should be noted that at this point, this "map_to" call should fail if the node is asked to do something which it has not previously set itself up to do, e.g. a node is asked to map a topic and subscriber to a connection for which it has not created a Subscriber instance. This constraint implies the need for a life cycle where any publishers and subscribers are instantiated by the node in one step and then connections are made in another step.
 
-Because the decisions about how to connect to the data layer are static, it also removes the possibility for dynamically creating publishers and subscribers on the fly because no entity will be watching for new publishers/subscribers and dynamically determining and executing data layer connections.
+Because the decisions about how to connect to the **communication graph** are static, it also removes the possibility for dynamically creating publishers and subscribers on the fly because no entity will be watching for new publishers/subscribers and dynamically determining and executing **communication graph** connections.
 
-Another point is that when nodes loose connection with each other on the data layer (temporary loss of network), the node implementation which calls "connect_to" would be responsible for issuing a new "connect_to" after the connection dies. This implies there should be away to introspect the node by polling it or by getting notifications about the state of the underlying connections which were created.
+Another point is that when nodes lose connection with each other on the **communication graph** layer (temporary loss of network), the node implementation which calls "connect_to" would be responsible for issuing a new "connect_to" after the connection dies. This implies there should be a way to introspect the node by polling it or by getting notifications about the state of the underlying connections which were created.
 
 ### Statically Configured Graph
 
-The next more complicated system is one where each node starts and waits for an outside process to tell it how to make its connections to the data layer. In this scenario a central authority has the static configuration of all node addresses and knows how they should be connected to each other in the data layer. In order to execute this, the central authority needs to be able to call the previously described "connect_to" and "map_to" functions remotely.
+The next more complicated system is one where each node starts and waits for an outside process to tell it how to make its connections to the **communication graph**. In this scenario a central authority has the static configuration of all node addresses and knows how they should be connected to each other in the **communication graph**. In order to execute this, the central authority needs to be able to call the previously described "connect_to" and "map_to" functions remotely.
 
 This adds the necessity for a node to provide an RPC interface for the "connect_to" and "map_to" functions.
 
@@ -71,11 +71,11 @@ This is were the system features matrix forks. There are two glaring limitations
  - mapping_established/mapping_lost
  - etc...
 - The list of nodes and their addresses, publishers, subscribers, etc. are statically maintained, which prevents:
- - dynamically computing the data layer connections
+ - dynamically computing the **communication graph** connections
  - dynamically adding nodes
  - dynamically adding publishers and subscribers
 
-The first limitation is about having the ability to introspect the changes in the **data layer graph** so that the system can react dynamically to things like dropped connections. The second limitation is about having the ability to dynamically discover and manipulate the layout of the **computational graph** which might in turn change the **data layer graph**.
+The first limitation is about having the ability to introspect the changes in the **communication graph** so that the system can react dynamically to things like dropped connections. The second limitation is about having the ability to dynamically discover and manipulate the layout of the **computational graph** which might in turn change the **communication graph**.
 
 First this paper will look at how the system can be dynamically configured.
 
@@ -83,11 +83,11 @@ First this paper will look at how the system can be dynamically configured.
 
 One of the issues with the "Statically Configured Graph" system described above is that the topics and/or services each nodes provides or uses must be statically defined, either as part of the configuration for each node, or as part of the centralized authority's configuration. This does not allow for dynamically defined topic subscriptions and publications nor dynamically provided services.
 
-In order to enable these type of flexible or dynamic node configurations, each node must provided an externally accessible function for getting the configuration of itself. This would allow a central authority to periodically check each node for new topic subscriptions or publications and/or new service providers and dynamically change the **data layer graph** layout to reflect these changes.
+In order to enable these types of flexible or dynamic node configurations, each node must provide an externally accessible function for getting the configuration of itself. This would allow a central authority to periodically check each node for new topic subscriptions or publications and/or new service providers and dynamically change the **communication graph** layout to reflect these changes.
 
-This also allows for a system where the list of graph participants and their location is known, but the data layer graph is unknown and can be determined at runtime. This would allow a small additional flexibility over a completely statically configured system.
+This also allows for a system where the list of graph participants and their location is known, but the **communication graph** is unknown and can be determined at runtime. This would allow a small additional flexibility over a completely statically configured system.
 
-### Statically Configured Graph with Data Layer Events
+### Statically Configured Graph with Communication Graph Events
 
 The other issue with the "Statically Configured Graph" system above is that it cannot easily monitor the state of each of the graph participants and their connections because it would require some form of polling or point to point event systems. In order to better facilitate use cases where graph state would be maintained in a decentralized manner, point to point events should be avoided (at least conceptually), and instead participants in the graph should be able to send messages (events) to the "graph" notifying the rest of the graph participants of changes to their own state, and anything which wants to monitor the state of the graph should be able to maintain a consistent state of the graph by listening to messages sent to the graph by its participants. This assumes that nodes are the authority of their state in the graph.
 
@@ -95,11 +95,11 @@ This begins to outline the need for a graph interface, which allows a user to ma
 
 With a graph interface anyone, either a graph participant or an observer, can monitor the state of the graph and potentially react to changes in the graph. This allows for scenarios like a long running central authority which can setup the graph initially, and restore any connections when necessary at runtime.
 
-Along with data layer events, comes the notion of liveliness. When a connection is terminated for some reason an event should be sent to the graph, but often the reason for a disconnect will be that one end of the connection has dropped off the graph unexpectedly and therefore an event is unlikely to reach the graph. For this reason, it makes sense to include liveliness into the system when data layer events are added. Liveliness is not required, but could be added to any system which has a notion of the graph interface and is able to send and receive messages to the graph, these messages would be some form of heartbeat.
+Along with **communication graph** events, comes the notion of liveliness. When a connection is terminated for some reason an event should be sent to the graph, but often the reason for a disconnect will be that one end of the connection has dropped off the graph unexpectedly and therefore an event is unlikely to reach the graph. For this reason, it makes sense to include liveliness into the system when **communication graph** events are added. Liveliness is not required, but could be added to any system which has a notion of the graph interface and is able to send and receive messages to the graph, these messages would be some form of heartbeat.
 
-### Dynamically Configured Graph with Data Layer Events and Static Discovery
+### Dynamically Configured Graph with Communication Graph Events and Static Discovery
 
-This system simply adds the data layer events (connection established/lost, heartbeat, etc...) on top of the "Dynamically Configured Graph with Static Discovery". This system would be able to take a static set of nodes, with addresses, and dynamically detect their configuration, determine an appropriate data layer graph, and execute it. It would also be able to adapt to a change in the configuration of the node and adapt to data layer events, like temporarily lost connections, or connection state introspection.
+This system simply adds the **communication graph** events (connection established/lost, heartbeat, etc...) on top of the "Dynamically Configured Graph with Static Discovery". This system would be able to take a static set of nodes, with addresses, and dynamically detect their configuration, determine an appropriate **communication graph**, and execute it. It would also be able to adapt to a change in the configuration of the node and adapt to **communication graph** events, like temporarily lost connections, or connection state introspection.
 
 ### Dynamically Configured Graph with Dynamic Discovery
 
@@ -107,13 +107,13 @@ The obvious next step is a system where the participants of the graph and their
 
 Implementation of this system requires the notion of the graph interface, so that on node creation and termination, the node can send messages to the graph, notifying the rest of the graph of their participation in the graph or their leaving of the graph.
 
-This system does not have the data layer events described in previous systems, though it is likely that once a system is capable of dynamic discovery and dynamic configuration, then the data layer events will likely also be present.
+This system does not have the **communication graph** events described in previous systems, though it is likely that once a system is capable of dynamic discovery and dynamic configuration, then the **communication graph** events will likely also be present.
 
-### Dynamically Configured Graph with Data Layer Events and Dynamic Discovery
+### Dynamically Configured Graph with Communication Graph Events and Dynamic Discovery
 
-This is the most fully featured system covered in this paper, as it combines dynamic configuration of nodes (topics and services), dynamic discovery of nodes, and data layer events.
+This is the most fully featured system covered in this paper, as it combines dynamic configuration of nodes (topics and services), dynamic discovery of nodes, and **communication graph** events.
 
-This system is capable of supporting dynamic insertion and removal of nodes in the graph. Each of those nodes can dynamically change their configurations at will. One or more entities can monitor the state of the nodes and their **computational graph** layout, determine part of or a whole **data layer graph** layout, and execute the **data layer graph** layout. Further more these entities can get event driven notifications of changes to the nodes in the graph, changes to their computational graph connections amongst each other, or changes to their data layer connections.
+This system is capable of supporting dynamic insertion and removal of nodes in the graph. Each of those nodes can dynamically change their configurations at will. One or more entities can monitor the state of the nodes and their **computational graph** layout, determine part of or a whole **communication graph** layout, and execute the **communication graph** layout. Furthermore, these entities can get event-driven notifications of changes to the nodes in the graph, changes to their computational graph connections amongst each other, or changes to their **communication graph** connections.
 
 All of these capabilities together allows for complex systems which are capable of dynamic behavior.
 
@@ -122,7 +122,7 @@ All of these capabilities together allows for complex systems which are capable
 Below is a table summarizing the above mentioned use cases and what interfaces/features each of them need to be implemented. All of the systems in the table below require the basic local node API with the "connect_to" and "map_to" functions as well as basic connection introspection.
 
 <div class="table" markdown="1">
-System Name | Remote Node API | Node Configuration API | Data Layer Events | Dynamic Discovery | Requires Graph API
+System Name | Remote Node API | Node Configuration API | Communication Graph Events | Dynamic Discovery | Requires Graph API
 --- | --- | --- | --- | --- | ---
 Statically Configured Nodes | &nbsp; | &nbsp; | &nbsp; | &nbsp; | &nbsp;
 Statically Configured Graph | &#x2713; | &nbsp; | &nbsp; | &nbsp; | &nbsp;
@@ -155,7 +155,7 @@ Note that these interfaces are not necessarily what the end user would use, but
 
 The most basic interface used above is the Node API. It was briefly described as having "connect_to" and "map_to" functions along with basic connection introspection.
 
-The "connect_to" function is necessary in order to execute the data layer connections. This function might cause the node to connect to a remote TCP/IP server, setup a local TCP/IP server, join a UDP Multicast Group, set a shared memory block, or something else.
+The "connect_to" function is necessary in order to execute the **communication graph** connections. This function might cause the node to connect to a remote TCP/IP server, setup a local TCP/IP server, join a UDP Multicast Group, set a shared memory block, or something else.
 
 In order to better support non point-to-point communication transport types like UDP Multicast, the system needs to be able to differentiate between transport and topic, which is why the "map_to" function is described as a separate function. The topic information could be included in the "connect_to" call, but that would make some use cases more difficult. For example, if a system wanted to send multiple types of topics over a single connection, possibly a single TCP or UDP connection, then it would not really make sense to call the "connect_to" function for a topic whose connection is already established. Instead the proposed set of functions would prescribe that the "connect_to" function would be called once, returning a handle to the connection, and that handle would be reused in multiple calls to "map_to".
 
@@ -173,7 +173,7 @@ The format for the configurations should probably be some extensible format rath
 
 Several of the above systems described the need for a graph interface which would allow the graph participants to send and receive messages to the graph. This graph transport abstraction is the most basic interface required for the graph. On top of these send and receive graph message functions other functionality can be built. This design allows the graph interface to be layered.
 
-The first layer of the graph interface is the send and receive layer, which basically allows users to pass messages to and receive messages from the graph in an abstract sense. The method by which these messages are delivered is not something that the users of the interface should be concerned with. Systems can provide implementations of the graph transport to match their needs. This layer of the interface enables nodes to dynamically join the graph, dynamically provide configuration data, send data layer events, send heartbeat messages, send life cycle state changes, and potentially provide other information. In the case of a broker system each message a node sends to the graph will go directly to the master and the master will redistribute the messages to the appropriate nodes as it sees fit. In the case of a completely distributed, master-less system each message sent by a node could be broadcast to every other node and each node could filter the incoming messages as it sees fit.
+The first layer of the graph interface is the send and receive layer, which basically allows users to pass messages to and receive messages from the graph in an abstract sense. The method by which these messages are delivered is not something that the users of the interface should be concerned with. Systems can provide implementations of the graph transport to match their needs. This layer of the interface enables nodes to dynamically join the graph, dynamically provide configuration data, send **communication graph** events, send heartbeat messages, send life cycle state changes, and potentially provide other information. In the case of a broker system each message a node sends to the graph will go directly to the master and the master will redistribute the messages to the appropriate nodes as it sees fit. In the case of a completely distributed, master-less system each message sent by a node could be broadcast to every other node and each node could filter the incoming messages as it sees fit.
 
 The nodes which will be coordinating with each other will have to agree on a graph transport implementation a priori. Because there is no opportunity to negotiate the graph transport implementation, there must exist a simple "lingua franca" and the system wide graph transport implementation serves as that unifying language. It would still be possible to write programs which could serve to transparently bridge networks which used different graph transports.
 
@@ -183,15 +183,15 @@ With the graph event interface another interface which maintains the state of th
 
 ## Communication Negotiation
 
-The previous paragraphs have not discussed the negotiation of the communications at all. It has been described how nodes may provide their configurations dynamically, or the configurations might be captured statically and it has been described how nodes can be instructed to establish connections on the data layer either internally or externally, but not much has been said about determining the appropriate **data layer graph** layout based on the node's configurations and the machine/network topology. It is left to the implementor of the the negotiation system to use the configurations of the nodes and potentially other information to design and execute a **data layer graph**. This method of negotiation implies that when determining the **data layer graph**'s layout, all of the required information can be retrieved from the node, i.e. a node should be able to answer "what transports do you support?" through the node configuration API.
+The previous paragraphs have not discussed the negotiation of the communications at all. It has been described how nodes may provide their configurations dynamically, or the configurations might be captured statically, and it has been described how nodes can be instructed to establish connections on the **communication graph** either internally or externally, but not much has been said about determining the appropriate **communication graph** layout based on the node's configurations and the machine/network topology. It is left to the implementor of the negotiation system to use the configurations of the nodes and potentially other information to design and execute a **communication graph**. This method of negotiation implies that when determining the **communication graph**'s layout, all of the required information can be retrieved from the node, i.e. a node should be able to answer "what transports do you support?" through the node configuration API.
 
 What the above set of use cases does do is try to ensure that most conceivable negotiation systems could be implemented on top of these interfaces. To illustrate, this paper will describe some theoretical systems.
 
 ### Client-Server Master System with Point to Point TCP Data Only
 
-This system is very similar to the existing ROS system in that each node on startup contacts a centralized master. The node reports its existence to the master and notifies the master any time a new publisher or subscriber is created in the node. The master notifies nodes when a publisher exists for their subscribers and the nodes initiate a TCP connection to the publishing node. There is no point at which some more sophisticated **data layer graph** layout is chosen.
+This system is very similar to the existing ROS system in that each node on startup contacts a centralized master. The node reports its existence to the master and notifies the master any time a new publisher or subscriber is created in the node. The master notifies nodes when a publisher exists for their subscribers and the nodes initiate a TCP connection to the publishing node. There is no point at which some more sophisticated **communication graph** layout is chosen.
 
-In this system the graph transport implementation is a connection to the master for each node. The state of the graph is maintained in the master process only, and each node calls its own interface in order to execute the data layer connections between nodes. All events and configurations for each node are sent to the master and relayed to the correct nodes by the master.
+In this system the graph transport implementation is a connection to the master for each node. The state of the graph is maintained in the master process only, and each node calls its own interface in order to execute the **communication graph** connections between nodes. All events and configurations for each node are sent to the master and relayed to the correct nodes by the master.
 
 ### Distributed System with Intelligent Multicast
 
@@ -203,11 +203,11 @@ In this system the graph transport system might be a udp multicast group, with a
 
 There exist still some open questions surrounding Discovery and Negotiation.
 
-### User Hints to Data Layer Implementation
+### User Hints to Communication Graph Implementation
 
-One use case not addressed above is how to allow user code to hint or constrain the generation of the **data layer graph** layout. Ideally, a user could indicate that data on a certain topic is or is not suitable for unreliable transportation, but as it stands there is no direct way for a node to effect change on the data layer.
+One use case not addressed above is how to allow user code to hint or constrain the generation of the **communication graph** layout. Ideally, a user could indicate that data on a certain topic is or is not suitable for unreliable transportation, but as it stands there is no direct way for a node to effect change on the **communication graph**.
 
-One option might be to allow certain hints or constraints to be added to the node configuration, which the negotiation system could use to make more ideal decisions when generating the **data layer graph** layout. However, this implies that the **data layer graph** is determined at runtime and not static.
+One option might be to allow certain hints or constraints to be added to the node configuration, which the negotiation system could use to make more ideal decisions when generating the **communication graph** layout. However, this implies that the **communication graph** is determined at runtime and not static.
 
 Another option is not allow constraints at all, because there will always be the scenario where an end-users wants to reuse a node in a manner for which it was not originally designed and constraints which are not overridable would make the node less reusable.
 
@@ -219,7 +219,7 @@ What system should be used for serialization or graph messages?
 
 What transport should be used for graph messages?
 
-One option is just pick one of the available systems that are used by the data layer. Another option is pick a simple set which must always be available.
+One option is to just pick one of the available systems that are used by the **communication graph**. Another option is to pick a simple set which must always be available.
 
 ### Topic Tools
 

From 379b6fc0e5d81053ce1f5a7ae2645ac032fcb4e5 Mon Sep 17 00:00:00 2001
From: William Woodall <william@osrfoundation.org>
Date: Thu, 16 Jan 2014 18:08:52 -0800
Subject: [PATCH 2/2] fixup style

---
 articles/discovery_and_negotiation.md | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/articles/discovery_and_negotiation.md b/articles/discovery_and_negotiation.md
index 0a6aefc52..2531e5709 100644
--- a/articles/discovery_and_negotiation.md
+++ b/articles/discovery_and_negotiation.md
@@ -10,6 +10,10 @@ abstract:
 
 # {{ page.title }}
 
+## Abstract
+
+{{ page.abstract }}
+
 ## Problem Space
 
 ROS systems tend to be implemented as a **computational graph**, where there are graph **node**'s connected by **topic**'s and **service**'s. The graph which models the computational nodes with their topics and services can be different from the graph which represents the physical groupings and connections which implement the behavior modeled in the topics and services graph.
@@ -64,13 +68,16 @@ The main evolution of required functionality for this system over the previous o
 
 This is were the system features matrix forks. There are two glaring limitations of the previous use cases:
 
-- Nodes have no general way to notify the rest of the graph about events, events like:
+**Nodes have no general way to notify the rest of the graph about events such as:**
+
  - node life cycle state changed
  - heartbeat
  - connection_established/connection_lost
  - mapping_established/mapping_lost
  - etc...
-- The list of nodes and their addresses, publishers, subscribers, etc. are statically maintained, which prevents:
+
+**The list of nodes and their addresses, publishers, subscribers, etc. are statically maintained, which prevents:**
+
  - dynamically computing the **communication graph** connections
  - dynamically adding nodes
  - dynamically adding publishers and subscribers