In this section, we illustrate the smarter part of the network, the part that leverages the overlay functionalities to serve the other modules and ultimately the chain.
In specific terms, this section deals with communication models (or specialized communication scenarios) and the awareness models the network can make use of (aware as in aware of the state it needs to perform this specialized communication) and subsequently the architectural decisions they might engender.
Peer Logic is a everything peer specific in the context of an application that is a decentralized peer to peer one. Meaning, whatever behavior is expected to be performed from the perspective of a single peer in relation to others, specifically:
- Communication models and protocols
- Stateful peers/neighbors selection, ranking and ordering
- Peer roles
First, we need to discuss a few things a peer needs to do in the network.
We are interested in the two main state-touching scenarios:
- Proposing
It has become apparent to us early on that the other layers (like consensus for instance) might have specific communication requirements --either for security or performance purposes,
For instance, in Consensus, the proposer is required to directly communicate with the validators. Thus, it is incumbent on the peer-to-peer layer as a module to offer such functionality while also factoring it into the sum of all parts.
- Syncing
When a node is trying to sync the blockchain, we would like the network to be smart enough to know what other peers are the best candidates to sync from and also do so rather efficiently. Since discovering such network state can be expensive to do in a cold fashion (specially at scale), we hope to describe methods by which the peer to peer module will ensure maximum efficiency in doing so.
The star communication model is as the name implies, a one central nucleus who is trying to communicate with a multitude of edges.
A visual illustration is present here and is as follow:
The goal of such communication model is to directly communicate with a list of peers while maintaining connection with them to perform whatever acknowledgment behavior or request/response behavior.
At this point, we envisage that this feature will be primarily used by the proposer-to-validators relationship, but could potentially used in the future for other purposes.
The direct communication model is specified as follows:
- The nucleus peer expects to have its edge peers addresses in a list, either produced from its routing table or retrieved from the chain or some other party*.
- The edge peers are not required to be direct neighbors of the peer (i.e: in the routing table of the nucleus peer).
- The nucleus peer sequentially and linearly communicates with the nodes in the provided edge peers addresses list [a]
- Order is irrelevant in this equation
- Direct communication will have a separate channel(s) and will not be bound to maximum connections and connection pooling constraints. (refer to the Transport Protocol & Security Section for more information)
- Direct communication messages will specify the
Direct
communication type in the messages header. (Check Messages In The Overlay section for more information)
[a]: Better ways of establishing direct communication from a singular nucleus towards a multitude of edges are also feasible, such as broadcast and multicast, however, for security and performance purposes, we will not cover this in this specification. In case there is a serious interest in exploring such avenues, a broadcast model could be possible if some underlying implementation allows it over the internet, say sockets for instance or some special router nodes to act as channels for such broadcast.
A Passaround Communication model is a progressive multicast happening and propagating from peer to peer until the entire network is covered.
This model is similar to the direct communication model in as far as receiving acknowledgements of delivery is required, however is different in the fact that multi-unicasts don't stop at one star pattern but span over every other similar tree/star in the network, gaining the multicast property.
A visual illustration of this sequential process is present here and is as follow:
Passaround initiated by a head node
Passaround initiated by a child node
If we are to aggregate all peers and their rings in one network ring, it would be very similar to chord and as follows:
Except that the jump will not be logarithmic but constant.
(Refer to Routing Structure And Algorithm section to learn more about the ring structure and how can this communication model be achieved)
The Passaround Communication model can be specified as follows:
A peer in a Gemini overlay is organized and ordered in within a number of affinity groups and a number of pointer groups.
A peer Y is concerned with passing around a message to its affinity group and delegating this task to other affinity groups so that the entire network receives this message.
To achieve this, we split every affinity group into k intervals (with k=√n, and n peers in the group as best parameter), and designate the first peer in each of those intervals as the interval's head.
From then on, a message is multicasted within an affinity group by first disseminating this message amongst interval heads in its group, each head goes on to send this message to each child it is responsible for. To ensure that this message is properly disseminated to the entire network, one of the interval heads makes sure to also delegate this message to other affinity groups through its pointers groups.
For a given peer Y to initiate a Passing Round, it performs the following:
-
Peer Y sends a Passaround Message to its immediate head node in its affinity group(s)
1.1. Peer Y must receive an acknowledgment of receipt from its head node.
1.b. In case the immediate head of peer Y fails to acknowledge receipt within a retry window, peer Y marks this peer as a skipped peer in the message's header and passes it to successor of the failed peer as the new head.
1.c. When a head peer receives a Passaround Message, it acknowledges the receipt of the message and passes the message to the other heads in the interval successor in the same fashion.
1.d If a head peer fails to acknowledge the receipt of a message, its successor is taken as a new head.
1.e If peer Y is a head, then peer Y is its own immediate head.
-
The head node of Peer Y then sends a Delegated Passaround Message to all different affinity group(s) that exist in its pointers group by picking peers with unique prefixes (within the same dimension) [a], marking each peer as the Next Head.
2.a. If a peer in the pointer group(s) fails to acknowledge receipt of the Delegated Passaround Message, peer Y will try to send the same message to a different peer that has the same affinity group as the failed one. If none found, that peer is simply skipped in that Delegation round.
2.b. A peer receiving a Delegated Passaround Message will simply initiate a Passing Round in its own affinity group(s) as described in step 1.
2.c. A head peer will try to delegate this message to the other non-covered affinity groups through its own affinity group(s) by hand-picking a peer that has a different pointer groups (suffix) and sending a [Forwarded Delegated Passaround Message] and marking it as a Next Delegator
2.d. A peer receiving a Forwarded Delegated Passaround Message will not initiate a Passing Around in its affinity group(s), but will try to initiate a Delegation Round to other affinity groups through its pointers group(s) that were not covered yet. In case it fails, it will try to initiate a Delegation Forwarding Round in the same way described in step (2.c)
-
If a peer receiving a Passaround Message or Delegated Passaround Message or Forwarded Delegated Passaround message has seen this message MaxSeenTimes, it will not pass it around nor delegate it.
-
The next delegator will also pick a new next delegator, until no next delegator is able to find new affinity groups to delegate to.
-
The next delegator can ignore the fact that everybody has already received (if it runs out of new affinity groups to route to) the message and go for for extra delegation rounds if MaxRedundancy is not reached yet.
The Passaround Algorithm is as follows:
As copiable MD version is as follows
This algorithm should is given as an example for a gemini dimension equals gd=1
.
To extend it to gd=n
, simply Pass, Delegate and Forward Delegation in each dimension's Affinity group, and increase MaxSeenTimes to account for the number of dimensions.
Redundancy estimation and cost:
In the current way this is setup, a given affinity group will not receive a message more than once due to the mere fact that we explicitly track impacted affinity group to not send them messages again.
To achieve some level of redundancy, we will simply go for a new round of multicast for a given message, once the first round finishes, and to achieve this, we will simply reset the impact affinity groups tracker as if we have not reached any affinity group yet and basically restart the entire round from scratch.
In order to achieve network awareness (learning more about the peers' states and organizing them according to it), we opt to go with a leader-based approach to optimize in terms of bandwidth and complexity, as learning about neighboring node's states and propagating such information will be costly, and at all cases will be cheaper if taken care of by a singular node per group.
1. Chain Awareness
Primarily, we are interested in other peers current heights, so that we facilitate processes such as blockchain syncing. For that, we will primarily implement:
A. A leader-based approach
B. A push-based communication model
1.1 Chain-Aware Leader
A Chain-aware leader of a group is basically the successor of the media point of a group. This leader is elected simply by convention and relies on the fact that all peers in a group are interconnected and have up-to-date routing states.
This leader is easily elected (or rather is beforehand known by convention) due to the cheer dumb luck of being the numerical center of its group (address space interval)
The three nodes to follow the leader, which we will denote Chain-ware replicas will gradually replicate the awareness state of the leader to ensure quick leaders replacement in case of outages.
The Chain-aware leader will act as a central entity for its group and provide requesting nodes with destination address for any height they request or would like to sync from.
The Chain-aware leader indexes peer addresses by the height they are at. To avoid indexing the entire network, the Chain-aware leader indexes its entire group per heights, and keeps an index of other groups however with only 3 addresses per height.
The Chain-aware leader keeps this index up to date by:
- Relying on peers in its affinity group to push to their heights.
- Relying on other Chain-aware leaders from other groups to exchange their groups indexes.
Read the next chapter for information
A Chain-aware leader will only keep a certain range of heights in its awareness-index.
1.2 Push-based awareness-state communication
Every time a peer reaches a given height, it will inform the Chain-Aware Leader of its group and the _Chain-Aware Replicas as well. This approach is very straightforward and requires very little overhead in terms of management.
Chain Aware Leaders ensure that they are all up to date by forwarding Chain-Awareness Sync Request to other leaders in other groups using the Targeted Passaround communication model, which every leader considers by keeping only the updated heights received in this request.
1.3 Chain-Awareness Model
The Chain-Aware Leader and Chain-Aware Replicas will maintain a simple chain-awareness structure primarily ordered by a key chain state data point, in our case, chain height.
We would like to keep the data structure choice undecided as to keep innovation space for incoming implementations, however we would like to constraint such structure with the following requirements:
- Read time should be constant: O(1)
- Write time should be at worst logarithmic: O(log(N)
- Entries should be ordered by a key chain state data point. (aka height or other)
This structure will be heavily read from by group peers that would like to learn about what peers to sync from.
The write on the other hand, or Chain-Awareness sync will be periodic and is accepting of a relatively "worse" complexity.
// TBD
// TBD
The Seed role is a primary network role and has as a direct responsibility:
-
Keeping tracking of joining peers 1.a Keep track of their count 1.b Keep track of their addresses 1.c Keep periodically track of their aliveness but on a low frequency.
-
Provide joining peers with routing state bootstrap parameters 2.a Provide the Gemini dimension in function of the network phase. 2.b Provide the Gemini Parameters in function of the network phase 2.c After having decided on the parameters, provide the joining peer with at least on member per routing state dimension to ensure the continutation of the joining process (member list exchange and membership)
-
Trigger network phase change event when all conditions are met and coordinate peers routing state update in function of the phase. // To be continued
// Express a full scenario of how the direct communication model will be used.
// Express a full scenario of how a peer will talk to its affinity group(s) chain-aware leader and request sync information for specific heights and what happens the chain-aware leader is unable to do so. (We will technically fallback on a passaround message to ask everybody for that specific height, and everyone with that height will use Targeted Passaround to send their address to the requesting node for a direct sync to start)