
Peer Sharing Implementation Plan

Armando Santos edited this page Sep 18, 2023 · 5 revisions

DISCLAIMER: This document is a work in progress; discussion of the document takes place in issue #3940.


Introduction

Cardano nodes and the interactions between them are combined within a networking layer, which distributes information about transactions and block creation among all active nodes. This is known as the diffusion layer. It is how the system carries out the Ouroboros family of protocols, specifically by diffusing and validating blocks, adding new blocks to the chain, and verifying transactions. Any such network of nodes must be resilient enough to cope with connectivity and node failures, and must adapt to capacity restrictions while seeking to minimize communication delays. In the Shelley network design, two separate flavors of connections may be identified:

  • Upstream nodes provide blocks minted elsewhere in the network, by actively following the chain on those nodes;
  • Downstream nodes receive blocks that are relayed from upstream nodes and those that are minted locally, by actively following the chain on this node.

Note that nodes typically pull information from other nodes by placing an outstanding request for the next piece of information. This ensures that the node has control over the amount of work it can be required to do at any one time.

It is deemed a protocol violation to forward invalid blocks. Therefore, received blocks need to be validated before they are forwarded, which is a resource-intensive operation. Everyone following the chain needs a copy of the produced blocks, but only Stake Pool Operators (SPOs) generate blocks. There is a large asymmetry between block producers (a few thousand) and block consumers (hundreds of thousands to millions).

To meet both the scale and the timeliness of distribution, there needs to be a large fan-out in the direction from block producers to block consumers. It is envisioned that a typical node might have 10 to 20 upstream peers as well as 50 to 100 downstream peers.

The network topology is established iteratively by some node A requesting to become a downstream node for some other node B. This raises the question of how node A knows the address of node B in order to initiate the connection. There are three possible ways:

  1. By manual configuration, to ensure connectivity to designated nodes;
  2. By sampling from DNS names/addresses recorded on the blockchain;
  3. By sampling from addresses obtained from other nodes at runtime (peer sharing).

This document is about enhancing the above process, by replacing the existing peer sharing approach with a more scalable lightweight solution. This approach, when combined with eclipse evasion, provides for a scalable network while containing the operational load on SPO peers recorded in the blockchain.

Context

Nodes in the Byron federated system were connected by a static configuration provided in a topology file. Since Shelley was introduced, the system has been operating in a hybrid state. In other words, SPO nodes can communicate with both federated relay nodes and SPO-run relay nodes. Although this connectivity is not automated, it allows for the exchange of block and transaction information without the need for federated nodes.

If only major stakeholder nodes (whose numbers are limited by economic incentives) can be upstream peers, the network's scalability could be constrained. There is clearly a limit to how many downstream peers any relay can handle, even though serving blocks to downstream peers is substantially less expensive than validating blocks obtained from upstream peers. Network capacity can be boosted, and the load on SPO relays lightened, by permitting automated connections between SPO relays and by allowing non-stake-holding nodes to take part in block forwarding.

Current Situation

Currently, the high-level architecture of P2P is made up of four major components: the Connection Manager, the Peer Selection Governor, the Inbound Governor, and the Server. These components collaborate to control each node's outbound and inbound connections, ensuring optimal network and safety properties, resource utilization, and efficiency.

The Peer Selection Governor (P2P Governor), which is closely tied to the Connection Manager, handles the automatic establishment of connections to peers, as well as monitoring and running mini-protocols as needed. It is in charge of outbound peer connection management: it determines which peers are useful to connect to and which should be promoted or demoted. The primary goal of the P2P Governor is to manage outbound connections so that the target numbers of cold, warm, and hot peers are met, thereby building and maintaining a globally connected topology.

Cold peers are known but have no active outbound connection; Warm peers have an active connection (bearer) but are used solely for network measurements and not for any application-level consensus protocol; and Hot peers are actively used for application-level consensus protocols. These sets of peers also serve other implicit purposes, such as warm peers acting as a churn set for hot peers, which allows potentially better warm peers to take over from existing hot ones, or maintaining diversity in hop distances to aid recovery from network events that may disrupt normal operation. The Cold, Warm and Hot peer sets are populated by promoting and demoting the so-called Root Peers, which fall into two groups: Local Peers and Public Peers. Public Peers consist of manually configured addresses and/or ledger peers. Promoting and demoting Root Peers establishes the Known, Established and Active peer sets. More details are shown in the image below:

[Image: Peer Discovery on Cardano]

All these sets ought to have targets and/or policies that the P2P Governor seeks to maintain. Targets and policies serve multiple purposes, such as managing resources, making sure the node can make progress towards an optimum configuration, and safeguarding the node against adversarial behavior.

As mentioned in the Introduction, there are currently two ways a node can learn about other peers. When a node starts, it looks in the topology file referenced from the local configuration for root peers, i.e. either public peers, which come from high-veracity sources such as IOHK relays, or local peers, which are peers of specific significance to this node. By default, the node uses only these manually configured sources of peers. Alternatively, it can be configured to get peers from the ledger as well.

The P2P Governor will try to maintain the target numbers for each given set, which means it will try to fetch more Known peers and to promote Cold peers to Warm. If it cannot fulfil its targets, it will retry after some delay.

More details can be found in the Shelley Network Design document; further relevant details will be added in this section as needed.

Goal and Caveats

The aim of the Peer Sharing protocol is to facilitate the discovery of potential peers within the overall Cardano network. There is a requesting side and a replying side to this process. The requesting side communicates with its Established Peers, requesting a number of addresses from the remote peer's Known Peers set. New addresses are added to the local Known Peers set (specifically as Cold Peers). On the replying side, a peer responds to a request by supplying addresses from its Known Peers set to which it has previously established a successful connection.

Caveats associated with address sharing

  1. A peer has to be willing to share (as indicated in handshake)
  2. Manually configured addresses can be optionally shared (as recorded in configuration files)
  3. Learnt addresses that are obviously from ledger peers will not be shared (i.e. as derived from the chain)

Caveats associated with operation

This Peer Sharing process is designed to work in conjunction with Ledger Peers from the chain. There is no assumption that the Peer Sharing process provides a robust defense against Sybil/eclipse attacks. Resistance to such attacks is derived from a connection to Ledger Peers. Consequently, the P2P Governor will have a target number of Ledger Peers to maintain contact with. The plan regarding Eclipse Evasion will be detailed in the Eclipse Evasion document that was referenced above.

Plan and High Level Design

Things to consider

  • How to integrate the Peer Sharing into the Governor operation?

    • Use the existing Peer Selection Governor or have a separate structure?
    • Design MiniProtocol state machine
      • Is simple Request-Reply enough?
    • Design MiniProtocol implementation
      • Should requests be triggered by the Peer Selection Governor? If not, how?
      • How should responses be filtered?
  • Which peers do we ask?

    • Is asking only upstream peers sufficient?
    • Should we ask Cold peers?
    • Should we ask Established peers?
    • More ?
  • How is the reply to a share request calculated?

    • How to identify peers to share?
      • Should we verify they are/were contactable/online?
      • Should we know about the peer's server hard limit?
    • Should they be picked at random?
    • Should we let others know about adversarial nodes too?
    • More ?
  • Node handling of shared information

    • Should we have targets for shared Peers?
    • In what context does it make sense to perform Peer Sharing (i.e. while bootstrapping, syncing, caught up, all the time)?
    • Should any type of node not perform Peer Sharing (BP, Relay, Wallet, etc..)?
    • Should we churn shared peers?
    • Should we have a target for hot shared peers?
    • More ?

In essence there are 3 phases to Peer Sharing:

  1. Asking (requesting) peers
  2. Sharing (replying) peers
  3. Receiving and handling the shared peer response set

Discussion

The ideal method appears to be to create a unique GitHub issue for each question, so people can discuss it and further develop the strategy in a transparent, open-source manner. This Wiki page should be updated with a brief explanation of what was discussed/decided in each topic. With that in mind, below are the issues that the networking team will need to resolve in order to implement Peer Sharing:

Things that are somewhat decided & technicalities

For now, Peer Sharing is envisioned as a Request-Response type of protocol that will help the node obtain more known peers.

Asking Peers

Asking is the initial stage of the Peer Sharing protocol. The Peer Selection Governor should determine when a node should perform Peer Sharing. Currently, the Peer Selection Governor's legacy sharing mechanism considers the target number of Known Peers and a rate limit on share requests to decide when to ask for peers. We can reuse this; however, there may be additional conditions, such as:

  • Is Peer Sharing enabled?
  • Are we in Bulk-Sync?

The next step is to choose which peers to ask. The Peer Selection Governor already provides a method for deciding how many peers need fetching (the old system depends on the policyMaxInProgressGossipReqs and policyPickKnownPeersForGossip variables). To make share requests, we need to know which peers are available, e.g. those with a positive willingness value configured. We can obtain this information through configuration files or the handshake, which implies changes to the Handshake and to the Node Configuration and Topology files.

We know which peers are available to ask based on their willingness information. Only established peers should be asked (i.e. start a request-response protocol), as their valency is sufficiently high (if, in the future, we decide that we do need to ask cold peers, we can make them 'warm' anyway).

With this resolved, all that remains is to select a random set of peers from the established to-ask set and send a share request to each of them. Note that the legacy sharing mechanism uses the target number of Known Peers to decide when to ask for peers; ideally, it asks for enough peers to make the node meet this target. We should therefore divide the number of peers still needed by the number of to-ask peers sampled, and request that many from each.

Note also that the protocol should establish a global maximum number of peers that can be requested on the client side, so that we can protect ourselves against malicious nodes that try to make nodes run out of memory (OOM) by responding with gigabytes' worth of peers. This limit should most likely be determined by the target value.
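The request sizing described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the node's Haskell implementation: `plan_share_requests` and all of its parameter names are hypothetical, and `max_amount` stands in for the global client-side cap whose actual value this document leaves open.

```python
import math
import random


def plan_share_requests(known_count, known_target, askable_peers,
                        max_peers_to_ask, max_amount, rng=random):
    """Return {peer: amount} for the share requests to send, or {} if none are needed.

    All names are hypothetical. askable_peers stands for established peers
    whose willingness allows Peer Sharing.
    """
    deficit = known_target - known_count
    if deficit <= 0 or not askable_peers or max_peers_to_ask <= 0:
        return {}
    # Pick a random subset of the established, willing peers.
    how_many = min(max_peers_to_ask, len(askable_peers))
    chosen = rng.sample(sorted(askable_peers), how_many)
    # Split the deficit across the chosen peers, never exceeding the global cap.
    per_peer = min(max_amount, math.ceil(deficit / how_many))
    return {peer: per_peer for peer in chosen}
```

For instance, a node that knows 80 peers, targets 100, and samples 4 established peers would ask each of them for 5 addresses. Rounding up means the node may slightly overshoot the target, which is a design choice of this sketch rather than something the document prescribes.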

Sharing Peers

The replying side of the Peer Sharing protocol merely requires us to choose which peers to share with the requesting side. The request includes an upper limit on the number of peers requested by the node. We don't need to know whether we have recently answered this peer, because share requests should have a reasonable retry delay for each peer.

We should only share peers that:

  • are not known-to-be-ledger peers;
  • have not explicitly stated, via configuration files or the handshake, that they do not wish to participate in Peer Sharing;
  • we managed to connect to at some point.

This implies that the node must keep track of which peers are ledger peers; that root peers must be properly configured with private sharing-willingness flags, in order to prevent the possibility of disclosing sensitive information about their ledger peers; and that every time we successfully establish a connection with a peer, we tag it accordingly. The to-share set should be picked at random.

Note that there will be a limit on how large a response can be, and the server must not provide more data than that. So even if the client requests 100000 addresses, it will only receive, say, 50 (if only that many addresses fit within the limit).

Receiving and handling the shared peer response set

After receiving the result set, we considered conducting some sort of peer validation, such as confirming that the addresses are indeed contactable, in order to prevent the spread of incorrect addresses through Peer Sharing. We are aware that certain adversarial behaviors could potentially take advantage of the Peer Sharing protocol. For now the design follows the simplest approach, since it does not have any critical performance objectives and rates of convergence/divergence can be very slow; furthermore, we already have mechanisms in our P2P stack that slow down the impact of such adversarial behavior. There are of course other ideas, such as:

  • Keep track of who informed us about which peer, and if we see that peer gave us bad addresses, further extend the timeout period before we may ask that peer for more addresses;

However, we deemed this not worth implementing in the first iteration.

Node Configuration and Topology Changes

As described in this section, a node's configuration files will require a new set of flags. These flags indicate a node's desire to participate in Peer Sharing.

The two edge cases among node types are the Block Producer and the Wallet; the normal case is a Relay node. Each node type relates to Peer Sharing as follows:

  • Relay nodes should have no problem participating in Peer Sharing and having their addresses forwarded to other nodes;
  • Block Producers should not be known (that's why they should always be behind relays), so they can't participate in Peer Sharing;
  • Wallet addresses are not very useful for Relays, but it is useful for Wallets to participate in the network and know more addresses; hence they should participate in Peer Sharing, but their addresses should be private.

With these use cases in mind a new flag in the node configuration file should be added, allowing the user to specify the following options:

  • NoPeerSharing - Peer Sharing is disabled globally
  • PeerSharingPublic - Peer Sharing is enabled and the node's address is public (i.e. other nodes will forward its address)
  • PeerSharingPrivate - Peer Sharing is enabled but the node's address is private (i.e. other nodes won't forward its address)

Another use case is when a node indicates in its topology file that it wants to engage in Peer Sharing but does not want to share a specific configured peer. There is already an "advertise" flag available for this purpose, which indicates whether or not it is appropriate to share any information about this address.

NOTE: This is no longer true in the current implementation. See Bug fixes and Design changes for details.

Future Enhancements

  1. One of the topics that was also discussed for necessary future work was caching known peers so that a node can recover more rapidly across reboots/failures. For this a node could serialize its Known Set to disk so it could be reinitialised as soon as it starts.
  2. Record information about the effectiveness of Peer Sharing and associated analysis (service assurance)

Low Level Design

Peer Sharing MiniProtocol

Description

The Peer Sharing MiniProtocol will be a simple Request-Reply protocol. Peer Sharing Protocol is used by nodes to perform share requests to upstream peers. Requested peers will share a subset of their Known Peers.

Following the Shelley Networking Protocol document, it should be easy enough to adapt the already existing Request-Response protocol to fit our needs:

[Image: Request-Response protocol state machine]

State Machine

Protocol Messages (note that this should be refined from the Request-Response protocol above):

  • MsgShareRequest amount: The client requests a maximum number of peers to be shared (amount). Ideally this amount should be limited by a protocol-level constant to prevent a bad actor from requesting too many peers.
  • MsgSharePeers [peerAddress]: The server replies with a set of peers. Ideally the amount of information (e.g. reply byte size) should be limited by a protocol level constant to disallow a bad actor from sending too much information.
  • MsgDone: Terminating Message.
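
Transition Table

| From State | Message         | Parameters    | To State |
|------------|-----------------|---------------|----------|
| StIdle     | MsgShareRequest | amount        | StBusy   |
| StBusy     | MsgSharePeers   | [peerAddress] | StIdle   |
| StIdle     | MsgDone         |               | StDone   |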
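
The three transitions can be checked mechanically. The following Python sketch is illustrative only: it encodes the transition table as a dictionary and rejects any message that is not allowed in the current state. The function names `step` and `run` are hypothetical, and the real protocol is defined as a typed state machine in Haskell rather than as a runtime lookup.

```python
from enum import Enum


class State(Enum):
    ST_IDLE = "StIdle"
    ST_BUSY = "StBusy"
    ST_DONE = "StDone"


# (current state, message name) -> next state, as in the transition table.
TRANSITIONS = {
    (State.ST_IDLE, "MsgShareRequest"): State.ST_BUSY,
    (State.ST_BUSY, "MsgSharePeers"): State.ST_IDLE,
    (State.ST_IDLE, "MsgDone"): State.ST_DONE,
}


def step(state, message):
    """Return the next state, or raise ValueError on a protocol violation."""
    try:
        return TRANSITIONS[(state, message)]
    except KeyError:
        raise ValueError(f"{message} is not allowed in state {state.value}") from None


def run(messages, state=State.ST_IDLE):
    """Fold a sequence of message names through the state machine."""
    for message in messages:
        state = step(state, message)
    return state
```

A full exchange such as `run(["MsgShareRequest", "MsgSharePeers", "MsgDone"])` ends in `StDone`, while sending `MsgDone` from `StBusy` raises an error because the table has no such row.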

Client Implementation Details

The initiator side will have to run indefinitely, since protocol termination means either an error or peer demotion. Because of this, the protocol cannot be run as a simple one-shot request-response protocol. To overcome this, the client-side implementation will use a registry, so that each connected peer gets registered and assigned a controller with two queues (a request queue and a result queue). This controller will be used to issue requests to the client implementation, which waits for an item to appear in the request queue before sending a MsgShareRequest. Once the reply to a request arrives, the result is put into the result queue.

If a peer gets disconnected, it should get unregistered.
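
The registry and controller arrangement can be sketched as below. This is a hedged Python illustration using threads and blocking queues; the names `Controller`, `Registry`, and `client_loop` are hypothetical, `send_share_request` is a stand-in for the wire exchange (MsgShareRequest followed by MsgSharePeers), and the `None` sentinel used to stop the loop is an assumption of this sketch.

```python
import queue
import threading


class Controller:
    """Per-peer handle with a request queue and a result queue."""

    def __init__(self):
        self.requests = queue.Queue()
        self.results = queue.Queue()

    def request_peers(self, amount, timeout=None):
        """Governor side: enqueue a request and wait for the reply."""
        self.requests.put(amount)
        return self.results.get(timeout=timeout)

    def stop(self):
        """Ask the client loop to terminate (where it would send MsgDone)."""
        self.requests.put(None)


class Registry:
    """Thread-safe map from connected peer to its controller."""

    def __init__(self):
        self._lock = threading.Lock()
        self._controllers = {}

    def register(self, peer):
        with self._lock:
            return self._controllers.setdefault(peer, Controller())

    def unregister(self, peer):
        with self._lock:
            controller = self._controllers.pop(peer, None)
        if controller is not None:
            controller.stop()

    def lookup(self, peer):
        with self._lock:
            return self._controllers.get(peer)


def client_loop(controller, send_share_request):
    """Long-running client: wait for a queued request, send it, queue the reply."""
    while True:
        amount = controller.requests.get()
        if amount is None:  # sentinel placed by Controller.stop()
            return
        controller.results.put(send_share_request(amount))
```

In this arrangement the governor never talks to the protocol thread directly; it only looks up the peer's controller in the registry, which keeps the long-lived client loop decoupled from the code that decides when to ask.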

Server Implementation Details

As soon as the server receives a share request, it needs to pick a subset no bigger than the value specified in the request's parameter. The reply set needs to be sampled randomly from the Known Peers set according to the following constraints:

  • Only pick peers that we managed to connect to at some point
  • Only pick peers that are not known to be ledger peers
  • Only pick peers that have public willingness information (e.g. PeerSharingPublic).

If a peer has the NoPeerSharing flag value, do not ask it for peers. This peer won't even have the Peer Sharing miniprotocol server running.

If a given peer has the PeerSharingPublic and DoNotAdvertisePeer flags enabled at the same time, DoNotAdvertisePeer should have priority, so the peer shouldn't be shared. Likewise, if a peer has PeerSharingPrivate and DoAdvertisePeer enabled at the same time, PeerSharingPrivate should be respected. In other words, if a local or remote peer has expressed that its address should be private, the response set must respect that privacy even if some other public flag conflicts with it.
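
The sampling constraints and the flag priority rule above can be sketched as follows. This Python illustration reflects the original design described in this section; `KnownPeer`, `is_shareable`, `compute_reply`, and the `max_reply` parameter (standing in for the protocol-level reply limit) are all hypothetical names, and the flag values are modelled as plain strings and booleans for brevity.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownPeer:
    address: str
    peer_sharing: str       # "NoPeerSharing" | "PeerSharingPublic" | "PeerSharingPrivate"
    advertise: bool         # True for DoAdvertisePeer, False for DoNotAdvertisePeer
    is_ledger_peer: bool
    connected_before: bool


def is_shareable(peer):
    """A peer is shared only if every flag agrees that its address is public."""
    return (
        peer.connected_before
        and not peer.is_ledger_peer
        and peer.peer_sharing == "PeerSharingPublic"
        and peer.advertise
    )


def compute_reply(known_peers, requested_amount, max_reply, rng=random):
    """Randomly sample at most min(requested_amount, max_reply) shareable addresses."""
    candidates = [p.address for p in known_peers if is_shareable(p)]
    size = max(0, min(requested_amount, max_reply, len(candidates)))
    return rng.sample(candidates, size)
```

Requiring all conditions to hold at once is what gives the private flags priority: any single "private" signal is enough to keep an address out of the reply.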

Computing the result (i.e. random sampling of available peers) needs access to the PeerSelectionState which is specific to the peerSelectionGovernorLoop. However when initializing the server side of the protocol we have to provide the result computing function early in the consensus side. This means we will have to find a way to delay the function application all the way to diffusion and share the relevant parts of PeerSelectionState with this function via a TVar.

CDDL Specification

;
; Peer Sharing MiniProtocol
;

peerSharingMessage = msgShareRequest
                   / msgSharePeers
                   / msgDone

msgShareRequest = [0, byte]
msgSharePeers   = [1, peerAddresses]
msgDone         = [2]

peerAddresses = [* peerAddress]

byte = 0..255

              ; ipv4 + portNumber
peerAddress = [0, word32, portNumber]
              ; ipv6 + portNumber
            / [1, word32, word32, word32, word32, flowInfo, scopeId, portNumber]

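portNumber = word16

flowInfo = word32
scopeId = word32

; integer ranges used above, included so that this fragment is self-contained
word16 = 0..65535
word32 = 0..4294967295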
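
To make the wire format concrete, here is a small Python sketch that encodes the three messages by hand as CBOR. It is a minimal illustration, not the node's codec: it assumes definite-length arrays, it assumes the IPv4 address is packed into the word32 in network byte order, and all function names are hypothetical.

```python
import ipaddress
import struct


def cbor_head(major, value):
    """Encode a CBOR head: major type plus a non-negative integer argument."""
    if value < 0:
        raise ValueError("negative values are not used by this protocol")
    prefix = major << 5
    if value < 24:
        return bytes([prefix | value])
    for info, fmt, limit in ((24, ">B", 2**8), (25, ">H", 2**16),
                             (26, ">I", 2**32), (27, ">Q", 2**64)):
        if value < limit:
            return bytes([prefix | info]) + struct.pack(fmt, value)
    raise ValueError("value does not fit in 64 bits")


def encode(term):
    """Encode nested lists of unsigned integers as definite-length CBOR."""
    if isinstance(term, int):
        return cbor_head(0, term)  # major type 0: unsigned integer
    if isinstance(term, list):
        # major type 4: array, followed by its encoded elements
        return cbor_head(4, len(term)) + b"".join(encode(t) for t in term)
    raise TypeError(f"unsupported term: {term!r}")


def msg_share_request(amount):
    if not 0 <= amount <= 255:  # byte = 0..255 in the CDDL
        raise ValueError("amount must fit in one byte")
    return encode([0, amount])


def ipv4_peer_address(host, port):
    # Assumption: the address is packed into word32 in network byte order.
    return [0, int(ipaddress.IPv4Address(host)), port]


def msg_share_peers(peer_addresses):
    return encode([1, peer_addresses])


def msg_done():
    return encode([2])
```

Under these assumptions, `msg_share_request(10)` yields the three bytes `82 00 0a`, and `msg_done()` yields `81 02`.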

Changes to Configuration Files

As mentioned in the section Node Configuration and Topology Changes, the node configuration file will need a new flag. This flag will indicate a node's desire to participate in Peer Sharing. Given this, the following will be necessary:

  • Add a new configuration option (in cardano-node/../Configuration/POM.hs) called PeerSharing with 3 possible values: NoPeerSharing, PeerSharingPublic, PeerSharingPrivate.
    • Propagate this change all the way to the Peer Selection Governor.
  • Track PeerAdvertise in public roots, i.e. propagate this from topology files all the way to RootPeersDNS.hs
    • This should be done by resolving the domain name and tagging all resolved IPs with the configured advertise value
  • Update documentation files
  • If the P2P flag is disabled, ignore the PeerSharing flag and overwrite it with NoPeerSharing

Changes to Handshake

The handshake mini protocol is a generic protocol that can negotiate any kind of protocol parameters. It only assumes that protocol parameters can be encoded to, and decoded from, CBOR terms. Given this, one just needs to add the PeerSharing flag values to the codec as an extra protocol parameter. This will require:

  • Adding CBOR encoder/decoder for PeerSharing type
  • Add a new NodeToNode version
  • Extend Handshake protocol to accommodate this extra protocol parameter
  • Change the nodeToNodeCBORTerm function to deal with this new protocol parameter. A simple solution would be to populate the missing parameter with NoPeerSharing by default.
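
The defaulting behaviour suggested in the last bullet can be sketched like this. The Python below is purely illustrative: the shape of the version data term (a list of two or three already-decoded CBOR values), the field names, and the numeric codes for the PeerSharing values are all assumptions made for this example and are not taken from the actual codec.

```python
# Hypothetical wire codes for the new parameter; the real codec may differ.
PEER_SHARING_CODES = {
    0: "NoPeerSharing",
    1: "PeerSharingPrivate",
    2: "PeerSharingPublic",
}


def decode_version_data(term):
    """Decode a hypothetical version-data term given as a list of decoded CBOR values.

    Assumed shapes: older versions carry [network_magic, initiator_only];
    the newer version appends a PeerSharing code. A missing field is
    populated with NoPeerSharing.
    """
    if not isinstance(term, list) or len(term) not in (2, 3):
        raise ValueError("unexpected version data shape")
    network_magic, initiator_only = term[0], term[1]
    if len(term) == 2:
        peer_sharing = "NoPeerSharing"  # default for peers on older versions
    elif term[2] in PEER_SHARING_CODES:
        peer_sharing = PEER_SHARING_CODES[term[2]]
    else:
        raise ValueError("unknown PeerSharing value")
    return {
        "network_magic": network_magic,
        "initiator_only": initiator_only,
        "peer_sharing": peer_sharing,
    }
```

Defaulting to NoPeerSharing is the conservative choice: a peer that negotiated an older version is simply never asked for peers and never has its address shared.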

Changes to Peer Selection Governor

As mentioned, the Peer Selection Governor already implements most of the decision mechanisms needed to perform Peer Sharing. However, this implementation is set up to ask Known Peers, and we want to change it to ask Established Peers. Known Peers know nothing about Established Peers, so this will require some work and refactoring. Also, the whole testing infrastructure has this particular detail in mind, so the test suite would also have to change to make sure the refactor is successful.

When receiving the reply to the issued share request one needs to filter the response set against the known-to-be-ledger peers before adding to the Known Peers set, to make sure we don't add any ledger peers.
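
A minimal sketch of this filtering step follows, in Python and with hypothetical names throughout. Truncating an over-long reply to the requested amount and dropping duplicates are assumptions of this sketch; a real implementation might instead treat an oversized reply as a protocol violation.

```python
def accept_shared_peers(reply, requested_amount, ledger_peers, known_peers):
    """Return the addresses from a share reply that should become new cold peers.

    reply is the list of addresses received; ledger_peers and known_peers are
    sets of addresses already tracked locally. All names are hypothetical.
    """
    accepted = []
    seen = set()
    # Assumption: never trust the remote side to honour the requested amount.
    for address in reply[:max(0, requested_amount)]:
        if address in ledger_peers or address in known_peers or address in seen:
            continue
        seen.add(address)
        accepted.append(address)
    return accepted
```

For example, with `reply = ["a", "b", "b", "c", "d"]`, a requested amount of 4, `"a"` a ledger peer and `"c"` already known, only `"b"` is accepted.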

To summarize, the low-level design decisions for the Peer Selection Governor are:

  • Change the Known Peers belowTarget Peer Selection Governor action:
    • Only ask Established Peers
    • Only ask peers with a willingness value of either PeerSharingPublic or PeerSharingPrivate.
    • Keep the other already builtin metrics (such as not asking the same peer twice too often, etc...)
    • If local peer PeerSharing value is NoPeerSharing, meaning Peer Sharing is disabled, no Peer Sharing requests should be issued.

For the change above, moving some of the infrastructure from PeerSelection/KnownPeers.hs to PeerSelection/EstablishedPeers.hs will be needed, as well as refactoring all the associated tests.

Finding a way to adapt jobPhase2 to include a check for ledger peers (This requires Changes to Known Peers) will also be needed.

Changes to Known Peers

Known Peers will need to be extended with extra information in order to implement Peer Sharing. As can be inferred from the sections above, Known Peers will need to track:

  • Peer Willingness information
  • If they come from ledger
  • If at some point we managed to connect to it.
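
One way to picture this bookkeeping is a per-address record that is updated as events arrive. The Python sketch below is illustrative only; the class and method names are hypothetical, and defaulting an unknown peer's willingness to NoPeerSharing is an assumption of this example.

```python
from dataclasses import dataclass


@dataclass
class KnownPeerInfo:
    peer_sharing: str = "NoPeerSharing"  # willingness learnt from config or handshake
    is_ledger_peer: bool = False
    connected_before: bool = False


class KnownPeers:
    """Map from address to the extra information needed for Peer Sharing."""

    def __init__(self):
        self._info = {}

    def _entry(self, address):
        return self._info.setdefault(address, KnownPeerInfo())

    def record_willingness(self, address, peer_sharing):
        self._entry(address).peer_sharing = peer_sharing

    def record_connection(self, address):
        # Sticky flag: stays True even if the connection is later lost.
        self._entry(address).connected_before = True

    def record_ledger_peers(self, addresses):
        for address in addresses:
            self._entry(address).is_ledger_peer = True

    def info(self, address):
        return self._info.get(address)
```

Keeping the three facts on the same record means both the asking side (who is willing to be asked) and the replying side (who may be shared) can be answered from a single lookup.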

Suggested Task Order

There might be tasks that can be done in parallel but I'll try to come up with a sequential order that tries to optimize for dependencies:

  1. Changes to Known Peers
  2. Changes to Peer Selection Governor
    1. Refactor
    2. Include Peer Sharing changes
  3. Changes to Handshake
  4. Peer Sharing MiniProtocol
  5. Changes to Configuration Files (Needs to change cardano-node)

Post-Implementation Notes

Implementation PR

Here's the main peer sharing implementation PR: #4019

Related PRs

Here's a related PR that implements light peer sharing, a way for inbound connections to be made known to the peer selection governor:

Here's a PR that adds Peer Sharing protocol to wireshark dissector:

Bug fixes and Design changes

After finding bug #4642, the team went through an extensive discussion about how to simplify the current design to both fix and mitigate problems like this one.

We ended up noticing that there is no real need for PeerSharingPrivate. The use case we had in mind (see Node Configuration and Topology Changes) is not really worth the added complexity, but what really made us remove this flag option was the fact that it cannot be enforced on the remote side of the protocol and there is no way to punish bad actors. There is still a way for a user not to share an address, via the AdvertisePeer flag in the local roots configuration.

In a nutshell, #4644 removes the PeerSharingPrivate flag and greatly simplifies the handshake logic, making it truly symmetric.