
Churn Management


Overview


The dynamics of peer participation, or churn, are an inherent property of Peer-to-Peer (P2P) systems and critical for their design and evaluation. Accurately characterizing churn requires precise and unbiased information about the arrival and departure of peers, which is challenging to acquire because of the nature of P2P systems.

What is Churn Management?


We can briefly define churn as the process of peers continuously entering and leaving the system. Churn Management covers the common tasks that arise from these peer interactions.

Specification


1. Discovery

Peers need to be able to discover other Peers based on their perspective of the network.

Our overlay should ensure that any Peer A is able to reach any other Peer B, even if Peer B is not on a direct communication path with Peer A.

For the current overlay, Gemini, the routing table is divided into two parts. All peers within either part are considered to be on a direct communication path with each other. All other nodes, reachable only through hops, are said to be on an indirect communication path.

This allows for on-demand discovery of peers through hops, harnessing the ability of the overlay to reach any node on the network in 2 hops in an almost constant manner (probabilistic routing). It is important to note that even if a Peer is discovered through hops on the P2P network (indirect communication path), direct communication is possible after discovery; the details of how to achieve this depend on the implementation and the peer maintenance costs it is willing to bear.
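As a rough illustration only (not the actual Gemini implementation), the lookup decision could be sketched as follows in Go; the RoutingTable layout, the PeerID type, and the sharesPrefix/sharesSuffix predicates are assumptions made for this sketch:

```go
package overlay

// PeerID identifies a node in the overlay (illustrative type for this sketch).
type PeerID uint64

// RoutingTable mirrors Gemini's two-part table: the hatcase holds peers
// sharing our h-bit prefix, the bootcase holds peers sharing our b-bit suffix.
type RoutingTable struct {
	Hatcase  map[PeerID]struct{}
	Bootcase map[PeerID]struct{}
}

// NextHop returns the target itself when it sits on a direct communication
// path, otherwise a routing-table peer expected to hold the target in its own
// table, giving roughly two-hop (probabilistic) routing. The sharesPrefix and
// sharesSuffix predicates, which compare a candidate peer against the target,
// are hypothetical stand-ins for the real bit comparisons.
func (rt *RoutingTable) NextHop(target PeerID, sharesPrefix, sharesSuffix func(PeerID) bool) (PeerID, bool) {
	if _, ok := rt.Hatcase[target]; ok { // direct path via prefix group
		return target, true
	}
	if _, ok := rt.Bootcase[target]; ok { // direct path via suffix group
		return target, true
	}
	// Indirect path: a bootcase member sharing the target's prefix (or a
	// hatcase member sharing its suffix) should hold the target in its table.
	for p := range rt.Bootcase {
		if sharesPrefix(p) {
			return p, true
		}
	}
	for p := range rt.Hatcase {
		if sharesSuffix(p) {
			return p, true
		}
	}
	return 0, false
}
```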

If advanced discovery capabilities are needed, a specialized Peer type that assists standard Peers with discovery tasks should exist. However, a basic discovery logic should be defined on all peers so that the network does not depend on the specialized peer for basic operation.

2. Join

When a new Peer X joins the network, it first contacts an existing bootstrap peer E. Peer E picks a Peer H with X's hat (prefix) from its bootcase (suffix group), as well as a Peer B with X's boot (suffix) from its hatcase (prefix group). X collects its hatcase from H and its bootcase from B. If E does not find an appropriate H or B, it asks another Peer in its hatcase or bootcase for help.

This way, a joining Peer is paired with its corresponding prefix and suffix groups, which should provide sufficient peering for the application's needs.
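A minimal sketch of this bootstrap exchange, assuming hypothetical Hat/Boot helpers and FetchHatcase/FetchBootcase RPC hooks; the real wire protocol and ID encoding are implementation-specific:

```go
package overlay

import "errors"

// PeerID identifies a node; Hat and Boot extract the h-bit prefix and b-bit
// suffix of an ID (illustrative helpers, not the real Gemini encoding).
type PeerID uint64

func Hat(id PeerID, h uint) uint64  { return uint64(id) >> (64 - h) }
func Boot(id PeerID, b uint) uint64 { return uint64(id) & (uint64(1)<<b - 1) }

// Bootstrap is the joining peer's view of the bootstrap peer E: E's two
// routing-table parts plus hypothetical RPC hooks for copying a group from a
// remote peer.
type Bootstrap struct {
	Hatcase, Bootcase []PeerID
	FetchHatcase      func(from PeerID) []PeerID
	FetchBootcase     func(from PeerID) []PeerID
}

// Join sketches the flow described above: E picks H (sharing X's hat) from
// its bootcase and B (sharing X's boot) from its hatcase; X then copies its
// hatcase from H and its bootcase from B.
func Join(x PeerID, e Bootstrap, h, b uint) (hatcase, bootcase []PeerID, err error) {
	var peerH, peerB PeerID
	foundH, foundB := false, false
	for _, p := range e.Bootcase {
		if Hat(p, h) == Hat(x, h) {
			peerH, foundH = p, true
			break
		}
	}
	for _, p := range e.Hatcase {
		if Boot(p, b) == Boot(x, b) {
			peerB, foundB = p, true
			break
		}
	}
	if !foundH || !foundB {
		// Per the description, E would ask another peer in its hatcase or
		// bootcase for help; this sketch simply reports the miss.
		return nil, nil, errors.New("bootstrap peer could not locate H or B")
	}
	return e.FetchHatcase(peerH), e.FetchBootcase(peerB), nil
}
```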

3. Leave

A peer that wants to leave the network simply disconnects and relies on the maintenance routine to broadcast its unavailability.

Every x units of time, as defined by the maintenance parameters, the Peers in each individual group (suffix or prefix), ordered in a ring-like or linked-list structure, verify the status of the next peer in the list (the last Peer verifies the first) by sending a "heartbeat" message. After T consecutive failures, the verifying Peer multicasts the unavailability of the verified Peer to all the nodes in the group by sending a Peer Event Message.
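The heartbeat-and-report cycle described above could look roughly like the sketch below; GroupTransport, WatchNext, and the parameter names are hypothetical stand-ins rather than part of the specification:

```go
package overlay

import "time"

// PeerID identifies a node (same illustrative type as the earlier sketches).
type PeerID uint64

// GroupTransport is a hypothetical transport: Heartbeat probes a peer and
// MulticastUnavailable sends a Peer Event Message to the whole group.
type GroupTransport interface {
	Heartbeat(to PeerID) error
	MulticastUnavailable(dead PeerID)
}

// WatchNext runs the maintenance cycle: every `interval` the peer at position
// self probes the next peer in the ring-ordered group (the last peer verifies
// the first); after T consecutive failures it multicasts that peer's
// unavailability. interval and T map to the "maintenance params" above.
func WatchNext(group []PeerID, self int, t GroupTransport, interval time.Duration, T int) {
	next := group[(self+1)%len(group)]
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	failures := 0
	for range ticker.C {
		if err := t.Heartbeat(next); err != nil {
			failures++
			if failures >= T {
				t.MulticastUnavailable(next) // report-based membership update
				return
			}
			continue
		}
		failures = 0 // a successful heartbeat resets the count
	}
}
```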

Peer responses to heartbeat messages can be used by an implementation to profile peers and establish dynamic ban/timeout procedures for faulty nodes, as a defence against scenarios that could make network conditions unreliable within a group.

Sending a Direct Event Message is an option if an implementation chooses to allow regular peering with elements outside of the prefix/suffix groups. This will marginally increase maintenance/bandwidth costs but allows for a more flexible communication path if required.

4. Maintenance

The Gemini overlay routing table consists of two parts: one containing all peers sharing an h-bit prefix and the other containing all peers sharing a b-bit common suffix (h and b are system parameters).

The values of h and b define the size of these parts and contribute to the maintenance cost.

Gemini adopts a report-based routing table maintenance algorithm. When a membership change event occurs, the overlay multicasts this event through the report-based mechanism, which lets the overlay consume little bandwidth to deal with Peer joins and leaves.

When the values of h and b are the same, our maintenance cost is defined by the following formula:

M = (4 * N * f) / (2^b * L)

where:

- N = number of peers in the network
- f = redundancy of the multicast algorithm
- b = number of bits taken for the common suffix
- L = average lifetime of peers in seconds

Additional elements added to the basic maintenance routine will add to this cost.

When the maintenance cost is not acceptable to peers, Gemini, like other overlays, can also trade hops for bandwidth consumption by changing the parameters to make the routing table smaller and fit the application's bandwidth needs.

Example scenario:

Assuming that the average lifetime of peers is 1 hour (L), all the items in the routing table have to be refreshed within a period of 1 hour. This means that for a system using Gemini that consists of 5,000,000 nodes, with h and b both set to 10, on average every prefix/suffix group contains 5,000,000 / 2^10 ≈ 4883 peers.

Then, on average, every node receives (4883 + 4883) * 2 = 19532 event messages per hour, about 5.43 messages per second. Including heartbeats and their responses, the total message count will not exceed 6 messages per second.

If the message size is 500 bits, the average bandwidth cost will be around 6 messages/second * 500 bits = 3 kbps.
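For reference, the numbers in this scenario can be reproduced straight from the maintenance formula; a small self-contained sketch, assuming the multicast redundancy f is 1:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	const (
		N = 5_000_000 // number of peers in the network
		f = 1.0       // redundancy of the multicast algorithm (assumed 1 here)
		b = 10        // bits of common suffix (h is set to the same value)
		L = 3600.0    // average peer lifetime in seconds (1 hour)
	)

	groupSize := math.Round(N / math.Pow(2, b))  // peers per prefix/suffix group, ~4883
	eventsPerHour := (groupSize + groupSize) * 2 // ~19532 event messages per hour
	perSecond := eventsPerHour / L               // ~5.43 event messages per second
	M := 4 * N * f / (math.Pow(2, b) * L)        // same value straight from the formula

	fmt.Printf("group size: %.0f peers\n", groupSize)
	fmt.Printf("event messages: %.0f/hour, %.2f/s (formula M = %.2f)\n", eventsPerHour, perSecond, M)
	fmt.Printf("bandwidth at 6 msg/s x 500 bit: %.1f kbps\n", 6*500.0/1000)
}
```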
