RFC: peer-crdt-ipfs replication protocol
Peer-crdt was created as an extensible, abstract and understandable way of reasoning about operation-based CRDTs in a content-addressable world like IPFS. Having a native DAG representation of operations allowed us to use IPFS synchronisation primitives without having to care about the details, with all the automatic benefits of being able to use IPLD, opaque replication, content routing, etc.
But we're still facing performance issues that are inherent to the limited resources available in browsers and to all the work that needs to happen in js-ipfs to make graph synchronisation work efficiently.
The goal here is therefore to create a protocol that allows peers to synchronise CRDT state and operations with each other efficiently. Having successfully implemented this protocol, we can then consider extrapolating some of it into IPFS territory.
Besides trying to build a faster sync, we also want to take this opportunity to address some pending issues, such as:
Forming an intentional topology that is less resource-consuming
Protocol overview
This protocol should allow a replica to connect to another replica and learn what that replica's state is. The replica state can be compactly represented by a vector clock that identifies the latest operations it has received.
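For concreteness, here is a minimal sketch of how such a replica state could be represented and compared, assuming a vector clock is a plain map of replica ID to operation count (the helper names are illustrative, not part of peer-crdt):

```js
// A vector clock here is a plain object mapping replica ID -> number of
// operations seen from that replica (this representation is an assumption).

// Merge two clocks by taking the pointwise maximum.
function mergeClocks (a, b) {
  const merged = { ...a }
  for (const [id, count] of Object.entries(b)) {
    merged[id] = Math.max(merged[id] || 0, count)
  }
  return merged
}

// True if clock `a` dominates `b`, i.e. `a` has seen every operation `b` has.
function dominates (a, b) {
  return Object.entries(b).every(([id, count]) => (a[id] || 0) >= count)
}

// Example: this replica has seen 3 ops from "a" and 1 from "b".
const local = { a: 3, b: 1 }
const remote = { a: 2, b: 4 }
console.log(dominates(local, remote)) // false: we're still missing ops from "b"
console.log(mergeClocks(local, remote)) // { a: 3, b: 4 }
```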
Overlay topology
Instead of the everyone-connects-to-all topology we currently have, we need a topology that provides a better balance between latency and the number of maintained connections. Since the web environment is very constrained, maintaining a high number of connections should be avoided. In the case of CRDTs, since we don't require consensus to make progress, we can afford a maximum of f (fan-out) connections per node. Some of these connections will use eager push and others will use lazy push. At any given time we can switch a lazy connection into an eager one and vice-versa.
We can use the principles in Epidemic Broadcast Trees to form and maintain such a topology:
Topology protocol
Each node has a fanout f, the number of neighbour nodes it uses to gossip with
A subset of these connections uses eager push
The remaining connections use lazy push
The connections with eager push are selected in a way that their closure effectively builds a broadcast tree.
Lazy push connections are used to ensure gossip reliability when nodes fail and to quickly heal the broadcast tree.
In each node, the set of gossip nodes is not changed between gossip rounds. The same peers are used until failures are detected.
To construct the overlay, we use the Spanning Tree Construction and Repair algorithms presented in Epidemic Broadcast Trees.
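A rough sketch of the per-node bookkeeping this implies, loosely following the Plumtree algorithm from Epidemic Broadcast Trees; the class and method names are assumptions, not the actual implementation:

```js
// Illustrative sketch of the eager/lazy peer sets described above.
class BroadcastTreePeers {
  constructor (fanout) {
    this.fanout = fanout
    this.eager = new Set() // connections that get full operations pushed
    this.lazy = new Set() // connections that only get vector-clock summaries
  }

  addPeer (id) {
    // Fill the eager set up to the fanout; any overflow starts as lazy.
    if (this.eager.size < this.fanout) {
      this.eager.add(id)
    } else {
      this.lazy.add(id)
    }
  }

  // A duplicate operation arrived via `id`: that edge is redundant in the
  // broadcast tree, so demote it to lazy push (a Plumtree PRUNE).
  onDuplicate (id) {
    this.eager.delete(id)
    this.lazy.add(id)
  }

  // A lazy summary told us about an operation the eager tree never delivered:
  // promote `id` to eager push to repair the tree (a Plumtree GRAFT).
  onMissing (id) {
    this.lazy.delete(id)
    this.eager.add(id)
  }
}

const peers = new BroadcastTreePeers(2)
for (const id of ['p1', 'p2', 'p3']) peers.addPeer(id)
peers.onDuplicate('p2') // p2 becomes a lazy-push connection
```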
Bootstrap
Here we have two modes of bootstrapping: prime or replica.
In prime mode, a replica is immediately considered bootstrapped. This can be useful when creating a new CRDT instance, where we don't want to block until there is another replica we can sync with.
Bootstrapping in the replica mode
In replica mode, a replica blocks until it is bootstrapped. This is used when the current node is not the creator of the CRDT instance. Having this mode helps maintain causal stability by preventing newly joining nodes from entering the CRDT directly from the initial (zero) state.
In replica mode, the bootstrap network protocol is performed with a remote replica as soon as a connection can be established. As an optimisation, a replica should not initiate the bootstrap protocol with more than one node at the same time.
Bound to a CRDT instance over a mutually-authenticated and private channel, a peer-to-peer connection starts with each peer immediately announcing their current state in the form of a vector clock.
When receiving that remote state, a peer:
updates their causal stability table (see further down)
computes and sends the missing operations
As part of the general sync protocol, when sending each operation, the replica updates the connection-bound vector clock that represents the state of the other peer.
The bootstrap protocol is said to be complete once a replica reaches the state (vector clock) the remote peer initially declared.
From then on, the normal sync protocol is in action.
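Putting the bootstrap steps together, a hedged sketch of how the exchange could look in replica mode; `conn`, `store`, the message shapes and the event API are assumptions rather than the peer-crdt-ipfs API, and `mergeClocks`/`dominates` are the helpers from the vector-clock sketch above:

```js
// Sketch of the bootstrap exchange over one connection.
function startBootstrap (conn, store, onBootstrapped) {
  let declaredClock = null // the state the remote announced up front

  // 1. Immediately announce our own current state.
  conn.send({ type: 'clock', clock: store.clock() })

  conn.on('message', (msg) => {
    if (msg.type === 'clock') {
      // 2. Remember their state and push every operation they are missing,
      //    keeping the connection-bound clock up to date as we send.
      declaredClock = msg.clock
      conn.remoteClock = { ...declaredClock }
      for (const op of store.operationsSince(declaredClock)) {
        conn.send({ type: 'op', op })
        conn.remoteClock = mergeClocks(conn.remoteClock, op.clock)
      }
    } else if (msg.type === 'op') {
      store.apply(msg.op)
      // 3. Bootstrapped once our clock reaches the state they declared.
      if (declaredClock && dominates(store.clock(), declaredClock)) {
        onBootstrapped()
      }
    }
  })
}
```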
Sync protocol
After bootstrapping, a connection with a remote replica can be in one of two states:
Eager-push: where updates are sent immediately
Lazy-push: where only the vector clock of the latest operation is transmitted
Preemptive (eager) anti-entropy
As said before, when connected to a remote replica, a replica associates the last sent operation with that connection — in the form of a vector clock (bear in mind that this is not the vector clock in the causal stability table; see further down).
When creating an operation or receiving an operation from another replica, a replica should (see the sketch after this list):
by looking at the vector clock associated with the connection, check whether the remote replica has already received that operation.
if not, send it and update the connection's vector clock.
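A small sketch of that check-and-forward step, assuming each connection object carries a `mode` and the connection-bound `remoteClock` mentioned above (again reusing the illustrative vector-clock helpers):

```js
// Push a new or newly received operation to eager peers that don't have it.
function pushToEagerPeers (connections, op) {
  for (const conn of connections) {
    if (conn.mode !== 'eager') continue
    // Skip peers whose connection-bound clock already covers this operation.
    if (dominates(conn.remoteClock, op.clock)) continue
    conn.send({ type: 'op', op })
    conn.remoteClock = mergeClocks(conn.remoteClock, op.clock)
  }
}
```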
Receiving an operation
When receiving an operation from a remote replica, a replica should (see the sketch after this list):
extract the replica ID and the vector clock from the operation
update the causal stability table (replica ID -> vector clock)
check whether that operation (lookup by vector clock) has already been processed
if not, process it
forward it to the other connections in eager mode that are missing it
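The receive path listed above could look roughly like this; `causalTable`, `store` and the operation shape (`{ replica, clock, ... }`) are assumptions:

```js
// Sketch of the receive path; `causalTable` is a Map of replica ID -> last
// seen vector clock, and `store.has` looks an operation up by its vector clock.
function onOperation (op, fromConn, connections, causalTable, store) {
  // 1. Update the causal stability table with what `op.replica` has seen.
  const known = causalTable.get(op.replica) || {}
  causalTable.set(op.replica, mergeClocks(known, op.clock))

  // 2. Skip operations we have already processed (lookup by vector clock).
  if (store.has(op.clock)) return

  // 3. Process (apply) the operation locally.
  store.apply(op)

  // 4. Forward it to the other eager connections that are still missing it.
  for (const conn of connections) {
    if (conn === fromConn || conn.mode !== 'eager') continue
    if (dominates(conn.remoteClock, op.clock)) continue
    conn.send({ type: 'op', op })
    conn.remoteClock = mergeClocks(conn.remoteClock, op.clock)
  }
}
```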
Vector clocks
Besides being content-addressable, the operation store must also be addressable by vector clock.
Also, we don't need to transmit the entire vector clock all the time. As an optimisation, a replica can just send a vector clock delta (only the parts of the vector clock that have changed).
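One possible shape of that delta optimisation, under the same plain-object clock representation as before:

```js
// Only the entries that changed since the last clock we sent on this
// connection need to be transmitted; the receiver merges the delta back in.
function clockDelta (previous, current) {
  const delta = {}
  for (const [id, count] of Object.entries(current)) {
    if ((previous[id] || 0) !== count) delta[id] = count
  }
  return delta
}

console.log(clockDelta({ a: 3, b: 1 }, { a: 3, b: 4, c: 1 })) // { b: 4, c: 1 }
```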
Causal stability
To be able to truncate the operation logs, we need to maintain a causal stability table, as described here.
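As a sketch of how the table could drive truncation (assuming it maps each known replica ID to the last vector clock observed from it): an operation is causally stable once every member's clock covers it, so the pointwise minimum over the table gives the stable frontier below which the op log can be truncated.

```js
// `causalTable` maps each known replica ID to the last vector clock we have
// observed from it (via its operations or announcements).
function stableClock (causalTable) {
  const clocks = [...causalTable.values()]
  if (clocks.length === 0) return {}
  const ids = new Set(clocks.flatMap((c) => Object.keys(c)))
  const stable = {}
  for (const id of ids) {
    stable[id] = Math.min(...clocks.map((c) => c[id] || 0))
  }
  return stable
}

// Everything at or below this frontier can be removed from the op log.
const table = new Map([
  ['a', { a: 5, b: 2 }],
  ['b', { a: 3, b: 4 }]
])
console.log(stableClock(table)) // { a: 3, b: 2 }
```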
Membership
As operations contain data about the originating node, they can be used to infer membership.
That can be used as a way to gossip about membership, but we still need a protocol for a node to leave membership.
The easiest way here is to create CRDT operations that signify membership changes. A node, when joining the membership, creates a membership operation that will be replicated. When leaving, another membership operation gets added.
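One possible encoding of these membership operations (the operation types and fields are illustrative, not a defined format):

```js
// Illustrative membership operations; the types and fields are assumptions.
function joinOp (replicaId, addresses) {
  return { type: 'member-join', replica: replicaId, addresses }
}

function leaveOp (replicaId) {
  return { type: 'member-leave', replica: replicaId }
}

// Reducing the op log into the current membership set.
function membership (ops) {
  const members = new Set()
  for (const op of ops) {
    if (op.type === 'member-join') members.add(op.replica)
    if (op.type === 'member-leave') members.delete(op.replica)
  }
  return members
}
```

Because membership changes are ordinary CRDT operations, they propagate and converge through the same sync protocol as everything else.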
Timeout
When a node is offline for some time, it prevents causal stability from progressing, which in turn prevents op-log truncation from progressing. Here we need a protocol to remove a foreign node from membership (triggered either by a timeout or by a manual operation). To make progress, this protocol must touch all remaining members, to make sure that no member has a stored operation from the node about to leave.
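A rough sketch of what that removal protocol could look like; `broadcast`, `collectAcks`, the message names and the ack contents are all assumptions about a protocol that still needs to be designed, and `leaveOp` is the helper from the membership sketch above:

```js
// `broadcast` sends to all remaining members and `collectAcks` waits for one
// ack from each of them.
async function evictMember (replicaId, remainingMembers, store, broadcast, collectAcks) {
  // Ask every remaining member whether it still holds an operation from
  // `replicaId` that has not been fully disseminated.
  broadcast({ type: 'evict-proposal', replica: replicaId })
  const acks = await collectAcks('evict-ack', remainingMembers.size)

  // Only evict once nobody reports a pending operation from that node;
  // causal stability (and op-log truncation) can then progress again.
  if (acks.every((ack) => !ack.hasPendingOps)) {
    store.apply(leaveOp(replicaId))
  }
}
```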