From 5378e61d912de50bc19479dafb3ffd2bd9d47f32 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Kripalani?= Date: Wed, 7 Nov 2018 18:49:28 +0000 Subject: [PATCH 01/39] WIP Initial Kademlia DHT spec. --- kad-dht/README.md | 226 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 226 insertions(+) create mode 100644 kad-dht/README.md diff --git a/kad-dht/README.md b/kad-dht/README.md new file mode 100644 index 000000000..c4c899f18 --- /dev/null +++ b/kad-dht/README.md @@ -0,0 +1,226 @@ +# libp2p Kademlia DHT specification +The Kademlia Distributed Hash Table (DHT) subsystem in libp2p is a DHT implementation largely based on the Kademlia [0] whitepaper, augmented with notions from S/Kademlia [1], Coral [2] and mainlineDHT \[3\]. + +This specification assumes the reader has prior knowledge of those systems. So rather than explaining DHT mechanics from scratch, we focus on differential areas: + +1. Specialisations and peculiarities of the libp2p implementation. +2. Actual wire messages. +3. Other algorithmic or non-standard behaviours worth pointing out. + +For everything else that isn't explicitly stated herein, it is safe to assume behaviour similar to Kademlia-based libraries. + +Code snippets use a Go-like syntax. + +## Authors + +* Protocol Labs. + +## Editors + +* [Raúl Kripalani](https://github.com/raulk) +* [John Hiesey](https://github.com/jhiesey) + +## Distance function (dXOR) + +The libp2p Kad DHT uses the **XOR distance metric** as defined in the original Kademlia paper [0]. Peer IDs are normalised through the SHA256 hash function. + +For recap, `dXOR(sha256(id1), sha256(id2))` is the number of common leftmost bits between SHA256 of each peer IDs. The `dXOR` between us and a peer X designates the bucket index that peer X will take up in the Kademlia routing table. + +## Kademlia routing table + +The data structure backing this system is a k-bucket routing table, closely following the design outlined in the Kademlia paper [0]. The default value for `k` is 20, and the maximum bucket count matches the size of the SHA256 function, i.e. 256 buckets. + +The routing table is unfolded lazily, starting with a single bucket a position 0 (representing the most distant peers), and splitting it subsequently as closer peers are found, and the capacity of the nearmost bucket is exceeded. + +## Alpha concurrency factor (α) + +The concurrency of node and value lookups are limited by parameter `α`, with a default value of 3. This implies that each lookup process can perform no more than 3 inflight requests, at any given time. + +## Record keys + +Records in the DHT are keyed by CID [4]. There are intentions to move to multihash [5] keys in the near future, as certain CID components like the multicodec are redundant. + +## Interfaces + +The libp2p Kad DHT implementation satisfies the routing interfaces: + +```go +type Routing interface { + ContentRouting + PeerRouting + ValueStore + + // Kicks off the bootstrap process. + Bootstrap(context.Context) error +} + +// ContentRouting is used to find information about who has what content. +type ContentRouting interface { + // Provide adds the given CID to the content routing system. If 'true' is + // passed, it also announces it, otherwise it is just kept in the local + // accounting of which objects are being provided. + Provide(context.Context, cid.Cid, bool) error + + // Search for peers who are able to provide a given key. 
+ FindProvidersAsync(context.Context, cid.Cid, int) <-chan pstore.PeerInfo +} + +// PeerRouting is a way to find information about certain peers. +// +// This can be implemented by a simple lookup table, a tracking server, +// or even a DHT (like herein). +type PeerRouting interface { + // FindPeer searches for a peer with given ID, returns a pstore.PeerInfo + // with relevant addresses. + FindPeer(context.Context, peer.ID) (pstore.PeerInfo, error) +} + +// ValueStore is a basic Put/Get interface. +type ValueStore interface { + // PutValue adds value corresponding to given Key. + PutValue(context.Context, string, []byte, ...ropts.Option) error + + // GetValue searches for the value corresponding to given Key. + GetValue(context.Context, string, ...ropts.Option) ([]byte, error) +} +``` + +## Value lookups + +When looking up an entry in the DHT, the implementor should collect at least `Q` (quorum) responses from distinct nodes to check for consistency before returning an answer. + +Should the responses be different, the `Validator.Select()` function is used to resolve the conflict and select the _best_ result. + +**Entry correction.** Nodes that returned _worse_ records are updated via a direct `PUT_VALUE` RPC call when the lookup completes. Thus the DHT network eventually converges to the best value for each record, as a result of nodes collaborating with one another. + +### Algorithm + +Let's assume we’re looking for key `K`. We first try to fetch the value from the local store. If found, and `Q == { 0, 1 }`, the search is complete. + +Otherwise, the local result counts for one towards the search of `Q` values. We then enter an iterative network search. + +We keep track of: + +* the number of values we've fetched (`cnt`). +* the best value we've found (`best`), and which peers returned it (`Pb`) +* the set of peers we've already queried (`Pq`) and the set of next query candidates sorted by distance from `K` in ascending order (`Pn`). +* the set of peers with outdated values (`Po`). + +**Initialization**: seed `Pn` with the `α` peers from our routing table we know are closest to `K`, based on the XOR distance function. + +**Then we loop:** + +*WIP (raulk): lookup timeout.* + +1. If we have collected `Q` or more answers, we cancel outstanding requests, return `best`, and we notify the peers holding an outdated value (`Po`) of the best value we discovered, by sending `PUT_VALUE(K, best)` messages. _Return._ +2. Pick as many peers from the candidate peers (`Pn`) as the `α` concurrency factor allows. Send each a `GET_VALUE(K)` request, and mark it as _queried_ in `Pq`. +3. Upon a response: + 1. If successful, and we receive a value: + 1. If this is the first value we've seen, we store it in `best`, along with the peer who sent it in `Pb`. + 2. Otherwise, we resolve the conflict by calling `Validator.Select(best, new)`: + 1. If the new value wins, store it in `best`, and mark all formerly “best" peers (`Pb`) as _outdated peers_ (`Po`). The current peer becomes the new best peer (`Pb`). + 2. If the new value loses, we add the current peer to `Po`. + 2. If successful without a value, the response will contain the closest nodes the peer knows to the key `K`. Add them to the candidate list `Pn`, except for those that have already been queried. + 3. If an error or timeout occurs, discard it. +4. Go to 1. 
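+
+For illustration only, the loop above can be sketched in the Go-like
+pseudocode used elsewhere in this document. The helper names
+(`nearestPeers`, `popClosest`, `addUnqueried`, `sendGetValue`,
+`newValueWins`, `sendPutValue`) are assumptions made for this sketch, not
+part of any implementation; `alpha` is the α concurrency factor, and the
+`α` requests are issued sequentially rather than concurrently for brevity:
+
+```go
+// getValue sketches the iterative search described above. `Pn`, `Pq`, `Pb`
+// and `Po` map to candidates, queried, bestPeers and outdated. The initial
+// check of the local store is omitted.
+func getValue(key string, quorum int) []byte {
+    var (
+        best      []byte               // best value seen so far
+        bestPeers []peer.ID            // Pb: peers that returned `best`
+        outdated  []peer.ID            // Po: peers holding a worse value
+        queried   = map[peer.ID]bool{} // Pq
+        cnt       int                  // number of values collected
+    )
+    // Pn: seeded with the α closest peers we know of, sorted by XOR distance.
+    candidates := nearestPeers(key, alpha)
+
+    for cnt < quorum && len(candidates) > 0 {
+        // Query up to α of the closest not-yet-queried candidates.
+        for _, p := range popClosest(&candidates, alpha) {
+            queried[p] = true
+            value, closer, err := sendGetValue(p, key) // GET_VALUE(K)
+            switch {
+            case err != nil:
+                // Errors and timeouts are discarded.
+            case value == nil:
+                // No value: the peer returned the closest nodes it knows.
+                candidates = addUnqueried(candidates, closer, queried)
+            case best == nil || newValueWins(key, value, best):
+                cnt++
+                outdated = append(outdated, bestPeers...)
+                best, bestPeers = value, []peer.ID{p}
+            default:
+                cnt++
+                outdated = append(outdated, p)
+            }
+        }
+    }
+    // Entry correction: peers that returned a worse record get the best one.
+    for _, p := range outdated {
+        sendPutValue(p, key, best) // PUT_VALUE(K, best)
+    }
+    return best
+}
+```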
+ +## Entry validation + +When constructing a DHT node, it is possible to supply a record `Validator` object conforming to this interface: + +``` +// Validator is an interface that should be implemented by record validators. +type Validator interface { + + // Validate validates the given record, returning an error if it's + // invalid (e.g., expired, signed by the wrong key, etc.). + Validate(key string, value []byte) error + + // Select selects the best record from the set of records (e.g., the + // newest). + // + // Decisions made by select should be stable. + Select(key string, values [][]byte) (int, error) +} +``` + +`Validate()` is a pure function that reports the validity of a record. It may validate a cryptographic signature, or else. It is called on two occasions: + +1. To validate incoming values in response to `GET_VALUE` calls. +2. To validate outgoing values before storing them in the network via `PUT_VALUE` calls. + +Similarly, `Select()` is a pure function that returns the best record out of 2 or more candidates. It may use a sequence number, a timestamp, or other heuristic to make the decision. + +## Public key records + +Apart from storing arbitrary values, the libp2p Kad DHT stores node public keys in records under the `/pk` namespace. That is, the entry `/pk/` will store the public key of peer `peerID`. + +DHT implementations may optimise public key lookups by providing a `GetPublicKey(peer.ID) (ci.PubKey)` method, that, for example, first checks if the key exists in the local peerstore. + +The lookup for public key entries is identical to a standard entry lookup, except that a custom `Validator` strategy is applied. It checks that equality `SHA256(value) == peerID` stands when: + +1. Receiving a response from a `GET_VALUE` lookup. +2. Storing a public key in the DHT via `PUT_VALUE`. + +The record is rejected if the validation fails. + +## Provider records + +Nodes must keep track of which nodes advertise that they provide a given key (CID). These provider advertisements should expire, by default, after 24 hours. These records are managed through the `ADD_PROVIDER` and `GET_PROVIDERS` messages. + +_WIP (jhiesey): explain what actually happens when `Provide()` is called._ + +For performance reasons, a node may prune expired advertisements only periodically, e.g. every hour. + +## Node lookups + +_WIP (raulk)._ + +## Bootstrap process +The bootstrap process is responsible for keeping the routing table filled and healthy throughout time. It runs once on startup, then periodically with a configurable frequency (default: 5 minutes). + +On every run, we generate a random node ID and we look it up via the process defined in *Node lookups*. Peers encountered throughout the search are inserted in the routing table, as per usual business. + +This process is repeated as many times per run as configuration parameter `QueryCount` (default: 1). Every repetition is subject to a `QueryTimeout` (default: 10 seconds), which upon firing, aborts the run. + +## RPC messages + +_WIP (jheisey): consider just dumping a nicely formatted and simplified protobuf._ + +See [protobuf definition](https://github.com/libp2p/go-libp2p-kad-dht/blob/master/pb/dht.proto) + +On any error, the entire stream is reset. This is probably not the behavior we want. + +* `FIND_NODE(key bytes) -> (nodes PeerInfo[])` +Finds the `ncp` (default:6) nodes closest to `key` from the routing table and returns an array of `PeerInfo`s. If a node with id equal to `key` is found, returns only the `PeerInfo` for that node. 
+* `GET_VALUE(key bytes) -> (record Record, closerPeers PeerInfo[])` +If `key` is a public key (begins with `/pk/`) and the key is known, returns a `Record` containing that key. Otherwise, returns the `Record` for the given key (if in the datastore) and an array of `PeerInfo`s for closer peers. +* `PUT_VALUE(key bytes, value Record) -> ()` +Validates `value` and, if it is valid, stores it in the datastore. +* `GET_PROVIDERS(key bytes) -> (providerPeers PeerInfo[], closerPeers PeerInfo[])` +Verifies `key` is a valid CID. Returns `providerPeers` if in the providers cache, and an array of closer peers. +* `ADD_PROVIDER(key, providerPeers PeerInfo[]) -> ()` +Verifies `key` is a valid CID. For each provider `PeerInfo` that matches the sender's id and contains one or more multiaddrs, that provider info is added to the peerbook and the peer is added as a provider for the CID. +* `PING() -> ()` Tests connectivity to destination node. Currently never sent. + +# Appendix A: differences in implementations + +The `addProvider` handler behaves differently across implementations: + * in js-libp2p-kad-dht, the sender is added as a provider unconditionally. + * in go-libp2p-kad-dht, it is added once per instance of that peer in the `providerPeers` array. + +--- + +# References + +[0]: Maymounkov, P., & Mazières, D. (2002). Kademlia: A Peer-to-Peer Information System Based on the XOR Metric. In P. Druschel, F. Kaashoek, & A. Rowstron (Eds.), Peer-to-Peer Systems (pp. 53–65). Berlin, Heidelberg: Springer Berlin Heidelberg. https://doi.org/10.1007/3-540-45748-8_5 + +[1]: Baumgart, I., & Mies, S. (2014). S / Kademlia : A practicable approach towards secure key-based routing S / Kademlia : A Practicable Approach Towards Secure Key-Based Routing, (June). https://doi.org/10.1109/ICPADS.2007.4447808 + +[2]: Freedman, M. J., & Mazières, D. (2003). Sloppy Hashing and Self-Organizing Clusters. In IPTPS. Springer Berlin / Heidelberg. Retrieved from www.coralcdn.org/docs/coral-iptps03.ps + +[3]: [bep_0005.rst_post](http://bittorrent.org/beps/bep_0005.html) + +[4]: [GitHub - ipld/cid: Self-describing content-addressed identifiers for distributed systems](https://github.com/ipld/cid) + +[5]: [GitHub - multiformats/multihash: Self describing hashes - for future proofing](https://github.com/multiformats/multihash) From ee9734bdfa963709ab7346d5015d59d8d9ed65f4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Kripalani?= Date: Thu, 8 Nov 2018 10:18:45 +0000 Subject: [PATCH 02/39] format kad-dht spec. --- kad-dht/README.md | 160 ++++++++++++++++++++++++++++++++-------------- 1 file changed, 111 insertions(+), 49 deletions(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index c4c899f18..baf7ae57c 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -1,13 +1,19 @@ # libp2p Kademlia DHT specification -The Kademlia Distributed Hash Table (DHT) subsystem in libp2p is a DHT implementation largely based on the Kademlia [0] whitepaper, augmented with notions from S/Kademlia [1], Coral [2] and mainlineDHT \[3\]. -This specification assumes the reader has prior knowledge of those systems. So rather than explaining DHT mechanics from scratch, we focus on differential areas: +The Kademlia Distributed Hash Table (DHT) subsystem in libp2p is a DHT +implementation largely based on the Kademlia [0] whitepaper, augmented with +notions from S/Kademlia [1], Coral [2] and mainlineDHT \[3\]. + +This specification assumes the reader has prior knowledge of those systems. 
So +rather than explaining DHT mechanics from scratch, we focus on differential +areas: 1. Specialisations and peculiarities of the libp2p implementation. 2. Actual wire messages. 3. Other algorithmic or non-standard behaviours worth pointing out. -For everything else that isn't explicitly stated herein, it is safe to assume behaviour similar to Kademlia-based libraries. +For everything else that isn't explicitly stated herein, it is safe to assume +behaviour similar to Kademlia-based libraries. Code snippets use a Go-like syntax. @@ -22,23 +28,36 @@ Code snippets use a Go-like syntax. ## Distance function (dXOR) -The libp2p Kad DHT uses the **XOR distance metric** as defined in the original Kademlia paper [0]. Peer IDs are normalised through the SHA256 hash function. +The libp2p Kad DHT uses the **XOR distance metric** as defined in the original +Kademlia paper [0]. Peer IDs are normalised through the SHA256 hash function. -For recap, `dXOR(sha256(id1), sha256(id2))` is the number of common leftmost bits between SHA256 of each peer IDs. The `dXOR` between us and a peer X designates the bucket index that peer X will take up in the Kademlia routing table. +For recap, `dXOR(sha256(id1), sha256(id2))` is the number of common leftmost +bits between SHA256 of each peer IDs. The `dXOR` between us and a peer X +designates the bucket index that peer X will take up in the Kademlia routing +table. ## Kademlia routing table -The data structure backing this system is a k-bucket routing table, closely following the design outlined in the Kademlia paper [0]. The default value for `k` is 20, and the maximum bucket count matches the size of the SHA256 function, i.e. 256 buckets. +The data structure backing this system is a k-bucket routing table, closely +following the design outlined in the Kademlia paper [0]. The default value for +`k` is 20, and the maximum bucket count matches the size of the SHA256 function, +i.e. 256 buckets. -The routing table is unfolded lazily, starting with a single bucket a position 0 (representing the most distant peers), and splitting it subsequently as closer peers are found, and the capacity of the nearmost bucket is exceeded. +The routing table is unfolded lazily, starting with a single bucket a position 0 +(representing the most distant peers), and splitting it subsequently as closer +peers are found, and the capacity of the nearmost bucket is exceeded. ## Alpha concurrency factor (α) -The concurrency of node and value lookups are limited by parameter `α`, with a default value of 3. This implies that each lookup process can perform no more than 3 inflight requests, at any given time. +The concurrency of node and value lookups are limited by parameter `α`, with a +default value of 3. This implies that each lookup process can perform no more +than 3 inflight requests, at any given time. ## Record keys -Records in the DHT are keyed by CID [4]. There are intentions to move to multihash [5] keys in the near future, as certain CID components like the multicodec are redundant. +Records in the DHT are keyed by CID [4]. There are intentions to move to +multihash [5] keys in the near future, as certain CID components like the +multicodec are redundant. ## Interfaces @@ -87,11 +106,17 @@ type ValueStore interface { ## Value lookups -When looking up an entry in the DHT, the implementor should collect at least `Q` (quorum) responses from distinct nodes to check for consistency before returning an answer. 
+When looking up an entry in the DHT, the implementor should collect at least `Q` +(quorum) responses from distinct nodes to check for consistency before returning +an answer. -Should the responses be different, the `Validator.Select()` function is used to resolve the conflict and select the _best_ result. +Should the responses be different, the `Validator.Select()` function is used to +resolve the conflict and select the _best_ result. -**Entry correction.** Nodes that returned _worse_ records are updated via a direct `PUT_VALUE` RPC call when the lookup completes. Thus the DHT network eventually converges to the best value for each record, as a result of nodes collaborating with one another. +**Entry correction.** Nodes that returned _worse_ records are updated via a +direct `PUT_VALUE` RPC call when the lookup completes. Thus the DHT network +eventually converges to the best value for each record, as a result of nodes +collaborating with one another. ### Algorithm @@ -116,21 +141,27 @@ We keep track of: 2. Pick as many peers from the candidate peers (`Pn`) as the `α` concurrency factor allows. Send each a `GET_VALUE(K)` request, and mark it as _queried_ in `Pq`. 3. Upon a response: 1. If successful, and we receive a value: - 1. If this is the first value we've seen, we store it in `best`, along with the peer who sent it in `Pb`. - 2. Otherwise, we resolve the conflict by calling `Validator.Select(best, new)`: - 1. If the new value wins, store it in `best`, and mark all formerly “best" peers (`Pb`) as _outdated peers_ (`Po`). The current peer becomes the new best peer (`Pb`). - 2. If the new value loses, we add the current peer to `Po`. - 2. If successful without a value, the response will contain the closest nodes the peer knows to the key `K`. Add them to the candidate list `Pn`, except for those that have already been queried. + 1. If this is the first value we've seen, we store it in `best`, along + with the peer who sent it in `Pb`. + 2. Otherwise, we resolve the conflict by calling `Validator.Select(best, + new)`: + 1. If the new value wins, store it in `best`, and mark all formerly + “best" peers (`Pb`) as _outdated peers_ (`Po`). The current peer + becomes the new best peer (`Pb`). + 2. If the new value loses, we add the current peer to `Po`. + 2. If successful without a value, the response will contain the closest + nodes the peer knows to the key `K`. Add them to the candidate list `Pn`, + except for those that have already been queried. 3. If an error or timeout occurs, discard it. 4. Go to 1. ## Entry validation -When constructing a DHT node, it is possible to supply a record `Validator` object conforming to this interface: +When constructing a DHT node, it is possible to supply a record `Validator` +object conforming to this interface: -``` -// Validator is an interface that should be implemented by record validators. -type Validator interface { +``` // Validator is an interface that should be implemented by record +validators. type Validator interface { // Validate validates the given record, returning an error if it's // invalid (e.g., expired, signed by the wrong key, etc.). @@ -144,20 +175,30 @@ type Validator interface { } ``` -`Validate()` is a pure function that reports the validity of a record. It may validate a cryptographic signature, or else. It is called on two occasions: +`Validate()` is a pure function that reports the validity of a record. It may +validate a cryptographic signature, or else. It is called on two occasions: 1. 
To validate incoming values in response to `GET_VALUE` calls. -2. To validate outgoing values before storing them in the network via `PUT_VALUE` calls. +2. To validate outgoing values before storing them in the network via + `PUT_VALUE` calls. -Similarly, `Select()` is a pure function that returns the best record out of 2 or more candidates. It may use a sequence number, a timestamp, or other heuristic to make the decision. +Similarly, `Select()` is a pure function that returns the best record out of 2 +or more candidates. It may use a sequence number, a timestamp, or other +heuristic to make the decision. ## Public key records -Apart from storing arbitrary values, the libp2p Kad DHT stores node public keys in records under the `/pk` namespace. That is, the entry `/pk/` will store the public key of peer `peerID`. +Apart from storing arbitrary values, the libp2p Kad DHT stores node public keys +in records under the `/pk` namespace. That is, the entry `/pk/` will +store the public key of peer `peerID`. -DHT implementations may optimise public key lookups by providing a `GetPublicKey(peer.ID) (ci.PubKey)` method, that, for example, first checks if the key exists in the local peerstore. +DHT implementations may optimise public key lookups by providing a +`GetPublicKey(peer.ID) (ci.PubKey)` method, that, for example, first checks if +the key exists in the local peerstore. -The lookup for public key entries is identical to a standard entry lookup, except that a custom `Validator` strategy is applied. It checks that equality `SHA256(value) == peerID` stands when: +The lookup for public key entries is identical to a standard entry lookup, +except that a custom `Validator` strategy is applied. It checks that equality +`SHA256(value) == peerID` stands when: 1. Receiving a response from a `GET_VALUE` lookup. 2. Storing a public key in the DHT via `PUT_VALUE`. @@ -166,48 +207,69 @@ The record is rejected if the validation fails. ## Provider records -Nodes must keep track of which nodes advertise that they provide a given key (CID). These provider advertisements should expire, by default, after 24 hours. These records are managed through the `ADD_PROVIDER` and `GET_PROVIDERS` messages. +Nodes must keep track of which nodes advertise that they provide a given key +(CID). These provider advertisements should expire, by default, after 24 hours. +These records are managed through the `ADD_PROVIDER` and `GET_PROVIDERS` +messages. _WIP (jhiesey): explain what actually happens when `Provide()` is called._ -For performance reasons, a node may prune expired advertisements only periodically, e.g. every hour. +For performance reasons, a node may prune expired advertisements only +periodically, e.g. every hour. ## Node lookups _WIP (raulk)._ ## Bootstrap process -The bootstrap process is responsible for keeping the routing table filled and healthy throughout time. It runs once on startup, then periodically with a configurable frequency (default: 5 minutes). +The bootstrap process is responsible for keeping the routing table filled and +healthy throughout time. It runs once on startup, then periodically with a +configurable frequency (default: 5 minutes). -On every run, we generate a random node ID and we look it up via the process defined in *Node lookups*. Peers encountered throughout the search are inserted in the routing table, as per usual business. +On every run, we generate a random node ID and we look it up via the process +defined in *Node lookups*. 
Peers encountered throughout the search are inserted +in the routing table, as per usual business. -This process is repeated as many times per run as configuration parameter `QueryCount` (default: 1). Every repetition is subject to a `QueryTimeout` (default: 10 seconds), which upon firing, aborts the run. +This process is repeated as many times per run as configuration parameter +`QueryCount` (default: 1). Every repetition is subject to a `QueryTimeout` +(default: 10 seconds), which upon firing, aborts the run. ## RPC messages -_WIP (jheisey): consider just dumping a nicely formatted and simplified protobuf._ - -See [protobuf definition](https://github.com/libp2p/go-libp2p-kad-dht/blob/master/pb/dht.proto) - -On any error, the entire stream is reset. This is probably not the behavior we want. - -* `FIND_NODE(key bytes) -> (nodes PeerInfo[])` -Finds the `ncp` (default:6) nodes closest to `key` from the routing table and returns an array of `PeerInfo`s. If a node with id equal to `key` is found, returns only the `PeerInfo` for that node. -* `GET_VALUE(key bytes) -> (record Record, closerPeers PeerInfo[])` -If `key` is a public key (begins with `/pk/`) and the key is known, returns a `Record` containing that key. Otherwise, returns the `Record` for the given key (if in the datastore) and an array of `PeerInfo`s for closer peers. -* `PUT_VALUE(key bytes, value Record) -> ()` -Validates `value` and, if it is valid, stores it in the datastore. -* `GET_PROVIDERS(key bytes) -> (providerPeers PeerInfo[], closerPeers PeerInfo[])` -Verifies `key` is a valid CID. Returns `providerPeers` if in the providers cache, and an array of closer peers. -* `ADD_PROVIDER(key, providerPeers PeerInfo[]) -> ()` -Verifies `key` is a valid CID. For each provider `PeerInfo` that matches the sender's id and contains one or more multiaddrs, that provider info is added to the peerbook and the peer is added as a provider for the CID. +_WIP (jheisey): consider just dumping a nicely formatted and simplified +protobuf._ + +See [protobuf +definition](https://github.com/libp2p/go-libp2p-kad-dht/blob/master/pb/dht.proto) + +On any error, the entire stream is reset. This is probably not the behavior we +want. + +* `FIND_NODE(key bytes) -> (nodes PeerInfo[])` Finds the `ncp` (default:6) nodes +closest to `key` from the routing table and returns an array of `PeerInfo`s. If +a node with id equal to `key` is found, returns only the `PeerInfo` for that +node. +* `GET_VALUE(key bytes) -> (record Record, closerPeers PeerInfo[])` If `key` is +a public key (begins with `/pk/`) and the key is known, returns a `Record` +containing that key. Otherwise, returns the `Record` for the given key (if in +the datastore) and an array of `PeerInfo`s for closer peers. +* `PUT_VALUE(key bytes, value Record) -> ()` Validates `value` and, if it is +valid, stores it in the datastore. +* `GET_PROVIDERS(key bytes) -> (providerPeers PeerInfo[], closerPeers +PeerInfo[])` Verifies `key` is a valid CID. Returns `providerPeers` if in the +providers cache, and an array of closer peers. +* `ADD_PROVIDER(key, providerPeers PeerInfo[]) -> ()` Verifies `key` is a valid +CID. For each provider `PeerInfo` that matches the sender's id and contains one +or more multiaddrs, that provider info is added to the peerbook and the peer is +added as a provider for the CID. * `PING() -> ()` Tests connectivity to destination node. Currently never sent. 
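+
+To make the bootstrap process described above concrete, here is a rough
+sketch in the document's Go-like pseudocode. The configuration shape and
+the helpers (`randomNodeID`, `lookupNode`) are assumptions made for the
+example, not a prescribed API:
+
+```go
+// bootstrapRun performs a single bootstrap run. It is triggered once on
+// startup and then periodically (default: every 5 minutes).
+func bootstrapRun(cfg BootstrapConfig) error {
+    for i := 0; i < cfg.QueryCount; i++ { // default: 1
+        // QueryTimeout defaults to 10 seconds.
+        ctx, cancel := context.WithTimeout(context.Background(), cfg.QueryTimeout)
+        // Looking up a random node ID inserts every peer encountered during
+        // the search into the routing table.
+        err := lookupNode(ctx, randomNodeID())
+        cancel()
+        if err != nil {
+            return err // a firing QueryTimeout aborts the run
+        }
+    }
+    return nil
+}
+```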
# Appendix A: differences in implementations The `addProvider` handler behaves differently across implementations: * in js-libp2p-kad-dht, the sender is added as a provider unconditionally. - * in go-libp2p-kad-dht, it is added once per instance of that peer in the `providerPeers` array. + * in go-libp2p-kad-dht, it is added once per instance of that peer in the + `providerPeers` array. --- From 8b89dc2521b48bf6edab7c93e8129156a7f5f02c Mon Sep 17 00:00:00 2001 From: John Hiesey Date: Thu, 20 Dec 2018 21:42:07 -0800 Subject: [PATCH 03/39] Update Kademlia DHT spec --- kad-dht/README.md | 157 ++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 132 insertions(+), 25 deletions(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index baf7ae57c..51e3be271 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -55,9 +55,20 @@ than 3 inflight requests, at any given time. ## Record keys -Records in the DHT are keyed by CID [4]. There are intentions to move to -multihash [5] keys in the near future, as certain CID components like the -multicodec are redundant. +Records in the DHT are keyed by CID [4], roughly speaking. There are intentions +to move to multihash [5] keys in the future, as certain CID components like the +multicodec are redundant. This will be an incompatible change. + +The format of `key` varies depending on message type; however, in all cases +`dXOR(sha256(key1), sha256(key2))` see [Distance function](#distance-function-dxor) +is used as the distance between two keys. + +* For `GET_VALUE` and `PUT_VALUE`, `key` is an unstructured array of bytes, except +if it is being used to look up a public key for a `PeerId`, in which case it is +the ASCII string '/pk/' concatenated with the binary `PeerId`. +* For `ADD_PROVIDER` and `GET_PROVIDERS`, `key` is interpreted and validated as +a CID. +* For `FIND_NODE`, `key` is a binary `PeerId` ## Interfaces @@ -212,7 +223,17 @@ Nodes must keep track of which nodes advertise that they provide a given key These records are managed through the `ADD_PROVIDER` and `GET_PROVIDERS` messages. -_WIP (jhiesey): explain what actually happens when `Provide()` is called._ +When `Provide(key)` is called, the DHT finds the closest peers to `key` using +the `FIND_NODE` RPC, and then sends a `ADD_PROVIDER` RPC with its own +`PeerInfo` to each of these peers. + +Each peer that receives the `ADD_PROVIDER` RPC should validate that the +received `PeerInfo` matches the sender's `peerID`, and if it does, that peer +must store a record in its datastore the received `PeerInfo` record. + +When a node receives a `GET_PROVIDERS` RPC, it must look up the requested +key in its datastore, and respond with any corresponding records in its +datastore, plus a list of closer peers in its routing table. For performance reasons, a node may prune expired advertisements only periodically, e.g. every hour. @@ -236,33 +257,119 @@ This process is repeated as many times per run as configuration parameter ## RPC messages -_WIP (jheisey): consider just dumping a nicely formatted and simplified -protobuf._ - See [protobuf definition](https://github.com/libp2p/go-libp2p-kad-dht/blob/master/pb/dht.proto) On any error, the entire stream is reset. This is probably not the behavior we want. -* `FIND_NODE(key bytes) -> (nodes PeerInfo[])` Finds the `ncp` (default:6) nodes -closest to `key` from the routing table and returns an array of `PeerInfo`s. If -a node with id equal to `key` is found, returns only the `PeerInfo` for that -node. 
-* `GET_VALUE(key bytes) -> (record Record, closerPeers PeerInfo[])` If `key` is -a public key (begins with `/pk/`) and the key is known, returns a `Record` -containing that key. Otherwise, returns the `Record` for the given key (if in -the datastore) and an array of `PeerInfo`s for closer peers. -* `PUT_VALUE(key bytes, value Record) -> ()` Validates `value` and, if it is -valid, stores it in the datastore. -* `GET_PROVIDERS(key bytes) -> (providerPeers PeerInfo[], closerPeers -PeerInfo[])` Verifies `key` is a valid CID. Returns `providerPeers` if in the -providers cache, and an array of closer peers. -* `ADD_PROVIDER(key, providerPeers PeerInfo[]) -> ()` Verifies `key` is a valid -CID. For each provider `PeerInfo` that matches the sender's id and contains one -or more multiaddrs, that provider info is added to the peerbook and the peer is -added as a provider for the CID. -* `PING() -> ()` Tests connectivity to destination node. Currently never sent. +All RPC messages conform to the following protobuf: +```protobuf +// Record represents a dht record that contains a value +// for a key value pair +message Record { + // The key that references this record + bytes key = 1; + + // The actual value this record is storing + bytes value = 2; + + // Note: These fields were removed from the Record message + // hash of the authors public key + //optional string author = 3; + // A PKI signature for the key+value+author + //optional bytes signature = 4; + + // Time the record was received, set by receiver + string timeReceived = 5; +}; + +message Message { + enum MessageType { + PUT_VALUE = 0; + GET_VALUE = 1; + ADD_PROVIDER = 2; + GET_PROVIDERS = 3; + FIND_NODE = 4; + PING = 5; + } + + enum ConnectionType { + // sender does not have a connection to peer, and no extra information (default) + NOT_CONNECTED = 0; + + // sender has a live connection to peer + CONNECTED = 1; + + // sender recently connected to peer + CAN_CONNECT = 2; + + // sender recently tried to connect to peer repeatedly but failed to connect + // ("try" here is loose, but this should signal "made strong effort, failed") + CANNOT_CONNECT = 3; + } + + message Peer { + // ID of a given peer. + bytes id = 1; + + // multiaddrs for a given peer + repeated bytes addrs = 2; + + // used to signal the sender's connection capabilities to the peer + ConnectionType connection = 3; + } + + // defines what type of message it is. + MessageType type = 1; + + // defines what coral cluster level this query/response belongs to. + // in case we want to implement coral's cluster rings in the future. + int32 clusterLevelRaw = 10; // NOT USED + + // Used to specify the key associated with this message. + // PUT_VALUE, GET_VALUE, ADD_PROVIDER, GET_PROVIDERS + bytes key = 2; + + // Used to return a value + // PUT_VALUE, GET_VALUE + Record record = 3; + + // Used to return peers closer to a key in a query + // GET_VALUE, GET_PROVIDERS, FIND_NODE + repeated Peer closerPeers = 8; + + // Used to return Providers + // GET_VALUE, ADD_PROVIDER, GET_PROVIDERS + repeated Peer providerPeers = 9; +} +``` + +Any time a relevant `Peer` record is encountered, the associated multiaddrs +are stored in the node's peerbook. + +These are the requirements for each `MessageType`: +* `FIND_NODE`: `key` must be set in the request. `closerPeers` is set in the +response; for an exact match exactly one `Peer` is returned; otherwise `ncp` +(default: 6) closest `Peer`s are returned. + +* `GET_VALUE`: `key` must be set in the request. 
If `key` is a public key +(begins with `/pk/`) and the key is known, the response has `record` set to +that key. Otherwise, `record` is set to the value for the given key (if found +in the datastore) and `closerPeers` is set to indicate closer peers. + +* `PUT_VALUE`: `key` and `record` must be set in the request. The target +node validates `record`, and if it is valid, it stores it in the datastore. + +* `GET_PROVIDERS`: `key` must be set in the request. The target node returns +the closest known `providerPeers` (if any) and the closest known `closerPeers`. + +* `ADD_PROVIDER`: `key` and `providerPeers` must be set in the request. The +target node verifies `key` is a valid CID, all `providerPeers` that +match the RPC sender's PeerID are recorded as providers. + +* `PING`: Target node responds with `PING`. Nodes should respond to this +message but it is currently never sent. # Appendix A: differences in implementations From cb971aa17b2ee6bb7f20df6daed2e7cd4c7e861d Mon Sep 17 00:00:00 2001 From: Steven Allen Date: Wed, 7 Aug 2019 18:06:28 -0700 Subject: [PATCH 04/39] dht: fix distance function --- kad-dht/README.md | 23 +++++++---------------- 1 file changed, 7 insertions(+), 16 deletions(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index 51e3be271..871161e21 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -26,16 +26,6 @@ Code snippets use a Go-like syntax. * [Raúl Kripalani](https://github.com/raulk) * [John Hiesey](https://github.com/jhiesey) -## Distance function (dXOR) - -The libp2p Kad DHT uses the **XOR distance metric** as defined in the original -Kademlia paper [0]. Peer IDs are normalised through the SHA256 hash function. - -For recap, `dXOR(sha256(id1), sha256(id2))` is the number of common leftmost -bits between SHA256 of each peer IDs. The `dXOR` between us and a peer X -designates the bucket index that peer X will take up in the Kademlia routing -table. - ## Kademlia routing table The data structure backing this system is a k-bucket routing table, closely @@ -59,13 +49,14 @@ Records in the DHT are keyed by CID [4], roughly speaking. There are intentions to move to multihash [5] keys in the future, as certain CID components like the multicodec are redundant. This will be an incompatible change. -The format of `key` varies depending on message type; however, in all cases -`dXOR(sha256(key1), sha256(key2))` see [Distance function](#distance-function-dxor) -is used as the distance between two keys. +The format of `key` varies depending on message type; however, in all cases, the +distance between the two keys is `XOR(sha256(key1), sha256(key2))`. -* For `GET_VALUE` and `PUT_VALUE`, `key` is an unstructured array of bytes, except -if it is being used to look up a public key for a `PeerId`, in which case it is -the ASCII string '/pk/' concatenated with the binary `PeerId`. +* For `GET_VALUE` and `PUT_VALUE`, `key` is an unstructured array of bytes. + However, all nodes in the DHT will have rules to _validate_ whether or not a + value is valid for an associated key. For example, the default validator + accepts keys of the form `/pk/BINARY_PEER_ID` mapped the serialized public key + associated with the peer ID in question. * For `ADD_PROVIDER` and `GET_PROVIDERS`, `key` is interpreted and validated as a CID. 
* For `FIND_NODE`, `key` is a binary `PeerId` From 4d4395855544720d4cc96822cbacc978fad3cd7e Mon Sep 17 00:00:00 2001 From: Max Inden Date: Fri, 7 May 2021 16:10:42 +0200 Subject: [PATCH 05/39] kad-dht: Add document header --- kad-dht/README.md | 28 +++++++++++++++++++--------- 1 file changed, 19 insertions(+), 9 deletions(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index 871161e21..460626e89 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -1,5 +1,24 @@ # libp2p Kademlia DHT specification +| Lifecycle Stage | Maturity | Status | Latest Revision | +|-----------------|----------------|--------|-----------------| +| 3A | Recommendation | Active | r0, 2021-05-07 | + +Authors: [@raulk], [@jhiesey], [@mxinden] +Interest Group: + +[@raulk]: https://github.com/raulk +[@jhiesey]: https://github.com/jhiesey +[@mxinden]: https://github.com/mxinden + +See the [lifecycle document][lifecycle-spec] for context about maturity level and spec status. + +[lifecycle-spec]: https://github.com/libp2p/specs/blob/master/00-framework-01-spec-lifecycle.md + +--- + +## Overview + The Kademlia Distributed Hash Table (DHT) subsystem in libp2p is a DHT implementation largely based on the Kademlia [0] whitepaper, augmented with notions from S/Kademlia [1], Coral [2] and mainlineDHT \[3\]. @@ -17,15 +36,6 @@ behaviour similar to Kademlia-based libraries. Code snippets use a Go-like syntax. -## Authors - -* Protocol Labs. - -## Editors - -* [Raúl Kripalani](https://github.com/raulk) -* [John Hiesey](https://github.com/jhiesey) - ## Kademlia routing table The data structure backing this system is a k-bucket routing table, closely From 55d32ff0cccd384c3e1805f80838a0866a0684b5 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Fri, 7 May 2021 17:03:06 +0200 Subject: [PATCH 06/39] kad-dht: Wrap lines at 80 and fix typos --- kad-dht/README.md | 36 ++++++++++++++++++++++++------------ 1 file changed, 24 insertions(+), 12 deletions(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index 460626e89..37fd41331 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -11,7 +11,8 @@ Interest Group: [@jhiesey]: https://github.com/jhiesey [@mxinden]: https://github.com/mxinden -See the [lifecycle document][lifecycle-spec] for context about maturity level and spec status. +See the [lifecycle document][lifecycle-spec] for context about maturity level +and spec status. [lifecycle-spec]: https://github.com/libp2p/specs/blob/master/00-framework-01-spec-lifecycle.md @@ -43,8 +44,8 @@ following the design outlined in the Kademlia paper [0]. The default value for `k` is 20, and the maximum bucket count matches the size of the SHA256 function, i.e. 256 buckets. -The routing table is unfolded lazily, starting with a single bucket a position 0 -(representing the most distant peers), and splitting it subsequently as closer +The routing table is unfolded lazily, starting with a single bucket at position +0 (representing the most distant peers), and splitting it subsequently as closer peers are found, and the capacity of the nearmost bucket is exceeded. ## Alpha concurrency factor (α) @@ -132,25 +133,34 @@ collaborating with one another. ### Algorithm -Let's assume we’re looking for key `K`. We first try to fetch the value from the local store. If found, and `Q == { 0, 1 }`, the search is complete. +Let's assume we’re looking for key `K`. We first try to fetch the value from the +local store. If found, and `Q == { 0, 1 }`, the search is complete. -Otherwise, the local result counts for one towards the search of `Q` values. 
We then enter an iterative network search. +Otherwise, the local result counts for one towards the search of `Q` values. We +then enter an iterative network search. We keep track of: * the number of values we've fetched (`cnt`). * the best value we've found (`best`), and which peers returned it (`Pb`) -* the set of peers we've already queried (`Pq`) and the set of next query candidates sorted by distance from `K` in ascending order (`Pn`). +* the set of peers we've already queried (`Pq`) and the set of next query + candidates sorted by distance from `K` in ascending order (`Pn`). * the set of peers with outdated values (`Po`). -**Initialization**: seed `Pn` with the `α` peers from our routing table we know are closest to `K`, based on the XOR distance function. +**Initialization**: seed `Pn` with the `α` peers from our routing table we know +are closest to `K`, based on the XOR distance function. **Then we loop:** *WIP (raulk): lookup timeout.* -1. If we have collected `Q` or more answers, we cancel outstanding requests, return `best`, and we notify the peers holding an outdated value (`Po`) of the best value we discovered, by sending `PUT_VALUE(K, best)` messages. _Return._ -2. Pick as many peers from the candidate peers (`Pn`) as the `α` concurrency factor allows. Send each a `GET_VALUE(K)` request, and mark it as _queried_ in `Pq`. +1. If we have collected `Q` or more answers, we cancel outstanding requests, + return `best`, and we notify the peers holding an outdated value (`Po`) of + the best value we discovered, by sending `PUT_VALUE(K, best)` messages. + _Return._ +2. Pick as many peers from the candidate peers (`Pn`) as the `α` concurrency + factor allows. Send each a `GET_VALUE(K)` request, and mark it as _queried_ + in `Pq`. 3. Upon a response: 1. If successful, and we receive a value: 1. If this is the first value we've seen, we store it in `best`, along @@ -172,9 +182,10 @@ We keep track of: When constructing a DHT node, it is possible to supply a record `Validator` object conforming to this interface: -``` // Validator is an interface that should be implemented by record -validators. type Validator interface { - +``` go +// Validator is an interface that should be implemented by record +// validators. +type Validator interface { // Validate validates the given record, returning an error if it's // invalid (e.g., expired, signed by the wrong key, etc.). Validate(key string, value []byte) error @@ -244,6 +255,7 @@ periodically, e.g. every hour. _WIP (raulk)._ ## Bootstrap process + The bootstrap process is responsible for keeping the routing table filled and healthy throughout time. It runs once on startup, then periodically with a configurable frequency (default: 5 minutes). From 3de65ae1050ba767ba8eb098c84415e67725a9a7 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Fri, 7 May 2021 17:28:48 +0200 Subject: [PATCH 07/39] kad-dht: Document to use one stream per request --- kad-dht/README.md | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index 37fd41331..497202013 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -270,13 +270,17 @@ This process is repeated as many times per run as configuration parameter ## RPC messages -See [protobuf -definition](https://github.com/libp2p/go-libp2p-kad-dht/blob/master/pb/dht.proto) +Remote procedure calls are performed by: -On any error, the entire stream is reset. This is probably not the behavior we -want. +1. Opening a new stream. +2. 
Sending the RPC request message. +3. Listening for the RPC response message. +4. Closing the stream. + +On any error, the stream is reset. All RPC messages conform to the following protobuf: + ```protobuf // Record represents a dht record that contains a value // for a key value pair @@ -358,10 +362,8 @@ message Message { } ``` -Any time a relevant `Peer` record is encountered, the associated multiaddrs -are stored in the node's peerbook. - These are the requirements for each `MessageType`: + * `FIND_NODE`: `key` must be set in the request. `closerPeers` is set in the response; for an exact match exactly one `Peer` is returned; otherwise `ncp` (default: 6) closest `Peer`s are returned. @@ -384,7 +386,10 @@ match the RPC sender's PeerID are recorded as providers. * `PING`: Target node responds with `PING`. Nodes should respond to this message but it is currently never sent. -# Appendix A: differences in implementations +Note: Any time a relevant `Peer` record is encountered, the associated +multiaddrs are stored in the node's peerbook. + +## Appendix A: differences in implementations The `addProvider` handler behaves differently across implementations: * in js-libp2p-kad-dht, the sender is added as a provider unconditionally. @@ -393,7 +398,7 @@ The `addProvider` handler behaves differently across implementations: --- -# References +## References [0]: Maymounkov, P., & Mazières, D. (2002). Kademlia: A Peer-to-Peer Information System Based on the XOR Metric. In P. Druschel, F. Kaashoek, & A. Rowstron (Eds.), Peer-to-Peer Systems (pp. 53–65). Berlin, Heidelberg: Springer Berlin Heidelberg. https://doi.org/10.1007/3-540-45748-8_5 From 6909f8a97685c3f60679dbe7c9e81a5ab411bcf7 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Mon, 10 May 2021 16:46:43 +0200 Subject: [PATCH 08/39] kad-dht: Reword validation description There is no need for in-process interface level consistency across libp2p Kademlia implementations. This commit reworks the entry validation specification, naming the `Validator` interface as one possible option. --- kad-dht/README.md | 27 ++++++++++++++------------- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index 497202013..4b5a696a2 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -69,7 +69,7 @@ distance between the two keys is `XOR(sha256(key1), sha256(key2))`. accepts keys of the form `/pk/BINARY_PEER_ID` mapped the serialized public key associated with the peer ID in question. * For `ADD_PROVIDER` and `GET_PROVIDERS`, `key` is interpreted and validated as -a CID. + a CID. * For `FIND_NODE`, `key` is a binary `PeerId` ## Interfaces @@ -123,8 +123,8 @@ When looking up an entry in the DHT, the implementor should collect at least `Q` (quorum) responses from distinct nodes to check for consistency before returning an answer. -Should the responses be different, the `Validator.Select()` function is used to -resolve the conflict and select the _best_ result. +Should the responses be different, the implementation should use some validation +mechanism to resolve the conflict and select the _best_ result. **Entry correction.** Nodes that returned _worse_ records are updated via a direct `PUT_VALUE` RPC call when the lookup completes. Thus the DHT network @@ -165,8 +165,8 @@ are closest to `K`, based on the XOR distance function. 1. If successful, and we receive a value: 1. If this is the first value we've seen, we store it in `best`, along with the peer who sent it in `Pb`. - 2. 
Otherwise, we resolve the conflict by calling `Validator.Select(best, - new)`: + 2. Otherwise, we resolve the conflict by e.g. calling + `Validator.Select(best, new)`: 1. If the new value wins, store it in `best`, and mark all formerly “best" peers (`Pb`) as _outdated peers_ (`Po`). The current peer becomes the new best peer (`Pb`). @@ -179,8 +179,9 @@ are closest to `K`, based on the XOR distance function. ## Entry validation -When constructing a DHT node, it is possible to supply a record `Validator` -object conforming to this interface: +Implementations should validate DHT entries during retrieval and before storage +e.g. by allowing to supply a record `Validator` when constructing a DHT node. +Below is a sample interface of such a `Validator`: ``` go // Validator is an interface that should be implemented by record @@ -201,13 +202,13 @@ type Validator interface { `Validate()` is a pure function that reports the validity of a record. It may validate a cryptographic signature, or else. It is called on two occasions: -1. To validate incoming values in response to `GET_VALUE` calls. -2. To validate outgoing values before storing them in the network via - `PUT_VALUE` calls. +1. To validate values retrieved in a `GET_VALUE` query. +2. To validate values received in a `PUT_VALUE` query before storing them in the + local data store. Similarly, `Select()` is a pure function that returns the best record out of 2 or more candidates. It may use a sequence number, a timestamp, or other -heuristic to make the decision. +heuristic of the value to make the decision. ## Public key records @@ -220,8 +221,8 @@ DHT implementations may optimise public key lookups by providing a the key exists in the local peerstore. The lookup for public key entries is identical to a standard entry lookup, -except that a custom `Validator` strategy is applied. It checks that equality -`SHA256(value) == peerID` stands when: +except that a custom entry validation strategy is applied. It checks that +equality `SHA256(value) == peerID` stands when: 1. Receiving a response from a `GET_VALUE` lookup. 2. Storing a public key in the DHT via `PUT_VALUE`. From b3693d0a6894895c73a0e4fdff78237f360562a9 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Wed, 12 May 2021 16:04:17 +0200 Subject: [PATCH 09/39] kad-dht: Document FIND_NODE --- kad-dht/README.md | 65 +++++++++++++++++++++++++++++++++++++---------- 1 file changed, 52 insertions(+), 13 deletions(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index 4b5a696a2..93a852ff4 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -37,18 +37,23 @@ behaviour similar to Kademlia-based libraries. Code snippets use a Go-like syntax. +## Replication parameter (`k`) + +The amount of replication is governed by the replication parameter `k`. The +default value for `k` is 20. + ## Kademlia routing table The data structure backing this system is a k-bucket routing table, closely -following the design outlined in the Kademlia paper [0]. The default value for -`k` is 20, and the maximum bucket count matches the size of the SHA256 function, -i.e. 256 buckets. +following the design outlined in the Kademlia paper [0]. The bucket size is +equal to the replication paramter `k`, and the maximum bucket count matches the +size of the SHA256 function, i.e. 256 buckets. The routing table is unfolded lazily, starting with a single bucket at position 0 (representing the most distant peers), and splitting it subsequently as closer peers are found, and the capacity of the nearmost bucket is exceeded. 
-## Alpha concurrency factor (α) +## Alpha concurrency parameter (`α`) The concurrency of node and value lookups are limited by parameter `α`, with a default value of 3. This implies that each lookup process can perform no more @@ -133,7 +138,11 @@ collaborating with one another. ### Algorithm -Let's assume we’re looking for key `K`. We first try to fetch the value from the +The below is one possible algorithm to lookup a value on the DHT. +Implementations may diverge from this base algorithm as long as they continue to +adhere to the wire format. + +Let's assume we’re looking for key `Key`. We first try to fetch the value from the local store. If found, and `Q == { 0, 1 }`, the search is complete. Otherwise, the local result counts for one towards the search of `Q` values. We @@ -144,22 +153,20 @@ We keep track of: * the number of values we've fetched (`cnt`). * the best value we've found (`best`), and which peers returned it (`Pb`) * the set of peers we've already queried (`Pq`) and the set of next query - candidates sorted by distance from `K` in ascending order (`Pn`). + candidates sorted by distance from `Key` in ascending order (`Pn`). * the set of peers with outdated values (`Po`). **Initialization**: seed `Pn` with the `α` peers from our routing table we know -are closest to `K`, based on the XOR distance function. +are closest to `Key`, based on the XOR distance function. **Then we loop:** -*WIP (raulk): lookup timeout.* - 1. If we have collected `Q` or more answers, we cancel outstanding requests, return `best`, and we notify the peers holding an outdated value (`Po`) of - the best value we discovered, by sending `PUT_VALUE(K, best)` messages. + the best value we discovered, by sending `PUT_VALUE(Key, best)` messages. _Return._ 2. Pick as many peers from the candidate peers (`Pn`) as the `α` concurrency - factor allows. Send each a `GET_VALUE(K)` request, and mark it as _queried_ + factor allows. Send each a `GET_VALUE(Key)` request, and mark it as _queried_ in `Pq`. 3. Upon a response: 1. If successful, and we receive a value: @@ -172,7 +179,7 @@ are closest to `K`, based on the XOR distance function. becomes the new best peer (`Pb`). 2. If the new value loses, we add the current peer to `Po`. 2. If successful without a value, the response will contain the closest - nodes the peer knows to the key `K`. Add them to the candidate list `Pn`, + nodes the peer knows to the key `Key`. Add them to the candidate list `Pn`, except for those that have already been queried. 3. If an error or timeout occurs, discard it. 4. Go to 1. @@ -253,7 +260,39 @@ periodically, e.g. every hour. ## Node lookups -_WIP (raulk)._ +The below is one possible algorithm to lookup a node closest to a given key on +the DHT. Implementations may diverge from this base algorithm as long as they +continue to adhere to the wire format. + +Let's assume we’re looking for nodes closest to key `Key`. We then enter an +iterative network search. + +We keep track of: + +* the set of peers we've already queried (`Pq`) and the set of next query + candidates sorted by distance from `Key` in ascending order (`Pn`). + +**Initialization**: seed `Pn` with the `α` peers from our routing table we know +are closest to `Key`, based on the XOR distance function. + +**Then we loop:** + +1. > The lookup terminates when the initiator has queried and gotten responses + from the k (see [#replication-parameter-k]) closest nodes it has seen. + + (See Kademlia paper [0].) 
+ + The lookup might terminate early in case the local node queried all known + nodes, with the number of nodes being smaller than `k`. +2. Pick as many peers from the candidate peers (`Pn`) as the `α` concurrency + factor allows. Send each a `FIND_NODE(Key)` request, and mark it as _queried_ + in `Pq`. +3. Upon a response: + 2. If successful the response will contain the `k` closest nodes the peer + knows to the key `Key`. Add them to the candidate list `Pn`, except for + those that have already been queried. + 3. If an error or timeout occurs, discard it. +4. Go to 1. ## Bootstrap process From f05ee26ed4636a23f00ed86eeddf963ebb4de689 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Wed, 12 May 2021 17:30:52 +0200 Subject: [PATCH 10/39] kad-dht: Restructure by DHT operations --- kad-dht/README.md | 248 ++++++++++++++++++++-------------------------- 1 file changed, 109 insertions(+), 139 deletions(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index 93a852ff4..7576992fa 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -37,12 +37,14 @@ behaviour similar to Kademlia-based libraries. Code snippets use a Go-like syntax. -## Replication parameter (`k`) +## Definitions + +### Replication parameter (`k`) The amount of replication is governed by the replication parameter `k`. The default value for `k` is 20. -## Kademlia routing table +### Kademlia routing table The data structure backing this system is a k-bucket routing table, closely following the design outlined in the Kademlia paper [0]. The bucket size is @@ -53,78 +55,96 @@ The routing table is unfolded lazily, starting with a single bucket at position 0 (representing the most distant peers), and splitting it subsequently as closer peers are found, and the capacity of the nearmost bucket is exceeded. -## Alpha concurrency parameter (`α`) +### Alpha concurrency parameter (`α`) The concurrency of node and value lookups are limited by parameter `α`, with a default value of 3. This implies that each lookup process can perform no more than 3 inflight requests, at any given time. -## Record keys +### Distance -Records in the DHT are keyed by CID [4], roughly speaking. There are intentions -to move to multihash [5] keys in the future, as certain CID components like the -multicodec are redundant. This will be an incompatible change. +In all cases, the distance between two keys is `XOR(sha256(key1), +sha256(key2))`. -The format of `key` varies depending on message type; however, in all cases, the -distance between the two keys is `XOR(sha256(key1), sha256(key2))`. +## DHT operations -* For `GET_VALUE` and `PUT_VALUE`, `key` is an unstructured array of bytes. - However, all nodes in the DHT will have rules to _validate_ whether or not a - value is valid for an associated key. For example, the default validator - accepts keys of the form `/pk/BINARY_PEER_ID` mapped the serialized public key - associated with the peer ID in question. -* For `ADD_PROVIDER` and `GET_PROVIDERS`, `key` is interpreted and validated as - a CID. -* For `FIND_NODE`, `key` is a binary `PeerId` +The libp2p Kademlia DHT offers the following types of routing operations: -## Interfaces +- **Peer routing** - _Finding_ the closest nodes to a given key (`FIND_NODE`). +- **Value routing** - _Putting_ a value to the nodes closest to the value's key + (`PUT_VALUE`) and _getting_ a value by its key from the nodes closest to that + key (`GET_VALUE`). 
+- **Content routing** - _Adding_ oneself to the list of providers for a given + key at the nodes closest to that key (`ADD_PROVIDER`) and _getting_ providers + for a given key from the nodes closest to that key (`GET_PROVIDERS`). -The libp2p Kad DHT implementation satisfies the routing interfaces: +In addition the libp2p Kademlia DHT offers the auxiliary _bootstrap_ operation. -```go -type Routing interface { - ContentRouting - PeerRouting - ValueStore +### Peer routing - // Kicks off the bootstrap process. - Bootstrap(context.Context) error -} +The below is one possible algorithm to find nodes closest to a given key on +the DHT. Implementations may diverge from this base algorithm as long as they +continue to adhere to the wire format. -// ContentRouting is used to find information about who has what content. -type ContentRouting interface { - // Provide adds the given CID to the content routing system. If 'true' is - // passed, it also announces it, otherwise it is just kept in the local - // accounting of which objects are being provided. - Provide(context.Context, cid.Cid, bool) error +Let's assume we’re looking for nodes closest to key `Key`. We then enter an +iterative network search. - // Search for peers who are able to provide a given key. - FindProvidersAsync(context.Context, cid.Cid, int) <-chan pstore.PeerInfo -} +We keep track of: -// PeerRouting is a way to find information about certain peers. -// -// This can be implemented by a simple lookup table, a tracking server, -// or even a DHT (like herein). -type PeerRouting interface { - // FindPeer searches for a peer with given ID, returns a pstore.PeerInfo - // with relevant addresses. - FindPeer(context.Context, peer.ID) (pstore.PeerInfo, error) -} +* the set of peers we've already queried (`Pq`) and the set of next query + candidates sorted by distance from `Key` in ascending order (`Pn`). -// ValueStore is a basic Put/Get interface. -type ValueStore interface { - // PutValue adds value corresponding to given Key. - PutValue(context.Context, string, []byte, ...ropts.Option) error +**Initialization**: seed `Pn` with the `α` peers from our routing table we know +are closest to `Key`, based on the XOR distance function. - // GetValue searches for the value corresponding to given Key. - GetValue(context.Context, string, ...ropts.Option) ([]byte, error) -} -``` +**Then we loop:** + +1. > The lookup terminates when the initiator has queried and gotten responses + from the k (see [#replication-parameter-k]) closest nodes it has seen. -## Value lookups + (See Kademlia paper [0].) -When looking up an entry in the DHT, the implementor should collect at least `Q` + The lookup might terminate early in case the local node queried all known + nodes, with the number of nodes being smaller than `k`. +2. Pick as many peers from the candidate peers (`Pn`) as the `α` concurrency + factor allows. Send each a `FIND_NODE(Key)` request, and mark it as _queried_ + in `Pq`. +3. Upon a response: + 2. If successful the response will contain the `k` closest nodes the peer + knows to the key `Key`. Add them to the candidate list `Pn`, except for + those that have already been queried. + 3. If an error or timeout occurs, discard it. +4. Go to 1. + +### Value routing + +Value routing can be used both to (1) _put_ and _get_ arbitrary values and (2) +to _put_ and _get_ the public keys of nodes. Node public keys are stored in +records under the `/pk` namespace. That is, the entry `/pk/` will store +the public key of peer `peerID`. 
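
For illustration, below is a minimal, self-contained sketch of the iterative `FIND_NODE` lookup described in the peer routing section above, in the Go-like syntax used throughout this spec. The `Peer` type, the `findNode` function signature and the termination condition ("no unqueried candidates left" instead of "the `k` closest nodes responded") are simplifying assumptions, not part of any libp2p API:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/big"
	"sort"
)

// Peer is a hypothetical stand-in for a routing table entry; only the ID is
// needed here to compute distances.
type Peer struct{ ID []byte }

// distance returns XOR(sha256(a), sha256(b)) as an unsigned integer, per the
// distance definition used throughout this spec.
func distance(a, b []byte) *big.Int {
	ha, hb := sha256.Sum256(a), sha256.Sum256(b)
	x := make([]byte, len(ha))
	for i := range ha {
		x[i] = ha[i] ^ hb[i]
	}
	return new(big.Int).SetBytes(x)
}

// findNode is a placeholder for issuing a FIND_NODE(key) RPC to a peer and
// returning the closest peers it knows to the key.
type findNode func(p Peer, key []byte) ([]Peer, error)

// lookup is a simplified, sequential version of the iterative search: it
// queries up to alpha candidates per round and merges the returned peers into
// the candidate set, stopping when no unqueried candidates remain.
func lookup(key []byte, seed []Peer, alpha int, query findNode) []Peer {
	queried := map[string]bool{}            // Pq
	candidates := append([]Peer{}, seed...) // Pn
	var responded []Peer
	for {
		// Keep Pn sorted by distance from the target key, ascending.
		sort.Slice(candidates, func(i, j int) bool {
			return distance(candidates[i].ID, key).Cmp(distance(candidates[j].ID, key)) < 0
		})
		var batch []Peer
		for _, c := range candidates {
			if len(batch) == alpha {
				break
			}
			if !queried[string(c.ID)] {
				batch = append(batch, c)
			}
		}
		if len(batch) == 0 {
			return responded // nothing left to query
		}
		for _, p := range batch {
			queried[string(p.ID)] = true
			closer, err := query(p, key)
			if err != nil {
				continue // errors and timeouts are discarded
			}
			responded = append(responded, p)
			for _, c := range closer {
				if !queried[string(c.ID)] {
					candidates = append(candidates, c)
				}
			}
		}
	}
}

func main() {
	// Toy network in which no peer knows any other peer: the lookup ends
	// after the seed peers have been queried.
	seed := []Peer{{ID: []byte("peer-a")}, {ID: []byte("peer-b")}}
	noop := func(Peer, []byte) ([]Peer, error) { return nil, nil }
	fmt.Println(len(lookup([]byte("target"), seed, 3, noop))) // 2
}
```

A production implementation would issue the `α` requests concurrently and track the `k` closest responders, but the candidate/queried bookkeeping is the same.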
+ +DHT implementations may optimise public key lookups by providing a +`GetPublicKey(peer.ID) (ci.PubKey)` method, that, for example, first checks if +the key exists in the local peerstore. + +The lookup for public key values is identical to the lookup of arbitrary values, +except that a custom value validation strategy is applied. It checks that +equality `SHA256(value) == peerID` stands when: + +1. Receiving a response from a `GET_VALUE` lookup. +2. Storing a public key in the DHT via `PUT_VALUE`. + +The record is rejected if the validation fails. + +#### Putting values + +To _put_ a value the DHT finds the `k` closest peers to the key of the value +using the `FIND_NODE` RPC (see [peer routing section](#peer-routing)), and then sends +a `PUT_VALUE` RPC message with the record value to each of the `k` peers. + +#### Getting values + +When _gettting_ a value in the DHT, the implementor should collect at least `Q` (quorum) responses from distinct nodes to check for consistency before returning an answer. @@ -136,8 +156,6 @@ direct `PUT_VALUE` RPC call when the lookup completes. Thus the DHT network eventually converges to the best value for each record, as a result of nodes collaborating with one another. -### Algorithm - The below is one possible algorithm to lookup a value on the DHT. Implementations may diverge from this base algorithm as long as they continue to adhere to the wire format. @@ -154,7 +172,7 @@ We keep track of: * the best value we've found (`best`), and which peers returned it (`Pb`) * the set of peers we've already queried (`Pq`) and the set of next query candidates sorted by distance from `Key` in ascending order (`Pn`). -* the set of peers with outdated values (`Po`). +* the set of peers with outdated values (`Po`). **Initialization**: seed `Pn` with the `α` peers from our routing table we know are closest to `Key`, based on the XOR distance function. @@ -184,7 +202,7 @@ are closest to `Key`, based on the XOR distance function. 3. If an error or timeout occurs, discard it. 4. Go to 1. -## Entry validation +#### Entry validation Implementations should validate DHT entries during retrieval and before storage e.g. by allowing to supply a record `Validator` when constructing a DHT node. @@ -217,40 +235,27 @@ Similarly, `Select()` is a pure function that returns the best record out of 2 or more candidates. It may use a sequence number, a timestamp, or other heuristic of the value to make the decision. -## Public key records - -Apart from storing arbitrary values, the libp2p Kad DHT stores node public keys -in records under the `/pk` namespace. That is, the entry `/pk/` will -store the public key of peer `peerID`. - -DHT implementations may optimise public key lookups by providing a -`GetPublicKey(peer.ID) (ci.PubKey)` method, that, for example, first checks if -the key exists in the local peerstore. - -The lookup for public key entries is identical to a standard entry lookup, -except that a custom entry validation strategy is applied. It checks that -equality `SHA256(value) == peerID` stands when: - -1. Receiving a response from a `GET_VALUE` lookup. -2. Storing a public key in the DHT via `PUT_VALUE`. - -The record is rejected if the validation fails. - -## Provider records +### Content routing Nodes must keep track of which nodes advertise that they provide a given key (CID). These provider advertisements should expire, by default, after 24 hours. These records are managed through the `ADD_PROVIDER` and `GET_PROVIDERS` messages. 
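
As an illustration of the provider-record bookkeeping just described, the following sketch keeps advertisements in memory and drops them after the default 24-hour expiry. The `ProviderStore` type and its methods are hypothetical; real implementations persist records in a datastore and may prune lazily:

```go
package main

import (
	"fmt"
	"time"
)

// providerEntry is one received provider advertisement.
type providerEntry struct {
	peerID   string
	received time.Time
}

// ProviderStore maps a key (e.g. a CID) to the peers that advertised it.
type ProviderStore struct {
	ttl     time.Duration
	entries map[string][]providerEntry
}

func NewProviderStore() *ProviderStore {
	return &ProviderStore{ttl: 24 * time.Hour, entries: map[string][]providerEntry{}}
}

// AddProvider records an ADD_PROVIDER advertisement for key.
func (s *ProviderStore) AddProvider(key, peerID string) {
	s.entries[key] = append(s.entries[key], providerEntry{peerID: peerID, received: time.Now()})
}

// GetProviders returns the non-expired providers for key, pruning the rest.
func (s *ProviderStore) GetProviders(key string) []string {
	var live []providerEntry
	var ids []string
	for _, e := range s.entries[key] {
		if time.Since(e.received) < s.ttl {
			live = append(live, e)
			ids = append(ids, e.peerID)
		}
	}
	s.entries[key] = live
	return ids
}

func main() {
	s := NewProviderStore()
	s.AddProvider("some-cid", "QmPeerA")
	fmt.Println(s.GetProviders("some-cid")) // [QmPeerA]
}
```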
-When `Provide(key)` is called, the DHT finds the closest peers to `key` using -the `FIND_NODE` RPC, and then sends a `ADD_PROVIDER` RPC with its own -`PeerInfo` to each of these peers. +When the local node wants to indicate that it provides the value for a given +key, the DHT finds the closest peers to `key` using the `FIND_NODE` RPC (see +[peer routing section](#peer-routing)), and then sends a `ADD_PROVIDER` RPC with +its own `PeerInfo` to each of these peers. Each peer that receives the `ADD_PROVIDER` RPC should validate that the received `PeerInfo` matches the sender's `peerID`, and if it does, that peer must store a record in its datastore the received `PeerInfo` record. +_Getting_ the providers for a given key is done in the same way as _getting_ a +value for a given key (see [getting values section](#getting-values)) expect +that instead of using the `GET_VALUE` RPC message the `GET_PROVIDERS` RPC +message is used. + When a node receives a `GET_PROVIDERS` RPC, it must look up the requested key in its datastore, and respond with any corresponding records in its datastore, plus a list of closer peers in its routing table. @@ -258,51 +263,15 @@ datastore, plus a list of closer peers in its routing table. For performance reasons, a node may prune expired advertisements only periodically, e.g. every hour. -## Node lookups - -The below is one possible algorithm to lookup a node closest to a given key on -the DHT. Implementations may diverge from this base algorithm as long as they -continue to adhere to the wire format. - -Let's assume we’re looking for nodes closest to key `Key`. We then enter an -iterative network search. - -We keep track of: - -* the set of peers we've already queried (`Pq`) and the set of next query - candidates sorted by distance from `Key` in ascending order (`Pn`). - -**Initialization**: seed `Pn` with the `α` peers from our routing table we know -are closest to `Key`, based on the XOR distance function. - -**Then we loop:** - -1. > The lookup terminates when the initiator has queried and gotten responses - from the k (see [#replication-parameter-k]) closest nodes it has seen. - - (See Kademlia paper [0].) - - The lookup might terminate early in case the local node queried all known - nodes, with the number of nodes being smaller than `k`. -2. Pick as many peers from the candidate peers (`Pn`) as the `α` concurrency - factor allows. Send each a `FIND_NODE(Key)` request, and mark it as _queried_ - in `Pq`. -3. Upon a response: - 2. If successful the response will contain the `k` closest nodes the peer - knows to the key `Key`. Add them to the candidate list `Pn`, except for - those that have already been queried. - 3. If an error or timeout occurs, discard it. -4. Go to 1. - -## Bootstrap process +### Bootstrap process The bootstrap process is responsible for keeping the routing table filled and healthy throughout time. It runs once on startup, then periodically with a configurable frequency (default: 5 minutes). On every run, we generate a random node ID and we look it up via the process -defined in *Node lookups*. Peers encountered throughout the search are inserted -in the routing table, as per usual business. +defined in [*Node lookups*](#node-lookups). Peers encountered throughout the +search are inserted in the routing table, as per usual business. This process is repeated as many times per run as configuration parameter `QueryCount` (default: 1). 
Every repetition is subject to a `QueryTimeout` @@ -404,27 +373,30 @@ message Message { These are the requirements for each `MessageType`: -* `FIND_NODE`: `key` must be set in the request. `closerPeers` is set in the -response; for an exact match exactly one `Peer` is returned; otherwise `ncp` -(default: 6) closest `Peer`s are returned. +* `FIND_NODE`: In the request `key` must be set to the binary `PeerId` of the + node to be found. `closerPeers` is set in the response; for an exact match + exactly one `Peer` is returned; otherwise `k` closest `Peer`s are returned. -* `GET_VALUE`: `key` must be set in the request. If `key` is a public key -(begins with `/pk/`) and the key is known, the response has `record` set to -that key. Otherwise, `record` is set to the value for the given key (if found -in the datastore) and `closerPeers` is set to indicate closer peers. +* `GET_VALUE`: In the request `key` is an unstructured array of bytes. If `key` + is a public key (begins with `/pk/`) and the key is known, the response has + `record` set to that key. Otherwise, `record` is set to the value for the + given key (if found in the datastore) and `closerPeers` is set to the `k` + closest peers. -* `PUT_VALUE`: `key` and `record` must be set in the request. The target -node validates `record`, and if it is valid, it stores it in the datastore. +* `PUT_VALUE`: In the request `key` is an unstructured array of bytes. The + target node validates `record`, and if it is valid, it stores it in the + datastore. -* `GET_PROVIDERS`: `key` must be set in the request. The target node returns -the closest known `providerPeers` (if any) and the closest known `closerPeers`. +* `GET_PROVIDERS`: In the request `key` is set to a CID. The target node + returns the closest known `providerPeers` (if any) and the `k` closest known + `closerPeers`. -* `ADD_PROVIDER`: `key` and `providerPeers` must be set in the request. The -target node verifies `key` is a valid CID, all `providerPeers` that -match the RPC sender's PeerID are recorded as providers. +* `ADD_PROVIDER`: In the request `key` is set to a CID. The target node verifies + `key` is a valid CID, all `providerPeers` that match the RPC sender's PeerID + are recorded as providers. -* `PING`: Target node responds with `PING`. Nodes should respond to this -message but it is currently never sent. +* `PING`: Target node responds with `PING`. Nodes should respond to this message + but it is currently never sent. Note: Any time a relevant `Peer` record is encountered, the associated multiaddrs are stored in the node's peerbook. @@ -448,6 +420,4 @@ The `addProvider` handler behaves differently across implementations: [3]: [bep_0005.rst_post](http://bittorrent.org/beps/bep_0005.html) -[4]: [GitHub - ipld/cid: Self-describing content-addressed identifiers for distributed systems](https://github.com/ipld/cid) - [5]: [GitHub - multiformats/multihash: Self describing hashes - for future proofing](https://github.com/multiformats/multihash) From 6ae2551281df0a7c5f6dc7ec704f47cdffb4a84e Mon Sep 17 00:00:00 2001 From: Max Inden Date: Fri, 14 May 2021 10:59:18 +0200 Subject: [PATCH 11/39] kad-dht: Allow stream reuse and document length prefix --- kad-dht/README.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/kad-dht/README.md b/kad-dht/README.md index 7576992fa..e2461e7de 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -288,6 +288,14 @@ Remote procedure calls are performed by: On any error, the stream is reset. 
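
For illustration, a minimal sketch of framing one RPC message over a stream, using the unsigned-varint length prefix documented in this section. The in-memory buffer stands in for a freshly opened stream, and the payload stands in for a serialized `Message` protobuf; a real implementation would reset the stream on any error:

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// writeMsg writes one serialized RPC message preceded by its length in bytes,
// encoded as an unsigned varint.
func writeMsg(w io.Writer, payload []byte) error {
	var lenBuf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(lenBuf[:], uint64(len(payload)))
	if _, err := w.Write(lenBuf[:n]); err != nil {
		return err
	}
	_, err := w.Write(payload)
	return err
}

// readMsg reads one length-prefixed message from the stream.
func readMsg(r *bufio.Reader) ([]byte, error) {
	l, err := binary.ReadUvarint(r)
	if err != nil {
		return nil, err
	}
	payload := make([]byte, l)
	_, err = io.ReadFull(r, payload)
	return payload, err
}

func main() {
	var stream bytes.Buffer // stand-in for a libp2p stream
	_ = writeMsg(&stream, []byte("serialized Message protobuf"))
	msg, _ := readMsg(bufio.NewReader(&stream))
	fmt.Printf("received %d bytes\n", len(msg))
}
```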
+Implementations may choose to re-use streams by sending one or more RPC request +messages on a single outgoing stream before closing it. Implementations must +handle additional RPC request messages on an incoming stream. + +All RPC messages sent over a stream are prefixed with the message length in +bytes, encoded as an unsigned variable length integer as defined by the +[multiformats unsigned-varint spec][uvarint-spec]. + All RPC messages conform to the following protobuf: ```protobuf @@ -421,3 +429,5 @@ The `addProvider` handler behaves differently across implementations: [3]: [bep_0005.rst_post](http://bittorrent.org/beps/bep_0005.html) [5]: [GitHub - multiformats/multihash: Self describing hashes - for future proofing](https://github.com/multiformats/multihash) + +[uvarint-spec]: https://github.com/multiformats/unsigned-varint From 0126af14220d2935b6d0a173e65e3d73273f122a Mon Sep 17 00:00:00 2001 From: Max Inden Date: Fri, 14 May 2021 11:07:36 +0200 Subject: [PATCH 12/39] kad-dht: Require closer peers even with value --- kad-dht/README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index e2461e7de..918b3d0ba 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -196,9 +196,9 @@ are closest to `Key`, based on the XOR distance function. “best" peers (`Pb`) as _outdated peers_ (`Po`). The current peer becomes the new best peer (`Pb`). 2. If the new value loses, we add the current peer to `Po`. - 2. If successful without a value, the response will contain the closest - nodes the peer knows to the key `Key`. Add them to the candidate list `Pn`, - except for those that have already been queried. + 2. If successful with or without a value, the response will contain the + closest nodes the peer knows to the key `Key`. Add them to the candidate + list `Pn`, except for those that have already been queried. 3. If an error or timeout occurs, discard it. 4. Go to 1. From 008f1a8bace32e13b2213db10e820834199db026 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Fri, 14 May 2021 11:35:41 +0200 Subject: [PATCH 13/39] kad-dht: Reword algorithm sections --- kad-dht/README.md | 41 ++++++++++++++++++++--------------------- 1 file changed, 20 insertions(+), 21 deletions(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index 918b3d0ba..fc6be2d4d 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -89,15 +89,13 @@ continue to adhere to the wire format. Let's assume we’re looking for nodes closest to key `Key`. We then enter an iterative network search. -We keep track of: +We keep track of the set of peers we've already queried (`Pq`) and the set of +next query candidates sorted by distance from `Key` in ascending order (`Pn`). +At initialization `Pn` is seeded with the `α` peers from our routing table we +know are closest to `Key`, based on the XOR distance function (see [distance +definition](#distance)). -* the set of peers we've already queried (`Pq`) and the set of next query - candidates sorted by distance from `Key` in ascending order (`Pn`). - -**Initialization**: seed `Pn` with the `α` peers from our routing table we know -are closest to `Key`, based on the XOR distance function. - -**Then we loop:** +Then we loop: 1. > The lookup terminates when the initiator has queried and gotten responses from the k (see [#replication-parameter-k]) closest nodes it has seen. @@ -110,10 +108,10 @@ are closest to `Key`, based on the XOR distance function. factor allows. 
Send each a `FIND_NODE(Key)` request, and mark it as _queried_ in `Pq`. 3. Upon a response: - 2. If successful the response will contain the `k` closest nodes the peer + 1. If successful the response will contain the `k` closest nodes the peer knows to the key `Key`. Add them to the candidate list `Pn`, except for those that have already been queried. - 3. If an error or timeout occurs, discard it. + 2. If an error or timeout occurs, discard it. 4. Go to 1. ### Value routing @@ -138,9 +136,9 @@ The record is rejected if the validation fails. #### Putting values -To _put_ a value the DHT finds the `k` closest peers to the key of the value -using the `FIND_NODE` RPC (see [peer routing section](#peer-routing)), and then sends -a `PUT_VALUE` RPC message with the record value to each of the `k` peers. +To _put_ a value the DHT finds `k` or less closest peers to the key of the value +using the `FIND_NODE` RPC (see [peer routing section](#peer-routing)), and then +sends a `PUT_VALUE` RPC message with the record value to each of the peers. #### Getting values @@ -148,10 +146,11 @@ When _gettting_ a value in the DHT, the implementor should collect at least `Q` (quorum) responses from distinct nodes to check for consistency before returning an answer. -Should the responses be different, the implementation should use some validation -mechanism to resolve the conflict and select the _best_ result. +Entry validation: Should the responses from different peers diverge, the +implementation should use some validation mechanism to resolve the conflict and +select the _best_ result (see [entry validation section](#entry-validation)). -**Entry correction.** Nodes that returned _worse_ records are updated via a +Entry correction: Nodes that returned _worse_ records are updated via a direct `PUT_VALUE` RPC call when the lookup completes. Thus the DHT network eventually converges to the best value for each record, as a result of nodes collaborating with one another. @@ -174,10 +173,10 @@ We keep track of: candidates sorted by distance from `Key` in ascending order (`Pn`). * the set of peers with outdated values (`Po`). -**Initialization**: seed `Pn` with the `α` peers from our routing table we know +At initialization we seed `Pn` with the `α` peers from our routing table we know are closest to `Key`, based on the XOR distance function. -**Then we loop:** +Then we loop: 1. If we have collected `Q` or more answers, we cancel outstanding requests, return `best`, and we notify the peers holding an outdated value (`Po`) of @@ -243,13 +242,13 @@ These records are managed through the `ADD_PROVIDER` and `GET_PROVIDERS` messages. When the local node wants to indicate that it provides the value for a given -key, the DHT finds the closest peers to `key` using the `FIND_NODE` RPC (see -[peer routing section](#peer-routing)), and then sends a `ADD_PROVIDER` RPC with +key, the DHT finds the closest peers to the key using the `FIND_NODE` RPC (see +[peer routing section](#peer-routing)), and then sends an `ADD_PROVIDER` RPC with its own `PeerInfo` to each of these peers. Each peer that receives the `ADD_PROVIDER` RPC should validate that the received `PeerInfo` matches the sender's `peerID`, and if it does, that peer -must store a record in its datastore the received `PeerInfo` record. +must store the `PeerInfo` in its datastore. 
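
For illustration, an `ADD_PROVIDER` handler could apply the validation described above as follows. The request and datastore types are hypothetical, already-decoded stand-ins for the `Message` protobuf and the node's datastore:

```go
package main

import (
	"bytes"
	"fmt"
)

// Peer mirrors the Peer message of the RPC protobuf (id plus multiaddrs).
type Peer struct {
	ID    []byte
	Addrs [][]byte
}

// AddProviderRequest is a hypothetical, already-decoded ADD_PROVIDER message.
type AddProviderRequest struct {
	Key           []byte
	ProviderPeers []Peer
}

// Datastore is a hypothetical provider datastore keyed by the record key.
type Datastore map[string][]Peer

// handleAddProvider stores only those providerPeers whose ID matches the peer
// that actually sent the RPC, as required above.
func handleAddProvider(ds Datastore, senderID []byte, req AddProviderRequest) {
	for _, p := range req.ProviderPeers {
		if !bytes.Equal(p.ID, senderID) {
			continue // ignore advertisements made on behalf of other peers
		}
		ds[string(req.Key)] = append(ds[string(req.Key)], p)
	}
}

func main() {
	ds := Datastore{}
	sender := []byte("QmSender")
	handleAddProvider(ds, sender, AddProviderRequest{
		Key:           []byte("some-cid"),
		ProviderPeers: []Peer{{ID: sender}, {ID: []byte("QmSomeoneElse")}},
	})
	fmt.Println("stored providers:", len(ds["some-cid"])) // 1
}
```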
_Getting_ the providers for a given key is done in the same way as _getting_ a value for a given key (see [getting values section](#getting-values)) expect From c21ca3a0f3110946e86362e4d67782cb356b6882 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Fri, 14 May 2021 11:47:31 +0200 Subject: [PATCH 14/39] kad-dht: Rework references --- kad-dht/README.md | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index fc6be2d4d..ae034badf 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -22,7 +22,7 @@ and spec status. The Kademlia Distributed Hash Table (DHT) subsystem in libp2p is a DHT implementation largely based on the Kademlia [0] whitepaper, augmented with -notions from S/Kademlia [1], Coral [2] and mainlineDHT \[3\]. +notions from S/Kademlia [1], Coral [2] and the [BitTorrent DHT][bittorrent]. This specification assumes the reader has prior knowledge of those systems. So rather than explaining DHT mechanics from scratch, we focus on differential @@ -425,8 +425,6 @@ The `addProvider` handler behaves differently across implementations: [2]: Freedman, M. J., & Mazières, D. (2003). Sloppy Hashing and Self-Organizing Clusters. In IPTPS. Springer Berlin / Heidelberg. Retrieved from www.coralcdn.org/docs/coral-iptps03.ps -[3]: [bep_0005.rst_post](http://bittorrent.org/beps/bep_0005.html) - -[5]: [GitHub - multiformats/multihash: Self describing hashes - for future proofing](https://github.com/multiformats/multihash) +[bittorrent]: http://bittorrent.org/beps/bep_0005.html [uvarint-spec]: https://github.com/multiformats/unsigned-varint From f240a22be4d93f873d154533e99d393102f3377c Mon Sep 17 00:00:00 2001 From: Max Inden Date: Fri, 14 May 2021 11:47:56 +0200 Subject: [PATCH 15/39] kad-dht: Remove appendix detailing difference of implementations --- kad-dht/README.md | 7 ------- 1 file changed, 7 deletions(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index ae034badf..d8949fc1b 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -408,13 +408,6 @@ These are the requirements for each `MessageType`: Note: Any time a relevant `Peer` record is encountered, the associated multiaddrs are stored in the node's peerbook. -## Appendix A: differences in implementations - -The `addProvider` handler behaves differently across implementations: - * in js-libp2p-kad-dht, the sender is added as a provider unconditionally. - * in go-libp2p-kad-dht, it is added once per instance of that peer in the - `providerPeers` array. - --- ## References From 6c1c2248dc02979367615d8ca55b952a0dd7bdcf Mon Sep 17 00:00:00 2001 From: Max Inden Date: Fri, 14 May 2021 12:05:52 +0200 Subject: [PATCH 16/39] kad-dht: Document early GET_VALUE termination --- kad-dht/README.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index d8949fc1b..7dd6ed579 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -178,10 +178,11 @@ are closest to `Key`, based on the XOR distance function. Then we loop: -1. If we have collected `Q` or more answers, we cancel outstanding requests, - return `best`, and we notify the peers holding an outdated value (`Po`) of - the best value we discovered, by sending `PUT_VALUE(Key, best)` messages. - _Return._ +1. If we have collected `Q` or more answers, we cancel outstanding requests and + return `best`. If there are no outstanding requests and `Pn` is empty we + terminate early and return `best`. 
In either case we notify the peers holding + an outdated value (`Po`) of the best value we discovered, by sending + `PUT_VALUE(Key, best)` messages. 2. Pick as many peers from the candidate peers (`Pn`) as the `α` concurrency factor allows. Send each a `GET_VALUE(Key)` request, and mark it as _queried_ in `Pq`. From defe08df0e6a684342412b135740ef54e6a33dae Mon Sep 17 00:00:00 2001 From: Max Inden Date: Fri, 14 May 2021 12:15:28 +0200 Subject: [PATCH 17/39] kad-dht: Do not specify k-bucket split strategy (for now) --- kad-dht/README.md | 4 ---- 1 file changed, 4 deletions(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index 7dd6ed579..1819f56aa 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -51,10 +51,6 @@ following the design outlined in the Kademlia paper [0]. The bucket size is equal to the replication paramter `k`, and the maximum bucket count matches the size of the SHA256 function, i.e. 256 buckets. -The routing table is unfolded lazily, starting with a single bucket at position -0 (representing the most distant peers), and splitting it subsequently as closer -peers are found, and the capacity of the nearmost bucket is exceeded. - ### Alpha concurrency parameter (`α`) The concurrency of node and value lookups are limited by parameter `α`, with a From e1442fa3f109944c8eda8368c781ef5935572806 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Fri, 14 May 2021 14:55:24 +0200 Subject: [PATCH 18/39] kad-dht: Deprecate PING message type --- kad-dht/README.md | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index 1819f56aa..1d15cd41c 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -399,8 +399,10 @@ These are the requirements for each `MessageType`: `key` is a valid CID, all `providerPeers` that match the RPC sender's PeerID are recorded as providers. -* `PING`: Target node responds with `PING`. Nodes should respond to this message - but it is currently never sent. +* `PING`: Deprecated message type replaced by the dedicated [ping + protocol][ping]. Implementations may still handle incoming `PING` requests for + backwards compatibility. Implementations must not actively send `PING` + requests. Note: Any time a relevant `Peer` record is encountered, the associated multiaddrs are stored in the node's peerbook. @@ -418,3 +420,5 @@ multiaddrs are stored in the node's peerbook. 
[bittorrent]: http://bittorrent.org/beps/bep_0005.html [uvarint-spec]: https://github.com/multiformats/unsigned-varint + +[ping]: https://github.com/libp2p/specs/issues/183 From 26dd2f3db71a4e04dead0bfe4f5aeed7f7657dc9 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Fri, 14 May 2021 15:44:50 +0200 Subject: [PATCH 19/39] kad-dht: Document timeReceived formatted with RFC3339 --- kad-dht/README.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index 1d15cd41c..bc62abb54 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -5,6 +5,7 @@ | 3A | Recommendation | Active | r0, 2021-05-07 | Authors: [@raulk], [@jhiesey], [@mxinden] + Interest Group: [@raulk]: https://github.com/raulk @@ -305,12 +306,14 @@ message Record { bytes value = 2; // Note: These fields were removed from the Record message - // hash of the authors public key - //optional string author = 3; + // + // Hash of the authors public key + // optional string author = 3; // A PKI signature for the key+value+author - //optional bytes signature = 4; + // optional bytes signature = 4; // Time the record was received, set by receiver + // Formatted according to https://datatracker.ietf.org/doc/html/rfc3339 string timeReceived = 5; }; From a9ec52376b0cd923f8f1c6a8e6919c10d763805f Mon Sep 17 00:00:00 2001 From: Max Inden Date: Fri, 14 May 2021 15:52:09 +0200 Subject: [PATCH 20/39] kad-dht: Use peer ID instead of node ID in bootstrap --- kad-dht/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index bc62abb54..6e8096eee 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -266,7 +266,7 @@ The bootstrap process is responsible for keeping the routing table filled and healthy throughout time. It runs once on startup, then periodically with a configurable frequency (default: 5 minutes). -On every run, we generate a random node ID and we look it up via the process +On every run, we generate a random peer ID and we look it up via the process defined in [*Node lookups*](#node-lookups). Peers encountered throughout the search are inserted in the routing table, as per usual business. From 77168f90578f7fea3a4c8938aaac444ef2293c12 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Fri, 14 May 2021 16:36:31 +0200 Subject: [PATCH 21/39] kad-dht: Fix peer routing link --- kad-dht/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index 6e8096eee..fca5e50d0 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -267,7 +267,7 @@ healthy throughout time. It runs once on startup, then periodically with a configurable frequency (default: 5 minutes). On every run, we generate a random peer ID and we look it up via the process -defined in [*Node lookups*](#node-lookups). Peers encountered throughout the +defined in [peer routing](#peer-routing). Peers encountered throughout the search are inserted in the routing table, as per usual business. This process is repeated as many times per run as configuration parameter From dbe1ff7fd8372fadfaa47a0c26a2767d95c97cb8 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Fri, 14 May 2021 16:39:24 +0200 Subject: [PATCH 22/39] README: Add kademlia to protocols index --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index f8dbd3634..65142ef7b 100644 --- a/README.md +++ b/README.md @@ -71,6 +71,7 @@ security, multiplexing, and other purposes. 
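
For reference, the RFC 3339 format required for the `Record.timeReceived` field above can be produced directly with the standard library; a trivial illustration:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// timeReceived is set by the receiving node, formatted per RFC 3339.
	timeReceived := time.Now().UTC().Format(time.RFC3339)
	fmt.Println(timeReceived) // e.g. 2021-05-14T13:52:09Z
}
```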
The protocols described below all use [protocol buffers](https://developers.google.com/protocol-buffers/docs/proto?hl=en) (aka protobuf) to define message schemas. Version `proto2` is used unless stated otherwise. - [identify][spec_identify] - Exchange keys and addresses with other peers +- [kademlia][spec_kademlia] - The Kademlia Distributed Hash Table (DHT) subsystem - [mplex][spec_mplex] - The friendly stream multiplexer - [plaintext][spec_plaintext] - An insecure transport for non-production usage - [pnet][spec_pnet] - Private networking in libp2p using pre-shared keys @@ -102,6 +103,7 @@ you feel an issue isn't the appropriate place for your topic, please join our [spec_lifecycle]: 00-framework-01-spec-lifecycle.md [spec_header]: 00-framework-02-document-header.md [spec_identify]: ./identify/README.md +[spec_kademlia]: ./kad-dht/README.md [spec_mplex]: ./mplex/README.md [spec_pnet]: ./pnet/Private-Networks-PSK-V1.md [spec_pubsub]: ./pubsub/README.md From aa7e8fc4b8ae28376d8f918805497bd9593ab231 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Fri, 14 May 2021 16:46:47 +0200 Subject: [PATCH 23/39] kad-dht: Fix protobuf indentation --- kad-dht/README.md | 136 +++++++++++++++++++++++----------------------- 1 file changed, 68 insertions(+), 68 deletions(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index fca5e50d0..716a19cf6 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -299,82 +299,82 @@ All RPC messages conform to the following protobuf: // Record represents a dht record that contains a value // for a key value pair message Record { - // The key that references this record - bytes key = 1; + // The key that references this record + bytes key = 1; - // The actual value this record is storing - bytes value = 2; + // The actual value this record is storing + bytes value = 2; - // Note: These fields were removed from the Record message + // Note: These fields were removed from the Record message // - // Hash of the authors public key - // optional string author = 3; - // A PKI signature for the key+value+author - // optional bytes signature = 4; + // Hash of the authors public key + // optional string author = 3; + // A PKI signature for the key+value+author + // optional bytes signature = 4; - // Time the record was received, set by receiver + // Time the record was received, set by receiver // Formatted according to https://datatracker.ietf.org/doc/html/rfc3339 - string timeReceived = 5; + string timeReceived = 5; }; message Message { - enum MessageType { - PUT_VALUE = 0; - GET_VALUE = 1; - ADD_PROVIDER = 2; - GET_PROVIDERS = 3; - FIND_NODE = 4; - PING = 5; - } - - enum ConnectionType { - // sender does not have a connection to peer, and no extra information (default) - NOT_CONNECTED = 0; - - // sender has a live connection to peer - CONNECTED = 1; - - // sender recently connected to peer - CAN_CONNECT = 2; - - // sender recently tried to connect to peer repeatedly but failed to connect - // ("try" here is loose, but this should signal "made strong effort, failed") - CANNOT_CONNECT = 3; - } - - message Peer { - // ID of a given peer. - bytes id = 1; - - // multiaddrs for a given peer - repeated bytes addrs = 2; - - // used to signal the sender's connection capabilities to the peer - ConnectionType connection = 3; - } - - // defines what type of message it is. - MessageType type = 1; - - // defines what coral cluster level this query/response belongs to. - // in case we want to implement coral's cluster rings in the future. 
- int32 clusterLevelRaw = 10; // NOT USED - - // Used to specify the key associated with this message. - // PUT_VALUE, GET_VALUE, ADD_PROVIDER, GET_PROVIDERS - bytes key = 2; - - // Used to return a value - // PUT_VALUE, GET_VALUE - Record record = 3; - - // Used to return peers closer to a key in a query - // GET_VALUE, GET_PROVIDERS, FIND_NODE - repeated Peer closerPeers = 8; - - // Used to return Providers - // GET_VALUE, ADD_PROVIDER, GET_PROVIDERS - repeated Peer providerPeers = 9; + enum MessageType { + PUT_VALUE = 0; + GET_VALUE = 1; + ADD_PROVIDER = 2; + GET_PROVIDERS = 3; + FIND_NODE = 4; + PING = 5; + } + + enum ConnectionType { + // sender does not have a connection to peer, and no extra information (default) + NOT_CONNECTED = 0; + + // sender has a live connection to peer + CONNECTED = 1; + + // sender recently connected to peer + CAN_CONNECT = 2; + + // sender recently tried to connect to peer repeatedly but failed to connect + // ("try" here is loose, but this should signal "made strong effort, failed") + CANNOT_CONNECT = 3; + } + + message Peer { + // ID of a given peer. + bytes id = 1; + + // multiaddrs for a given peer + repeated bytes addrs = 2; + + // used to signal the sender's connection capabilities to the peer + ConnectionType connection = 3; + } + + // defines what type of message it is. + MessageType type = 1; + + // defines what coral cluster level this query/response belongs to. + // in case we want to implement coral's cluster rings in the future. + int32 clusterLevelRaw = 10; // NOT USED + + // Used to specify the key associated with this message. + // PUT_VALUE, GET_VALUE, ADD_PROVIDER, GET_PROVIDERS + bytes key = 2; + + // Used to return a value + // PUT_VALUE, GET_VALUE + Record record = 3; + + // Used to return peers closer to a key in a query + // GET_VALUE, GET_PROVIDERS, FIND_NODE + repeated Peer closerPeers = 8; + + // Used to return Providers + // GET_VALUE, ADD_PROVIDER, GET_PROVIDERS + repeated Peer providerPeers = 9; } ``` From d742e2ebd907ec7077541b29c526deb94fc22deb Mon Sep 17 00:00:00 2001 From: Max Inden Date: Thu, 27 May 2021 19:33:14 +0200 Subject: [PATCH 24/39] kad-dht/README.md: Fix typo --- kad-dht/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index 716a19cf6..69ac530d3 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -249,7 +249,7 @@ received `PeerInfo` matches the sender's `peerID`, and if it does, that peer must store the `PeerInfo` in its datastore. _Getting_ the providers for a given key is done in the same way as _getting_ a -value for a given key (see [getting values section](#getting-values)) expect +value for a given key (see [getting values section](#getting-values)) except that instead of using the `GET_VALUE` RPC message the `GET_PROVIDERS` RPC message is used. From 9355a8f4893563f43c495e6ee52deb5e7a5c1e1b Mon Sep 17 00:00:00 2001 From: Max Inden Date: Thu, 3 Jun 2021 12:10:15 +0200 Subject: [PATCH 25/39] kad-dht/README: Remove requirement on kbucket data structure --- kad-dht/README.md | 25 ++++++++++++++++--------- 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index 69ac530d3..fc1dfbe03 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -45,12 +45,22 @@ Code snippets use a Go-like syntax. The amount of replication is governed by the replication parameter `k`. The default value for `k` is 20. +### Distance + +In all cases, the distance between two keys is `XOR(sha256(key1), +sha256(key2))`. 
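
A minimal sketch of that distance computation; interpreting the XOR as an unsigned integer (as below) is one convenient convention for comparing and sorting distances:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/big"
)

// distance returns XOR(sha256(key1), sha256(key2)) interpreted as an
// unsigned integer, so results can be compared and sorted.
func distance(key1, key2 []byte) *big.Int {
	h1, h2 := sha256.Sum256(key1), sha256.Sum256(key2)
	x := make([]byte, len(h1))
	for i := range h1 {
		x[i] = h1[i] ^ h2[i]
	}
	return new(big.Int).SetBytes(x)
}

func main() {
	a, b := []byte("key-a"), []byte("key-b")
	fmt.Println(distance(a, a).Sign() == 0) // true: a key is at distance 0 from itself
	fmt.Println(distance(a, b))             // some large positive integer
}
```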
+ ### Kademlia routing table -The data structure backing this system is a k-bucket routing table, closely -following the design outlined in the Kademlia paper [0]. The bucket size is -equal to the replication paramter `k`, and the maximum bucket count matches the -size of the SHA256 function, i.e. 256 buckets. +An implementation of this specification must try to maintain `k` peers with +shared key prefix of length `L`, for every `L` in `[0..(keyspace-length - 1)]`, +in its routing table. Given the keyspace length of 256 through the sha256 hash +function, `L` can take values between 0 (inclusive) and 255 (inclusive). The +local node shares a prefix length of 256 with its own key only. + +Implementations may use any data structure to maintain their routing table. +Examples are the k-bucket data structure outlined in the Kademlia paper [0] or +XOR-tries (see [go-libp2p-xor]). ### Alpha concurrency parameter (`α`) @@ -58,11 +68,6 @@ The concurrency of node and value lookups are limited by parameter `α`, with a default value of 3. This implies that each lookup process can perform no more than 3 inflight requests, at any given time. -### Distance - -In all cases, the distance between two keys is `XOR(sha256(key1), -sha256(key2))`. - ## DHT operations The libp2p Kademlia DHT offers the following types of routing operations: @@ -425,3 +430,5 @@ multiaddrs are stored in the node's peerbook. [uvarint-spec]: https://github.com/multiformats/unsigned-varint [ping]: https://github.com/libp2p/specs/issues/183 + +[go-libp2p-xor]: https://github.com/libp2p/go-libp2p-xor From 072360fac3546b39b2ee03bbff5a1b725554457e Mon Sep 17 00:00:00 2001 From: Max Inden Date: Thu, 3 Jun 2021 14:46:35 +0200 Subject: [PATCH 26/39] kad-dht/README: Restructure and reword DHT operations section --- kad-dht/README.md | 42 ++++++++++++++++++++++++++++++------------ 1 file changed, 30 insertions(+), 12 deletions(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index fc1dfbe03..94c28d006 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -70,15 +70,29 @@ than 3 inflight requests, at any given time. ## DHT operations -The libp2p Kademlia DHT offers the following types of routing operations: +The libp2p Kademlia DHT offers the following types of operations: -- **Peer routing** - _Finding_ the closest nodes to a given key (`FIND_NODE`). -- **Value routing** - _Putting_ a value to the nodes closest to the value's key - (`PUT_VALUE`) and _getting_ a value by its key from the nodes closest to that - key (`GET_VALUE`). -- **Content routing** - _Adding_ oneself to the list of providers for a given - key at the nodes closest to that key (`ADD_PROVIDER`) and _getting_ providers - for a given key from the nodes closest to that key (`GET_PROVIDERS`). +- **Peer routing** + + - Finding the closest nodes to a given key via `FIND_NODE`. + +- **Value storage and retrieval** + + - Storing a value on the nodes closest to the value's key by looking up the + closest nodes via `FIND_NODE` and then putting the value to those nodes via + `PUT_VALUE`. + + - Getting a value by its key from the nodes closest to that key via + `GET_VALUE`. + +- **Content provider advertisement and discovery** + + - Adding oneself to the list of providers for a given key at the nodes closest + to that key by finding the closest nodes via `FIND_NODE` and then adding + oneself via `ADD_PROVIDER`. + + - Getting providers for a given key from the nodes closest to that key via + `GET_PROVIDERS`. 
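
To illustrate the "shared key prefix of length `L`" used by the routing table definition above, the following sketch computes `L` for two keys; a k-bucket implementation would typically use this value as the bucket index for a peer (the function name is illustrative):

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/bits"
)

// sharedPrefixLen returns the number of leading bits that sha256(a) and
// sha256(b) have in common, i.e. the prefix length L from the routing table
// definition, in the range 0..256.
func sharedPrefixLen(a, b []byte) int {
	ha, hb := sha256.Sum256(a), sha256.Sum256(b)
	l := 0
	for i := range ha {
		x := ha[i] ^ hb[i]
		if x == 0 {
			l += 8
			continue
		}
		l += bits.LeadingZeros8(x)
		break
	}
	return l
}

func main() {
	self := []byte("local-peer-id")
	fmt.Println(sharedPrefixLen(self, self))                  // 256: only the local key itself
	fmt.Println(sharedPrefixLen(self, []byte("remote-peer"))) // bucket index for this peer
}
```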
In addition the libp2p Kademlia DHT offers the auxiliary _bootstrap_ operation. @@ -116,7 +130,7 @@ Then we loop: 2. If an error or timeout occurs, discard it. 4. Go to 1. -### Value routing +### Value storage and retrieval Value routing can be used both to (1) _put_ and _get_ arbitrary values and (2) to _put_ and _get_ the public keys of nodes. Node public keys are stored in @@ -136,13 +150,13 @@ equality `SHA256(value) == peerID` stands when: The record is rejected if the validation fails. -#### Putting values +#### Value storage To _put_ a value the DHT finds `k` or less closest peers to the key of the value using the `FIND_NODE` RPC (see [peer routing section](#peer-routing)), and then sends a `PUT_VALUE` RPC message with the record value to each of the peers. -#### Getting values +#### Value retrieval When _gettting_ a value in the DHT, the implementor should collect at least `Q` (quorum) responses from distinct nodes to check for consistency before returning @@ -237,13 +251,15 @@ Similarly, `Select()` is a pure function that returns the best record out of 2 or more candidates. It may use a sequence number, a timestamp, or other heuristic of the value to make the decision. -### Content routing +### Content provider advertisement and discovery Nodes must keep track of which nodes advertise that they provide a given key (CID). These provider advertisements should expire, by default, after 24 hours. These records are managed through the `ADD_PROVIDER` and `GET_PROVIDERS` messages. +#### Content provider advertisement + When the local node wants to indicate that it provides the value for a given key, the DHT finds the closest peers to the key using the `FIND_NODE` RPC (see [peer routing section](#peer-routing)), and then sends an `ADD_PROVIDER` RPC with @@ -253,6 +269,8 @@ Each peer that receives the `ADD_PROVIDER` RPC should validate that the received `PeerInfo` matches the sender's `peerID`, and if it does, that peer must store the `PeerInfo` in its datastore. +#### Content provider discovery + _Getting_ the providers for a given key is done in the same way as _getting_ a value for a given key (see [getting values section](#getting-values)) except that instead of using the `GET_VALUE` RPC message the `GET_PROVIDERS` RPC From c4d4b536445b0635c88fc52348464c776cdc2bc3 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Thu, 3 Jun 2021 14:54:46 +0200 Subject: [PATCH 27/39] kad-dht/README: Seed with k instead of alpha peers --- kad-dht/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index 94c28d006..c57e9e872 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -107,7 +107,7 @@ iterative network search. We keep track of the set of peers we've already queried (`Pq`) and the set of next query candidates sorted by distance from `Key` in ascending order (`Pn`). -At initialization `Pn` is seeded with the `α` peers from our routing table we +At initialization `Pn` is seeded with the `k` peers from our routing table we know are closest to `Key`, based on the XOR distance function (see [distance definition](#distance)). 
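
As an illustration of the best-value and outdated-peer bookkeeping used during value retrieval, below is a sketch of the response-handling step. The `selectBest` function stands in for the record `Validator`'s `Select()` (here with a toy "longer value wins" rule), and all names are illustrative:

```go
package main

import "fmt"

// selectBest stands in for Validator.Select: it returns the index (0 or 1)
// of the better of two candidate values. "Longer wins" is a toy rule.
func selectBest(a, b []byte) int {
	if len(b) > len(a) {
		return 1
	}
	return 0
}

// getState tracks the progress of a single GET_VALUE lookup.
type getState struct {
	quorum    int
	answers   int
	best      []byte
	bestPeers map[string]bool // Pb: peers that returned the current best value
	outdated  map[string]bool // Po: peers to correct with PUT_VALUE afterwards
}

// onValue processes one successful GET_VALUE response carrying a value and
// reports whether the quorum has been reached.
func (s *getState) onValue(peer string, value []byte) bool {
	s.answers++
	switch {
	case s.best == nil:
		s.best = value
		s.bestPeers[peer] = true
	case selectBest(s.best, value) == 1:
		// The new value wins: every peer that returned the old best is now outdated.
		for p := range s.bestPeers {
			s.outdated[p] = true
		}
		s.best = value
		s.bestPeers = map[string]bool{peer: true}
	default:
		s.outdated[peer] = true
	}
	return s.answers >= s.quorum
}

func main() {
	s := &getState{quorum: 2, bestPeers: map[string]bool{}, outdated: map[string]bool{}}
	s.onValue("peer-a", []byte("old"))
	done := s.onValue("peer-b", []byte("newer-value"))
	// Once done, send PUT_VALUE(Key, best) to every peer in s.outdated.
	fmt.Println(done, string(s.best), len(s.outdated)) // true newer-value 1
}
```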
From b07409173c9efbde4b31deb87b3c30b17c0c7379 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Wed, 9 Jun 2021 11:16:39 +0200 Subject: [PATCH 28/39] kad-dht/README: Require algorithms to make progress towards target key --- kad-dht/README.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index c57e9e872..16bcc672d 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -98,9 +98,10 @@ In addition the libp2p Kademlia DHT offers the auxiliary _bootstrap_ operation. ### Peer routing -The below is one possible algorithm to find nodes closest to a given key on -the DHT. Implementations may diverge from this base algorithm as long as they -continue to adhere to the wire format. +The below is one possible algorithm to find nodes closest to a given key on the +DHT. Implementations may diverge from this base algorithm as long as they adhere +to the wire format and make progress towards the target key. + Let's assume we’re looking for nodes closest to key `Key`. We then enter an iterative network search. @@ -172,8 +173,8 @@ eventually converges to the best value for each record, as a result of nodes collaborating with one another. The below is one possible algorithm to lookup a value on the DHT. -Implementations may diverge from this base algorithm as long as they continue to -adhere to the wire format. +Implementations may diverge from this base algorithm as long as they adhere to +the wire format and make progress towards the target key. Let's assume we’re looking for key `Key`. We first try to fetch the value from the local store. If found, and `Q == { 0, 1 }`, the search is complete. From a065aacf219181c6d576140df1aa9ad86f1fcc29 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Fri, 25 Jun 2021 11:17:38 +0200 Subject: [PATCH 29/39] kad-dht/README: Remove `/pk` special namespace --- kad-dht/README.md | 26 +++----------------------- 1 file changed, 3 insertions(+), 23 deletions(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index 16bcc672d..69354c4fd 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -133,24 +133,6 @@ Then we loop: ### Value storage and retrieval -Value routing can be used both to (1) _put_ and _get_ arbitrary values and (2) -to _put_ and _get_ the public keys of nodes. Node public keys are stored in -records under the `/pk` namespace. That is, the entry `/pk/` will store -the public key of peer `peerID`. - -DHT implementations may optimise public key lookups by providing a -`GetPublicKey(peer.ID) (ci.PubKey)` method, that, for example, first checks if -the key exists in the local peerstore. - -The lookup for public key values is identical to the lookup of arbitrary values, -except that a custom value validation strategy is applied. It checks that -equality `SHA256(value) == peerID` stands when: - -1. Receiving a response from a `GET_VALUE` lookup. -2. Storing a public key in the DHT via `PUT_VALUE`. - -The record is rejected if the validation fails. - #### Value storage To _put_ a value the DHT finds `k` or less closest peers to the key of the value @@ -408,11 +390,9 @@ These are the requirements for each `MessageType`: node to be found. `closerPeers` is set in the response; for an exact match exactly one `Peer` is returned; otherwise `k` closest `Peer`s are returned. -* `GET_VALUE`: In the request `key` is an unstructured array of bytes. If `key` - is a public key (begins with `/pk/`) and the key is known, the response has - `record` set to that key. 
Otherwise, `record` is set to the value for the - given key (if found in the datastore) and `closerPeers` is set to the `k` - closest peers. +* `GET_VALUE`: In the request `key` is an unstructured array of bytes. `record` + is set to the value for the given key (if found in the datastore) and + `closerPeers` is set to the `k` closest peers. * `PUT_VALUE`: In the request `key` is an unstructured array of bytes. The target node validates `record`, and if it is valid, it stores it in the From 20b3b73032c0becf11a33affcade45bcb4dea79c Mon Sep 17 00:00:00 2001 From: Max Inden Date: Fri, 25 Jun 2021 11:24:39 +0200 Subject: [PATCH 30/39] kad-dht/README: Replicate record to closest peers without it --- kad-dht/README.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index 69354c4fd..f60d19324 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -180,8 +180,9 @@ Then we loop: 1. If we have collected `Q` or more answers, we cancel outstanding requests and return `best`. If there are no outstanding requests and `Pn` is empty we terminate early and return `best`. In either case we notify the peers holding - an outdated value (`Po`) of the best value we discovered, by sending - `PUT_VALUE(Key, best)` messages. + an outdated value (`Po`) of the best value we discovered, or holding no value + for the given key, even though being among the `k` closest peers to the key, + by sending `PUT_VALUE(Key, best)` messages. 2. Pick as many peers from the candidate peers (`Pn`) as the `α` concurrency factor allows. Send each a `GET_VALUE(Key)` request, and mark it as _queried_ in `Pq`. From 3e13846c28e78a97b8fc8d0b55b8cd5bcc1d4aef Mon Sep 17 00:00:00 2001 From: Max Inden Date: Fri, 25 Jun 2021 11:26:40 +0200 Subject: [PATCH 31/39] kad-dht/README: Demote validate purity to `should` --- kad-dht/README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index f60d19324..c22aced2c 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -224,8 +224,8 @@ type Validator interface { } ``` -`Validate()` is a pure function that reports the validity of a record. It may -validate a cryptographic signature, or else. It is called on two occasions: +`Validate()` should be a pure function that reports the validity of a record. It +may validate a cryptographic signature, or else. It is called on two occasions: 1. To validate values retrieved in a `GET_VALUE` query. 2. To validate values received in a `PUT_VALUE` query before storing them in the From 6ec65b543f6545922d368a4ec5bd73fd25388dd9 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Fri, 25 Jun 2021 11:48:31 +0200 Subject: [PATCH 32/39] kad-dht/README: Do not require storing provider addresses --- kad-dht/README.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index c22aced2c..ae7b540f3 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -249,9 +249,11 @@ key, the DHT finds the closest peers to the key using the `FIND_NODE` RPC (see [peer routing section](#peer-routing)), and then sends an `ADD_PROVIDER` RPC with its own `PeerInfo` to each of these peers. -Each peer that receives the `ADD_PROVIDER` RPC should validate that the -received `PeerInfo` matches the sender's `peerID`, and if it does, that peer -must store the `PeerInfo` in its datastore. 
+Each peer that receives the `ADD_PROVIDER` RPC should validate that the received +`PeerInfo` matches the sender's `peerID`, and if it does, that peer should store +the `PeerInfo` in its datastore. Implementations may choose to not store the +addresses of the providing peer e.g. to reduce the amount of required storage or +to prevent storing potentially outdated address information. #### Content provider discovery From 1dcb2184d7c5f4af2d9079d08bce2ad05b414ee9 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Fri, 25 Jun 2021 11:51:24 +0200 Subject: [PATCH 33/39] kad-dht/README: Remove periodic record pruning section --- kad-dht/README.md | 3 --- 1 file changed, 3 deletions(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index ae7b540f3..879ee7838 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -266,9 +266,6 @@ When a node receives a `GET_PROVIDERS` RPC, it must look up the requested key in its datastore, and respond with any corresponding records in its datastore, plus a list of closer peers in its routing table. -For performance reasons, a node may prune expired advertisements only -periodically, e.g. every hour. - ### Bootstrap process The bootstrap process is responsible for keeping the routing table filled and From c755a419228709a8b17809a3c4f983ac3d26659b Mon Sep 17 00:00:00 2001 From: Max Inden Date: Fri, 25 Jun 2021 12:00:31 +0200 Subject: [PATCH 34/39] kad-dht/README: Include bootstrap lookup for oneself --- kad-dht/README.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index 879ee7838..94f064cad 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -276,9 +276,12 @@ On every run, we generate a random peer ID and we look it up via the process defined in [peer routing](#peer-routing). Peers encountered throughout the search are inserted in the routing table, as per usual business. -This process is repeated as many times per run as configuration parameter -`QueryCount` (default: 1). Every repetition is subject to a `QueryTimeout` -(default: 10 seconds), which upon firing, aborts the run. +This is repeated as many times per run as configuration parameter `QueryCount` +(default: 1). In addition, to improve awareness of nodes close to oneself, +implementations should include a lookup for their own peer ID. + +Every repetition is subject to a `QueryTimeout` (default: 10 seconds), which +upon firing, aborts the run. ## RPC messages From e9c18bd3ed61af7f1c109ffb284abef3f6ea1000 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Fri, 25 Jun 2021 16:32:56 +0200 Subject: [PATCH 35/39] kad-dht/README: Make k value recommended instead of default --- kad-dht/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index 94f064cad..53438017e 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -43,7 +43,7 @@ Code snippets use a Go-like syntax. ### Replication parameter (`k`) The amount of replication is governed by the replication parameter `k`. The -default value for `k` is 20. +recommended value for `k` is 20. 
### Distance From dab454963595e670ae4d254d125cab4b1f3fa5ab Mon Sep 17 00:00:00 2001 From: Max Inden Date: Fri, 25 Jun 2021 16:57:33 +0200 Subject: [PATCH 36/39] kad-dht/README: Always return k closest peers with FIND_NODE --- kad-dht/README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index 53438017e..76b98d06d 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -390,8 +390,8 @@ message Message { These are the requirements for each `MessageType`: * `FIND_NODE`: In the request `key` must be set to the binary `PeerId` of the - node to be found. `closerPeers` is set in the response; for an exact match - exactly one `Peer` is returned; otherwise `k` closest `Peer`s are returned. + node to be found. In the response `closerPeers` is set to the `k` closest + `Peer`s. * `GET_VALUE`: In the request `key` is an unstructured array of bytes. `record` is set to the value for the given key (if found in the datastore) and From 324f9155e57af15bca9c373ae823bf8f7710f810 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Fri, 25 Jun 2021 17:08:06 +0200 Subject: [PATCH 37/39] kad-dht/README: Extend on reasoning for quorums --- kad-dht/README.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index 76b98d06d..3bca2dcf6 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -141,9 +141,11 @@ sends a `PUT_VALUE` RPC message with the record value to each of the peers. #### Value retrieval -When _gettting_ a value in the DHT, the implementor should collect at least `Q` -(quorum) responses from distinct nodes to check for consistency before returning -an answer. +When _getting_ a value from the DHT, implementions may use a mechanism like +quorums to define confidence in the values found on the DHT, put differently a +mechanism to determine when a query is _finished_. E.g. with quorums one would +collect at least `Q` (quorum) responses from distinct nodes to check for +consistency before returning an answer. Entry validation: Should the responses from different peers diverge, the implementation should use some validation mechanism to resolve the conflict and From dbd17a25fa6d812121b47a2af3a6245c794b9994 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Fri, 25 Jun 2021 17:09:33 +0200 Subject: [PATCH 38/39] kad-dht/README: Stress republishing to close nodes once more --- kad-dht/README.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index 3bca2dcf6..12bb5c5a9 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -151,10 +151,11 @@ Entry validation: Should the responses from different peers diverge, the implementation should use some validation mechanism to resolve the conflict and select the _best_ result (see [entry validation section](#entry-validation)). -Entry correction: Nodes that returned _worse_ records are updated via a -direct `PUT_VALUE` RPC call when the lookup completes. Thus the DHT network -eventually converges to the best value for each record, as a result of nodes -collaborating with one another. +Entry correction: Nodes that returned _worse_ records and nodes that returned no +record but where among the closest to the key, are updated via a direct +`PUT_VALUE` RPC call when the lookup completes. Thus the DHT network eventually +converges to the best value for each record, as a result of nodes collaborating +with one another. The below is one possible algorithm to lookup a value on the DHT. 
Implementations may diverge from this base algorithm as long as they adhere to From 3e6f8f5e9a98b7078fb9ae7020179aa7d2c86947 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Mon, 28 Jun 2021 10:43:59 +0200 Subject: [PATCH 39/39] kad-dht/README: Add disclaimer for bootstrap process --- kad-dht/README.md | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/kad-dht/README.md b/kad-dht/README.md index 12bb5c5a9..db098a15a 100644 --- a/kad-dht/README.md +++ b/kad-dht/README.md @@ -272,12 +272,16 @@ datastore, plus a list of closer peers in its routing table. ### Bootstrap process The bootstrap process is responsible for keeping the routing table filled and -healthy throughout time. It runs once on startup, then periodically with a -configurable frequency (default: 5 minutes). - -On every run, we generate a random peer ID and we look it up via the process -defined in [peer routing](#peer-routing). Peers encountered throughout the -search are inserted in the routing table, as per usual business. +healthy throughout time. The below is one possible algorithm to bootstrap. +Implementations may diverge from this base algorithm as long as they adhere to +the wire format and keep their routing table up-to-date, especially with peers +closest to themselves. + +The process runs once on startup, then periodically with a configurable +frequency (default: 5 minutes). On every run, we generate a random peer ID and +we look it up via the process defined in [peer routing](#peer-routing). Peers +encountered throughout the search are inserted in the routing table, as per +usual business. This is repeated as many times per run as configuration parameter `QueryCount` (default: 1). In addition, to improve awareness of nodes close to oneself,
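
As a rough sketch of one bootstrap run as described above: `QueryCount` random-key lookups plus a lookup for the local peer ID, each bounded by `QueryTimeout`. The `lookup` function is a placeholder for the peer routing procedure, which is assumed to insert discovered peers into the routing table as a side effect:

```go
package main

import (
	"context"
	"crypto/rand"
	"fmt"
	"time"
)

// lookup is a placeholder for the iterative FIND_NODE search from the peer
// routing section.
func lookup(ctx context.Context, key []byte) {
	fmt.Printf("looking up %x...\n", key[:4])
}

// bootstrap performs one bootstrap run: queryCount random-key lookups plus a
// lookup for our own peer ID, each bounded by queryTimeout.
func bootstrap(selfID []byte, queryCount int, queryTimeout time.Duration) {
	run := func(key []byte) {
		ctx, cancel := context.WithTimeout(context.Background(), queryTimeout)
		defer cancel()
		lookup(ctx, key)
	}
	for i := 0; i < queryCount; i++ {
		random := make([]byte, 32)
		rand.Read(random) // a random key, to refresh distant parts of the keyspace
		run(random)
	}
	run(selfID) // improve awareness of the nodes closest to ourselves
}

func main() {
	bootstrap([]byte("local-peer-id-bytes-0123456789ab"), 1, 10*time.Second)
	// A real node would schedule this run periodically, every 5 minutes by default.
}
```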