From 91bc1a87999d65c26cc9c42b6c1dd69d2b14145b Mon Sep 17 00:00:00 2001 From: guillaumemichel Date: Fri, 20 Jan 2023 17:57:31 +0100 Subject: [PATCH 01/55] draft double-hash-dht IPIP --- IPIP/0000-double-hash-dht.md | 259 +++++++++++++++++++++++++++++++++++ 1 file changed, 259 insertions(+) create mode 100644 IPIP/0000-double-hash-dht.md diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md new file mode 100644 index 00000000..5c5bdbc7 --- /dev/null +++ b/IPIP/0000-double-hash-dht.md @@ -0,0 +1,259 @@ +# IPIP 0000: Double Hash DHT + + + +- Start Date: 2023-01-18 +- Related Resources: + - [Specs in Notion](https://pl-strflt.notion.site/Double-Hashing-for-Privacy-ff44e3156ce040579289996fec9af609) + - Implementation: https://github.com/ChainSafe/go-libp2p-kad-dht + - https://github.com/ipfs/specs/pull/334 + - https://github.com/ipfs/specs/issues/345 + +## Summary + + +/TODO +This is the suggested template for new IPIPs. + +## Motivation + +IPFS currently lacks many privacy protections. One of its principal weaknesses lies in the lack of privacy protections for the DHT content routing subsystem. Currently in the IPFS DHT, neither readers (clients retrieving content) nor writers (hosts storing and distributing content) have much privacy with regard to content they consume or publish. It is trivial for a DHT server node to associate the requester's identity with the accessed content during the routing process. A curious DHT server node can request the same CIDs to find out what content other users are consuming. Improving privacy in the IPFS DHT has been a strong request from the community for some time. + +The changes described in this document introduce a DHT privacy upgrade boosting the reader’s privacy. It will prevent DHT tracking as described above, and add Provider Records Authentication. The proposed modifications also add a slight Writer Privacy improvement as a side effect. 
+ +## Detailed design + +### Definitions + + +- **`CID`** is the IPFS [Content IDentifier](https://github.com/multiformats/cid) +- **`MH`** is the [Multihash](https://github.com/multiformats/multihash) contained in a `CID`. It corresponds to the digest of a hash function over some content. +- **`HASH2`** is defined as `SHA256(bytes("CR_DOUBLEHASH") || MH)`. It represents the location in the Kademlia keyspace of the Provider Record associated with `CID`. +- **Content Provider** is the node storing some content, and advertising it to the DHT. +- **DHT Servers** are nodes running the IPFS public DHT. In this document, DHT Servers mostly refer to the DHT Servers storing the Provider Records associated with specific `CID`s, and not the DHT Servers helping to route lookup requests to the right keyspace location. +- **Client** is an IPFS client looking up content identified by an already known `CID`. +- **Publish Process** is the process of the Content Provider communicating to the DHT Servers that it provides some content identified by `CID`. +- **Lookup Process** is the process of the Client retrieving the content identified by `CID`. +- **`PeerID`** s define stable [peer identities](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md). The `PeerID` is derived from the node's cryptographic public key. +- **`multiaddrs`** are the [network addresses](https://github.com/libp2p/specs/tree/master/addressing) associated with a `PeerID`. They represent the location(s) of the peer. +- **`KeyPrefix`** is defined as a prefix of length `l` bits of `HASH2`. +- **`ServerKey`** is defined as `SHA256(bytes("CR_SERVERKEY") || MH)`. It is derived from the `MH`. The Content Provider communicates `ServerKey` to the DHT Servers during the Publish Process. The DHT Servers use it to encrypt the data sent to the Client during the lookup process. +- **`TS`** is the Timestamp (unix timestamp) when the Content Provider published the content. 
+- **`CPPeerID`** is the `PeerID` of the Content Provider for a specific `CID`. +- **`EncPeerID`** is the result of the encryption of `CPPeerID` using `MH` as encryption key and a random nonce $AESGCM_{MH}(CPPeerID, RandomNonce)$. `EncPeerID` contains the [varint](https://github.com/multiformats/multicodec) of the encryption algorithm used (AESGCM), the byte array of the encrypted payload, and the Nonce. +- **`Signature`** is the signature of the `EncPeerID` encrypted payload (not including the varint nor the nonce) and `TS` using the Content Provider's private key, either with ed25519 or rsa signature algorithm, depending on the keys of the Content Provider. +- **Provider Record** is defined as a pointer to the storage location of some content identified by `CID` or `HASH2`. A Provider Record consists of the following fields: [`EncPeerID`, `TS`, `Signature`]. +- **Provider Store** is the data structure on the DHT Servers used to store the Provider Records. Its structure is a nested dictionary/map: `HASH2` -> `ServerKey` -> [`CPPeerID`, `EncPeerID`, `TS`, `Signature`]. There is only one single correct `ServerKey` for each `HASH2`. However, any peer can forge a valid Publish request (with invalid `EncPeerID` but valid `Signature`) undetected by the DHT Server. The DHT server isn't able to distinguish which `ServerKey` is correct as it doesn't have the knowledge of `MH`, hence it has to keep both and serve both upon request for `HASH2`. + +**Magic Values** +- bytes("CR_DOUBLEHASH") +- bytes("CR_SERVERKEY") +- AESGCM varint +- Max number of Provider Records returned by a DHT Server for a single request: `128` + +### Current DHT + +The following process describes the event of a client looking up a CID in the IPFS DHT: +1. Client computes `Hash(MH)` (`MH` is the MultiHash included in the CID). +2. Client looks for the closest peers to `Hash(MH)` in XOR distance in its Routing Table. +3. Client sends a DHT lookup request for `CID` to these DHT servers. +4. 
Upon receiving the request, the DHT servers search if there is an entry for `MH` in their Provider Store. If yes, go to 10. Else continue. +5. DHT servers compute `Hash(MH)`. +6. DHT servers find the 20 closest peers to `Hash(MH)` in XOR distance in their Routing Table. +7. DHT servers return the 20 `peerids` and `multiaddrs` of these peers to Client. +8. Client sends a DHT lookup request for `CID` to the closest peers in XOR distance to `Hash(MH)` that it received. +9. Go to 4. +10. The DHT servers storing the Provider Record(s) associated with `MH` send them to Client. (Currently, if a Provider Record has been published less than 30 min before being requested, the DHT servers also send the `multiaddresses` of the Content Provider to Client). +11. If the response from the DHT server doesn't include the `multiaddrs` associated with the Content Providers' `peerid`s, Client performs a DHT `FindPeer` request to find the `multiaddrs` of the returned `peerid`s. +12. Client sends a Bitswap request for `CID` to the Content Provider (known `peerid` and `multiaddrs`). +13. Content Provider sends the requested content back to Client. + +### Overall design + +**Publish Process** +1. Content Provider wants to publish some content with identifier `CID`. +2. Content Provider computes $HASH2\leftarrow{}SHA256(bytes("CR\_DOUBLEHASH") || MH)$ (`MH` is the MultiHash included in the CID). +3. Content Provider starts a DHT lookup request for the 20 closest `peerid`s in XOR distance to `HASH2`. +4. Content Provider encrypts its own `peerid` (`CPPeerID`) with `MH`, using AES-GCM. $EncPeerID\leftarrow{}AESGCM_{MH}(CPPeerID)$ +5. Content Provider takes the current timestamp `TS`. +6. Content Provider signs `EncPeerID` and `TS` using its private key. $Signature\leftarrow{}Sign_{privkey}(EncPeerID || TS)$ +7. Content Provider computes $ServerKey\leftarrow{}SHA256(bytes("CR\_SERVERKEY") || MH)$. +8. 
Once the lookup request has returned the 20 closest peers, Content Provider sends a Publish request to these DHT servers. The Publish request contains [`HASH2`, `EncPeerID`, `TS`, `Signature`, `ServerKey`]. +9. Each DHT server verifies `Signature` against the `peerid` of the Content Provider used to open the libp2p connection. $Verify(Signature, CPPeerID, EncPeerID || TS)$. It verifies that `TS` is _recent enough_. If invalid, send an error to the client. +10. Each DHT server adds an entry in their Provider Store for `HASH2` -> `ServerKey` -> [`CPPeerID`, `EncPeerID`, `TS`, `Signature`], with `CPPeerID` being the `peerid` of the Content Provider. If there is already an entry including `CPPeerID` for `HASH2` -> `ServerKey`, and if the `TS` of the new valid entry is newer than the existing `TS`, overwrite the entry in the Provider Store. Else drop the new entry. +11. Each DHT server confirms to Content Provider that the Provider Record has been successfully added. +12. The process is over once Content Provider has received 20 confirmations. + +**Lookup Process** +1. Client computes $HASH2=SHA256(bytes("CR\_DOUBLEHASH") || MH)$ (`MH` is the MultiHash included in the CID). +2. Client selects a prefix of `HASH2`, $KeyPrefix\leftarrow{}HASH2[:l]$ for a defined `l` (see [`l` selection](#prefix-length-selection)). +2. Client finds the closest `peerid`s to `HASH2` in XOR distance in its Routing Table. +3. Client sends a DHT lookup request for `KeyPrefix` to these DHT servers. +4. The DHT servers find the 20 closest `peerid`s to `KeyPrefix` in XOR distance (see [algorithm](#closest-keys-to-a-key-prefix)). Add these `peerid`s and their associated multiaddresses (if applicable) to the `message` that will be returned to Client. +5. The DHT servers search if there are entries matching `KeyPrefix` in their Provider Store. +6. 
For all entries `HASH2` of the Provider Store where `HASH2[:len(KeyPrefix)]==KeyPrefix`, add to `message` the following encrypted payload: $Enc_{ServerKey}(EncPeerID, TS, Signature, multiaddrs)$, for `multiaddrs` being the multiaddresses associated with `CPPeerID` (if applicable). DHT Servers can decide to cap the number of Provider Records returned per request. If too many `HASH2` match `KeyPrefix`, they randomly select 128 matching Provider Records per request, and send a flag to Client to signal that the limit was reached. +7. The DHT servers send `message` to Client. +8. Client computes $ServerKey\leftarrow{}SHA256(bytes("CR\_SERVERKEY") || MH)$. +9. Client tries to decrypt all returned encrypted payloads using `ServerKey`. If at least one encrypted payload can be decrypted, go to 12. +10. Client sends a DHT lookup request for `KeyPrefix` to the closest peers in XOR distance to `HASH2` that it received from the DHT servers. +11. Go to 4. +12. For each decrypted payload, Client decrypts $CPPeerID\leftarrow{}Dec_{MH}(EncPeerID)$. +13. Client checks that `Signature` verifies against `CPPeerID`: $Verify(Signature, CPPeerID, EncPeerID || TS)$ +14. Client checks that `TS` is still valid. +15. If none of the decrypted payloads is valid, go to 4. +16. If the decrypted payload doesn't include the `multiaddrs` associated with `CPPeerID`, Client performs a DHT `FindPeer` request to find the `multiaddrs` associated with `CPPeerID`. +17. Client sends a Bitswap request for `CID` to the Content Provider (known `CPPeerID` and `multiaddrs`). +18. Content Provider sends the requested content back to Client. + + +### Prefix length selection + +The goal of DHT prefix requests is to provide [`k`-anonymity](https://en.wikipedia.org/wiki/K-anonymity) to content lookup, in addition to the pseudonymity gained from double hashing. Each DHT prefix lookup query returns an expected number of `k` Provider Records matching `KeyPrefix`, with `k` being a system parameter. 
The user should be able to define a custom `k` from the configuration files, according to their privacy needs. The default value is `k = 8`. + +The prefix length `l` is derived from `k` and the number of CIDs published to the DHT: $l \leftarrow{} log_2(\frac{\#CIDs}{k})$. However, the total number of CIDs published to the DHT can be hard to approximate, and the initial `l` value can be determined by approximation and dichotomy. At the first startup, the node looks up random keys starting with `l = 26`. Then, by dichotomy it adapts `l` so that a lookup for a prefix of length `l` matches on average ~`k` Provider Records. + +Each node keeps track of the number of `HASH2` matching the last `KeyPrefix` requested in the last 128 lookups. `a` is defined as the average number of matches for the last 128 requests. At any point in time, if $a \gt 2\times k$, then `l` should increase (`l = l + 1`), and if $a \lt \frac{k}{2}$, then `l` should decrease (`l = l - 1`). On node shutdown, `a` is saved on disk, allowing a quick restart with an accurate `l` value. + +Note that DHT Servers can set an upperbound on the number of Provider Records they serve for each lookup request. So too small an `l` may result in not discovering the target Provider Record. + +**Prefix magic numbers** +- `k`-anonymity privacy parameter, by default `k = 8` +- Size of moving average of number of Provider Records matching a prefix: `128` +- Initial prefix length: `26`. There are currently ~850M distinct CIDs published in the DHT ([source](https://pl-strflt.notion.site/2022-09-20-Hydras-Analysis-5db53b6af3e04a46aaf7a776e65ae97d)). $log_2(\frac{850M}{8})=26.663$. As the number of CIDs in the network grows exponentially, the prefix length is expected to increase linearly for a constant `k`. + +### _Closest_ keys to a key prefix + + + +Computing the XOR distance between two binary bitstrings of different lengths isn't possible. 
Hence finding the N closest keys to a key prefix in the Kademlia keyspace doesn't make sense. We can however find the keys matching the prefix (e.g. `prefix == key[:l]` for $key \in \{0, 1\}^{256}, prefix \in \{0, 1\}^{l}, l \leq 256$), and the keys _close_ to matching the prefix. Randomness is used as a tie breaker. + +The following pseudo-code defines the algorithm to find `N` keys matching or _close_ to matching a prefix. The main idea is to truncate the leaves of the Kademlia trie to the length of the prefix `l`. If `M` keys match the prefix, for $M \ge N$, then `N` keys must be picked at random among the `M` candidates. If `M` keys match the prefix, for $M \lt N$, we must still find `Q = N - M` keys. We iterate on the truncated Kademlia leaves of depth `l` ordered by XOR distance to `prefix`, starting from the closest. Supposing there are `P` keys in the truncated Kademlia leaf, and that we are missing `Q` keys, if $P \ge Q$, we select `Q` keys at random among the `P` candidates, otherwise, if $P \lt Q$ we take the `P` keys, set `Q = Q - P` and iterate on the following leaf until we find `N` keys. 
+ +``` +func closest_to_match(prefix, N, all_keys) { + selected_keys = [] + l = len(prefix) // len(prefix) is the bit length of the prefix + + // iterate on all prefixes of length l from closest to furthest from 'prefix' + for counter = 0; len(selected_keys) < N && counter < 2**l; counter += 1 { + + leaf = prefix XOR binary(counter, l) // binary(x, l) gives the binary representation of a number x, on l bits + + // get all keys matching the prefix 'leaf' + matching_keys = find_matching_keys(leaf, all_keys) + + // add at most (N-len(selected_keys)) to selected_keys + if len(matching_keys) <= N - len(selected_keys) { + selected_keys += matching_keys + } else { + random_selection = select_N_random(matching_keys, N - len(selected_keys)) + selected_keys += random_selection + } + } + return selected_keys +} +``` + + +## Test fixtures + + + + +## Design rationale + +### Provider Store + +The data structure of the DHT Servers' Provider Store is a nested dictionary/map whose structure is: `HASH2` -> `ServerKey` -> [`CPPeerID`, `EncPeerID`, `TS`, `Signature`]. + +The same `HASH2` always produces the same `ServerKey`, as both `HASH2` and `ServerKey` result from a deterministic hash operation on `MH` prepended with a constant prefix. However, a misbehaving node could publish an advertisement for `HASH2` while ignoring `MH`, and forge a random `ServerKey`. The DHT server, not knowing `MH`, cannot determine which `ServerKey` is the one associated with `HASH2`, and hence needs to keep all different `ServerKey`s. However, the number of forged `ServerKey`s is expected to be small, as Clients aren't able to decrypt a payload encrypted with a forged `ServerKey`, and thereby detect that the Provider Record isn't legitimate. The only reason a misbehaving peer would want to publish forged `ServerKey`s is to exhaust the storage resources of a specific target DHT Server. 
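The nested structure described above can be illustrated with a short Python sketch. It is not taken from the reference implementation: the names `derive_keys` and `ProviderStore`, the placeholder multihash, and the choice to index the innermost level by `CPPeerID` are assumptions made for this example, and signature verification is assumed to have happened before `put` is called.

```python
import hashlib


def derive_keys(mh):
    """Return (HASH2, ServerKey) for a multihash, following the Definitions section."""
    hash2 = hashlib.sha256(b"CR_DOUBLEHASH" + mh).digest()
    server_key = hashlib.sha256(b"CR_SERVERKEY" + mh).digest()
    return hash2, server_key


class ProviderStore:
    """Hypothetical in-memory store: HASH2 -> ServerKey -> CPPeerID -> record."""

    def __init__(self):
        self.records = {}

    def put(self, hash2, server_key, cp_peer_id, enc_peer_id, ts, signature):
        """Store a record; return False if an equally recent or newer one exists."""
        by_peer = self.records.setdefault(hash2, {}).setdefault(server_key, {})
        existing = by_peer.get(cp_peer_id)
        if existing is not None and existing[1] >= ts:
            return False  # keep the record with the largest TS
        by_peer[cp_peer_id] = (enc_peer_id, ts, signature)
        return True


# Placeholder multihash: sha2-256 code (0x12), length 32 (0x20), all-zero digest.
mh = b"\x12\x20" + bytes(32)
hash2, server_key = derive_keys(mh)

store = ProviderStore()
store.put(hash2, server_key, "honest-peer", b"enc", 100, b"sig")
# A peer that ignores MH can only guess a ServerKey; the server cannot tell
# the two apart and therefore keeps both entries under the same HASH2.
store.put(hash2, bytes(32), "misbehaving-peer", b"junk", 100, b"sig")
```

In this sketch a forged `ServerKey` simply occupies a second slot under the same `HASH2`, which mirrors why the server has to keep every `ServerKey` it receives.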
+ +Content can be provided by multiple Content Providers, hence `HASH2` -> `ServerKey` points to potentially multiple `CPPeerID`s, each Content Provider having its own Provider Record. As the `CPPeerID` is obtained from the open libp2p connection, we assume that it is impossible to impersonate another `CPPeerID`. Each Content Provider can have a single Provider Record for each `HASH2`. When a Content Provider republishes a Provider Record, the DHT Server only keeps the valid Provider Record whose `TS` is the largest value. DHT Servers drop all Provider Records published by the same `CPPeerID` with the same `HASH2` but multiple different `ServerKey`s. A well-behaved node can compute the right `ServerKey` and doesn't try to exhaust the storage resources of the DHT Server. Only a misbehaving node forges invalid `ServerKey`s, and if multiple `ServerKey`s are associated with the same (`HASH2`, `CPPeerID`) it implies that at least one of these `ServerKey`s is incorrect. + +### Cryptographic algorithms + +**SHA256** + +**AESGCM** + + +### User benefit + + + +### Reader Privacy + +### Writer Privacy + +### Provider Record Authenticity + +### Provider Records Enumeration + +Easier monitoring of the DHT, random key query + +### Better Kademlia Routing Table Refresh + +Get rid of 456 KB in the IPFS source code https://github.com/libp2p/go-libp2p-kbucket/blob/master/bucket_prefixmap.go + +## Compatibility + +Breaking change + + +## Security + +Threat Model (or it should be in a distinct section) + +DoS (sending the multiaddrs of the target peer for every served provider record) can be solved in the future with signed peer records. + + +## Alternatives + +This approach is a first fix to the DHT (low hanging fruit). Other alternatives to add privacy to the DHT include Mixnets and ephemeral peerids. + +Alternatives for migration: +- slow breaking change (give enough time so that only a _small_ number of participants break) +- DHT duplication +- Universal DHT (WIP). 
+ + + +## Open Questions + +- Is it wise to encrypt the `CPPeerID` using `MH` directly? It would be possible to derive another identifier from `MH` (such as `Hash("SOME_CONSTANT" || MH)`). `MH` is the master identifier of the content, hence if it is revealed all other identifiers can trivially be found. However, it is computationally infeasible to recover `MH` from `Hash("SOME_CONSTANT" || MH)`. +- It may be fine to use `TS` as nonce, as it spares bytes on the wire. However, if two Content Providers publish the same content at the same time (`TS` either in seconds or milliseconds), then the DHT Server may be able to forge a valid Provider Record for itself. + +## Copyright + +Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/). From e1ed413471a8e92be2902bbd4e1259c2ccc746fe Mon Sep 17 00:00:00 2001 From: guillaumemichel Date: Mon, 23 Jan 2023 10:48:15 +0100 Subject: [PATCH 02/55] fixing Github markdown --- IPIP/0000-double-hash-dht.md | 42 ++++++++++++++++++++---------------- 1 file changed, 24 insertions(+), 18 deletions(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index 5c5bdbc7..ffd80673 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -7,7 +7,7 @@ submit your IPIP, please use number 0000 and an abbreviated title in the filenam - Start Date: 2023-01-18 - Related Resources: - [Specs in Notion](https://pl-strflt.notion.site/Double-Hashing-for-Privacy-ff44e3156ce040579289996fec9af609) - - Implementation: https://github.com/ChainSafe/go-libp2p-kad-dht + - [WIP Implementation](https://github.com/ChainSafe/go-libp2p-kad-dht) - https://github.com/ipfs/specs/pull/334 - https://github.com/ipfs/specs/issues/345 @@ -35,14 +35,15 @@ The changes described in this document introduce a DHT privacy upgrade boosting - **DHT Servers** are nodes running the IPFS public DHT. 
In this document, DHT Servers mostly refer to the DHT Servers storing the Provider Records associated with specific `CID`s, and not the DHT Servers helping to route lookup requests to the right keyspace location. - **Client** is an IPFS client looking up content identified by an already known `CID`. - **Publish Process** is the process of the Content Provider communicating to the DHT Servers that it provides some content identified by `CID`. -- **Lookup Process** is the process of the Client retrieving the content identified by `CID`. +- **Lookup Process** is the process of the Client retrieving the content identified by `CID`. - **`PeerID`** s define stable [peer identities](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md). The `PeerID` is derived from the node's cryptographic public key. - **`multiaddrs`** are the [network addresses](https://github.com/libp2p/specs/tree/master/addressing) associated with a `PeerID`. They represent the location(s) of the peer. - **`KeyPrefix`** is defined as a prefix of length `l` bits of `HASH2`. - **`ServerKey`** is defined as `SHA256(bytes("CR_SERVERKEY") || MH)`. It is derived from the `MH`. The Content Provider communicates `ServerKey` to the DHT Servers during the Publish Process. The DHT Servers use it to encrypt the data sent to the Client during the lookup process. - **`TS`** is the Timestamp (unix timestamp) when the Content Provider published the content. - **`CPPeerID`** is the `PeerID` of the Content Provider for a specific `CID`. -- **`EncPeerID`** is the result of the encryption of `CPPeerID` using `MH` as encryption key and a random nonce $AESGCM_{MH}(CPPeerID, RandomNonce)$. `EncPeerID` contains the [varint](https://github.com/multiformats/multicodec) of the encryption algorithm used (AESGCM), the byte array of the encrypted payload, and the Nonce. 
+- **`EncPeerID`** is the result of the encryption of `CPPeerID` using `MH` as encryption key and a random nonce `AESGCM(MH, CPPeerID || RandomNonce)`. `EncPeerID` contains the [varint](https://github.com/multiformats/multicodec) of the encryption algorithm used (AESGCM), the byte array of the encrypted payload, and the Nonce. - **`Signature`** is the signature of the `EncPeerID` encrypted payload (not including the varint nor the nonce) and `TS` using the Content Provider's private key, either with ed25519 or rsa signature algorithm, depending on the keys of the Content Provider. - **Provider Record** is defined as a pointer to the storage location of some content identified by `CID` or `HASH2`. A Provider Record consists of the following fields: [`EncPeerID`, `TS`, `Signature`]. - **Provider Store** is the data structure on the DHT Servers used to store the Provider Records. Its structure is a nested dictionary/map: `HASH2` -> `ServerKey` -> [`CPPeerID`, `EncPeerID`, `TS`, `Signature`]. There is only one single correct `ServerKey` for each `HASH2`. However, any peer can forge a valid Publish request (with invalid `EncPeerID` but valid `Signature`) undetected by the DHT Server. The DHT server isn't able to distinguish which `ServerKey` is correct as it doesn't have the knowledge of `MH`, hence it has to keep both and serve both upon request for `HASH2`. @@ -74,33 +75,33 @@ The following process describes the event of a client looking up a CID in the IP **Publish Process** 1. Content Provider wants to publish some content with identifier `CID`. -2. Content Provider computes $HASH2\leftarrow{}SHA256(bytes("CR\_DOUBLEHASH") || MH)$ (`MH` is the MultiHash included in the CID). +2. Content Provider computes `HASH2`$\leftarrow{}$`SHA256(bytes("CR_DOUBLEHASH") || MH)` (`MH` is the MultiHash included in the CID). 3. Content Provider starts a DHT lookup request for the 20 closest `peerid`s in XOR distance to `HASH2`. -4. 
Content Provider encrypts its own `peerid` (`CPPeerID`) with `MH`, using AES-GCM. $EncPeerID\leftarrow{}AESGCM_{MH}(CPPeerID)$ +4. Content Provider encrypts its own `peerid` (`CPPeerID`) with `MH`, using AES-GCM. `EncPeerID = AESGCM(MH, CPPeerID || Nonce)` 5. Content Provider takes the current timestamp `TS`. -6. Content Provider signs `EncPeerID` and `TS` using its private key. $Signature\leftarrow{}Sign_{privkey}(EncPeerID || TS)$ -7. Content Provider computes $ServerKey\leftarrow{}SHA256(bytes("CR\_SERVERKEY") || MH)$. +6. Content Provider signs `EncPeerID` and `TS` using its private key. `Signature = Sign(privkey, EncPeerID || TS)` +7. Content Provider computes `ServerKey = SHA256(bytes("CR_SERVERKEY") || MH)`. 8. Once the lookup request has returned the 20 closest peers, Content Provider sends a Publish request to these DHT servers. The Publish request contains [`HASH2`, `EncPeerID`, `TS`, `Signature`, `ServerKey`]. -9. Each DHT server verifies `Signature` against the `peerid` of the Content Provider used to open the libp2p connection. $Verify(Signature, CPPeerID, EncPeerID || TS)$. It verifies that `TS` is _recent enough_. If invalid, send an error to the client. +9. Each DHT server verifies `Signature` against the `peerid` of the Content Provider used to open the libp2p connection. `Verify(CPPeerID, Signature, EncPeerID || TS)`. It verifies that `TS` is _recent enough_. If invalid, send an error to the client. 10. Each DHT server adds an entry in their Provider Store for `HASH2` -> `ServerKey` -> [`CPPeerID`, `EncPeerID`, `TS`, `Signature`], with `CPPeerID` being the `peerid` of the Content Provider. If there is already an entry including `CPPeerID` for `HASH2` -> `ServerKey`, and if the `TS` of the new valid entry is newer than the existing `TS`, overwrite the entry in the Provider Store. Else drop the new entry. 11. Each DHT server confirms to Content Provider that the Provider Record has been successfully added. 12. 
The process is over once Content Provider has received 20 confirmations. **Lookup Process** -1. Client computes $HASH2=SHA256(bytes("CR\_DOUBLEHASH") || MH)$ (`MH` is the MultiHash included in the CID). -2. Client selects a prefix of `HASH2`, $KeyPrefix\leftarrow{}HASH2[:l]$ for a defined `l` (see [`l` selection](#prefix-length-selection)). +1. Client computes `HASH2 = SHA256(bytes("CR_DOUBLEHASH") || MH)` (`MH` is the MultiHash included in the CID). +2. Client selects a prefix of `HASH2`, `KeyPrefix = HASH2[:l]` for a defined `l` (see [`l` selection](#prefix-length-selection)). 2. Client finds the closest `peerid`s to `HASH2` in XOR distance in its Routing Table. 3. Client sends a DHT lookup request for `KeyPrefix` to these DHT servers. 4. The DHT servers find the 20 closest `peerid`s to `KeyPrefix` in XOR distance (see [algorithm](#closest-keys-to-a-key-prefix)). Add these `peerid`s and their associated multiaddresses (if applicable) to the `message` that will be returned to Client. 5. The DHT servers search if there are entries matching `KeyPrefix` in their Provider Store. -6. For all entries `HASH2` of the Provider Store where `HASH2[:len(KeyPrefix)]==KeyPrefix`, add to `message` the following encrypted payload: $Enc_{ServerKey}(EncPeerID, TS, Signature, multiaddrs)$, for `multiaddrs` being the multiaddresses associated with `CPPeerID` (if applicable). DHT Servers can decide to cap the number of Provider Records returned per request. If too many `HASH2` match `KeyPrefix`, they randomly select 128 matching Provider Records per request, and send a flag to Client to signal that the limit was reached. +6. For all entries `HASH2` of the Provider Store where `HASH2[:len(KeyPrefix)]==KeyPrefix`, add to `message` the following encrypted payload: `Enc(ServerKey, EncPeerID || TS || Signature || multiaddrs)`, for `multiaddrs` being the multiaddresses associated with `CPPeerID` (if applicable). 
DHT Servers can decide to cap the number of Provider Records returned per request. If too many `HASH2` match `KeyPrefix`, they randomly select 128 matching Provider Records per request, and send a flag to Client to signal that the limit was reached. 7. The DHT servers send `message` to Client. -8. Client computes $ServerKey\leftarrow{}SHA256(bytes("CR\_SERVERKEY") || MH)$. +8. Client computes `ServerKey = SHA256(bytes("CR_SERVERKEY") || MH)`. 9. Client tries to decrypt all returned encrypted payloads using `ServerKey`. If at least one encrypted payload can be decrypted, go to 12. 10. Client sends a DHT lookup request for `KeyPrefix` to the closest peers in XOR distance to `HASH2` that it received from the DHT servers. 11. Go to 4. -12. For each decrypted payload, Client decrypts $CPPeerID\leftarrow{}Dec_{MH}(EncPeerID)$. -13. Client checks that `Signature` verifies against `CPPeerID`: $Verify(Signature, CPPeerID, EncPeerID || TS)$ +12. For each decrypted payload, Client decrypts `CPPeerID = Dec(MH, EncPeerID)`. +13. Client checks that `Signature` verifies against `CPPeerID`: `Verify(CPPeerID, Signature, EncPeerID || TS)`. 14. Client checks that `TS` is still valid. 15. If none of the decrypted payloads is valid, go to 4. 16. If the decrypted payload doesn't include the `multiaddrs` associated with `CPPeerID`, Client performs a DHT `FindPeer` request to find the `multiaddrs` associated with `CPPeerID`. @@ -120,9 +121,9 @@ summary of changes. When adding new specification files, list all of them. --> ### Prefix length selection -The goal of DHT prefix requests is to provide [`k`-anonymity](https://en.wikipedia.org/wiki/K-anonymity) to content lookup, in addition to the pseudonymity gained from double hashing. Each DHT prefix lookup query returns an expected number of `k` Provider Records matching `KeyPrefix`, with `k` being a system parameter. The user should be able to define a custom `k` from the configuration files, according to their privacy needs. 
The default value is `k = 8`. +The goal of DHT prefix requests is to provide [`k`-anonymity](https://en.wikipedia.org/wiki/K-anonymity) to content lookup, in addition to the pseudonymity gained from double hashing. Each DHT prefix lookup query returns an expected number of `k` Provider Records matching `KeyPrefix`, with `k` being a system parameter. The user should be able to define a custom `k` from the configuration files, according to their privacy needs. The default value `k = 8` is discussed in [Design rationale](#reader-privacy). -The prefix length `l` is derived from `k` and the number of CIDs published to the DHT: $l \leftarrow{} log_2(\frac{\#CIDs}{k})$. However, the total number of CIDs published to the DHT can be hard to approximate, and the initial `l` value can be determined by approximation and dichotomy. At the first startup, the node looks up random keys starting with `l = 26`. Then, by dichotomy it adapts `l` so that a lookup for a prefix of length `l` matches on average ~`k` Provider Records. +The prefix length `l` is derived from `k` and the number of CIDs published to the DHT: $l \leftarrow{} log_2(\frac{\\#CIDs}{k})$. However, the total number of CIDs published to the DHT can be hard to approximate, and the initial `l` value can be determined by approximation and dichotomy. At the first startup, the node looks up random keys starting with `l = 26`. Then, by dichotomy it adapts `l` so that a lookup for a prefix of length `l` matches on average ~`k` Provider Records. Each node keeps track of the number of `HASH2` matching the last `KeyPrefix` requested in the last 128 lookups. `a` is defined as the average number of matches for the last 128 requests. At any point in time, if $a \gt 2\times k$, then `l` should increase (`l = l + 1`), and if $a \lt \frac{k}{2}$, then `l` should decrease (`l = l - 1`). On node shutdown, `a` is saved on disk, allowing a quick restart with an accurate `l` value. 
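The prefix length rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `initial_prefix_length` and `PrefixLengthTracker` are hypothetical, rounding `l` down is inferred from the initial value of 26, and clearing the window after each change of `l` is a choice made here (so that counts observed with the old prefix length do not trigger repeated adjustments), not something the text specifies.

```python
import math
from collections import deque


def initial_prefix_length(num_cids, k):
    """l = log2(#CIDs / k), rounded down (the rounding is an assumption)."""
    return max(0, math.floor(math.log2(num_cids / k)))


class PrefixLengthTracker:
    """Hypothetical helper adapting l from a moving average of match counts."""

    def __init__(self, k=8, l=26, window=128):
        self.k = k
        self.l = l
        self.matches = deque(maxlen=window)  # matches seen in the last lookups

    def record(self, num_matches):
        """Record one lookup result and return the (possibly updated) l."""
        self.matches.append(num_matches)
        a = sum(self.matches) / len(self.matches)
        if a > 2 * self.k:
            self.l += 1           # too many matches: use a longer prefix
            self.matches.clear()  # assumption: restart the average for the new l
        elif a < self.k / 2:
            self.l -= 1           # too few matches: use a shorter prefix
            self.matches.clear()
        return self.l


tracker = PrefixLengthTracker(k=8, l=initial_prefix_length(850_000_000, 8))
tracker.record(40)  # average 40 exceeds 2 * k = 16, so l becomes 27
```

Persisting `a` across restarts, as the text describes, is left out of this sketch.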
@@ -139,7 +140,7 @@ Note that DHT Servers can set an upperbound on the number of Provider Records th Computing the XOR distance between two binary bitstrings of different lengths isn't possible. Hence finding the N closest keys to a key prefix in the Kademlia keyspace doesn't make sense. We can however find the keys matching the prefix (e.g `prefix == key[:l]` for $key \in \{0, 1\}^{256}, prefix \in \{0, 1\}^{l}, l \leq 256$), and the keys _close_ from matching the prefix. Randomness is used as tie breaker. -The following pseudo-code defines the algorithm to find `N` keys matching or _close_ from matching a prefix. The main idea is to truncate the leaves of the Kademlia trie to the length of the prefix `l`. If `M` keys match prefix, for $M \ge N$, then `N` keys must be picked at random among the `M` candidates. If `M` keys match prefix, for $M \lt N$, we must still find `Q = N - M` keys. We iterate on the truncated Kademlia leaves of depth `l` ordered by XOR distance to `prefix`, starting from the closest. Supposing there are `P` keys in the truncated Kademlia leaf, and that we are missing `Q` keys, if $P \ge Q$, we select `Q` keys at random among the `P` candidates, otherwise, if $P \lt Q$ we take the `P` keys, set `Q = Q - P` and iterate on the following leaf until we find `N` keys. +The following pseudo-code defines the algorithm to find `N` keys matching or _close_ from matching a prefix. The main idea is to truncate the leaves of the Kademlia trie to the length of the prefix `l`. If `M` keys match prefix, for $M \ge N$, then `N` keys must be picked at random among the `M` candidates. If `M` keys match prefix, for $M \lt N$, we must still find `Q = N - M` keys. We iterate on the truncated Kademlia leaves of depth `l` ordered by XOR distance to `prefix`, starting from the closest. 
Supposing there are `P` keys in the current truncated Kademlia leaf, and that we are missing `Q` keys, if $P \ge Q$, we select `Q` keys at random among the `P` candidates, otherwise, if $P \lt Q$ we take the `P` keys, set `Q = Q - P` and iterate on the following leaf until we find `N` keys. ``` func closest_to_match(prefix, N, all_keys) { @@ -149,7 +150,8 @@ func closest_to_match(prefix, N, all_keys) { // iterate on all prefixes of length l from closest to furthest from 'prefix' for counter = 0; len(selected_keys) < N && counter < 2**l; counter += 1 { - leaf = prefix XOR binary(counter, l) // binary(x, l) gives the binary representation of a number x, on l bits + leaf = prefix XOR binary(counter, l) + // binary(x, l) gives the binary representation of a number x, on l bits // get all keys matching to the prefix 'leaf' matching_keys = find_matching_keys(leaf, all_keys) @@ -208,6 +210,10 @@ How will end users benefit from this work? ### Reader Privacy +**`k`-anonymity** + +Default parameter selection: `k = 8` + ### Writer Privacy ### Provider Record Authenticity From ad5b1acf46dfe6c5c816c4f3ecc8c1c4ff9a3337 Mon Sep 17 00:00:00 2001 From: guillaumemichel Date: Mon, 23 Jan 2023 14:18:59 +0100 Subject: [PATCH 03/55] added cryptographic algorithms rationale --- IPIP/0000-double-hash-dht.md | 44 +++++++++++++++++++++++++----------- 1 file changed, 31 insertions(+), 13 deletions(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index ffd80673..5945a328 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -3,7 +3,7 @@ - +![wip](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square) - Start Date: 2023-01-18 - Related Resources: - [Specs in Notion](https://pl-strflt.notion.site/Double-Hashing-for-Privacy-ff44e3156ce040579289996fec9af609) @@ -15,7 +15,6 @@ submit your IPIP, please use number 0000 and an abbreviated title in the filenam /TODO -This is the suggested template for new IPIPs. 
## Motivation @@ -71,13 +70,13 @@ The following process describes the event of a client looking up a CID in the IP 12. Client sends a Bitswap request for `CID` to the Content Provider (known `peerid` and `multiaddrs`). 13. Content Provider sends the requested content back to Client. -### Overall design +### Double Hash DHT design **Publish Process** 1. Content Provider wants to publish some content with identifier `CID`. 2. Content Provider computes `HASH2`$\leftarrow{}$`SHA256(bytes("CR_DOUBLEHASH") || MH)` (`MH` is the MultiHash included in the CID). 3. Content Provider starts a DHT lookup request for the 20 closest `peerid`s in XOR distance to `HASH2`. -4. Content Provider encrypts its own `peerid` (`CPPeerID`) with `MH`, using AES-GCM. `EncPeerID = AESGCM(MH, CPPeerID || Nonce)` +4. Content Provider encrypts its own `peerid` (`CPPeerID`) with `MH`, using AES-GCM. `EncPeerID = varint || Nonce || AESGCM(MH, CPPeerID || Nonce)` 5. Content Provider takes the current timestamp `TS`. 6. Content Provider signs `EncPeerID` and `TS` using its private key. `Signature = Sign(privkey, EncPeerID || TS)` 7. Content Provider computes `ServerKey = SHA256(bytes("CR_SERVERKEY") || MH)`. @@ -136,8 +135,6 @@ Note that DHT Servers can set an upperbound on the number of Provider Records th ### _Closest_ keys to a key prefix - - Computing the XOR distance between two binary bitstrings of different lengths isn't possible. Hence finding the N closest keys to a key prefix in the Kademlia keyspace doesn't make sense. We can however find the keys matching the prefix (e.g `prefix == key[:l]` for $key \in \{0, 1\}^{256}, prefix \in \{0, 1\}^{l}, l \leq 256$), and the keys _close_ from matching the prefix. Randomness is used as tie breaker. The following pseudo-code defines the algorithm to find `N` keys matching or _close_ from matching a prefix. The main idea is to truncate the leaves of the Kademlia trie to the length of the prefix `l`. 
If `M` keys match prefix, for $M \ge N$, then `N` keys must be picked at random among the `M` candidates. If `M` keys match prefix, for $M \lt N$, we must still find `Q = N - M` keys. We iterate on the truncated Kademlia leaves of depth `l` ordered by XOR distance to `prefix`, starting from the closest. Supposing there are `P` keys in the current truncated Kademlia leaf, and that we are missing `Q` keys, if $P \ge Q$, we select `Q` keys at random among the `P` candidates, otherwise, if $P \lt Q$ we take the `P` keys, set `Q = Q - P` and iterate on the following leaf until we find `N` keys. @@ -181,19 +178,39 @@ file already includes this information. ## Design rationale -### Provider Store - -The data structure of the DHT Servers' Provider Store is a nested dictionary/map whose structure is: `HASH2` -> `ServerKey` -> [`CPPeerID`, `EncPeerID`, `TS`, `Signature`]. +### Cryptographic algorithms -The same `HASH2` always produces the same `ServerKey`, as both `HASH2` and `ServerKey` result in a deterministic hash operation on `MH` prepended with a constant prefix. However, a misbehaving node could publish an advertisement for `HASH2` while ignoring `MH`, and forge a random `ServerKey`. The DHT server not knowing `MH` cannot determine which `ServerKey` is the one associated with `HASH2`, and hence need to keep all different `ServerKey`s. However, the number of forged `ServerKey`s is expected to be small as the Client aren't able to decrypt payload encrypted with a forged `ServerKey`, and detect that the Provider Record isn't legitimate. The only reason a misbehaving peer would want to publish forged `ServerKey`s is to exhaust the storage resources of a specific target DHT Server. +**SHA256** -Content can be provider by multiple Content Providers, hence `HASH2` -> `ServerKey` points to potentially multiple `CPPeerID`s, each Content Provider having its own Provider Record. 
As the `CPPeerID` is obtained from the open libp2p connection, we assume that it is impossible to impersonate another `CPPeerID`. Each Content Provider can have a single Provider Record for each `HASH2`. When a Content Provider republishes a Provider Record, the DHT Server only keeps the valid Provider Record whose `TS` is the largest value. DHT Servers drop all Provider Records from published by the same `CPPeerID` with the same `HASH2` but multiple different `ServerKey`s. A well behaving node can compute the right `ServerKey` and doesn't try to exhaust the storage resources of the DHT Server. Only a misbehaving node forges invalid `ServerKey`s, and if multiple `ServerKey`s are associated with the same (`HASH2`, `CPPeerID`) it implies that at least one of the two `ServerKey` is incorrect. +SHA256 is the algorithm currently in use in IPFS to generate 256-bits digests used as Kademlia identifiers. Note that SHA256 refers to the algorithm of [SHA2](https://en.wikipedia.org/wiki/SHA-2) algorithm with a 256 bits digest size. -### Cryptographic algorithms +A future change of Cryptographic Hash Function will require a _DHT Migration_ as the Provider Records _location_ in the Kademlia keyspace will change, for they are defined by the Hash Function. It means that all Provider Records must be published using both the new and the old hash function for the transition period. We want to avoid performing theses migrations as much as possible, but we must be ready for it as it is likely to happen in the lifespan of IPFS. -**SHA256** +Changing the Hash function used to derive `ServerKey` requires the DHT Server to support multiple Provider Records indexed by a different `ServerKey` for the same `HASH2` for the migration period. 
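To make the dependency on the hash function concrete, here is a minimal Go sketch of the two derivations defined in this document, `HASH2 = SHA256(bytes("CR_DOUBLEHASH") || MH)` and `ServerKey = SHA256(bytes("CR_SERVERKEY") || MH)`. The helper names `derive`, `hash2` and `serverKey` are hypothetical, and `MH` is treated as an opaque byte slice. Both values come from the same function, which is why replacing the hash function changes the keyspace location and the `ServerKey` at the same time.

```go
package main

import "crypto/sha256"

// Domain-separation constants taken from the definitions in this document.
const (
	doubleHashPrefix = "CR_DOUBLEHASH"
	serverKeyPrefix  = "CR_SERVERKEY"
)

// derive computes SHA256(bytes(prefix) || mh).
func derive(prefix string, mh []byte) [sha256.Size]byte {
	buf := make([]byte, 0, len(prefix)+len(mh))
	buf = append(buf, prefix...)
	buf = append(buf, mh...)
	return sha256.Sum256(buf)
}

// hash2 returns the Kademlia keyspace location of the Provider Record.
func hash2(mh []byte) [sha256.Size]byte { return derive(doubleHashPrefix, mh) }

// serverKey returns the key that DHT Servers use to encrypt lookup responses.
func serverKey(mh []byte) [sha256.Size]byte { return derive(serverKeyPrefix, mh) }
```

Because the two constants differ, `hash2(mh)` and `serverKey(mh)` are unrelated digests even though they share the same input `MH`.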
**AESGCM** + +[AESGCM](https://en.wikipedia.org/wiki/Galois/Counter_Mode) (Advanced Encryption Standard in Galois/Counter Mode) is an AEAD (Authenticated Encryption with Associated Data) mode of operation for symmetric-key cryptographic block ciphers which is widely adopted for its performance. It takes as input an Initialization Vector (IV) that needs to be unique (Nonce) for each encryption performed with the same key. This algorithm was selected for its security, its performance and its large industry adoption. + +The nonce size is set to `12` bytes (default for AES GCM). AESGCM is used with encryption keys of 256 bits (SHA256 digests in this context). + +A change in the encryption algorithm of the Provider Record implies that the Content Providers must publish 2 Provider Records, one with each encryption scheme. The Client and the DHT Server learn which encryption algorithm has been used by the Content Provider from the `varint` contained in `EncPeerID`. When a new encryption algorithm is introduced, DHT Servers may need to store multiple Provider Records in their Provider Store for the same `HASH2` and the same `CPPeerID`. We restrict the number of Provider Records for each pair (`HASH2`, `CPPeerID`) to `3` (the `varint`s must be distinct), in order to allow some flexibility, while keeping the potential number of _garbage_ Provider Records published by hostile nodes low. + +A change in the encryption algorithm used between the DHT Server and the Client (Lookup step 7.) means that the Client and the DHT Server must negotiate the encryption algorithm, as long as it still uses a 256-bit key. + +**Signature scheme** + +TODO + +### Provider Store + +The data structure of the DHT Servers' Provider Store is a nested dictionary/map whose structure is: `HASH2` -> `ServerKey` -> `CPPeerID` -> [`EncPeerID`, `TS`, `Signature`]. 
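The nested structure can be sketched in Go as follows. This is an illustration under stated assumptions, not the layout used by any implementation: the type and method names (`record`, `providerStore`, `put`) are hypothetical, keys are plain strings only to keep the sketch short, and signature verification and the `TS` freshness check from the Publish Process are omitted. The only rule it encodes is that an existing entry for the same `HASH2` -> `ServerKey` -> `CPPeerID` is overwritten only by a record with a newer `TS`.

```go
package main

// record holds the fields stored for one Content Provider.
type record struct {
	EncPeerID []byte
	TS        int64 // unix timestamp chosen by the Content Provider
	Signature []byte
}

// providerStore mirrors HASH2 -> ServerKey -> CPPeerID -> record.
// String keys are an assumption made for brevity.
type providerStore map[string]map[string]map[string]record

// put stores rec and reports whether it was kept. An existing entry for the
// same (hash2, serverKey, cpPeerID) is replaced only if rec has a newer TS.
func (s providerStore) put(hash2, serverKey, cpPeerID string, rec record) bool {
	byKey, ok := s[hash2]
	if !ok {
		byKey = make(map[string]map[string]record)
		s[hash2] = byKey
	}
	byPeer, ok := byKey[serverKey]
	if !ok {
		byPeer = make(map[string]record)
		byKey[serverKey] = byPeer
	}
	if old, exists := byPeer[cpPeerID]; exists && rec.TS <= old.TS {
		return false // stale or duplicate republish: drop the new entry
	}
	byPeer[cpPeerID] = rec
	return true
}
```

A republish with an older `TS` is dropped, while one with a newer `TS` replaces the stored record, so each Content Provider keeps at most one record per `ServerKey` in this sketch.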
+ +The same `HASH2` always produces the same `ServerKey` (as long as the same Hashing Algorithm was used), as both `HASH2` and `ServerKey` result in a deterministic hash operation on `MH` prepended with a constant prefix. However, a misbehaving node could publish an advertisement for `HASH2` while not knowing `MH`, and forge a random `ServerKey`. The DHT Server not knowing `MH` cannot determine which `ServerKey` is the one associated with `HASH2`, and hence need to keep all different `ServerKey`s. However, the number of forged `ServerKey`s is expected to be small as the Client aren't able to decrypt payload encrypted with a forged `ServerKey`, and detect that the Provider Record isn't legitimate. The only reason a misbehaving peer would want to publish forged `ServerKey`s is to exhaust the storage resources of a specific target DHT Server. + +Content can be provider by multiple Content Providers, hence `HASH2` -> `ServerKey` points to potentially multiple `CPPeerID`s, each Content Provider having its own Provider Record. As the `CPPeerID` is obtained from the open libp2p connection, we assume that it is not possible to impersonate another `CPPeerID`. Each Content Provider can have a single Provider Record for each `HASH2`, and for each available `varint`. During a migration, we expect to have multiple Provider Records for the same pair (`HASH2`, `CPPeerID`), the Provider Store keeps 1 Provider Records for each distinct (`HASH2`, `CPPeerID`, `varint`) with a maximum of `3` per pair (`HASH2`, `CPeerID`). If there are more than 3 candidates, the ones with the lowest `TS` are discarded. +When a Content Provider republishes a Provider Record, the DHT Server only keeps the valid Provider Record whose `TS` is the largest value, for the given `varint`. We expect to have a single `varint` in use most of the time. DHT Servers drop all Provider Records from published by the same `CPPeerID` with the same `HASH2` but multiple different `ServerKey`s. 
A well behaving node can compute the right `ServerKey` and doesn't try to exhaust the storage resources of the DHT Server. Only a misbehaving node forges invalid `ServerKey`s, and if multiple `ServerKey`s are associated with the same (`HASH2`, `CPPeerID`) it implies that at least one of the two `ServerKey` is incorrect, so the Content Provider is misbehaving. + -9. Each DHT server verifies `Signature` against the `peerid` of the Content Provider used to open the libp2p connection. `Verify(CPPeerID, Signature, EncPeerID || TS)`. It verifies that `TS` is _recent enough_. If invalid, send an error to the client. -10. Each DHT server adds an entry in their Provider Store for `HASH2` -> `ServerKey` -> [`CPPeerID`, `EncPeerID`, `TS`, `Signature`], with `CPPeerID` being the `peerid` of the Content Provider. If there is already an entry including `CPPeerID` for `HASH2` -> `ServerKey`, and if the `TS` of the new valid entry is newer than the existing `TS`, overwrite the entry in the Provider Store. Else drop the new entry. +9. Each DHT server verifies `Signature` against the `PeerID` of the Content Provider used to open the libp2p connection. `Verify(CPPeerID, Signature, EncPeerID || TS)`. It verifies that `TS` is _recent enough_. If invalid, send an error to the client. +10. Each DHT server adds an entry in their Provider Store for `HASH2` -> `ServerKey` -> `CPPeerID` -> [`EncPeerID`, `TS`, `Signature`], with `CPPeerID` being the `PeerID` of the Content Provider. If there is already an entry including `CPPeerID` for `HASH2` -> `ServerKey`, and if the `TS` of the new valid entry is newer than the existing `TS`, overwrite the entry in the Provider Store. Else drop the new entry. 11. Each DHT server confirms to Content Provider that the Provider Record has been successfully added. 12. The proces is over once Content Provider has received 20 confirmations. **Lookup Process** 1. 
Client computes `HASH2 = SHA256(bytes("CR_DOUBLEHASH") || MH)` (`MH` is the MultiHash included in the CID). 2. Client selects a prefix of `HASH2`, `KeyPrefix = HASH2[:l]` for a defined `l` (see [`l` selection](#prefix-length-selection)). -2. Client finds the closest `peerid`s to `HASH2` in XOR distance in its Routing Table. -3. Client sends a DHT lookup request for `KeyPrefix` to these DHT servers. -4. The DHT servers find the 20 closest `peerid`s to `KeyPrefix` in XOR distance (see [algorithm](#closest-keys-to-a-key-prefix)). Add these `peerid`s and their associated multiaddresses (if applicable) to the `message` that will be returned to Client. +2. Client finds the closest `PeerID`s to `HASH2` in XOR distance in its Routing Table. +3. Client sends a DHT lookup request for `KeyPrefix` to these DHT servers. The request contains a flag to specify whether Client wants the `multiaddrs` associated with the `CPPeerID` or not. +4. The DHT servers find the 20 closest `PeerID`s to `KeyPrefix` in XOR distance (see [algorithm](#closest-keys-to-a-key-prefix)). Add these `PeerID`s and their associated multiaddresses (if applicable) to the `message` that will be returned to Client. 5. The DHT servers search if there are entries matching `KeyPrefix` in their Provider Store. -6. For all entries `HASH2` of the Provider Store where `HASH2[:len(KeyPrefix)]==KeyPrefix`, add to `message` the following encrypted payload: `Enc(ServerKey, EncPeerID || TS || Signature || multiaddrs)`, for `multiaddrs` being the multiaddresses associated with `CPPeerID` (if applicable). DHT Servers can decide to put a maximal limit of returned Provider Record per request. If too many `HASH2` are matching `KeyPrefix`, they select randomly 128 matching provider records per request, and send a flag to Client to signal that the limit was reached. +6. 
For all entries `HASH2` of the Provider Store where `HASH2[:len(KeyPrefix)]==KeyPrefix`, add to `message` the following encrypted payload: `Enc(ServerKey, EncPeerID || TS || Signature || multiaddrs)`, for `multiaddrs` being the multiaddresses associated with `CPPeerID` (if applicable) if the `multiaddrs` were requested by Client. DHT Servers can decide to put a maximal limit of returned Provider Record per request. If too many `HASH2` are matching `KeyPrefix`, they select randomly 128 matching provider records per request, and send a flag to Client to signal that the limit was reached. 7. The DHT servers send `message` to Client. 8. Client computes `ServerKey = SHA256(bytes("CR_SERVERKEY") || MH)`. 9. Client tries to decrypt all returned encrypted payloads using `ServerKey`. If at least one encrypted payload can be decrypted, go to 12. @@ -165,17 +165,6 @@ func closest_to_match(prefix, N, all_keys) { } ``` - -## Test fixtures - - - - ## Design rationale ### Cryptographic algorithms @@ -200,7 +189,12 @@ A change in the encryption algorithm used between the DHT Server and the Client **Signature scheme** -TODO +The signature scheme is the default one from libp2p. The available algorithms are available [here](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md#key-types) We use the private key, from which the node's `PeerID` is derived to sign `(EncPeerID || TS)`. Every node with the knowledge of the signing `peerid` can verify the signature. + +```go +privKey := host.Peerstore().PrivKey(host.ID()) +signature, err := privKey.Sign(data) +``` ### Provider Store @@ -211,6 +205,13 @@ The same `HASH2` always produces the same `ServerKey` (as long as the same Hashi Content can be provider by multiple Content Providers, hence `HASH2` -> `ServerKey` points to potentially multiple `CPPeerID`s, each Content Provider having its own Provider Record. 
As the `CPPeerID` is obtained from the open libp2p connection, we assume that it is not possible to impersonate another `CPPeerID`. Each Content Provider can have a single Provider Record for each `HASH2`, and for each available `varint`. During a migration, we expect to have multiple Provider Records for the same pair (`HASH2`, `CPPeerID`), the Provider Store keeps 1 Provider Records for each distinct (`HASH2`, `CPPeerID`, `varint`) with a maximum of `3` per pair (`HASH2`, `CPeerID`). If there are more than 3 candidates, the ones with the lowest `TS` are discarded. When a Content Provider republishes a Provider Record, the DHT Server only keeps the valid Provider Record whose `TS` is the largest value, for the given `varint`. We expect to have a single `varint` in use most of the time. DHT Servers drop all Provider Records from published by the same `CPPeerID` with the same `HASH2` but multiple different `ServerKey`s. A well behaving node can compute the right `ServerKey` and doesn't try to exhaust the storage resources of the DHT Server. Only a misbehaving node forges invalid `ServerKey`s, and if multiple `ServerKey`s are associated with the same (`HASH2`, `CPPeerID`) it implies that at least one of the two `ServerKey` is incorrect, so the Content Provider is misbehaving. +### `k`-anonymity + +The `k`-anonymity parameter `k` is user defined, it can be modified in the configuration files. Users requiring a higher level of privacy can increase their value of `k`. +Default parameter selection: `k = 8` + +Maximal number of returned keys + -### User benefit - - +## User benefits ### Reader Privacy -**`k`-anonymity** +**Double Hashing** -Default parameter selection: `k = 8` +Currently any DHT Server observing a request can associate the Client's `PeerID` with the requested `CID`. 
If the `CID` is not already known, curious DHT Servers observing a DHT request can replay the request, and retrieve the content that the client is accessing, which is a significant privacy concern. Using `HASH2` as DHT Content Identifier prevents curious DHT Servers not knowing `MH`, the preimage of `HASH2` from retrieving the content associated with `HASH2`. Curious DHT Servers can still replay the DHT request for `HASH2` and find the Content Providers. However, they are not able to make a valid Bitswap request to the Content Providers, for they don't have the knowledge of the Content Identifier used by Bitswap (`CID`) for the content being identified by `HASH2` in the DHT. + +Double Hashing is also necessary for Prefix Requests and Provider Record Encryption. + +**Prefix Requests** + +A Prefix Request consists in requesting a Prefix of a key, instead of a full length Kademlia key. A Prefix corresponds to a branch of the binary tree, and potentially matches multiple existing keys. Prefix Request Routing works exactly like the normal Kademlia Routing, hence a DHT Prefix Request always converges eventually. The goal of Prefix Requests is to match multiple Provider Records for a single request. Insead of requesting `HASH2` the Client now requests `Prefix`, a prefix of `HASH2` of length `l` bits, and the DHT Server storing the Provider Records matching to `Prefix` doesn't know exactly which content is accessed and returns all Provider Records whose `HASH2` matches `Prefix`. + +This provides [`k`-anonymity](https://en.wikipedia.org/wiki/K-anonymity) when a curious DHT Server tries to associate the Client's `PeerID` with the requested `HASH2`, with `k` defined as the average number of Provider Records matching a Prefix of length `l`. `k` is a system parameter and defines the `k`-anonymity level, and `l` is derived from `k` (see [Prefix Length Selection](#prefix-length-selection)). 
Prefix Requests also enable [Plausible Deniability](https://en.wikipedia.org/wiki/Deniable_encryption) for the Client. The DHT Server cannot prove that a Client identified by its `PeerID` or `IP Address` tried to access some content identified by its `HASH2`. + +However, Prefix Requests offer neither [`l`-diversity](https://en.wikipedia.org/wiki/L-diversity) nor [`t`-closeness](https://en.wikipedia.org/wiki/T-closeness), as frequency analysis is still easy to perform. For example, suppose a `Prefix` matches one very popular Provider Record and a few unpopular ones. When a new request is received for this `Prefix`, the DHT Server nodes can take a better-than-random guess that the Client is requesting the popular content's Provider Record rather than an unpopular one. However, the DHT Server cannot prove that the Client has accessed the popular content. + +**Provider Record Encryption** + +Provider Record Encryption also builds on top of Double Hashing. It prevents curious DHT Servers that observe a request for `Prefix`, but do not store any Provider Record matching `Prefix`, from replaying the request for `Prefix` and getting all published keys matching `Prefix`, including the `HASH2` of the content accessed by the Client. It prevents all curious actors from building a global dictionary of `HASH2` to Content Providers for all content published in the IPFS public DHT. It is necessary to know the `MH` of the content (included in the `CID`) to learn about its Content Providers. + +Curious DHT Servers observing a request from `PeerID` for `Prefix` cannot associate the Client's `PeerID` with the Content Provider's `PeerID` because they cannot read the Provider Record. ### Writer Privacy +Writer Privacy is NOT the goal of this design. However, as a side effect, Writer Privacy gets improved in some specific cases. 
+- Content Providers do NOT get any additional privacy from the Client fetching the data +- Content Providers can now hide which data they are serving from the DHT Server peers hosting their Provider Records, as long as the DHT Servers don't know the preimage of `HASH2`: `MH`. The DHT Servers are not able to query the content associated with the Provider Records they are storing. However, they can approximately monitor the number of requests associated with the content by observing the requests in the key subspace matching `Prefix` of `HASH2` of the content. DHT Servers can take an educated guess on the association of `HASH2` with the Content Provider's `PeerID`. The DHT Servers storing the Provider Record are able to share information about the Content Provider with potential accomplices. +- Content Providers get additional privacy from curious DHT Servers observing a request, but NOT storing the Provider Record. These DHT Servers can still replay the DHT request, but are unable to discover either the Content Provider's `PeerID` associated with `Prefix` (because the Provider Records are encrypted) or the content itself. This holds as long as the DHT Servers don't know the `MH` (or `CID`). + ### Provider Record Authenticity +The Provider Records are now signed by the Content Provider. This prevents a malicious DHT Server from forging a Provider Record for an arbitrary key. The Clients need to verify the Signature against the Content Provider's `PeerID` and send a Bitswap request to the Content Provider only if the Signature is valid. Content Providers can only publish Provider Records for themselves. + ### Provider Records Enumeration -Easier monitoring of the DHT, random key query +Enumerating the number of Provider Records in the DHT becomes trivial thanks to Double Hashing and Prefix Requests. Knowledge of the preimage of the requested key isn't required anymore for a valid Kademlia request. 
Easy Provider Records Enumeration (or approximation, if crawling the complete DHT isn't an option) enables better monitoring of the DHT load and activity. ### Better Kademlia Routing Table Refresh -Get rid of 456 KB in the IPFS source code https://github.com/libp2p/go-libp2p-kbucket/blob/master/bucket_prefixmap.go +As knowledge of the preimage of the requested key isn't necessary in the Double Hashing DHT, nodes gain the ability to request _truly_ random keys in the DHT. -## Compatibility +Requesting random keys is necessary for the Kademlia Bucket Refresh Process. On refresh, if a bucket has empty slots, the node will make a request for a random forged key falling in this specific bucket. In the current implementation, as the preimage of a requested key is necessary, Kademlia uses a [list of precomputed preimages](https://github.com/libp2p/go-libp2p-kbucket/blob/master/bucket_prefixmap.go), one matching every 15-bit key prefix. Hence, the forged key is never truly random: it is drawn from the list of precomputed preimages, not from the full keyspace. This can lead to degraded performance and security vulnerabilities. -Breaking change - +Double Hashing enables the nodes to select a _truly_ random key from the Kademlia keyspace (limited by the randomness algorithm) matching the appropriate bucket. The 456 KB [list of precomputed preimages](https://github.com/libp2p/go-libp2p-kbucket/blob/master/bucket_prefixmap.go) can be removed from the IPFS source code, once the migration to the Double Hashing DHT is complete. + +### Simplicity + +It is generally less complex to find content in the DHT by requesting its Kademlia identifier (keyspace location), instead of requesting the preimage of its keyspace location. + +## Migration + +This design is a breaking change and requires a major DHT migration. 
+ +**WIP** ## Security Threat Model (or it should be in a distinct section) DOS (sending the multiaddrs of the target peer for every served provider record) can be solved in the future with signed peer records. + +Privacy depends on the secrecy of `CID`. ## Alternatives -This approach is a first fix to the DHT (low hanging fruit). Other alternative to add privacy in the DHT include Mixnets and ephemeral peerids. +This approach is a first fix to the DHT privacy (low hanging fruit). Other alternative to add privacy in the DHT include Mixnets (incl. Tor) and ephemeral PeerIDs. Alternatives for migration: - slow breaking change (give enough time so that only a _small_ number of participants break) @@ -276,7 +305,7 @@ Describe alternate designs that were considered and related work. - Is it wise to encrypt the `CPPeerID` using `MH` directly? It would be possible to derive another identifier from `MH` (such as `Hash("SOME_CONSTANT" || MH)`). `MH` is the master identifier of the content, hence if it is revealed all other identifers can trivially be found. However, it is computationnaly impossible to recover `MH` from `Hash("SOME_CONSTANT" || MH)`. - It may be fine to use `TS` as nonce, it spares bytes on the wire. However, if two Content Providers publish the same content at the same time (`TS` either in seconds or milliseconds), then the DHT Server may be able to forge a valid Provider Records for itself. -- Move to SHA3?? +- Move to SHA3?? 
Now or never (or with the universal DHT) ## Copyright From 89cb1edf06f3fb8371f3dd2d3312860ab4d60721 Mon Sep 17 00:00:00 2001 From: guillaumemichel Date: Tue, 31 Jan 2023 14:43:38 +0100 Subject: [PATCH 05/55] added threat model section --- IPIP/0000-double-hash-dht.md | 65 ++++++++++++++++++++++++++---------- 1 file changed, 47 insertions(+), 18 deletions(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index a7a736e6..564d8f37 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -50,7 +50,8 @@ s of the Client retreiving the content identified by `CID`. **Magic Values** - bytes("CR_DOUBLEHASH") - bytes("CR_SERVERKEY") -- AESGCM varint +- AESGCM varint: `TODO` +- Double SHA256 varint: `DBL_SHA2_256 = 86` - Max number of Provider Records returned by a DHT Server for a single request: `128` ### Current DHT @@ -277,35 +278,63 @@ This design is a breaking change and requires a major DHT migration. **WIP** -## Security +Alternatives for migration: +- slow breaking change (give enough time so that only a _small_ number of participants break) +- DHT duplication +- Universal DHT (WIP). -Threat Model (or it should be in a distinct section) +## Threat Model -DOS (sending the multiaddrs of the target peer for every served provider record) can be solved in the future with signed peer records. +### Reader Privacy -Privacy depends on the secrecy of `CID`. - +The Double Hashing DHT prevents DHT Server nodes to associate a Client's `PeerID` with the Content requested by the Client. DHT Servers no longer know _which Client is accessing which content_. This protection only works as long as the DHT Servers don't know the `CID` requested by the Client. Thus, the privacy of a request depends on the secrecy of the requested `CID`. -## Alternatives +A powerful adversary could crawl all discoverable `CID`s, e.g by sniffing Bitswap broadcasts or browsing the Web to discover new `CID`s. 
From this list of `CID`s, the adversary can compute the `HASH2`s associated with all the `CID`s and get a mapping `HASH2` $\rightarrow$ `CID` for many `CID`s. This adversary can run many DHT Servers, and upon request for some `Prefix`, check which `HASH2` are matching the `Prefix`. Using frequency analysis, the adversary can take an educated guess on which content the client is requesting. If the requested content is unknown to the adversary, but the adversary knows its `CID`, the adversary can trivially resolve the Content Providers from the DHT, and fetch the content over Bitswap. -This approach is a first fix to the DHT privacy (low hanging fruit). Other alternative to add privacy in the DHT include Mixnets (incl. Tor) and ephemeral PeerIDs. +DHT Servers serving the requested Provider Record to the Client has the ability to associate the Client's `PeerID` with the Content Providers `PeerID`. It can track _from which peer a Client is fetching content_. -Alternatives for migration: -- slow breaking change (give enough time so that only a _small_ number of participants break) -- DHT duplication -- Universal DHT (WIP). +The proposed solution makes _association attacks_ (associating the Client's `PeerID` with the requested `CID`) much more expensive for _public content_, but doesn't make them impossible to perform. However, malicious users cannot discover _private content_, and spy on users accessing it. If Alice advertises her holiday pictures to the public IPFS DHT and privately sends the root `CID` to Bob only, no adversary can retrieve the pictures, and no adversary can learn what Bob is accessing. Only the DHT Servers serving the Provider Record to Bob know that Bob is requesting some content from Alice's `PeerID`. + +The Client doesn't have any privacy protection from the Content Provider serving Content over Bitswap. + +### Signed Provider Records + +Provider Records are signed in the Double Hash DHT. 
This implies that malicious DHT Servers serving a Provider Record can no longer forge an arbitrary Provider Record corresponding to the requested `CID`. The Client can computationally verify that the Provider Record is valid, and was created by the Provider Record that has the knowledge of `CID`. + +### DDOS Protection + +The Double Hash DHT doesn't improve DDOS (Distributed Denial Of Service) protection. Upon recieving a DHT request from a Client for a valid Provider Record, DHT Servers can decide to return a `multiaddrs` corresponding to the IP address of a `target` host, not providing the requested content. The Client will open a connection to the returned `multiaddrs` and send a Bitswap request for the content. If the `CID` that was initially requested is popular, this will generate a lot of traffic toward the `target` coming from many different Clients. + +DDOS protection can be improved in the future on the Double Hash DHT by using signed Peer Records. + +### DHT Servers Resource Exhaustion + +An adversary user could try to exhaust the DHT resources by advertising garbage Provider Records. The adversary needs to generate random bytes (_garbage_), sign them and ask DHT Server nodes to store the garbage Provider Records. DHT Server nodes cannot computaionally decide whether a Provider Record is garbage or not, thus they must continue storing the Provider Records. Note that the adversary periodically needs to republish every Provider Record, which isn't trivial for a large number of Provider Records at the moment. This issue isn't mitigated in the current DHT. + +One possible mitigation could be to identify IP addresses publishing an _excessive_ number of Provider Records that are never accessed, and refusing to store more Provider Records for this IP. + +## Alternatives for DHT Reader Privacy + +Other approaches to improve Reader Privacy in the DHT mostly include Ephemeral `PeerID`s and [Mixnets](https://en.wikipedia.org/wiki/Mix_network). 
The first option is to use ephemeral `PeerID`s in order to escape tracking. This solution however doesn’t increase much the privacy level. It is still possible to enumerate the all `PeerID`s in the network and to associate all the `PeerID`s using the same IP addresses. Combining the Ephemeral `PeerID` approach with Double Hashing can help slighlty improve privacy. Having a different `PeerID` for the DHT Client and the DHT Server of the same IPFS node makes association of _which Content Provider requested which Content_ harder. The two `PeerID`s can still be associated as they use the same IP address, but the DHT Client cannot be discovered in a network crawl. + +Ephemeral `PeerID`s references: +- https://github.com/libp2p/libp2p/issues/37 + +The other alternative to increase the Reader Privacy level in the IPFS DHT is the use of Mixnets such as Tor or I2P. Mixnets usually provide an excellent Reader- and Writer Privacy level, but the latency is expected to increase significantly. Hence the use of Mixnets is generally not good for all use cases, but only when strong privacy guarantees are required. IPFS users willing to remain pseudonymous could use the extisting Tor network to hide their identity. Another alternative could be to create a Mixnet out of the IPFS network, e.g include mixing capabilities in every libp2p host. There has been some ongoing work on IPFS-Tor integration. 
+ +Mixnets references: +- Berty's [go-libp2p-tor-transport](https://github.com/berty/go-libp2p-tor-transport) +- [Hosting an IPFS Gateway Through a Tor Proxy](https://www.minds.com/raymondsmith98/blog/tutorial-tor-hosting-an-ipfs-gateway-through-a-tor-proxy-857369540936916992) +- Mixnet and Content Routing ([IPFS Thing 2022 Video](https://www.youtube.com/watch?v=f85U8b5g-Ks), [Notes](https://hackmd.io/@nZ-twauPRISEa6G9zg3XRw/BkrcMOLd9)) by [noot](https://github.com/noot) +- [Nym Mixnet](https://nymtech.net/) +- https://github.com/ipfs/notes/issues/37 - ## Open Questions - Is it wise to encrypt the `CPPeerID` using `MH` directly? It would be possible to derive another identifier from `MH` (such as `Hash("SOME_CONSTANT" || MH)`). `MH` is the master identifier of the content, hence if it is revealed all other identifers can trivially be found. However, it is computationnaly impossible to recover `MH` from `Hash("SOME_CONSTANT" || MH)`. - It may be fine to use `TS` as nonce, it spares bytes on the wire. However, if two Content Providers publish the same content at the same time (`TS` either in seconds or milliseconds), then the DHT Server may be able to forge a valid Provider Records for itself. -- Move to SHA3?? Now or never (or with the universal DHT) +- Move to SHA3?? 
Now or never (or with the universal DHT) https://en.wikipedia.org/wiki/SHA-3#Comparison_of_SHA_functions ## Copyright From 0a138da09e6f8453157b104e4bf48951f8fb7a06 Mon Sep 17 00:00:00 2001 From: guillaumemichel Date: Tue, 31 Jan 2023 15:03:51 +0100 Subject: [PATCH 06/55] quick spell checks --- IPIP/0000-double-hash-dht.md | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index 564d8f37..390842ca 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -4,6 +4,7 @@ submit your IPIP, please use number 0000 and an abbreviated title in the filename, `0000-draft-title-abbrev.md`. --> ![wip](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square) +- DRI: [Guillaume Michel](https://github.com/guillaumemichel) - Start Date: 2023-01-18 - Related Resources: - [Specs in Notion](https://pl-strflt.notion.site/Double-Hashing-for-Privacy-ff44e3156ce040579289996fec9af609) @@ -18,7 +19,7 @@ submit your IPIP, please use number 0000 and an abbreviated title in the filenam ## Motivation -IPFS is currently lacking of many privacy protections. One of its principal weaknesses currently lies in the lack of privacy protections for the DHT content routing subsystem. Currently in the IPFS DHT, neither readers (clients retrieving content) nor writers (hosts storing and distributing content) have much privacy with regard to content they consume or publish. It is trivial for a DHT server node to associate the requester's identity with the accessed content during the routing process. A curious DHT server node, can request the same CIDs to find out what content other users are consuming. Improving privacy in the IPFS DHT has been a strong request from the community for some time. +IPFS is currently lacking of many privacy protections. 
One of its principal weaknesses currently lies in the lack of privacy protections for the DHT content routing subsystem. Currently in the IPFS DHT, neither readers (clients retrieving content) nor writers (hosts storing and distributing content) have much privacy with regard to content they consume or publish. It is trivial for a DHT server node to associate the requestor's identity with the accessed content during the routing process. A curious DHT server node, can request the same CIDs to find out what content other users are consuming. Improving privacy in the IPFS DHT has been a strong request from the community for some time. The changes described in this document introduce a DHT privacy upgrade boosting the reader’s privacy. It will prevent DHT tracking as described above, and add Provider Records Authentication. The proposed modifications also add a slight Writer Privacy improvement as a side effect. @@ -34,11 +35,10 @@ The changes described in this document introduce a DHT privacy upgrade boosting - **DHT Servers** are nodes running the IPFS public DHT. In this documents, DHT Servers mostly refer to the DHT Servers storing the Provider Records associated with specific `CID`s, and not the DHT Servers helping routing lookup requests to the right keyspace location. - **Client** is an IPFS client looking up a content identified by an already known `CID`. - **Publish Process** is the process of the Content Provider communicating to the DHT Servers that it provides some content identified by `CID`. -- **Lookup Process** is the proces -s of the Client retreiving the content identified by `CID`. +- **Lookup Process** is the process of the Client retrieving the content identified by `CID`. - **`PeerID`** s define stable [peer identities](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md). The `PeerID` is derived from the node's cryptographic public key. 
- **`multiaddrs`** are the [network addresses](https://github.com/libp2p/specs/tree/master/addressing) associated with a `PeerID`. It represents the location(s) of the peer. -- **`KeyPrefix`** is defined as a prefix of lenght `l` bits of `HASH2`. +- **`KeyPrefix`** is defined as a prefix of length `l` bits of `HASH2`. - **`ServerKey`** is defined as `SHA256(bytes("CR_SERVERKEY") || MH)`. It is derived from the `MH`. The Content Provider communicates `ServerKey` to the DHT Servers during the Publish Process. The DHT Servers use it to encrypt the data sent to the Client during the lookup process. - **`TS`** is the Timestamp (unix timestamp) when the Content Provider published the content. - **`CPPeerID`** is the `PeerID` of the Content Provider for a specific `CID`. @@ -180,13 +180,13 @@ Changing the Hash function used to derive `ServerKey` requires the DHT Server to **AESGCM** -[AESGCM](https://en.wikipedia.org/wiki/Galois/Counter_Mode) (Advanced Encryption Standard in Galois/Counter Mode) is a AEAD (Authenticated Encryption with Associated Data) mode of operation for symmetric-key cryptographic block ciphers which is widely adopted for its performance. It takes as input an Initialization Vector (IV) that needs to be unique (Nonce) for each encryption performed with the same key. This algorithm was selected for its securty, its performance and its large industry adoption. +[AESGCM](https://en.wikipedia.org/wiki/Galois/Counter_Mode) (Advanced Encryption Standard in Galois/Counter Mode) is a AEAD (Authenticated Encryption with Associated Data) mode of operation for symmetric-key cryptographic block ciphers which is widely adopted for its performance. It takes as input an Initialization Vector (IV) that needs to be unique (Nonce) for each encryption performed with the same key. This algorithm was selected for its security, its performance and its large industry adoption. The nonce size is set to `12` (default for AES GCM). 
AESGCM is used with encryption keys of 256 bits (SHA256 digests in this context). A change in the encryption algorithm of the Provider Record implies that the Content Providers must publish 2 Provider Records, one with each encryption scheme. The Client and the DHT Server learn which encryption algorithm has been used by the Content Provider from the `varint` contained in `EncPeerID`. When a new encryption algorithm DHT servers may need to store multiple Provider Records in its Provider Store for the same `HASH2` and the same `CPPeerID`. We restrict the number of Provider Record for each pair (`HASH2`, `CPPeerID`) to `3` (the `varint`s must be distinct), in order to allow some flexibility, while keeping the potential number of _garbage_ Provider Records published by hostile nodes low. -A change in the encryption algorithm used between the DHT Server and the Client (Lookup step 7.) means that the Client and the DHT Server must negociate the encryption algorithm, as long as it still uses a 256-bits key. +A change in the encryption algorithm used between the DHT Server and the Client (Lookup step 7.) means that the Client and the DHT Server must negotiate the encryption algorithm, as long as it still uses a 256-bits key. **Signature scheme** @@ -233,11 +233,11 @@ Double Hashing is also necessary for Prefix Requests and Provider Record Encrypt **Prefix Requests** -A Prefix Request consists in requesting a Prefix of a key, instead of a full length Kademlia key. A Prefix corresponds to a branch of the binary tree, and potentially matches multiple existing keys. Prefix Request Routing works exactly like the normal Kademlia Routing, hence a DHT Prefix Request always converges eventually. The goal of Prefix Requests is to match multiple Provider Records for a single request. 
Insead of requesting `HASH2` the Client now requests `Prefix`, a prefix of `HASH2` of length `l` bits, and the DHT Server storing the Provider Records matching to `Prefix` doesn't know exactly which content is accessed and returns all Provider Records whose `HASH2` matches `Prefix`. +A Prefix Request consists in requesting a Prefix of a key, instead of a full length Kademlia key. A Prefix corresponds to a branch of the binary tree, and potentially matches multiple existing keys. Prefix Request Routing works exactly like the normal Kademlia Routing, hence a DHT Prefix Request always converges eventually. The goal of Prefix Requests is to match multiple Provider Records for a single request. Instead of requesting `HASH2` the Client now requests `Prefix`, a prefix of `HASH2` of length `l` bits, and the DHT Server storing the Provider Records matching to `Prefix` doesn't know exactly which content is accessed and returns all Provider Records whose `HASH2` matches `Prefix`. This provides [`k`-anonymity](https://en.wikipedia.org/wiki/K-anonymity) when a curious DHT Server tries to associate the Client's `PeerID` with the requested `HASH2`, with `k` defined as the average number of Provider Records matching a Prefix of length `l`. `k` is a system parameter and defines the `k`-anonymity level, and `l` is derived from `k` (see [Prefix Length Selection](#prefix-length-selection)). Prefix Request also enables [Plausible Deniability](https://en.wikipedia.org/wiki/Deniable_encryption) for the Client. The DHT Server cannot prove that a Client identified by its `PeerID` or `IP Address` tried to access some content identified by its `HASH2`. -However Prefix Requests don't offer [`l`-diversity](https://en.wikipedia.org/wiki/L-diversity) nor [`t`-closeness](https://en.wikipedia.org/wiki/T-closeness), as frequency analysis is still easy to perform. For example, a `Prefix` matches a very popular Provider Records and a few unpopular ones. 
The DHT Server nodes can take a better-than-random guess when a new request is recieved for this `Prefix` that there is a higher chance that the Client is requesting the popular content's Provider Record compared with an unpopular one. However, the DHT Server cannot prove the the Client has accessed the popular content. +However Prefix Requests don't offer [`l`-diversity](https://en.wikipedia.org/wiki/L-diversity) nor [`t`-closeness](https://en.wikipedia.org/wiki/T-closeness), as frequency analysis is still easy to perform. For example, a `Prefix` matches a very popular Provider Records and a few unpopular ones. The DHT Server nodes can take a better-than-random guess when a new request is received for this `Prefix` that there is a higher chance that the Client is requesting the popular content's Provider Record compared with an unpopular one. However, the DHT Server cannot prove the the Client has accessed the popular content. **Provider Record Encryption** @@ -249,7 +249,7 @@ Curious DHT Servers observing a request from `PeerID` for `Prefix` cannot associ Writer Privacy is NOT the goal of this design. However, as a side effect, Write Privacy gets improved in some specific cases. - Content Providers do NOT get any additional privacy from the Client fetching the data -- Content Providers can now hide to the DHT Server peers hosting their Provider Records which data they are serving, as long as the DHT Servers don't know the preimage of `HASH2`: `MH`. The DHT Servers are not able to query the content associated with the Provider Records they are storing. However, they can approximatively monitor the number of requests associated to the content by observing the requests in the keysubspace matching to `Prefix` of `HASH2` of the content. DHT Servers can take an educated guess on the association of `HASH2` with the Content Provider's `PeerID`. The DHT Servers storing the Provider Record are able to share information about the Content Provider with potential accomplices. 
+- Content Providers can now hide to the DHT Server peers hosting their Provider Records which data they are serving, as long as the DHT Servers don't know the preimage of `HASH2`: `MH`. The DHT Servers are not able to query the content associated with the Provider Records they are storing. However, they can approximately monitor the number of requests associated to the content by observing the requests in the keysubspace matching to `Prefix` of `HASH2` of the content. DHT Servers can take an educated guess on the association of `HASH2` with the Content Provider's `PeerID`. The DHT Servers storing the Provider Record are able to share information about the Content Provider with potential accomplices. - Content Providers get additional privacy from curious DHT Servers observing a request, but NOT storing the Provider Record. These DHT Servers can still replay the DHT request, but are unable to discover the Content Provider's `PeerID` associated with `Preix` because the Provider Records are encrypted, and the content itself. This holds as long as the DHT Servers don't know the `MH` (or `CID`). ### Provider Record Authenticity @@ -258,11 +258,11 @@ The Provider Records are now signed by the Content Provider. This prevents a mal ### Provider Records Enumeration -Enumarating the number of Provider Records in the DHT becomes trivial thank to the Double Hashing and Prefix Requests. Knowledge of the preimage of the requested key isn't required anymore for a valid Kademlia request. An easy Provider Records Enumeration, or Approximation if crawling the complete DHT isn't an option enables a better monitoring of the DHT load and activity. +Enumerating the number of Provider Records in the DHT becomes trivial thank to the Double Hashing and Prefix Requests. Knowledge of the preimage of the requested key isn't required anymore for a valid Kademlia request. 
An easy Provider Records Enumeration, or Approximation if crawling the complete DHT isn't an option enables a better monitoring of the DHT load and activity. ### Better Kademlia Routing Table Refresh -As knowledge of the preimage of the requested key isn't necessary in the Double Hashing DHT, nodes gain the ability to request _truely_ random keys in the DHT. +As knowledge of the preimage of the requested key isn't necessary in the Double Hashing DHT, nodes gain the ability to request _truly_ random keys in the DHT. Requesting random keys is necessary for the Kademlia Bucket Refresh Process. On refresh, if a bucket has empty slots, the node will make a request for a random forged key falling in this specific bucket. In the current implementation, as the prefix of a requested key is necessary, Kademlia uses a [list of precomputed preimages](https://github.com/libp2p/go-libp2p-kbucket/blob/master/bucket_prefixmap.go), 1 matching every 15-bits key prefix. Hence, the random forged key, is never random, its definition set is the list of precomputed preimages, and not the full keyspace. This can lead to degraded performance and security vulnerabilities. @@ -303,13 +303,13 @@ Provider Records are signed in the Double Hash DHT. This implies that malicious ### DDOS Protection -The Double Hash DHT doesn't improve DDOS (Distributed Denial Of Service) protection. Upon recieving a DHT request from a Client for a valid Provider Record, DHT Servers can decide to return a `multiaddrs` corresponding to the IP address of a `target` host, not providing the requested content. The Client will open a connection to the returned `multiaddrs` and send a Bitswap request for the content. If the `CID` that was initially requested is popular, this will generate a lot of traffic toward the `target` coming from many different Clients. +The Double Hash DHT doesn't improve DDOS (Distributed Denial Of Service) protection. 
Upon receiving a DHT request from a Client for a valid Provider Record, DHT Servers can decide to return a `multiaddrs` corresponding to the IP address of a `target` host, not providing the requested content. The Client will open a connection to the returned `multiaddrs` and send a Bitswap request for the content. If the `CID` that was initially requested is popular, this will generate a lot of traffic toward the `target` coming from many different Clients. DDOS protection can be improved in the future on the Double Hash DHT by using signed Peer Records. ### DHT Servers Resource Exhaustion -An adversary user could try to exhaust the DHT resources by advertising garbage Provider Records. The adversary needs to generate random bytes (_garbage_), sign them and ask DHT Server nodes to store the garbage Provider Records. DHT Server nodes cannot computaionally decide whether a Provider Record is garbage or not, thus they must continue storing the Provider Records. Note that the adversary periodically needs to republish every Provider Record, which isn't trivial for a large number of Provider Records at the moment. This issue isn't mitigated in the current DHT. +An adversary user could try to exhaust the DHT resources by advertising garbage Provider Records. The adversary needs to generate random bytes (_garbage_), sign them and ask DHT Server nodes to store the garbage Provider Records. DHT Server nodes cannot computationally decide whether a Provider Record is garbage or not, thus they must continue storing the Provider Records. Note that the adversary periodically needs to republish every Provider Record, which isn't trivial for a large number of Provider Records at the moment. This issue isn't mitigated in the current DHT. One possible mitigation could be to identify IP addresses publishing an _excessive_ number of Provider Records that are never accessed, and refusing to store more Provider Records for this IP. 
@@ -320,7 +320,7 @@ Other approaches to improve Reader Privacy in the DHT mostly include Ephemeral ` Ephemeral `PeerID`s references: - https://github.com/libp2p/libp2p/issues/37 -The other alternative to increase the Reader Privacy level in the IPFS DHT is the use of Mixnets such as Tor or I2P. Mixnets usually provide an excellent Reader- and Writer Privacy level, but the latency is expected to increase significantly. Hence the use of Mixnets is generally not good for all use cases, but only when strong privacy guarantees are required. IPFS users willing to remain pseudonymous could use the extisting Tor network to hide their identity. Another alternative could be to create a Mixnet out of the IPFS network, e.g include mixing capabilities in every libp2p host. There has been some ongoing work on IPFS-Tor integration. +The other alternative to increase the Reader Privacy level in the IPFS DHT is the use of Mixnets such as Tor or I2P. Mixnets usually provide an excellent Reader- and Writer Privacy level, but the latency is expected to increase significantly. Hence the use of Mixnets is generally not good for all use cases, but only when strong privacy guarantees are required. IPFS users willing to remain pseudonymous could use the existing Tor network to hide their identity. Another alternative could be to create a Mixnet out of the IPFS network, e.g include mixing capabilities in every libp2p host. There has been some ongoing work on IPFS-Tor integration. Mixnets references: - Berty's [go-libp2p-tor-transport](https://github.com/berty/go-libp2p-tor-transport) @@ -332,9 +332,9 @@ Mixnets references: ## Open Questions -- Is it wise to encrypt the `CPPeerID` using `MH` directly? It would be possible to derive another identifier from `MH` (such as `Hash("SOME_CONSTANT" || MH)`). `MH` is the master identifier of the content, hence if it is revealed all other identifers can trivially be found. 
However, it is computationnaly impossible to recover `MH` from `Hash("SOME_CONSTANT" || MH)`. +- If we plan to move to using SHA3 instead of SHA2 to generate 256-bits digests, this migration is the perfect opportunity, as we will be breaking everything anyways. SHA3 was proved to be more secure against Length Extension Attacks. It has not be proven whether SHA2 or SHA3 is more collision resistant and secure against preimage attacks. See this [comparison](https://en.wikipedia.org/wiki/SHA-3#Comparison_of_SHA_functions). +- Is it wise to encrypt the `CPPeerID` using `MH` directly? It would be possible to derive another identifier from `MH` (such as `Hash("SOME_CONSTANT" || MH)`). `MH` is the master identifier of the content, hence if it is revealed all other identifiers can trivially be found. However, it is computationally impossible to recover `MH` from `Hash("SOME_CONSTANT" || MH)`. - It may be fine to use `TS` as nonce, it spares bytes on the wire. However, if two Content Providers publish the same content at the same time (`TS` either in seconds or milliseconds), then the DHT Server may be able to forge a valid Provider Records for itself. -- Move to SHA3?? 
Now or never (or with the universal DHT) https://en.wikipedia.org/wiki/SHA-3#Comparison_of_SHA_functions ## Copyright From 942a1f493629668d6b0c78cf1e6caead678db927 Mon Sep 17 00:00:00 2001 From: guillaumemichel Date: Tue, 31 Jan 2023 15:15:38 +0100 Subject: [PATCH 07/55] added summary --- IPIP/0000-double-hash-dht.md | 32 +++++++++++++++----------------- 1 file changed, 15 insertions(+), 17 deletions(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index 390842ca..04c1a71e 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -1,8 +1,5 @@ # IPIP 0000: Double Hash DHT - ![wip](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square) - DRI: [Guillaume Michel](https://github.com/guillaumemichel) - Start Date: 2023-01-18 @@ -14,8 +11,20 @@ submit your IPIP, please use number 0000 and an abbreviated title in the filenam ## Summary - -/TODO +This IPIP contains the up-to-date Spec of the IPFS Double Hash DHT. The Double Hashing DHT aims at providing some Reader Privacy guarantees to the IPFS DHT. + +This document is still WIP, all feedback is more than welcome. Make sure to write your thoughts about the [open questions](#open-questions) in the PR. + +## Table of Contents + +1. [Motivation](#motivation) +2. [Detailed Design](#detailed-design) +3. [Design Rationale](#design-rationale) +4. [User benefits](#user-benefits) +5. [Migration](#migration) +6. [Threat Model](#threat-model) +7. [Alternatives](#alternatives-for-dht-reader-privacy) +8. [Open Questions](#open-questions) ## Motivation @@ -23,7 +32,7 @@ IPFS is currently lacking of many privacy protections. One of its principal weak The changes described in this document introduce a DHT privacy upgrade boosting the reader’s privacy. It will prevent DHT tracking as described above, and add Provider Records Authentication. The proposed modifications also add a slight Writer Privacy improvement as a side effect. 
-## Detailed design +## Detailed Design ### Definitions @@ -108,17 +117,6 @@ The following process describes the event of a client looking up a CID in the IP 17. Client sends a Bitswap request for `CID` to the Content Provider (known `CPPeerID` and `multiaddrs`). 18. Content Provider sends the requested content back to Client. - ### Prefix length selection The goal of DHT prefix requests is to provide [`k`-anonymity](https://en.wikipedia.org/wiki/K-anonymity) to content lookup, in addition to the pseudonimity gained from double hashing. Each DHT prefix lookup query returns an expected number of `k` Provider Records matching `KeyPrefix`, with `k` being a system parameter. The user should be able to define a custom `k` from the configuration files, according to their privacy needs. The default value `k = 8` is discussed in [Design rationale](#reader-privacy). From 40b44d4d84bc7f3eda15ed356529837bcdcbfa72 Mon Sep 17 00:00:00 2001 From: Guillaume Michel - guissou Date: Wed, 1 Feb 2023 08:46:47 +0100 Subject: [PATCH 08/55] Update IPIP/0000-double-hash-dht.md Co-authored-by: Ivan Schasny <31857042+ischasny@users.noreply.github.com> --- IPIP/0000-double-hash-dht.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index 04c1a71e..6d8f1a1d 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -201,7 +201,7 @@ The data structure of the DHT Servers' Provider Store is a nested dictionary/map The same `HASH2` always produces the same `ServerKey` (as long as the same Hashing Algorithm was used), as both `HASH2` and `ServerKey` result in a deterministic hash operation on `MH` prepended with a constant prefix. However, a misbehaving node could publish an advertisement for `HASH2` while not knowing `MH`, and forge a random `ServerKey`. 
The DHT Server not knowing `MH` cannot determine which `ServerKey` is the one associated with `HASH2`, and hence need to keep all different `ServerKey`s. However, the number of forged `ServerKey`s is expected to be small as the Client aren't able to decrypt payload encrypted with a forged `ServerKey`, and detect that the Provider Record isn't legitimate. The only reason a misbehaving peer would want to publish forged `ServerKey`s is to exhaust the storage resources of a specific target DHT Server. -Content can be provider by multiple Content Providers, hence `HASH2` -> `ServerKey` points to potentially multiple `CPPeerID`s, each Content Provider having its own Provider Record. As the `CPPeerID` is obtained from the open libp2p connection, we assume that it is not possible to impersonate another `CPPeerID`. Each Content Provider can have a single Provider Record for each `HASH2`, and for each available `varint`. During a migration, we expect to have multiple Provider Records for the same pair (`HASH2`, `CPPeerID`), the Provider Store keeps 1 Provider Records for each distinct (`HASH2`, `CPPeerID`, `varint`) with a maximum of `3` per pair (`HASH2`, `CPeerID`). If there are more than 3 candidates, the ones with the lowest `TS` are discarded. +Content can be provided by multiple Content Providers, hence `HASH2` -> `ServerKey` points to potentially multiple `CPPeerID`s, each Content Provider having its own Provider Record. As the `CPPeerID` is obtained from the open libp2p connection, we assume that it is not possible to impersonate another `CPPeerID`. Each Content Provider can have a single Provider Record for each `HASH2`, and for each available `varint`. During a migration, we expect to have multiple Provider Records for the same pair (`HASH2`, `CPPeerID`), the Provider Store keeps 1 Provider Records for each distinct (`HASH2`, `CPPeerID`, `varint`) with a maximum of `3` per pair (`HASH2`, `CPeerID`). 
If there are more than 3 candidates, the ones with the lowest `TS` are discarded. When a Content Provider republishes a Provider Record, the DHT Server only keeps the valid Provider Record whose `TS` is the largest value, for the given `varint`. We expect to have a single `varint` in use most of the time. DHT Servers drop all Provider Records from published by the same `CPPeerID` with the same `HASH2` but multiple different `ServerKey`s. A well behaving node can compute the right `ServerKey` and doesn't try to exhaust the storage resources of the DHT Server. Only a misbehaving node forges invalid `ServerKey`s, and if multiple `ServerKey`s are associated with the same (`HASH2`, `CPPeerID`) it implies that at least one of the two `ServerKey` is incorrect, so the Content Provider is misbehaving. ### `k`-anonymity From 6c260f0c77ed03a254b7d6deb330309358880ec4 Mon Sep 17 00:00:00 2001 From: guillaumemichel Date: Wed, 1 Feb 2023 09:21:10 +0100 Subject: [PATCH 09/55] update after ischasny comments --- IPIP/0000-double-hash-dht.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index 04c1a71e..1fbea75b 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -59,7 +59,7 @@ The changes described in this document introduce a DHT privacy upgrade boosting **Magic Values** - bytes("CR_DOUBLEHASH") - bytes("CR_SERVERKEY") -- AESGCM varint: `TODO` +- AESGCM [varint](https://github.com/multiformats/multicodec): `TODO` - Double SHA256 varint: `DBL_SHA2_256 = 86` - Max number of Provider Records returned by a DHT Server for a single request: `128` @@ -86,7 +86,7 @@ The following process describes the event of a client looking up a CID in the IP 1. Content Provider wants to publish some content with identifier `CID`. 2. Content Provider computes `HASH2`$\leftarrow{}$`SHA256(bytes("CR_DOUBLEHASH") || MH)` (`MH` is the MultiHash included in the CID). 3. 
Content Provider starts a DHT lookup request for the 20 closest `PeerID`s in XOR distance to `HASH2`. -4. Content Provider encrypts its own `PeerID` (`CPPeerID`) with `MH`, using AES-GCM. `EncPeerID = varint || Nonce || AESGCM(MH, CPPeerID || Nonce)` +4. Content Provider encrypts its own `PeerID` (`CPPeerID`) with `MH`, using AES-GCM. `EncPeerID = varint || Nonce || AESGCM(MH, CPPeerID || Nonce)`, with `varint` indicating the encryption algorithm in use, here AESGCM. 5. Content Provider takes the current timestamp `TS`. 6. Content Provider signs `EncPeerID` and `TS` using its private key. `Signature = Sign(privkey, EncPeerID || TS)` 7. Content Provider computes `ServerKey = SHA256(bytes("CR_SERVERKEY") || MH)`. From 2c20b47b46bd66600351539be27011eec41acf39 Mon Sep 17 00:00:00 2001 From: guillaumemichel Date: Wed, 1 Feb 2023 11:25:17 +0100 Subject: [PATCH 10/55] replaced with for the DHT Server encrypted payload response --- IPIP/0000-double-hash-dht.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index cdcfd305..9a0f278b 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -103,10 +103,10 @@ The following process describes the event of a client looking up a CID in the IP 3. Client sends a DHT lookup request for `KeyPrefix` to these DHT servers. The request contains a flag to specify whether Client wants the `multiaddrs` associated with the `CPPeerID` or not. 4. The DHT servers find the 20 closest `PeerID`s to `KeyPrefix` in XOR distance (see [algorithm](#closest-keys-to-a-key-prefix)). Add these `PeerID`s and their associated multiaddresses (if applicable) to the `message` that will be returned to Client. 5. The DHT servers search if there are entries matching `KeyPrefix` in their Provider Store. -6. 
For all entries `HASH2` of the Provider Store where `HASH2[:len(KeyPrefix)]==KeyPrefix`, add to `message` the following encrypted payload: `Enc(ServerKey, EncPeerID || TS || Signature || multiaddrs)`, for `multiaddrs` being the multiaddresses associated with `CPPeerID` (if applicable) if the `multiaddrs` were requested by Client. DHT Servers can decide to put a maximal limit of returned Provider Record per request. If too many `HASH2` are matching `KeyPrefix`, they select randomly 128 matching provider records per request, and send a flag to Client to signal that the limit was reached. +6. For all entries `HASH2` of the Provider Store where `HASH2[:len(KeyPrefix)]==KeyPrefix`, add to `message` the following encrypted payload: `EncPeerID || Enc(ServerKey, TS || Signature || multiaddrs)`, for `multiaddrs` being the multiaddresses associated with `CPPeerID` (if applicable) if the `multiaddrs` were requested by Client. DHT Servers can decide to put a maximal limit of returned Provider Record per request. If too many `HASH2` are matching `KeyPrefix`, they select randomly 128 matching provider records per request, and send a flag to Client to signal that the limit was reached. 7. The DHT servers send `message` to Client. 8. Client computes `ServerKey = SHA256(bytes("CR_SERVERKEY") || MH)`. -9. Client tries to decrypt all returned encrypted payloads using `ServerKey`. If at least one encrypted payload can be decrypted, go to 12. +9. Client tries to decrypt all returned encrypted payloads using `MH` for `EncPeerID` and `ServerKey` for `Enc(ServerKey, TS || Signature || multiaddrs)`. If at least one encrypted payload can be decrypted, go to 12. 10. Client sends a DHT lookup request for `KeyPrefix` to the closest peers in XOR distance to `HASH2` that it received from the DHT servers. 11. Go to 4. 12. For each decrypted payload, Client decrypts `CPPeerID = Dec(MH, EncPeerID)`. 
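Review note on the two derived keys used in the hunks above: both `HASH2` and `ServerKey` are salted SHA256 digests of `MH`, differing only in the salt. The sketch below assumes that `bytes("...")` means the ASCII encoding of the string and treats `MH` as an opaque byte string; the function and constant names are hypothetical and do not come from the implementation.

```python
import hashlib

# Salts listed under "Magic Values" in the spec.
# Assumption: bytes("...") is the plain ASCII encoding of the string.
DOUBLEHASH_SALT = b"CR_DOUBLEHASH"
SERVERKEY_SALT = b"CR_SERVERKEY"


def derive_hash2(mh: bytes) -> bytes:
    """HASH2 = SHA256(bytes("CR_DOUBLEHASH") || MH): the keyspace location."""
    return hashlib.sha256(DOUBLEHASH_SALT + mh).digest()


def derive_server_key(mh: bytes) -> bytes:
    """ServerKey = SHA256(bytes("CR_SERVERKEY") || MH): handed to DHT Servers."""
    return hashlib.sha256(SERVERKEY_SALT + mh).digest()


if __name__ == "__main__":
    mh = bytes(range(32))  # placeholder multihash bytes, illustration only
    print(derive_hash2(mh).hex())
    print(derive_server_key(mh).hex())
```

Because the two salts differ, a DHT Server that knows `HASH2` and `ServerKey` still cannot recover `MH`, which is what keeps `EncPeerID` opaque to it.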
From 591fdef9fa7af38299a6eb94c177b2f0536c564a Mon Sep 17 00:00:00 2001 From: guillaumemichel Date: Wed, 8 Feb 2023 16:24:53 +0100 Subject: [PATCH 11/55] added MatchLimit explanations --- IPIP/0000-double-hash-dht.md | 20 +++++++------------- 1 file changed, 7 insertions(+), 13 deletions(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index 9a0f278b..d64f29ff 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -61,7 +61,7 @@ The changes described in this document introduce a DHT privacy upgrade boosting - bytes("CR_SERVERKEY") - AESGCM [varint](https://github.com/multiformats/multicodec): `TODO` - Double SHA256 varint: `DBL_SHA2_256 = 86` -- Max number of Provider Records returned by a DHT Server for a single request: `128` +- A DHT Server returns all of the Provider Records matching to at most **`MatchLimit = 64`** distinct `HASH2`. Magic number explanation in [k-anonymity](#k-anonymity). ### Current DHT @@ -103,11 +103,11 @@ The following process describes the event of a client looking up a CID in the IP 3. Client sends a DHT lookup request for `KeyPrefix` to these DHT servers. The request contains a flag to specify whether Client wants the `multiaddrs` associated with the `CPPeerID` or not. 4. The DHT servers find the 20 closest `PeerID`s to `KeyPrefix` in XOR distance (see [algorithm](#closest-keys-to-a-key-prefix)). Add these `PeerID`s and their associated multiaddresses (if applicable) to the `message` that will be returned to Client. 5. The DHT servers search if there are entries matching `KeyPrefix` in their Provider Store. -6. For all entries `HASH2` of the Provider Store where `HASH2[:len(KeyPrefix)]==KeyPrefix`, add to `message` the following encrypted payload: `EncPeerID || Enc(ServerKey, TS || Signature || multiaddrs)`, for `multiaddrs` being the multiaddresses associated with `CPPeerID` (if applicable) if the `multiaddrs` were requested by Client. 
DHT Servers can decide to put a maximal limit of returned Provider Record per request. If too many `HASH2` are matching `KeyPrefix`, they select randomly 128 matching provider records per request, and send a flag to Client to signal that the limit was reached. +6. For all entries `HASH2` of the Provider Store where `HASH2[:len(KeyPrefix)]==KeyPrefix`, add to `message` the following encrypted payload: `EncPeerID || Enc(ServerKey, TS || Signature || multiaddrs)`, for `multiaddrs` being the multiaddresses associated with `CPPeerID` (if applicable) if the `multiaddrs` were requested by Client. If more than `MatchLimit` distinct `HASH2`s match the requested `KeyPrefix`, the DHT Server doesn't return any Provider Record, and adds the number of `HASH2` matching `KeyPrefix` along with its own `MatchLimit` to `message`. 7. The DHT servers send `message` to Client. 8. Client computes `ServerKey = SHA256(bytes("CR_SERVERKEY") || MH)`. 9. Client tries to decrypt all returned encrypted payloads using `MH` for `EncPeerID` and `ServerKey` for `Enc(ServerKey, TS || Signature || multiaddrs)`. If at least one encrypted payload can be decrypted, go to 12. -10. Client sends a DHT lookup request for `KeyPrefix` to the closest peers in XOR distance to `HASH2` that it received from the DHT servers. +10. If the `MatchLimit` and number of matching `HASH2`s was included in the `message`, Client makes multiple DHT lookup requests for longer prefixes (e.g `KeyPrefix||0` and `KeyPrefix||1`). Else Client sends a DHT lookup request for `KeyPrefix` to the closest peers in XOR distance to `HASH2` that it received from the DHT servers. 11. Go to 4. 12. For each decrypted payload, Client decrypts `CPPeerID = Dec(MH, EncPeerID)`. 13. Client verifies that `Signature` verifies with `CPPeerID`: `Verify(CPPeerID, Signature, EncPeerID || TS)`. 
@@ -206,18 +206,12 @@ When a Content Provider republishes a Provider Record, the DHT Server only keeps ### `k`-anonymity -The `k`-anonymity parameter `k` is user defined, it can be modified in the configuration files. Users requiring a higher level of privacy can increase their value of `k`. -Default parameter selection: `k = 8` +Default: `k = 8`. +Default: `MatchLimit = 64`. -Maximal number of returned keys +The `k`-anonymity parameter `k` is user defined, it can be modified in the configuration files. Users requiring a higher level of privacy can increase their value of `k`. `8` is deemed to be private enough for standard IPFS users, while limiting the overhead in packet size of the DHT Server response to 8x. - +The `MatchLimit` prevents malformed or malicious requests to match all Provider Records that a DHT Server is providing at once. A Client can still fetch all Provider Records matching any `KeyPrefix`, but it must perform multiple DHT lookup requests for enough prefixes to the DHT Server. The `MatchLimit` protects the Server from having to send large amounts of data at once. `64` is already a large value, given that each `HASH2` can be associated with multiple Provider Records, one for each Content Provider, and the multiaddresses of all Content Providers can be sent along. The DHT provides _on average_ at most `64-anonymity` out-of-the-box and a better privacy level can be reached by sending multiple requests. 
## User benefits From 6dafa772ccdfa25804543cf7bd9becb1c0808ee2 Mon Sep 17 00:00:00 2001 From: guillaumemichel Date: Wed, 8 Feb 2023 16:43:19 +0100 Subject: [PATCH 12/55] added aes-256 as varint for aesgcm --- IPIP/0000-double-hash-dht.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index d64f29ff..b5c5ff03 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -51,7 +51,7 @@ The changes described in this document introduce a DHT privacy upgrade boosting - **`ServerKey`** is defined as `SHA256(bytes("CR_SERVERKEY") || MH)`. It is derived from the `MH`. The Content Provider communicates `ServerKey` to the DHT Servers during the Publish Process. The DHT Servers use it to encrypt the data sent to the Client during the lookup process. - **`TS`** is the Timestamp (unix timestamp) when the Content Provider published the content. - **`CPPeerID`** is the `PeerID` of the Content Provider for a specific `CID`. -- **`EncPeerID`** is the result of the encryption of `CPPeerID` using `MH` as encryption key and a random nonce `AESGCM(MH, CPPeerID || RandomNonce)`. `EncPeerID` contains the [varint](https://github.com/multiformats/multicodec) of the encryption algorithm used (AESGCM), the bytes array of the encrypted payload, and the Nonce. +- **`EncPeerID`** is the result of the encryption of `CPPeerID` using `MH` as encryption key and a random nonce `AESGCM(MH, CPPeerID || RandomNonce)`. `EncPeerID` contains the [varint](https://github.com/multiformats/multicodec/blob/master/table.csv#L69) of the encryption algorithm used (AES), the bytes array of the encrypted payload, and the Nonce. - **`Signature`** is the signature of the `EncPeerID` encrypted payload (not including the varint nor the nonce) and `TS` using the Content Provider's private key, either with ed25519 or rsa signature algorithm, depending on the keys of the Content Provider. 
- **Provider Record** is defined as a pointer to the storage location of some content identified by `CID` or `HASH2`. A Provider Record consists on the following fields: [`EncPeerID`, `TS`, `Signature`]. - **Provider Store** is the data structure on the DHT Servers used to store the Provider Records. Its structure is a nested dictionary/map: `HASH2` -> `ServerKey` -> [`CPPeerID`, `EncPeerID`, `TS`, `Signature`]. There is only one single correct `ServerKey` for each `HASH2`. However, any peer can forge a valid Publish request (with invalid `EncPeerID` but valid `Signature`) undetected by the DHT Server. The DHT server isn't able to distinguish which `ServerKey` is correct as it doesn't have the knowledge of `MH`, hence it has to keep both and serve both upon request for `HASH2`. @@ -59,8 +59,8 @@ The changes described in this document introduce a DHT privacy upgrade boosting **Magic Values** - bytes("CR_DOUBLEHASH") - bytes("CR_SERVERKEY") -- AESGCM [varint](https://github.com/multiformats/multicodec): `TODO` -- Double SHA256 varint: `DBL_SHA2_256 = 86` +- AES [varint](https://github.com/multiformats/multicodec/blob/master/table.csv#L69): `aes-256 = 0xa2` +- Double SHA256 [varint](https://github.com/multiformats/multicodec/blob/master/table.csv#L41): `dbl-sha2-256 = 0x56` - A DHT Server returns all of the Provider Records matching to at most **`MatchLimit = 64`** distinct `HASH2`. Magic number explanation in [k-anonymity](#k-anonymity). 
### Current DHT From 28383a2a3e1ae1ecefada4f62d2a82175abe33ab Mon Sep 17 00:00:00 2001 From: guillaumemichel Date: Wed, 8 Feb 2023 16:58:41 +0100 Subject: [PATCH 13/55] define Provider Records life duration --- IPIP/0000-double-hash-dht.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index b5c5ff03..4e3fd30c 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -62,6 +62,7 @@ The changes described in this document introduce a DHT privacy upgrade boosting - AES [varint](https://github.com/multiformats/multicodec/blob/master/table.csv#L69): `aes-256 = 0xa2` - Double SHA256 [varint](https://github.com/multiformats/multicodec/blob/master/table.csv#L41): `dbl-sha2-256 = 0x56` - A DHT Server returns all of the Provider Records matching to at most **`MatchLimit = 64`** distinct `HASH2`. Magic number explanation in [k-anonymity](#k-anonymity). +- Provider Record Timestamp (`TS`) validity period: `48h` ### Current DHT @@ -91,7 +92,7 @@ The following process describes the event of a client looking up a CID in the IP 6. Content Provider signs `EncPeerID` and `TS` using its private key. `Signature = Sign(privkey, EncPeerID || TS)` 7. Content Provider computes `ServerKey = SHA256(bytes("CR_SERVERKEY") || MH)`. 8. Once the lookup request has returned the 20 closest peers, Content Provider sends a Publish request to these DHT servers. The Publish request contains [`HASH2`, `EncPeerID`, `TS`, `Signature`, `ServerKey`]. -9. Each DHT server verifies `Signature` against the `PeerID` of the Content Provider used to open the libp2p connection. `Verify(CPPeerID, Signature, EncPeerID || TS)`. It verifies that `TS` is _recent enough_. If invalid, send an error to the client. +9. Each DHT server verifies `Signature` against the `PeerID` of the Content Provider used to open the libp2p connection. `Verify(CPPeerID, Signature, EncPeerID || TS)`. 
It verifies that `TS` is younger than `48h` and isn't in the future. If invalid, send an error to the client. 10. Each DHT server adds an entry in their Provider Store for `HASH2` -> `ServerKey` -> `CPPeerID` -> [`EncPeerID`, `TS`, `Signature`], with `CPPeerID` being the `PeerID` of the Content Provider. If there is already an entry including `CPPeerID` for `HASH2` -> `ServerKey`, and if the `TS` of the new valid entry is newer than the existing `TS`, overwrite the entry in the Provider Store. Else drop the new entry. 11. Each DHT server confirms to Content Provider that the Provider Record has been successfully added. 12. The proces is over once Content Provider has received 20 confirmations. @@ -111,7 +112,7 @@ The following process describes the event of a client looking up a CID in the IP 11. Go to 4. 12. For each decrypted payload, Client decrypts `CPPeerID = Dec(MH, EncPeerID)`. 13. Client verifies that `Signature` verifies with `CPPeerID`: `Verify(CPPeerID, Signature, EncPeerID || TS)`. -14. Client checks that `TS` is still valid. +14. Client checks that `TS` is younger than `48h`. 15. If none of the decrypted payloads is valid, go to 4. 16. If the decrypted payload doesn't include the `multiaddrs` associated with `CPPeerID`, Client performs a DHT `FindPeer` request to find the `multiaddrs` associated with `CPPeerID`. 17. Client sends a Bitswap request for `CID` to the Content Provider (known `CPPeerID` and `multiaddrs`). 
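Review note on the `TS` check introduced in this patch: a DHT Server accepts a record only if `TS` is younger than `48h` and not in the future. A minimal sketch follows, assuming `TS` is a Unix timestamp in seconds and that "younger than" is a strict comparison; the names `TS_VALIDITY_SECONDS` and `ts_is_valid` are hypothetical.

```python
import time
from typing import Optional

# Validity period from "Magic Values": 48h, expressed in seconds.
TS_VALIDITY_SECONDS = 48 * 60 * 60


def ts_is_valid(ts: int, now: Optional[int] = None) -> bool:
    """Return True if ts is not in the future and is younger than 48h.

    Assumption: ts and now are Unix timestamps in seconds, and a record
    that is exactly 48h old is treated as expired.
    """
    if now is None:
        now = int(time.time())
    if ts > now:
        return False  # timestamps from the future are rejected
    return now - ts < TS_VALIDITY_SECONDS


if __name__ == "__main__":
    now = int(time.time())
    print(ts_is_valid(now - 3600, now))
```

Step 14 of the Lookup Process only states the `48h` bound for the Client, so a Client-side check would be the same function without the future-timestamp branch.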
From 9e8e640b1335014abd49723d10f78df8a776e7f8 Mon Sep 17 00:00:00 2001 From: guillaumemichel Date: Wed, 8 Feb 2023 18:04:36 +0100 Subject: [PATCH 14/55] added more specific data formats --- IPIP/0000-double-hash-dht.md | 39 ++++++++++++++++++------------------ 1 file changed, 19 insertions(+), 20 deletions(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index 4e3fd30c..51653301 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -34,12 +34,19 @@ The changes described in this document introduce a DHT privacy upgrade boosting ## Detailed Design +**Magic Values** +- bytes("CR_DOUBLEHASH") +- bytes("CR_SERVERKEY") +- AES [varint](https://github.com/multiformats/multicodec/blob/master/table.csv#L69): `aes-256 = 0xa2` +- Double SHA256 [varint](https://github.com/multiformats/multicodec/blob/master/table.csv#L41): `dbl-sha2-256 = 0x56` +- A DHT Server returns all of the Provider Records matching to at most **`MatchLimit = 64`** distinct `HASH2`. Magic number explanation in [k-anonymity](#k-anonymity). +- Provider Record Timestamp (`TS`) validity period: `48h` + ### Definitions - - **`CID`** is the IPFS [Content IDentifier](https://github.com/multiformats/cid) -- **`MH`** is the [Multihash](https://github.com/multiformats/multihash) contained in a `CID`. It corresponds to the digest of a hash function over some content. -- **`HASH2`** is defined as `SHA256(bytes("CR_DOUBLEHASH") || MH)`. It represents the location of the Kademlia keyspace for the Provider Record associated with `CID`. +- **`MH`** is the [Multihash](https://github.com/multiformats/multihash) contained in a `CID`. It corresponds to the digest of a hash function over some content. `MH` is represented as a 32-byte array. +- **`HASH2`** is defined as `SHA256(bytes("CR_DOUBLEHASH") || MH)`. It represents the location of the Kademlia keyspace for the Provider Record associated with `CID`. `HASH2` is represented as a 32-byte array. 
`HASH2 = SHA256(bytes("CR_DOUBLEHASH") || MH)`. - **Content Provider** is the node storing some content, and advertising it to the DHT. - **DHT Servers** are nodes running the IPFS public DHT. In this documents, DHT Servers mostly refer to the DHT Servers storing the Provider Records associated with specific `CID`s, and not the DHT Servers helping routing lookup requests to the right keyspace location. - **Client** is an IPFS client looking up a content identified by an already known `CID`. @@ -47,23 +54,15 @@ The changes described in this document introduce a DHT privacy upgrade boosting - **Lookup Process** is the process of the Client retrieving the content identified by `CID`. - **`PeerID`** s define stable [peer identities](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md). The `PeerID` is derived from the node's cryptographic public key. - **`multiaddrs`** are the [network addresses](https://github.com/libp2p/specs/tree/master/addressing) associated with a `PeerID`. It represents the location(s) of the peer. -- **`KeyPrefix`** is defined as a prefix of length `l` bits of `HASH2`. -- **`ServerKey`** is defined as `SHA256(bytes("CR_SERVERKEY") || MH)`. It is derived from the `MH`. The Content Provider communicates `ServerKey` to the DHT Servers during the Publish Process. The DHT Servers use it to encrypt the data sent to the Client during the lookup process. -- **`TS`** is the Timestamp (unix timestamp) when the Content Provider published the content. +- **`KeyPrefix`** is defined as a prefix of length `l` bits of `HASH2`. `KeyPrefix` is represented by a `byte` concatenated with a variable sized array of bytes, containing at most 32 bytes. The leading `byte` represents the binary representation of `l - 1`, making prefixes of length `256` possible, but not prefixes of length `0`. The trailing byte array is of length `ceil(l/8)` bytes, and its content is the bits prefix right padded with zeros. 
+- **`ServerKey`** is defined as `SHA256(bytes("CR_SERVERKEY") || MH)`. It is derived from the `MH`. The Content Provider communicates `ServerKey` to the DHT Servers during the Publish Process. The DHT Servers use it to encrypt the data sent to the Client during the lookup process. `ServerKey` is represented as a 32-byte array. +- **`TS`** is the [Unix Timestamp](https://en.wikipedia.org/wiki/Unix_time) corresponding content publish time. `TS` is represented as a 32-bit **unsigned** Integer, allowing timestamps to range from `1970-01-01T00:00:00Z` to `2106-02-07T06:28:15Z` before reaching the overflow. - **`CPPeerID`** is the `PeerID` of the Content Provider for a specific `CID`. -- **`EncPeerID`** is the result of the encryption of `CPPeerID` using `MH` as encryption key and a random nonce `AESGCM(MH, CPPeerID || RandomNonce)`. `EncPeerID` contains the [varint](https://github.com/multiformats/multicodec/blob/master/table.csv#L69) of the encryption algorithm used (AES), the bytes array of the encrypted payload, and the Nonce. +- **`EncPeerID`** is the result of the encryption of `CPPeerID` using `MH` as encryption key and a random nonce `AESGCM(MH, Nonce, CPPeerID)`. `EncPeerID` contains the [varint](https://github.com/multiformats/multicodec/blob/master/table.csv#L69) of the encryption algorithm used (AES), the bytes array of the encrypted payload, and the `Nonce`. `Nonce` is a randomly generated 12-byte array. The format of `EncPeerID` is [`0xa2`, `Nonce`, `AESGCM(MH, Nonce, CPPeerID)`]. - **`Signature`** is the signature of the `EncPeerID` encrypted payload (not including the varint nor the nonce) and `TS` using the Content Provider's private key, either with ed25519 or rsa signature algorithm, depending on the keys of the Content Provider. - **Provider Record** is defined as a pointer to the storage location of some content identified by `CID` or `HASH2`. A Provider Record consists on the following fields: [`EncPeerID`, `TS`, `Signature`]. 
- **Provider Store** is the data structure on the DHT Servers used to store the Provider Records. Its structure is a nested dictionary/map: `HASH2` -> `ServerKey` -> [`CPPeerID`, `EncPeerID`, `TS`, `Signature`]. There is only one single correct `ServerKey` for each `HASH2`. However, any peer can forge a valid Publish request (with invalid `EncPeerID` but valid `Signature`) undetected by the DHT Server. The DHT server isn't able to distinguish which `ServerKey` is correct as it doesn't have the knowledge of `MH`, hence it has to keep both and serve both upon request for `HASH2`. -**Magic Values** -- bytes("CR_DOUBLEHASH") -- bytes("CR_SERVERKEY") -- AES [varint](https://github.com/multiformats/multicodec/blob/master/table.csv#L69): `aes-256 = 0xa2` -- Double SHA256 [varint](https://github.com/multiformats/multicodec/blob/master/table.csv#L41): `dbl-sha2-256 = 0x56` -- A DHT Server returns all of the Provider Records matching to at most **`MatchLimit = 64`** distinct `HASH2`. Magic number explanation in [k-anonymity](#k-anonymity). -- Provider Record Timestamp (`TS`) validity period: `48h` - ### Current DHT The following process describes the event of a client looking up a CID in the IPFS DHT: @@ -85,13 +84,13 @@ The following process describes the event of a client looking up a CID in the IP **Publish Process** 1. Content Provider wants to publish some content with identifier `CID`. -2. Content Provider computes `HASH2`$\leftarrow{}$`SHA256(bytes("CR_DOUBLEHASH") || MH)` (`MH` is the MultiHash included in the CID). +2. Content Provider computes `HASH2 = SHA256(bytes("CR_DOUBLEHASH") || MH)` (`MH` is the MultiHash included in the CID). 3. Content Provider starts a DHT lookup request for the 20 closest `PeerID`s in XOR distance to `HASH2`. -4. Content Provider encrypts its own `PeerID` (`CPPeerID`) with `MH`, using AES-GCM. `EncPeerID = varint || Nonce || AESGCM(MH, CPPeerID || Nonce)`, with `varint` indicating the encryption algorithm in use, here AESGCM. +4. 
Content Provider encrypts its own `PeerID` (`CPPeerID`) with `MH`, using AES-GCM. `EncPeerID = [0xa2, Nonce, AESGCM(MH, Nonce, CPPeerID)]` 5. Content Provider takes the current timestamp `TS`. 6. Content Provider signs `EncPeerID` and `TS` using its private key. `Signature = Sign(privkey, EncPeerID || TS)` 7. Content Provider computes `ServerKey = SHA256(bytes("CR_SERVERKEY") || MH)`. -8. Once the lookup request has returned the 20 closest peers, Content Provider sends a Publish request to these DHT servers. The Publish request contains [`HASH2`, `EncPeerID`, `TS`, `Signature`, `ServerKey`]. +8. Once the lookup request has returned the 20 closest peers, Content Provider sends a Publish request to these DHT servers. The Publish request contains [`HASH2`, `EncPeerID`, `TS`, `Signature`, `ServerKey`]. 9. Each DHT server verifies `Signature` against the `PeerID` of the Content Provider used to open the libp2p connection. `Verify(CPPeerID, Signature, EncPeerID || TS)`. It verifies that `TS` is younger than `48h` and isn't in the future. If invalid, send an error to the client. 10. Each DHT server adds an entry in their Provider Store for `HASH2` -> `ServerKey` -> `CPPeerID` -> [`EncPeerID`, `TS`, `Signature`], with `CPPeerID` being the `PeerID` of the Content Provider. If there is already an entry including `CPPeerID` for `HASH2` -> `ServerKey`, and if the `TS` of the new valid entry is newer than the existing `TS`, overwrite the entry in the Provider Store. Else drop the new entry. 11. Each DHT server confirms to Content Provider that the Provider Record has been successfully added. @@ -104,11 +103,11 @@ The following process describes the event of a client looking up a CID in the IP 3. Client sends a DHT lookup request for `KeyPrefix` to these DHT servers. The request contains a flag to specify whether Client wants the `multiaddrs` associated with the `CPPeerID` or not. 4. 
The DHT servers find the 20 closest `PeerID`s to `KeyPrefix` in XOR distance (see [algorithm](#closest-keys-to-a-key-prefix)). Add these `PeerID`s and their associated multiaddresses (if applicable) to the `message` that will be returned to Client. 5. The DHT servers search if there are entries matching `KeyPrefix` in their Provider Store. -6. For all entries `HASH2` of the Provider Store where `HASH2[:len(KeyPrefix)]==KeyPrefix`, add to `message` the following encrypted payload: `EncPeerID || Enc(ServerKey, TS || Signature || multiaddrs)`, for `multiaddrs` being the multiaddresses associated with `CPPeerID` (if applicable) if the `multiaddrs` were requested by Client. If more than `MatchLimit` distinct `HASH2`s match the requested `KeyPrefix`, the DHT Server doesn't return any Provider Record, and adds the number of `HASH2` matching `KeyPrefix` along with its own `MatchLimit` to `message`. +6. For all entries `HASH2` of the Provider Store where `HASH2[:len(KeyPrefix)]==KeyPrefix`, add to `message` the following encrypted payload: `EncPeerID || SERVERNONCE || AESGCM(ServerKey, SERVERNONCE, TS || Signature || multiaddrs)`, `SERVERNONCE` being a randomly generated 12-byte array, for `multiaddrs` being the multiaddresses associated with `CPPeerID` (if applicable) if the `multiaddrs` were requested by Client. If more than `MatchLimit` distinct `HASH2`s match the requested `KeyPrefix`, the DHT Server doesn't return any Provider Record, and adds the number of `HASH2` matching `KeyPrefix` along with its own `MatchLimit` to `message`. 7. The DHT servers send `message` to Client. 8. Client computes `ServerKey = SHA256(bytes("CR_SERVERKEY") || MH)`. 9. Client tries to decrypt all returned encrypted payloads using `MH` for `EncPeerID` and `ServerKey` for `Enc(ServerKey, TS || Signature || multiaddrs)`. If at least one encrypted payload can be decrypted, go to 12. -10. 
If the `MatchLimit` and number of matching `HASH2`s was included in the `message`, Client makes multiple DHT lookup requests for longer prefixes (e.g `KeyPrefix||0` and `KeyPrefix||1`). Else Client sends a DHT lookup request for `KeyPrefix` to the closest peers in XOR distance to `HASH2` that it received from the DHT servers. +10. If the DHT Server's `MatchLimit` and number of matching `HASH2`s was included in the `message`, Client makes multiple DHT lookup requests for longer prefixes (e.g `KeyPrefix||0` and `KeyPrefix||1`). Else Client sends a DHT lookup request for `KeyPrefix` to the closest peers in XOR distance to `HASH2` that it received from the DHT servers. 11. Go to 4. 12. For each decrypted payload, Client decrypts `CPPeerID = Dec(MH, EncPeerID)`. 13. Client verifies that `Signature` verifies with `CPPeerID`: `Verify(CPPeerID, Signature, EncPeerID || TS)`. From dbad7fde49e39ff9616d5559614d82dcf395f311 Mon Sep 17 00:00:00 2001 From: Guillaume Michel - guissou Date: Tue, 14 Feb 2023 09:19:28 +0100 Subject: [PATCH 15/55] Update IPIP/0000-double-hash-dht.md Co-authored-by: Masih H. Derkani --- IPIP/0000-double-hash-dht.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index 51653301..8e4b398f 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -103,7 +103,7 @@ The following process describes the event of a client looking up a CID in the IP 3. Client sends a DHT lookup request for `KeyPrefix` to these DHT servers. The request contains a flag to specify whether Client wants the `multiaddrs` associated with the `CPPeerID` or not. 4. The DHT servers find the 20 closest `PeerID`s to `KeyPrefix` in XOR distance (see [algorithm](#closest-keys-to-a-key-prefix)). Add these `PeerID`s and their associated multiaddresses (if applicable) to the `message` that will be returned to Client. 5. 
The DHT servers search if there are entries matching `KeyPrefix` in their Provider Store. -6. For all entries `HASH2` of the Provider Store where `HASH2[:len(KeyPrefix)]==KeyPrefix`, add to `message` the following encrypted payload: `EncPeerID || SERVERNONCE || AESGCM(ServerKey, SERVERNONCE, TS || Signature || multiaddrs)`, `SERVERNONCE` being a randomly generated 12-byte array, for `multiaddrs` being the multiaddresses associated with `CPPeerID` (if applicable) if the `multiaddrs` were requested by Client. If more than `MatchLimit` distinct `HASH2`s match the requested `KeyPrefix`, the DHT Server doesn't return any Provider Record, and adds the number of `HASH2` matching `KeyPrefix` along with its own `MatchLimit` to `message`. +6. For all entries `HASH2` of the Provider Store where `HASH2[:len(KeyPrefix)]==KeyPrefix`, add to `message` the following payload: `EncPeerID || SERVERNONCE || AESGCM(ServerKey, SERVERNONCE, TS || Signature || multiaddrs)`, `SERVERNONCE` being a randomly generated byte array of length 12, for `multiaddrs` being the multiaddresses associated with `CPPeerID` (if applicable) if the `multiaddrs` were requested by Client. If more than `MatchLimit` distinct `HASH2`s match the requested `KeyPrefix`, the DHT Server doesn't return any Provider Record, and adds the number of `HASH2` matching `KeyPrefix` along with its own `MatchLimit` to `message`. 7. The DHT servers send `message` to Client. 8. Client computes `ServerKey = SHA256(bytes("CR_SERVERKEY") || MH)`. 9. Client tries to decrypt all returned encrypted payloads using `MH` for `EncPeerID` and `ServerKey` for `Enc(ServerKey, TS || Signature || multiaddrs)`. If at least one encrypted payload can be decrypted, go to 12. 
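Review note on the `HASH2[:len(KeyPrefix)]==KeyPrefix` test in step 6: with the `KeyPrefix` wire format from the previous patch (one byte holding `l - 1`, then `ceil(l/8)` bytes right padded with zeros), the match is a bitwise comparison rather than a byte slice. The sketch below assumes bits are counted from the most significant bit of the first byte, which the spec does not state explicitly; `encode_key_prefix` and `matches_prefix` are hypothetical names.

```python
def encode_key_prefix(hash2: bytes, l: int) -> bytes:
    """Encode the first l bits of HASH2 as: [l - 1] || zero-padded prefix bytes."""
    if len(hash2) != 32:
        raise ValueError("HASH2 must be 32 bytes")
    if not 1 <= l <= 256:
        raise ValueError("prefix length must be between 1 and 256 bits")
    n_bytes = (l + 7) // 8  # ceil(l / 8)
    body = bytearray(hash2[:n_bytes])
    rem = l % 8
    if rem:
        # Keep the top `rem` bits of the last byte and zero the padding bits.
        # Assumption: most-significant-bit-first ordering.
        body[-1] &= (0xFF << (8 - rem)) & 0xFF
    return bytes([l - 1]) + bytes(body)


def matches_prefix(hash2: bytes, key_prefix: bytes) -> bool:
    """Return True if HASH2 starts with the bits described by key_prefix."""
    if not key_prefix:
        return False
    l = key_prefix[0] + 1
    # Re-encoding HASH2 at the same length also rejects malformed prefixes
    # (wrong body length or non-zero padding bits).
    return encode_key_prefix(hash2, l) == key_prefix


if __name__ == "__main__":
    h = bytes([0b10110111]) + bytes(31)
    print(encode_key_prefix(h, 3).hex())
```

Storing `l - 1` in the leading byte is what lets a single byte cover prefix lengths from 1 to 256, at the cost of not being able to express an empty prefix.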
From 576be8bc3a13593bcc5e0b64458b710e6041a1dc Mon Sep 17 00:00:00 2001 From: Guillaume Michel - guissou Date: Tue, 14 Feb 2023 09:20:04 +0100 Subject: [PATCH 16/55] Update IPIP/0000-double-hash-dht.md Co-authored-by: Yiannis Psaras <52073247+yiannisbot@users.noreply.github.com> --- IPIP/0000-double-hash-dht.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index 8e4b398f..bf253480 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -48,7 +48,7 @@ The changes described in this document introduce a DHT privacy upgrade boosting - **`MH`** is the [Multihash](https://github.com/multiformats/multihash) contained in a `CID`. It corresponds to the digest of a hash function over some content. `MH` is represented as a 32-byte array. - **`HASH2`** is defined as `SHA256(bytes("CR_DOUBLEHASH") || MH)`. It represents the location of the Kademlia keyspace for the Provider Record associated with `CID`. `HASH2` is represented as a 32-byte array. `HASH2 = SHA256(bytes("CR_DOUBLEHASH") || MH)`. - **Content Provider** is the node storing some content, and advertising it to the DHT. -- **DHT Servers** are nodes running the IPFS public DHT. In this documents, DHT Servers mostly refer to the DHT Servers storing the Provider Records associated with specific `CID`s, and not the DHT Servers helping routing lookup requests to the right keyspace location. +- **DHT Servers** are nodes running the IPFS public DHT. In this document, DHT Servers mostly refer to the DHT Servers storing the Provider Records associated with specific `CID`s, and not the DHT Servers helping routing lookup requests to the right keyspace location. - **Client** is an IPFS client looking up a content identified by an already known `CID`. - **Publish Process** is the process of the Content Provider communicating to the DHT Servers that it provides some content identified by `CID`. 
- **Lookup Process** is the process of the Client retrieving the content identified by `CID`. From 941f30a6faf38e8d992af928fad076e7d824eb91 Mon Sep 17 00:00:00 2001 From: Guillaume Michel - guissou Date: Tue, 14 Feb 2023 09:21:32 +0100 Subject: [PATCH 17/55] Update IPIP/0000-double-hash-dht.md Co-authored-by: Yiannis Psaras <52073247+yiannisbot@users.noreply.github.com> --- IPIP/0000-double-hash-dht.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index bf253480..7f147fed 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -60,7 +60,7 @@ The changes described in this document introduce a DHT privacy upgrade boosting - **`CPPeerID`** is the `PeerID` of the Content Provider for a specific `CID`. - **`EncPeerID`** is the result of the encryption of `CPPeerID` using `MH` as encryption key and a random nonce `AESGCM(MH, Nonce, CPPeerID)`. `EncPeerID` contains the [varint](https://github.com/multiformats/multicodec/blob/master/table.csv#L69) of the encryption algorithm used (AES), the bytes array of the encrypted payload, and the `Nonce`. `Nonce` is a randomly generated 12-byte array. The format of `EncPeerID` is [`0xa2`, `Nonce`, `AESGCM(MH, Nonce, CPPeerID)`]. - **`Signature`** is the signature of the `EncPeerID` encrypted payload (not including the varint nor the nonce) and `TS` using the Content Provider's private key, either with ed25519 or rsa signature algorithm, depending on the keys of the Content Provider. -- **Provider Record** is defined as a pointer to the storage location of some content identified by `CID` or `HASH2`. A Provider Record consists on the following fields: [`EncPeerID`, `TS`, `Signature`]. +- **Provider Record** is defined as a pointer to the storage location of some content identified by `CID` or `HASH2`. A Provider Record consists of the following fields: [`EncPeerID`, `TS`, `Signature`]. 
- **Provider Store** is the data structure on the DHT Servers used to store the Provider Records. Its structure is a nested dictionary/map: `HASH2` -> `ServerKey` -> [`CPPeerID`, `EncPeerID`, `TS`, `Signature`]. There is only one single correct `ServerKey` for each `HASH2`. However, any peer can forge a valid Publish request (with invalid `EncPeerID` but valid `Signature`) undetected by the DHT Server. The DHT server isn't able to distinguish which `ServerKey` is correct as it doesn't have the knowledge of `MH`, hence it has to keep both and serve both upon request for `HASH2`. ### Current DHT From cac3d40e6c240c57af31fc47da56e2c96addf16f Mon Sep 17 00:00:00 2001 From: Guillaume Michel - guissou Date: Tue, 14 Feb 2023 09:22:48 +0100 Subject: [PATCH 18/55] Update IPIP/0000-double-hash-dht.md Co-authored-by: Yiannis Psaras <52073247+yiannisbot@users.noreply.github.com> --- IPIP/0000-double-hash-dht.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index 7f147fed..3866601b 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -125,7 +125,7 @@ The prefix `l` is derived from `k` and the number of CIDs published to the DHT: Each node keeps track of the number of `HASH2` matching the last `KeyPrefix` requested in the last 128 lookups. `a` is defined as the average number of matches for the last 128 requests. At any point in time, if $a \gt 2\times k$, then `l` should increase (`l = l + 1`), and if $a \lt \frac{k}{2}$, then `l` should decrease (`l = l - 1`). On node shutdown, `a` is saved on disk, allowing a quick restart with an accurate `l` value. -Note that DHT Servers can set an upperbound on the number of Provider Records they serve for each lookup request. So a too small `l` may result in not discovering the target Provider Record. +Note that DHT Servers can set an upperbound on the number of Provider Records they serve for each lookup request. 
So a very small value for `l` may result in not discovering the target Provider Record. **Prefix magic numbers** - `k`-anonymity privacy parameter, by default `k = 8` From c046b7b1a9aacdf3191e000064fc520f4f97d053 Mon Sep 17 00:00:00 2001 From: Guillaume Michel - guissou Date: Tue, 14 Feb 2023 09:23:19 +0100 Subject: [PATCH 19/55] Update IPIP/0000-double-hash-dht.md Co-authored-by: Yiannis Psaras <52073247+yiannisbot@users.noreply.github.com> --- IPIP/0000-double-hash-dht.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index 3866601b..1c1bf8f1 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -209,7 +209,7 @@ When a Content Provider republishes a Provider Record, the DHT Server only keeps Default: `k = 8`. Default: `MatchLimit = 64`. -The `k`-anonymity parameter `k` is user defined, it can be modified in the configuration files. Users requiring a higher level of privacy can increase their value of `k`. `8` is deemed to be private enough for standard IPFS users, while limiting the overhead in packet size of the DHT Server response to 8x. +The `k`-anonymity parameter `k` is user defined, it can be modified in the configuration files. Users requiring a higher level of privacy can increase their value of `k`. `k=8` is deemed to be private enough for standard IPFS users, while limiting the overhead in packet size of the DHT Server response to 8x. The `MatchLimit` prevents malformed or malicious requests to match all Provider Records that a DHT Server is providing at once. A Client can still fetch all Provider Records matching any `KeyPrefix`, but it must perform multiple DHT lookup requests for enough prefixes to the DHT Server. The `MatchLimit` protects the Server from having to send large amounts of data at once. 
`64` is already a large value, given that each `HASH2` can be associated with multiple Provider Records, one for each Content Provider, and the multiaddresses of all Content Providers can be sent along. The DHT provides _on average_ at most `64-anonymity` out-of-the-box and a better privacy level can be reached by sending multiple requests. From df3d0399a1ab8d6f080b38ffe3a8eba3288b0df7 Mon Sep 17 00:00:00 2001 From: Guillaume Michel - guissou Date: Tue, 14 Feb 2023 09:23:37 +0100 Subject: [PATCH 20/55] Update IPIP/0000-double-hash-dht.md Co-authored-by: Yiannis Psaras <52073247+yiannisbot@users.noreply.github.com> --- IPIP/0000-double-hash-dht.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index 1c1bf8f1..4f164026 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -283,7 +283,7 @@ The Double Hashing DHT prevents DHT Server nodes to associate a Client's `PeerID A powerful adversary could crawl all discoverable `CID`s, e.g by sniffing Bitswap broadcasts or browsing the Web to discover new `CID`s. From this list of `CID`s, the adversary can compute the `HASH2`s associated with all the `CID`s and get a mapping `HASH2` $\rightarrow$ `CID` for many `CID`s. This adversary can run many DHT Servers, and upon request for some `Prefix`, check which `HASH2` are matching the `Prefix`. Using frequency analysis, the adversary can take an educated guess on which content the client is requesting. If the requested content is unknown to the adversary, but the adversary knows its `CID`, the adversary can trivially resolve the Content Providers from the DHT, and fetch the content over Bitswap. -DHT Servers serving the requested Provider Record to the Client has the ability to associate the Client's `PeerID` with the Content Providers `PeerID`. It can track _from which peer a Client is fetching content_. 
+DHT Servers serving the requested Provider Record to the Client have the ability to associate the Client's `PeerID` with the Content Providers `PeerID`. It can track _from which peer a Client is fetching content_ at the Bitswap level. The proposed solution makes _association attacks_ (associating the Client's `PeerID` with the requested `CID`) much more expensive for _public content_, but doesn't make them impossible to perform. However, malicious users cannot discover _private content_, and spy on users accessing it. If Alice advertises her holiday pictures to the public IPFS DHT and privately sends the root `CID` to Bob only, no adversary can retrieve the pictures, and no adversary can learn what Bob is accessing. Only the DHT Servers serving the Provider Record to Bob know that Bob is requesting some content from Alice's `PeerID`. From 5556519dd87767997978fa3b40a1f6419f8242fb Mon Sep 17 00:00:00 2001 From: Guillaume Michel - guissou Date: Tue, 14 Feb 2023 09:24:34 +0100 Subject: [PATCH 21/55] Update IPIP/0000-double-hash-dht.md Co-authored-by: Yiannis Psaras <52073247+yiannisbot@users.noreply.github.com> --- IPIP/0000-double-hash-dht.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index 4f164026..beb1119b 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -219,7 +219,7 @@ The `MatchLimit` prevents malformed or malicious requests to match all Provider **Double Hashing** -Currently any DHT Server observing a request can associate the Client's `PeerID` with the requested `CID`. If the `CID` is not already known, curious DHT Servers observing a DHT request can replay the request, and retrieve the content that the client is accessing, which is a significant privacy concern. Using `HASH2` as DHT Content Identifier prevents curious DHT Servers not knowing `MH`, the preimage of `HASH2` from retrieving the content associated with `HASH2`. 
Curious DHT Servers can still replay the DHT request for `HASH2` and find the Content Providers. However, they are not able to make a valid Bitswap request to the Content Providers, for they don't have the knowledge of the Content Identifier used by Bitswap (`CID`) for the content being identified by `HASH2` in the DHT. +Currently any DHT Server observing a request can associate the Client's `PeerID` with the requested `CID`. If the `CID` is not already known, curious DHT Servers observing a DHT request can replay the request, and retrieve the content that the client is accessing, which is a significant privacy concern. Using `HASH2` as DHT Content Identifier prevents curious DHT Servers that do not know the `MH`, i.e., the preimage of `HASH2`, from retrieving the content associated with `HASH2`. Curious DHT Servers can still replay the DHT request for `HASH2` and find the Content Providers. However, they are not able to make a valid Bitswap request to the Content Providers, for they don't have a way to know the Content Identifier used by Bitswap (`CID`) for the content being identified by `HASH2` in the DHT. Double Hashing is also necessary for Prefix Requests and Provider Record Encryption. 
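Reviewer note on the Double Hashing paragraph: a small Go sketch, under stated assumptions, of the two domain-separated derivations `HASH2 = SHA256(bytes("CR_DOUBLEHASH") || MH)` and `ServerKey = SHA256(bytes("CR_SERVERKEY") || MH)`. The helper `derive` is hypothetical, and the 32-byte `mh` below is a stand-in digest rather than a fully encoded multihash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// derive computes SHA256(bytes(tag) || mh). The constant tag keeps the
// two derived values independent even though both depend only on MH.
func derive(tag string, mh []byte) [32]byte {
	buf := make([]byte, 0, len(tag)+len(mh))
	buf = append(buf, tag...)
	buf = append(buf, mh...)
	return sha256.Sum256(buf)
}

// hash2 is the Kademlia key under which the Provider Record is stored.
func hash2(mh []byte) [32]byte { return derive("CR_DOUBLEHASH", mh) }

// serverKey is the key the DHT Server uses to encrypt its response.
func serverKey(mh []byte) [32]byte { return derive("CR_SERVERKEY", mh) }

func main() {
	// Assumption: a plain SHA-256 digest stands in for MH in this demo.
	digest := sha256.Sum256([]byte("example content"))
	mh := digest[:]

	h2 := hash2(mh)
	sk := serverKey(mh)
	fmt.Println("HASH2:    ", hex.EncodeToString(h2[:]))
	fmt.Println("ServerKey:", hex.EncodeToString(sk[:]))
}
```

A DHT Server only ever sees `HASH2` and `ServerKey`. Recovering `MH`, and therefore the `CID` needed for a Bitswap request, would require inverting SHA256.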
From b7fca73e51a798758e5bacdaeda9cfffe7d388b1 Mon Sep 17 00:00:00 2001 From: Guillaume Michel - guissou Date: Tue, 14 Feb 2023 09:24:54 +0100 Subject: [PATCH 22/55] Update IPIP/0000-double-hash-dht.md Co-authored-by: Yiannis Psaras <52073247+yiannisbot@users.noreply.github.com> --- IPIP/0000-double-hash-dht.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index beb1119b..48e354c3 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -229,7 +229,7 @@ A Prefix Request consists in requesting a Prefix of a key, instead of a full len This provides [`k`-anonymity](https://en.wikipedia.org/wiki/K-anonymity) when a curious DHT Server tries to associate the Client's `PeerID` with the requested `HASH2`, with `k` defined as the average number of Provider Records matching a Prefix of length `l`. `k` is a system parameter and defines the `k`-anonymity level, and `l` is derived from `k` (see [Prefix Length Selection](#prefix-length-selection)). Prefix Request also enables [Plausible Deniability](https://en.wikipedia.org/wiki/Deniable_encryption) for the Client. The DHT Server cannot prove that a Client identified by its `PeerID` or `IP Address` tried to access some content identified by its `HASH2`. -However Prefix Requests don't offer [`l`-diversity](https://en.wikipedia.org/wiki/L-diversity) nor [`t`-closeness](https://en.wikipedia.org/wiki/T-closeness), as frequency analysis is still easy to perform. For example, a `Prefix` matches a very popular Provider Records and a few unpopular ones. The DHT Server nodes can take a better-than-random guess when a new request is received for this `Prefix` that there is a higher chance that the Client is requesting the popular content's Provider Record compared with an unpopular one. However, the DHT Server cannot prove the the Client has accessed the popular content. 
+However Prefix Requests don't offer [`l`-diversity](https://en.wikipedia.org/wiki/L-diversity) nor [`t`-closeness](https://en.wikipedia.org/wiki/T-closeness), as frequency analysis is still easy to perform. For example, a `Prefix` matches a very popular Provider Record and a few unpopular ones. The DHT Server nodes can take a better-than-random guess when a new request is received for this `Prefix` that there is a higher chance that the Client is requesting the popular content's Provider Record compared with an unpopular one. However, the DHT Server cannot prove the the Client has accessed the popular content. **Provider Record Encryption** From cd225d581c4fd7398d5cb4a71561352859beea9c Mon Sep 17 00:00:00 2001 From: Guillaume Michel - guissou Date: Tue, 14 Feb 2023 09:25:17 +0100 Subject: [PATCH 23/55] Update IPIP/0000-double-hash-dht.md Co-authored-by: Yiannis Psaras <52073247+yiannisbot@users.noreply.github.com> --- IPIP/0000-double-hash-dht.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index 48e354c3..95c89e3c 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -233,7 +233,7 @@ However Prefix Requests don't offer [`l`-diversity](https://en.wikipedia.org/wik **Provider Record Encryption** -Provider Record Encryption also builds on top of Double Hashing. The Provider Record Encryption prevents curious DHT Servers observing a request for `Prefix` but not storing any Provider Record matching `Prefix`, to replay the request for `Prefix` and get all published keys matching `Prefix` including `HASH2` of the content accessed by the Client. It prevents all curious actors from building a global dictionary of `HASH2` to Content Providers for all content published in the IPFS public DHT. It is necessary to know the `MH` of the content (included in the `CID`) to learn about its Content Providers. +Provider Record Encryption also builds on top of Double Hashing. 
The Provider Record Encryption prevents curious DHT Servers observing a request for `Prefix` (but not storing any Provider Record matching `Prefix`), to replay the request for `Prefix` and get all published keys matching `Prefix` including `HASH2` of the content accessed by the Client. It prevents all curious actors from building a global dictionary of `HASH2` to Content Providers for all content published in the IPFS public DHT. It is necessary to know the `MH` of the content (included in the `CID`) to learn about its Content Providers. Curious DHT Servers observing a request from `PeerID` for `Prefix` cannot associate the Client's `PeerID` with the Content Provider's `PeerID` because they cannot read the Provider Record. From 1b189df1a2946fa1e0156db7b9a30a357156c8cc Mon Sep 17 00:00:00 2001 From: Guillaume Michel - guissou Date: Tue, 14 Feb 2023 09:25:45 +0100 Subject: [PATCH 24/55] Update IPIP/0000-double-hash-dht.md Co-authored-by: Yiannis Psaras <52073247+yiannisbot@users.noreply.github.com> --- IPIP/0000-double-hash-dht.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index 95c89e3c..c06ee3ed 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -240,7 +240,7 @@ Curious DHT Servers observing a request from `PeerID` for `Prefix` cannot associ ### Writer Privacy Writer Privacy is NOT the goal of this design. However, as a side effect, Write Privacy gets improved in some specific cases. -- Content Providers do NOT get any additional privacy from the Client fetching the data +- Content Providers do NOT get any additional privacy when sending data to Clients through Bitswap. - Content Providers can now hide to the DHT Server peers hosting their Provider Records which data they are serving, as long as the DHT Servers don't know the preimage of `HASH2`: `MH`. The DHT Servers are not able to query the content associated with the Provider Records they are storing. 
However, they can approximately monitor the number of requests associated to the content by observing the requests in the keysubspace matching to `Prefix` of `HASH2` of the content. DHT Servers can take an educated guess on the association of `HASH2` with the Content Provider's `PeerID`. The DHT Servers storing the Provider Record are able to share information about the Content Provider with potential accomplices. - Content Providers get additional privacy from curious DHT Servers observing a request, but NOT storing the Provider Record. These DHT Servers can still replay the DHT request, but are unable to discover the Content Provider's `PeerID` associated with `Preix` because the Provider Records are encrypted, and the content itself. This holds as long as the DHT Servers don't know the `MH` (or `CID`). From c8dc97573158780caeed388acd0bcc7c2e6444ec Mon Sep 17 00:00:00 2001 From: Guillaume Michel - guissou Date: Tue, 14 Feb 2023 09:26:27 +0100 Subject: [PATCH 25/55] Update IPIP/0000-double-hash-dht.md Co-authored-by: Yiannis Psaras <52073247+yiannisbot@users.noreply.github.com> --- IPIP/0000-double-hash-dht.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index c06ee3ed..0103044e 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -242,7 +242,7 @@ Curious DHT Servers observing a request from `PeerID` for `Prefix` cannot associ Writer Privacy is NOT the goal of this design. However, as a side effect, Write Privacy gets improved in some specific cases. - Content Providers do NOT get any additional privacy when sending data to Clients through Bitswap. - Content Providers can now hide to the DHT Server peers hosting their Provider Records which data they are serving, as long as the DHT Servers don't know the preimage of `HASH2`: `MH`. The DHT Servers are not able to query the content associated with the Provider Records they are storing. 
However, they can approximately monitor the number of requests associated to the content by observing the requests in the keysubspace matching to `Prefix` of `HASH2` of the content. DHT Servers can take an educated guess on the association of `HASH2` with the Content Provider's `PeerID`. The DHT Servers storing the Provider Record are able to share information about the Content Provider with potential accomplices. -- Content Providers get additional privacy from curious DHT Servers observing a request, but NOT storing the Provider Record. These DHT Servers can still replay the DHT request, but are unable to discover the Content Provider's `PeerID` associated with `Preix` because the Provider Records are encrypted, and the content itself. This holds as long as the DHT Servers don't know the `MH` (or `CID`). +- Content Providers get additional privacy from curious DHT Servers observing a request, but NOT storing the Provider Record. These DHT Servers can still replay the DHT request, but are unable to discover the Content Provider's `PeerID` associated with `Prefix` because the Provider Records are encrypted, and the content itself. This holds as long as the DHT Servers don't know the `MH` (or `CID`). 
### Provider Record Authenticity From c118325ccf1e46f7aaa70d437be97ea6ceef3044 Mon Sep 17 00:00:00 2001 From: guillaumemichel Date: Tue, 14 Feb 2023 09:54:35 +0100 Subject: [PATCH 26/55] addressed reviews --- IPIP/0000-double-hash-dht.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index 51653301..86627a4e 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -37,7 +37,7 @@ The changes described in this document introduce a DHT privacy upgrade boosting **Magic Values** - bytes("CR_DOUBLEHASH") - bytes("CR_SERVERKEY") -- AES [varint](https://github.com/multiformats/multicodec/blob/master/table.csv#L69): `aes-256 = 0xa2` +- AES-GCM [varint](https://github.com/multiformats/multicodec/pull/314): `aes-gcm-256 = 0xa5` - Double SHA256 [varint](https://github.com/multiformats/multicodec/blob/master/table.csv#L41): `dbl-sha2-256 = 0x56` - A DHT Server returns all of the Provider Records matching to at most **`MatchLimit = 64`** distinct `HASH2`. Magic number explanation in [k-anonymity](#k-anonymity). - Provider Record Timestamp (`TS`) validity period: `48h` @@ -55,10 +55,10 @@ The changes described in this document introduce a DHT privacy upgrade boosting - **`PeerID`** s define stable [peer identities](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md). The `PeerID` is derived from the node's cryptographic public key. - **`multiaddrs`** are the [network addresses](https://github.com/libp2p/specs/tree/master/addressing) associated with a `PeerID`. It represents the location(s) of the peer. - **`KeyPrefix`** is defined as a prefix of length `l` bits of `HASH2`. `KeyPrefix` is represented by a `byte` concatenated with a variable sized array of bytes, containing at most 32 bytes. The leading `byte` represents the binary representation of `l - 1`, making prefixes of length `256` possible, but not prefixes of length `0`. 
The trailing byte array is of length `ceil(l/8)` bytes, and its content is the bits prefix right padded with zeros. -- **`ServerKey`** is defined as `SHA256(bytes("CR_SERVERKEY") || MH)`. It is derived from the `MH`. The Content Provider communicates `ServerKey` to the DHT Servers during the Publish Process. The DHT Servers use it to encrypt the data sent to the Client during the lookup process. `ServerKey` is represented as a 32-byte array. +- **`ServerKey`** is defined as `SHA256(bytes("CR_SERVERKEY") || MH)`. It is derived from the `MH`. The Content Provider communicates `ServerKey` to the DHT Servers during the Publish Process. The DHT Servers use `ServerKey` to encrypt `TS`, `Signature` and Content Providers `multiaddrs` sent to the Client when some Provider Records match the requested `Prefix`. `ServerKey` is represented as a 32-byte array. - **`TS`** is the [Unix Timestamp](https://en.wikipedia.org/wiki/Unix_time) corresponding content publish time. `TS` is represented as a 32-bit **unsigned** Integer, allowing timestamps to range from `1970-01-01T00:00:00Z` to `2106-02-07T06:28:15Z` before reaching the overflow. - **`CPPeerID`** is the `PeerID` of the Content Provider for a specific `CID`. -- **`EncPeerID`** is the result of the encryption of `CPPeerID` using `MH` as encryption key and a random nonce `AESGCM(MH, Nonce, CPPeerID)`. `EncPeerID` contains the [varint](https://github.com/multiformats/multicodec/blob/master/table.csv#L69) of the encryption algorithm used (AES), the bytes array of the encrypted payload, and the `Nonce`. `Nonce` is a randomly generated 12-byte array. The format of `EncPeerID` is [`0xa2`, `Nonce`, `AESGCM(MH, Nonce, CPPeerID)`]. +- **`EncPeerID`** is the result of the encryption of `CPPeerID` using `MH` as encryption key and a random nonce `AESGCM(MH, Nonce, CPPeerID)`. 
`EncPeerID` contains the [varint](https://github.com/multiformats/multicodec/pull/314) of the encryption algorithm used (AES), the bytes array of the encrypted payload, and the `Nonce`. `Nonce` is a randomly generated 12-byte array. The format of `EncPeerID` is [`0xa2`, `Nonce`, `AESGCM(MH, Nonce, CPPeerID)`]. - **`Signature`** is the signature of the `EncPeerID` encrypted payload (not including the varint nor the nonce) and `TS` using the Content Provider's private key, either with ed25519 or rsa signature algorithm, depending on the keys of the Content Provider. - **Provider Record** is defined as a pointer to the storage location of some content identified by `CID` or `HASH2`. A Provider Record consists on the following fields: [`EncPeerID`, `TS`, `Signature`]. - **Provider Store** is the data structure on the DHT Servers used to store the Provider Records. Its structure is a nested dictionary/map: `HASH2` -> `ServerKey` -> [`CPPeerID`, `EncPeerID`, `TS`, `Signature`]. There is only one single correct `ServerKey` for each `HASH2`. However, any peer can forge a valid Publish request (with invalid `EncPeerID` but valid `Signature`) undetected by the DHT Server. The DHT server isn't able to distinguish which `ServerKey` is correct as it doesn't have the knowledge of `MH`, hence it has to keep both and serve both upon request for `HASH2`. @@ -92,7 +92,7 @@ The following process describes the event of a client looking up a CID in the IP 7. Content Provider computes `ServerKey = SHA256(bytes("CR_SERVERKEY") || MH)`. 8. Once the lookup request has returned the 20 closest peers, Content Provider sends a Publish request to these DHT servers. The Publish request contains [`HASH2`, `EncPeerID`, `TS`, `Signature`, `ServerKey`]. 9. Each DHT server verifies `Signature` against the `PeerID` of the Content Provider used to open the libp2p connection. `Verify(CPPeerID, Signature, EncPeerID || TS)`. It verifies that `TS` is younger than `48h` and isn't in the future. 
If invalid, send an error to the client. -10. Each DHT server adds an entry in their Provider Store for `HASH2` -> `ServerKey` -> `CPPeerID` -> [`EncPeerID`, `TS`, `Signature`], with `CPPeerID` being the `PeerID` of the Content Provider. If there is already an entry including `CPPeerID` for `HASH2` -> `ServerKey`, and if the `TS` of the new valid entry is newer than the existing `TS`, overwrite the entry in the Provider Store. Else drop the new entry. +10. Each DHT server adds an entry in their Provider Store for `HASH2` -> `ServerKey` -> `CPPeerID` -> [`EncPeerID`, `TS`, `Signature`], with `CPPeerID` being the `PeerID` of the Content Provider (see [provider store](#provider-store)). If there is already an entry including `CPPeerID` for `HASH2` -> `ServerKey`, and if the `TS` of the new valid entry is newer than the existing `TS`, overwrite the entry in the Provider Store. Else drop the new entry. 11. Each DHT server confirms to Content Provider that the Provider Record has been successfully added. 12. The proces is over once Content Provider has received 20 confirmations. @@ -197,7 +197,7 @@ signature, err := privKey.Sign(data) ### Provider Store -The data structure of the DHT Servers' Provider Store is a nested dictionary/map whose structure is: `HASH2` -> `ServerKey` -> `CPPeerID` -> [`EncPeerID`, `TS`, `Signature`]. +The data structure of the DHT Servers' Provider Store is a nested key-value store whose structure is: `HASH2` -> `ServerKey` -> `CPPeerID` -> [`EncPeerID`, `TS`, `Signature`]. The same `HASH2` always produces the same `ServerKey` (as long as the same Hashing Algorithm was used), as both `HASH2` and `ServerKey` result in a deterministic hash operation on `MH` prepended with a constant prefix. However, a misbehaving node could publish an advertisement for `HASH2` while not knowing `MH`, and forge a random `ServerKey`. 
The DHT Server not knowing `MH` cannot determine which `ServerKey` is the one associated with `HASH2`, and hence need to keep all different `ServerKey`s. However, the number of forged `ServerKey`s is expected to be small as the Client aren't able to decrypt payload encrypted with a forged `ServerKey`, and detect that the Provider Record isn't legitimate. The only reason a misbehaving peer would want to publish forged `ServerKey`s is to exhaust the storage resources of a specific target DHT Server. @@ -225,7 +225,7 @@ Double Hashing is also necessary for Prefix Requests and Provider Record Encrypt **Prefix Requests** -A Prefix Request consists in requesting a Prefix of a key, instead of a full length Kademlia key. A Prefix corresponds to a branch of the binary tree, and potentially matches multiple existing keys. Prefix Request Routing works exactly like the normal Kademlia Routing, hence a DHT Prefix Request always converges eventually. The goal of Prefix Requests is to match multiple Provider Records for a single request. Instead of requesting `HASH2` the Client now requests `Prefix`, a prefix of `HASH2` of length `l` bits, and the DHT Server storing the Provider Records matching to `Prefix` doesn't know exactly which content is accessed and returns all Provider Records whose `HASH2` matches `Prefix`. +A Prefix Request consists in requesting a Prefix of a key, instead of a full length Kademlia key. A Prefix corresponds to a branch of the binary trie, and potentially matches multiple existing keys. Prefix Request Routing works exactly like the normal Kademlia Routing, hence a DHT Prefix Request eventually always converges. The goal of Prefix Requests is to match multiple Provider Records for a single request. 
Instead of requesting `HASH2` the Client now requests `Prefix`, a prefix of `HASH2` of length `l` bits, and the DHT Server storing the Provider Records matching to `Prefix` doesn't know exactly which content is accessed and returns all Provider Records whose `HASH2` matches `Prefix`. With Double Hashing, curious DHT Servers cannot associate `CID` with the requester `PeerID` anymore, but they can associate `HASH2` with `PeerID`. Prefix Requests makes it harder for curious DHT Servers to associate `PeerID` to a specific `HASH2`, as they only learn a `Prefix` of `HASH2`. This provides [`k`-anonymity](https://en.wikipedia.org/wiki/K-anonymity) when a curious DHT Server tries to associate the Client's `PeerID` with the requested `HASH2`, with `k` defined as the average number of Provider Records matching a Prefix of length `l`. `k` is a system parameter and defines the `k`-anonymity level, and `l` is derived from `k` (see [Prefix Length Selection](#prefix-length-selection)). Prefix Request also enables [Plausible Deniability](https://en.wikipedia.org/wiki/Deniable_encryption) for the Client. The DHT Server cannot prove that a Client identified by its `PeerID` or `IP Address` tried to access some content identified by its `HASH2`. From 422f1d599c82b7d442315d967bf145bdc927faa4 Mon Sep 17 00:00:00 2001 From: Guillaume Michel - guissou Date: Tue, 14 Feb 2023 09:58:53 +0100 Subject: [PATCH 27/55] Update IPIP/0000-double-hash-dht.md Co-authored-by: Yiannis Psaras <52073247+yiannisbot@users.noreply.github.com> --- IPIP/0000-double-hash-dht.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index 3e32f53d..169cf80d 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -241,7 +241,7 @@ Curious DHT Servers observing a request from `PeerID` for `Prefix` cannot associ Writer Privacy is NOT the goal of this design. 
However, as a side effect, Write Privacy gets improved in some specific cases. - Content Providers do NOT get any additional privacy when sending data to Clients through Bitswap. -- Content Providers can now hide to the DHT Server peers hosting their Provider Records which data they are serving, as long as the DHT Servers don't know the preimage of `HASH2`: `MH`. The DHT Servers are not able to query the content associated with the Provider Records they are storing. However, they can approximately monitor the number of requests associated to the content by observing the requests in the keysubspace matching to `Prefix` of `HASH2` of the content. DHT Servers can take an educated guess on the association of `HASH2` with the Content Provider's `PeerID`. The DHT Servers storing the Provider Record are able to share information about the Content Provider with potential accomplices. +- Content Providers can now hide to the DHT Server peers hosting their Provider Records which data they are serving, as long as the DHT Servers don't know the preimage of `HASH2`: `MH`. This is because the DHT Servers are not able to query the content associated with the Provider Records they are storing. However, they can approximately monitor the number of requests associated to the content by observing the requests in the keysubspace matching to `Prefix` of `HASH2` of the content. DHT Servers can take an educated guess on the association of `HASH2` with the Content Provider's `PeerID`. Recall, however, that `HASH2` cannot be associated with the actual CID of the content. The DHT Servers storing the Provider Record are able to share information about the Content Provider with potential accomplices. - Content Providers get additional privacy from curious DHT Servers observing a request, but NOT storing the Provider Record. 
These DHT Servers can still replay the DHT request, but are unable to discover the Content Provider's `PeerID` associated with `Prefix` because the Provider Records are encrypted, and the content itself. This holds as long as the DHT Servers don't know the `MH` (or `CID`). ### Provider Record Authenticity From fa0c5a408e5ede000c12bee3882614f6ad3abea6 Mon Sep 17 00:00:00 2001 From: guillaumemichel Date: Tue, 14 Feb 2023 10:56:09 +0100 Subject: [PATCH 28/55] addressed reviews --- IPIP/0000-double-hash-dht.md | 34 +++++++++++++++++----------------- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index 169cf80d..6c2ca6a6 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -48,7 +48,7 @@ The changes described in this document introduce a DHT privacy upgrade boosting - **`MH`** is the [Multihash](https://github.com/multiformats/multihash) contained in a `CID`. It corresponds to the digest of a hash function over some content. `MH` is represented as a 32-byte array. - **`HASH2`** is defined as `SHA256(bytes("CR_DOUBLEHASH") || MH)`. It represents the location of the Kademlia keyspace for the Provider Record associated with `CID`. `HASH2` is represented as a 32-byte array. `HASH2 = SHA256(bytes("CR_DOUBLEHASH") || MH)`. - **Content Provider** is the node storing some content, and advertising it to the DHT. -- **DHT Servers** are nodes running the IPFS public DHT. In this document, DHT Servers mostly refer to the DHT Servers storing the Provider Records associated with specific `CID`s, and not the DHT Servers helping routing lookup requests to the right keyspace location. +- **DHT Servers** are nodes running the IPFS public DHT. In this documents, DHT Servers mostly refer to the DHT Servers storing the Provider Records associated with specific `CID`s, and not the DHT Servers helping routing lookup requests to the right keyspace location. 
- **Client** is an IPFS client looking up a content identified by an already known `CID`. - **Publish Process** is the process of the Content Provider communicating to the DHT Servers that it provides some content identified by `CID`. - **Lookup Process** is the process of the Client retrieving the content identified by `CID`. @@ -60,7 +60,7 @@ The changes described in this document introduce a DHT privacy upgrade boosting - **`CPPeerID`** is the `PeerID` of the Content Provider for a specific `CID`. - **`EncPeerID`** is the result of the encryption of `CPPeerID` using `MH` as encryption key and a random nonce `AESGCM(MH, Nonce, CPPeerID)`. `EncPeerID` contains the [varint](https://github.com/multiformats/multicodec/pull/314) of the encryption algorithm used (AES), the bytes array of the encrypted payload, and the `Nonce`. `Nonce` is a randomly generated 12-byte array. The format of `EncPeerID` is [`0xa2`, `Nonce`, `AESGCM(MH, Nonce, CPPeerID)`]. - **`Signature`** is the signature of the `EncPeerID` encrypted payload (not including the varint nor the nonce) and `TS` using the Content Provider's private key, either with ed25519 or rsa signature algorithm, depending on the keys of the Content Provider. -- **Provider Record** is defined as a pointer to the storage location of some content identified by `CID` or `HASH2`. A Provider Record consists of the following fields: [`EncPeerID`, `TS`, `Signature`]. +- **Provider Record** is defined as a pointer to the storage location of some content identified by `CID` or `HASH2`. A Provider Record consists on the following fields: [`EncPeerID`, `TS`, `Signature`]. - **Provider Store** is the data structure on the DHT Servers used to store the Provider Records. Its structure is a nested dictionary/map: `HASH2` -> `ServerKey` -> [`CPPeerID`, `EncPeerID`, `TS`, `Signature`]. There is only one single correct `ServerKey` for each `HASH2`. 
However, any peer can forge a valid Publish request (with invalid `EncPeerID` but valid `Signature`) undetected by the DHT Server. The DHT server isn't able to distinguish which `ServerKey` is correct as it doesn't have the knowledge of `MH`, hence it has to keep both and serve both upon request for `HASH2`. ### Current DHT @@ -103,7 +103,7 @@ The following process describes the event of a client looking up a CID in the IP 3. Client sends a DHT lookup request for `KeyPrefix` to these DHT servers. The request contains a flag to specify whether Client wants the `multiaddrs` associated with the `CPPeerID` or not. 4. The DHT servers find the 20 closest `PeerID`s to `KeyPrefix` in XOR distance (see [algorithm](#closest-keys-to-a-key-prefix)). Add these `PeerID`s and their associated multiaddresses (if applicable) to the `message` that will be returned to Client. 5. The DHT servers search if there are entries matching `KeyPrefix` in their Provider Store. -6. For all entries `HASH2` of the Provider Store where `HASH2[:len(KeyPrefix)]==KeyPrefix`, add to `message` the following payload: `EncPeerID || SERVERNONCE || AESGCM(ServerKey, SERVERNONCE, TS || Signature || multiaddrs)`, `SERVERNONCE` being a randomly generated byte array of length 12, for `multiaddrs` being the multiaddresses associated with `CPPeerID` (if applicable) if the `multiaddrs` were requested by Client. If more than `MatchLimit` distinct `HASH2`s match the requested `KeyPrefix`, the DHT Server doesn't return any Provider Record, and adds the number of `HASH2` matching `KeyPrefix` along with its own `MatchLimit` to `message`. +6. 
For all entries `HASH2` of the Provider Store where `HASH2[:len(KeyPrefix)]==KeyPrefix`, add to `message` the following encrypted payload: `EncPeerID || SERVERNONCE || AESGCM(ServerKey, SERVERNONCE, TS || Signature || multiaddrs)`, `SERVERNONCE` being a randomly generated 12-byte array, for `multiaddrs` being the multiaddresses associated with `CPPeerID` (if applicable) if the `multiaddrs` were requested by Client. If more than `MatchLimit` distinct `HASH2`s match the requested `KeyPrefix`, the DHT Server doesn't return any Provider Record, and adds the number of `HASH2` matching `KeyPrefix` along with its own `MatchLimit` to `message`. 7. The DHT servers send `message` to Client. 8. Client computes `ServerKey = SHA256(bytes("CR_SERVERKEY") || MH)`. 9. Client tries to decrypt all returned encrypted payloads using `MH` for `EncPeerID` and `ServerKey` for `Enc(ServerKey, TS || Signature || multiaddrs)`. If at least one encrypted payload can be decrypted, go to 12. @@ -125,7 +125,7 @@ The prefix `l` is derived from `k` and the number of CIDs published to the DHT: Each node keeps track of the number of `HASH2` matching the last `KeyPrefix` requested in the last 128 lookups. `a` is defined as the average number of matches for the last 128 requests. At any point in time, if $a \gt 2\times k$, then `l` should increase (`l = l + 1`), and if $a \lt \frac{k}{2}$, then `l` should decrease (`l = l - 1`). On node shutdown, `a` is saved on disk, allowing a quick restart with an accurate `l` value. -Note that DHT Servers can set an upperbound on the number of Provider Records they serve for each lookup request. So a very small value for `l` may result in not discovering the target Provider Record. +Note that DHT Servers can set an upperbound on the number of Provider Records they serve for each lookup request. So a too small `l` may result in not discovering the target Provider Record. 
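The adaptation rule for `l` described above can be sketched in Go as follows. This is a minimal illustration, not the implementation: the `PrefixLenTracker` type and its `Observe` method are hypothetical names, and clearing the window after each change of `l` is an assumption the text does not state (it avoids adjusting `l` repeatedly from samples collected with the old prefix length).

```go
package main

// window is the number of recent lookups averaged over (128 in the text).
const window = 128

// PrefixLenTracker is a hypothetical helper, not part of any existing library.
type PrefixLenTracker struct {
	K       int   // k-anonymity target (default 8 in the text)
	L       int   // current prefix length l, in bits
	matches []int // match counts of the most recent lookups (at most window)
	next    int   // slot overwritten next once the buffer is full
}

// Observe records how many HASH2 matched the last KeyPrefix request and
// adjusts L when the running average a leaves the range [k/2, 2k].
func (p *PrefixLenTracker) Observe(n int) {
	if len(p.matches) < window {
		p.matches = append(p.matches, n)
	} else {
		p.matches[p.next] = n
		p.next = (p.next + 1) % window
	}
	sum := 0
	for _, m := range p.matches {
		sum += m
	}
	a := float64(sum) / float64(len(p.matches))
	k := float64(p.K)
	switch {
	case a > 2*k:
		p.L++ // too many matches: lengthen the prefix
	case a < k/2 && p.L > 0:
		p.L-- // too few matches: shorten the prefix
	default:
		return
	}
	// Assumption: samples taken with the old l are discarded after a change.
	p.matches = p.matches[:0]
	p.next = 0
}
```

A node would call `Observe` once per lookup with the number of `HASH2` returned for the requested `KeyPrefix`, and use the current `L` for its next request.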
**Prefix magic numbers** - `k`-anonymity privacy parameter, by default `k = 8` @@ -209,7 +209,7 @@ When a Content Provider republishes a Provider Record, the DHT Server only keeps Default: `k = 8`. Default: `MatchLimit = 64`. -The `k`-anonymity parameter `k` is user defined, it can be modified in the configuration files. Users requiring a higher level of privacy can increase their value of `k`. `k=8` is deemed to be private enough for standard IPFS users, while limiting the overhead in packet size of the DHT Server response to 8x. +The `k`-anonymity parameter `k` is user defined, it can be modified in the configuration files. Users requiring a higher level of privacy can increase their value of `k`. `8` is deemed to be private enough for standard IPFS users, while limiting the overhead in packet size of the DHT Server response to 8x. The `MatchLimit` prevents malformed or malicious requests to match all Provider Records that a DHT Server is providing at once. A Client can still fetch all Provider Records matching any `KeyPrefix`, but it must perform multiple DHT lookup requests for enough prefixes to the DHT Server. The `MatchLimit` protects the Server from having to send large amounts of data at once. `64` is already a large value, given that each `HASH2` can be associated with multiple Provider Records, one for each Content Provider, and the multiaddresses of all Content Providers can be sent along. The DHT provides _on average_ at most `64-anonymity` out-of-the-box and a better privacy level can be reached by sending multiple requests. @@ -219,30 +219,30 @@ The `MatchLimit` prevents malformed or malicious requests to match all Provider **Double Hashing** -Currently any DHT Server observing a request can associate the Client's `PeerID` with the requested `CID`. If the `CID` is not already known, curious DHT Servers observing a DHT request can replay the request, and retrieve the content that the client is accessing, which is a significant privacy concern. 
Using `HASH2` as DHT Content Identifier prevents curious DHT Servers that do not know the `MH`, i.e., the preimage of `HASH2`, from retrieving the content associated with `HASH2`. Curious DHT Servers can still replay the DHT request for `HASH2` and find the Content Providers. However, they are not able to make a valid Bitswap request to the Content Providers, for they don't have a way to know the Content Identifier used by Bitswap (`CID`) for the content being identified by `HASH2` in the DHT. +Currently any DHT Server observing a request can associate the Client's `PeerID` with the requested `CID`. If the `CID` is not already known, curious DHT Servers observing a DHT request can replay the request, and retrieve the content that the client is accessing, which is a significant privacy concern. Using `HASH2` as DHT Content Identifier prevents curious DHT Servers not knowing `MH`, the preimage of `HASH2` from retrieving the content associated with `HASH2`. Curious DHT Servers can still replay the DHT request for `HASH2` and find the Content Providers. However, they are not able to make a valid Bitswap request to the Content Providers, for they don't have the knowledge of the Content Identifier used by Bitswap (`CID`) for the content being identified by `HASH2` in the DHT. Double Hashing is also necessary for Prefix Requests and Provider Record Encryption. **Prefix Requests** -A Prefix Request consists in requesting a Prefix of a key, instead of a full length Kademlia key. A Prefix corresponds to a branch of the binary trie, and potentially matches multiple existing keys. Prefix Request Routing works exactly like the normal Kademlia Routing, hence a DHT Prefix Request eventually always converges. The goal of Prefix Requests is to match multiple Provider Records for a single request. 
Instead of requesting `HASH2` the Client now requests `Prefix`, a prefix of `HASH2` of length `l` bits, and the DHT Server storing the Provider Records matching to `Prefix` doesn't know exactly which content is accessed and returns all Provider Records whose `HASH2` matches `Prefix`. With Double Hashing, curious DHT Servers cannot associate `CID` with the requester `PeerID` anymore, but they can associate `HASH2` with `PeerID`. Prefix Requests makes it harder for curious DHT Servers to associate `PeerID` to a specific `HASH2`, as they only learn a `Prefix` of `HASH2`. +A Prefix Request consists in requesting a Prefix of a key, instead of a full length Kademlia key. A Prefix corresponds to a branch of the binary trie, and potentially matches multiple existing keys. Prefix Request Routing works exactly like the normal Kademlia Routing, hence a DHT Prefix Request eventually always converges. The goal of Prefix Requests is to match multiple Provider Records for a single request. Instead of requesting `HASH2` the Client now requests `Prefix`, a prefix of `HASH2` of length `l` bits, and the DHT Server storing the Provider Records matching to `Prefix` doesn't know exactly which content is accessed and returns all Provider Records whose `HASH2` matches `Prefix`. With Double Hashing, curious DHT Servers cannot associate `CID` with the requester `PeerID` anymore, but they can associate `HASH2` with `PeerID`. Prefix Requests make it harder for curious DHT Servers to associate `PeerID` to a specific `HASH2`, as they only learn a `Prefix` of `HASH2`. This provides [`k`-anonymity](https://en.wikipedia.org/wiki/K-anonymity) when a curious DHT Server tries to associate the Client's `PeerID` with the requested `HASH2`, with `k` defined as the average number of Provider Records matching a Prefix of length `l`. `k` is a system parameter and defines the `k`-anonymity level, and `l` is derived from `k` (see [Prefix Length Selection](#prefix-length-selection)). 
Prefix Request also enables [Plausible Deniability](https://en.wikipedia.org/wiki/Deniable_encryption) for the Client. The DHT Server cannot prove that a Client identified by its `PeerID` or `IP Address` tried to access some content identified by its `HASH2`. -However Prefix Requests don't offer [`l`-diversity](https://en.wikipedia.org/wiki/L-diversity) nor [`t`-closeness](https://en.wikipedia.org/wiki/T-closeness), as frequency analysis is still easy to perform. For example, a `Prefix` matches a very popular Provider Record and a few unpopular ones. The DHT Server nodes can take a better-than-random guess when a new request is received for this `Prefix` that there is a higher chance that the Client is requesting the popular content's Provider Record compared with an unpopular one. However, the DHT Server cannot prove the the Client has accessed the popular content. +However Prefix Requests don't offer [`l`-diversity](https://en.wikipedia.org/wiki/L-diversity) nor [`t`-closeness](https://en.wikipedia.org/wiki/T-closeness), as frequency analysis is still easy to perform. For example, a `Prefix` matches a very popular Provider Records and a few unpopular ones. The DHT Server nodes can take a better-than-random guess when a new request is received for this `Prefix` that there is a higher chance that the Client is requesting the popular content's Provider Record compared with an unpopular one. However, the DHT Server cannot prove the the Client has accessed the popular content. **Provider Record Encryption** -Provider Record Encryption also builds on top of Double Hashing. The Provider Record Encryption prevents curious DHT Servers observing a request for `Prefix` (but not storing any Provider Record matching `Prefix`), to replay the request for `Prefix` and get all published keys matching `Prefix` including `HASH2` of the content accessed by the Client. 
It prevents all curious actors from building a global dictionary of `HASH2` to Content Providers for all content published in the IPFS public DHT. It is necessary to know the `MH` of the content (included in the `CID`) to learn about its Content Providers. +Provider Record Encryption also builds on top of Double Hashing. The Provider Record Encryption prevents curious DHT Servers observing a request for `Prefix` but not storing any Provider Record matching `Prefix`, to replay the request for `Prefix` and get all published keys matching `Prefix` including `HASH2` of the content accessed by the Client. It prevents all curious actors from building a global dictionary of `HASH2` to Content Providers for all content published in the IPFS public DHT. It is necessary to know the `MH` of the content (included in the `CID`) to learn about its Content Providers. Curious DHT Servers observing a request from `PeerID` for `Prefix` cannot associate the Client's `PeerID` with the Content Provider's `PeerID` because they cannot read the Provider Record. ### Writer Privacy Writer Privacy is NOT the goal of this design. However, as a side effect, Write Privacy gets improved in some specific cases. -- Content Providers do NOT get any additional privacy when sending data to Clients through Bitswap. -- Content Providers can now hide to the DHT Server peers hosting their Provider Records which data they are serving, as long as the DHT Servers don't know the preimage of `HASH2`: `MH`. This is because the DHT Servers are not able to query the content associated with the Provider Records they are storing. However, they can approximately monitor the number of requests associated to the content by observing the requests in the keysubspace matching to `Prefix` of `HASH2` of the content. DHT Servers can take an educated guess on the association of `HASH2` with the Content Provider's `PeerID`. Recall, however, that `HASH2` cannot be associated with the actual CID of the content. 
The DHT Servers storing the Provider Record are able to share information about the Content Provider with potential accomplices. -- Content Providers get additional privacy from curious DHT Servers observing a request, but NOT storing the Provider Record. These DHT Servers can still replay the DHT request, but are unable to discover the Content Provider's `PeerID` associated with `Prefix` because the Provider Records are encrypted, and the content itself. This holds as long as the DHT Servers don't know the `MH` (or `CID`). +- Content Providers do NOT get any additional privacy from the Client fetching the data +- Content Providers can now hide to the DHT Server peers hosting their Provider Records which data they are serving, as long as the DHT Servers don't know the preimage of `HASH2`: `MH`. The DHT Servers are not able to query the content associated with the Provider Records they are storing. However, they can approximately monitor the number of requests associated to the content by observing the requests in the keysubspace matching to `Prefix` of `HASH2` of the content. DHT Servers can take an educated guess on the association of `HASH2` with the Content Provider's `PeerID`. The DHT Servers storing the Provider Record are able to share information about the Content Provider with potential accomplices. For instance, a coalition of curious DHT Servers could share with each other for each Provider Record, identified by `HASH2`, the list of Content Providers, and the number of received Prefix Requests matching `HASH2`. This results in monitoring all content advertised by all `PeerID`s and estimating the number of requests they are serving. +- Content Providers get additional privacy from curious DHT Servers observing a request, but NOT from the ones storing the Provider Record. 
These DHT Servers can still replay the DHT request, but are unable to discover the content and the Content Provider's `PeerID` associated with `Prefix`, because the Provider Records are encrypted using `MH`. This holds as long as the DHT Servers don't know the `MH` (or `CID`). ### Provider Record Authenticity @@ -250,7 +250,7 @@ The Provider Records are now signed by the Content Provider. This prevents a mal ### Provider Records Enumeration -Enumerating the number of Provider Records in the DHT becomes trivial thank to the Double Hashing and Prefix Requests. Knowledge of the preimage of the requested key isn't required anymore for a valid Kademlia request. An easy Provider Records Enumeration, or Approximation if crawling the complete DHT isn't an option enables a better monitoring of the DHT load and activity. +Enumerating the number of Provider Records in the DHT becomes trivial thank to the Double Hashing and Prefix Requests. Knowledge of the preimage of the requested key isn't required anymore for a valid Kademlia request. Monitoring the number of Provider Records stored in the DHT is a good metric to evaluate the health of the DHT. ### Better Kademlia Routing Table Refresh @@ -281,11 +281,11 @@ Alternatives for migration: The Double Hashing DHT prevents DHT Server nodes to associate a Client's `PeerID` with the Content requested by the Client. DHT Servers no longer know _which Client is accessing which content_. This protection only works as long as the DHT Servers don't know the `CID` requested by the Client. Thus, the privacy of a request depends on the secrecy of the requested `CID`. -A powerful adversary could crawl all discoverable `CID`s, e.g by sniffing Bitswap broadcasts or browsing the Web to discover new `CID`s. From this list of `CID`s, the adversary can compute the `HASH2`s associated with all the `CID`s and get a mapping `HASH2` $\rightarrow$ `CID` for many `CID`s. 
This adversary can run many DHT Servers, and upon request for some `Prefix`, check which `HASH2` are matching the `Prefix`. Using frequency analysis, the adversary can take an educated guess on which content the client is requesting. If the requested content is unknown to the adversary, but the adversary knows its `CID`, the adversary can trivially resolve the Content Providers from the DHT, and fetch the content over Bitswap. +The proposed solution makes _association attacks_ (associating the Client's `PeerID` with the requested `CID`) much more expensive for _public content_, but doesn't make them impossible to perform. However, malicious users cannot discover _private content_, and spy on users accessing it. If Alice advertises her holiday pictures to the public IPFS DHT and privately sends the root `CID` to Bob only, no adversary can retrieve the pictures, and no adversary can learn what Bob is accessing. Only the DHT Servers serving the Provider Record to Bob know that Bob is requesting some content from Alice's `PeerID`. -DHT Servers serving the requested Provider Record to the Client have the ability to associate the Client's `PeerID` with the Content Providers `PeerID`. It can track _from which peer a Client is fetching content_ at the Bitswap level. +A powerful adversary could crawl all discoverable `CID`s, e.g by sniffing Bitswap broadcasts or browsing the Web to discover new `CID`s. From this list of `CID`s, the adversary can compute the `HASH2`s associated with all the `CID`s and get a mapping `HASH2` $\rightarrow$ `CID` for many `CID`s. This adversary can run many DHT Servers, and upon request for some `Prefix`, check which `HASH2` are matching the `Prefix`. Using frequency analysis, the adversary can take an educated guess on which content the client is requesting. 
If the requested content is unknown to the adversary, but the adversary knows its `CID`, the adversary can trivially resolve the Content Providers from the DHT, and fetch the content over Bitswap. Removing request broadcast from Bitswap would make it harder to crawl existing `CID`s, and thus would improve reader privacy in the DHT. -The proposed solution makes _association attacks_ (associating the Client's `PeerID` with the requested `CID`) much more expensive for _public content_, but doesn't make them impossible to perform. However, malicious users cannot discover _private content_, and spy on users accessing it. If Alice advertises her holiday pictures to the public IPFS DHT and privately sends the root `CID` to Bob only, no adversary can retrieve the pictures, and no adversary can learn what Bob is accessing. Only the DHT Servers serving the Provider Record to Bob know that Bob is requesting some content from Alice's `PeerID`. +DHT Servers serving the requested Provider Record to the Client has the ability to associate the Client's `PeerID` with the Content Providers `PeerID`. They can track _from which peer a Client is fetching content_. The Client doesn't have any privacy protection from the Content Provider serving Content over Bitswap. @@ -312,7 +312,7 @@ Other approaches to improve Reader Privacy in the DHT mostly include Ephemeral ` Ephemeral `PeerID`s references: - https://github.com/libp2p/libp2p/issues/37 -The other alternative to increase the Reader Privacy level in the IPFS DHT is the use of Mixnets such as Tor or I2P. Mixnets usually provide an excellent Reader- and Writer Privacy level, but the latency is expected to increase significantly. Hence the use of Mixnets is generally not good for all use cases, but only when strong privacy guarantees are required. IPFS users willing to remain pseudonymous could use the existing Tor network to hide their identity. 
Another alternative could be to create a Mixnet out of the IPFS network, e.g include mixing capabilities in every libp2p host. There has been some ongoing work on IPFS-Tor integration. +The other alternative to increase the Reader Privacy level in the IPFS DHT is the use of Mixnets such as Tor or I2P. Mixnets usually provide an excellent Reader- and Writer Privacy level, but lookup latency is significantly higher. Hence the use of Mixnets is generally not good for all use cases, but only when strong privacy guarantees are required. Mixnets can easily be built on top of the Double Hash DHT to maximize user Privacy. IPFS users willing to remain pseudonymous could use the existing Tor network to hide their identity. Another alternative could be to create a Mixnet out of the IPFS network, e.g include mixing capabilities in every libp2p host. There has been some ongoing work on IPFS-Tor integration. Mixnets references: - Berty's [go-libp2p-tor-transport](https://github.com/berty/go-libp2p-tor-transport) From 6c587946c7ce8e3e11f93fb0454f791da84e9a5f Mon Sep 17 00:00:00 2001 From: guillaumemichel Date: Tue, 14 Feb 2023 11:19:18 +0100 Subject: [PATCH 29/55] restructured provider store section --- IPIP/0000-double-hash-dht.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index 6c2ca6a6..481ef09e 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -199,9 +199,12 @@ signature, err := privKey.Sign(data) The data structure of the DHT Servers' Provider Store is a nested key-value store whose structure is: `HASH2` -> `ServerKey` -> `CPPeerID` -> [`EncPeerID`, `TS`, `Signature`]. -The same `HASH2` always produces the same `ServerKey` (as long as the same Hashing Algorithm was used), as both `HASH2` and `ServerKey` result in a deterministic hash operation on `MH` prepended with a constant prefix. 
However, a misbehaving node could publish an advertisement for `HASH2` while not knowing `MH`, and forge a random `ServerKey`. The DHT Server not knowing `MH` cannot determine which `ServerKey` is the one associated with `HASH2`, and hence need to keep all different `ServerKey`s. However, the number of forged `ServerKey`s is expected to be small as the Client aren't able to decrypt payload encrypted with a forged `ServerKey`, and detect that the Provider Record isn't legitimate. The only reason a misbehaving peer would want to publish forged `ServerKey`s is to exhaust the storage resources of a specific target DHT Server. +The same `HASH2` always produces the same `ServerKey` (as long as the same Hashing Algorithm was used), as both `HASH2` and `ServerKey` result in a deterministic hash operation on `MH` prepended with a constant prefix. So if all peers are honest, each `HASH2` should be associated with a single `ServerKey`. + +However, a misbehaving node could publish an advertisement for `HASH2` while not knowing `MH`, and forge a random `ServerKey`. The DHT Server not knowing `MH` cannot determine whether a `ServerKey` is valid and hence need to keep all different `ServerKey`s. The only reason a misbehaving peer would want to publish forged `ServerKey`s is to exhaust the storage resources of a specific target DHT Server, as Clients detect invalid Provider Records. DHT Servers store at most `3` different `ServerKey` for each `CPPeerID`, limiting the resource exhaustion attack while allowing some agility when changing the Hash function. + +Content can be provided by multiple Content Providers, hence `HASH2` -> `ServerKey` potentially maps to multiple `CPPeerID`s, each Content Provider having its own Provider Record. As the `CPPeerID` is obtained from the open libp2p connection, we assume that it is not possible to impersonate another `CPPeerID`. Each Content Provider can have a single Provider Record for each `HASH2`, and for each available `varint`. 
During a migration, we expect to have multiple Provider Records for the same pair (`HASH2`, `CPPeerID`); the Provider Store keeps one Provider Record for each distinct (`HASH2`, `CPPeerID`, `varint`) with a maximum of `3` per pair (`HASH2`, `CPPeerID`). If there are more than 3 candidates, the ones with the lowest `TS` are discarded. -Content can be provided by multiple Content Providers, hence `HASH2` -> `ServerKey` points to potentially multiple `CPPeerID`s, each Content Provider having its own Provider Record. As the `CPPeerID` is obtained from the open libp2p connection, we assume that it is not possible to impersonate another `CPPeerID`. Each Content Provider can have a single Provider Record for each `HASH2`, and for each available `varint`. During a migration, we expect to have multiple Provider Records for the same pair (`HASH2`, `CPPeerID`), the Provider Store keeps 1 Provider Records for each distinct (`HASH2`, `CPPeerID`, `varint`) with a maximum of `3` per pair (`HASH2`, `CPeerID`). If there are more than 3 candidates, the ones with the lowest `TS` are discarded. When a Content Provider republishes a Provider Record, the DHT Server only keeps the valid Provider Record whose `TS` is the largest value, for the given `varint`. We expect to have a single `varint` in use most of the time. DHT Servers drop all Provider Records published by the same `CPPeerID` with the same `HASH2` but multiple different `ServerKey`s. A well-behaved node can compute the right `ServerKey` and doesn't try to exhaust the storage resources of the DHT Server. Only a misbehaving node forges invalid `ServerKey`s, and if multiple `ServerKey`s are associated with the same (`HASH2`, `CPPeerID`) it implies that at least one of these `ServerKey`s is incorrect, so the Content Provider is misbehaving.
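The nested Provider Store layout and the republish rule (keep the record with the largest `TS`) might look like the Go sketch below. The type and method names are hypothetical, `TS` is assumed to fit in a `uint64`, and `CPPeerID` is simplified to a string. Signature validation, `varint` handling, and the limits on the number of `ServerKey`s are deliberately left out.

```go
package main

// ProviderRecord mirrors the fields [EncPeerID, TS, Signature].
type ProviderRecord struct {
	EncPeerID []byte
	TS        uint64 // assumption: unix timestamp stored as uint64
	Signature []byte
}

// ProviderStore maps HASH2 -> ServerKey -> CPPeerID -> ProviderRecord.
// CPPeerID is represented as a string for simplicity (an assumption).
type ProviderStore map[[32]byte]map[[32]byte]map[string]ProviderRecord

// Put stores rec for (hash2, serverKey, cpPeerID) unless a record with an
// equal or larger TS is already present. It reports whether rec was stored.
func (ps ProviderStore) Put(hash2, serverKey [32]byte, cpPeerID string, rec ProviderRecord) bool {
	byKey, ok := ps[hash2]
	if !ok {
		byKey = make(map[[32]byte]map[string]ProviderRecord)
		ps[hash2] = byKey
	}
	byPeer, ok := byKey[serverKey]
	if !ok {
		byPeer = make(map[string]ProviderRecord)
		byKey[serverKey] = byPeer
	}
	if old, exists := byPeer[cpPeerID]; exists && old.TS >= rec.TS {
		return false // the stored record is at least as recent
	}
	byPeer[cpPeerID] = rec
	return true
}
```

Keying the innermost map by `CPPeerID` gives each Content Provider exactly one record per (`HASH2`, `ServerKey`) pair, which matches the one-record-per-provider rule in the simple case where a single `varint` is in use.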
### `k`-anonymity From 9936e755bc53bec6b01964394fa1aeeba5cd8628 Mon Sep 17 00:00:00 2001 From: guillaumemichel Date: Tue, 14 Feb 2023 11:25:58 +0100 Subject: [PATCH 30/55] added open question about multiple matching Provider Records indexing --- IPIP/0000-double-hash-dht.md | 1 + 1 file changed, 1 insertion(+) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index 481ef09e..d8a16e1d 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -330,6 +330,7 @@ Mixnets references: - If we plan to move to using SHA3 instead of SHA2 to generate 256-bit digests, this migration is the perfect opportunity, as we will be breaking everything anyway. SHA3 was proved to be more secure against Length Extension Attacks. It has not been proven whether SHA2 or SHA3 is more collision resistant and secure against preimage attacks. See this [comparison](https://en.wikipedia.org/wiki/SHA-3#Comparison_of_SHA_functions). - Is it wise to encrypt the `CPPeerID` using `MH` directly? It would be possible to derive another identifier from `MH` (such as `Hash("SOME_CONSTANT" || MH)`). `MH` is the master identifier of the content, hence if it is revealed all other identifiers can trivially be found. However, it is computationally infeasible to recover `MH` from `Hash("SOME_CONSTANT" || MH)`. - It may be fine to use `TS` as nonce, it spares bytes on the wire. However, if two Content Providers publish the same content at the same time (`TS` either in seconds or milliseconds), then the DHT Server may be able to forge a valid Provider Records for itself. +- As multiple `HASH2`s match each `Prefix` and the Client is only interested in a single one, should we send the `HASH2` along with each encrypted Provider Record (network load overhead) or let the Client try to decrypt all payloads and see for themselves which one opens (CPU overhead)?
## Copyright From 145869b42112fd251ff5f69f308af03815bc3bdb Mon Sep 17 00:00:00 2001 From: guillaumemichel Date: Tue, 14 Feb 2023 11:54:59 +0100 Subject: [PATCH 31/55] rephrased open question on using timestamp as IV --- IPIP/0000-double-hash-dht.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index d8a16e1d..3a9454d1 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -329,7 +329,7 @@ Mixnets references: - If we plan to move to using SHA3 instead of SHA2 to generate 256-bits digests, this migration is the perfect opportunity, as we will be breaking everything anyways. SHA3 was proved to be more secure against Length Extension Attacks. It has not be proven whether SHA2 or SHA3 is more collision resistant and secure against preimage attacks. See this [comparison](https://en.wikipedia.org/wiki/SHA-3#Comparison_of_SHA_functions). - Is it wise to encrypt the `CPPeerID` using `MH` directly? It would be possible to derive another identifier from `MH` (such as `Hash("SOME_CONSTANT" || MH)`). `MH` is the master identifier of the content, hence if it is revealed all other identifiers can trivially be found. However, it is computationally impossible to recover `MH` from `Hash("SOME_CONSTANT" || MH)`. -- It may be fine to use `TS` as nonce, it spares bytes on the wire. However, if two Content Providers publish the same content at the same time (`TS` either in seconds or milliseconds), then the DHT Server may be able to forge a valid Provider Records for itself. +- It may be fine to use `TS` as Nonce/IV for the Provider Record encryption (`EncPeerID = AESGCM(MH, Nonce, CPPeerID)`), it spares bytes on the wire. If `TS` is the number of milli- or nano-seconds that have passed since `1970-01-01T00:00:00`, this number easily fits in the 12 bytes IV. 
Moreover it is very unlikely that 2 nodes perform an encryption using the same key (for the same content) at the exact same milli- or nano-second. Using TS as nonce would spare 4 bytes (`TS` size) on the wire when publishing content to the DHT, and 4 bytes for each Provider Record matching `Prefix` for all requests. However the information about when the Provider Record was published (already known to the DHT Servers storing the Provider Record) would be publicly available. Anyone enumerating DHT Provider Records would be able to read it. - As multiple `HASH2` match each `Prefix` and the Client is only interested in a single one, should we send the `HASH2` along with each encrypted provider record (network load overhead) or let the Client try to decrypt all payloads and see for themselves which one opens (cpu overhead)? ## Copyright From dcfbcb5605edfa3a33dc69e2f3fd4c905e99a206 Mon Sep 17 00:00:00 2001 From: guillaumemichel Date: Tue, 14 Feb 2023 13:41:53 +0100 Subject: [PATCH 32/55] corrected aes gcm varint --- IPIP/0000-double-hash-dht.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index 3a9454d1..fb861714 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -37,7 +37,7 @@ The changes described in this document introduce a DHT privacy upgrade boosting **Magic Values** - bytes("CR_DOUBLEHASH") - bytes("CR_SERVERKEY") -- AES-GCM [varint](https://github.com/multiformats/multicodec/pull/314): `aes-gcm-256 = 0xa5` +- AES-GCM [varint](https://github.com/multiformats/multicodec/pull/314): `aes-gcm-256 = 0xa501` - Double SHA256 [varint](https://github.com/multiformats/multicodec/blob/master/table.csv#L41): `dbl-sha2-256 = 0x56` - A DHT Server returns all of the Provider Records matching to at most **`MatchLimit = 64`** distinct `HASH2`. Magic number explanation in [k-anonymity](#k-anonymity). 
- Provider Record Timestamp (`TS`) validity period: `48h` From 6cb0f2acf0e992492b6aef45765d8c220a95816f Mon Sep 17 00:00:00 2001 From: guillaumemichel Date: Tue, 14 Feb 2023 15:07:36 +0100 Subject: [PATCH 33/55] added varint to encryption with ServerKey --- IPIP/0000-double-hash-dht.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index fb861714..32bbaffb 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -38,7 +38,7 @@ The changes described in this document introduce a DHT privacy upgrade boosting - bytes("CR_DOUBLEHASH") - bytes("CR_SERVERKEY") - AES-GCM [varint](https://github.com/multiformats/multicodec/pull/314): `aes-gcm-256 = 0xa501` -- Double SHA256 [varint](https://github.com/multiformats/multicodec/blob/master/table.csv#L41): `dbl-sha2-256 = 0x56` +- Double SHA256 [varint](https://github.com/multiformats/multicodec/blob/master/table.csv#L41): `dbl-sha2-256 = 0x5601` - A DHT Server returns all of the Provider Records matching to at most **`MatchLimit = 64`** distinct `HASH2`. Magic number explanation in [k-anonymity](#k-anonymity). - Provider Record Timestamp (`TS`) validity period: `48h` @@ -58,7 +58,7 @@ The changes described in this document introduce a DHT privacy upgrade boosting - **`ServerKey`** is defined as `SHA256(bytes("CR_SERVERKEY") || MH)`. It is derived from the `MH`. The Content Provider communicates `ServerKey` to the DHT Servers during the Publish Process. The DHT Servers use `ServerKey` to encrypt `TS`, `Signature` and Content Providers `multiaddrs` sent to the Client when some Provider Records match the requested `Prefix`. `ServerKey` is represented as a 32-byte array. - **`TS`** is the [Unix Timestamp](https://en.wikipedia.org/wiki/Unix_time) corresponding content publish time. 
`TS` is represented as a 32-bit **unsigned** Integer, allowing timestamps to range from `1970-01-01T00:00:00Z` to `2106-02-07T06:28:15Z` before reaching the overflow. - **`CPPeerID`** is the `PeerID` of the Content Provider for a specific `CID`. -- **`EncPeerID`** is the result of the encryption of `CPPeerID` using `MH` as encryption key and a random nonce `AESGCM(MH, Nonce, CPPeerID)`. `EncPeerID` contains the [varint](https://github.com/multiformats/multicodec/pull/314) of the encryption algorithm used (AES), the bytes array of the encrypted payload, and the `Nonce`. `Nonce` is a randomly generated 12-byte array. The format of `EncPeerID` is [`0xa2`, `Nonce`, `AESGCM(MH, Nonce, CPPeerID)`]. +- **`EncPeerID`** is the result of the encryption of `CPPeerID` using `MH` as encryption key and a random nonce `AESGCM(MH, Nonce, CPPeerID)`. `EncPeerID` contains the [varint](https://github.com/multiformats/multicodec/pull/314) of the encryption algorithm used (AES), the bytes array of the encrypted payload, and the `Nonce`. `Nonce` is a randomly generated 12-byte array. The format of `EncPeerID` is [`0xa501`, `Nonce`, `payload_len`, `AESGCM(MH, Nonce, CPPeerID)`]. - **`Signature`** is the signature of the `EncPeerID` encrypted payload (not including the varint nor the nonce) and `TS` using the Content Provider's private key, either with ed25519 or rsa signature algorithm, depending on the keys of the Content Provider. - **Provider Record** is defined as a pointer to the storage location of some content identified by `CID` or `HASH2`. A Provider Record consists on the following fields: [`EncPeerID`, `TS`, `Signature`]. - **Provider Store** is the data structure on the DHT Servers used to store the Provider Records. Its structure is a nested dictionary/map: `HASH2` -> `ServerKey` -> [`CPPeerID`, `EncPeerID`, `TS`, `Signature`]. There is only one single correct `ServerKey` for each `HASH2`. 
However, any peer can forge a valid Publish request (with invalid `EncPeerID` but valid `Signature`) undetected by the DHT Server. The DHT server isn't able to distinguish which `ServerKey` is correct as it doesn't have the knowledge of `MH`, hence it has to keep both and serve both upon request for `HASH2`. @@ -86,7 +86,7 @@ The following process describes the event of a client looking up a CID in the IP 1. Content Provider wants to publish some content with identifier `CID`. 2. Content Provider computes `HASH2 = SHA256(bytes("CR_DOUBLEHASH") || MH)` (`MH` is the MultiHash included in the CID). 3. Content Provider starts a DHT lookup request for the 20 closest `PeerID`s in XOR distance to `HASH2`. -4. Content Provider encrypts its own `PeerID` (`CPPeerID`) with `MH`, using AES-GCM. `EncPeerID = [0xa2, Nonce, AESGCM(MH, Nonce, CPPeerID)]` +4. Content Provider encrypts its own `PeerID` (`CPPeerID`) with `MH`, using AES-GCM. `EncPeerID = [0xa501, Nonce, payload_len, AESGCM(MH, Nonce, CPPeerID)]` 5. Content Provider takes the current timestamp `TS`. 6. Content Provider signs `EncPeerID` and `TS` using its private key. `Signature = Sign(privkey, EncPeerID || TS)` 7. Content Provider computes `ServerKey = SHA256(bytes("CR_SERVERKEY") || MH)`. @@ -100,10 +100,10 @@ The following process describes the event of a client looking up a CID in the IP 1. Client computes `HASH2 = SHA256(bytes("CR_DOUBLEHASH") || MH)` (`MH` is the MultiHash included in the CID). 2. Client selects a prefix of `HASH2`, `KeyPrefix = HASH2[:l]` for a defined `l` (see [`l` selection](#prefix-length-selection)). 2. Client finds the closest `PeerID`s to `HASH2` in XOR distance in its Routing Table. -3. Client sends a DHT lookup request for `KeyPrefix` to these DHT servers. The request contains a flag to specify whether Client wants the `multiaddrs` associated with the `CPPeerID` or not. +3. Client sends a DHT lookup request for `KeyPrefix` to these DHT servers. 
The request contains a flag to specify whether Client wants the `multiaddrs` associated with the `CPPeerID` or not. 4. The DHT servers find the 20 closest `PeerID`s to `KeyPrefix` in XOR distance (see [algorithm](#closest-keys-to-a-key-prefix)). Add these `PeerID`s and their associated multiaddresses (if applicable) to the `message` that will be returned to Client. 5. The DHT servers search if there are entries matching `KeyPrefix` in their Provider Store. -6. For all entries `HASH2` of the Provider Store where `HASH2[:len(KeyPrefix)]==KeyPrefix`, add to `message` the following encrypted payload: `EncPeerID || SERVERNONCE || AESGCM(ServerKey, SERVERNONCE, TS || Signature || multiaddrs)`, `SERVERNONCE` being a randomly generated 12-byte array, for `multiaddrs` being the multiaddresses associated with `CPPeerID` (if applicable) if the `multiaddrs` were requested by Client. If more than `MatchLimit` distinct `HASH2`s match the requested `KeyPrefix`, the DHT Server doesn't return any Provider Record, and adds the number of `HASH2` matching `KeyPrefix` along with its own `MatchLimit` to `message`. +6. For all entries `HASH2` of the Provider Store where `HASH2[:len(KeyPrefix)]==KeyPrefix`, add to `message` the following encrypted payload: `EncPeerID || 0xa501 || SERVERNONCE || payload_len || AESGCM(ServerKey, SERVERNONCE, TS || Signature || multiaddrs)`, `SERVERNONCE` being a randomly generated 12-byte array, for `multiaddrs` being the multiaddresses associated with `CPPeerID` (if applicable) if the `multiaddrs` were requested by Client. If more than `MatchLimit` distinct `HASH2`s match the requested `KeyPrefix`, the DHT Server doesn't return any Provider Record, and adds the number of `HASH2` matching `KeyPrefix` along with its own `MatchLimit` to `message`. 7. The DHT servers send `message` to Client. 8. Client computes `ServerKey = SHA256(bytes("CR_SERVERKEY") || MH)`. 9. 
Client tries to decrypt all returned encrypted payloads using `MH` for `EncPeerID` and `ServerKey` for `Enc(ServerKey, TS || Signature || multiaddrs)`. If at least one encrypted payload can be decrypted, go to 12. From dca899135bd0362e8c4606df4373c66426cbdb5e Mon Sep 17 00:00:00 2001 From: Guillaume Michel - guissou Date: Wed, 15 Feb 2023 13:57:37 +0100 Subject: [PATCH 34/55] Update IPIP/0000-double-hash-dht.md Co-authored-by: Max Inden --- IPIP/0000-double-hash-dht.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index 32bbaffb..2bae3455 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -71,7 +71,7 @@ The following process describes the event of a client looking up a CID in the IP 3. Client sends a DHT lookup request for `CID` to these DHT servers. 4. Upon receiving the request, the DHT servers search if there is an entry for `MH` in their Provider Store. If yes, go to 10. Else continue. 5. DHT servers compute `Hash(MH)`. -6. DHT servers find the 20 closest peers to `Hash(HM)` in XOR distance in their Routing Table. +6. DHT servers find the 20 closest peers to `Hash(MH)` in XOR distance in their Routing Table. 7. DHT servers return the 20 `PeerID`s and `multiaddrs` of these peers to Client. 8. Client sends a DHT lookup request for `CID` to the closest peers in XOR distance to `Hash(MH)` that it received. 9. Go to 4. 
From ba2f3c7c42aba4f8898d66fb8b5d4e6e3614dc16 Mon Sep 17 00:00:00 2001 From: guillaumemichel Date: Wed, 15 Feb 2023 16:50:34 +0100 Subject: [PATCH 35/55] added mermaid diagrams --- IPIP/0000-template.md | 406 +++++++++++++++++++++++++++++++++++++++++- 1 file changed, 405 insertions(+), 1 deletion(-) diff --git a/IPIP/0000-template.md b/IPIP/0000-template.md index 21ff22ff..578d9aa7 100644 --- a/IPIP/0000-template.md +++ b/IPIP/0000-template.md @@ -29,7 +29,411 @@ Describe the proposed solution and list all changes made to the specs repository The resulting specification should be detailed enough to allow competing, interoperable implementations. -When modifying an existing specification file, this section should provide a +When modifying an existing specification file, this # IPIP 0000: Double Hash DHT + +![wip](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square) +- DRI: [Guillaume Michel](https://github.com/guillaumemichel) +- Start Date: 2023-01-18 +- Related Resources: + - [Specs in Notion](https://pl-strflt.notion.site/Double-Hashing-for-Privacy-ff44e3156ce040579289996fec9af609) + - [WIP Implementation](https://github.com/ChainSafe/go-libp2p-kad-dht) + - https://github.com/ipfs/specs/pull/334 + - https://github.com/ipfs/specs/issues/345 + +## Summary + +This IPIP contains the up-to-date Spec of the IPFS Double Hash DHT. The Double Hashing DHT aims at providing some Reader Privacy guarantees to the IPFS DHT. + +This document is still WIP, all feedback is more than welcome. Make sure to write your thoughts about the [open questions](#open-questions) in the PR. + +## Table of Contents + +1. [Motivation](#motivation) +2. [Detailed Design](#detailed-design) +3. [Design Rationale](#design-rationale) +4. [User benefits](#user-benefits) +5. [Migration](#migration) +6. [Threat Model](#threat-model) +7. [Alternatives](#alternatives-for-dht-reader-privacy) +8. 
[Open Questions](#open-questions) + +## Motivation + +IPFS is currently lacking of many privacy protections. One of its principal weaknesses currently lies in the lack of privacy protections for the DHT content routing subsystem. Currently in the IPFS DHT, neither readers (clients retrieving content) nor writers (hosts storing and distributing content) have much privacy with regard to content they consume or publish. It is trivial for a DHT server node to associate the requestor's identity with the accessed content during the routing process. A curious DHT server node, can request the same CIDs to find out what content other users are consuming. Improving privacy in the IPFS DHT has been a strong request from the community for some time. + +The changes described in this document introduce a DHT privacy upgrade boosting the reader’s privacy. It will prevent DHT tracking as described above, and add Provider Records Authentication. The proposed modifications also add a slight Writer Privacy improvement as a side effect. + +## Detailed Design + +**Magic Values** +- bytes("CR_DOUBLEHASH") +- bytes("CR_SERVERKEY") +- AES-GCM [varint](https://github.com/multiformats/multicodec/pull/314): `aes-gcm-256 = 0xa501` +- Double SHA256 [varint](https://github.com/multiformats/multicodec/blob/master/table.csv#L41): `dbl-sha2-256 = 0x5601` +- A DHT Server returns all of the Provider Records matching to at most **`MatchLimit = 64`** distinct `HASH2`. Magic number explanation in [k-anonymity](#k-anonymity). +- Provider Record Timestamp (`TS`) validity period: `48h` + +### Definitions + +- **`CID`** is the IPFS [Content IDentifier](https://github.com/multiformats/cid) +- **`MH`** is the [Multihash](https://github.com/multiformats/multihash) contained in a `CID`. It corresponds to the digest of a hash function over some content. `MH` is represented as a 32-byte array. +- **`HASH2`** is defined as `SHA256(bytes("CR_DOUBLEHASH") || MH)`. 
It represents the location of the Kademlia keyspace for the Provider Record associated with `CID`. `HASH2` is represented as a 32-byte array. `HASH2 = SHA256(bytes("CR_DOUBLEHASH") || MH)`. +- **Content Provider** is the node storing some content, and advertising it to the DHT. +- **DHT Servers** are nodes running the IPFS public DHT. In this documents, DHT Servers mostly refer to the DHT Servers storing the Provider Records associated with specific `CID`s, and not the DHT Servers helping routing lookup requests to the right keyspace location. +- **Client** is an IPFS client looking up a content identified by an already known `CID`. +- **Publish Process** is the process of the Content Provider communicating to the DHT Servers that it provides some content identified by `CID`. +- **Lookup Process** is the process of the Client retrieving the content identified by `CID`. +- **`PeerID`** s define stable [peer identities](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md). The `PeerID` is derived from the node's cryptographic public key. +- **`multiaddrs`** are the [network addresses](https://github.com/libp2p/specs/tree/master/addressing) associated with a `PeerID`. It represents the location(s) of the peer. +- **`KeyPrefix`** is defined as a prefix of length `l` bits of `HASH2`. `KeyPrefix` is represented by a `byte` concatenated with a variable sized array of bytes, containing at most 32 bytes. The leading `byte` represents the binary representation of `l - 1`, making prefixes of length `256` possible, but not prefixes of length `0`. The trailing byte array is of length `ceil(l/8)` bytes, and its content is the bits prefix right padded with zeros. +- **`ServerKey`** is defined as `SHA256(bytes("CR_SERVERKEY") || MH)`. It is derived from the `MH`. The Content Provider communicates `ServerKey` to the DHT Servers during the Publish Process. 
The DHT Servers use `ServerKey` to encrypt `TS`, `Signature` and Content Providers `multiaddrs` sent to the Client when some Provider Records match the requested `Prefix`. `ServerKey` is represented as a 32-byte array. +- **`TS`** is the [Unix Timestamp](https://en.wikipedia.org/wiki/Unix_time) corresponding content publish time. `TS` is represented as a 32-bit **unsigned** Integer, allowing timestamps to range from `1970-01-01T00:00:00Z` to `2106-02-07T06:28:15Z` before reaching the overflow. +- **`CPPeerID`** is the `PeerID` of the Content Provider for a specific `CID`. +- **`EncPeerID`** is the result of the encryption of `CPPeerID` using `MH` as encryption key and a random nonce `AESGCM(MH, Nonce, CPPeerID)`. `EncPeerID` contains the [varint](https://github.com/multiformats/multicodec/pull/314) of the encryption algorithm used (AES), the bytes array of the encrypted payload, and the `Nonce`. `Nonce` is a randomly generated 12-byte array. The format of `EncPeerID` is [`0xa501`, `Nonce`, `payload_len`, `AESGCM(MH, Nonce, CPPeerID)`]. +- **`Signature`** is the signature of the `EncPeerID` encrypted payload (not including the varint nor the nonce) and `TS` using the Content Provider's private key, either with ed25519 or rsa signature algorithm, depending on the keys of the Content Provider. +- **Provider Record** is defined as a pointer to the storage location of some content identified by `CID` or `HASH2`. A Provider Record consists on the following fields: [`EncPeerID`, `TS`, `Signature`]. +- **Provider Store** is the data structure on the DHT Servers used to store the Provider Records. Its structure is a nested dictionary/map: `HASH2` -> `ServerKey` -> [`CPPeerID`, `EncPeerID`, `TS`, `Signature`]. There is only one single correct `ServerKey` for each `HASH2`. However, any peer can forge a valid Publish request (with invalid `EncPeerID` but valid `Signature`) undetected by the DHT Server. 
The DHT server isn't able to distinguish which `ServerKey` is correct as it doesn't have the knowledge of `MH`, hence it has to keep both and serve both upon request for `HASH2`. + +### Current DHT + +The following process describes the event of a client looking up a CID in the IPFS DHT: +1. Client computes `Hash(MH)` (`MH` is the MultiHash included in the CID). +2. Client looks for the closest peers to `Hash(MH)` in XOR distance in its Routing Table. +3. Client sends a DHT lookup request for `CID` to these DHT servers. +4. Upon receiving the request, the DHT servers search if there is an entry for `MH` in their Provider Store. If yes, go to 10. Else continue. +5. DHT servers compute `Hash(MH)`. +6. DHT servers find the 20 closest peers to `Hash(MH)` in XOR distance in their Routing Table. +7. DHT servers return the 20 `PeerID`s and `multiaddrs` of these peers to Client. +8. Client sends a DHT lookup request for `CID` to the closest peers in XOR distance to `Hash(MH)` that it received. +9. Go to 4. +10. The DHT servers storing the Provider Record(s) associated with `MH` send them to Client. (Currently, if a Provider Record has been published less than 30 min before being requested, the DHT servers also send the `multiaddresses` of the Content Provider to Client). +11. If the response from the DHT server doesn't include the `multiaddrs` associated with the Content Providers' `PeerID`s, Client performs a DHT `FindPeer` request to find the `multiaddrs` of the returned `PeerID`s. +12. Client sends a Bitswap request for `CID` to the Content Provider (known `PeerID` and `multiaddrs`). +13. Content Provider sends the requested content back to Client. + +### Double Hash DHT design + +**Publish Process** +1. Content Provider wants to publish some content with identifier `CID`. +2. Content Provider computes `HASH2 = SHA256(bytes("CR_DOUBLEHASH") || MH)` (`MH` is the MultiHash included in the CID). +3. 
Content Provider starts a DHT lookup request for the 20 closest `PeerID`s in XOR distance to `HASH2`. +4. Content Provider encrypts its own `PeerID` (`CPPeerID`) with `MH`, using AES-GCM. `EncPeerID = [0xa501, Nonce, payload_len, AESGCM(MH, Nonce, CPPeerID)]` +5. Content Provider takes the current timestamp `TS`. +6. Content Provider signs `EncPeerID` and `TS` using its private key. `Signature = Sign(privkey, EncPeerID || TS)` +7. Content Provider computes `ServerKey = SHA256(bytes("CR_SERVERKEY") || MH)`. +8. Once the lookup request has returned the 20 closest peers, Content Provider sends a Publish request to these DHT servers. The Publish request contains [`HASH2`, `EncPeerID`, `TS`, `Signature`, `ServerKey`]. +9. Each DHT server verifies `Signature` against the `PeerID` of the Content Provider used to open the libp2p connection. `Verify(CPPeerID, Signature, EncPeerID || TS)`. It verifies that `TS` is younger than `48h` and isn't in the future. If the request is invalid, the DHT server sends an error to the Content Provider. +10. Each DHT server adds an entry in its Provider Store for `HASH2` -> `ServerKey` -> `CPPeerID` -> [`EncPeerID`, `TS`, `Signature`], with `CPPeerID` being the `PeerID` of the Content Provider (see [provider store](#provider-store)). If there is already an entry including `CPPeerID` for `HASH2` -> `ServerKey`, and if the `TS` of the new valid entry is newer than the existing `TS`, overwrite the entry in the Provider Store. Else drop the new entry. +11. Each DHT server confirms to Content Provider that the Provider Record has been successfully added. +12. The process is over once Content Provider has received 20 confirmations. + +```mermaid +sequenceDiagram + participant CP as Content Provider + participant DHT + participant Server as DHT Server + + Note left of CP: HASH2 = SHA256(bytes("CR_DOUBLEHASH") || MH) + + CP->>DHT: FIND_PEERS(HASH2) + DHT->>CP: [PeerID0, PeerID1, ...
PeerID19] + + Note left of CP: EncPeerID = 0xa501 || Nonce || payload_len || AESGCM(MH, Nonce, CPPeerID) + Note left of CP: Signature = Sign(privkey, EncPeerID || TS) + Note left of CP: ServerKey = SHA256(bytes("CR_SERVERKEY") || MH) + + par Content Provider to the 20 closest DHT Servers to HASH2 + CP->>Server: HASH2 || EncPeerID || TS || Signature || ServerKey + + Note right of Server: Verify(pubkey, Signature, EncPeerID || TS) &&
time.now() - TS < 48h + Note right of Server: On success, add to Provider Store:
HASH2 -> ServerKey -> CPPeerID -> [EncPeerID, TS, Signature] + + Server->>CP: Success / Error + end + + Note left of CP: Wait for 20 Successes +``` + +**Lookup Process** +1. Client computes `HASH2 = SHA256(bytes("CR_DOUBLEHASH") || MH)` (`MH` is the MultiHash included in the CID). +2. Client selects a prefix of `HASH2`, `KeyPrefix = HASH2[:l]` for a defined `l` (see [`l` selection](#prefix-length-selection)). +2. Client finds the closest `PeerID`s to `HASH2` in XOR distance in its Routing Table. +3. Client sends a DHT lookup request for `KeyPrefix` to these DHT servers. The request contains a flag to specify whether Client wants the `multiaddrs` associated with the `CPPeerID` or not. +4. The DHT servers find the 20 closest `PeerID`s to `KeyPrefix` in XOR distance (see [algorithm](#closest-keys-to-a-key-prefix)). Add these `PeerID`s and their associated multiaddresses (if applicable) to the `message` that will be returned to Client. +5. The DHT servers search if there are entries matching `KeyPrefix` in their Provider Store. +6. For all entries `HASH2` of the Provider Store where `HASH2[:len(KeyPrefix)]==KeyPrefix`, add to `message` the following encrypted payload: `EncPeerID || 0xa501 || SERVERNONCE || payload_len || AESGCM(ServerKey, SERVERNONCE, TS || Signature || multiaddrs)`, `SERVERNONCE` being a randomly generated 12-byte array, for `multiaddrs` being the multiaddresses associated with `CPPeerID` (if applicable) if the `multiaddrs` were requested by Client. If more than `MatchLimit` distinct `HASH2`s match the requested `KeyPrefix`, the DHT Server doesn't return any Provider Record, and adds the number of `HASH2` matching `KeyPrefix` along with its own `MatchLimit` to `message`. +7. The DHT servers send `message` to Client. +8. Client computes `ServerKey = SHA256(bytes("CR_SERVERKEY") || MH)`. +9. Client tries to decrypt all returned encrypted payloads using `MH` for `EncPeerID` and `ServerKey` for `Enc(ServerKey, TS || Signature || multiaddrs)`. 
If at least one encrypted payload can be decrypted, go to 12. +10. If the DHT Server's `MatchLimit` and number of matching `HASH2`s was included in the `message`, Client makes multiple DHT lookup requests for longer prefixes (e.g `KeyPrefix||0` and `KeyPrefix||1`). Else Client sends a DHT lookup request for `KeyPrefix` to the closest peers in XOR distance to `HASH2` that it received from the DHT servers. +11. Go to 4. +12. For each decrypted payload, Client decrypts `CPPeerID = Dec(MH, EncPeerID)`. +13. Client verifies that `Signature` verifies with `CPPeerID`: `Verify(CPPeerID, Signature, EncPeerID || TS)`. +14. Client checks that `TS` is younger than `48h`. +15. If none of the decrypted payloads is valid, go to 4. +16. If the decrypted payload doesn't include the `multiaddrs` associated with `CPPeerID`, Client performs a DHT `FindPeer` request to find the `multiaddrs` associated with `CPPeerID`. +17. Client sends a Bitswap request for `CID` to the Content Provider (known `CPPeerID` and `multiaddrs`). +18. Content Provider sends the requested content back to Client. + +```mermaid +sequenceDiagram + participant Client + participant Server as DHT Server + participant CP as Content Provider + + Note left of Client: HASH2 = SHA256(bytes("CR_DOUBLEHASH") || MH) + Note left of Client: ServerKey = SHA256(bytes("CR_SERVERKEY") || MH) + + loop in parallel until valid Provider Record found + Note left of Client: KeyPrefix = HASH2[:l] + Client->>Server: FIND_CONTENT(KeyPrefix)
Optional flags: multiaddrs, metadata + Note right of Server: message = [] + loop for each of the 20 closest PeerIDs to KeyPrefix in the Routing Table + Note right of Server: message += PeerID + end + loop for each entry matching KeyPrefix in the Provider Store + Note right of Server: EncMetadata = 0xa501 || SERVERNONCE || payload_len ||
AESGCM(ServerKey, SERVERNONCE, TS || Signature || multiaddrs) + Note right of Server: Aggregate records per HASH2:
message += HASH2 || nb_records || EncPeerID0 || EncMetadata0 || ... || EncMetadataN + Note right of Server: Note: don't add multiaddrs nor metadata if not requested with flags + end + Note right of Server: Note: If there are more than MatchLimit entries matching KeyPrefix, drop all records and
message += "MatchLimit = MatchLimit" + Server->>Client: message + loop for all records matching HASH2 + Note left of Client: CPPeerID = Dec(MH, EncPeerID) + Note left of Client: TS || Signature || multiaddrs = Dec(ServerKey, EncMetadata) + Note left of Client: Verify(CPPeerID, Signature, EncPeerID || TS) + end + Note left of Client: If at least 1 record is valid exit the loop + end + opt if no valid record contains multiaddrs + Client->>Server: FIND_PEER(CPPeerID) + Server->>Client: multiaddrs of CPPeerID + end + + + Client->>CP: Bitswap request for CID + CP->>Client: Content +``` + +### Prefix length selection + +The goal of DHT prefix requests is to provide [`k`-anonymity](https://en.wikipedia.org/wiki/K-anonymity) to content lookup, in addition to the pseudonymity gained from double hashing. Each DHT prefix lookup query returns an expected number of `k` Provider Records matching `KeyPrefix`, with `k` being a system parameter. The user should be able to define a custom `k` from the configuration files, according to their privacy needs. The default value `k = 8` is discussed in [Design rationale](#reader-privacy). + +The prefix length `l` is derived from `k` and the number of CIDs published to the DHT: $l \leftarrow{} log_2(\frac{\\#CIDs}{k})$. However, the total number of CIDs published to the DHT can be hard to approximate, and the initial `l` value can be determined by approximation and dichotomy. At the first startup, the node looks up random keys, starting with `l = 26`. Then, by dichotomy, it adapts `l` so that a lookup for a prefix of length `l` matches on average ~`k` Provider Records. + +Each node keeps track of the number of `HASH2` matching the last `KeyPrefix` requested in the last 128 lookups. `a` is defined as the average number of matches for the last 128 requests. At any point in time, if $a \gt 2\times k$, then `l` should increase (`l = l + 1`), and if $a \lt \frac{k}{2}$, then `l` should decrease (`l = l - 1`).
On node shutdown, `a` is saved on disk, allowing a quick restart with an accurate `l` value. + +Note that DHT Servers can set an upperbound on the number of Provider Records they serve for each lookup request. So a too small `l` may result in not discovering the target Provider Record. + +**Prefix magic numbers** +- `k`-anonymity privacy parameter, by default `k = 8` +- Size of moving average of number of Provider Records matching a prefix: `128` +- Initial prefix length: `26`. There are currently ~850M distinct CIDs published in the DHT ([source](https://pl-strflt.notion.site/2022-09-20-Hydras-Analysis-5db53b6af3e04a46aaf7a776e65ae97d)). $log_2(\frac{850M}{8})=26.663$. As the number of CIDs in the network grows exponentially, the prefix length is expected to decrease linearly for a constant `k`. + +### _Closest_ keys to a key prefix + +Computing the XOR distance between two binary bitstrings of different lengths isn't possible. Hence finding the N closest keys to a key prefix in the Kademlia keyspace doesn't make sense. We can however find the keys matching the prefix (e.g `prefix == key[:l]` for $key \in \{0, 1\}^{256}, prefix \in \{0, 1\}^{l}, l \leq 256$), and the keys _close_ from matching the prefix. Randomness is used as tie breaker. + +The following pseudo-code defines the algorithm to find `N` keys matching or _close_ from matching a prefix. The main idea is to truncate the leaves of the Kademlia trie to the length of the prefix `l`. If `M` keys match prefix, for $M \ge N$, then `N` keys must be picked at random among the `M` candidates. If `M` keys match prefix, for $M \lt N$, we must still find `Q = N - M` keys. We iterate on the truncated Kademlia leaves of depth `l` ordered by XOR distance to `prefix`, starting from the closest. 
Supposing there are `P` keys in the current truncated Kademlia leaf, and that we are missing `Q` keys: if $P \ge Q$, we select `Q` keys at random among the `P` candidates; otherwise, if $P \lt Q$, we take the `P` keys, set `Q = Q - P` and iterate on the following leaf until we find `N` keys. + +``` +func closest_to_match(prefix, N, all_keys) { + selected_keys = [] + l = len(prefix) // len(prefix) is the bit length of the prefix + + // iterate on all prefixes of length l from closest to furthest from 'prefix' + for counter = 0; len(selected_keys) < N && counter < 2**l; counter += 1 { + + leaf = prefix XOR binary(counter, l) + // binary(x, l) gives the binary representation of a number x, on l bits + + // get all keys matching the prefix 'leaf' + matching_keys = find_matching_keys(leaf, all_keys) + + // add at most (N-len(selected_keys)) to selected_keys + if len(matching_keys) <= N - len(selected_keys) { + selected_keys += matching_keys + } else { + random_selection = select_N_random(matching_keys, N - len(selected_keys)) + selected_keys += random_selection + } + } + return selected_keys +} +``` + +## Design rationale + +### Cryptographic algorithms + +**SHA256** + +SHA256 is the algorithm currently in use in IPFS to generate the 256-bit digests used as Kademlia identifiers. Note that SHA256 refers to the [SHA2](https://en.wikipedia.org/wiki/SHA-2) algorithm with a 256-bit digest size. + +A future change of Cryptographic Hash Function will require a _DHT Migration_ as the Provider Records' _location_ in the Kademlia keyspace will change, for it is defined by the Hash Function. It means that all Provider Records must be published using both the new and the old hash function for the transition period. We want to avoid performing these migrations as much as possible, but we must be ready for them as one is likely to happen in the lifespan of IPFS.
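The SHA256-based derivations used throughout this design can be sketched in Go as follows. This is only an illustration under stated assumptions: the helper names (`prefixedSHA256`, `deriveHash2`, `deriveServerKey`, `keyPrefix`) are hypothetical and not taken from any implementation, the multihash is a placeholder byte string, and the wire encoding of `KeyPrefix` is out of scope here.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Domain-separation constants named in this document.
const (
	doubleHashPrefix = "CR_DOUBLEHASH"
	serverKeyPrefix  = "CR_SERVERKEY"
)

// prefixedSHA256 returns SHA256(prefix || mh). Hypothetical helper.
func prefixedSHA256(prefix string, mh []byte) [32]byte {
	buf := make([]byte, 0, len(prefix)+len(mh))
	buf = append(buf, prefix...)
	buf = append(buf, mh...)
	return sha256.Sum256(buf)
}

// deriveHash2 computes HASH2, the Kademlia keyspace location of the record.
func deriveHash2(mh []byte) [32]byte {
	return prefixedSHA256(doubleHashPrefix, mh)
}

// deriveServerKey computes ServerKey, which the Content Provider hands to
// the DHT Servers during the Publish Process.
func deriveServerKey(mh []byte) [32]byte {
	return prefixedSHA256(serverKeyPrefix, mh)
}

// keyPrefix returns the first l bits of hash2 (assumes 1 <= l <= 256),
// right-padded with zero bits to a whole number of bytes.
func keyPrefix(hash2 [32]byte, l int) []byte {
	n := (l + 7) / 8
	out := make([]byte, n)
	copy(out, hash2[:n])
	if rem := uint(l % 8); rem != 0 {
		// Keep only the top rem bits of the last byte.
		out[n-1] &= byte(0xFF) << (8 - rem)
	}
	return out
}

func main() {
	// Placeholder bytes standing in for a real multihash.
	mh := []byte("placeholder multihash")

	hash2 := deriveHash2(mh)
	serverKey := deriveServerKey(mh)

	fmt.Println("HASH2:    ", hex.EncodeToString(hash2[:]))
	fmt.Println("ServerKey:", hex.EncodeToString(serverKey[:]))
	fmt.Println("KeyPrefix:", hex.EncodeToString(keyPrefix(hash2, 26)))
}
```

Because both values come from the same `MH` with different constant prefixes, swapping the hash function would change `HASH2` and `ServerKey` together, which is why a hash change implies a migration period.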
+ +Changing the Hash function used to derive `ServerKey` requires the DHT Server to support multiple Provider Records indexed by a different `ServerKey` for the same `HASH2` for the migration period. + +**AESGCM** + +[AESGCM](https://en.wikipedia.org/wiki/Galois/Counter_Mode) (Advanced Encryption Standard in Galois/Counter Mode) is an AEAD (Authenticated Encryption with Associated Data) mode of operation for symmetric-key cryptographic block ciphers which is widely adopted for its performance. It takes as input an Initialization Vector (IV) that needs to be unique (Nonce) for each encryption performed with the same key. This algorithm was selected for its security, its performance and its large industry adoption. + +The nonce size is set to `12` bytes (default for AES GCM). AESGCM is used with encryption keys of 256 bits (SHA256 digests in this context). + +A change in the encryption algorithm of the Provider Record implies that the Content Providers must publish 2 Provider Records, one with each encryption scheme. The Client and the DHT Server learn which encryption algorithm has been used by the Content Provider from the `varint` contained in `EncPeerID`. When a new encryption algorithm is introduced, DHT Servers may need to store multiple Provider Records in their Provider Store for the same `HASH2` and the same `CPPeerID`. We restrict the number of Provider Records for each pair (`HASH2`, `CPPeerID`) to `3` (the `varint`s must be distinct), in order to allow some flexibility, while keeping the potential number of _garbage_ Provider Records published by hostile nodes low. + +A change in the encryption algorithm used between the DHT Server and the Client (Lookup step 7.) means that the Client and the DHT Server must negotiate the encryption algorithm, as long as it still uses a 256-bit key. + +**Signature scheme** + +The signature scheme is the default one from libp2p.
The available algorithms are listed [here](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md#key-types). We use the private key from which the node's `PeerID` is derived to sign `(EncPeerID || TS)`. Every node with the knowledge of the signing `PeerID` can verify the signature. + +```go +privKey := host.Peerstore().PrivKey(host.ID()) +signature, err := privKey.Sign(data) +``` + +### Provider Store + +The data structure of the DHT Servers' Provider Store is a nested key-value store whose structure is: `HASH2` -> `ServerKey` -> `CPPeerID` -> [`EncPeerID`, `TS`, `Signature`]. + +The same `HASH2` always corresponds to the same `ServerKey` (as long as the same Hashing Algorithm was used), as both `HASH2` and `ServerKey` result from a deterministic hash operation on `MH` prepended with a constant prefix. So if all peers are honest, each `HASH2` should be associated with a single `ServerKey`. + +However, a misbehaving node could publish an advertisement for `HASH2` while not knowing `MH`, and forge a random `ServerKey`. The DHT Server, not knowing `MH`, cannot determine whether a `ServerKey` is valid and hence needs to keep all different `ServerKey`s. The only reason a misbehaving peer would want to publish forged `ServerKey`s is to exhaust the storage resources of a specific target DHT Server, as Clients detect invalid Provider Records. DHT Servers store at most `3` different `ServerKey`s for each `CPPeerID`, limiting the resource exhaustion attack while allowing some agility when changing the Hash function. + +Content can be provided by multiple Content Providers, hence `HASH2` -> `ServerKey` potentially maps to multiple `CPPeerID`s, each Content Provider having its own Provider Record. As the `CPPeerID` is obtained from the open libp2p connection, we assume that it is not possible to impersonate another `CPPeerID`. Each Content Provider can have a single Provider Record for each `HASH2`, and for each available `varint`.
During a migration, we expect to have multiple Provider Records for the same pair (`HASH2`, `CPPeerID`). The Provider Store keeps 1 Provider Record for each distinct (`HASH2`, `CPPeerID`, `varint`) with a maximum of `3` per pair (`HASH2`, `CPPeerID`). If there are more than 3 candidates, the ones with the lowest `TS` are discarded. + +When a Content Provider republishes a Provider Record, the DHT Server only keeps the valid Provider Record whose `TS` is the largest value, for the given `varint`. We expect to have a single `varint` in use most of the time. DHT Servers drop all Provider Records published by the same `CPPeerID` with the same `HASH2` but multiple different `ServerKey`s. A well behaving node can compute the right `ServerKey` and doesn't try to exhaust the storage resources of the DHT Server. Only a misbehaving node forges invalid `ServerKey`s, and if multiple `ServerKey`s are associated with the same (`HASH2`, `CPPeerID`) it implies that at least one of these `ServerKey`s is incorrect, so the Content Provider is misbehaving. + +### `k`-anonymity + +Default: `k = 8`. +Default: `MatchLimit = 64`. + +The `k`-anonymity parameter `k` is user defined and can be modified in the configuration files. Users requiring a higher level of privacy can increase their value of `k`. `8` is deemed to be private enough for standard IPFS users, while limiting the overhead in packet size of the DHT Server response to 8x. + +The `MatchLimit` prevents malformed or malicious requests from matching all Provider Records that a DHT Server is providing at once. A Client can still fetch all Provider Records matching any `KeyPrefix`, but it must perform multiple DHT lookup requests for enough prefixes to the DHT Server. The `MatchLimit` protects the Server from having to send large amounts of data at once.
`64` is already a large value, given that each `HASH2` can be associated with multiple Provider Records, one for each Content Provider, and the multiaddresses of all Content Providers can be sent along. The DHT provides _on average_ at most `64-anonymity` out-of-the-box and a better privacy level can be reached by sending multiple requests. + +## User benefits + +### Reader Privacy + +**Double Hashing** + +Currently any DHT Server observing a request can associate the Client's `PeerID` with the requested `CID`. If the `CID` is not already known, curious DHT Servers observing a DHT request can replay the request, and retrieve the content that the client is accessing, which is a significant privacy concern. Using `HASH2` as DHT Content Identifier prevents curious DHT Servers that don't know `MH`, the preimage of `HASH2`, from retrieving the content associated with `HASH2`. Curious DHT Servers can still replay the DHT request for `HASH2` and find the Content Providers. However, they are not able to make a valid Bitswap request to the Content Providers, for they don't have the knowledge of the Content Identifier used by Bitswap (`CID`) for the content being identified by `HASH2` in the DHT. + +Double Hashing is also necessary for Prefix Requests and Provider Record Encryption. + +**Prefix Requests** + +A Prefix Request consists of requesting a Prefix of a key, instead of a full length Kademlia key. A Prefix corresponds to a branch of the binary trie, and potentially matches multiple existing keys. Prefix Request Routing works exactly like the normal Kademlia Routing, hence a DHT Prefix Request eventually always converges. The goal of Prefix Requests is to match multiple Provider Records for a single request.
Instead of requesting `HASH2`, the Client now requests `Prefix`, a prefix of `HASH2` of length `l` bits. The DHT Server storing the Provider Records matching `Prefix` doesn't know exactly which content is accessed and returns all Provider Records whose `HASH2` matches `Prefix`. With Double Hashing, curious DHT Servers cannot associate `CID` with the requester `PeerID` anymore, but they can associate `HASH2` with `PeerID`. Prefix Requests make it harder for curious DHT Servers to associate `PeerID` with a specific `HASH2`, as they only learn a `Prefix` of `HASH2`. + +This provides [`k`-anonymity](https://en.wikipedia.org/wiki/K-anonymity) when a curious DHT Server tries to associate the Client's `PeerID` with the requested `HASH2`, with `k` defined as the average number of Provider Records matching a Prefix of length `l`. `k` is a system parameter and defines the `k`-anonymity level, and `l` is derived from `k` (see [Prefix Length Selection](#prefix-length-selection)). Prefix Requests also enable [Plausible Deniability](https://en.wikipedia.org/wiki/Deniable_encryption) for the Client. The DHT Server cannot prove that a Client identified by its `PeerID` or `IP Address` tried to access some content identified by its `HASH2`. + +However, Prefix Requests offer neither [`l`-diversity](https://en.wikipedia.org/wiki/L-diversity) nor [`t`-closeness](https://en.wikipedia.org/wiki/T-closeness), as frequency analysis is still easy to perform. For example, suppose a `Prefix` matches one very popular Provider Record and a few unpopular ones. When a new request is received for this `Prefix`, the DHT Server nodes can take a better-than-random guess that the Client is requesting the popular content's Provider Record rather than an unpopular one. However, the DHT Server cannot prove that the Client has accessed the popular content. + +**Provider Record Encryption** + +Provider Record Encryption also builds on top of Double Hashing.
The Provider Record Encryption prevents curious DHT Servers that observe a request for `Prefix`, but don't store any Provider Record matching `Prefix`, from replaying the request for `Prefix` and getting all published keys matching `Prefix` including `HASH2` of the content accessed by the Client. It prevents all curious actors from building a global dictionary of `HASH2` to Content Providers for all content published in the IPFS public DHT. It is necessary to know the `MH` of the content (included in the `CID`) to learn about its Content Providers. + +Curious DHT Servers observing a request from `PeerID` for `Prefix` cannot associate the Client's `PeerID` with the Content Provider's `PeerID` because they cannot read the Provider Record. + +### Writer Privacy + +Writer Privacy is NOT the goal of this design. However, as a side effect, Writer Privacy gets improved in some specific cases. +- Content Providers do NOT get any additional privacy from the Client fetching the data +- Content Providers can now hide from the DHT Server peers hosting their Provider Records which data they are serving, as long as the DHT Servers don't know the preimage of `HASH2`: `MH`. The DHT Servers are not able to query the content associated with the Provider Records they are storing. However, they can approximately monitor the number of requests associated with the content by observing the requests in the keysubspace matching `Prefix` of `HASH2` of the content. DHT Servers can take an educated guess on the association of `HASH2` with the Content Provider's `PeerID`. The DHT Servers storing the Provider Record are able to share information about the Content Provider with potential accomplices. For instance, a coalition of curious DHT Servers could share with each other for each Provider Record, identified by `HASH2`, the list of Content Providers, and the number of received Prefix Requests matching `HASH2`.
This results in monitoring all content advertised by all `PeerID`s and estimating the number of requests they are serving. +- Content Providers get additional privacy from curious DHT Servers observing a request, but NOT from the ones storing the Provider Record. These DHT Servers can still replay the DHT request, but are unable to discover the content and the Content Provider's `PeerID` associated with `Prefix`, because the Provider Records are encrypted using `MH`. This holds as long as the DHT Servers don't know the `MH` (or `CID`). + +### Provider Record Authenticity + +The Provider Records are now signed by the Content Provider. This prevents a malicious DHT Server from forging a Provider Record for an arbitrary key. The Clients need to verify the Signature against the Content Provider's `PeerID` and send a Bitswap request to the Content Provider only if the Signature is valid. Content Providers can only publish Provider Records for themselves. + +### Provider Records Enumeration + +Enumerating the Provider Records in the DHT becomes trivial thanks to Double Hashing and Prefix Requests. Knowledge of the preimage of the requested key isn't required anymore for a valid Kademlia request. The number of Provider Records stored in the DHT is a good metric to evaluate the health of the DHT. + +### Better Kademlia Routing Table Refresh + +As knowledge of the preimage of the requested key isn't necessary in the Double Hashing DHT, nodes gain the ability to request _truly_ random keys in the DHT. + +Requesting random keys is necessary for the Kademlia Bucket Refresh Process. On refresh, if a bucket has empty slots, the node will make a request for a random forged key falling in this specific bucket. In the current implementation, as the preimage of a requested key is necessary, Kademlia uses a [list of precomputed preimages](https://github.com/libp2p/go-libp2p-kbucket/blob/master/bucket_prefixmap.go), 1 matching every 15-bit key prefix.
Hence, the random forged key is never truly random: it is drawn from the list of precomputed preimages, and not from the full keyspace. This can lead to degraded performance and security vulnerabilities. + +Double Hashing enables the nodes to select a _truly_ random key from the Kademlia keyspace (limited by the randomness algorithm) matching the appropriate bucket. The 456KB [list of precomputed preimages](https://github.com/libp2p/go-libp2p-kbucket/blob/master/bucket_prefixmap.go) can be removed from the IPFS source code, once the migration to the Double Hashing DHT is complete. + +### Simplicity + +It is generally less complex to find content in the DHT by requesting its Kademlia identifier (keyspace location), instead of requesting the preimage of its keyspace location. + +## Migration + +This design is a breaking change and requires a major DHT migration. + +**WIP** + +Alternatives for migration: +- slow breaking change (give enough time so that only a _small_ number of participants break) +- DHT duplication +- Universal DHT (WIP). + +## Threat Model + +### Reader Privacy + +The Double Hashing DHT prevents DHT Server nodes from associating a Client's `PeerID` with the Content requested by the Client. DHT Servers no longer know _which Client is accessing which content_. This protection only works as long as the DHT Servers don't know the `CID` requested by the Client. Thus, the privacy of a request depends on the secrecy of the requested `CID`. + +The proposed solution makes _association attacks_ (associating the Client's `PeerID` with the requested `CID`) much more expensive for _public content_, but doesn't make them impossible to perform. However, malicious users cannot discover _private content_ or spy on users accessing it. If Alice advertises her holiday pictures to the public IPFS DHT and privately sends the root `CID` to Bob only, no adversary can retrieve the pictures, and no adversary can learn what Bob is accessing.
Only the DHT Servers serving the Provider Record to Bob know that Bob is requesting some content from Alice's `PeerID`. + +A powerful adversary could crawl all discoverable `CID`s, e.g. by sniffing Bitswap broadcasts or browsing the Web to discover new `CID`s. From this list of `CID`s, the adversary can compute the `HASH2`s associated with all the `CID`s and get a mapping `HASH2` $\rightarrow$ `CID` for many `CID`s. This adversary can run many DHT Servers, and upon request for some `Prefix`, check which `HASH2`s match the `Prefix`. Using frequency analysis, the adversary can take an educated guess on which content the client is requesting. If the requested content is unknown to the adversary, but the adversary knows its `CID`, the adversary can trivially resolve the Content Providers from the DHT, and fetch the content over Bitswap. Removing request broadcast from Bitswap would make it harder to crawl existing `CID`s, and thus would improve reader privacy in the DHT. + +DHT Servers serving the requested Provider Record to the Client have the ability to associate the Client's `PeerID` with the Content Provider's `PeerID`. They can track _from which peer a Client is fetching content_. + +The Client doesn't have any privacy protection from the Content Provider serving Content over Bitswap. + +### Signed Provider Records + +Provider Records are signed in the Double Hash DHT. This implies that malicious DHT Servers serving a Provider Record can no longer forge an arbitrary Provider Record corresponding to the requested `CID`. The Client can computationally verify that the Provider Record is valid, and was created by a Content Provider that has the knowledge of `CID`. + +### DDOS Protection + +The Double Hash DHT doesn't improve DDOS (Distributed Denial Of Service) protection.
Upon receiving a DHT request from a Client for a valid Provider Record, DHT Servers can decide to return `multiaddrs` corresponding to the IP address of a `target` host that does not provide the requested content. The Client will open a connection to the returned `multiaddrs` and send a Bitswap request for the content. If the `CID` that was initially requested is popular, this will generate a lot of traffic toward the `target` coming from many different Clients. + +DDOS protection can be improved in the future on the Double Hash DHT by using signed Peer Records. + +### DHT Servers Resource Exhaustion + +An adversarial user could try to exhaust the DHT resources by advertising garbage Provider Records. The adversary needs to generate random bytes (_garbage_), sign them and ask DHT Server nodes to store the garbage Provider Records. DHT Server nodes cannot computationally decide whether a Provider Record is garbage or not, thus they must continue storing the Provider Records. Note that the adversary periodically needs to republish every Provider Record, which isn't trivial for a large number of Provider Records at the moment. This issue isn't mitigated in the current DHT either. + +One possible mitigation could be to identify IP addresses publishing an _excessive_ number of Provider Records that are never accessed, and to refuse to store more Provider Records for these IP addresses. + +## Alternatives for DHT Reader Privacy + +Other approaches to improve Reader Privacy in the DHT mostly include Ephemeral `PeerID`s and [Mixnets](https://en.wikipedia.org/wiki/Mix_network). The first option is to use ephemeral `PeerID`s in order to escape tracking. This solution however doesn’t increase the privacy level much. It is still possible to enumerate all the `PeerID`s in the network and to associate all the `PeerID`s using the same IP addresses. Combining the Ephemeral `PeerID` approach with Double Hashing can help slightly improve privacy.
Having a different `PeerID` for the DHT Client and the DHT Server of the same IPFS node makes association of _which Content Provider requested which Content_ harder. The two `PeerID`s can still be associated as they use the same IP address, but the DHT Client cannot be discovered in a network crawl. + +Ephemeral `PeerID`s references: +- https://github.com/libp2p/libp2p/issues/37 + +The other alternative to increase the Reader Privacy level in the IPFS DHT is the use of Mixnets such as Tor or I2P. Mixnets usually provide an excellent Reader and Writer Privacy level, but lookup latency is significantly higher. Hence the use of Mixnets is not suited to all use cases, but only to those where strong privacy guarantees are required. Mixnets can easily be built on top of the Double Hash DHT to maximize user Privacy. IPFS users willing to remain pseudonymous could use the existing Tor network to hide their identity. Another alternative could be to create a Mixnet out of the IPFS network, e.g. by including mixing capabilities in every libp2p host. There has been some ongoing work on IPFS-Tor integration. + +Mixnets references: +- Berty's [go-libp2p-tor-transport](https://github.com/berty/go-libp2p-tor-transport) +- [Hosting an IPFS Gateway Through a Tor Proxy](https://www.minds.com/raymondsmith98/blog/tutorial-tor-hosting-an-ipfs-gateway-through-a-tor-proxy-857369540936916992) +- Mixnet and Content Routing ([IPFS Thing 2022 Video](https://www.youtube.com/watch?v=f85U8b5g-Ks), [Notes](https://hackmd.io/@nZ-twauPRISEa6G9zg3XRw/BkrcMOLd9)) by [noot](https://github.com/noot) +- [Nym Mixnet](https://nymtech.net/) +- https://github.com/ipfs/notes/issues/37 + + +## Open Questions + +- If we plan to move to using SHA3 instead of SHA2 to generate 256-bit digests, this migration is the perfect opportunity, as we will be breaking everything anyway. SHA3 was proved to be more secure against Length Extension Attacks.
It has not been proven whether SHA2 or SHA3 is more collision resistant and secure against preimage attacks. See this [comparison](https://en.wikipedia.org/wiki/SHA-3#Comparison_of_SHA_functions). +- Is it wise to encrypt the `CPPeerID` using `MH` directly? It would be possible to derive another identifier from `MH` (such as `Hash("SOME_CONSTANT" || MH)`). `MH` is the master identifier of the content, hence if it is revealed all other identifiers can trivially be found. However, it is computationally infeasible to recover `MH` from `Hash("SOME_CONSTANT" || MH)`. +- It may be fine to use `TS` as Nonce/IV for the Provider Record encryption (`EncPeerID = AESGCM(MH, Nonce, CPPeerID)`), as it spares bytes on the wire. If `TS` is the number of milli- or nano-seconds that have passed since `1970-01-01T00:00:00`, this number easily fits in the 12-byte IV. Moreover it is very unlikely that 2 nodes perform an encryption using the same key (for the same content) at the exact same milli- or nano-second. Using `TS` as nonce would spare 4 bytes (`TS` size) on the wire when publishing content to the DHT, and 4 bytes for each Provider Record matching `Prefix` for all requests. However the information about when the Provider Record was published (already known to the DHT Servers storing the Provider Record) would be publicly available. Anyone enumerating DHT Provider Records would be able to read it. +- As multiple `HASH2` match each `Prefix` and the Client is only interested in a single one, should we send the `HASH2` along with each encrypted provider record (network load overhead) or let the Client try to decrypt all payloads and see for themselves which one opens (CPU overhead)? + +## Copyright + +Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/). + should provide a summary of changes. When adding new specification files, list all of them.
## Test fixtures From aba292b4645aaeb78bf384a66ada6edfddd90d2b Mon Sep 17 00:00:00 2001 From: guillaumemichel Date: Wed, 15 Feb 2023 16:56:47 +0100 Subject: [PATCH 36/55] fix copy paste --- IPIP/0000-double-hash-dht.md | 67 ++++++ IPIP/0000-template.md | 414 +---------------------------------- 2 files changed, 72 insertions(+), 409 deletions(-) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0000-double-hash-dht.md index 2bae3455..e490c7c7 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0000-double-hash-dht.md @@ -96,6 +96,33 @@ The following process describes the event of a client looking up a CID in the IP 11. Each DHT server confirms to Content Provider that the Provider Record has been successfully added. 12. The proces is over once Content Provider has received 20 confirmations. +```mermaid +sequenceDiagram + participant CP as Content Provider + participant DHT + participant Server as DHT Server + + Note left of CP: HASH2 = SHA256(bytes("CR_DOUBLEHASH") || MH) + + CP->>DHT: FIND_PEERS(HASH2) + DHT->>CP: [PeerID0, PeerID1, ... PeerID19] + + Note left of CP: EncPeerID = 0xa501 || Nonce || payload_len || AESGCM(MH, Nonce, CPPeerID) + Note left of CP: Signature = Sign(privkey, EncPeerID || TS) + Note left of CP: ServerKey = SHA256(bytes("CR_SERVERKEY") || MH) + + par Content Provider to the 20 closest DHT Servers to HASH2 + CP->>Server: HASH2 || EncPeerID || TS || Signature || ServerKey + + Note right of Server: Verify(pubkey, Signature, EncPeerID) &&
time.now() - TS < 48h + Note right of Server: On success, add to Provider Store:
HASH2 -> ServerKey -> CPPeerID -> [EncPeerID, TS, Signature] + + Server->>CP: Success / Error + end + + Note left of CP: Wait for 20 Successes +``` + **Lookup Process** 1. Client computes `HASH2 = SHA256(bytes("CR_DOUBLEHASH") || MH)` (`MH` is the MultiHash included in the CID). 2. Client selects a prefix of `HASH2`, `KeyPrefix = HASH2[:l]` for a defined `l` (see [`l` selection](#prefix-length-selection)). @@ -117,6 +144,46 @@ The following process describes the event of a client looking up a CID in the IP 17. Client sends a Bitswap request for `CID` to the Content Provider (known `CPPeerID` and `multiaddrs`). 18. Content Provider sends the requested content back to Client. +```mermaid +sequenceDiagram + participant Client + participant Server as DHT Server + participant CP as Content Provider + + Note left of Client: HASH2 = SHA256(bytes("CR_DOUBLEHASH") || MH) + Note left of Client: ServerKey = SHA256(bytes("CR_SERVERKEY") || MH) + + loop in parallel until valid Provider Record found + Note left of Client: KeyPrefix = HASH2[:l] + Client->>Server: FIND_CONTENT(KeyPrefix)
Optional flags: multiaddrs, metadata + Note right of Server: message = [] + loop for each of the 20 closest PeerIDs to KeyPrefix in the Routing Table + Note right of Server: message += PeerID + end + loop for each entry matching KeyPrefix in the Provider Store + Note right of Server: EncMetadata = 0xa501 || SERVERNONCE || payload_len ||
AESGCM(ServerKey, SERVERNONCE, TS || Signature || multiaddrs) + Note right of Server: Aggregate records per HASH2:
message += HASH2 || nb_records || EncPeerID0 || EncMetadata0 || ... || EncMetadataN + Note right of Server: Note: don't add multiaddrs or metadata unless requested with flags + end + Note right of Server: Note: If there are more than MatchLimit entries matching KeyPrefix, drop all records and
message += "MatchLimit = MatchLimit" + Server->>Client: message + loop for all records matching HASH2 + Note left of Client: CPPeerID = Dec(MH, EncPeerID) + Note left of Client: TS || Signature || multiaddrs = Dec(ServerKey, EncMetadata) + Note left of Client: Verify(CPPeerID, Signature, EncPeerID) + end + Note left of Client: If at least 1 record is valid exit the loop + end + opt if no valid record contains multiaddrs + Client->>Server: FIND_PEER(CPPeerID) + Server->>Client: multiaddrs of CPPeerID + end + + + Client->>CP: Bitswap request for CID + CP->>Client: Content +``` + ### Prefix length selection The goal of DHT prefix requests is to provide [`k`-anonymity](https://en.wikipedia.org/wiki/K-anonymity) to content lookup, in addition to the pseudonimity gained from double hashing. Each DHT prefix lookup query returns an expected number of `k` Provider Records matching `KeyPrefix`, with `k` being a system parameter. The user should be able to define a custom `k` from the configuration files, according to their privacy needs. The default value `k = 8` is discussed in [Design rationale](#reader-privacy). diff --git a/IPIP/0000-template.md b/IPIP/0000-template.md index 578d9aa7..6b59247a 100644 --- a/IPIP/0000-template.md +++ b/IPIP/0000-template.md @@ -1,7 +1,7 @@ -# IPIP 0000: InterPlanetary Improvement Proposal Template +# IPIP-0: InterPlanetary Improvement Proposal Template - - Start Date: YYYY-MM-DD @@ -29,411 +29,7 @@ Describe the proposed solution and list all changes made to the specs repository The resulting specification should be detailed enough to allow competing, interoperable implementations. 
-When modifying an existing specification file, this # IPIP 0000: Double Hash DHT - -![wip](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square) -- DRI: [Guillaume Michel](https://github.com/guillaumemichel) -- Start Date: 2023-01-18 -- Related Resources: - - [Specs in Notion](https://pl-strflt.notion.site/Double-Hashing-for-Privacy-ff44e3156ce040579289996fec9af609) - - [WIP Implementation](https://github.com/ChainSafe/go-libp2p-kad-dht) - - https://github.com/ipfs/specs/pull/334 - - https://github.com/ipfs/specs/issues/345 - -## Summary - -This IPIP contains the up-to-date Spec of the IPFS Double Hash DHT. The Double Hashing DHT aims at providing some Reader Privacy guarantees to the IPFS DHT. - -This document is still WIP, all feedback is more than welcome. Make sure to write your thoughts about the [open questions](#open-questions) in the PR. - -## Table of Contents - -1. [Motivation](#motivation) -2. [Detailed Design](#detailed-design) -3. [Design Rationale](#design-rationale) -4. [User benefits](#user-benefits) -5. [Migration](#migration) -6. [Threat Model](#threat-model) -7. [Alternatives](#alternatives-for-dht-reader-privacy) -8. [Open Questions](#open-questions) - -## Motivation - -IPFS is currently lacking of many privacy protections. One of its principal weaknesses currently lies in the lack of privacy protections for the DHT content routing subsystem. Currently in the IPFS DHT, neither readers (clients retrieving content) nor writers (hosts storing and distributing content) have much privacy with regard to content they consume or publish. It is trivial for a DHT server node to associate the requestor's identity with the accessed content during the routing process. A curious DHT server node, can request the same CIDs to find out what content other users are consuming. Improving privacy in the IPFS DHT has been a strong request from the community for some time. 
- -The changes described in this document introduce a DHT privacy upgrade boosting the reader’s privacy. It will prevent DHT tracking as described above, and add Provider Records Authentication. The proposed modifications also add a slight Writer Privacy improvement as a side effect. - -## Detailed Design - -**Magic Values** -- bytes("CR_DOUBLEHASH") -- bytes("CR_SERVERKEY") -- AES-GCM [varint](https://github.com/multiformats/multicodec/pull/314): `aes-gcm-256 = 0xa501` -- Double SHA256 [varint](https://github.com/multiformats/multicodec/blob/master/table.csv#L41): `dbl-sha2-256 = 0x5601` -- A DHT Server returns all of the Provider Records matching to at most **`MatchLimit = 64`** distinct `HASH2`. Magic number explanation in [k-anonymity](#k-anonymity). -- Provider Record Timestamp (`TS`) validity period: `48h` - -### Definitions - -- **`CID`** is the IPFS [Content IDentifier](https://github.com/multiformats/cid) -- **`MH`** is the [Multihash](https://github.com/multiformats/multihash) contained in a `CID`. It corresponds to the digest of a hash function over some content. `MH` is represented as a 32-byte array. -- **`HASH2`** is defined as `SHA256(bytes("CR_DOUBLEHASH") || MH)`. It represents the location of the Kademlia keyspace for the Provider Record associated with `CID`. `HASH2` is represented as a 32-byte array. `HASH2 = SHA256(bytes("CR_DOUBLEHASH") || MH)`. -- **Content Provider** is the node storing some content, and advertising it to the DHT. -- **DHT Servers** are nodes running the IPFS public DHT. In this documents, DHT Servers mostly refer to the DHT Servers storing the Provider Records associated with specific `CID`s, and not the DHT Servers helping routing lookup requests to the right keyspace location. -- **Client** is an IPFS client looking up a content identified by an already known `CID`. -- **Publish Process** is the process of the Content Provider communicating to the DHT Servers that it provides some content identified by `CID`. 
-- **Lookup Process** is the process of the Client retrieving the content identified by `CID`. -- **`PeerID`s** define stable [peer identities](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md). The `PeerID` is derived from the node's cryptographic public key. -- **`multiaddrs`** are the [network addresses](https://github.com/libp2p/specs/tree/master/addressing) associated with a `PeerID`. They represent the location(s) of the peer. -- **`KeyPrefix`** is defined as a prefix of length `l` bits of `HASH2`. `KeyPrefix` is represented by a `byte` concatenated with a variable-sized array of bytes, containing at most 32 bytes. The leading `byte` holds the binary representation of `l - 1`, making prefixes of length `256` possible, but not prefixes of length `0`. The trailing byte array is of length `ceil(l/8)` bytes, and its content is the bit prefix, right-padded with zeros. -- **`ServerKey`** is defined as `SHA256(bytes("CR_SERVERKEY") || MH)`. It is derived from the `MH`. The Content Provider communicates `ServerKey` to the DHT Servers during the Publish Process. The DHT Servers use `ServerKey` to encrypt `TS`, `Signature` and the Content Provider's `multiaddrs` sent to the Client when some Provider Records match the requested `KeyPrefix`. `ServerKey` is represented as a 32-byte array. -- **`TS`** is the [Unix Timestamp](https://en.wikipedia.org/wiki/Unix_time) corresponding to the content publish time. `TS` is represented as a 32-bit **unsigned** Integer, allowing timestamps to range from `1970-01-01T00:00:00Z` to `2106-02-07T06:28:15Z` before reaching the overflow. -- **`CPPeerID`** is the `PeerID` of the Content Provider for a specific `CID`. -- **`EncPeerID`** is the result of the encryption of `CPPeerID` using `MH` as encryption key and a random nonce: `AESGCM(MH, Nonce, CPPeerID)`. 
`EncPeerID` contains the [varint](https://github.com/multiformats/multicodec/pull/314) of the encryption algorithm used (AES), the byte array of the encrypted payload, and the `Nonce`. `Nonce` is a randomly generated 12-byte array. The format of `EncPeerID` is [`0xa501`, `Nonce`, `payload_len`, `AESGCM(MH, Nonce, CPPeerID)`]. -- **`Signature`** is the signature of the `EncPeerID` encrypted payload (including neither the varint nor the nonce) and `TS` using the Content Provider's private key, with either the ed25519 or the rsa signature algorithm, depending on the keys of the Content Provider. -- **Provider Record** is defined as a pointer to the storage location of some content identified by `CID` or `HASH2`. A Provider Record consists of the following fields: [`EncPeerID`, `TS`, `Signature`]. -- **Provider Store** is the data structure on the DHT Servers used to store the Provider Records. Its structure is a nested dictionary/map: `HASH2` -> `ServerKey` -> [`CPPeerID`, `EncPeerID`, `TS`, `Signature`]. There is a single correct `ServerKey` for each `HASH2`. However, any peer can forge a valid Publish request (with invalid `EncPeerID` but valid `Signature`) undetected by the DHT Server. The DHT server isn't able to distinguish which `ServerKey` is correct as it doesn't have the knowledge of `MH`, hence it has to keep all of them and serve all of them upon request for `HASH2`. - -### Current DHT - -The following process describes the event of a client looking up a CID in the IPFS DHT: -1. Client computes `Hash(MH)` (`MH` is the MultiHash included in the CID). -2. Client looks for the closest peers to `Hash(MH)` in XOR distance in its Routing Table. -3. Client sends a DHT lookup request for `CID` to these DHT servers. -4. Upon receiving the request, the DHT servers check whether there is an entry for `MH` in their Provider Store. If yes, go to 10. Else continue. -5. DHT servers compute `Hash(MH)`. -6. 
DHT servers find the 20 closest peers to `Hash(MH)` in XOR distance in their Routing Table. -7. DHT servers return the 20 `PeerID`s and `multiaddrs` of these peers to Client. -8. Client sends a DHT lookup request for `CID` to the closest peers in XOR distance to `Hash(MH)` that it received. -9. Go to 4. -10. The DHT servers storing the Provider Record(s) associated with `MH` send them to Client. (Currently, if a Provider Record has been published less than 30 min before being requested, the DHT servers also send the `multiaddresses` of the Content Provider to Client). -11. If the response from the DHT server doesn't include the `multiaddrs` associated with the Content Providers' `PeerID`s, Client performs a DHT `FindPeer` request to find the `multiaddrs` of the returned `PeerID`s. -12. Client sends a Bitswap request for `CID` to the Content Provider (known `PeerID` and `multiaddrs`). -13. Content Provider sends the requested content back to Client. - -### Double Hash DHT design - -**Publish Process** -1. Content Provider wants to publish some content with identifier `CID`. -2. Content Provider computes `HASH2 = SHA256(bytes("CR_DOUBLEHASH") || MH)` (`MH` is the MultiHash included in the CID). -3. Content Provider starts a DHT lookup request for the 20 closest `PeerID`s in XOR distance to `HASH2`. -4. Content Provider encrypts its own `PeerID` (`CPPeerID`) with `MH`, using AES-GCM. `EncPeerID = [0xa501, Nonce, payload_len, AESGCM(MH, Nonce, CPPeerID)]` -5. Content Provider takes the current timestamp `TS`. -6. Content Provider signs `EncPeerID` and `TS` using its private key. `Signature = Sign(privkey, EncPeerID || TS)` -7. Content Provider computes `ServerKey = SHA256(bytes("CR_SERVERKEY") || MH)`. -8. Once the lookup request has returned the 20 closest peers, Content Provider sends a Publish request to these DHT servers. The Publish request contains [`HASH2`, `EncPeerID`, `TS`, `Signature`, `ServerKey`]. -9. 
Each DHT server verifies `Signature` against the `PeerID` of the Content Provider used to open the libp2p connection: `Verify(CPPeerID, Signature, EncPeerID || TS)`. It verifies that `TS` is younger than `48h` and isn't in the future. If either check fails, the DHT server sends an error to the Content Provider. -10. Each DHT server adds an entry in its Provider Store for `HASH2` -> `ServerKey` -> `CPPeerID` -> [`EncPeerID`, `TS`, `Signature`], with `CPPeerID` being the `PeerID` of the Content Provider (see [provider store](#provider-store)). If there is already an entry including `CPPeerID` for `HASH2` -> `ServerKey`, and if the `TS` of the new valid entry is newer than the existing `TS`, overwrite the entry in the Provider Store. Else drop the new entry. -11. Each DHT server confirms to Content Provider that the Provider Record has been successfully added. -12. The process is over once Content Provider has received 20 confirmations. - -```mermaid -sequenceDiagram - participant CP as Content Provider - participant DHT - participant Server as DHT Server - - Note left of CP: HASH2 = SHA256(bytes("CR_DOUBLEHASH") || MH) - - CP->>DHT: FIND_PEERS(HASH2) - DHT->>CP: [PeerID0, PeerID1, ... PeerID19] - - Note left of CP: EncPeerID = 0xa501 || Nonce || payload_len || AESGCM(MH, Nonce, CPPeerID) - Note left of CP: Signature = Sign(privkey, EncPeerID || TS) - Note left of CP: ServerKey = SHA256(bytes("CR_SERVERKEY") || MH) - - par Content Provider to the 20 closest DHT Servers to HASH2 - CP->>Server: HASH2 || EncPeerID || TS || Signature || ServerKey - - Note right of Server: Verify(pubkey, Signature, EncPeerID || TS) &&
time.now() - TS < 48h - Note right of Server: On success, add to Provider Store:
HASH2 -> ServerKey -> CPPeerID -> [EncPeerID, TS, Signature] - - Server->>CP: Success / Error - end - - Note left of CP: Wait for 20 Successes -``` - -**Lookup Process** -1. Client computes `HASH2 = SHA256(bytes("CR_DOUBLEHASH") || MH)` (`MH` is the MultiHash included in the CID). -2. Client selects a prefix of `HASH2`, `KeyPrefix = HASH2[:l]` for a defined `l` (see [`l` selection](#prefix-length-selection)). -2. Client finds the closest `PeerID`s to `HASH2` in XOR distance in its Routing Table. -3. Client sends a DHT lookup request for `KeyPrefix` to these DHT servers. The request contains a flag to specify whether Client wants the `multiaddrs` associated with the `CPPeerID` or not. -4. The DHT servers find the 20 closest `PeerID`s to `KeyPrefix` in XOR distance (see [algorithm](#closest-keys-to-a-key-prefix)). Add these `PeerID`s and their associated multiaddresses (if applicable) to the `message` that will be returned to Client. -5. The DHT servers search if there are entries matching `KeyPrefix` in their Provider Store. -6. For all entries `HASH2` of the Provider Store where `HASH2[:len(KeyPrefix)]==KeyPrefix`, add to `message` the following encrypted payload: `EncPeerID || 0xa501 || SERVERNONCE || payload_len || AESGCM(ServerKey, SERVERNONCE, TS || Signature || multiaddrs)`, `SERVERNONCE` being a randomly generated 12-byte array, for `multiaddrs` being the multiaddresses associated with `CPPeerID` (if applicable) if the `multiaddrs` were requested by Client. If more than `MatchLimit` distinct `HASH2`s match the requested `KeyPrefix`, the DHT Server doesn't return any Provider Record, and adds the number of `HASH2` matching `KeyPrefix` along with its own `MatchLimit` to `message`. -7. The DHT servers send `message` to Client. -8. Client computes `ServerKey = SHA256(bytes("CR_SERVERKEY") || MH)`. -9. Client tries to decrypt all returned encrypted payloads using `MH` for `EncPeerID` and `ServerKey` for `Enc(ServerKey, TS || Signature || multiaddrs)`. 
If at least one encrypted payload can be decrypted, go to 12. -10. If the DHT Server's `MatchLimit` and number of matching `HASH2`s was included in the `message`, Client makes multiple DHT lookup requests for longer prefixes (e.g `KeyPrefix||0` and `KeyPrefix||1`). Else Client sends a DHT lookup request for `KeyPrefix` to the closest peers in XOR distance to `HASH2` that it received from the DHT servers. -11. Go to 4. -12. For each decrypted payload, Client decrypts `CPPeerID = Dec(MH, EncPeerID)`. -13. Client verifies that `Signature` verifies with `CPPeerID`: `Verify(CPPeerID, Signature, EncPeerID || TS)`. -14. Client checks that `TS` is younger than `48h`. -15. If none of the decrypted payloads is valid, go to 4. -16. If the decrypted payload doesn't include the `multiaddrs` associated with `CPPeerID`, Client performs a DHT `FindPeer` request to find the `multiaddrs` associated with `CPPeerID`. -17. Client sends a Bitswap request for `CID` to the Content Provider (known `CPPeerID` and `multiaddrs`). -18. Content Provider sends the requested content back to Client. - -```mermaid -sequenceDiagram - participant Client - participant Server as DHT Server - participant CP as Content Provider - - Note left of Client: HASH2 = SHA256(bytes("CR_DOUBLEHASH") || MH) - Note left of Client: ServerKey = SHA256(bytes("CR_SERVERKEY") || MH) - - loop in parallel until valid Provider Record found - Note left of Client: KeyPrefix = HASH2[:l] - Client->>Server: FIND_CONTENT(KeyPrefix)
Optional flags: multiaddrs, metadata - Note right of Server: message = [] - loop for each of the 20 closest PeerIDs to KeyPrefix in the Routing Table - Note right of Server: message += PeerID - end - loop for each entry matching KeyPrefix in the Provider Store - Note right of Server: EncMetadata = 0xa501 || SERVERNONCE || payload_len ||
AESGCM(ServerKey, SERVERNONCE, TS || Signature || multiaddrs) - Note right of Server: Aggregate records per HASH2:
message += HASH2 || nb_records || EncPeerID0 || EncMetadata0 || ... || EncMetadataN - Note right of Server: Note: don't add multiaddrs nor metadata if not requested with flags - end - Note right of Server: Note: If there are more than MatchLimit entries matching KeyPrefix, drop all records and
message += "MatchLimit = MatchLimit" - Server->>Client: message - loop for all records matching HASH2 - Note left of Client: CPPeerID = Dec(MH, EncPeerID) - Note left of Client: TS || Signature || multiaddrs = Dec(ServerKey, EncMetadata) - Note left of Client: Verify(CPPeerID, Signature, EncPeerID) - end - Note left of Client: If at least 1 record is valid exit the loop - end - opt if no valid record contains multiaddrs - Client->>Server: FIND_PEER(CPPeerID) - Server->>Client: multiaddrs of CPPeerID - end - - - Client->>CP: Bitswap request for CID - CP->>Client: Content -``` - -### Prefix length selection - -The goal of DHT prefix requests is to provide [`k`-anonymity](https://en.wikipedia.org/wiki/K-anonymity) to content lookup, in addition to the pseudonymity gained from double hashing. Each DHT prefix lookup query returns an expected number of `k` Provider Records matching `KeyPrefix`, with `k` being a system parameter. The user should be able to define a custom `k` from the configuration files, according to their privacy needs. The default value `k = 8` is discussed in [Design rationale](#reader-privacy). - -The prefix length `l` is derived from `k` and the number of CIDs published to the DHT: $l \leftarrow{} \log_2(\frac{\\#CIDs}{k})$. However, the total number of CIDs published to the DHT can be hard to approximate, so the initial `l` value is determined by approximation and bisection. At the first startup, the node looks up random keys, starting with `l = 26`. Then, by bisection, it adapts `l` so that a lookup for a prefix of length `l` matches on average ~`k` Provider Records. - -Each node keeps track of the number of `HASH2`s matching the requested `KeyPrefix` for each of the last 128 lookups. `a` is defined as the average number of matches for the last 128 requests. At any point in time, if $a \gt 2\times k$, then `l` should increase (`l = l + 1`), and if $a \lt \frac{k}{2}$, then `l` should decrease (`l = l - 1`).
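The adjustment rule above can be sketched in Go as follows. This is only an illustration under stated assumptions: the type `prefixTuner`, its fields and the `observe` method are hypothetical names, clearing the window after each change of `l` is a choice this document does not specify, and `l` is clamped to the `1`..`256` range allowed by the `KeyPrefix` encoding.

```go
package main

import "fmt"

// windowSize is the size of the moving average ("Prefix magic numbers").
const windowSize = 128

// prefixTuner is a hypothetical helper that adapts the prefix length l.
type prefixTuner struct {
	k       float64 // k-anonymity target (default 8)
	l       int     // current prefix length in bits
	samples []int   // match counts of the most recent lookups
}

// observe records the number of HASH2 matching the last KeyPrefix and
// adapts l when the moving average a leaves the interval [k/2, 2k].
func (t *prefixTuner) observe(matches int) {
	t.samples = append(t.samples, matches)
	if len(t.samples) > windowSize {
		t.samples = t.samples[1:] // keep only the last 128 lookups
	}
	sum := 0
	for _, m := range t.samples {
		sum += m
	}
	a := float64(sum) / float64(len(t.samples))
	switch {
	case a > 2*t.k && t.l < 256:
		t.l++ // too many matches: use a longer prefix
		// Assumption: samples taken with the old l no longer apply.
		t.samples = t.samples[:0]
	case a < t.k/2 && t.l > 1:
		t.l-- // too few matches: use a shorter prefix
		t.samples = t.samples[:0]
	}
}

func main() {
	t := &prefixTuner{k: 8, l: 26}
	t.observe(40)    // average 40 > 2*8, so l grows by one
	fmt.Println(t.l) // prints 27
}
```

Evaluating the average on a partially filled window makes the sketch react to a single outlier right after a change of `l`; an implementation may prefer to wait until more samples are available.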
On node shutdown, `a` is saved on disk, allowing a quick restart with an accurate `l` value. - -Note that DHT Servers can set an upper bound on the number of Provider Records they serve for each lookup request. An `l` that is too small may therefore result in not discovering the target Provider Record. - -**Prefix magic numbers** -- `k`-anonymity privacy parameter, by default `k = 8` -- Size of moving average of number of Provider Records matching a prefix: `128` -- Initial prefix length: `26`. There are currently ~850M distinct CIDs published in the DHT ([source](https://pl-strflt.notion.site/2022-09-20-Hydras-Analysis-5db53b6af3e04a46aaf7a776e65ae97d)). $\log_2(\frac{850M}{8})=26.663$. As the number of CIDs in the network grows exponentially, the prefix length is expected to increase linearly for a constant `k`. - -### _Closest_ keys to a key prefix - -Computing the XOR distance between two bitstrings of different lengths isn't possible. Hence finding the N closest keys to a key prefix in the Kademlia keyspace doesn't make sense. We can however find the keys matching the prefix (e.g `prefix == key[:l]` for $key \in \{0, 1\}^{256}, prefix \in \{0, 1\}^{l}, l \leq 256$), and the keys _close_ to matching the prefix. Randomness is used as a tie breaker. - -The following pseudo-code defines the algorithm to find `N` keys matching or _close_ to matching a prefix. The main idea is to truncate the leaves of the Kademlia trie to the length of the prefix `l`. If `M` keys match `prefix`, for $M \ge N$, then `N` keys must be picked at random among the `M` candidates. If `M` keys match `prefix`, for $M \lt N$, we must still find `Q = N - M` keys. We iterate on the truncated Kademlia leaves of depth `l` ordered by XOR distance to `prefix`, starting from the closest.
Suppose there are `P` keys in the current truncated Kademlia leaf, and that we are missing `Q` keys. If $P \ge Q$, we select `Q` keys at random among the `P` candidates. Otherwise, if $P \lt Q$, we take the `P` keys, set `Q = Q - P` and iterate on the following leaf until we find `N` keys. - -``` -func closest_to_match(prefix, N, all_keys) { - selected_keys = [] - l = len(prefix) // len(prefix) is the bit length of the prefix - - // iterate on all prefixes of length l from closest to furthest from 'prefix' - for counter = 0; len(selected_keys) < N && counter < 2**l; counter += 1 { - - leaf = prefix XOR binary(counter, l) - // binary(x, l) gives the binary representation of a number x, on l bits - - // get all keys matching the prefix 'leaf' - matching_keys = find_matching_keys(leaf, all_keys) - - // add at most (N-len(selected_keys)) to selected_keys - if len(matching_keys) <= N - len(selected_keys) { - selected_keys += matching_keys - } else { - random_selection = select_N_random(matching_keys, N - len(selected_keys)) - selected_keys += random_selection - } - } - return selected_keys -} -``` - -## Design rationale - -### Cryptographic algorithms - -**SHA256** - -SHA256 is the algorithm currently in use in IPFS to generate 256-bit digests used as Kademlia identifiers. Note that SHA256 refers to the [SHA2](https://en.wikipedia.org/wiki/SHA-2) algorithm with a 256-bit digest size. - -A future change of Cryptographic Hash Function will require a _DHT Migration_ as the Provider Records _location_ in the Kademlia keyspace will change, for they are defined by the Hash Function. It means that all Provider Records must be published using both the new and the old hash function for the transition period. We want to avoid performing these migrations as much as possible, but we must be ready for them, as one is likely to happen in the lifespan of IPFS.
- -Changing the Hash function used to derive `ServerKey` requires the DHT Server to support multiple Provider Records indexed by different `ServerKey`s for the same `HASH2` for the migration period. - -**AESGCM** - -[AESGCM](https://en.wikipedia.org/wiki/Galois/Counter_Mode) (Advanced Encryption Standard in Galois/Counter Mode) is an AEAD (Authenticated Encryption with Associated Data) mode of operation for symmetric-key cryptographic block ciphers which is widely adopted for its performance. It takes as input an Initialization Vector (IV) that needs to be unique (Nonce) for each encryption performed with the same key. This algorithm was selected for its security, its performance and its wide industry adoption. - -The nonce size is set to `12` bytes (default for AES GCM). AESGCM is used with encryption keys of 256 bits (SHA256 digests in this context). - -A change in the encryption algorithm of the Provider Record implies that the Content Providers must publish 2 Provider Records, one with each encryption scheme. The Client and the DHT Server learn which encryption algorithm has been used by the Content Provider from the `varint` contained in `EncPeerID`. When a new encryption algorithm is introduced, DHT Servers may need to store multiple Provider Records in their Provider Store for the same `HASH2` and the same `CPPeerID`. We restrict the number of Provider Records for each pair (`HASH2`, `CPPeerID`) to `3` (the `varint`s must be distinct), in order to allow some flexibility, while keeping the potential number of _garbage_ Provider Records published by hostile nodes low. - -A change in the encryption algorithm used between the DHT Server and the Client (Lookup step 7.) means that the Client and the DHT Server must negotiate the encryption algorithm, as long as it still uses a 256-bit key. - -**Signature scheme** - -The signature scheme is the default one from libp2p.
The available algorithms are listed [here](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md#key-types). We use the private key from which the node's `PeerID` is derived to sign `(EncPeerID || TS)`. Every node that knows the signing `PeerID` can verify the signature. - -```go -privKey := host.Peerstore().PrivKey(host.ID()) -signature, err := privKey.Sign(data) -``` - -### Provider Store - -The data structure of the DHT Servers' Provider Store is a nested key-value store whose structure is: `HASH2` -> `ServerKey` -> `CPPeerID` -> [`EncPeerID`, `TS`, `Signature`]. - -The same `HASH2` always corresponds to the same `ServerKey` (as long as the same Hashing Algorithm was used), as both `HASH2` and `ServerKey` result from a deterministic hash operation on `MH` prepended with a constant prefix. So if all peers are honest, each `HASH2` should be associated with a single `ServerKey`. - -However, a misbehaving node could publish an advertisement for `HASH2` while not knowing `MH`, and forge a random `ServerKey`. The DHT Server, not knowing `MH`, cannot determine whether a `ServerKey` is valid and hence needs to keep all different `ServerKey`s. The only reason a misbehaving peer would want to publish forged `ServerKey`s is to exhaust the storage resources of a specific target DHT Server, as Clients detect invalid Provider Records. DHT Servers store at most `3` different `ServerKey`s for each `CPPeerID`, limiting the resource exhaustion attack while allowing some agility when changing the Hash function. - -Content can be provided by multiple Content Providers, hence `HASH2` -> `ServerKey` potentially maps to multiple `CPPeerID`s, each Content Provider having its own Provider Record. As the `CPPeerID` is obtained from the open libp2p connection, we assume that it is not possible to impersonate another `CPPeerID`. Each Content Provider can have a single Provider Record for each `HASH2`, and for each available `varint`.
During a migration, we expect to have multiple Provider Records for the same pair (`HASH2`, `CPPeerID`). The Provider Store keeps 1 Provider Record for each distinct (`HASH2`, `CPPeerID`, `varint`) with a maximum of `3` per pair (`HASH2`, `CPPeerID`). If there are more than 3 candidates, the ones with the lowest `TS` are discarded. - -When a Content Provider republishes a Provider Record, the DHT Server only keeps the valid Provider Record whose `TS` is the largest value, for the given `varint`. We expect to have a single `varint` in use most of the time. DHT Servers drop all Provider Records published by the same `CPPeerID` with the same `HASH2` but multiple different `ServerKey`s. A well-behaving node can compute the right `ServerKey` and doesn't try to exhaust the storage resources of the DHT Server. Only a misbehaving node forges invalid `ServerKey`s, and if multiple `ServerKey`s are associated with the same (`HASH2`, `CPPeerID`) it implies that at least one of these `ServerKey`s is incorrect, so the Content Provider is misbehaving. - -### `k`-anonymity - -Default: `k = 8`. -Default: `MatchLimit = 64`. - -The `k`-anonymity parameter `k` is user defined; it can be modified in the configuration files. Users requiring a higher level of privacy can increase their value of `k`. `8` is deemed to be private enough for standard IPFS users, while limiting the overhead in packet size of the DHT Server response to 8x. - -The `MatchLimit` prevents malformed or malicious requests from matching all Provider Records that a DHT Server is providing at once. A Client can still fetch all Provider Records matching any `KeyPrefix`, but it must perform multiple DHT lookup requests for enough prefixes to the DHT Server. The `MatchLimit` protects the Server from having to send large amounts of data at once.
`64` is already a large value, given that each `HASH2` can be associated with multiple Provider Records, one for each Content Provider, and the multiaddresses of all Content Providers can be sent along. The DHT provides _on average_ at most `64-anonymity` out-of-the-box and a better privacy level can be reached by sending multiple requests. - -## User benefits - -### Reader Privacy - -**Double Hashing** - -Currently any DHT Server observing a request can associate the Client's `PeerID` with the requested `CID`. If the `CID` is not already known, curious DHT Servers observing a DHT request can replay the request, and retrieve the content that the client is accessing, which is a significant privacy concern. Using `HASH2` as DHT Content Identifier prevents curious DHT Servers that don't know `MH`, the preimage of `HASH2`, from retrieving the content associated with `HASH2`. Curious DHT Servers can still replay the DHT request for `HASH2` and find the Content Providers. However, they are not able to make a valid Bitswap request to the Content Providers, for they don't have the knowledge of the Content Identifier used by Bitswap (`CID`) for the content being identified by `HASH2` in the DHT. - -Double Hashing is also necessary for Prefix Requests and Provider Record Encryption. - -**Prefix Requests** - -A Prefix Request consists of requesting a Prefix of a key, instead of a full-length Kademlia key. A Prefix corresponds to a branch of the binary trie, and potentially matches multiple existing keys. Prefix Request Routing works exactly like the normal Kademlia Routing, hence a DHT Prefix Request eventually always converges. The goal of Prefix Requests is to match multiple Provider Records for a single request.
Instead of requesting `HASH2`, the Client now requests `Prefix`, a prefix of `HASH2` of length `l` bits. The DHT Server storing the Provider Records matching `Prefix` doesn't know exactly which content is accessed and returns all Provider Records whose `HASH2` matches `Prefix`. With Double Hashing, curious DHT Servers cannot associate `CID` with the requester `PeerID` anymore, but they can associate `HASH2` with `PeerID`. Prefix Requests make it harder for curious DHT Servers to associate `PeerID` with a specific `HASH2`, as they only learn a `Prefix` of `HASH2`. - -This provides [`k`-anonymity](https://en.wikipedia.org/wiki/K-anonymity) when a curious DHT Server tries to associate the Client's `PeerID` with the requested `HASH2`, with `k` defined as the average number of Provider Records matching a Prefix of length `l`. `k` is a system parameter and defines the `k`-anonymity level, and `l` is derived from `k` (see [Prefix Length Selection](#prefix-length-selection)). Prefix Requests also enable [Plausible Deniability](https://en.wikipedia.org/wiki/Deniable_encryption) for the Client. The DHT Server cannot prove that a Client identified by its `PeerID` or `IP Address` tried to access some content identified by its `HASH2`. - -However, Prefix Requests offer neither [`l`-diversity](https://en.wikipedia.org/wiki/L-diversity) nor [`t`-closeness](https://en.wikipedia.org/wiki/T-closeness), as frequency analysis is still easy to perform. For example, suppose a `Prefix` matches one very popular Provider Record and a few unpopular ones. When a new request is received for this `Prefix`, the DHT Server nodes can take a better-than-random guess that the Client is requesting the popular content's Provider Record rather than an unpopular one. However, the DHT Server cannot prove that the Client has accessed the popular content. - -**Provider Record Encryption** - -Provider Record Encryption also builds on top of Double Hashing.
The Provider Record Encryption prevents curious DHT Servers observing a request for `Prefix`, but not storing any Provider Record matching `Prefix`, from replaying the request for `Prefix` and getting all published keys matching `Prefix`, including the `HASH2` of the content accessed by the Client. It prevents all curious actors from building a global dictionary of `HASH2` to Content Providers for all content published in the IPFS public DHT. It is necessary to know the `MH` of the content (included in the `CID`) to learn about its Content Providers. - -Curious DHT Servers observing a request from `PeerID` for `Prefix` cannot associate the Client's `PeerID` with the Content Provider's `PeerID` because they cannot read the Provider Record. - -### Writer Privacy - -Writer Privacy is NOT the goal of this design. However, as a side effect, Writer Privacy gets improved in some specific cases. -- Content Providers do NOT get any additional privacy from the Client fetching the data -- Content Providers can now hide which data they are serving from the DHT Server peers hosting their Provider Records, as long as the DHT Servers don't know the preimage of `HASH2`: `MH`. The DHT Servers are not able to query the content associated with the Provider Records they are storing. However, they can approximately monitor the number of requests associated with the content by observing the requests in the key subspace matching the `Prefix` of the content's `HASH2`. DHT Servers can take an educated guess on the association of `HASH2` with the Content Provider's `PeerID`. The DHT Servers storing the Provider Record are able to share information about the Content Provider with potential accomplices. For instance, a coalition of curious DHT Servers could share with each other, for each Provider Record identified by `HASH2`, the list of Content Providers and the number of received Prefix Requests matching `HASH2`.
This results in monitoring all content advertised by all `PeerID`s and estimating the number of requests they are serving. -- Content Providers get additional privacy from curious DHT Servers observing a request, but NOT from the ones storing the Provider Record. These DHT Servers can still replay the DHT request, but are unable to discover the content and the Content Provider's `PeerID` associated with `Prefix`, because the Provider Records are encrypted using `MH`. This holds as long as the DHT Servers don't know the `MH` (or `CID`). - -### Provider Record Authenticity - -The Provider Records are now signed by the Content Provider. This prevents a malicious DHT Server from forging a Provider Record for an arbitrary key. The Clients need to verify the Signature against the Content Provider's `PeerID` and send a Bitswap request to the Content Provider only if the Signature is valid. Content Providers can only publish Provider Records for themselves. - -### Provider Records Enumeration - -Counting the Provider Records in the DHT becomes trivial thanks to Double Hashing and Prefix Requests. Knowledge of the preimage of the requested key isn't required anymore for a valid Kademlia request. Monitoring the number of Provider Records stored in the DHT is a good metric to evaluate the health of the DHT. - -### Better Kademlia Routing Table Refresh - -As knowledge of the preimage of the requested key isn't necessary in the Double Hashing DHT, nodes gain the ability to request _truly_ random keys in the DHT. - -Requesting random keys is necessary for the Kademlia Bucket Refresh Process. On refresh, if a bucket has empty slots, the node will make a request for a random forged key falling in this specific bucket. In the current implementation, as the preimage of a requested key is necessary, Kademlia uses a [list of precomputed preimages](https://github.com/libp2p/go-libp2p-kbucket/blob/master/bucket_prefixmap.go), 1 matching every 15-bit key prefix.
Hence the random forged key is never truly random: its definition set is the list of precomputed preimages, and not the full keyspace. This can lead to degraded performance and security vulnerabilities. - -Double Hashing enables the nodes to select a _truly_ random key from the Kademlia keyspace (limited by the randomness algorithm) matching the appropriate bucket. The 456KB [list of precomputed preimages](https://github.com/libp2p/go-libp2p-kbucket/blob/master/bucket_prefixmap.go) can be removed from the IPFS source code once the migration to the Double Hashing DHT is complete. - -### Simplicity - -It is generally less complex to find content in the DHT by requesting its Kademlia identifier (keyspace location), instead of requesting the preimage of its keyspace location. - -## Migration - -This design is a breaking change and requires a major DHT migration. - -**WIP** - -Alternatives for migration: -- slow breaking change (give enough time so that only a _small_ number of participants break) -- DHT duplication -- Universal DHT (WIP). - -## Threat Model - -### Reader Privacy - -The Double Hashing DHT prevents DHT Server nodes from associating a Client's `PeerID` with the Content requested by the Client. DHT Servers no longer know _which Client is accessing which content_. This protection only works as long as the DHT Servers don't know the `CID` requested by the Client. Thus, the privacy of a request depends on the secrecy of the requested `CID`. - -The proposed solution makes _association attacks_ (associating the Client's `PeerID` with the requested `CID`) much more expensive for _public content_, but doesn't make them impossible to perform. However, malicious users cannot discover _private content_ or spy on users accessing it. If Alice advertises her holiday pictures to the public IPFS DHT and privately sends the root `CID` to Bob only, no adversary can retrieve the pictures, and no adversary can learn what Bob is accessing.
Only the DHT Servers serving the Provider Record to Bob know that Bob is requesting some content from Alice's `PeerID`. - -A powerful adversary could crawl all discoverable `CID`s, e.g by sniffing Bitswap broadcasts or browsing the Web to discover new `CID`s. From this list of `CID`s, the adversary can compute the `HASH2`s associated with all the `CID`s and get a mapping `HASH2` $\rightarrow$ `CID` for many `CID`s. This adversary can run many DHT Servers, and upon request for some `Prefix`, check which `HASH2` are matching the `Prefix`. Using frequency analysis, the adversary can take an educated guess on which content the client is requesting. If the requested content is unknown to the adversary, but the adversary knows its `CID`, the adversary can trivially resolve the Content Providers from the DHT, and fetch the content over Bitswap. Removing request broadcast from Bitswap would make it harder to crawl existing `CID`s, and thus would improve reader privacy in the DHT. - -DHT Servers serving the requested Provider Record to the Client has the ability to associate the Client's `PeerID` with the Content Providers `PeerID`. They can track _from which peer a Client is fetching content_. - -The Client doesn't have any privacy protection from the Content Provider serving Content over Bitswap. - -### Signed Provider Records - -Provider Records are signed in the Double Hash DHT. This implies that malicious DHT Servers serving a Provider Record can no longer forge an arbitrary Provider Record corresponding to the requested `CID`. The Client can computationally verify that the Provider Record is valid, and was created by the Provider Record that has the knowledge of `CID`. - -### DDOS Protection - -The Double Hash DHT doesn't improve DDOS (Distributed Denial Of Service) protection. 
Upon receiving a DHT request from a Client for a valid Provider Record, DHT Servers can decide to return a `multiaddrs` corresponding to the IP address of a `target` host, not providing the requested content. The Client will open a connection to the returned `multiaddrs` and send a Bitswap request for the content. If the `CID` that was initially requested is popular, this will generate a lot of traffic toward the `target` coming from many different Clients. - -DDOS protection can be improved in the future on the Double Hash DHT by using signed Peer Records. - -### DHT Servers Resource Exhaustion - -An adversary user could try to exhaust the DHT resources by advertising garbage Provider Records. The adversary needs to generate random bytes (_garbage_), sign them and ask DHT Server nodes to store the garbage Provider Records. DHT Server nodes cannot computationally decide whether a Provider Record is garbage or not, thus they must continue storing the Provider Records. Note that the adversary periodically needs to republish every Provider Record, which isn't trivial for a large number of Provider Records at the moment. This issue isn't mitigated in the current DHT. - -One possible mitigation could be to identify IP addresses publishing an _excessive_ number of Provider Records that are never accessed, and refusing to store more Provider Records for this IP. - -## Alternatives for DHT Reader Privacy - -Other approaches to improve Reader Privacy in the DHT mostly include Ephemeral `PeerID`s and [Mixnets](https://en.wikipedia.org/wiki/Mix_network). The first option is to use ephemeral `PeerID`s in order to escape tracking. This solution however doesn’t increase much the privacy level. It is still possible to enumerate the all `PeerID`s in the network and to associate all the `PeerID`s using the same IP addresses. Combining the Ephemeral `PeerID` approach with Double Hashing can help slighlty improve privacy. 
Having a different `PeerID` for the DHT Client and the DHT Server of the same IPFS node makes association of _which Content Provider requested which Content_ harder. The two `PeerID`s can still be associated as they use the same IP address, but the DHT Client cannot be discovered in a network crawl. - -Ephemeral `PeerID`s references: -- https://github.com/libp2p/libp2p/issues/37 - -The other alternative to increase the Reader Privacy level in the IPFS DHT is the use of Mixnets such as Tor or I2P. Mixnets usually provide an excellent Reader- and Writer Privacy level, but lookup latency is significantly higher. Hence the use of Mixnets is generally not good for all use cases, but only when strong privacy guarantees are required. Mixnets can easily be built on top of the Double Hash DHT to maximize user Privacy. IPFS users willing to remain pseudonymous could use the existing Tor network to hide their identity. Another alternative could be to create a Mixnet out of the IPFS network, e.g include mixing capabilities in every libp2p host. There has been some ongoing work on IPFS-Tor integration. - -Mixnets references: -- Berty's [go-libp2p-tor-transport](https://github.com/berty/go-libp2p-tor-transport) -- [Hosting an IPFS Gateway Through a Tor Proxy](https://www.minds.com/raymondsmith98/blog/tutorial-tor-hosting-an-ipfs-gateway-through-a-tor-proxy-857369540936916992) -- Mixnet and Content Routing ([IPFS Thing 2022 Video](https://www.youtube.com/watch?v=f85U8b5g-Ks), [Notes](https://hackmd.io/@nZ-twauPRISEa6G9zg3XRw/BkrcMOLd9)) by [noot](https://github.com/noot) -- [Nym Mixnet](https://nymtech.net/) -- https://github.com/ipfs/notes/issues/37 - - -## Open Questions - -- If we plan to move to using SHA3 instead of SHA2 to generate 256-bits digests, this migration is the perfect opportunity, as we will be breaking everything anyways. SHA3 was proved to be more secure against Length Extension Attacks. 
It has not be proven whether SHA2 or SHA3 is more collision resistant and secure against preimage attacks. See this [comparison](https://en.wikipedia.org/wiki/SHA-3#Comparison_of_SHA_functions). -- Is it wise to encrypt the `CPPeerID` using `MH` directly? It would be possible to derive another identifier from `MH` (such as `Hash("SOME_CONSTANT" || MH)`). `MH` is the master identifier of the content, hence if it is revealed all other identifiers can trivially be found. However, it is computationally impossible to recover `MH` from `Hash("SOME_CONSTANT" || MH)`. -- It may be fine to use `TS` as Nonce/IV for the Provider Record encryption (`EncPeerID = AESGCM(MH, Nonce, CPPeerID)`), it spares bytes on the wire. If `TS` is the number of milli- or nano-seconds that have passed since `1970-01-01T00:00:00`, this number easily fits in the 12 bytes IV. Moreover it is very unlikely that 2 nodes perform an encryption using the same key (for the same content) at the exact same milli- or nano-second. Using TS as nonce would spare 4 bytes (`TS` size) on the wire when publishing content to the DHT, and 4 bytes for each Provider Record matching `Prefix` for all requests. However the information about when the Provider Record was published (already known to the DHT Servers storing the Provider Record) would be publicly available. Anyone enumerating DHT Provider Records would be able to read it. -- As multiple `HASH2` match each `Prefix` and the Client is only interested in a single one, should we send the `HASH2` along with each encrypted provider record (network load overhead) or let the Client try to decrypt all payloads and see for themselves which one opens (cpu overhead)? - -## Copyright - -Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/). - should provide a +When modifying an existing specification file, this section should provide a summary of changes. When adding new specification files, list all of them. 
## Test fixtures @@ -469,4 +65,4 @@ Describe alternate designs that were considered and related work. ### Copyright -Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/). +Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/). \ No newline at end of file From 42519038b7122c8743a4bf7e100069db3e8b6b9b Mon Sep 17 00:00:00 2001 From: guillaumemichel Date: Wed, 15 Feb 2023 16:58:17 +0100 Subject: [PATCH 37/55] reverting template --- IPIP/0000-template.md | 29 +++-------------------------- 1 file changed, 3 insertions(+), 26 deletions(-) diff --git a/IPIP/0000-template.md b/IPIP/0000-template.md index 6b59247a..3895cc13 100644 --- a/IPIP/0000-template.md +++ b/IPIP/0000-template.md @@ -1,66 +1,43 @@ -# IPIP-0: InterPlanetary Improvement Proposal Template +# IPIP 0000: InterPlanetary Improvement Proposal Template - - Start Date: YYYY-MM-DD - Related Issues: - (add links here) - ## Summary - This is the suggested template for new IPIPs. - ## Motivation - AKA Problem Statement - Clearly explain why the existing protocol specification is inadequate to address the problem that the IPIP solves. - ## Detailed design - AKA Solution Proposal - Describe the proposed solution and list all changes made to the specs repository. - The resulting specification should be detailed enough to allow competing, interoperable implementations. - When modifying an existing specification file, this section should provide a summary of changes. When adding new specification files, list all of them. - ## Test fixtures - List relevant CIDs. Describe how implementations can use them to determine specification compliance. This section can be skipped if IPIP does not deal with the way IPFS handles content-addressed data, or the modified specification file already includes this information. 
- ## Design rationale - The rationale fleshes out the specification by describing what motivated the design and why particular design decisions were made. - Provide evidence of rough consensus and working code within the community, and discuss important objections or concerns raised during discussion. - ### User benefit - How will end users benefit from this work? - ### Compatibility - Explain the upgrade considerations for existing implementations. - ### Security - Explain the security implications/considerations relevant to the proposed change. - ### Alternatives - Describe alternate designs that were considered and related work. ### Copyright From b1ad80287024f0cfb550ca9355fe464c096a6e84 Mon Sep 17 00:00:00 2001 From: guillaumemichel Date: Wed, 15 Feb 2023 17:00:11 +0100 Subject: [PATCH 38/55] correcting modified template --- IPIP/0000-template.md | 29 ++++++++++++++++++++++++++--- 1 file changed, 26 insertions(+), 3 deletions(-) diff --git a/IPIP/0000-template.md b/IPIP/0000-template.md index 3895cc13..6b59247a 100644 --- a/IPIP/0000-template.md +++ b/IPIP/0000-template.md @@ -1,43 +1,66 @@ -# IPIP 0000: InterPlanetary Improvement Proposal Template +# IPIP-0: InterPlanetary Improvement Proposal Template - - Start Date: YYYY-MM-DD - Related Issues: - (add links here) + ## Summary + This is the suggested template for new IPIPs. + ## Motivation + AKA Problem Statement + Clearly explain why the existing protocol specification is inadequate to address the problem that the IPIP solves. + ## Detailed design + AKA Solution Proposal + Describe the proposed solution and list all changes made to the specs repository. + The resulting specification should be detailed enough to allow competing, interoperable implementations. + When modifying an existing specification file, this section should provide a summary of changes. When adding new specification files, list all of them. + ## Test fixtures + List relevant CIDs. 
Describe how implementations can use them to determine specification compliance. This section can be skipped if IPIP does not deal with the way IPFS handles content-addressed data, or the modified specification file already includes this information. + ## Design rationale + The rationale fleshes out the specification by describing what motivated the design and why particular design decisions were made. + Provide evidence of rough consensus and working code within the community, and discuss important objections or concerns raised during discussion. + ### User benefit + How will end users benefit from this work? + ### Compatibility + Explain the upgrade considerations for existing implementations. + ### Security + Explain the security implications/considerations relevant to the proposed change. + ### Alternatives + Describe alternate designs that were considered and related work. ### Copyright From c2aabcb86c5096195cda9426628de057f810354a Mon Sep 17 00:00:00 2001 From: guillaumemichel Date: Wed, 15 Feb 2023 17:04:22 +0100 Subject: [PATCH 39/55] Removed a modified file from pull request --- IPIP/0000-template.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/IPIP/0000-template.md b/IPIP/0000-template.md index 6b59247a..21ff22ff 100644 --- a/IPIP/0000-template.md +++ b/IPIP/0000-template.md @@ -1,7 +1,7 @@ -# IPIP-0: InterPlanetary Improvement Proposal Template +# IPIP 0000: InterPlanetary Improvement Proposal Template - - Start Date: YYYY-MM-DD @@ -65,4 +65,4 @@ Describe alternate designs that were considered and related work. ### Copyright -Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/). \ No newline at end of file +Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/). 
From 35748904a3cac3e9df874da551946a219984ca3b Mon Sep 17 00:00:00 2001 From: guillaumemichel Date: Mon, 27 Feb 2023 10:43:59 +0100 Subject: [PATCH 40/55] updated aes-gcm varint --- ...00-double-hash-dht.md => 0373-double-hash-dht.md} | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) rename IPIP/{0000-double-hash-dht.md => 0373-double-hash-dht.md} (98%) diff --git a/IPIP/0000-double-hash-dht.md b/IPIP/0373-double-hash-dht.md similarity index 98% rename from IPIP/0000-double-hash-dht.md rename to IPIP/0373-double-hash-dht.md index e490c7c7..3c60bad2 100644 --- a/IPIP/0000-double-hash-dht.md +++ b/IPIP/0373-double-hash-dht.md @@ -37,7 +37,7 @@ The changes described in this document introduce a DHT privacy upgrade boosting **Magic Values** - bytes("CR_DOUBLEHASH") - bytes("CR_SERVERKEY") -- AES-GCM [varint](https://github.com/multiformats/multicodec/pull/314): `aes-gcm-256 = 0xa501` +- AES-GCM [varint](https://github.com/multiformats/multicodec/pull/314): `aes-gcm-256 = 0x8040` - Double SHA256 [varint](https://github.com/multiformats/multicodec/blob/master/table.csv#L41): `dbl-sha2-256 = 0x5601` - A DHT Server returns all of the Provider Records matching to at most **`MatchLimit = 64`** distinct `HASH2`. Magic number explanation in [k-anonymity](#k-anonymity). - Provider Record Timestamp (`TS`) validity period: `48h` @@ -58,7 +58,7 @@ The changes described in this document introduce a DHT privacy upgrade boosting - **`ServerKey`** is defined as `SHA256(bytes("CR_SERVERKEY") || MH)`. It is derived from the `MH`. The Content Provider communicates `ServerKey` to the DHT Servers during the Publish Process. The DHT Servers use `ServerKey` to encrypt `TS`, `Signature` and Content Providers `multiaddrs` sent to the Client when some Provider Records match the requested `Prefix`. `ServerKey` is represented as a 32-byte array. - **`TS`** is the [Unix Timestamp](https://en.wikipedia.org/wiki/Unix_time) corresponding content publish time. 
`TS` is represented as a 32-bit **unsigned** Integer, allowing timestamps to range from `1970-01-01T00:00:00Z` to `2106-02-07T06:28:15Z` before reaching the overflow. - **`CPPeerID`** is the `PeerID` of the Content Provider for a specific `CID`. -- **`EncPeerID`** is the result of the encryption of `CPPeerID` using `MH` as encryption key and a random nonce `AESGCM(MH, Nonce, CPPeerID)`. `EncPeerID` contains the [varint](https://github.com/multiformats/multicodec/pull/314) of the encryption algorithm used (AES), the bytes array of the encrypted payload, and the `Nonce`. `Nonce` is a randomly generated 12-byte array. The format of `EncPeerID` is [`0xa501`, `Nonce`, `payload_len`, `AESGCM(MH, Nonce, CPPeerID)`]. +- **`EncPeerID`** is the result of the encryption of `CPPeerID` using `MH` as encryption key and a random nonce `AESGCM(MH, Nonce, CPPeerID)`. `EncPeerID` contains the [varint](https://github.com/multiformats/multicodec/pull/314) of the encryption algorithm used (AES), the bytes array of the encrypted payload, and the `Nonce`. `Nonce` is a randomly generated 12-byte array. The format of `EncPeerID` is [`0x8040`, `Nonce`, `payload_len`, `AESGCM(MH, Nonce, CPPeerID)`]. - **`Signature`** is the signature of the `EncPeerID` encrypted payload (not including the varint nor the nonce) and `TS` using the Content Provider's private key, either with ed25519 or rsa signature algorithm, depending on the keys of the Content Provider. - **Provider Record** is defined as a pointer to the storage location of some content identified by `CID` or `HASH2`. A Provider Record consists on the following fields: [`EncPeerID`, `TS`, `Signature`]. - **Provider Store** is the data structure on the DHT Servers used to store the Provider Records. Its structure is a nested dictionary/map: `HASH2` -> `ServerKey` -> [`CPPeerID`, `EncPeerID`, `TS`, `Signature`]. There is only one single correct `ServerKey` for each `HASH2`. 
However, any peer can forge a valid Publish request (with invalid `EncPeerID` but valid `Signature`) undetected by the DHT Server. The DHT server isn't able to distinguish which `ServerKey` is correct as it doesn't have the knowledge of `MH`, hence it has to keep both and serve both upon request for `HASH2`. @@ -86,7 +86,7 @@ The following process describes the event of a client looking up a CID in the IP 1. Content Provider wants to publish some content with identifier `CID`. 2. Content Provider computes `HASH2 = SHA256(bytes("CR_DOUBLEHASH") || MH)` (`MH` is the MultiHash included in the CID). 3. Content Provider starts a DHT lookup request for the 20 closest `PeerID`s in XOR distance to `HASH2`. -4. Content Provider encrypts its own `PeerID` (`CPPeerID`) with `MH`, using AES-GCM. `EncPeerID = [0xa501, Nonce, payload_len, AESGCM(MH, Nonce, CPPeerID)]` +4. Content Provider encrypts its own `PeerID` (`CPPeerID`) with `MH`, using AES-GCM. `EncPeerID = [0x8040, Nonce, payload_len, AESGCM(MH, Nonce, CPPeerID)]` 5. Content Provider takes the current timestamp `TS`. 6. Content Provider signs `EncPeerID` and `TS` using its private key. `Signature = Sign(privkey, EncPeerID || TS)` 7. Content Provider computes `ServerKey = SHA256(bytes("CR_SERVERKEY") || MH)`. @@ -107,7 +107,7 @@ sequenceDiagram CP->>DHT: FIND_PEERS(HASH2) DHT->>CP: [PeerID0, PeerID1, ... PeerID19] - Note left of CP: EncPeerID = 0xa501 || Nonce || payload_len || AESGCM(MH, Nonce, CPPeerID) + Note left of CP: EncPeerID = 0x8040 || Nonce || payload_len || AESGCM(MH, Nonce, CPPeerID) Note left of CP: Signature = Sign(privkey, EncPeerID || TS) Note left of CP: ServerKey = SHA256(bytes("CR_SERVERKEY") || MH) @@ -130,7 +130,7 @@ sequenceDiagram 3. Client sends a DHT lookup request for `KeyPrefix` to these DHT servers. The request contains a flag to specify whether Client wants the `multiaddrs` associated with the `CPPeerID` or not. 4. 
The DHT servers find the 20 closest `PeerID`s to `KeyPrefix` in XOR distance (see [algorithm](#closest-keys-to-a-key-prefix)). Add these `PeerID`s and their associated multiaddresses (if applicable) to the `message` that will be returned to Client. 5. The DHT servers search if there are entries matching `KeyPrefix` in their Provider Store. -6. For all entries `HASH2` of the Provider Store where `HASH2[:len(KeyPrefix)]==KeyPrefix`, add to `message` the following encrypted payload: `EncPeerID || 0xa501 || SERVERNONCE || payload_len || AESGCM(ServerKey, SERVERNONCE, TS || Signature || multiaddrs)`, `SERVERNONCE` being a randomly generated 12-byte array, for `multiaddrs` being the multiaddresses associated with `CPPeerID` (if applicable) if the `multiaddrs` were requested by Client. If more than `MatchLimit` distinct `HASH2`s match the requested `KeyPrefix`, the DHT Server doesn't return any Provider Record, and adds the number of `HASH2` matching `KeyPrefix` along with its own `MatchLimit` to `message`. +6. For all entries `HASH2` of the Provider Store where `HASH2[:len(KeyPrefix)]==KeyPrefix`, add to `message` the following encrypted payload: `EncPeerID || 0x8040 || SERVERNONCE || payload_len || AESGCM(ServerKey, SERVERNONCE, TS || Signature || multiaddrs)`, `SERVERNONCE` being a randomly generated 12-byte array, for `multiaddrs` being the multiaddresses associated with `CPPeerID` (if applicable) if the `multiaddrs` were requested by Client. The `multiaddrs` are taken from the DHT Server's libp2p peerstore, and may be stale. If more than `MatchLimit` distinct `HASH2`s match the requested `KeyPrefix`, the DHT Server doesn't return any Provider Record, and adds the number of `HASH2` matching `KeyPrefix` along with its own `MatchLimit` to `message`. 7. The DHT servers send `message` to Client. 8. Client computes `ServerKey = SHA256(bytes("CR_SERVERKEY") || MH)`. 9.
Client tries to decrypt all returned encrypted payloads using `MH` for `EncPeerID` and `ServerKey` for `Enc(ServerKey, TS || Signature || multiaddrs)`. If at least one encrypted payload can be decrypted, go to 12. @@ -161,7 +161,7 @@ sequenceDiagram Note right of Server: message += PeerID end loop for each entry matching KeyPrefix in the Provider Store - Note right of Server: EncMetadata = 0xa501 || SERVERNONCE || payload_len ||
AESGCM(ServerKey, SERVERNONCE, TS || Signature || multiaddrs) + Note right of Server: EncMetadata = 0x8040 || SERVERNONCE || payload_len ||
AESGCM(ServerKey, SERVERNONCE, TS || Signature || multiaddrs) Note right of Server: Aggregate records per HASH2:
message += HASH2 || nb_records || EncPeerID0 || EncMetadata0 || ... || EncMetadataN Note right of Server: Note: don't add multiaddrs nor metadata if not requested with flags end From c575685fc56c966c3944e9f05c04eb43289426ce Mon Sep 17 00:00:00 2001 From: Guillaume Michel - guissou Date: Tue, 28 Feb 2023 12:04:21 +0100 Subject: [PATCH 41/55] Update IPIP/0373-double-hash-dht.md Co-authored-by: Jorropo --- IPIP/0373-double-hash-dht.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/IPIP/0373-double-hash-dht.md b/IPIP/0373-double-hash-dht.md index 3c60bad2..25e7d388 100644 --- a/IPIP/0373-double-hash-dht.md +++ b/IPIP/0373-double-hash-dht.md @@ -328,7 +328,7 @@ As knowledge of the preimage of the requested key isn't necessary in the Double Requesting random keys is necessary for the Kademlia Bucket Refresh Process. On refresh, if a bucket has empty slots, the node will make a request for a random forged key falling in this specific bucket. In the current implementation, as the prefix of a requested key is necessary, Kademlia uses a [list of precomputed preimages](https://github.com/libp2p/go-libp2p-kbucket/blob/master/bucket_prefixmap.go), 1 matching every 15-bits key prefix. Hence, the random forged key, is never random, its definition set is the list of precomputed preimages, and not the full keyspace. This can lead to degraded performance and security vulnerabilities. -Double Hashing enables the nodes to select a _truly_ random key from the Kademlia keyspace (limited by the randomness algorithm) matching the appropriate bucket.The 456KB [list of precomputed preimages](https://github.com/libp2p/go-libp2p-kbucket/blob/master/bucket_prefixmap.go) can be removed from the IPFS source code, once the migration to the Double Hashing DHT is complete. 
+Double Hashing enables the nodes to select a _truly_ random key from the Kademlia keyspace (limited by the randomness algorithm) matching the appropriate bucket.The 456KB [list of precomputed preimages](https://github.com/libp2p/go-libp2p-kbucket/blob/2e310782ef7bc42d9af3e948fcf21d92b97e56ea/bucket_prefixmap.go) can be removed from the IPFS source code, once the migration to the Double Hashing DHT is complete. ### Simplicity From c188425b6b75ba50fb07246911e27b48956f9101 Mon Sep 17 00:00:00 2001 From: Guillaume Michel - guissou Date: Tue, 28 Feb 2023 12:14:33 +0100 Subject: [PATCH 42/55] Update IPIP/0373-double-hash-dht.md Co-authored-by: Jorropo --- IPIP/0373-double-hash-dht.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/IPIP/0373-double-hash-dht.md b/IPIP/0373-double-hash-dht.md index 25e7d388..4670a2bd 100644 --- a/IPIP/0373-double-hash-dht.md +++ b/IPIP/0373-double-hash-dht.md @@ -35,8 +35,8 @@ The changes described in this document introduce a DHT privacy upgrade boosting ## Detailed Design **Magic Values** -- bytes("CR_DOUBLEHASH") -- bytes("CR_SERVERKEY") +- bytes("CR_DOUBLEHASH\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00") +- bytes("CR_SERVERKEY\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00") - AES-GCM [varint](https://github.com/multiformats/multicodec/pull/314): `aes-gcm-256 = 0x8040` - Double SHA256 [varint](https://github.com/multiformats/multicodec/blob/master/table.csv#L41): `dbl-sha2-256 = 0x5601` - A DHT Server returns all of the Provider Records matching to at most **`MatchLimit = 64`** distinct `HASH2`. Magic number explanation in [k-anonymity](#k-anonymity). 
From 6356f58104e0c38dcc5758c00cfaa59ccea5fc6a Mon Sep 17 00:00:00 2001 From: Guillaume Michel - guissou Date: Tue, 28 Feb 2023 12:21:11 +0100 Subject: [PATCH 43/55] Update IPIP/0373-double-hash-dht.md Co-authored-by: Dennis Trautwein --- IPIP/0373-double-hash-dht.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/IPIP/0373-double-hash-dht.md b/IPIP/0373-double-hash-dht.md index 4670a2bd..947da625 100644 --- a/IPIP/0373-double-hash-dht.md +++ b/IPIP/0373-double-hash-dht.md @@ -94,7 +94,7 @@ The following process describes the event of a client looking up a CID in the IP 9. Each DHT server verifies `Signature` against the `PeerID` of the Content Provider used to open the libp2p connection. `Verify(CPPeerID, Signature, EncPeerID || TS)`. It verifies that `TS` is younger than `48h` and isn't in the future. If invalid, send an error to the client. 10. Each DHT server adds an entry in their Provider Store for `HASH2` -> `ServerKey` -> `CPPeerID` -> [`EncPeerID`, `TS`, `Signature`], with `CPPeerID` being the `PeerID` of the Content Provider (see [provider store](#provider-store)). If there is already an entry including `CPPeerID` for `HASH2` -> `ServerKey`, and if the `TS` of the new valid entry is newer than the existing `TS`, overwrite the entry in the Provider Store. Else drop the new entry. 11. Each DHT server confirms to Content Provider that the Provider Record has been successfully added. -12. The proces is over once Content Provider has received 20 confirmations. +12. The process is over once Content Provider has received 20 confirmations. 
```mermaid sequenceDiagram From 60fdd8000272b135dd3a7b254c0ca5c4bd3b89fd Mon Sep 17 00:00:00 2001 From: Guillaume Michel - guissou Date: Tue, 28 Feb 2023 13:53:56 +0100 Subject: [PATCH 44/55] Update IPIP/0373-double-hash-dht.md Co-authored-by: Jorropo --- IPIP/0373-double-hash-dht.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/IPIP/0373-double-hash-dht.md b/IPIP/0373-double-hash-dht.md index 947da625..081b0a79 100644 --- a/IPIP/0373-double-hash-dht.md +++ b/IPIP/0373-double-hash-dht.md @@ -35,8 +35,8 @@ The changes described in this document introduce a DHT privacy upgrade boosting ## Detailed Design **Magic Values** -- bytes("CR_DOUBLEHASH\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00") -- bytes("CR_SERVERKEY\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00") +- `SALT_DOUBLEHASH = bytes("CR_DOUBLEHASH\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00")` +- `SALT_SERVERKEY = bytes("CR_SERVERKEY\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00")` - AES-GCM [varint](https://github.com/multiformats/multicodec/pull/314): `aes-gcm-256 = 0x8040` - Double SHA256 [varint](https://github.com/multiformats/multicodec/blob/master/table.csv#L41): `dbl-sha2-256 = 0x5601` - A DHT Server returns all of the Provider Records matching to at most **`MatchLimit = 64`** distinct `HASH2`. Magic number explanation in [k-anonymity](#k-anonymity). 
From 103ebcd43ef1607202c70598a9f49fcdc29b2b15 Mon Sep 17 00:00:00 2001 From: Guillaume Michel - guissou Date: Tue, 28 Feb 2023 13:54:25 +0100 Subject: [PATCH 45/55] Update IPIP/0373-double-hash-dht.md Co-authored-by: Jorropo --- IPIP/0373-double-hash-dht.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/IPIP/0373-double-hash-dht.md b/IPIP/0373-double-hash-dht.md index 081b0a79..f6b36dc0 100644 --- a/IPIP/0373-double-hash-dht.md +++ b/IPIP/0373-double-hash-dht.md @@ -46,7 +46,7 @@ The changes described in this document introduce a DHT privacy upgrade boosting - **`CID`** is the IPFS [Content IDentifier](https://github.com/multiformats/cid) - **`MH`** is the [Multihash](https://github.com/multiformats/multihash) contained in a `CID`. It corresponds to the digest of a hash function over some content. `MH` is represented as a 32-byte array. -- **`HASH2`** is defined as `SHA256(bytes("CR_DOUBLEHASH") || MH)`. It represents the location of the Kademlia keyspace for the Provider Record associated with `CID`. `HASH2` is represented as a 32-byte array. `HASH2 = SHA256(bytes("CR_DOUBLEHASH") || MH)`. +- **`HASH2`** is defined as `SHA256(SALT_DOUBLEHASH || MH)`. It represents the location of the Kademlia keyspace for the Provider Record associated with `CID`. `HASH2` is represented as a 32-byte array. `HASH2 = SHA256(SALT_DOUBLEHASH || MH)`. - **Content Provider** is the node storing some content, and advertising it to the DHT. - **DHT Servers** are nodes running the IPFS public DHT. In this documents, DHT Servers mostly refer to the DHT Servers storing the Provider Records associated with specific `CID`s, and not the DHT Servers helping routing lookup requests to the right keyspace location. - **Client** is an IPFS client looking up a content identified by an already known `CID`. 
From 848c2a9aabc43f4ca02bd27155545534a6eb400f Mon Sep 17 00:00:00 2001 From: Guillaume Michel - guissou Date: Tue, 28 Feb 2023 13:54:37 +0100 Subject: [PATCH 46/55] Update IPIP/0373-double-hash-dht.md Co-authored-by: Jorropo --- IPIP/0373-double-hash-dht.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/IPIP/0373-double-hash-dht.md b/IPIP/0373-double-hash-dht.md index f6b36dc0..4c81681e 100644 --- a/IPIP/0373-double-hash-dht.md +++ b/IPIP/0373-double-hash-dht.md @@ -102,7 +102,7 @@ sequenceDiagram participant DHT participant Server as DHT Server - Note left of CP: HASH2 = SHA256(bytes("CR_DOUBLEHASH") || MH) + Note left of CP: HASH2 = SHA256(SALT_DOUBLEHASH || MH) CP->>DHT: FIND_PEERS(HASH2) DHT->>CP: [PeerID0, PeerID1, ... PeerID19] From a9caf145cfb93a2d3dc46964c41988349cb137ec Mon Sep 17 00:00:00 2001 From: Guillaume Michel - guissou Date: Tue, 28 Feb 2023 13:54:49 +0100 Subject: [PATCH 47/55] Update IPIP/0373-double-hash-dht.md Co-authored-by: Jorropo --- IPIP/0373-double-hash-dht.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/IPIP/0373-double-hash-dht.md b/IPIP/0373-double-hash-dht.md index 4c81681e..179a6676 100644 --- a/IPIP/0373-double-hash-dht.md +++ b/IPIP/0373-double-hash-dht.md @@ -124,7 +124,7 @@ sequenceDiagram ``` **Lookup Process** -1. Client computes `HASH2 = SHA256(bytes("CR_DOUBLEHASH") || MH)` (`MH` is the MultiHash included in the CID). +1. Client computes `HASH2 = SHA256(SALT_DOUBLEHASH || MH)` (`MH` is the MultiHash included in the CID). 2. Client selects a prefix of `HASH2`, `KeyPrefix = HASH2[:l]` for a defined `l` (see [`l` selection](#prefix-length-selection)). 2. Client finds the closest `PeerID`s to `HASH2` in XOR distance in its Routing Table. 3. Client sends a DHT lookup request for `KeyPrefix` to these DHT servers. The request contains a flag to specify whether Client wants the `multiaddrs` associated with the `CPPeerID` or not. 
From 436d066b639e642aa66b51b6741c6cb31dfd45b6 Mon Sep 17 00:00:00 2001 From: Guillaume Michel - guissou Date: Tue, 28 Feb 2023 13:54:59 +0100 Subject: [PATCH 48/55] Update IPIP/0373-double-hash-dht.md Co-authored-by: Jorropo --- IPIP/0373-double-hash-dht.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/IPIP/0373-double-hash-dht.md b/IPIP/0373-double-hash-dht.md index 179a6676..312d5783 100644 --- a/IPIP/0373-double-hash-dht.md +++ b/IPIP/0373-double-hash-dht.md @@ -109,7 +109,7 @@ sequenceDiagram Note left of CP: EncPeerID = 0x8040 || Nonce || payload_len || AESGCM(MH, Nonce, CPPeerID) Note left of CP: Signature = Sign(privkey, EncPeerID || TS) - Note left of CP: ServerKey = SHA256(bytes("CR_SERVERKEY") || MH) + Note left of CP: ServerKey = SHA256(SALT_SERVERKEY || MH) par Content Provider to the 20 closest DHT Servers to HASH2 CP->>Server: HASH2 || EncPeerID || TS || Signature || ServerKey From 11525c508b1657034e20b520f10bc5674ee74bf5 Mon Sep 17 00:00:00 2001 From: Guillaume Michel - guissou Date: Tue, 28 Feb 2023 13:55:28 +0100 Subject: [PATCH 49/55] Update IPIP/0373-double-hash-dht.md Co-authored-by: Jorropo --- IPIP/0373-double-hash-dht.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/IPIP/0373-double-hash-dht.md b/IPIP/0373-double-hash-dht.md index 312d5783..14525ff0 100644 --- a/IPIP/0373-double-hash-dht.md +++ b/IPIP/0373-double-hash-dht.md @@ -84,7 +84,7 @@ The following process describes the event of a client looking up a CID in the IP **Publish Process** 1. Content Provider wants to publish some content with identifier `CID`. -2. Content Provider computes `HASH2 = SHA256(bytes("CR_DOUBLEHASH") || MH)` (`MH` is the MultiHash included in the CID). +2. Content Provider computes `HASH2 = SHA256(SALT_DOUBLEHASH || MH)` (`MH` is the MultiHash included in the CID). 3. Content Provider starts a DHT lookup request for the 20 closest `PeerID`s in XOR distance to `HASH2`. 4. 
Content Provider encrypts its own `PeerID` (`CPPeerID`) with `MH`, using AES-GCM. `EncPeerID = [0x8040, Nonce, payload_len, AESGCM(MH, Nonce, CPPeerID)]` 5. Content Provider takes the current timestamp `TS`. From 37aa63daddfff5724ae65d5a42aa6f88cb1d9f70 Mon Sep 17 00:00:00 2001 From: Guillaume Michel - guissou Date: Tue, 28 Feb 2023 13:55:40 +0100 Subject: [PATCH 50/55] Update IPIP/0373-double-hash-dht.md Co-authored-by: Jorropo --- IPIP/0373-double-hash-dht.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/IPIP/0373-double-hash-dht.md b/IPIP/0373-double-hash-dht.md index 14525ff0..4f3d8e19 100644 --- a/IPIP/0373-double-hash-dht.md +++ b/IPIP/0373-double-hash-dht.md @@ -89,7 +89,7 @@ The following process describes the event of a client looking up a CID in the IP 4. Content Provider encrypts its own `PeerID` (`CPPeerID`) with `MH`, using AES-GCM. `EncPeerID = [0x8040, Nonce, payload_len, AESGCM(MH, Nonce, CPPeerID)]` 5. Content Provider takes the current timestamp `TS`. 6. Content Provider signs `EncPeerID` and `TS` using its private key. `Signature = Sign(privkey, EncPeerID || TS)` -7. Content Provider computes `ServerKey = SHA256(bytes("CR_SERVERKEY") || MH)`. +7. Content Provider computes `ServerKey = SHA256(SALT_SERVERKEY || MH)`. 8. Once the lookup request has returned the 20 closest peers, Content Provider sends a Publish request to these DHT servers. The Publish request contains [`HASH2`, `EncPeerID`, `TS`, `Signature`, `ServerKey`]. 9. Each DHT server verifies `Signature` against the `PeerID` of the Content Provider used to open the libp2p connection. `Verify(CPPeerID, Signature, EncPeerID || TS)`. It verifies that `TS` is younger than `48h` and isn't in the future. If invalid, send an error to the client. 10. Each DHT server adds an entry in their Provider Store for `HASH2` -> `ServerKey` -> `CPPeerID` -> [`EncPeerID`, `TS`, `Signature`], with `CPPeerID` being the `PeerID` of the Content Provider (see [provider store](#provider-store)). 
If there is already an entry including `CPPeerID` for `HASH2` -> `ServerKey`, and if the `TS` of the new valid entry is newer than the existing `TS`, overwrite the entry in the Provider Store. Else drop the new entry. From 67a3dfd8f5b1964a34f310ce438720faa74d6a16 Mon Sep 17 00:00:00 2001 From: Guillaume Michel - guissou Date: Tue, 28 Feb 2023 13:55:55 +0100 Subject: [PATCH 51/55] Update IPIP/0373-double-hash-dht.md Co-authored-by: Jorropo --- IPIP/0373-double-hash-dht.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/IPIP/0373-double-hash-dht.md b/IPIP/0373-double-hash-dht.md index 4f3d8e19..71d2be61 100644 --- a/IPIP/0373-double-hash-dht.md +++ b/IPIP/0373-double-hash-dht.md @@ -55,7 +55,7 @@ The changes described in this document introduce a DHT privacy upgrade boosting - **`PeerID`** s define stable [peer identities](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md). The `PeerID` is derived from the node's cryptographic public key. - **`multiaddrs`** are the [network addresses](https://github.com/libp2p/specs/tree/master/addressing) associated with a `PeerID`. It represents the location(s) of the peer. - **`KeyPrefix`** is defined as a prefix of length `l` bits of `HASH2`. `KeyPrefix` is represented by a `byte` concatenated with a variable sized array of bytes, containing at most 32 bytes. The leading `byte` represents the binary representation of `l - 1`, making prefixes of length `256` possible, but not prefixes of length `0`. The trailing byte array is of length `ceil(l/8)` bytes, and its content is the bits prefix right padded with zeros. -- **`ServerKey`** is defined as `SHA256(bytes("CR_SERVERKEY") || MH)`. It is derived from the `MH`. The Content Provider communicates `ServerKey` to the DHT Servers during the Publish Process. The DHT Servers use `ServerKey` to encrypt `TS`, `Signature` and Content Providers `multiaddrs` sent to the Client when some Provider Records match the requested `Prefix`. 
`ServerKey` is represented as a 32-byte array. +- **`ServerKey`** is defined as `SHA256(SALT_SERVERKEY || MH)`. It is derived from the `MH`. The Content Provider communicates `ServerKey` to the DHT Servers during the Publish Process. The DHT Servers use `ServerKey` to encrypt `TS`, `Signature` and Content Providers `multiaddrs` sent to the Client when some Provider Records match the requested `Prefix`. `ServerKey` is represented as a 32-byte array. - **`TS`** is the [Unix Timestamp](https://en.wikipedia.org/wiki/Unix_time) corresponding content publish time. `TS` is represented as a 32-bit **unsigned** Integer, allowing timestamps to range from `1970-01-01T00:00:00Z` to `2106-02-07T06:28:15Z` before reaching the overflow. - **`CPPeerID`** is the `PeerID` of the Content Provider for a specific `CID`. - **`EncPeerID`** is the result of the encryption of `CPPeerID` using `MH` as encryption key and a random nonce `AESGCM(MH, Nonce, CPPeerID)`. `EncPeerID` contains the [varint](https://github.com/multiformats/multicodec/pull/314) of the encryption algorithm used (AES), the bytes array of the encrypted payload, and the `Nonce`. `Nonce` is a randomly generated 12-byte array. The format of `EncPeerID` is [`0x8040`, `Nonce`, `payload_len`, `AESGCM(MH, Nonce, CPPeerID)`]. 
From 5325d553bcde342e9943bdb1d203a50331a15464 Mon Sep 17 00:00:00 2001 From: Guillaume Michel - guissou Date: Tue, 28 Feb 2023 13:56:06 +0100 Subject: [PATCH 52/55] Update IPIP/0373-double-hash-dht.md Co-authored-by: Jorropo --- IPIP/0373-double-hash-dht.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/IPIP/0373-double-hash-dht.md b/IPIP/0373-double-hash-dht.md index 71d2be61..8d2913f2 100644 --- a/IPIP/0373-double-hash-dht.md +++ b/IPIP/0373-double-hash-dht.md @@ -150,8 +150,8 @@ sequenceDiagram participant Server as DHT Server participant CP as Content Provider - Note left of Client: HASH2 = SHA256(bytes("CR_DOUBLEHASH") || MH) - Note left of Client: ServerKey = SHA256(bytes("CR_SERVERKEY") || MH) + Note left of Client: HASH2 = SHA256(SALT_DOUBLEHASH || MH) + Note left of Client: ServerKey = SHA256(SALT_SERVERKEY || MH) loop in parallel until valid Provider Record found Note left of Client: KeyPrefix = HASH2[:l] From 0da90106a2ac73045f1b6d033ac8a6f486e243e0 Mon Sep 17 00:00:00 2001 From: Guillaume Michel - guissou Date: Tue, 28 Feb 2023 13:56:22 +0100 Subject: [PATCH 53/55] Update IPIP/0373-double-hash-dht.md Co-authored-by: Jorropo --- IPIP/0373-double-hash-dht.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/IPIP/0373-double-hash-dht.md b/IPIP/0373-double-hash-dht.md index 8d2913f2..9d6f6c79 100644 --- a/IPIP/0373-double-hash-dht.md +++ b/IPIP/0373-double-hash-dht.md @@ -132,7 +132,7 @@ sequenceDiagram 5. The DHT servers search if there are entries matching `KeyPrefix` in their Provider Store. 6. 
For all entries `HASH2` of the Provider Store where `HASH2[:len(KeyPrefix)]==KeyPrefix`, add to `message` the following encrypted payload: `EncPeerID || 0x8040 || SERVERNONCE || payload_len || AESGCM(ServerKey, SERVERNONCE, TS || Signature || multiaddrs)`, `SERVERNONCE` being a randomly generated 12-byte array, for `multiaddrs` being the multiaddresses associated with `CPPeerID` (if applicable) if the `multiaddrs` were requested by Client. The `multiaddrs` are taken from the DHT Server's lib2p2 peerstore, and may be stale. If more than `MatchLimit` distinct `HASH2`s match the requested `KeyPrefix`, the DHT Server doesn't return any Provider Record, and adds the number of `HASH2` matching `KeyPrefix` along with its own `MatchLimit` to `message`. 7. The DHT servers send `message` to Client. -8. Client computes `ServerKey = SHA256(bytes("CR_SERVERKEY") || MH)`. +8. Client computes `ServerKey = SHA256(SALT_SERVERKEY || MH)`. 9. Client tries to decrypt all returned encrypted payloads using `MH` for `EncPeerID` and `ServerKey` for `Enc(ServerKey, TS || Signature || multiaddrs)`. If at least one encrypted payload can be decrypted, go to 12. 10. If the DHT Server's `MatchLimit` and number of matching `HASH2`s was included in the `message`, Client makes multiple DHT lookup requests for longer prefixes (e.g `KeyPrefix||0` and `KeyPrefix||1`). Else Client sends a DHT lookup request for `KeyPrefix` to the closest peers in XOR distance to `HASH2` that it received from the DHT servers. 11. Go to 4. 
From 08aafde168be036839c6ec788dace2055d3f6e5f Mon Sep 17 00:00:00 2001 From: guillaumemichel Date: Wed, 1 Mar 2023 08:49:56 +0100 Subject: [PATCH 54/55] modifying enum numbers in algorithm --- IPIP/0373-double-hash-dht.md | 36 +++++++++++++++++------------------- 1 file changed, 17 insertions(+), 19 deletions(-) diff --git a/IPIP/0373-double-hash-dht.md b/IPIP/0373-double-hash-dht.md index 3c60bad2..0e0520f0 100644 --- a/IPIP/0373-double-hash-dht.md +++ b/IPIP/0373-double-hash-dht.md @@ -77,8 +77,6 @@ The following process describes the event of a client looking up a CID in the IP 9. Go to 4. 10. The DHT servers storing the Provider Record(s) associated with `MH` send them to Client. (Currently, if a Provider Record has been published less than 30 min before being requested, the DHT servers also send the `multiaddresses` of the Content Provider to Client). 11. If the response from the DHT server doesn't include the `multiaddrs` associated with the Content Providers' `PeerID`s, Client performs a DHT `FindPeer` request to find the `multiaddrs` of the returned `PeerID`s. -12. Client sends a Bitswap request for `CID` to the Content Provider (known `PeerID` and `multiaddrs`). -13. Content Provider sends the requested content back to Client. ### Double Hash DHT design @@ -126,23 +124,23 @@ sequenceDiagram **Lookup Process** 1. Client computes `HASH2 = SHA256(bytes("CR_DOUBLEHASH") || MH)` (`MH` is the MultiHash included in the CID). 2. Client selects a prefix of `HASH2`, `KeyPrefix = HASH2[:l]` for a defined `l` (see [`l` selection](#prefix-length-selection)). -2. Client finds the closest `PeerID`s to `HASH2` in XOR distance in its Routing Table. -3. Client sends a DHT lookup request for `KeyPrefix` to these DHT servers. The request contains a flag to specify whether Client wants the `multiaddrs` associated with the `CPPeerID` or not. -4. The DHT servers find the 20 closest `PeerID`s to `KeyPrefix` in XOR distance (see [algorithm](#closest-keys-to-a-key-prefix)). 
Add these `PeerID`s and their associated multiaddresses (if applicable) to the `message` that will be returned to Client. -5. The DHT servers search if there are entries matching `KeyPrefix` in their Provider Store. -6. For all entries `HASH2` of the Provider Store where `HASH2[:len(KeyPrefix)]==KeyPrefix`, add to `message` the following encrypted payload: `EncPeerID || 0x8040 || SERVERNONCE || payload_len || AESGCM(ServerKey, SERVERNONCE, TS || Signature || multiaddrs)`, `SERVERNONCE` being a randomly generated 12-byte array, for `multiaddrs` being the multiaddresses associated with `CPPeerID` (if applicable) if the `multiaddrs` were requested by Client. The `multiaddrs` are taken from the DHT Server's lib2p2 peerstore, and may be stale. If more than `MatchLimit` distinct `HASH2`s match the requested `KeyPrefix`, the DHT Server doesn't return any Provider Record, and adds the number of `HASH2` matching `KeyPrefix` along with its own `MatchLimit` to `message`. -7. The DHT servers send `message` to Client. -8. Client computes `ServerKey = SHA256(bytes("CR_SERVERKEY") || MH)`. -9. Client tries to decrypt all returned encrypted payloads using `MH` for `EncPeerID` and `ServerKey` for `Enc(ServerKey, TS || Signature || multiaddrs)`. If at least one encrypted payload can be decrypted, go to 12. -10. If the DHT Server's `MatchLimit` and number of matching `HASH2`s was included in the `message`, Client makes multiple DHT lookup requests for longer prefixes (e.g `KeyPrefix||0` and `KeyPrefix||1`). Else Client sends a DHT lookup request for `KeyPrefix` to the closest peers in XOR distance to `HASH2` that it received from the DHT servers. -11. Go to 4. -12. For each decrypted payload, Client decrypts `CPPeerID = Dec(MH, EncPeerID)`. -13. Client verifies that `Signature` verifies with `CPPeerID`: `Verify(CPPeerID, Signature, EncPeerID || TS)`. -14. Client checks that `TS` is younger than `48h`. -15. If none of the decrypted payloads is valid, go to 4. -16. 
If the decrypted payload doesn't include the `multiaddrs` associated with `CPPeerID`, Client performs a DHT `FindPeer` request to find the `multiaddrs` associated with `CPPeerID`. -17. Client sends a Bitswap request for `CID` to the Content Provider (known `CPPeerID` and `multiaddrs`). -18. Content Provider sends the requested content back to Client. +3. Client finds the closest `PeerID`s to `HASH2` in XOR distance in its Routing Table. +4. Client sends a DHT lookup request for `KeyPrefix` to these DHT servers. The request contains a flag to specify whether Client wants the `multiaddrs` associated with the `CPPeerID` or not. +5. The DHT servers find the 20 closest `PeerID`s to `KeyPrefix` in XOR distance (see [algorithm](#closest-keys-to-a-key-prefix)). Add these `PeerID`s and their associated multiaddresses (if applicable) to the `message` that will be returned to Client. +6. The DHT servers search if there are entries matching `KeyPrefix` in their Provider Store. +7. For all entries `HASH2` of the Provider Store where `HASH2[:len(KeyPrefix)]==KeyPrefix`, add to `message` the following encrypted payload: `EncPeerID || 0x8040 || SERVERNONCE || payload_len || AESGCM(ServerKey, SERVERNONCE, TS || Signature || multiaddrs)`, `SERVERNONCE` being a randomly generated 12-byte array, for `multiaddrs` being the multiaddresses associated with `CPPeerID` (if applicable) if the `multiaddrs` were requested by Client. The `multiaddrs` are taken from the DHT Server's lib2p2 peerstore, and may be stale. If more than `MatchLimit` distinct `HASH2`s match the requested `KeyPrefix`, the DHT Server doesn't return any Provider Record, and adds the number of `HASH2` matching `KeyPrefix` along with its own `MatchLimit` to `message`. +8. The DHT servers send `message` to Client. +9. Client computes `ServerKey = SHA256(bytes("CR_SERVERKEY") || MH)`. +10. 
Client tries to decrypt all returned encrypted payloads using `MH` for `EncPeerID` and `ServerKey` for `Enc(ServerKey, TS || Signature || multiaddrs)`. If at least one encrypted payload can be decrypted, go to 13. +11. If the DHT Server's `MatchLimit` and number of matching `HASH2`s was included in the `message`, Client makes multiple DHT lookup requests for longer prefixes (e.g `KeyPrefix||0` and `KeyPrefix||1`). Else Client sends a DHT lookup request for `KeyPrefix` to the closest peers in XOR distance to `HASH2` that it received from the DHT servers. +12. Go to 5. +13. For each decrypted payload, Client decrypts `CPPeerID = Dec(MH, EncPeerID)`. +14. Client verifies that `Signature` verifies with `CPPeerID`: `Verify(CPPeerID, Signature, EncPeerID || TS)`. +15. Client checks that `TS` is younger than `48h`. +16. If none of the decrypted payloads is valid, go to 5. +17. If the decrypted payload doesn't include the `multiaddrs` associated with `CPPeerID`, Client performs a DHT `FindPeer` request to find the `multiaddrs` associated with `CPPeerID`. +18. Client requests `CID` or another content identifier to the Content Provider (known `CPPeerID` and `multiaddrs`) and can exchange data (the DHT may be consumed by various different protocols). 
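The `KeyPrefix` encoding from the definitions and the server-side match test `HASH2[:len(KeyPrefix)]==KeyPrefix` can be sketched as follows. This is a minimal illustration, not the reference implementation: the helper names `encode_key_prefix` and `matches_prefix` are hypothetical, and the sketch assumes the layout stated in the definitions (one leading byte holding `l - 1`, followed by `ceil(l/8)` bytes containing the bit prefix right-padded with zeros).

```python
def encode_key_prefix(hash2: bytes, l: int) -> bytes:
    """Encode the first l bits of HASH2 as: byte(l - 1) || padded prefix bytes."""
    if len(hash2) != 32:
        raise ValueError("HASH2 must be a 32-byte array")
    if not 1 <= l <= 256:
        raise ValueError("prefix length must be between 1 and 256 bits")
    n_bytes = (l + 7) // 8              # ceil(l / 8)
    body = bytearray(hash2[:n_bytes])
    spare_bits = n_bytes * 8 - l        # bits of the last byte outside the prefix
    if spare_bits:
        # Zero the bits that are not part of the prefix (right padding).
        body[-1] &= (0xFF << spare_bits) & 0xFF
    return bytes([l - 1]) + bytes(body)


def matches_prefix(hash2: bytes, key_prefix: bytes) -> bool:
    """Return True if HASH2 starts with the bits described by key_prefix."""
    if not key_prefix:
        return False
    l = key_prefix[0] + 1               # the leading byte stores l - 1
    return encode_key_prefix(hash2, l) == key_prefix
```

For example, a `HASH2` whose first byte is `0b00110010` and a prefix length `l = 3` give the two-byte encoding `[0x02, 0b00100000]`; any stored `HASH2` starting with the bits `001` then satisfies `matches_prefix`.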
+ ```mermaid sequenceDiagram From 79a915fe252dd8ddd5a8c6d923dc00a8c47ccd0a Mon Sep 17 00:00:00 2001 From: guillaumemichel Date: Wed, 1 Mar 2023 18:49:45 +0100 Subject: [PATCH 55/55] data format update --- IPIP/0373-double-hash-dht.md | 197 +++++++++++++++++++++-------------- 1 file changed, 117 insertions(+), 80 deletions(-) diff --git a/IPIP/0373-double-hash-dht.md b/IPIP/0373-double-hash-dht.md index 73ef4174..5fc6033d 100644 --- a/IPIP/0373-double-hash-dht.md +++ b/IPIP/0373-double-hash-dht.md @@ -35,34 +35,103 @@ The changes described in this document introduce a DHT privacy upgrade boosting ## Detailed Design **Magic Values** -- `SALT_DOUBLEHASH = bytes("CR_DOUBLEHASH\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00")` -- `SALT_SERVERKEY = bytes("CR_SERVERKEY\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00")` -- AES-GCM [varint](https://github.com/multiformats/multicodec/pull/314): `aes-gcm-256 = 0x8040` -- Double SHA256 [varint](https://github.com/multiformats/multicodec/blob/master/table.csv#L41): `dbl-sha2-256 = 0x5601` + - A DHT Server returns all of the Provider Records matching to at most **`MatchLimit = 64`** distinct `HASH2`. Magic number explanation in [k-anonymity](#k-anonymity). 
- Provider Record Timestamp (`TS`) validity period: `48h` +- AES-GCM [varint](https://github.com/multiformats/multicodec/pull/314): `aes-gcm-256 = 0x8040` +- DHT Provide Format v0 [varint](https://github.com/multiformats/multicodec) (TBD): `dht-provide-format-v0 = 0x????` +- DHT Provider Record Format v0 [varint](https://github.com/multiformats/multicodec) (TBD): `dht-pr-format-v0 = 0x????` +- DHT Prefix Lookup Response Format v0 [varint](https://github.com/multiformats/multicodec) (TBD): `dht-prefix-lookup-response-format-v0 = 0x????` + + +All salts below are 64-bytes long, and represent a string padded with `\x00`. +- `SALT_DOUBLEHASH = bytes("CR_DOUBLEHASH\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00")` +- `SALT_ENCRYPTIONKEY = bytes("CR_ENCRYPTIONKEY\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00")` +- `SALT_SERVERKEY = bytes("CR_SERVERKEY\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00")` + ### Definitions - **`CID`** is the IPFS [Content IDentifier](https://github.com/multiformats/cid) - **`MH`** is the [Multihash](https://github.com/multiformats/multihash) contained in a `CID`. It corresponds to the digest of a hash function over some content. `MH` is represented as a 32-byte array. -- **`HASH2`** is defined as `SHA256(SALT_DOUBLEHASH || MH)`. It represents the location of the Kademlia keyspace for the Provider Record associated with `CID`. `HASH2` is represented as a 32-byte array. `HASH2 = SHA256(SALT_DOUBLEHASH || MH)`. 
+- **`HASH2`** represents the location of the Kademlia keyspace for the Provider Record associated with `CID`. `HASH2` is represented as a 32-byte array. `HASH2 = SHA256(SALT_DOUBLEHASH || MH)`. - **Content Provider** is the node storing some content, and advertising it to the DHT. - **DHT Servers** are nodes running the IPFS public DHT. In this documents, DHT Servers mostly refer to the DHT Servers storing the Provider Records associated with specific `CID`s, and not the DHT Servers helping routing lookup requests to the right keyspace location. -- **Client** is an IPFS client looking up a content identified by an already known `CID`. +- **Client** is an IPFS client looking up a content identified by a known `CID`. - **Publish Process** is the process of the Content Provider communicating to the DHT Servers that it provides some content identified by `CID`. - **Lookup Process** is the process of the Client retrieving the content identified by `CID`. - **`PeerID`** s define stable [peer identities](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md). The `PeerID` is derived from the node's cryptographic public key. - **`multiaddrs`** are the [network addresses](https://github.com/libp2p/specs/tree/master/addressing) associated with a `PeerID`. It represents the location(s) of the peer. - **`KeyPrefix`** is defined as a prefix of length `l` bits of `HASH2`. `KeyPrefix` is represented by a `byte` concatenated with a variable sized array of bytes, containing at most 32 bytes. The leading `byte` represents the binary representation of `l - 1`, making prefixes of length `256` possible, but not prefixes of length `0`. The trailing byte array is of length `ceil(l/8)` bytes, and its content is the bits prefix right padded with zeros. -- **`ServerKey`** is defined as `SHA256(SALT_SERVERKEY || MH)`. It is derived from the `MH`. The Content Provider communicates `ServerKey` to the DHT Servers during the Publish Process. 
The DHT Servers use `ServerKey` to encrypt `TS`, `Signature` and Content Providers `multiaddrs` sent to the Client when some Provider Records match the requested `Prefix`. `ServerKey` is represented as a 32-byte array. -- **`TS`** is the [Unix Timestamp](https://en.wikipedia.org/wiki/Unix_time) corresponding content publish time. `TS` is represented as a 32-bit **unsigned** Integer, allowing timestamps to range from `1970-01-01T00:00:00Z` to `2106-02-07T06:28:15Z` before reaching the overflow. +- **`EncryptionKey`** is defined as `SHA256(SALT_ENCRYPTIONKEY || MH)`. It is derived from `MH` and is represented as a 32-byte array. `EncryptionKey` is used by the Content Provider to produce `EncPeerID`, making sure only a Client with the knowledge of `MH` can decrypt it. +- **`ServerKey`** is defined as `SHA256(SALT_SERVERKEY || MH)`. It is derived from `MH` and is represented as a 32-byte array. The Content Provider communicates `ServerKey` to the DHT Servers during the Publish Process. The DHT Servers use `ServerKey` to encrypt `Signature` and the Content Provider's `multiaddrs` sent to the Client when some Provider Records match the requested `KeyPrefix`. +- **`TS`** is a 32-bit unsigned Integer Timestamp representing the number of minutes elapsed since `1970-01-01T00:00:00Z`, allowing timestamps to be used until `10136-02-16T04:16:00Z`. `TS` is determined by the Content Provider when (re)publishing a Provider Record. Note that `TS` is embedded in `Nonce`. +- **`Nonce`** is a 12-byte Nonce used as Initialization Vector (IV) for the AES-GCM encryption. `Nonce` is composed of `TS` concatenated with `8` random bytes. `Nonce` **must** be unique for each AES-GCM encryption performed with `EncryptionKey` for all Content Providers. +- **`ServerNonce`** is a 12-byte nonce generated by the DHT Server, used when encrypting with `ServerKey`. The first 32 bits represent the time (in minutes) at which the nonce was generated, and the 8 following bytes are determined at random.
A timestamp isn't _needed_ here, but it prevents repeating nonces. - **`CPPeerID`** is the `PeerID` of the Content Provider for a specific `CID`. -- **`EncPeerID`** is the result of the encryption of `CPPeerID` using `MH` as encryption key and a random nonce `AESGCM(MH, Nonce, CPPeerID)`. `EncPeerID` contains the [varint](https://github.com/multiformats/multicodec/pull/314) of the encryption algorithm used (AES), the bytes array of the encrypted payload, and the `Nonce`. `Nonce` is a randomly generated 12-byte array. The format of `EncPeerID` is [`0x8040`, `Nonce`, `payload_len`, `AESGCM(MH, Nonce, CPPeerID)`]. -- **`Signature`** is the signature of the `EncPeerID` encrypted payload (not including the varint nor the nonce) and `TS` using the Content Provider's private key, either with ed25519 or rsa signature algorithm, depending on the keys of the Content Provider. +- **`EncPeerIDPayload`** is the result of the AES-GCM encryption of `CPPeerID` using `EncryptionKey` as encryption key and `Nonce` as IV. `EncPeerIDPayload = AESGCM(EncryptionKey, Nonce, CPPeerID)`. +- **`EncPeerID`** contains all data associated with `EncPeerIDPayload`. `EncPeerID` is composed of the [varint](https://github.com/multiformats/multicodec/pull/314) of the encryption algorithm used (AES-GCM), the length of `EncPeerIDPayload`, `Nonce` and `EncPeerIDPayload`. The format of `EncPeerID` is [`0x8040`, `payload_len`, `Nonce`, `EncPeerIDPayload`]. +- **`Signature`** is the signature of `EncPeerIDPayload` and `TS` using the Content Provider's private key, using the Signature algorithm defined by the Content Provider's [PeerID key type](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md#key-types). +- **`EncMetadataPayload`** is the result of the AES-GCM encryption of `Signature || multiaddrs` using the `ServerKey` and the `ServerNonce` performed by the DHT Server.
An `EncMetadataPayload` is always associated with an `EncPeerID`, and encrypts the `Signature` associated with the `EncPeerID` and the `multiaddrs` (taken from the DHT Server libp2p Peerstore) associated with the `CPPeerID` that published `EncPeerID` to the DHT Server. +- **`EncMetadata`** contains all data associated with `EncMetadataPayload`. `EncMetadata` is composed of the length of `EncMetadataPayload`, the `ServerNonce` and `EncMetadataPayload`, and has the following format `EncMetadata = payload_len || ServerNonce || EncMetadataPayload`. This format is defined by `dht-pr-format-v0`. +- **`PRShortIdentifier`** is a short identifier that uniquely and minimally identifies one `HASH2` among the multiple `HASH2`s matching a `KeyPrefix`. Its format is defined [below](#wire-formats). - **Provider Record** is defined as a pointer to the storage location of some content identified by `CID` or `HASH2`. A Provider Record consists on the following fields: [`EncPeerID`, `TS`, `Signature`]. -- **Provider Store** is the data structure on the DHT Servers used to store the Provider Records. Its structure is a nested dictionary/map: `HASH2` -> `ServerKey` -> [`CPPeerID`, `EncPeerID`, `TS`, `Signature`]. There is only one single correct `ServerKey` for each `HASH2`. However, any peer can forge a valid Publish request (with invalid `EncPeerID` but valid `Signature`) undetected by the DHT Server. The DHT server isn't able to distinguish which `ServerKey` is correct as it doesn't have the knowledge of `MH`, hence it has to keep both and serve both upon request for `HASH2`. +- **Provider Store** is the data structure on the DHT Servers used to store the Provider Records. Its structure is a [nested key-value store](#provider-store): `HASH2` -> `ServerKey` -> `CPPeerID` -> Provider Record.
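The salted hashes and the nonce layout defined above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the helper names (`make_salt`, `salted_sha256`, `derive_values`, `make_nonce`) are hypothetical, the salts are assumed to be the ASCII label right-padded with `\x00` to 64 bytes as the Magic Values section states, and the big-endian byte order of `TS` inside `Nonce` is an assumption, since the text does not specify it.

```python
import hashlib
import os
import struct
import time

SALT_LEN = 64  # the Magic Values section states that every salt is 64 bytes long


def make_salt(label: str) -> bytes:
    """Right-pad an ASCII label with zero bytes up to SALT_LEN."""
    raw = label.encode("ascii")
    if len(raw) > SALT_LEN:
        raise ValueError("label is longer than the salt length")
    return raw.ljust(SALT_LEN, b"\x00")


SALT_DOUBLEHASH = make_salt("CR_DOUBLEHASH")
SALT_ENCRYPTIONKEY = make_salt("CR_ENCRYPTIONKEY")
SALT_SERVERKEY = make_salt("CR_SERVERKEY")


def salted_sha256(salt: bytes, mh: bytes) -> bytes:
    """Compute SHA256(salt || MH) and return the 32-byte digest."""
    return hashlib.sha256(salt + mh).digest()


def derive_values(mh: bytes) -> dict:
    """Derive HASH2, EncryptionKey and ServerKey from the same MH."""
    return {
        "HASH2": salted_sha256(SALT_DOUBLEHASH, mh),
        "EncryptionKey": salted_sha256(SALT_ENCRYPTIONKEY, mh),
        "ServerKey": salted_sha256(SALT_SERVERKEY, mh),
    }


def make_nonce(now_seconds=None) -> bytes:
    """Build a 12-byte nonce: 32-bit TS in minutes followed by 8 random bytes."""
    if now_seconds is None:
        now_seconds = time.time()
    ts_minutes = int(now_seconds) // 60
    # Big-endian encoding of TS is an assumption made for this sketch.
    return struct.pack(">I", ts_minutes) + os.urandom(8)
```

Because each value uses a different salt, knowing `HASH2` or `ServerKey` does not reveal `EncryptionKey`; all three can only be recomputed by a party that already knows `MH`.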
+ +### Wire formats + +**Content Provider Publishes to DHT Server** +``` +[dht-provide-format-v0, HASH2, EncPeerID, Signature, ServerKey] +``` + + +**`PRShortIdentifier`** + +By design multiple `HASH2` match a `KeyPrefix`, and the DHT Server returns all Provider Records whose `HASH2` matches `KeyPrefix`. The Provider Records must be identifiable by the Client, so that it doesn't need to decrypt all Provider Records before finding the one it is looking for. However, transmitting all 32-byte `HASH2`s would be too expensive. Instead, for each `HASH2` the DHT Server only sends a short prefix (`ShortIdentifier`) uniquely identifying `HASH2` for each `HASH2` matching `KeyPrefix`. Each `HASH2` matching `KeyPrefix` has the following format: `KeyPrefix || ShortIdentifier || Suffix`. `ShortIdentifier` is the shortest bitstring uniquely identifying `HASH2` among a group of `HASH2` all matching a given `KeyPrefix`. +For instance, Client is making a lookup request for `KeyPrefix=001`, and the DHT Server is storing Provider Records for 3 distinct `HASH2`s matching `KeyPrefix`: `00101111`, `00110010` and `00110111`. The `ShortIdentifier`s will respectively be `0`, `100` and `101`. A Client looking for `HASH2 = 00110010` only tries to decrypt the Provider Records associated with the only `ShortIdentifier` matching `HASH2`. + +These bitstrings `ShortIdentifier`s are encoded to unsigned varints as described in [`py-binary-trie`](https://github.com/guillaumemichel/py-binary-trie/#encoding). `PRShortIdentifier` is the unsigned varint encoding of `ShortIdentifier`. + +**Provider Record** +``` +[dht-pr-format-v0, EncPeerID, payload_len, ServerNonce, EncMetadata] +``` + +**DHT Server response to Client Prefix Lookup** +``` +[ + dht-prefix-lookup-response-format-v0, + flags, (+potential_dependencies,) + #closest peers to KeyPrefix, + PeerID0, + #multiaddrs of PeerID0, + PeerID0multiaddr0, + PeerID0multiaddr1, + ... + PeerID0multiaddrN, + PeerID1, + ... 
+ PRShortIdentifier0, + #PR, + PR0, + PR1, + ..., + PRN, + PRShortIdentifier1, + ... +] +``` +- `flags` is a byte with capacity for 8 binary flags. The left-most bit is reserved for the `MatchLimit` flag. When set to `1`, the DHT Server doesn't send any Provider Record, and adds the unsigned varint of the number of `HASH2` matching `KeyPrefix` and the unsigned varint of its `MatchLimit` variable to communicate it to Client. When the left-most bit is set to `0`, no extra information is added after the `flags` byte. +- '`#`' represents numbers, encoded in [unsigned varint](https://github.com/multiformats/unsigned-varint) format. +- `#closest peers to KeyPrefix` is the number of closest peers to `KeyPrefix` that the DHT Server is returning. We expect this number always to be equal to `20`, unless a peer has fewer than `20` peers in its routing table. +- For each of the closest peers, we include the `PeerID` and its `multiaddrs`. As peers can have multiple `multiaddrs`, it is important to add the number of `multiaddrs` (`#multiaddrs`) for each peer. +- Each `PRShortIdentifier` can be associated with multiple Provider Records (`PR`). When listing the Provider Records associated with a `PRShortIdentifier` it is important to indicate the count of associated Provider Records (`#PR`). +- `PR0`, `PR1`, `...` are Provider Records formatted as described above. + + + ### Double Hash DHT design **Publish Process** 1. Content Provider wants to publish some content with identifier `CID`. 2. Content Provider computes `HASH2 = SHA256(SALT_DOUBLEHASH || MH)` (`MH` is the MultiHash included in the CID). -3. Content Provider starts a DHT lookup request for the 20 closest `PeerID`s in XOR distance to `HASH2`. -4. Content Provider encrypts its own `PeerID` (`CPPeerID`) with `MH`, using AES-GCM. `EncPeerID = [0x8040, Nonce, payload_len, AESGCM(MH, Nonce, CPPeerID)]` -5. Content Provider takes the current timestamp `TS`. -6. Content Provider signs `EncPeerID` and `TS` using its private key.
`Signature = Sign(privkey, EncPeerID || TS)` -7. Content Provider computes `ServerKey = SHA256(SALT_SERVERKEY || MH)`. -8. Once the lookup request has returned the 20 closest peers, Content Provider sends a Publish request to these DHT servers. The Publish request contains [`HASH2`, `EncPeerID`, `TS`, `Signature`, `ServerKey`]. -9. Each DHT server verifies `Signature` against the `PeerID` of the Content Provider used to open the libp2p connection. `Verify(CPPeerID, Signature, EncPeerID || TS)`. It verifies that `TS` is younger than `48h` and isn't in the future. If invalid, send an error to the client. -10. Each DHT server adds an entry in their Provider Store for `HASH2` -> `ServerKey` -> `CPPeerID` -> [`EncPeerID`, `TS`, `Signature`], with `CPPeerID` being the `PeerID` of the Content Provider (see [provider store](#provider-store)). If there is already an entry including `CPPeerID` for `HASH2` -> `ServerKey`, and if the `TS` of the new valid entry is newer than the existing `TS`, overwrite the entry in the Provider Store. Else drop the new entry. -11. Each DHT server confirms to Content Provider that the Provider Record has been successfully added. -12. The process is over once Content Provider has received 20 confirmations. - +3. Content Provider starts a DHT `GET_CLOSEST_PEERS(HASH2)` request to find the 20 closest `PeerID`s in XOR distance to `HASH2`. +4. Content Provider computes `EncryptionKey = SHA256(SALT_ENCRYPTIONKEY || MH)`. +5. Content Provider computes `ServerKey = SHA256(SALT_SERVERKEY || MH)`. +6. Content Provider takes the current timestamp `TS`, and generates `Nonce`. +7. Content Provider encrypts its own `PeerID` (`CPPeerID`) with `EncryptionKey`, using AES-GCM. `EncPeerID = [0x8040, payload_len, Nonce, AESGCM(EncryptionKey, Nonce, CPPeerID)]` +8. Content Provider signs `EncPeerID` and `TS` using its private key. `Signature = Sign(privkey, EncPeerID || TS)` +9. 
Once the lookup request has returned the 20 closest peers, Content Provider sends a Publish request to these DHT servers. The Publish request contains [`HASH2`, `EncPeerID`, `Signature`, `ServerKey`]. +10. Each DHT server verifies `Signature` against the `PeerID` of the Content Provider used to open the libp2p connection. `Verify(CPPeerID, Signature, EncPeerID || TS)`. It verifies that `TS` (the first 4 bytes of `Nonce`) is younger than `48h` and isn't in the future. If invalid, go to 12. +11. Each DHT server adds an entry in its Provider Store for `HASH2` -> `ServerKey` -> `CPPeerID` -> [`EncPeerID`, `TS`, `Signature`], with `CPPeerID` being the `PeerID` of the Content Provider (see [provider store](#provider-store)). If there is already an entry including `CPPeerID` for `HASH2` -> `ServerKey`, and if the `TS` of the new valid entry is newer than the existing `TS`, overwrite the entry in the Provider Store. Otherwise, drop the new entry. +12. Each DHT server confirms to Content Provider that the Provider Record has been successfully added, or sends an error message. + +Note: Data formats are simplified here; please refer to the [definitions](#definitions) and [formats](#wire-formats) sections for the exact data formats. ```mermaid sequenceDiagram participant CP as Content Provider participant DHT participant Server as DHT Server - Note left of CP: HASH2 = SHA256(SALT_DOUBLEHASH || MH) + Note left of CP: HASH2 = SHA256(SALT_DOUBLEHASH || MH)
ServerKey = SHA256(SALT_SERVERKEY || MH)
EncryptionKey = SHA256(SALT_ENCRYPTIONKEY || MH) CP->>DHT: FIND_PEERS(HASH2) DHT->>CP: [PeerID0, PeerID1, ... PeerID19] - Note left of CP: EncPeerID = 0x8040 || Nonce || payload_len || AESGCM(MH, Nonce, CPPeerID) + Note left of CP: Record 32-bit Timestamp (TS) in minutes
Nonce = TS || 8 randbytes + Note left of CP: EncPeerID = [0x8040, payload_len, Nonce, AESGCM(EncryptionKey, Nonce, CPPeerID)] Note left of CP: Signature = Sign(privkey, EncPeerID || TS) - Note left of CP: ServerKey = SHA256(SALT_SERVERKEY || MH) par Content Provider to the 20 closest DHT Servers to HASH2 - CP->>Server: HASH2 || EncPeerID || TS || Signature || ServerKey + CP->>Server: [dht-provide-format-v0, HASH2, EncPeerID, Signature, ServerKey] - Note right of Server: Verify(pubkey, Signature, EncPeerID) &&
TS - time.now() < 48h - Note right of Server: On success, add to Provider Store:
HASH2 -> ServerKey -> CPPeerID -> [EncPeerID, TS, Signature] + Note right of Server: Verify(pubkey, Signature, EncPeerID) &&
time.now() - TS < 48h + Note right of Server: On success, add to Provider Store:
HASH2 -> ServerKey -> CPPeerID -> [EncPeerID, TS, Signature] - Server->>CP: Success / Error + Server->>CP: Success / Error end - - Note left of CP: Wait for 20 Successes ``` **Lookup Process** 1. Client computes `HASH2 = SHA256(SALT_DOUBLEHASH || MH)` (`MH` is the MultiHash included in the CID). 2. Client selects a prefix of `HASH2`, `KeyPrefix = HASH2[:l]` for a defined `l` (see [`l` selection](#prefix-length-selection)). -<<<<<<< HEAD 3. Client finds the closest `PeerID`s to `HASH2` in XOR distance in its Routing Table. 4. Client sends a DHT lookup request for `KeyPrefix` to these DHT servers. The request contains a flag to specify whether Client wants the `multiaddrs` associated with the `CPPeerID` or not. -5. The DHT servers find the 20 closest `PeerID`s to `KeyPrefix` in XOR distance (see [algorithm](#closest-keys-to-a-key-prefix)). Add these `PeerID`s and their associated multiaddresses (if applicable) to the `message` that will be returned to Client. -6. The DHT servers search if there are entries matching `KeyPrefix` in their Provider Store. -7. For all entries `HASH2` of the Provider Store where `HASH2[:len(KeyPrefix)]==KeyPrefix`, add to `message` the following encrypted payload: `EncPeerID || 0x8040 || SERVERNONCE || payload_len || AESGCM(ServerKey, SERVERNONCE, TS || Signature || multiaddrs)`, `SERVERNONCE` being a randomly generated 12-byte array, for `multiaddrs` being the multiaddresses associated with `CPPeerID` (if applicable) if the `multiaddrs` were requested by Client. The `multiaddrs` are taken from the DHT Server's lib2p2 peerstore, and may be stale. If more than `MatchLimit` distinct `HASH2`s match the requested `KeyPrefix`, the DHT Server doesn't return any Provider Record, and adds the number of `HASH2` matching `KeyPrefix` along with its own `MatchLimit` to `message`. +5. The DHT servers find the 20 closest `PeerID`s to `KeyPrefix` in XOR distance from their routing table (see [algorithm](#closest-keys-to-a-key-prefix)). 
Add these `PeerID`s and their associated `multiaddrs` to a `message` that will be returned to Client. +6. The DHT servers check whether their Provider Store contains entries whose `HASH2` matches `KeyPrefix`. +7. For every Provider Store entry where `HASH2[:len(KeyPrefix)]==KeyPrefix`, add the associated Provider Records to `message` (see [format](#wire-formats)). If more than `MatchLimit` distinct `HASH2`s match the requested `KeyPrefix`, the DHT Server doesn't return any Provider Record, sets the `MatchLimit` flag to `1`, and adds the number of `HASH2` matching `KeyPrefix` along with its own `MatchLimit` to `message`. 8. The DHT servers send `message` to Client. -9. Client computes `ServerKey = SHA256(SALT_SERVERKEY || MH)`. -10. Client tries to decrypt all returned encrypted payloads using `MH` for `EncPeerID` and `ServerKey` for `Enc(ServerKey, TS || Signature || multiaddrs)`. If at least one encrypted payload can be decrypted, go to 13. -11. If the DHT Server's `MatchLimit` and number of matching `HASH2`s was included in the `message`, Client makes multiple DHT lookup requests for longer prefixes (e.g `KeyPrefix||0` and `KeyPrefix||1`). Else Client sends a DHT lookup request for `KeyPrefix` to the closest peers in XOR distance to `HASH2` that it received from the DHT servers. -12. Go to 5. -13. For each decrypted payload, Client decrypts `CPPeerID = Dec(MH, EncPeerID)`. +9. Client computes `ServerKey = SHA256(SALT_SERVERKEY || MH)` and `EncryptionKey = SHA256(SALT_ENCRYPTIONKEY || MH)`. +10. If the `MatchLimit` flag is set to `1`, Client makes multiple DHT lookup requests for longer prefixes (e.g. `KeyPrefix||0` and `KeyPrefix||1`). Go to 5. +11. Client tries to decrypt all `EncPeerID`s whose `PRShortIdentifier` shares the longest prefix with `HASH2` (see [format](#wire-formats)) among all returned Provider Records. 
If the responses didn't contain any Provider Record, or none of them matched `HASH2`, send a new DHT lookup request for `KeyPrefix` to closer peers in XOR distance to `HASH2` returned by the DHT Server. Go to 5. +13. Client now has `CPPeerID = Dec(EncPeerID)`, `TS = Nonce[:4]` and `Signature, multiaddrs = Dec(EncMetadata)`. 14. Client verifies that `Signature` verifies with `CPPeerID`: `Verify(CPPeerID, Signature, EncPeerID || TS)`. -15. Client checks that `TS` is younger than `48h`. -16. If none of the decrypted payloads is valid, go to 5. -17. If the decrypted payload doesn't include the `multiaddrs` associated with `CPPeerID`, Client performs a DHT `FindPeer` request to find the `multiaddrs` associated with `CPPeerID`. +15. Client checks that `TS` is younger than `48h`, and not in the future. +16. If none of the Provider Records is valid, send a new DHT lookup request for `KeyPrefix` to closer peers in XOR distance to `HASH2` returned by the DHT Server. Go to 5. +17. If the `EncMetadata` doesn't include `multiaddrs` associated with `CPPeerID`, Client performs a DHT `FindPeer` request to find the `multiaddrs` associated with `CPPeerID`. 18. Client requests `CID` or another content identifier to the Content Provider (known `CPPeerID` and `multiaddrs`) and can exchange data (the DHT may be consumed by various different protocols). -======= -2. Client finds the closest `PeerID`s to `HASH2` in XOR distance in its Routing Table. -3. Client sends a DHT lookup request for `KeyPrefix` to these DHT servers. The request contains a flag to specify whether Client wants the `multiaddrs` associated with the `CPPeerID` or not. -4. The DHT servers find the 20 closest `PeerID`s to `KeyPrefix` in XOR distance (see [algorithm](#closest-keys-to-a-key-prefix)). Add these `PeerID`s and their associated multiaddresses (if applicable) to the `message` that will be returned to Client. -5. The DHT servers search if there are entries matching `KeyPrefix` in their Provider Store. -6. 
For all entries `HASH2` of the Provider Store where `HASH2[:len(KeyPrefix)]==KeyPrefix`, add to `message` the following encrypted payload: `EncPeerID || 0x8040 || SERVERNONCE || payload_len || AESGCM(ServerKey, SERVERNONCE, TS || Signature || multiaddrs)`, `SERVERNONCE` being a randomly generated 12-byte array, for `multiaddrs` being the multiaddresses associated with `CPPeerID` (if applicable) if the `multiaddrs` were requested by Client. The `multiaddrs` are taken from the DHT Server's lib2p2 peerstore, and may be stale. If more than `MatchLimit` distinct `HASH2`s match the requested `KeyPrefix`, the DHT Server doesn't return any Provider Record, and adds the number of `HASH2` matching `KeyPrefix` along with its own `MatchLimit` to `message`. -7. The DHT servers send `message` to Client. -8. Client computes `ServerKey = SHA256(SALT_SERVERKEY || MH)`. -9. Client tries to decrypt all returned encrypted payloads using `MH` for `EncPeerID` and `ServerKey` for `Enc(ServerKey, TS || Signature || multiaddrs)`. If at least one encrypted payload can be decrypted, go to 12. -10. If the DHT Server's `MatchLimit` and number of matching `HASH2`s was included in the `message`, Client makes multiple DHT lookup requests for longer prefixes (e.g `KeyPrefix||0` and `KeyPrefix||1`). Else Client sends a DHT lookup request for `KeyPrefix` to the closest peers in XOR distance to `HASH2` that it received from the DHT servers. -11. Go to 4. -12. For each decrypted payload, Client decrypts `CPPeerID = Dec(MH, EncPeerID)`. -13. Client verifies that `Signature` verifies with `CPPeerID`: `Verify(CPPeerID, Signature, EncPeerID || TS)`. -14. Client checks that `TS` is younger than `48h`. -15. If none of the decrypted payloads is valid, go to 4. -16. If the decrypted payload doesn't include the `multiaddrs` associated with `CPPeerID`, Client performs a DHT `FindPeer` request to find the `multiaddrs` associated with `CPPeerID`. -17. 
Client sends a Bitswap request for `CID` to the Content Provider (known `CPPeerID` and `multiaddrs`). -18. Content Provider sends the requested content back to Client. ->>>>>>> 0da90106a2ac73045f1b6d033ac8a6f486e243e0 - +Note: Data formats are simplified here; please refer to the [definitions](#definitions) and [formats](#wire-formats) sections for the exact data formats. ```mermaid sequenceDiagram participant Client participant Server as DHT Server participant CP as Content Provider - Note left of Client: HASH2 = SHA256(SALT_DOUBLEHASH || MH) - Note left of Client: ServerKey = SHA256(SALT_SERVERKEY || MH) + Note left of Client: HASH2 = SHA256(SALT_DOUBLEHASH || MH)
ServerKey = SHA256(SALT_SERVERKEY || MH)
EncryptionKey = SHA256(SALT_ENCRYPTIONKEY || MH) loop in parallel until valid Provider Record found Note left of Client: KeyPrefix = HASH2[:l] Client->>Server: FIND_CONTENT(KeyPrefix)
Optional flags: multiaddrs, metadata Note right of Server: message = [] loop for each of the 20 closest PeerIDs to KeyPrefix in the Routing Table - Note right of Server: message += PeerID + Note right of Server: add PeerID and its multiaddrs to message end + Note right of Server: Note: If more than MatchLimit HASH2 entries match KeyPrefix, set MatchLimit flag to 1 loop for each entry matching KeyPrefix in the Provider Store - Note right of Server: EncMetadata = 0x8040 || SERVERNONCE || payload_len ||
AESGCM(ServerKey, SERVERNONCE, TS || Signature || multiaddrs) - Note right of Server: Aggregate records per HASH2:
message += HASH2 || nb_records || EncPeerID0 || EncMetadata0 || ... || EncMetadataN + Note right of Server: add encrypted Provider Records and associated multiaddrs to message Note right of Server: Note: don't add multiaddrs nor metadata if not requested with flags end - Note right of Server: Note: If there are more than MatchLimit entries matching KeyPrefix, drop all records and
message += "MatchLimit = MatchLimit" Server->>Client: message loop for all records matching HASH2 - Note left of Client: CPPeerID = Dec(MH, EncPeerID) - Note left of Client: TS || Signature || multiaddrs = Dec(ServerKey, EncMetadata) + Note left of Client: CPPeerID = Dec(EncryptionKey, EncPeerID)
TS = Nonce[:4]
Signature || multiaddrs = Dec(ServerKey, EncMetadata) Note left of Client: Verify(CPPeerID, Signature, EncPeerID) end Note left of Client: If at least 1 record is valid exit the loop @@ -198,7 +243,7 @@ sequenceDiagram end - Client->>CP: Bitswap request for CID + Client->>CP: Data request for CID CP->>Client: Content ``` @@ -409,14 +454,6 @@ Mixnets references: - [Nym Mixnet](https://nymtech.net/) - https://github.com/ipfs/notes/issues/37 - -## Open Questions - -- If we plan to move to using SHA3 instead of SHA2 to generate 256-bits digests, this migration is the perfect opportunity, as we will be breaking everything anyways. SHA3 was proved to be more secure against Length Extension Attacks. It has not be proven whether SHA2 or SHA3 is more collision resistant and secure against preimage attacks. See this [comparison](https://en.wikipedia.org/wiki/SHA-3#Comparison_of_SHA_functions). -- Is it wise to encrypt the `CPPeerID` using `MH` directly? It would be possible to derive another identifier from `MH` (such as `Hash("SOME_CONSTANT" || MH)`). `MH` is the master identifier of the content, hence if it is revealed all other identifiers can trivially be found. However, it is computationally impossible to recover `MH` from `Hash("SOME_CONSTANT" || MH)`. -- It may be fine to use `TS` as Nonce/IV for the Provider Record encryption (`EncPeerID = AESGCM(MH, Nonce, CPPeerID)`), it spares bytes on the wire. If `TS` is the number of milli- or nano-seconds that have passed since `1970-01-01T00:00:00`, this number easily fits in the 12 bytes IV. Moreover it is very unlikely that 2 nodes perform an encryption using the same key (for the same content) at the exact same milli- or nano-second. Using TS as nonce would spare 4 bytes (`TS` size) on the wire when publishing content to the DHT, and 4 bytes for each Provider Record matching `Prefix` for all requests. 
However the information about when the Provider Record was published (already known to the DHT Servers storing the Provider Record) would be publicly available. Anyone enumerating DHT Provider Records would be able to read it. -- As multiple `HASH2` match each `Prefix` and the Client is only interested in a single one, should we send the `HASH2` along with each encrypted provider record (network load overhead) or let the Client try to decrypt all payloads and see for themselves which one opens (cpu overhead)? - ## Copyright Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
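
The key derivations, nonce layout, and freshness check used by the Publish and Lookup Processes can be sketched as follows. This is a minimal, non-normative illustration rather than the reference implementation. The salt values for `HASH2` and `ServerKey` come from the Definitions section. The salt value `CR_ENCRYPTIONKEY`, the big-endian encoding of `TS`, the representation of `KeyPrefix` as a bit string, and all function names are assumptions made for this example only. AES-GCM encryption and signing are left out so that the sketch depends only on the Python standard library.

```python
import hashlib
import os
import struct

SALT_DOUBLEHASH = b"CR_DOUBLEHASH"        # from the Definitions section
SALT_SERVERKEY = b"CR_SERVERKEY"          # from the Definitions section
SALT_ENCRYPTIONKEY = b"CR_ENCRYPTIONKEY"  # assumed value, not given in this section

MAX_AGE_MINUTES = 48 * 60  # Provider Records must be younger than 48h


def derive(salt: bytes, mh: bytes) -> bytes:
    """SHA256(salt || MH); used for HASH2, ServerKey and EncryptionKey."""
    return hashlib.sha256(salt + mh).digest()


def make_nonce(ts_minutes: int) -> bytes:
    """Nonce = TS || 8 random bytes, with TS a 32-bit timestamp in minutes.

    Big-endian byte order is an assumption of this sketch.
    """
    return struct.pack(">I", ts_minutes) + os.urandom(8)


def ts_from_nonce(nonce: bytes) -> int:
    """Recover TS = Nonce[:4]."""
    return struct.unpack(">I", nonce[:4])[0]


def is_fresh(ts_minutes: int, now_minutes: int) -> bool:
    """True if TS is not in the future and is younger than 48h."""
    return 0 <= now_minutes - ts_minutes < MAX_AGE_MINUTES


def key_prefix(hash2: bytes, l: int) -> str:
    """KeyPrefix = first l bits of HASH2, returned as a string of '0'/'1'."""
    bits = "".join(f"{byte:08b}" for byte in hash2)
    return bits[:l]


# Example multihash: sha2-256 code (0x12), length (0x20), then the digest.
mh = b"\x12\x20" + hashlib.sha256(b"hello").digest()
hash2 = derive(SALT_DOUBLEHASH, mh)
server_key = derive(SALT_SERVERKEY, mh)
encryption_key = derive(SALT_ENCRYPTIONKEY, mh)
```

Deriving three independent values from `MH` with distinct salts means that a DHT Server learning `HASH2` and `ServerKey` still cannot compute `EncryptionKey` or recover `MH` without inverting SHA256.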