Ambient peer discovery protocol #590

Open
wants to merge 11 commits into
base: master
109 changes: 109 additions & 0 deletions ambient-peer/README.md
@@ -0,0 +1,109 @@
# Ambient peer discovery

| Lifecycle Stage | Maturity | Status | Latest Revision |
|-----------------|---------------|--------|-----------------|
| 1A | Working Draft | Active | r0, 2023-10-16 |

Authors: [@thomaseizinger]

Interest Group: <!-- Please add yourself here. -->

[@thomaseizinger]: https://github.com/thomaseizinger

See the [lifecycle document][lifecycle-spec] for context about the maturity level and spec status.

[lifecycle-spec]: https://github.com/libp2p/specs/blob/master/00-framework-01-spec-lifecycle.md

## Table of Contents

<!-- TODO -->

## Overview

The ambient peer discovery protocol allows peers to share some of their ambient peers with each other.

## Use case

Ambient peer discovery is most useful when a node starts with, or is left with, only a few connections, perhaps just a single one.

For example, a user may start a libp2p web app and enter another browser's relayed `/webrtc` address.
The connection will succeed, but because both nodes are browsers, further discovery of nodes via e.g. Kademlia is not possible.
Ambient peer discovery allows the web app to ask the newly connected peer for further nodes.

The protocol is designed to complement other discovery mechanisms like Kademlia.
It features a very small resource footprint and can thus also be used by lite-clients within browser or mobile environments.

## Protocol

1. Node _A_ opens a new stream to node _B_ with the protocol name `/libp2p/ambient-peers`.
1. Node _B_ chooses a subset of at most 5 known peer records received from other peers.
Member

Suggested change
1. Node _B_ chooses a subset of at most 5 known peer records received from other peers.
1. Node _B_ MUST choose a subset of known peer records received from other peers. Node _B_ SHOULD limit the subset to a maximum of 5 peers.

We shouldn't enforce number of peers returned in the spec, but provide a recommendation

Contributor Author

But how do you know, if the other side configured a different number? Unless we make this a runtime parameter of the protocol, I think enforcing a number is easier. A node can always run the same protocol again to get more peers.

1. The chosen peer records SHOULD have at least one address that shares the same transport technology as the connection between node _A_ and node _B_.
Contributor

Making the supported transports explicit seems like a strictly superior way.

For example, if node _A_ and node _B_ are connected via WebRTC, node _B_ SHOULD select up to 5 peer records, each of which has at least one WebRTC address.
1. Node _B_ SHOULD NOT be currently connected to any of these nodes.
Contributor

  1. This is phrased in a misleading way.
  2. Returning stale peers seems less useful.

Contributor Author

> Returning stale peers seems less useful.

Agreed that it is less useful but it is also a lot safer. We landed on this compromise because there was a lot of strong feedback that returning your current peers is unacceptable from a privacy PoV.

1. Node _B_ writes these peer records onto the stream in their [protobuf encoding](https://github.com/libp2p/specs/blob/master/RFC/0003-routing-records.md#address-record-format), each record length-prefixed with an unsigned varint, and closes the stream after the last one.
1. Node _A_ reads peer records from the stream until EOF or 5 have been received, whichever comes earlier.
Member

Suggested change
1. Node _A_ reads peer records from the stream until EOF or 5 have been received, whichever comes earlier.
1. Node _A_ reads peer records from the stream until EOF or <num> have been received, whichever comes earlier. Implementations MAY allow overriding of <num> but SHOULD use a default of 5.
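The framing used in the last two protocol steps can be sketched as follows. This is a minimal illustration, not a reference implementation: peer records are treated as opaque, already-encoded bytes, the stream is any blocking file-like object whose `read(n)` returns `n` bytes unless it hits EOF, and the names `write_records`, `read_records` and `MAX_RECORD_SIZE` are hypothetical. The limit of 5 is taken from the steps above; whether it stays fixed or becomes a default is still under discussion in this thread.

```python
import io
from typing import BinaryIO, List, Optional

MAX_RECORDS = 5              # limit from the protocol steps above
MAX_RECORD_SIZE = 64 * 1024  # assumption: sanity bound, not part of the spec text


def encode_uvarint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("unsigned varint cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_uvarint(stream: BinaryIO) -> Optional[int]:
    """Read one unsigned varint; return None on EOF before its first byte."""
    result = 0
    shift = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            if shift == 0:
                return None
            raise EOFError("stream ended inside a varint")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
        if shift >= 63:
            raise ValueError("varint longer than 9 bytes")


def write_records(stream: BinaryIO, records: List[bytes]) -> None:
    """Node B: write at most MAX_RECORDS length-prefixed records."""
    for record in records[:MAX_RECORDS]:
        stream.write(encode_uvarint(len(record)))
        stream.write(record)


def read_records(stream: BinaryIO) -> List[bytes]:
    """Node A: read records until EOF or MAX_RECORDS, whichever comes first."""
    records: List[bytes] = []
    while len(records) < MAX_RECORDS:
        length = read_uvarint(stream)
        if length is None:
            break
        if length > MAX_RECORD_SIZE:
            raise ValueError("record exceeds the local size bound")
        payload = stream.read(length)
        if len(payload) != length:
            raise EOFError("stream ended inside a record")
        records.append(payload)
    return records
```

With an in-memory `io.BytesIO` standing in for the stream, writing records with `write_records`, seeking back to the start and calling `read_records` returns the same byte strings. A real implementation would additionally decode and validate each record.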


## Security considerations

Revealing even just some of your peers has serious privacy and security implications for a network.
By default, implementations MUST NOT share records of peers they are currently connected to.
Implementations MAY add a configuration flag that allows users to override this.
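The default and the opt-in override described above could look roughly like this. It is a sketch under stated assumptions: `KnownPeer`, `select_shareable` and the `allow_connected` flag are hypothetical names, peer IDs are plain strings, and picking candidates at random is only one possible selection strategy, since the text does not prescribe one. Transport filtering is left out here.

```python
import random
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class KnownPeer:
    peer_id: str   # simplified to a string for this sketch
    record: bytes  # encoded peer record received from another peer (opaque here)


def select_shareable(
    known: Iterable[KnownPeer],
    connected: Set[str],
    limit: int = 5,
    allow_connected: bool = False,  # hypothetical configuration flag
    rng: Optional[random.Random] = None,
) -> List[KnownPeer]:
    """Pick at most `limit` peers, skipping current connections by default."""
    rng = rng or random.Random()
    candidates = [
        peer for peer in known
        if allow_connected or peer.peer_id not in connected
    ]
    # Random choice is an assumption of this sketch, not a requirement.
    return rng.sample(candidates, min(limit, len(candidates)))
```

Keeping the override as an explicit argument that defaults to `False` means a caller has to opt in deliberately before any currently connected peer can be shared.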

<!-- @vyzo to add more text here -->

## Implementation considerations

### Bound local peer storage

This protocol requires nodes to store records of peers they were previously connected to.
Such storage is useful independently of this protocol, e.g. to reconnect to a peer you were once connected to.
Comment on lines +58 to +59
Contributor

This adds a lot of complexity to implementations.

Contributor Author

@thomaseizinger Oct 22, 2023

Can you elaborate on why this adds complexity? I appreciate that the various implementations differ greatly in their internal design but fundamentally, this can easily be satisfied with an LRU cache. Really, all we are doing is accessing the "peer store". I can make that more explicit if you want?

Implementations should take care that the resulting memory or disk usage is bounded and only store a number of peers appropriate for their deployment target (mobile, server, etc).
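One way to keep the storage bounded is the LRU cache mentioned in the discussion above. The following is a minimal in-memory sketch; the class name `BoundedPeerStore` and its methods are hypothetical, records are opaque bytes keyed by a string peer ID, and the capacity is whatever the deployment target can afford.

```python
from collections import OrderedDict
from typing import List, Optional


class BoundedPeerStore:
    """Keeps at most `capacity` peer records, evicting the least recently used."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._records: "OrderedDict[str, bytes]" = OrderedDict()

    def insert(self, peer_id: str, record: bytes) -> None:
        """Store or refresh a record and mark it as most recently used."""
        self._records[peer_id] = record
        self._records.move_to_end(peer_id)
        while len(self._records) > self._capacity:
            self._records.popitem(last=False)  # drop the oldest entry

    def get(self, peer_id: str) -> Optional[bytes]:
        """Return a record, if present, and mark it as most recently used."""
        record = self._records.get(peer_id)
        if record is not None:
            self._records.move_to_end(peer_id)
        return record

    def peer_ids(self) -> List[str]:
        """Peer IDs ordered from least to most recently used."""
        return list(self._records)

    def __len__(self) -> int:
        return len(self._records)
```

A mobile build might construct the store with a capacity of a few dozen entries while a server could afford far more; the eviction logic stays the same.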

### Group transport technologies

Implementations MAY group transports as follows:

1. **Anything on top of TCP:** We support several encryption protocols on top of TCP like noise or TLS.
Some nodes may choose to embed this in their multiaddress using `/tls` or `/noise`.
Nodes MAY consider these to be equivalent and return a peer record containing a `/tcp/noise` address on a connection that is using `/tcp/tls`.
2. **All versions of QUIC:** QUIC is in itself a versioned protocol and we have for the moment two multiaddress protocols: `/quic` and `/quic-v1`.
For the purpose of ambient peer discovery, nodes MAY assume all current and future versions of QUIC are supported by the remote node.
3. **Anything Web:** If a peer connects over `/webrtc`, `/webrtc-direct`, `/webtransport` or `/ws`, chances are they are a browser node.
As such, nodes MAY assume that any peer record with one of these is useful.
4. **IPv4 & IPv6**: Nodes MAY assume that the requesting peer is capable of dialing either version of IP, regardless of which one was used to make the connection.
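The grouping above can be approximated with a small classifier over multiaddress strings. This is a deliberately naive sketch: it splits the textual form on `/` instead of using a real multiaddress parser, so a value segment that happens to equal a protocol name would be misclassified. The function names and group labels are hypothetical.

```python
import re
from typing import Iterable, Optional

WEB_PROTOCOLS = {"webrtc", "webrtc-direct", "webtransport", "ws"}
# Matches "quic" and versioned variants, to cover current and future versions.
QUIC_PATTERN = re.compile(r"quic(-?v\d+)?")


def transport_group(multiaddr: str) -> Optional[str]:
    """Map a textual multiaddress to one of the groups 'web', 'quic' or 'tcp'."""
    parts = set(multiaddr.strip("/").split("/"))
    # Check web transports first: they are layered on top of TCP or QUIC.
    if parts & WEB_PROTOCOLS:
        return "web"
    if any(QUIC_PATTERN.fullmatch(part) for part in parts):
        return "quic"
    if "tcp" in parts:
        return "tcp"  # any security protocol on top of TCP counts as the same group
    return None


def is_useful(record_addrs: Iterable[str], connection_addr: str) -> bool:
    """True if any address in the record falls in the connection's group.

    The IP version is ignored on purpose, mirroring item 4 of the list above.
    """
    group = transport_group(connection_addr)
    return group is not None and any(
        transport_group(addr) == group for addr in record_addrs
    )
```

For instance, a record whose only address is `/ip6/::1/tcp/4001/noise` would be considered useful on a connection made over `/ip4/1.2.3.4/tcp/4001/tls`, because both fall into the TCP group.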

### Separating networks

Libp2p is used across a range of networks and many of them may not actually have a useful overlap in compatible protocols.
To avoid sharing addresses of peers that don't support useful protocols, implementations SHOULD allow configuration of the protocol identifier.
For example, instead of `/libp2p/ambient-peers` a node may use `/my-cool-p2p-network/ambient-peers`.
It is RECOMMENDED that implementations retain the `/ambient-peers` suffix to communicate the semantics of this protocol.
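A configurable protocol identifier that keeps the recommended suffix might be derived like this. The helper name `protocol_id` and its `network` parameter are hypothetical and only illustrate the naming convention described above.

```python
from typing import Optional

DEFAULT_PROTOCOL_ID = "/libp2p/ambient-peers"
SUFFIX = "/ambient-peers"  # retained to communicate the protocol's semantics


def protocol_id(network: Optional[str] = None) -> str:
    """Build the protocol identifier for an optional network-specific prefix."""
    if network is None:
        return DEFAULT_PROTOCOL_ID
    name = network.strip("/")
    if not name:
        raise ValueError("network name must not be empty")
    return f"/{name}{SUFFIX}"
```

Calling `protocol_id("my-cool-p2p-network")` yields the identifier from the example above, and calling it without arguments falls back to the default.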

## Prior art

Exchanging known peers is common in the peer-to-peer space:

1. [PEX](https://en.wikipedia.org/wiki/Peer_exchange) augments the BitTorrent protocol.
2. Bitcoin nodes can send [`addr`](https://en.bitcoin.it/wiki/Protocol_documentation#addr) messages to exchange peers with one another.
3. Waku has an [ambient-peer discovery](https://github.com/vacp2p/rfc/blob/master/content/docs/rfcs/34/README.md) protocol built on top of libp2p.

There have been several discussions in the libp2p space about adding such a protocol:

- https://github.com/libp2p/specs/issues/222
- https://github.com/libp2p/notes/issues/3
- https://github.com/libp2p/notes/issues/7

## FAQ

### Why not use the rendezvous protocol?

The rendezvous protocol could be repurposed as a kind of peer exchange protocol.
We would have to agree on an identifier that all peers use to register themselves within a certain topic.
Other peers can then go and query a node for all peers registered under this topic.

We consider this impractical for the given problem because:

- It requires three parties to support the protocol instead of just two and thus would take a lot longer to be rolled out.
- It creates a lot of traffic.
Nodes have to actively register themselves without knowing whether their peer record will ever be distributed / requested.
To be effective, every node would have to register themselves with every other node.