diff --git a/RFC/0002-signed-envelopes.md b/RFC/0002-signed-envelopes.md new file mode 100644 index 000000000..9e232f973 --- /dev/null +++ b/RFC/0002-signed-envelopes.md @@ -0,0 +1,112 @@ +# RFC 0002 - Signed Envelopes + +- Start Date: 2019-10-21 +- Related RFC: [0003 Address Records][addr-records-rfc] + +## Abstract + +This RFC proposes a "signed envelope" structure that contains an arbitrary byte +string payload, a signature of the payload, and the public key that can be used +to verify the signature. + +This was spun out of an earlier draft of the [address records +RFC][addr-records-rfc], since it's generically useful. + +## Problem Statement + +Sometimes we'd like to store some data in a public location (e.g. a DHT, etc), +or make use of potentially untrustworthy intermediaries to relay information. It +would be nice to have an all-purpose data container that includes a signature of +the data, so we can verify that the data came from a specific peer and that it hasn't +been tampered with. + +## Domain Separation + +Signatures can be used for a variety of purposes, and a signature made for a +specific purpose MUST NOT be considered valid for a different purpose. + +Without this property, an attacker could convince a peer to sign a payload in +one context and present it as valid in another, for example, presenting a signed +address record as a pubsub message. + +We separate signatures into "domains" by prefixing the data to be signed with a +string unique to each domain. This string is not contained within the payload or +the outer envelope structure. Instead, each libp2p subsystem that makes use of +signed envelopes will provide their own domain string when constructing the +envelope, and again when validating the envelope. If the domain string used to +validate is different from the one used to sign, the signature validation will +fail. + +Domain strings may be any valid UTF-8 string, but should be fairly short and +descriptive of their use case, for example `"libp2p-routing-record"`. + +## Payload Type Information + +The envelope record can contain an arbitrary byte string payload, which will +need to be interpreted in the context of a specific use case. To assist in +"hydrating" the payload into an appropriate domain object, we include a "payload +type" field. This field consists of a [multicodec][multicodec] code, +optionally followed by an arbitrary byte sequence. + +This allows very compact type hints that contain just a multicodec, as well as +"path" multicodecs of the form `/some/thing`, using the ["namespace" +multicodec](https://github.com/multiformats/multicodec/blob/master/table.csv#L23), +whose binary value is equivalent to the UTF-8 `/` character. + +Use of the payload type field is encouraged, but the field may be left empty +without invalidating the envelope. + +## Wire Format + +Since we already have a [protobuf definition for public keys][peer-id-spec], we +can use protobuf for this as well and easily embed the key in the envelope: + + +```protobuf +message Envelope { + PublicKey public_key = 1; // see peer id spec for definition + bytes payload_type = 2; // payload type indicator + bytes payload = 3; // opaque binary payload + bytes signature = 5; // see below for signing rules +} +``` + +The `public_key` field contains the public key whose secret counterpart was used +to sign the message. This MUST be consistent with the peer id of the signing +peer, as the recipient will derive the peer id of the signer from this key. + +The `payload_type` field contains a [multicodec][multicodec]-prefixed type +indicator as described in the [Payload Type Information +section](#payload-type-information). + +The `payload` field contains the arbitrary byte string payload. + +The `signature` field contains a signature of all fields except `public_key`, +generated as described below. + +## Signature Production / Verification + +When signing, a peer will prepare a buffer by concatenating the following: + +- The length of the [domain separation string](#domain-separation) string in + bytes +- The domain separation string, encoded as UTF-8 +- The length of the `payload_type` field in bytes +- The value of the `payload_type` field +- The length of the `payload` field in bytes +- The value of the `payload` field + +The length values for each field are encoded as unsigned variable-length +integers as defined in the [multiformats uvarint spec][uvarint]. + +Then they will sign the buffer according to the rules in the [peer id +spec][peer-id-spec] and set the `signature` field accordingly. + +To verify, a peer will "inflate" the `public_key` into a domain object that can +verify signatures, prepare a buffer as above and verify the `signature` field +against it. + +[addr-records-rfc]: ./0003-routing-records.md +[peer-id-spec]: ../peer-ids/peer-ids.md +[multicodec]: https://github.com/multiformats/multicodec +[uvarint]: https://github.com/multiformats/unsigned-varint diff --git a/RFC/0003-routing-records.md b/RFC/0003-routing-records.md new file mode 100644 index 000000000..fbb3394b4 --- /dev/null +++ b/RFC/0003-routing-records.md @@ -0,0 +1,277 @@ +# RFC 0003 - Peer Routing Records + +- Start Date: 2019-10-04 +- Related Issues: + - [libp2p/issues/47](https://github.com/libp2p/libp2p/issues/47) + - [go-libp2p/issues/436](https://github.com/libp2p/go-libp2p/issues/436) + +## Abstract + +This RFC proposes a method for distributing peer routing records, which contain +a peer's publicly reachable listen addresses, and may be extended in the future +to contain additional metadata relevant to routing. This serves a similar +purpose to [Ethereum Node Records][eip-778]. Like ENR records, libp2p routing +records should be extensible, so that we can add information relevant to as-yet +unknown use cases. + +The record described here does not include a signature, but it is expected to +be serialized and wrapped in a [signed envelope][envelope-rfc], which will +prove the identity of the issuing peer. The dialer can then prioritize +self-certified addresses over addresses from an unknown origin. + +## Problem Statement + +All libp2p peers keep a "peer store", which maps [peer ids][peer-id-spec] to a +set of known addresses for each peer. When the application layer wants to +contact a peer, the dialer will pull addresses from the peer store and try to +initiate a connection on one or more addresses. + +Addresses for a peer can come from a variety of sources. If we have already made +a connection to a peer, the libp2p [identify protocol][identify-spec] will +inform us of other addresses that they are listening on. We may also discover +their address by querying the DHT, checking a fixed "bootstrap list", or perhaps +through a pubsub message or an application-specific protocol. + +In the case of the identify protocol, we can be fairly certain that the +addresses originate from the peer we're speaking to, assuming that we're using a +secure, authenticated communication channel. However, more "ambient" discovery +methods such as DHT traversal and pubsub depend on potentially untrustworthy +third parties to relay address information. + +Even in the case of receiving addresses via the identify protocol, our +confidence that the address came directly from the peer is not actionable, because +the peer store does not track the origin of an address. Once added to the peer +store, all addresses are considered equally valid, regardless of their source. + +We would like to have a means of distributing _verifiable_ address records, +which we can prove originated from the addressed peer itself. We also need a way to +track the "provenance" of an address within libp2p's internal components such as +the peer store. Once those pieces are in place, we will also need a way to +prioritize addresses based on their authenticity, with the most strict strategy +being to only dial certified addresses. + +### Complications + +While producing a signed record is fairly trivial, there are a few aspects to +this problem that complicate things. + +1. Addresses are not static. A given peer may have several addresses at any given + time, and the set of addresses can change at arbitrary times. +2. Peers may not know their own addresses. It's often impossible to automatically + infer one's own public address, and peers may need to rely on third party + peers to inform them of their observed public addresses. +3. A peer may inadvertently or maliciously sign an address that they do not + control. In other words, a signature isn't a guarantee that a given address is + valid. +4. Some addresses may be ambiguous. For example, addresses on a private subnet + are valid within that subnet but are useless on the public internet. + +The first point can be addressed by having records contain a sequence number +that increases monotonically when new records are issued, and by having newer +records replace older ones. + +The other points, while worth thinking about, are out of scope for this RFC. +However, we can take care to make our records extensible so that we can add +additional metadata in the future. Some thoughts along these lines are in the +[Future Work section below](#future-work). + +## Address Record Format + +Here's a protobuf that might work: + +```protobuf + +// PeerRecord contains the listen addresses for a peer at a particular point in time. +message PeerRecord { + // AddressInfo wraps a multiaddr. In the future, it may be extended to + // contain additional metadata, such as "routability" (whether an address is + // local or global, etc). + message AddressInfo { + bytes multiaddr = 1; + } + + // the peer id of the subject of the record (who these addresses belong to). + bytes peer_id = 1; + + // A monotonically increasing sequence number, used for record ordering. + uint64 seq = 2; + + // All current listen addresses + repeated AddressInfo addresses = 3; +} +``` + +The `AddressInfo` wrapper message is used instead of a bare multiaddr to allow +us to extend addresses with additional metadata [in the future](#future-work). + +The `seq` field contains a sequence number that MUST increase monotonically as +new records are created. Newer records MUST have a higher `seq` value than older +records. To avoid persisting state across restarts, implementations MAY use unix +epoch time as the `seq` value, however they MUST NOT attempt to interpret a +`seq` value from another peer as a valid timestamp. + +#### Example + +```javascript + { + peer_id: "QmAlice...", + seq: 1570215229, + addresses: [ + { + multiaddr: "/ip4/1.2.3.4/tcp/42/p2p/QmAlice", + }, + { + multiaddr: "/ip4/10.0.1.2/tcp/42/p2p/QmAlice", + } + ] + } +``` + +A peer SHOULD only include addresses that it believes are routable via the +public internet, ideally having confirmed that this is the case via some +external mechanism such as a successful AutoNAT dial-back. + +In some cases we may want to include localhost or LAN-local address; for +example, when testing the DHT using many processes on a single machine. To +support this, implementations may use a global runtime configuration flag or +environment variable to control whether local addresses will be included. + +## Certification / Verification + +This structure can be serialized and contained in a [signed +envelope][envelope-rfc], which lets us issue "self-certified" address records +that are signed by the peer that the addresses belong to. + +To produce a "self-certified" address, a peer will construct a `RoutingState` +containing their listen addresses and serialize it to a byte array using a +protobuf encoder. The serialized records will then be wrapped in a [signed +envelope][envelope-rfc], which is signed with the libp2p peer's private host +key. The corresponding public key MUST be included in the envelope's +`public_key` field. + +When receiving a `RoutingState` wrapped in a signed envelope, a peer MUST +validate the signature before deserializing the `RoutingState` record. If the +signature is invalid, the envelope MUST be discarded without deserializing the +envelope payload. + +Once the signature has been verified and the `RoutingState` has been +deserialized, the receiving peer MUST verify that the `peer_id` contained in the +`RoutingState` matches the `public_key` from the envelope. If the public key in +the envelope cannot derive the peer id contained in the routing state record, +the `RoutingState` MUST be discarded. + +### Signed Envelope Domain + +Signed envelopes require a "domain separation" string that defines the scope +or purpose of a signature. + +When wrapping a `RoutingState` in a signed envelope, the domain string MUST be +`libp2p-routing-state`. + +### Signed Envelope Payload Type + +Signed envelopes contain a `payload_type` field that indicates how to interpret +the contents of the envelope. + +Ideally, we should define a new multicodec for routing records, so that we can +identify them in a few bytes. While we're still spec'ing and working on the +initial implementation, we can use the UTF-8 string +`"/libp2p/routing-state-record"` as the `payload_type` value. + +## Peer Store APIs + +We will need to add a few methods to the peer store: + +- `AddCertifiedAddrs(envelope) -> Maybe` + - Add a self-certified address, wrapped in a signed envelope. This should + validate the envelope signature & store the envelope for future reference. + If any certified addresses already exist for the peer, only accept the new + envelope if it has a greater `seq` value than existing envelopes. + +- `CertifiedAddrs(peer_id) -> Set` + - return the set of self-certified addresses for the given peer id + +- `SignedRoutingState(peer_id) -> Maybe` + - retrieve the signed envelope that was most recently added to the peerstore + for the given peer, if any exists. + +And possibly: + +- `IsCertified(peer_id, multiaddr) -> Boolean` + - has a particular address been self-certified by the given peer? + + +We'll also need a method that constructs a new `RoutingState` containing our +listen addresses and wraps it in a signed envelope. This may belong on the Host +instead of the peer store, since it needs access to the private signing key. + +When adding records to the peerstore, a receiving peer MUST keep track of the +latest `seq` value received for each peer and reject incoming `RoutingState` +messages unless they contain a greater `seq` value than the last received. + +After integrating the information from the `RoutingState` into the peerstore, +implementations SHOULD retain the original signed envelope. This will allow +other libp2p systems to share signed `RoutingState` records with other peers in +the network, preserving the signature of the issuing peer. The [Exchanging +Records section](#exchanging-records) section lists some systems that would need +to retrieve the original signed record from the peerstore. + +## Dialing Strategies + +Once self-certified addresses are available via the peer store, we can update +the dialer to prefer using them when possible. Some systems may want to _only_ +dial self-certified addresses, so we should include some configuration options +to control whether non-certified addresses are acceptable. + +## Exchanging Records + +We currently have several systems in libp2p that deal with peer addressing and +which could be updated to use signed routing records: + +- Public peer discovery using [libp2p's DHT][dht-spec] +- Local peer discovery with [mDNS][mdns-spec] +- Direct exchange using the [identify protocol][identify-spec] +- Service discovery via the [rendezvous protocol][rendezvous-spec] +- A proposal for [a public peer exchange protocol][pex-proposal] + +Of these, the highest priority for updating seems to be the DHT, since it's +actively used by several deployed systems and is vulnerable to routing attacks +by malicious peers. We should work on extending the `FIND_NODE`, `ADD_PROVIDER`, +and `GET_PROVIDERS` RPC messages to support returning signed records in addition +to the current unsigned address information they currently support. + +We should also either define a new "secure peer routing" interface or extend the +existing peer routing interfaces to support signed records, so that we don't end +up with a bunch of similar but incompatible APIs for exchanging signed address +records. + +## Future Work + +Some things that were originally considered in this RFC were trimmed so that we +can focus on delivering a basic self-certified record, which is a pressing need. + +This includes a notion of "routability", which could be used to communicate +whether a given address is global (reachable via the public internet), +LAN-local, etc. We may also want to include some kind of confidence score or +priority ranking, so that peers can communicate which addresses they would +prefer other peers to use. + +To allow these fields to be added in the future, we wrap multiaddrs in the +`AddressInfo` message instead of having the `addresses` field be a list of "raw" +multiaddrs. + +Another potentially useful extension would be a compact protocol table or bloom +filter that could be used to test whether a peer supports a given protocol +before interacting with them directly. This could be added as a new field in the +`RoutingState` message. + + + +[identify-spec]: ../identify/README.md +[peer-id-spec]: ../peer-ids/peer-ids.md +[mdns-spec]: ../discovery/mdns.md +[rendezvous-spec]: ../rendezvous/README.md +[pex-proposal]: https://github.com/libp2p/notes/issues/7 +[autonat]: https://github.com/libp2p/specs/issues/180 +[envelope-rfc]: ./0002-signed-envelopes.md +[eip-778]: https://eips.ethereum.org/EIPS/eip-778