Merge pull request #217 from libp2p/rfc/address-records
RFC: Signed Address Records
jacobheun authored Nov 19, 2020
2 parents 2e175f0 + e401b14 commit b70ccf2
Showing 2 changed files with 389 additions and 0 deletions.
112 changes: 112 additions & 0 deletions RFC/0002-signed-envelopes.md
@@ -0,0 +1,112 @@
# RFC 0002 - Signed Envelopes

- Start Date: 2019-10-21
- Related RFC: [0003 Address Records][addr-records-rfc]

## Abstract

This RFC proposes a "signed envelope" structure that contains an arbitrary byte
string payload, a signature of the payload, and the public key that can be used
to verify the signature.

This was spun out of an earlier draft of the [address records
RFC][addr-records-rfc], since it's generically useful.

## Problem Statement

Sometimes we'd like to store some data in a public location (e.g. a DHT),
or make use of potentially untrustworthy intermediaries to relay information. It
would be nice to have an all-purpose data container that includes a signature of
the data, so we can verify that the data came from a specific peer and that it hasn't
been tampered with.

## Domain Separation

Signatures can be used for a variety of purposes, and a signature made for a
specific purpose MUST NOT be considered valid for a different purpose.

Without this property, an attacker could convince a peer to sign a payload in
one context and present it as valid in another, for example, presenting a signed
address record as a pubsub message.

We separate signatures into "domains" by prefixing the data to be signed with a
string unique to each domain. This string is not contained within the payload or
the outer envelope structure. Instead, each libp2p subsystem that makes use of
signed envelopes provides its own domain string when constructing the
envelope, and again when validating the envelope. If the domain string used to
validate is different from the one used to sign, the signature validation will
fail.

Domain strings may be any valid UTF-8 string, but should be fairly short and
descriptive of their use case, for example `"libp2p-routing-record"`.

## Payload Type Information

The envelope record can contain an arbitrary byte string payload, which will
need to be interpreted in the context of a specific use case. To assist in
"hydrating" the payload into an appropriate domain object, we include a "payload
type" field. This field consists of a [multicodec][multicodec] code,
optionally followed by an arbitrary byte sequence.

This allows very compact type hints that contain just a multicodec, as well as
"path" multicodecs of the form `/some/thing`, using the ["namespace"
multicodec](https://github.com/multiformats/multicodec/blob/master/table.csv#L23),
whose binary value is equivalent to the UTF-8 `/` character.

Use of the payload type field is encouraged, but the field may be left empty
without invalidating the envelope.
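
As an illustration of the "path" form, here is a minimal Python sketch. The helper name `path_payload_type` is hypothetical, and the example path is the provisional routing record type string from RFC 0003. It relies on the namespace multicodec code being `0x2f`, the same byte as UTF-8 `/`, so the UTF-8 encoding of the path already begins with a valid multicodec prefix.

```python
# The "namespace" multicodec code; identical to the UTF-8 byte for "/".
NAMESPACE_CODE = 0x2F


def path_payload_type(path: str) -> bytes:
    """Encode a "/some/thing" style type hint as payload_type bytes.

    Hypothetical helper: the leading "/" of the path serves as the
    multicodec prefix, and the rest is the arbitrary byte sequence.
    """
    if not path.startswith("/"):
        raise ValueError("path payload types must start with '/'")
    return path.encode("utf-8")


payload_type = path_payload_type("/libp2p/routing-state-record")
# The first byte doubles as the namespace multicodec code.
assert payload_type[0] == NAMESPACE_CODE
```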

## Wire Format

Since we already have a [protobuf definition for public keys][peer-id-spec], we
can use protobuf for this as well and easily embed the key in the envelope:


```protobuf
message Envelope {
  PublicKey public_key = 1; // see peer id spec for definition
  bytes payload_type = 2;   // payload type indicator
  bytes payload = 3;        // opaque binary payload
  bytes signature = 5;      // see below for signing rules
}
```

The `public_key` field contains the public key whose secret counterpart was used
to sign the message. This MUST be consistent with the peer id of the signing
peer, as the recipient will derive the peer id of the signer from this key.

The `payload_type` field contains a [multicodec][multicodec]-prefixed type
indicator as described in the [Payload Type Information
section](#payload-type-information).

The `payload` field contains the arbitrary byte string payload.

The `signature` field contains a signature over the domain separation string
and the `payload_type` and `payload` fields, generated as described below.

## Signature Production / Verification

When signing, a peer will prepare a buffer by concatenating the following:

- The length of the [domain separation string](#domain-separation) in
bytes
- The domain separation string, encoded as UTF-8
- The length of the `payload_type` field in bytes
- The value of the `payload_type` field
- The length of the `payload` field in bytes
- The value of the `payload` field

The length values for each field are encoded as unsigned variable-length
integers as defined in the [multiformats uvarint spec][uvarint].

Then they will sign the buffer according to the rules in the [peer id
spec][peer-id-spec] and set the `signature` field accordingly.

To verify, a peer will "inflate" the `public_key` into a domain object that can
verify signatures, prepare a buffer as above and verify the `signature` field
against it.
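
The buffer construction above can be sketched in Python as follows. This is an illustrative sketch, not a reference implementation: the function names `uvarint` and `signing_buffer` are assumptions, and the actual signing step is omitted because it depends on the key type rules in the peer id spec.

```python
def uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (LEB128 style).

    Note: the multiformats spec also caps encodings at 9 bytes; that
    check is omitted here for brevity.
    """
    if n < 0:
        raise ValueError("uvarint cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def signing_buffer(domain: str, payload_type: bytes, payload: bytes) -> bytes:
    """Concatenate length-prefixed domain, payload_type, and payload."""
    parts = []
    for field in (domain.encode("utf-8"), payload_type, payload):
        parts.append(uvarint(len(field)))
        parts.append(field)
    return b"".join(parts)


# The same payload yields different buffers under different domains, so a
# signature made for one domain does not verify under another.
a = signing_buffer("libp2p-routing-record", b"/x", b"hello")
b = signing_buffer("some-other-domain", b"/x", b"hello")
assert a != b
```

Length-prefixing each field keeps the concatenation unambiguous: without the prefixes, bytes could be shifted between the domain, `payload_type`, and `payload` without changing the buffer.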

[addr-records-rfc]: ./0003-routing-records.md
[peer-id-spec]: ../peer-ids/peer-ids.md
[multicodec]: https://github.com/multiformats/multicodec
[uvarint]: https://github.com/multiformats/unsigned-varint
277 changes: 277 additions & 0 deletions RFC/0003-routing-records.md
@@ -0,0 +1,277 @@
# RFC 0003 - Peer Routing Records

- Start Date: 2019-10-04
- Related Issues:
  - [libp2p/issues/47](https://github.com/libp2p/libp2p/issues/47)
  - [go-libp2p/issues/436](https://github.com/libp2p/go-libp2p/issues/436)

## Abstract

This RFC proposes a method for distributing peer routing records, which contain
a peer's publicly reachable listen addresses, and may be extended in the future
to contain additional metadata relevant to routing. This serves a similar
purpose to [Ethereum Node Records][eip-778] (ENRs). Like ENRs, libp2p routing
records should be extensible, so that we can add information relevant to as-yet
unknown use cases.

The record described here does not include a signature, but it is expected to
be serialized and wrapped in a [signed envelope][envelope-rfc], which will
prove the identity of the issuing peer. The dialer can then prioritize
self-certified addresses over addresses from an unknown origin.

## Problem Statement

All libp2p peers keep a "peer store", which maps [peer ids][peer-id-spec] to a
set of known addresses for each peer. When the application layer wants to
contact a peer, the dialer will pull addresses from the peer store and try to
initiate a connection on one or more addresses.

Addresses for a peer can come from a variety of sources. If we have already made
a connection to a peer, the libp2p [identify protocol][identify-spec] will
inform us of other addresses that they are listening on. We may also discover
their address by querying the DHT, checking a fixed "bootstrap list", or perhaps
through a pubsub message or an application-specific protocol.

In the case of the identify protocol, we can be fairly certain that the
addresses originate from the peer we're speaking to, assuming that we're using a
secure, authenticated communication channel. However, more "ambient" discovery
methods such as DHT traversal and pubsub depend on potentially untrustworthy
third parties to relay address information.

Even in the case of receiving addresses via the identify protocol, our
confidence that the address came directly from the peer is not actionable, because
the peer store does not track the origin of an address. Once added to the peer
store, all addresses are considered equally valid, regardless of their source.

We would like to have a means of distributing _verifiable_ address records,
which we can prove originated from the addressed peer itself. We also need a way to
track the "provenance" of an address within libp2p's internal components such as
the peer store. Once those pieces are in place, we will also need a way to
prioritize addresses based on their authenticity, with the most strict strategy
being to only dial certified addresses.

### Complications

While producing a signed record is fairly trivial, there are a few aspects to
this problem that complicate things.

1. Addresses are not static. A given peer may have several addresses at any given
time, and the set of addresses can change at arbitrary times.
2. Peers may not know their own addresses. It's often impossible to automatically
infer one's own public address, and peers may need to rely on third party
peers to inform them of their observed public addresses.
3. A peer may inadvertently or maliciously sign an address that they do not
control. In other words, a signature isn't a guarantee that a given address is
valid.
4. Some addresses may be ambiguous. For example, addresses on a private subnet
are valid within that subnet but are useless on the public internet.

The first point can be addressed by having records contain a sequence number
that increases monotonically when new records are issued, and by having newer
records replace older ones.

The other points, while worth thinking about, are out of scope for this RFC.
However, we can take care to make our records extensible so that we can add
additional metadata in the future. Some thoughts along these lines are in the
[Future Work section below](#future-work).

## Address Record Format

Here's a protobuf that might work:

```protobuf
// PeerRecord contains the listen addresses for a peer at a particular point in time.
message PeerRecord {
  // AddressInfo wraps a multiaddr. In the future, it may be extended to
  // contain additional metadata, such as "routability" (whether an address is
  // local or global, etc).
  message AddressInfo {
    bytes multiaddr = 1;
  }

  // the peer id of the subject of the record (who these addresses belong to).
  bytes peer_id = 1;

  // A monotonically increasing sequence number, used for record ordering.
  uint64 seq = 2;

  // All current listen addresses
  repeated AddressInfo addresses = 3;
}
```

The `AddressInfo` wrapper message is used instead of a bare multiaddr to allow
us to extend addresses with additional metadata [in the future](#future-work).

The `seq` field contains a sequence number that MUST increase monotonically as
new records are created. Newer records MUST have a higher `seq` value than older
records. To avoid persisting state across restarts, implementations MAY use unix
epoch time as the `seq` value, however they MUST NOT attempt to interpret a
`seq` value from another peer as a valid timestamp.
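
One way to derive `seq` from the clock while still guaranteeing a strictly increasing value is sketched below. The class name `SeqCounter` and the injectable `clock` parameter are assumptions made for illustration. The sketch only protects against repeated or backwards clock readings within a single process; it does not help if the system clock moves backwards across a restart.

```python
import time


class SeqCounter:
    """Issue strictly increasing seq values based on unix epoch seconds."""

    def __init__(self, clock=lambda: int(time.time())):
        self._clock = clock  # injectable so the behaviour can be tested
        self._last = 0

    def next(self) -> int:
        # Use the clock when it has advanced; otherwise bump the last value
        # so that two records issued in the same second still differ.
        self._last = max(self._clock(), self._last + 1)
        return self._last


counter = SeqCounter()
first = counter.next()
second = counter.next()
assert second > first
```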

#### Example

```javascript
{
  peer_id: "QmAlice...",
  seq: 1570215229,
  addresses: [
    {
      multiaddr: "/ip4/1.2.3.4/tcp/42/p2p/QmAlice",
    },
    {
      multiaddr: "/ip4/10.0.1.2/tcp/42/p2p/QmAlice",
    }
  ]
}
```

A peer SHOULD only include addresses that it believes are routable via the
public internet, ideally having confirmed that this is the case via some
external mechanism such as a successful [AutoNAT][autonat] dial-back.

In some cases we may want to include localhost or LAN-local addresses; for
example, when testing the DHT using many processes on a single machine. To
support this, implementations may use a global runtime configuration flag or
environment variable to control whether local addresses will be included.

## Certification / Verification

This structure can be serialized and contained in a [signed
envelope][envelope-rfc], which lets us issue "self-certified" address records
that are signed by the peer that the addresses belong to.

To produce a "self-certified" address, a peer will construct a `RoutingState`
(the `PeerRecord` message defined above) containing their listen addresses and
serialize it to a byte array using a protobuf encoder. The serialized record
will then be wrapped in a [signed envelope][envelope-rfc], which is signed with
the libp2p peer's private host key. The corresponding public key MUST be
included in the envelope's `public_key` field.

When receiving a `RoutingState` wrapped in a signed envelope, a peer MUST
validate the signature before deserializing the `RoutingState` record. If the
signature is invalid, the envelope MUST be discarded without deserializing the
envelope payload.

Once the signature has been verified and the `RoutingState` has been
deserialized, the receiving peer MUST verify that the `peer_id` contained in the
`RoutingState` matches the `public_key` from the envelope. If the public key in
the envelope cannot derive the peer id contained in the routing state record,
the `RoutingState` MUST be discarded.
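
The required ordering of checks can be sketched as follows. Everything here is hypothetical scaffolding: the `Envelope` and `Record` dataclasses only mirror the protobuf fields, and `verify_signature`, `decode_record`, and `peer_id_from_key` are injected stand-ins for whatever crypto and protobuf routines an implementation actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Envelope:
    public_key: bytes
    payload_type: bytes
    payload: bytes
    signature: bytes


@dataclass
class Record:
    peer_id: bytes
    seq: int
    addresses: List[bytes] = field(default_factory=list)


def open_routing_record(
    env: Envelope,
    verify_signature: Callable[[Envelope, str], bool],
    decode_record: Callable[[bytes], Record],
    peer_id_from_key: Callable[[bytes], bytes],
    domain: str = "libp2p-routing-state",
) -> Optional[Record]:
    """Return the record if all checks pass, otherwise None."""
    # 1. Validate the signature BEFORE touching the payload.
    if not verify_signature(env, domain):
        return None
    # 2. Only now deserialize the payload.
    record = decode_record(env.payload)
    # 3. The record must describe the peer that signed the envelope.
    if record.peer_id != peer_id_from_key(env.public_key):
        return None
    return record
```

Passing the helpers in as parameters keeps the sketch independent of any particular crypto or protobuf library; a real implementation would call its own routines directly.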

### Signed Envelope Domain

Signed envelopes require a "domain separation" string that defines the scope
or purpose of a signature.

When wrapping a `RoutingState` in a signed envelope, the domain string MUST be
`libp2p-routing-state`.

### Signed Envelope Payload Type

Signed envelopes contain a `payload_type` field that indicates how to interpret
the contents of the envelope.

Ideally, we should define a new multicodec for routing records, so that we can
identify them in a few bytes. While we're still spec'ing and working on the
initial implementation, we can use the UTF-8 string
`"/libp2p/routing-state-record"` as the `payload_type` value.

## Peer Store APIs

We will need to add a few methods to the peer store:

- `AddCertifiedAddrs(envelope) -> Maybe<Error>`
  - Add a self-certified address, wrapped in a signed envelope. This should
    validate the envelope signature & store the envelope for future reference.
    If any certified addresses already exist for the peer, only accept the new
    envelope if it has a greater `seq` value than existing envelopes.

- `CertifiedAddrs(peer_id) -> Set<Multiaddr>`
  - return the set of self-certified addresses for the given peer id

- `SignedRoutingState(peer_id) -> Maybe<SignedEnvelope>`
  - retrieve the signed envelope that was most recently added to the peerstore
    for the given peer, if any exists.

And possibly:

- `IsCertified(peer_id, multiaddr) -> Boolean`
  - has a particular address been self-certified by the given peer?


We'll also need a method that constructs a new `RoutingState` containing our
listen addresses and wraps it in a signed envelope. This may belong on the Host
instead of the peer store, since it needs access to the private signing key.

When adding records to the peerstore, a receiving peer MUST keep track of the
latest `seq` value received for each peer and reject incoming `RoutingState`
messages unless they contain a greater `seq` value than the last received.

After integrating the information from the `RoutingState` into the peerstore,
implementations SHOULD retain the original signed envelope. This will allow
other libp2p systems to share signed `RoutingState` records with other peers in
the network, preserving the signature of the issuing peer. The [Exchanging
Records section](#exchanging-records) lists some systems that would need
to retrieve the original signed record from the peerstore.
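
A minimal in-memory sketch of these peer store rules is shown below. It is an assumption-laden illustration: the class name `CertifiedAddrBook` is hypothetical, and unlike the `AddCertifiedAddrs(envelope)` signature proposed above, the add method here takes an already validated and decoded record (`peer_id`, `seq`, `addrs`) alongside the original envelope, so that signature checking stays out of the example.

```python
class CertifiedAddrBook:
    """Keep only the newest certified record per peer, by seq."""

    def __init__(self):
        # peer_id -> (seq, frozenset of addrs, original signed envelope)
        self._entries = {}

    def add_certified_addrs(self, peer_id, seq, addrs, envelope) -> bool:
        """Store the record unless an equal or newer one is already held."""
        current = self._entries.get(peer_id)
        if current is not None and seq <= current[0]:
            return False  # stale or replayed record: reject
        self._entries[peer_id] = (seq, frozenset(addrs), envelope)
        return True

    def certified_addrs(self, peer_id):
        entry = self._entries.get(peer_id)
        return entry[1] if entry else frozenset()

    def signed_routing_state(self, peer_id):
        """Return the retained envelope so it can be re-shared, or None."""
        entry = self._entries.get(peer_id)
        return entry[2] if entry else None

    def is_certified(self, peer_id, addr) -> bool:
        return addr in self.certified_addrs(peer_id)


book = CertifiedAddrBook()
assert book.add_certified_addrs("QmAlice", 2, ["/ip4/1.2.3.4/tcp/42"], b"env-2")
assert not book.add_certified_addrs("QmAlice", 1, ["/ip4/9.9.9.9/tcp/42"], b"env-1")
```

In this sketch a newer record replaces the older one wholesale, so addresses missing from the newer record are no longer reported as certified.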

## Dialing Strategies

Once self-certified addresses are available via the peer store, we can update
the dialer to prefer using them when possible. Some systems may want to _only_
dial self-certified addresses, so we should include some configuration options
to control whether non-certified addresses are acceptable.
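
As a rough sketch of such a dialer policy, the hypothetical function below orders candidate addresses so that certified ones come first, and drops uncertified ones when a strict flag is set. The name `dial_order` and the `certified_only` option are assumptions for illustration, not part of any existing libp2p API.

```python
def dial_order(known_addrs, certified_addrs, certified_only=False):
    """Return addresses to try, self-certified ones first.

    known_addrs: all addresses in the peer store for the peer, in order.
    certified_addrs: the subset that came from a valid signed record.
    certified_only: if True, never dial an uncertified address.
    """
    certified_set = set(certified_addrs)
    certified = [a for a in known_addrs if a in certified_set]
    if certified_only:
        return certified
    uncertified = [a for a in known_addrs if a not in certified_set]
    return certified + uncertified


addrs = ["/ip4/10.0.1.2/tcp/42", "/ip4/1.2.3.4/tcp/42"]
assert dial_order(addrs, {"/ip4/1.2.3.4/tcp/42"}) == [
    "/ip4/1.2.3.4/tcp/42",
    "/ip4/10.0.1.2/tcp/42",
]
```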

## Exchanging Records

We currently have several systems in libp2p that deal with peer addressing and
which could be updated to use signed routing records:

- Public peer discovery using [libp2p's DHT][dht-spec]
- Local peer discovery with [mDNS][mdns-spec]
- Direct exchange using the [identify protocol][identify-spec]
- Service discovery via the [rendezvous protocol][rendezvous-spec]
- A proposal for [a public peer exchange protocol][pex-proposal]

Of these, the highest priority for updating seems to be the DHT, since it's
actively used by several deployed systems and is vulnerable to routing attacks
by malicious peers. We should work on extending the `FIND_NODE`, `ADD_PROVIDER`,
and `GET_PROVIDERS` RPC messages to support returning signed records in addition
to the unsigned address information they currently support.

We should also either define a new "secure peer routing" interface or extend the
existing peer routing interfaces to support signed records, so that we don't end
up with a bunch of similar but incompatible APIs for exchanging signed address
records.

## Future Work

Some things that were originally considered in this RFC were trimmed so that we
can focus on delivering a basic self-certified record, which is a pressing need.

This includes a notion of "routability", which could be used to communicate
whether a given address is global (reachable via the public internet),
LAN-local, etc. We may also want to include some kind of confidence score or
priority ranking, so that peers can communicate which addresses they would
prefer other peers to use.

To allow these fields to be added in the future, we wrap multiaddrs in the
`AddressInfo` message instead of having the `addresses` field be a list of "raw"
multiaddrs.

Another potentially useful extension would be a compact protocol table or bloom
filter that could be used to test whether a peer supports a given protocol
before interacting with them directly. This could be added as a new field in the
`RoutingState` message.



[identify-spec]: ../identify/README.md
[peer-id-spec]: ../peer-ids/peer-ids.md
[mdns-spec]: ../discovery/mdns.md
[rendezvous-spec]: ../rendezvous/README.md
[pex-proposal]: https://github.com/libp2p/notes/issues/7
[autonat]: https://github.com/libp2p/specs/issues/180
[envelope-rfc]: ./0002-signed-envelopes.md
[eip-778]: https://eips.ethereum.org/EIPS/eip-778
