From 77e3b6689466ed894c6752ab9091ced38ede3e88 Mon Sep 17 00:00:00 2001 From: Yusef Napora Date: Fri, 4 Oct 2019 16:00:59 -0400 Subject: [PATCH 01/17] rfc: 1st draft for signed address records --- RFC/0002-signed-address-records.md | 242 +++++++++++++++++++++++++++++ 1 file changed, 242 insertions(+) create mode 100644 RFC/0002-signed-address-records.md diff --git a/RFC/0002-signed-address-records.md b/RFC/0002-signed-address-records.md new file mode 100644 index 000000000..d83ab6b50 --- /dev/null +++ b/RFC/0002-signed-address-records.md @@ -0,0 +1,242 @@ +# RFC 0002 - Signed Address Records + +- Start Date: 2019-10-04 +- Related Issues: + - [libp2p/issues/47](https://github.com/libp2p/libp2p/issues/47) + - [go-libp2p/issues/436](https://github.com/libp2p/go-libp2p/issues/436) + +## Abstract + +This RFC proposes a method for distributing _self-certified_ address records, +which contain a peer's publicly reachable listen addresses. The record also +includes a signature, which proves that the record was produced by the peer +itself and not tampered with in transit. + +## Problem Statement + +All libp2p peers keep a "peer store" (called a peer book in some +implementations), which maps [peer ids][peer-id-spec] to a set of known +addresses for each peer. When the application layer wants to contact a peer, the +dialer will pull addresses from the peer store and try to initiate a connection +on one or more addresses. + +Addresses for a peer can come from a variety of sources. If we have already made +a connection to a peer, the libp2p [identify protocol][identify-spec] will +inform us of other addresses that they are listening on. We may also discover +their address by querying the DHT, checking a fixed "bootstrap list", or perhaps +through a pubsub message or an application-specific protocol. 
+ +In the case of the identify protocol, we can be fairly certain that the +addresses originate from the peer we're speaking to, assuming that we're using a +secure, authenticated communication channel. However, more "ambient" discovery +methods such as DHT traversal and pubsub depend on potentially untrustworthy +third parties to relay address information. + +Even in the case of receiving addresses via the identify protocol, our +confidence that the address came directly from the peer is not actionable, because +the peer store does not track the origin of an address. Once added to the peer +store, all addresses are considered equally valid, regardless of their source. + +We would like to have a means of distributing _verifiable_ address records, +which we can prove originated from the addressed peer itself. We also need a way to +track the "provenance" of an address within libp2p's internal components such as +the peer store. Once those pieces are in place, we will also need a way to +prioritize addresses based on their authenticity, with the most strict strategy +being to only dial certified addresses. + +### Complications + +While producing a signed record is fairly trivial, there are a few aspects to +this problem that complicate things. + +1. Addresses are not static. A given peer may have several addresses at any given + time, and the set of addresses can change at arbitrary times. +2. Peers may not know their own addresses. It's often impossible to automatically + infer one's own public address, and peers may need to rely on third party + peers to inform them of their observed public addresses. +3. A peer may inadvertently or maliciously sign an address that they do not + control. In other words, a signature isn't a guarantee that a given address is + valid. +4. Some addresses may be ambiguous. For example, addresses on a private subnet + are valid within that subnet but are useless on the public internet. 
+ +The first point implies that the address record should include some kind of +temporal component, so that newer records can replace older ones as the state +changes over time. This could be a timestamp and/or a simple sequence number +that each node increments whenever they publish a new record. + +The second and third points highlight the limits of certifying information that +is itself uncertain. While a signature can prove that the addresses originated +from the peer, it cannot prove that the addresses are correct or useful. Given +the asymmetric nature of real-world NATs, it's often the case that a peer is +_less likely_ to have correct information about its own address than an outside +observer, at least initially. + +This suggests that we should include some measure of "confidence" in our +records, so that peers can distribute addresses that they are not fully certain +are correct, while still asserting that they created the record. For example, +when requesting a dial-back via the [AutoNAT service][autonat], a peer could +send a "provisional" address record. When the AutoNAT peer confirms the address, +that address could be marked as publicly-routable and advertised in a new record. + +Regarding the fourth point about ambiguous addresses, it would also be desirable +for the address record to include a notion of "routability," which would +indicate how "accessible" the address is likely to be. This would allow us to +mark an address as "LAN-only," if we know that it is not mapped to a publicly +reachable address but would still like to distribute it to local peers. + +## Address Record Format + +There are many potential data structures that we could use to store and transmit +address information. This section sketches out a possible design using +[IPLD][ipld], although we may end up adopting a different format. Everything in +this section is subject to change as part of the RFC process. 
+ +These types are defined using IPLD's Schema notation, the best reference for +which I'm currently aware of is [its own schema definition][ipld-schema-schema]. + +```sh + +## How accessible we believe a given address to be. +## Maybe include params? We could potentially have a subnet mask for local addresses +type Routability enum { + | "GLOBAL" ## Available on the public internet + | "LOCAL" ## Available on a local network (probably in a private address range) + | "LOOPBACK" ## Available on a loopback address on the same machine + | "UNKNOWN" ## Catch all (may include in-memory transports, etc) +} + +## How confident we are in the validity of an address +type Confidence enum { + | "CONFIRMED" ## We have verified that we're reachable on this address + | "UNCONFIRMED" ## We suspect, but have not confirmed that we're reachable + | "INVALID" ## We know that this address is invalid and should be deleted + | "UNKNOWN" ## No assertions about validity one way or another +} + +## A tuple of an address, how "routable" (public / private, etc) the address is, +## and how confident we are in its validity. +type AddressInfo struct { + addr Bytes ## Binary multiaddr + routability Routability + confidence Confidence +} + +## A point-in-time snapshot of all addresses (plus their info) that we know +## about at the time we issued the record. +## +type AddressState struct { + ## The subject of this record. Who do these addresses belong to? + subject PeerRef + + ## When was this record constructed? + issuedAt Timestamp + + ## A list of all AddressInfo records that apply at the current moment. + addresses List { + valueType &AddressInfo + } +} + +## A signed envelope containing an `AddressState` struct, our +## public key, and a signature of the state (verifiable with public key). +type AddressEnvelope { + state AddressState + + # Public key of issuer. + pubkey Bytes + + # Signature of `state`. Can be verified with `pubkey`. 
+ # Maybe it's better to sign a merkle link to `state` instead... + sig Bytes +} + +## Unix epoch timestamp, UTC timezone. TODO: what precision? +type Timestamp Int + +# binary multihash of public key +type PeerId Bytes + +## A peer id, plus a peer-specific version clock. +## Represents a peer _at a moment in time_, where time is loosely defined as +## unit-less quantity that's always increasing. Version +## numbers must increase monotonically but do not need to be strictly +## sequential. If you don't want to preserve state across restarts or coordinate +## a counter, you can use epoch timestamps as version numbers. +type PeerRef struct { + peer PeerId + version Int +} +``` + +The idea with the structure above is that you send some metadata along with your +addresses: your "routability", and your own confidence in the validity of the +address. This is wrapped in an `AddressInfo` struct along with the address +itself. + +Then you have a big list of `AddressInfo`s, which we put in an `AddressState`. +An `AddressState` identifies the `subject` of the record, who is also the +issuing peer. We could potentially split that out into a separate `subject` and +`issuer` field, which would let peers make statements about each other in +addition to making statements about themselves. That complicates things though, +and may not be worth it. + +The state and a signature of it are wrapped in an `AddressEnvelope`, along with +the public key that produced the signature. Recipients must validate that the +public key is consistent with the peer id of the `subject` and validate the sig. + +Here's an example. Alice has an address that she thinks is publicly reachable +but has not confirmed. 
She also has a LAN-local address that she knows is valid, +but not routable via the public internet: + +```javascript + { + + pubkey: "", + state: { + subject: { + peer: "QmAlice...", + version: 23456 + }, + issuedAt: 1570215229, + + addresses: [ + { + addr: "/ip4/1.2.3.4/tcp/42/p2p/QmAlice", + routability: "GLOBAL", + confidence: "UNCONFIRMED" + }, + { + addr: "/ip4/10.0.1.2/tcp/42/p2p/QmAlice", + routability: "LOCAL", + confidence: "CONFIRMED" + } + ] + }, + sig: "" + } +``` + +If Alice wants to publish her address to a public shared resource like a DHT, +she should omit `LOCAL` and other unreachable addresses, and peers should +likewise filter out `LOCAL` addresses from public sources. + +## TODO + +Some things I'd like to cover but haven't got to or figured out yet: + +- how to store signed records + - should be separate from "working set" that's optimized for retrieval + - need to store unaltered bytes +- how to surface routability and confidence via peerstore APIs +- figure out if IPLD is the way to go here. If not, what serialization format, + etc. +- extend identify protocol to include signed records? +- how are addresses prioritized when dialing? 
+ + +[identify-spec]: ../identify/README.md +[peer-id-spec]: ../peer-ids/peer-ids.md +[autonat]: https://github.com/libp2p/specs/issues/180 +[ipld]: https://ipld.io/ +[ipld-schema-schema]: https://github.com/ipld/specs/blob/master/schemas/schema-schema.ipldsch From 5351d94fe63c2287632d9a7d51474314e3119890 Mon Sep 17 00:00:00 2001 From: Yusef Napora Date: Mon, 21 Oct 2019 10:32:37 -0400 Subject: [PATCH 02/17] wip - use protobuf instead of IPLD --- RFC/0002-signed-address-records.md | 161 +++++++++++++++-------------- 1 file changed, 85 insertions(+), 76 deletions(-) diff --git a/RFC/0002-signed-address-records.md b/RFC/0002-signed-address-records.md index d83ab6b50..063016dc9 100644 --- a/RFC/0002-signed-address-records.md +++ b/RFC/0002-signed-address-records.md @@ -77,7 +77,7 @@ records, so that peers can distribute addresses that they are not fully certain are correct, while still asserting that they created the record. For example, when requesting a dial-back via the [AutoNAT service][autonat], a peer could send a "provisional" address record. When the AutoNAT peer confirms the address, -that address could be marked as publicly-routable and advertised in a new record. +that address could be marked as confirmed and advertised in a new record. Regarding the fourth point about ambiguous addresses, it would also be desirable for the address record to include a notion of "routability," which would @@ -87,85 +87,84 @@ reachable address but would still like to distribute it to local peers. ## Address Record Format -There are many potential data structures that we could use to store and transmit -address information. This section sketches out a possible design using -[IPLD][ipld], although we may end up adopting a different format. Everything in -this section is subject to change as part of the RFC process. - -These types are defined using IPLD's Schema notation, the best reference for -which I'm currently aware of is [its own schema definition][ipld-schema-schema]. 
- -```sh - -## How accessible we believe a given address to be. -## Maybe include params? We could potentially have a subnet mask for local addresses -type Routability enum { - | "GLOBAL" ## Available on the public internet - | "LOCAL" ## Available on a local network (probably in a private address range) - | "LOOPBACK" ## Available on a loopback address on the same machine - | "UNKNOWN" ## Catch all (may include in-memory transports, etc) -} - -## How confident we are in the validity of an address -type Confidence enum { - | "CONFIRMED" ## We have verified that we're reachable on this address - | "UNCONFIRMED" ## We suspect, but have not confirmed that we're reachable - | "INVALID" ## We know that this address is invalid and should be deleted - | "UNKNOWN" ## No assertions about validity one way or another -} +Here's a protobuf that might work: + +```protobuf +// Routability indicates the "scope" of an address, meaning how visible +// or accessible it is. This allows us to distinguish between LAN and +// WAN addresses. +// +// Side Note: we could potentially have a GLOBAL_RELAY case, which would +// make it easy to prioritize non-relay addresses in the dialer. Bit of +// a mix of concerns though. +enum Routability { + // catch-all default / unknown scope + UNKNOWN = 1; + + // another process on the same machine + LOOPBACK = 2; + + // a local area network + LOCAL = 3; + + // public internet + GLOBAL = 4; -## A tuple of an address, how "routable" (public / private, etc) the address is, -## and how confident we are in its validity. -type AddressInfo struct { - addr Bytes ## Binary multiaddr - routability Routability - confidence Confidence + // reserved for future use + INTERPLANETARY = 100; } -## A point-in-time snapshot of all addresses (plus their info) that we know -## about at the time we issued the record. -## -type AddressState struct { - ## The subject of this record. Who do these addresses belong to? - subject PeerRef - ## When was this record constructed? 
- issuedAt Timestamp +// Confidence indicates how much we believe in the validity of the +// address. +enum Confidence { + // default, unknown confidence. we don't know one way or another + UNKNOWN = 1; - ## A list of all AddressInfo records that apply at the current moment. - addresses List { - valueType &AddressInfo - } -} - -## A signed envelope containing an `AddressState` struct, our -## public key, and a signature of the state (verifiable with public key). -type AddressEnvelope { - state AddressState + // INVALID means we know that this address is invalid and should be deleted + INVALID = 2; + + // UNCONFIRMED means that we suspect this address is valid, but we haven't + // fully confirmed that we're reachable. + UNCONFIRMED = 3; - # Public key of issuer. - pubkey Bytes + // CONFIRMED means that we fully believe this address is valid. + // Each node / implementation can have their own criteria for confirmation. + CONFIRMED = 4; +} - # Signature of `state`. Can be verified with `pubkey`. - # Maybe it's better to sign a merkle link to `state` instead... - sig Bytes +// AddressInfo is a multiaddr plus some metadata. +message AddressInfo { + bytes multiaddr = 1; + Routability routability = 2; + Confidence confidence = 3; } -## Unix epoch timestamp, UTC timezone. TODO: what precision? -type Timestamp Int - -# binary multihash of public key -type PeerId Bytes - -## A peer id, plus a peer-specific version clock. -## Represents a peer _at a moment in time_, where time is loosely defined as -## unit-less quantity that's always increasing. Version -## numbers must increase monotonically but do not need to be strictly -## sequential. If you don't want to preserve state across restarts or coordinate -## a counter, you can use epoch timestamps as version numbers. -type PeerRef struct { - peer PeerId - version Int +// AddressState contains the listen addresses (and their metadata) +// for a peer at a particular point in time. 
+// +// Although this record contains a wall-clock `issuedAt` timestamp, +// there are no guarantees about node clocks being in sync or correct. +// As such, the `issuedAt` field should be considered informational, +// and `seqno` should be preferred when ordering records. +message AddressState { + // the peer id of the subject of the record. + bytes subjectPeer = 1; + + // `seqno` is an increment-only counter that can be used to + // order AddressState records chronologically. Newer records + // MUST have a higher `seqno` than older records, but there + // can be gaps between sequence numbers. + uint64 seqno = 2; + + // The `issuedAt` timestamp stores the creation time of this record in + // seconds from the unix epoch, according to the issuer's clock. There + // are no guarantees about clock sync or correctness. SHOULD NOT be used + // to order AddressState records; use `seqno` instead. + uint64 issuedAt = 3; + + // All current listen addresses and their metadata. + repeated AddressInfo addresses = 4; } ``` @@ -175,11 +174,9 @@ address. This is wrapped in an `AddressInfo` struct along with the address itself. Then you have a big list of `AddressInfo`s, which we put in an `AddressState`. -An `AddressState` identifies the `subject` of the record, who is also the -issuing peer. We could potentially split that out into a separate `subject` and -`issuer` field, which would let peers make statements about each other in -addition to making statements about themselves. That complicates things though, -and may not be worth it. +An `AddressState` identifies the `subject` of the record, + +### TODO: rewrite this to use generic envelope The state and a signature of it are wrapped in an `AddressEnvelope`, along with the public key that produced the signature. 
Recipients must validate that the @@ -221,6 +218,18 @@ If Alice wants to publish her address to a public shared resource like a DHT, she should omit `LOCAL` and other unreachable addresses, and peers should likewise filter out `LOCAL` addresses from public sources. +## Signature Production & Validation + +TK: describe signing and validating the `AddressState` structure. + + +## Peer Store APIs + + + +## Dialing Strategies + + ## TODO Some things I'd like to cover but haven't got to or figured out yet: From 8d10f25e278723b69ea29364cb8f706ddec4c7d2 Mon Sep 17 00:00:00 2001 From: Yusef Napora Date: Mon, 21 Oct 2019 11:35:11 -0400 Subject: [PATCH 03/17] split into RFCs for signed envelope / addr records --- RFC/0002-signed-envelopes.md | 94 +++++++++++++ RFC/0003-address-records.md | 246 +++++++++++++++++++++++++++++++++++ 2 files changed, 340 insertions(+) create mode 100644 RFC/0002-signed-envelopes.md create mode 100644 RFC/0003-address-records.md diff --git a/RFC/0002-signed-envelopes.md b/RFC/0002-signed-envelopes.md new file mode 100644 index 000000000..7a1bfcb21 --- /dev/null +++ b/RFC/0002-signed-envelopes.md @@ -0,0 +1,94 @@ +# RFC 0002 - Signed Envelopes + +- Start Date: 2019-10-21 +- Related RFC: [0003 Address Records][addr-records-rfc] + +## Abstract + +This RFC proposes a "signed envelope" structure that contains an arbitrary byte +string payload, a signature of the payload, and the public key that can be used +to verify the signature. + +This was spun out of an earlier draft of the [address records +RFC][addr-records-rfc], since it's generically useful. + +## Problem Statement + +Sometimes we'd like to store some data in a public location (e.g. a DHT), +or make use of potentially untrustworthy intermediaries to relay information. It +would be nice to have an all-purpose data container that includes a signature of +the data, so we can verify that the data came from a specific peer and that it hasn't +been tampered with. 
+ +## Wire Format + +Since we already have a [protobuf definition for public keys][peer-id-spec], we +can use protobuf for this as well and easily embed the key in the envelope: + + +```protobuf +message SignedEnvelope { + PublicKey publicKey = 1; // see peer id spec for definition + string purpose = 2; // arbitrary user-defined string for context + bytes cid = 3; // CIDv1 of contents + bytes contents = 4; // payload + bytes signature = 5; // signature of purpose + cid + contents +} +``` + +The `publicKey` field contains the public key whose secret counterpart was used +to sign the message. This MUST be consistent with the peer id of the signing +peer, as the recipient will derive the peer id of the signer from this key. + +The `purpose` field is an arbitrary string that can be used to give some hint +as to the contents. For example, if `contents` contains a serialized +`AddressState` record, `purpose` might contain the string `"AddressState"`. The +contents of the `purpose` field are signed alongside `contents` to prevent +tampering, and may be empty if desired. + +The `cid` field contains a version 1 [CID][cid] (content id) that corresponds to +the `contents` field. It's used for retrieving messages from [local +storage](#local-storage-of-signed-envelopes), and the embedded multicodec also +gives a hint as to the data type of the `contents`. If the user does not specify +a multicodec when constructing the envelope, the default will be +[`raw`](https://github.com/multiformats/multicodec/blob/master/table.csv#L34) +for raw binary. + +## Signature Production / Verification + +When signing, a peer will prepare a buffer by concatenating the following: + +- The string `"libp2p-signed-envelope:"`, encoded as UTF-8 +- The `purpose` field, encoded as UTF-8 +- The `cid` field +- The `contents` field + +Then they will sign the buffer according to the rules in the [peer id +spec][peer-id-spec] and set the `signature` field accordingly. 
+ +To verify, a peer will "inflate" the `publicKey` into a domain object that can +verify signatures, prepare a buffer as above and verify the `signature` field +against it. + +## Local Storage of Signed Envelopes + +Signed envelopes can be used for ephemeral data, but we may also want to persist +them for a while and / or make previously received envelopes accessible to +various libp2p modules. + +For example, if the envelope contains an [address record][addr-records-rfc], +those records might be used to populate a peer store with self-certified +records. Rather than requiring the peer store to persist the full envelope, we +could have a separate "envelope storage" service that keeps signed messages +around for future reference. + +The peer store can then just store the `cid` alongside a flag that indicates +that the address came from a trusted source. If we're using a persistent peer +store and the process restarts, we can look up the stored `cid` in the envelope +storage and verify the signature again. + +If we decide to build this, the storage service should have some kind of garbage +collection / TTL scheme to avoid unbounded growth. + +[addr-records-rfc]: ./0003-address-records.md +[peer-id-spec]: ../peer-ids/peer-ids.md diff --git a/RFC/0003-address-records.md b/RFC/0003-address-records.md new file mode 100644 index 000000000..feaf63f0a --- /dev/null +++ b/RFC/0003-address-records.md @@ -0,0 +1,246 @@ +# RFC 0003 - Address Records with Metadata + +- Start Date: 2019-10-04 +- Related Issues: + - [libp2p/issues/47](https://github.com/libp2p/libp2p/issues/47) + - [go-libp2p/issues/436](https://github.com/libp2p/go-libp2p/issues/436) + +## Abstract + +This RFC proposes a method for distributing address records, which contain a +peer's publicly reachable listen addresses, as well as some metadata that can +help other peers categorize addresses and prioritize them when dialing. 
+ +The record described here does not include a signature, but it is expected to +be serialized and wrapped in a [signed envelope][envelope-rfc], which will +prove the identity of the issuing peer. The dialer can then prioritize +self-certified addresses over addresses from an unknown origin. + +## Problem Statement + +All libp2p peers keep a "peer store" (called a peer book in some +implementations), which maps [peer ids][peer-id-spec] to a set of known +addresses for each peer. When the application layer wants to contact a peer, the +dialer will pull addresses from the peer store and try to initiate a connection +on one or more addresses. + +Addresses for a peer can come from a variety of sources. If we have already made +a connection to a peer, the libp2p [identify protocol][identify-spec] will +inform us of other addresses that they are listening on. We may also discover +their address by querying the DHT, checking a fixed "bootstrap list", or perhaps +through a pubsub message or an application-specific protocol. + +In the case of the identify protocol, we can be fairly certain that the +addresses originate from the peer we're speaking to, assuming that we're using a +secure, authenticated communication channel. However, more "ambient" discovery +methods such as DHT traversal and pubsub depend on potentially untrustworthy +third parties to relay address information. + +Even in the case of receiving addresses via the identify protocol, our +confidence that the address came directly from the peer is not actionable, because +the peer store does not track the origin of an address. Once added to the peer +store, all addresses are considered equally valid, regardless of their source. + +We would like to have a means of distributing _verifiable_ address records, +which we can prove originated from the addressed peer itself. We also need a way to +track the "provenance" of an address within libp2p's internal components such as +the peer store. 
Once those pieces are in place, we will also need a way to +prioritize addresses based on their authenticity, with the most strict strategy +being to only dial certified addresses. + +### Complications + +While producing a signed record is fairly trivial, there are a few aspects to +this problem that complicate things. + +1. Addresses are not static. A given peer may have several addresses at any given + time, and the set of addresses can change at arbitrary times. +2. Peers may not know their own addresses. It's often impossible to automatically + infer one's own public address, and peers may need to rely on third party + peers to inform them of their observed public addresses. +3. A peer may inadvertently or maliciously sign an address that they do not + control. In other words, a signature isn't a guarantee that a given address is + valid. +4. Some addresses may be ambiguous. For example, addresses on a private subnet + are valid within that subnet but are useless on the public internet. + +The first point implies that the address record should include some kind of +temporal component, so that newer records can replace older ones as the state +changes over time. This could be a timestamp and/or a simple sequence number +that each node increments whenever they publish a new record. + +The second and third points highlight the limits of certifying information that +is itself uncertain. While a signature can prove that the addresses originated +from the peer, it cannot prove that the addresses are correct or useful. Given +the asymmetric nature of real-world NATs, it's often the case that a peer is +_less likely_ to have correct information about its own address than an outside +observer, at least initially. + +This suggests that we should include some measure of "confidence" in our +records, so that peers can distribute addresses that they are not fully certain +are correct, while still asserting that they created the record. 
For example, +when requesting a dial-back via the [AutoNAT service][autonat], a peer could +send a "provisional" address record. When the AutoNAT peer confirms the address, +that address could be marked as confirmed and advertised in a new record. + +Regarding the fourth point about ambiguous addresses, it would also be desirable +for the address record to include a notion of "routability," which would +indicate how "accessible" the address is likely to be. This would allow us to +mark an address as "LAN-only," if we know that it is not mapped to a publicly +reachable address but would still like to distribute it to local peers. + +## Address Record Format + +Here's a protobuf that might work: + +```protobuf +// Routability indicates the "scope" of an address, meaning how visible +// or accessible it is. This allows us to distinguish between LAN and +// WAN addresses. +// +// Side Note: we could potentially have a GLOBAL_RELAY case, which would +// make it easy to prioritize non-relay addresses in the dialer. Bit of +// a mix of concerns though. +enum Routability { + // catch-all default / unknown scope + UNKNOWN = 1; + + // another process on the same machine + LOOPBACK = 2; + + // a local area network + LOCAL = 3; + + // public internet + GLOBAL = 4; + + // reserved for future use + INTERPLANETARY = 100; +} + + +// Confidence indicates how much we believe in the validity of the +// address. +enum Confidence { + // default, unknown confidence. we don't know one way or another + UNKNOWN = 1; + + // INVALID means we know that this address is invalid and should be deleted + INVALID = 2; + + // UNCONFIRMED means that we suspect this address is valid, but we haven't + // fully confirmed that we're reachable. + UNCONFIRMED = 3; + + // CONFIRMED means that we fully believe this address is valid. + // Each node / implementation can have their own criteria for confirmation. + CONFIRMED = 4; +} + +// AddressInfo is a multiaddr plus some metadata. 
+message AddressInfo { + bytes multiaddr = 1; + Routability routability = 2; + Confidence confidence = 3; +} + +// AddressState contains the listen addresses (and their metadata) +// for a peer at a particular point in time. +// +// Although this record contains a wall-clock `issuedAt` timestamp, +// there are no guarantees about node clocks being in sync or correct. +// As such, the `issuedAt` field should be considered informational, +// and `version` should be preferred when ordering records. +message AddressState { + // the peer id of the subject of the record. + bytes subjectPeer = 1; + + // `version` is an increment-only counter that can be used to + // order AddressState records chronologically. Newer records + // MUST have a higher `version` than older records, but there + // can be gaps between version numbers. + uint64 version = 2; + + // The `issuedAt` timestamp stores the creation time of this record in + // seconds from the unix epoch, according to the issuer's clock. There + // are no guarantees about clock sync or correctness. SHOULD NOT be used + // to order AddressState records; use `version` instead. + uint64 issuedAt = 3; + + // All current listen addresses and their metadata. + repeated AddressInfo addresses = 4; +} +``` + +The idea with the structure above is that you send some metadata along with your +addresses: your "routability", and your own confidence in the validity of the +address. This is wrapped in an `AddressInfo` struct along with the address +itself. + +Then you have a big list of `AddressInfo`s, which we put in an `AddressState`. +An `AddressState` identifies the `subject` of the record, + + +#### Example + +Here's an example. Alice has an address that she thinks is publicly reachable +but has not confirmed. 
She also has a LAN-local address that she knows is valid, +but not routable via the public internet: + +```javascript + { + subjectPeer: "QmAlice...", + version: 23456, + issuedAt: 1570215229, + + addresses: [ + { + addr: "/ip4/1.2.3.4/tcp/42/p2p/QmAlice", + routability: "GLOBAL", + confidence: "UNCONFIRMED" + }, + { + addr: "/ip4/10.0.1.2/tcp/42/p2p/QmAlice", + routability: "LOCAL", + confidence: "CONFIRMED" + } + ] + } +``` + +If Alice wants to publish her address to a public shared resource like a DHT, +she should omit `LOCAL` and other unreachable addresses, and peers should +likewise filter out `LOCAL` addresses from public sources. + +## Certification / Verification + +This structure can be contained in a [signed envelope][envelope-rfc], which lets +us issue "self-certified" address records that are signed by the `subjectPeer`. + +## Peer Store APIs + + + +## Dialing Strategies + + +## TODO + +Some things I'd like to cover but haven't got to or figured out yet: + +- how to store signed records + - should be separate from "working set" that's optimized for retrieval + - need to store unaltered bytes +- how to surface routability and confidence via peerstore APIs +- figure out if IPLD is the way to go here. If not, what serialization format, + etc. +- extend identify protocol to include signed records? +- how are addresses prioritized when dialing? 
+ + +[identify-spec]: ../identify/README.md +[peer-id-spec]: ../peer-ids/peer-ids.md +[autonat]: https://github.com/libp2p/specs/issues/180 +[ipld]: https://ipld.io/ +[ipld-schema-schema]: https://github.com/ipld/specs/blob/master/schemas/schema-schema.ipldsch +[envelope-rfc]: ./0002-signed-envelopes.md From 59f660b55d2ba64ea0b37bbe101917e8a4231359 Mon Sep 17 00:00:00 2001 From: Yusef Napora Date: Mon, 21 Oct 2019 11:53:47 -0400 Subject: [PATCH 04/17] wip discussion of peerstore API changes --- RFC/0002-signed-address-records.md | 251 ----------------------------- RFC/0003-address-records.md | 47 ++++-- 2 files changed, 34 insertions(+), 264 deletions(-) delete mode 100644 RFC/0002-signed-address-records.md diff --git a/RFC/0002-signed-address-records.md b/RFC/0002-signed-address-records.md deleted file mode 100644 index 063016dc9..000000000 --- a/RFC/0002-signed-address-records.md +++ /dev/null @@ -1,251 +0,0 @@ -# RFC 0002 - Signed Address Records - -- Start Date: 2019-10-04 -- Related Issues: - - [libp2p/issues/47](https://github.com/libp2p/libp2p/issues/47) - - [go-libp2p/issues/436](https://github.com/libp2p/go-libp2p/issues/436) - -## Abstract - -This RFC proposes a method for distributing _self-certified_ address records, -which contain a peer's publicly reachable listen addresses. The record also -includes a signature, which proves that the record was produced by the peer -itself and not tampered with in transit. - -## Problem Statement - -All libp2p peers keep a "peer store" (called a peer book in some -implementations), which maps [peer ids][peer-id-spec] to a set of known -addresses for each peer. When the application layer wants to contact a peer, the -dialer will pull addresses from the peer store and try to initiate a connection -on one or more addresses. - -Addresses for a peer can come from a variety of sources. 
If we have already made -a connection to a peer, the libp2p [identify protocol][identify-spec] will -inform us of other addresses that they are listening on. We may also discover -their address by querying the DHT, checking a fixed "bootstrap list", or perhaps -through a pubsub message or an application-specific protocol. - -In the case of the identify protocol, we can be fairly certain that the -addresses originate from the peer we're speaking to, assuming that we're using a -secure, authenticated communication channel. However, more "ambient" discovery -methods such as DHT traversal and pubsub depend on potentially untrustworthy -third parties to relay address information. - -Even in the case of receiving addresses via the identify protocol, our -confidence that the address came directly from the peer is not actionable, because -the peer store does not track the origin of an address. Once added to the peer -store, all addresses are considered equally valid, regardless of their source. - -We would like to have a means of distributing _verifiable_ address records, -which we can prove originated from the addressed peer itself. We also need a way to -track the "provenance" of an address within libp2p's internal components such as -the peer store. Once those pieces are in place, we will also need a way to -prioritize addresses based on their authenticity, with the most strict strategy -being to only dial certified addresses. - -### Complications - -While producing a signed record is fairly trivial, there are a few aspects to -this problem that complicate things. - -1. Addresses are not static. A given peer may have several addresses at any given - time, and the set of addresses can change at arbitrary times. -2. Peers may not know their own addresses. It's often impossible to automatically - infer one's own public address, and peers may need to rely on third party - peers to inform them of their observed public addresses. -3. 
A peer may inadvertently or maliciously sign an address that they do not - control. In other words, a signature isn't a guarantee that a given address is - valid. -4. Some addresses may be ambiguous. For example, addresses on a private subnet - are valid within that subnet but are useless on the public internet. - -The first point implies that the address record should include some kind of -temporal component, so that newer records can replace older ones as the state -changes over time. This could be a timestamp and/or a simple sequence number -that each node increments whenever they publish a new record. - -The second and third points highlight the limits of certifying information that -is itself uncertain. While a signature can prove that the addresses originated -from the peer, it cannot prove that the addresses are correct or useful. Given -the asymmetric nature of real-world NATs, it's often the case that a peer is -_less likely_ to have correct information about its own address than an outside -observer, at least initially. - -This suggests that we should include some measure of "confidence" in our -records, so that peers can distribute addresses that they are not fully certain -are correct, while still asserting that they created the record. For example, -when requesting a dial-back via the [AutoNAT service][autonat], a peer could -send a "provisional" address record. When the AutoNAT peer confirms the address, -that address could be marked as confirmed and advertised in a new record. - -Regarding the fourth point about ambiguous addresses, it would also be desirable -for the address record to include a notion of "routability," which would -indicate how "accessible" the address is likely to be. This would allow us to -mark an address as "LAN-only," if we know that it is not mapped to a publicly -reachable address but would still like to distribute it to local peers. 
- -## Address Record Format - -Here's a protobuf that might work: - -```protobuf -// Routability indicates the "scope" of an address, meaning how visible -// or accessible it is. This allows us to distinguish between LAN and -// WAN addresses. -// -// Side Note: we could potentially have a GLOBAL_RELAY case, which would -// make it easy to prioritize non-relay addresses in the dialer. Bit of -// a mix of concerns though. -enum Routability { - // catch-all default / unknown scope - UNKNOWN = 1; - - // another process on the same machine - LOOPBACK = 2; - - // a local area network - LOCAL = 3; - - // public internet - GLOBAL = 4; - - // reserved for future use - INTERPLANETARY = 100; -} - - -// Confidence indicates how much we believe in the validity of the -// address. -enum Confidence { - // default, unknown confidence. we don't know one way or another - UNKNOWN = 1; - - // INVALID means we know that this address is invalid and should be deleted - INVALID = 2; - - // UNCONFIRMED means that we suspect this address is valid, but we haven't - // fully confirmed that we're reachable. - UNCONFIRMED = 3; - - // CONFIRMED means that we fully believe this address is valid. - // Each node / implementation can have their own criteria for confirmation. - CONFIRMED = 4; -} - -// AddressInfo is a multiaddr plus some metadata. -message AddressInfo { - bytes multiaddr = 1; - Routability routability = 2; - Confidence confidence = 3; -} - -// AddressState contains the listen addresses (and their metadata) -// for a peer at a particular point in time. -// -// Although this record contains a wall-clock `issuedAt` timestamp, -// there are no guarantees about node clocks being in sync or correct. -// As such, the `issuedAt` field should be considered informational, -// and `seqno` should be preferred when ordering records. -message AddressState { - // the peer id of the subject of the record. 
- bytes subjectPeer = 1; - - // `seqno` is an increment-only counter that can be used to - // order AddressState records chronologically. Newer records - // MUST have a higher `seqno` than older records, but there - // can be gaps between sequence numbers. - uint64 seqno = 2; - - // The `issuedAt` timestamp stores the creation time of this record in - // seconds from the unix epoch, according to the issuer's clock. There - // are no guarantees about clock sync or correctness. SHOULD NOT be used - // to order AddressState records; use `seqno` instead. - uint64 issuedAt = 3; - - // All current listen addresses and their metadata. - repeated AddressInfo addresses = 4; -} -``` - -The idea with the structure above is that you send some metadata along with your -addresses: your "routability", and your own confidence in the validity of the -address. This is wrapped in an `AddressInfo` struct along with the address -itself. - -Then you have a big list of `AddressInfo`s, which we put in an `AddressState`. -An `AddressState` identifies the `subject` of the record, - -### TODO: rewrite this to use generic envelope - -The state and a signature of it are wrapped in an `AddressEnvelope`, along with -the public key that produced the signature. Recipients must validate that the -public key is consistent with the peer id of the `subject` and validate the sig. - -Here's an example. Alice has an address that she thinks is publicly reachable -but has not confirmed. 
She also has a LAN-local address that she knows is valid, -but not routable via the public internet: - -```javascript - { - - pubkey: "", - state: { - subject: { - peer: "QmAlice...", - version: 23456 - }, - issuedAt: 1570215229, - - addresses: [ - { - addr: "/ip4/1.2.3.4/tcp/42/p2p/QmAlice", - routability: "GLOBAL", - confidence: "UNCONFIRMED" - }, - { - addr: "/ip4/10.0.1.2/tcp/42/p2p/QmAlice", - routability: "LOCAL", - confidence: "CONFIRMED" - } - ] - }, - sig: "" - } -``` - -If Alice wants to publish her address to a public shared resource like a DHT, -she should omit `LOCAL` and other unreachable addresses, and peers should -likewise filter out `LOCAL` addresses from public sources. - -## Signature Production & Validation - -TK: describe signing and validating the `AddressState` structure. - - -## Peer Store APIs - - - -## Dialing Strategies - - -## TODO - -Some things I'd like to cover but haven't got to or figured out yet: - -- how to store signed records - - should be separate from "working set" that's optimized for retrieval - - need to store unaltered bytes -- how to surface routability and confidence via peerstore APIs -- figure out if IPLD is the way to go here. If not, what serialization format, - etc. -- extend identify protocol to include signed records? -- how are addresses prioritized when dialing? - - -[identify-spec]: ../identify/README.md -[peer-id-spec]: ../peer-ids/peer-ids.md -[autonat]: https://github.com/libp2p/specs/issues/180 -[ipld]: https://ipld.io/ -[ipld-schema-schema]: https://github.com/ipld/specs/blob/master/schemas/schema-schema.ipldsch diff --git a/RFC/0003-address-records.md b/RFC/0003-address-records.md index feaf63f0a..0af69109e 100644 --- a/RFC/0003-address-records.md +++ b/RFC/0003-address-records.md @@ -164,7 +164,7 @@ message AddressState { // The `issuedAt` timestamp stores the creation time of this record in // seconds from the unix epoch, according to the issuer's clock. 
There // are no guarantees about clock sync or correctness. SHOULD NOT be used - // to order AddressState records; use `seqno` instead. + // to order AddressState records; use `version` instead. uint64 issuedAt = 3; // All current listen addresses and their metadata. @@ -178,8 +178,10 @@ address. This is wrapped in an `AddressInfo` struct along with the address itself. Then you have a big list of `AddressInfo`s, which we put in an `AddressState`. -An `AddressState` identifies the `subject` of the record, - +An `AddressState` identifies the `subjectPeer`, which is the peer that the +record is about, to whom the addresses belong. It also includes a `version` +number, so that we can replace earlier `AddressState`s with newer ones, and a +timestamp for informational purposes. #### Example @@ -219,23 +221,42 @@ us issue "self-certified" address records that are signed by the `subjectPeer`. ## Peer Store APIs +This section is a WIP, and I'd love input. + +We need to figure out how to surface the address metadata in the peerstore APIs. +In go, extending the [`AddrInfo` +struct](https://github.com/libp2p/go-libp2p-core/blob/master/peer/addrinfo.go) +to include metadata seems like a decent place to start, and js likewise has +[js-peer-info](https://github.com/libp2p/js-peer-info) that could be extended. + +When storing this metadata internally, we may want to make a distinction between +the remote peer's confidence in an address and our own confidence; we may decide +an address is invalid when the remote peer thinks otherwise. One idea is to have +our local confidence just be a numeric score (for easy sorting) that takes the +remote peer's confidence value as an input. + +The go [AddrBook +interface](https://github.com/libp2p/go-libp2p-core/blob/master/peerstore/peerstore.go#L89) +would also need to be updated - it currently deals with "raw" multiaddrs, and +the only metadata exposed is a TTL for expiration. 
Changing this interface seems +like a fairly big refactor to me, especially with the implementation in another +repo. I'd love if some gophers could weigh in on a good way forward. ## Dialing Strategies +Once we're surfacing routability info alongside addresses, the dialer can decide +to optionally prioritize addresses it thinks are most likely to be reachable. We +can also add an option to only dial self-certified addresses, although that +likely won't be practical until self-certified addresses become commonplace. -## TODO +## Changes to core libp2p protocols -Some things I'd like to cover but haven't got to or figured out yet: +How to publish these to the DHT? Are the backward compatibility issues with +older unsigned address records? Maybe we just publish these to a different key +prefix... -- how to store signed records - - should be separate from "working set" that's optimized for retrieval - - need to store unaltered bytes -- how to surface routability and confidence via peerstore APIs -- figure out if IPLD is the way to go here. If not, what serialization format, - etc. -- extend identify protocol to include signed records? -- how are addresses prioritized when dialing? +Should we update identify and mDNS discovery to use signed records? 
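A minimal sketch, in Go, of how a dialer might act on the metadata discussed under "Dialing Strategies". It is not part of the patch. The `AddressInfo` struct, the constant names, and `dialOrder` are hypothetical; only the enum values mirror the proposed protobuf. The ordering policy (confidence only, highest first) is an assumption chosen for illustration, and the multiaddr is kept as a string for readability even though the record stores bytes.

```go
package main

import (
	"fmt"
	"sort"
)

// Routability mirrors the enum values of the proposed protobuf.
type Routability int

const (
	RoutabilityUnknown  Routability = 1
	RoutabilityLoopback Routability = 2
	RoutabilityLocal    Routability = 3
	RoutabilityGlobal   Routability = 4
)

// Confidence mirrors the enum values of the proposed protobuf.
type Confidence int

const (
	ConfidenceUnknown     Confidence = 1
	ConfidenceInvalid     Confidence = 2
	ConfidenceUnconfirmed Confidence = 3
	ConfidenceConfirmed   Confidence = 4
)

// AddressInfo is a hypothetical in-memory form of the AddressInfo message.
type AddressInfo struct {
	Multiaddr   string
	Routability Routability
	Confidence  Confidence
}

// dialOrder drops addresses marked INVALID, drops LOCAL and LOOPBACK
// addresses when the record came from a public source such as the DHT,
// and sorts what remains by the issuer's confidence, highest first.
func dialOrder(addrs []AddressInfo, fromPublicSource bool) []AddressInfo {
	out := make([]AddressInfo, 0, len(addrs))
	for _, a := range addrs {
		if a.Confidence == ConfidenceInvalid {
			continue
		}
		if fromPublicSource &&
			(a.Routability == RoutabilityLocal || a.Routability == RoutabilityLoopback) {
			continue
		}
		out = append(out, a)
	}
	// Stable sort keeps the issuer's original order among equal confidences.
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].Confidence > out[j].Confidence
	})
	return out
}

func main() {
	alice := []AddressInfo{
		{"/ip4/1.2.3.4/tcp/42/p2p/QmAlice", RoutabilityGlobal, ConfidenceUnconfirmed},
		{"/ip4/10.0.1.2/tcp/42/p2p/QmAlice", RoutabilityLocal, ConfidenceConfirmed},
	}
	for _, a := range dialOrder(alice, true) {
		fmt.Println(a.Multiaddr)
	}
}
```

A real dialer would probably combine this with its own local confidence score, as suggested in the "Peer Store APIs" discussion, rather than trusting the remote peer's value directly.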
[identify-spec]: ../identify/README.md From b8f1c5ee0b3f025bc8d459144121ec0efafd7e66 Mon Sep 17 00:00:00 2001 From: Yusef Napora Date: Thu, 24 Oct 2019 11:12:17 -0400 Subject: [PATCH 05/17] domain separation, remove CID and local storage --- RFC/0002-signed-envelopes.md | 65 ++++++++++++++---------------------- 1 file changed, 25 insertions(+), 40 deletions(-) diff --git a/RFC/0002-signed-envelopes.md b/RFC/0002-signed-envelopes.md index 7a1bfcb21..e3d816557 100644 --- a/RFC/0002-signed-envelopes.md +++ b/RFC/0002-signed-envelopes.md @@ -20,6 +20,27 @@ would be nice to have an all-purpose data container that includes a signature of the data, so we can verify that the data came from a specific peer and that it hasn't been tampered with. +## Domain Separation + +Signatures can be used for a variety of purposes, and a signature made for a +specific purpose MUST NOT be considered valid for a different purpose. + +Without this property, an attacker could convince a peer to sign a paylod in one +context and present it as valid in another, for example, presenting a signed +address record as a pubsub message. + +We separate signatures into "domains" by prefixing the data to be signed with a +string unique to each domain. This string is not contained within the payload or +the outer envelope structure. Instead, each libp2p subystem that makes use of +signed envelopes will provide their own domain string when constructing the +envelope, and again when validating the envelope. If the domain string used to +validate is different from the one used to sign, the signature validation will +fail. + +Domain strings may be any valid UTF-8 string, but MUST NOT contain the `:` +character (UTF-8 code point `0x3A`), as this is used to separate the domain +string from the content when signing. 
+ ## Wire Format Since we already have a [protobuf definition for public keys][peer-id-spec], we @@ -29,10 +50,8 @@ can use protobuf for this as well and easily embed the key in the envelope: ```protobuf message SignedEnvelope { PublicKey publicKey = 1; // see peer id spec for definition - string purpose = 2; // arbitrary user-defined string for context - bytes cid = 3; // CIDv1 of contents - bytes contents = 4; // payload - bytes signature = 5; // signature of purpose + cid + contents + bytes contents = 2; // payload + bytes signature = 3; // signature of domain string + contents } ``` @@ -40,27 +59,13 @@ The `publicKey` field contains the public key whose secret counterpart was used to sign the message. This MUST be consistent with the peer id of the signing peer, as the recipient will derive the peer id of the signer from this key. -The `purpose` field is an aribitrary string that can be used to give some hint -as to the contents. For example, if `contents` contains a serialized -`AddressState` record, `purpose` might contain the string `"AddressState"`. The -contents of the ``purpose`` field are signed alongside `contents` to prevent -tampering, and may be empty if desired. - -The `cid` field contains a version 1 [CID][cid] (content id) that corresponds to -the `content` field. It's used for retrieving messages from [local -storage](#local-storage-of-signed-envelopes), and the embedded multicodec also -gives a hint as to the data type of the `contents`. If the user does not specify -a multicodec when constructing the envelope, the default will be -[`raw`](https://github.com/multiformats/multicodec/blob/master/table.csv#L34) -for raw binary. 
## Signature Production / Verification When signing, a peer will prepare a buffer by concatenating the following: -- The string `"libp2p-signed-envelope:"`, encoded as UTF-8 -- The `purpose` field, encoded as UTF-8 -- The `cid` field +- The [domain separation string](#domain-separation), encoded as UTF-8 +- The UTF-8 encoded `:` character - The `contents` field Then they will sign the buffer according to the rules in the [peer id @@ -70,25 +75,5 @@ To verify, a peer will "inflate" the `publicKey` into a domain object that can verify signatures, prepare a buffer as above and verify the `signature` field against it. -## Local Storage of Signed Envelopes - -Signed envelopes can be used for ephemeral data, but we may also want to persist -them for a while and / or make previously recieved envelopes accesible to -various libp2p modules. - -For example, if the envelope contains an [address record][addr-records-rfc], -those records might be used to populate a peer store with self-certified -records. Rather than requiring the peer store to persist the full envelope, we -could have a separate "envelope storage" service that keeps signed messages -around for future reference. - -The peer store can then just store the `cid` alongside a flag that indicates -that the address came from a trusted source. If we're using a persistent peer -store and the process restarts, we can look up the stored `cid` in the envelope -storage and verify the signature again. - -If we decide to build this, the storage service should have some kind of garbage -collection / TTL scheme to avoid unbounded growth. 
- [addr-records-rfc]: ./0003-address-records.md [peer-id-spec]: ../peer-ids/peer-ids.md From 107ddde2842d073695b96eab59805379fd081dc5 Mon Sep 17 00:00:00 2001 From: Yusef Napora Date: Fri, 1 Nov 2019 17:04:42 -0400 Subject: [PATCH 06/17] add type hints, length-prefix sig components --- RFC/0002-signed-envelopes.md | 50 ++++++++++++++++++++++++++++-------- 1 file changed, 39 insertions(+), 11 deletions(-) diff --git a/RFC/0002-signed-envelopes.md b/RFC/0002-signed-envelopes.md index e3d816557..74ef9b8e3 100644 --- a/RFC/0002-signed-envelopes.md +++ b/RFC/0002-signed-envelopes.md @@ -25,21 +25,33 @@ been tampered with. Signatures can be used for a variety of purposes, and a signature made for a specific purpose MUST NOT be considered valid for a different purpose. -Without this property, an attacker could convince a peer to sign a paylod in one -context and present it as valid in another, for example, presenting a signed +Without this property, an attacker could convince a peer to sign a payload in +one context and present it as valid in another, for example, presenting a signed address record as a pubsub message. We separate signatures into "domains" by prefixing the data to be signed with a string unique to each domain. This string is not contained within the payload or -the outer envelope structure. Instead, each libp2p subystem that makes use of +the outer envelope structure. Instead, each libp2p subsystem that makes use of signed envelopes will provide their own domain string when constructing the envelope, and again when validating the envelope. If the domain string used to validate is different from the one used to sign, the signature validation will fail. -Domain strings may be any valid UTF-8 string, but MUST NOT contain the `:` -character (UTF-8 code point `0x3A`), as this is used to separate the domain -string from the content when signing. 
+Domain strings may be any valid UTF-8 string, but should be fairly short and +descriptive of their use case, for example `"libp2p-routing-record"`. + +## Type Hinting + +The envelope record can contain an arbitrary byte string payload, which will +need to be interpreted in the context of a specific use case. To assist in +"hydrating" the payload into an appropriate domain object, we include a "type +hint" field. The type hint consists of a [multicodec][multicodec] code, +optionally followed by an arbitrary byte sequence. + +This allows very compact type hints that contain just a multicodec, as well as +"path" multicodecs of the form `/some/thing`, using the ["namespace" +multicodec](https://github.com/multiformats/multicodec/blob/master/table.csv#L23), +whose binary value is equivalent to the UTF-8 `/` character. ## Wire Format @@ -50,8 +62,9 @@ can use protobuf for this as well and easily embed the key in the envelope: ```protobuf message SignedEnvelope { PublicKey publicKey = 1; // see peer id spec for definition - bytes contents = 2; // payload - bytes signature = 3; // signature of domain string + contents + bytes typeHint = 2; // type hint + bytes contents = 3; // payload + bytes signature = 4; // see below for signing rules } ``` @@ -59,14 +72,27 @@ The `publicKey` field contains the public key whose secret counterpart was used to sign the message. This MUST be consistent with the peer id of the signing peer, as the recipient will derive the peer id of the signer from this key. +The `typeHint` field contains a [multicodec][multicodec]-prefixed type hint as +described in the [Type Hinting section](#type-hinting). + +The `contents` field contains the arbitrary byte string payload. + +The `signature` field contains a signature of all fields except `publicKey`, +generated as described below. 
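A minimal sketch, in Go, of the length-prefixed signing buffer that this patch specifies under "Signature Production / Verification". It is not part of the patch. The function names `appendLengthPrefixed` and `signingBuffer` are hypothetical, and the sketch assumes that Go's `binary.PutUvarint` produces the same bytes as the multiformats unsigned varint for the lengths involved. Signing the resulting buffer is left to the rules in the peer id spec.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendLengthPrefixed appends the uvarint-encoded byte length of field
// to buf, followed by the field itself.
func appendLengthPrefixed(buf, field []byte) []byte {
	var lenBuf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(lenBuf[:], uint64(len(field)))
	buf = append(buf, lenBuf[:n]...)
	return append(buf, field...)
}

// signingBuffer concatenates the domain string, the type hint, and the
// contents, each preceded by its length. The domain string is supplied
// by the caller and never travels inside the envelope.
func signingBuffer(domain string, typeHint, contents []byte) []byte {
	var buf []byte
	buf = appendLengthPrefixed(buf, []byte(domain))
	buf = appendLengthPrefixed(buf, typeHint)
	buf = appendLengthPrefixed(buf, contents)
	return buf
}

func main() {
	// Placeholder type hint and payload, for illustration only.
	buf := signingBuffer("libp2p-routing-record", []byte{0x01}, []byte("payload"))
	fmt.Printf("%x\n", buf)
}
```

Because every component carries its own length, the domain string no longer needs a reserved separator character. A verifier rebuilds the same buffer with its own domain string, so a mismatch in the domain yields different bytes and the signature check fails.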
## Signature Production / Verification When signing, a peer will prepare a buffer by concatenating the following: -- The [domain separation string](#domain-separation), encoded as UTF-8 -- The UTF-8 encoded `:` character -- The `contents` field +- The length of the [domain separation string](#domain-separation) string in + bytes, encoded as an [unsigned varint][uvarint] +- The domain separation string, encoded as UTF-8 +- The length of the `typeHint` field in bytes, encoded as an [unsigned + varint][uvarint] +- The value of the `typeHint` field +- The length of the `contents` field in bytes, encoded as an [unsigned + varint][uvarint] +- The value of the `contents` field Then they will sign the buffer according to the rules in the [peer id spec][peer-id-spec] and set the `signature` field accordingly. @@ -77,3 +103,5 @@ against it. [addr-records-rfc]: ./0003-address-records.md [peer-id-spec]: ../peer-ids/peer-ids.md +[multicodec]: https://github.com/multiformats/multicodec +[uvarint]: https://github.com/multiformats/unsigned-varint From cba046fd4270038b9792d2833b9945407d81c045 Mon Sep 17 00:00:00 2001 From: Yusef Napora Date: Fri, 1 Nov 2019 17:20:12 -0400 Subject: [PATCH 07/17] trim scope & rename to "routing records" --- RFC/0003-address-records.md | 267 ------------------------------------ RFC/0003-routing-records.md | 237 ++++++++++++++++++++++++++++++++ 2 files changed, 237 insertions(+), 267 deletions(-) delete mode 100644 RFC/0003-address-records.md create mode 100644 RFC/0003-routing-records.md diff --git a/RFC/0003-address-records.md b/RFC/0003-address-records.md deleted file mode 100644 index 0af69109e..000000000 --- a/RFC/0003-address-records.md +++ /dev/null @@ -1,267 +0,0 @@ -# RFC 0003 - Address Records with Metadata - -- Start Date: 2019-10-04 -- Related Issues: - - [libp2p/issues/47](https://github.com/libp2p/libp2p/issues/47) - - [go-libp2p/issues/436](https://github.com/libp2p/go-libp2p/issues/436) - -## Abstract - -This RFC proposes a method for 
distributing address records, which contain a -peer's publicly reachable listen addresses, as well as some metadata that can -help other peers categorize addresses and prioritize thme when dialing. - -The record described here does not include a signature, but it is expected to -be serialized and wrapped in a [signed envelope][envelope-rfc], which will -prove the identity of the issuing peer. The dialer can then prioritize -self-certified addresses over addresses from an unknown origin. - -## Problem Statement - -All libp2p peers keep a "peer store" (called a peer book in some -implementations), which maps [peer ids][peer-id-spec] to a set of known -addresses for each peer. When the application layer wants to contact a peer, the -dialer will pull addresses from the peer store and try to initiate a connection -on one or more addresses. - -Addresses for a peer can come from a variety of sources. If we have already made -a connection to a peer, the libp2p [identify protocol][identify-spec] will -inform us of other addresses that they are listening on. We may also discover -their address by querying the DHT, checking a fixed "bootstrap list", or perhaps -through a pubsub message or an application-specific protocol. - -In the case of the identify protocol, we can be fairly certain that the -addresses originate from the peer we're speaking to, assuming that we're using a -secure, authenticated communication channel. However, more "ambient" discovery -methods such as DHT traversal and pubsub depend on potentially untrustworthy -third parties to relay address information. - -Even in the case of receiving addresses via the identify protocol, our -confidence that the address came directly from the peer is not actionable, because -the peer store does not track the origin of an address. Once added to the peer -store, all addresses are considered equally valid, regardless of their source. 
- -We would like to have a means of distributing _verifiable_ address records, -which we can prove originated from the addressed peer itself. We also need a way to -track the "provenance" of an address within libp2p's internal components such as -the peer store. Once those pieces are in place, we will also need a way to -prioritize addresses based on their authenticity, with the most strict strategy -being to only dial certified addresses. - -### Complications - -While producing a signed record is fairly trivial, there are a few aspects to -this problem that complicate things. - -1. Addresses are not static. A given peer may have several addresses at any given - time, and the set of addresses can change at arbitrary times. -2. Peers may not know their own addresses. It's often impossible to automatically - infer one's own public address, and peers may need to rely on third party - peers to inform them of their observed public addresses. -3. A peer may inadvertently or maliciously sign an address that they do not - control. In other words, a signature isn't a guarantee that a given address is - valid. -4. Some addresses may be ambiguous. For example, addresses on a private subnet - are valid within that subnet but are useless on the public internet. - -The first point implies that the address record should include some kind of -temporal component, so that newer records can replace older ones as the state -changes over time. This could be a timestamp and/or a simple sequence number -that each node increments whenever they publish a new record. - -The second and third points highlight the limits of certifying information that -is itself uncertain. While a signature can prove that the addresses originated -from the peer, it cannot prove that the addresses are correct or useful. 
Given -the asymmetric nature of real-world NATs, it's often the case that a peer is -_less likely_ to have correct information about its own address than an outside -observer, at least initially. - -This suggests that we should include some measure of "confidence" in our -records, so that peers can distribute addresses that they are not fully certain -are correct, while still asserting that they created the record. For example, -when requesting a dial-back via the [AutoNAT service][autonat], a peer could -send a "provisional" address record. When the AutoNAT peer confirms the address, -that address could be marked as confirmed and advertised in a new record. - -Regarding the fourth point about ambiguous addresses, it would also be desirable -for the address record to include a notion of "routability," which would -indicate how "accessible" the address is likely to be. This would allow us to -mark an address as "LAN-only," if we know that it is not mapped to a publicly -reachable address but would still like to distribute it to local peers. - -## Address Record Format - -Here's a protobuf that might work: - -```protobuf -// Routability indicates the "scope" of an address, meaning how visible -// or accessible it is. This allows us to distinguish between LAN and -// WAN addresses. -// -// Side Note: we could potentially have a GLOBAL_RELAY case, which would -// make it easy to prioritize non-relay addresses in the dialer. Bit of -// a mix of concerns though. -enum Routability { - // catch-all default / unknown scope - UNKNOWN = 1; - - // another process on the same machine - LOOPBACK = 2; - - // a local area network - LOCAL = 3; - - // public internet - GLOBAL = 4; - - // reserved for future use - INTERPLANETARY = 100; -} - - -// Confidence indicates how much we believe in the validity of the -// address. -enum Confidence { - // default, unknown confidence. 
we don't know one way or another - UNKNOWN = 1; - - // INVALID means we know that this address is invalid and should be deleted - INVALID = 2; - - // UNCONFIRMED means that we suspect this address is valid, but we haven't - // fully confirmed that we're reachable. - UNCONFIRMED = 3; - - // CONFIRMED means that we fully believe this address is valid. - // Each node / implementation can have their own criteria for confirmation. - CONFIRMED = 4; -} - -// AddressInfo is a multiaddr plus some metadata. -message AddressInfo { - bytes multiaddr = 1; - Routability routability = 2; - Confidence confidence = 3; -} - -// AddressState contains the listen addresses (and their metadata) -// for a peer at a particular point in time. -// -// Although this record contains a wall-clock `issuedAt` timestamp, -// there are no guarantees about node clocks being in sync or correct. -// As such, the `issuedAt` field should be considered informational, -// and `version` should be preferred when ordering records. -message AddressState { - // the peer id of the subject of the record. - bytes subjectPeer = 1; - - // `version` is an increment-only counter that can be used to - // order AddressState records chronologically. Newer records - // MUST have a higher `version` than older records, but there - // can be gaps between version numbers. - uint64 version = 2; - - // The `issuedAt` timestamp stores the creation time of this record in - // seconds from the unix epoch, according to the issuer's clock. There - // are no guarantees about clock sync or correctness. SHOULD NOT be used - // to order AddressState records; use `version` instead. - uint64 issuedAt = 3; - - // All current listen addresses and their metadata. - repeated AddressInfo addresses = 4; -} -``` - -The idea with the structure above is that you send some metadata along with your -addresses: your "routability", and your own confidence in the validity of the -address. 
This is wrapped in an `AddressInfo` struct along with the address -itself. - -Then you have a big list of `AddressInfo`s, which we put in an `AddressState`. -An `AddressState` identifies the `subjectPeer`, which is the peer that the -record is about, to whom the addresses belong. It also includes a `version` -number, so that we can replace earlier `AddressState`s with newer ones, and a -timestamp for informational purposes. - -#### Example - -Here's an example. Alice has an address that she thinks is publicly reachable -but has not confirmed. She also has a LAN-local address that she knows is valid, -but not routable via the public internet: - -```javascript - { - subjectPeer: "QmAlice...", - version: 23456, - issuedAt: 1570215229, - - addresses: [ - { - addr: "/ip4/1.2.3.4/tcp/42/p2p/QmAlice", - routability: "GLOBAL", - confidence: "UNCONFIRMED" - }, - { - addr: "/ip4/10.0.1.2/tcp/42/p2p/QmAlice", - routability: "LOCAL", - confidence: "CONFIRMED" - } - ] - } -``` - -If Alice wants to publish her address to a public shared resource like a DHT, -she should omit `LOCAL` and other unreachable addresses, and peers should -likewise filter out `LOCAL` addresses from public sources. - -## Certification / Verification - -This structure can be contained in a [signed envelope][envelope-rfc], which lets -us issue "self-certified" address records that are signed by the `subjectPeer`. - -## Peer Store APIs - -This section is a WIP, and I'd love input. - -We need to figure out how to surface the address metadata in the peerstore APIs. - -In go, extending the [`AddrInfo` -struct](https://github.com/libp2p/go-libp2p-core/blob/master/peer/addrinfo.go) -to include metadata seems like a decent place to start, and js likewise has -[js-peer-info](https://github.com/libp2p/js-peer-info) that could be extended. 
- -When storing this metadata internally, we may want to make a distinction between -the remote peer's confidence in an address and our own confidence; we may decide -an address is invalid when the remote peer thinks otherwise. One idea is to have -our local confidence just be a numeric score (for easy sorting) that takes the -remote peer's confidence value as an input. - -The go [AddrBook -interface](https://github.com/libp2p/go-libp2p-core/blob/master/peerstore/peerstore.go#L89) -would also need to be updated - it currently deals with "raw" multiaddrs, and -the only metadata exposed is a TTL for expiration. Changing this interface seems -like a fairly big refactor to me, especially with the implementation in another -repo. I'd love if some gophers could weigh in on a good way forward. - -## Dialing Strategies - -Once we're surfacing routability info alongside addresses, the dialer can decide -to optionally prioritize addresses it thinks are most likely to be reachable. We -can also add an option to only dial self-certified addresses, although that -likely won't be practical until self-certified addresses become commonplace. - -## Changes to core libp2p protocols - -How to publish these to the DHT? Are the backward compatibility issues with -older unsigned address records? Maybe we just publish these to a different key -prefix... - -Should we update identify and mDNS discovery to use signed records? 
- - -[identify-spec]: ../identify/README.md -[peer-id-spec]: ../peer-ids/peer-ids.md -[autonat]: https://github.com/libp2p/specs/issues/180 -[ipld]: https://ipld.io/ -[ipld-schema-schema]: https://github.com/ipld/specs/blob/master/schemas/schema-schema.ipldsch -[envelope-rfc]: ./0002-signed-envelopes.md diff --git a/RFC/0003-routing-records.md b/RFC/0003-routing-records.md new file mode 100644 index 000000000..d4e1c5884 --- /dev/null +++ b/RFC/0003-routing-records.md @@ -0,0 +1,237 @@ +# RFC 0003 - Peer Routing Records + +- Start Date: 2019-10-04 +- Related Issues: + - [libp2p/issues/47](https://github.com/libp2p/libp2p/issues/47) + - [go-libp2p/issues/436](https://github.com/libp2p/go-libp2p/issues/436) + +## Abstract + +This RFC proposes a method for distributing peer routing records, which contain +a peer's publicly reachable listen addresses, and may be extended in the future +to contain additional metadata relevant to routing. This serves a similar +purpose to [Ethereum Node Records][eip-778]. Like ENR records, libp2p routing +records should be extensible, so that we can add information relevant to as-yet +unknown use cases. + +The record described here does not include a signature, but it is expected to +be serialized and wrapped in a [signed envelope][envelope-rfc], which will +prove the identity of the issuing peer. The dialer can then prioritize +self-certified addresses over addresses from an unknown origin. + +## Problem Statement + +All libp2p peers keep a "peer store", which maps [peer ids][peer-id-spec] to a +set of known addresses for each peer. When the application layer wants to +contact a peer, the dialer will pull addresses from the peer store and try to +initiate a connection on one or more addresses. + +Addresses for a peer can come from a variety of sources. If we have already made +a connection to a peer, the libp2p [identify protocol][identify-spec] will +inform us of other addresses that they are listening on. 
We may also discover +their address by querying the DHT, checking a fixed "bootstrap list", or perhaps +through a pubsub message or an application-specific protocol. + +In the case of the identify protocol, we can be fairly certain that the +addresses originate from the peer we're speaking to, assuming that we're using a +secure, authenticated communication channel. However, more "ambient" discovery +methods such as DHT traversal and pubsub depend on potentially untrustworthy +third parties to relay address information. + +Even in the case of receiving addresses via the identify protocol, our +confidence that the address came directly from the peer is not actionable, because +the peer store does not track the origin of an address. Once added to the peer +store, all addresses are considered equally valid, regardless of their source. + +We would like to have a means of distributing _verifiable_ address records, +which we can prove originated from the addressed peer itself. We also need a way to +track the "provenance" of an address within libp2p's internal components such as +the peer store. Once those pieces are in place, we will also need a way to +prioritize addresses based on their authenticity, with the most strict strategy +being to only dial certified addresses. + +### Complications + +While producing a signed record is fairly trivial, there are a few aspects to +this problem that complicate things. + +1. Addresses are not static. A given peer may have several addresses at any given + time, and the set of addresses can change at arbitrary times. +2. Peers may not know their own addresses. It's often impossible to automatically + infer one's own public address, and peers may need to rely on third party + peers to inform them of their observed public addresses. +3. A peer may inadvertently or maliciously sign an address that they do not + control. In other words, a signature isn't a guarantee that a given address is + valid. +4. Some addresses may be ambiguous. 
For example, addresses on a private subnet + are valid within that subnet but are useless on the public internet. + +The first point can be addressed by having records contain a sequence number +that increases monotonically when new records are issued, and by having newer +records replace older ones. + +The other points, while worth thinking about, are out of scope for this RFC. +However, we can take care to make our records extensible so that we can add +additional metadata in the future. Some thoughts along these lines are in the +[Future Work section below](#future-work). + +## Address Record Format + +Here's a protobuf that might work: + +```protobuf + +// RoutingRecord contains the listen addresses for a peer at a particular point in time. +message RoutingRecord { + // AddressInfo wraps a multiaddr. In the future, it may be extended to + // contain additional metadata, such as "routability" (whether an address is + // local or global, etc). + message AddressInfo { + bytes multiaddr = 1; + } + + // the peer id of the subject of the record (who these addresses belong to). + bytes subjectPeer = 1; + + // A monotonically increasing sequence number, used for record ordering. + uint64 seq = 2; + + // All current listen addresses + repeated AddressInfo addresses = 4; +} +``` + +The `AddressInfo` wrapper message is used instead of a bare multiaddr to allow +us to extend addresses with additional metadata [in the future](#future-work). + +The `seq` field contains a sequence number that MUST increase monotonically as +new records are created. Newer records MUST have a higher `seq` value than older +records. To avoid persisting state across restarts, implementations MAY use unix +epoch time as the `seq` value, however they MUST NOT attempt to interpret a +`seq` value from another peer as a valid timestamp. 
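
As a rough illustration of the `seq` rule above, here is a minimal Python sketch of an issuer that seeds `seq` from unix epoch time while still guaranteeing that each new value is strictly greater than the last one it handed out. The `SeqCounter` class and `is_newer` helper are hypothetical names used only for this example; they are not part of any libp2p implementation.

```python
import time


class SeqCounter:
    """Issues strictly increasing `seq` values seeded from unix epoch time.

    Hypothetical helper for illustration only.
    """

    def __init__(self, clock=time.time):
        self._clock = clock  # injectable so the behaviour can be tested
        self._last = 0

    def next(self) -> int:
        # Wall-clock seconds avoid persisting state across restarts, but we
        # never return a value <= the previous one (for example, two records
        # issued within the same second, or a clock that stepped backwards).
        candidate = int(self._clock())
        self._last = max(candidate, self._last + 1)
        return self._last


def is_newer(incoming_seq: int, stored_seq: int) -> bool:
    # Receivers compare `seq` values purely as integers and never interpret
    # a remote peer's `seq` as a timestamp.
    return incoming_seq > stored_seq
```

Note that this only protects ordering within one process lifetime. A node that issues many records in quick succession and then restarts could, in principle, still fall behind its own earlier `seq` values unless it persists the counter.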
+ +#### Example + +```javascript + { + subjectPeer: "QmAlice...", + seq: 1570215229, + + addresses: [ + { + addr: "/ip4/1.2.3.4/tcp/42/p2p/QmAlice", + }, + { + addr: "/ip4/10.0.1.2/tcp/42/p2p/QmAlice", + } + ] + } +``` + + +## Certification / Verification + +This structure can be contained in a [signed envelope][envelope-rfc], which lets +us issue "self-certified" address records that are signed by the `subjectPeer`. + +To produce a "self-certified" address, a peer will construct a `RoutingRecord` +containing all of their publicly-reachable listen addresses. A peer SHOULD only +include addresses that it believes are routable via the public internet, ideally +having confirmed that this is the case via some external mechanism such as a +successful AutoNAT dial-back. + +In some cases we may want to include localhost or LAN-local address; for +example, when testing the DHT using many processes on a single machine. To +support this, implementations may use a global runtime configuration flag or +environment variable to control whether local addresses will be included. + +Once the `RoutingRecord` has been constructed, it should be serialized to a byte +string and wrapped in a [signed envelope][envelope-rfc]. The `publicKey` field +of the envelope MUST be consistent with the `subjectPeer` peer id for the record +to be considered valid. + +### Signed Envelope Domain + +Signed envelopes require a "domain separation" string that defines the "scope" +or purpose of a signature. + +When wrapping a `RoutingRecord` in a signed envelope, the domain string MUST be +`libp2p-routing-record`. + +### Signed Envelope Type Hint + +Signed envelopes contain a "type hint" that indicates how to interpret the +contents of the envelope. + +Ideally, we should define a new multicodec for routing records, so that we can +identify them in a few bytes. 
While we're still spec'ing and working on the +initial implementation, we can use the UTF-8 string ``"/libp2p/routing-record"` +as the type hint value. + +## Peer Store APIs + +We will need to add a few methods to the peer store: + +- `AddCertifiedAddrs(envelope) -> Maybe` + - Add a self-certified address, wrapped in a signed envelope. This should + validate the envelope signature & store the envelope for future reference. + If any certified addresses already exist for the peer, only accept the new + envelope if it has a greater `seq` value than existing envelopes. + +- `CertifiedAddrs(peerId) -> Set` + - return the set of self-certified addresses for the given peer id + +And possibly: + +- `IsCertified(peerId, multiaddr) -> Boolean` + - has a particular address been self-certified by the given peer? + + +We'll also need a method that constructs a new `RoutingRecord` containing our +listen address and wraps it in a signed envelope. This may belong on the Host +instead of the peer store, since it needs access to the private signing key. + +## Dialing Strategies + +Once self-certified addresses are available via the peer store, we can update +the dialer to prefer using them when possible. Some systems may want to _only_ +dial self-certified addresses, so we should include some configuration options +to control whether non-certified addresses are acceptable. + +## Changes to core libp2p protocols + +How to publish these to the DHT? Are there backward compatibility issues with +older unsigned address records? Maybe we just publish these to a different key +prefix... + +Should we update identify and mDNS discovery to use signed records? + +## Future Work + +Some things that were originally considered in this RFC were trimmed so that we +can focus on delivering a basic self-certified record, which is a pressing need. 
+ +This includes a notion of "routability", which could be used to communicate +whether a given address is global (reachable via the public internet), +LAN-local, etc. We may also want to include some kind of confidence score or +priority ranking, so that peers can communicate which addresses they would +prefer other peers to use. + +To allow these fields to be added in the future, we wrap multiaddrs in the +`AddressInfo` message instead of having the `addresses` field be a list of "raw" +multiaddrs. + +Another potentially useful extension would be a compact protocol table or bloom +filter that could be used to test whether a peer supports a given protocol +before interacting with them directly. This could be added as a new field in the +`RoutingRecord` message. + + + +[identify-spec]: ../identify/README.md +[peer-id-spec]: ../peer-ids/peer-ids.md +[autonat]: https://github.com/libp2p/specs/issues/180 +[ipld]: https://ipld.io/ +[ipld-schema-schema]: https://github.com/ipld/specs/blob/master/schemas/schema-schema.ipldsch +[envelope-rfc]: ./0002-signed-envelopes.md +[eip-778]: https://eips.ethereum.org/EIPS/eip-778 From 35fda193d58788464b0d6fff61afd02d7c2f8810 Mon Sep 17 00:00:00 2001 From: Yusef Napora Date: Fri, 8 Nov 2019 10:21:44 -0500 Subject: [PATCH 08/17] encode lengths in sig buffer as uint64 --- RFC/0002-signed-envelopes.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/RFC/0002-signed-envelopes.md b/RFC/0002-signed-envelopes.md index 74ef9b8e3..5a6def681 100644 --- a/RFC/0002-signed-envelopes.md +++ b/RFC/0002-signed-envelopes.md @@ -85,15 +85,16 @@ generated as described below. 
When signing, a peer will prepare a buffer by concatenating the following: - The length of the [domain separation string](#domain-separation) string in - bytes, encoded as an [unsigned varint][uvarint] + bytes - The domain separation string, encoded as UTF-8 -- The length of the `typeHint` field in bytes, encoded as an [unsigned - varint][uvarint] +- The length of the `typeHint` field in bytes - The value of the `typeHint` field -- The length of the `contents` field in bytes, encoded as an [unsigned - varint][uvarint] +- The length of the `contents` field in bytes - The value of the `contents` field +The length values for each field are encoded as 64-bit unsigned integers in +network order (big-endian). + Then they will sign the buffer according to the rules in the [peer id spec][peer-id-spec] and set the `signature` field accordingly. From 627a57ca65fe95cd1f1a9c4d04965936cc4b159d Mon Sep 17 00:00:00 2001 From: Yusef Napora Date: Fri, 8 Nov 2019 13:33:23 -0500 Subject: [PATCH 09/17] rename from RoutingRecord to RoutingState --- RFC/0003-routing-records.md | 36 +++++++++++++++++++----------------- 1 file changed, 19 insertions(+), 17 deletions(-) diff --git a/RFC/0003-routing-records.md b/RFC/0003-routing-records.md index d4e1c5884..989ed0ca7 100644 --- a/RFC/0003-routing-records.md +++ b/RFC/0003-routing-records.md @@ -81,8 +81,8 @@ Here's a protobuf that might work: ```protobuf -// RoutingRecord contains the listen addresses for a peer at a particular point in time. -message RoutingRecord { +// RoutingState contains the listen addresses for a peer at a particular point in time. +message RoutingState { // AddressInfo wraps a multiaddr. In the future, it may be extended to // contain additional metadata, such as "routability" (whether an address is // local or global, etc). @@ -91,7 +91,7 @@ message RoutingRecord { } // the peer id of the subject of the record (who these addresses belong to). 
- bytes subjectPeer = 1; + bytes peerId = 1; // A monotonically increasing sequence number, used for record ordering. uint64 seq = 2; @@ -114,7 +114,7 @@ epoch time as the `seq` value, however they MUST NOT attempt to interpret a ```javascript { - subjectPeer: "QmAlice...", + peerId: "QmAlice...", seq: 1570215229, addresses: [ @@ -131,10 +131,11 @@ epoch time as the `seq` value, however they MUST NOT attempt to interpret a ## Certification / Verification -This structure can be contained in a [signed envelope][envelope-rfc], which lets -us issue "self-certified" address records that are signed by the `subjectPeer`. +This structure can be serialized and contained in a [signed +envelope][envelope-rfc], which lets us issue "self-certified" address records +that are signed by the peer that the addresses belong to. -To produce a "self-certified" address, a peer will construct a `RoutingRecord` +To produce a "self-certified" address, a peer will construct a `RoutingState` containing all of their publicly-reachable listen addresses. A peer SHOULD only include addresses that it believes are routable via the public internet, ideally having confirmed that this is the case via some external mechanism such as a @@ -145,18 +146,19 @@ example, when testing the DHT using many processes on a single machine. To support this, implementations may use a global runtime configuration flag or environment variable to control whether local addresses will be included. -Once the `RoutingRecord` has been constructed, it should be serialized to a byte +Once the `RoutingState` has been constructed, it should be serialized to a byte string and wrapped in a [signed envelope][envelope-rfc]. The `publicKey` field -of the envelope MUST be consistent with the `subjectPeer` peer id for the record -to be considered valid. +of the envelope MUST be able to derive the `peerId` contained in the record. 
If +the envelope's `publicKey` does not match the `peerId` of the routing record, +the record MUST be rejected as invalid. ### Signed Envelope Domain -Signed envelopes require a "domain separation" string that defines the "scope" +Signed envelopes require a "domain separation" string that defines the scope or purpose of a signature. -When wrapping a `RoutingRecord` in a signed envelope, the domain string MUST be -`libp2p-routing-record`. +When wrapping a `RoutingState` in a signed envelope, the domain string MUST be +`libp2p-routing-state`. ### Signed Envelope Type Hint @@ -165,8 +167,8 @@ contents of the envelope. Ideally, we should define a new multicodec for routing records, so that we can identify them in a few bytes. While we're still spec'ing and working on the -initial implementation, we can use the UTF-8 string ``"/libp2p/routing-record"` -as the type hint value. +initial implementation, we can use the UTF-8 string +`"/libp2p/routing-state-record"` as the type hint value. ## Peer Store APIs @@ -187,7 +189,7 @@ And possibly: - has a particular address been self-certified by the given peer? -We'll also need a method that constructs a new `RoutingRecord` containing our +We'll also need a method that constructs a new `RoutingState` containing our listen address and wraps it in a signed envelope. This may belong on the Host instead of the peer store, since it needs access to the private signing key. @@ -224,7 +226,7 @@ multiaddrs. Another potentially useful extension would be a compact protocol table or bloom filter that could be used to test whether a peer supports a given protocol before interacting with them directly. This could be added as a new field in the -`RoutingRecord` message. +`RoutingState` message. 
From 238ca9f0d8b22aefd4e5334ca532105cb2b17a8b Mon Sep 17 00:00:00 2001
From: Yusef Napora
Date: Fri, 8 Nov 2019 13:38:37 -0500
Subject: [PATCH 10/17] add method to fetch signed records from peerstore

---
 RFC/0003-routing-records.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/RFC/0003-routing-records.md b/RFC/0003-routing-records.md
index 989ed0ca7..62be1b68f 100644
--- a/RFC/0003-routing-records.md
+++ b/RFC/0003-routing-records.md
@@ -183,6 +183,10 @@ We will need to add a few methods to the peer store:
 - `CertifiedAddrs(peerId) -> Set`
   - return the set of self-certified addresses for the given peer id

+- `SignedRoutingState(peerId) -> Maybe`
+  - retrive the signed envelope that was most recently added to the peerstore
+    for the given peer, if any exists.
+
 And possibly:

 - `IsCertified(peerId, multiaddr) -> Boolean`
@@ -190,7 +194,7 @@ And possibly:


 We'll also need a method that constructs a new `RoutingState` containing our
-listen address and wraps it in a signed envelope. This may belong on the Host
+listen addresses and wraps it in a signed envelope. This may belong on the Host
 instead of the peer store, since it needs access to the private signing key.

 ## Dialing Strategies
From 4accd0a09debfa5e25f33bc652b55330e3e863da Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ra=C3=BAl=20Kripalani?=
Date: Tue, 12 Nov 2019 10:29:58 +0000
Subject: [PATCH 11/17] fix typo.

---
 RFC/0002-signed-envelopes.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/RFC/0002-signed-envelopes.md b/RFC/0002-signed-envelopes.md
index 5a6def681..a58a686e4 100644
--- a/RFC/0002-signed-envelopes.md
+++ b/RFC/0002-signed-envelopes.md
@@ -5,7 +5,7 @@

 ## Abstract

-This RFC proposes a "signed envelope" structure that contains an arbitray byte
+This RFC proposes a "signed envelope" structure that contains an arbitrary byte
 string payload, a signature of the payload, and the public key that can be used
 to verify the signature.
From 5e068425768dd4733f81cde45f1ff4102a74d075 Mon Sep 17 00:00:00 2001 From: Yusef Napora Date: Thu, 14 Nov 2019 10:37:51 -0500 Subject: [PATCH 12/17] naming things --- RFC/0002-signed-envelopes.md | 38 ++++++++++++++++++++---------------- RFC/0003-routing-records.md | 24 +++++++++++------------ 2 files changed, 33 insertions(+), 29 deletions(-) diff --git a/RFC/0002-signed-envelopes.md b/RFC/0002-signed-envelopes.md index a58a686e4..bf956dda3 100644 --- a/RFC/0002-signed-envelopes.md +++ b/RFC/0002-signed-envelopes.md @@ -40,12 +40,12 @@ fail. Domain strings may be any valid UTF-8 string, but should be fairly short and descriptive of their use case, for example `"libp2p-routing-record"`. -## Type Hinting +## Payload Type Information The envelope record can contain an arbitrary byte string payload, which will need to be interpreted in the context of a specific use case. To assist in -"hydrating" the payload into an appropriate domain object, we include a "type -hint" field. The type hint consists of a [multicodec][multicodec] code, +"hydrating" the payload into an appropriate domain object, we include a "payload +type" field. This field consists of a [multicodec][multicodec] code, optionally followed by an arbitrary byte sequence. This allows very compact type hints that contain just a multicodec, as well as @@ -53,6 +53,9 @@ This allows very compact type hints that contain just a multicodec, as well as multicodec](https://github.com/multiformats/multicodec/blob/master/table.csv#L23), whose binary value is equivalent to the UTF-8 `/` character. +Use of the payload type field is encouraged, but the field may be left empty +without invalidating the envelope. 
+ ## Wire Format Since we already have a [protobuf definition for public keys][peer-id-spec], we @@ -61,23 +64,24 @@ can use protobuf for this as well and easily embed the key in the envelope: ```protobuf message SignedEnvelope { - PublicKey publicKey = 1; // see peer id spec for definition - bytes typeHint = 2; // type hint - bytes contents = 3; // payload - bytes signature = 4; // see below for signing rules + PublicKey public_key = 1; // see peer id spec for definition + bytes payload_type = 2; // payload type indicator + bytes payload = 3; // opaque binary payload + bytes signature = 4; // see below for signing rules } ``` -The `publicKey` field contains the public key whose secret counterpart was used +The `public_key` field contains the public key whose secret counterpart was used to sign the message. This MUST be consistent with the peer id of the signing peer, as the recipient will derive the peer id of the signer from this key. -The `typeHint` field contains a [multicodec][multicodec]-prefixed type hint as -described in the [Type Hinting section](#type-hinting). +The `payload_type` field contains a [multicodec][multicodec]-prefixed type +indicator as described in the [Payload Type Information +section](#payload-type-information). -The `contents` field contains the arbitrary byte string payload. +The `payload` field contains the arbitrary byte string payload. -The `signature` field contains a signature of all fields except `publicKey`, +The `signature` field contains a signature of all fields except `public_key`, generated as described below. 
## Signature Production / Verification @@ -87,10 +91,10 @@ When signing, a peer will prepare a buffer by concatenating the following: - The length of the [domain separation string](#domain-separation) string in bytes - The domain separation string, encoded as UTF-8 -- The length of the `typeHint` field in bytes -- The value of the `typeHint` field -- The length of the `contents` field in bytes -- The value of the `contents` field +- The length of the `payload_type` field in bytes +- The value of the `payload_type` field +- The length of the `payload` field in bytes +- The value of the `payload` field The length values for each field are encoded as 64-bit unsigned integers in network order (big-endian). @@ -98,7 +102,7 @@ network order (big-endian). Then they will sign the buffer according to the rules in the [peer id spec][peer-id-spec] and set the `signature` field accordingly. -To verify, a peer will "inflate" the `publicKey` into a domain object that can +To verify, a peer will "inflate" the `public_key` into a domain object that can verify signatures, prepare a buffer as above and verify the `signature` field against it. diff --git a/RFC/0003-routing-records.md b/RFC/0003-routing-records.md index 62be1b68f..4f9d6fec2 100644 --- a/RFC/0003-routing-records.md +++ b/RFC/0003-routing-records.md @@ -91,7 +91,7 @@ message RoutingState { } // the peer id of the subject of the record (who these addresses belong to). - bytes peerId = 1; + bytes peer_id = 1; // A monotonically increasing sequence number, used for record ordering. uint64 seq = 2; @@ -114,7 +114,7 @@ epoch time as the `seq` value, however they MUST NOT attempt to interpret a ```javascript { - peerId: "QmAlice...", + peer_id: "QmAlice...", seq: 1570215229, addresses: [ @@ -147,9 +147,9 @@ support this, implementations may use a global runtime configuration flag or environment variable to control whether local addresses will be included. 
Once the `RoutingState` has been constructed, it should be serialized to a byte -string and wrapped in a [signed envelope][envelope-rfc]. The `publicKey` field -of the envelope MUST be able to derive the `peerId` contained in the record. If -the envelope's `publicKey` does not match the `peerId` of the routing record, +string and wrapped in a [signed envelope][envelope-rfc]. The `public_key` field +of the envelope MUST be able to derive the `peer_id` contained in the record. If +the envelope's `public_key` does not match the `peer_id` of the routing record, the record MUST be rejected as invalid. ### Signed Envelope Domain @@ -160,15 +160,15 @@ or purpose of a signature. When wrapping a `RoutingState` in a signed envelope, the domain string MUST be `libp2p-routing-state`. -### Signed Envelope Type Hint +### Signed Envelope Payload Type -Signed envelopes contain a "type hint" that indicates how to interpret the -contents of the envelope. +Signed envelopes contain a `payload_type` field that indicates how to interpret +the contents of the envelope. Ideally, we should define a new multicodec for routing records, so that we can identify them in a few bytes. While we're still spec'ing and working on the initial implementation, we can use the UTF-8 string -`"/libp2p/routing-state-record"` as the type hint value. +`"/libp2p/routing-state-record"` as the `payload_type` value. ## Peer Store APIs @@ -180,16 +180,16 @@ We will need to add a few methods to the peer store: If any certified addresses already exist for the peer, only accept the new envelope if it has a greater `seq` value than existing envelopes. -- `CertifiedAddrs(peerId) -> Set` +- `CertifiedAddrs(peer_id) -> Set` - return the set of self-certified addresses for the given peer id -- `SignedRoutingState(peerId) -> Maybe` +- `SignedRoutingState(peer_id) -> Maybe` - retrive the signed envelope that was most recently added to the peerstore for the given peer, if any exists. 
And possibly: -- `IsCertified(peerId, multiaddr) -> Boolean` +- `IsCertified(peer_id, multiaddr) -> Boolean` - has a particular address been self-certified by the given peer? From 61617d61aaef8f3b482b37da2271edd3a4097b47 Mon Sep 17 00:00:00 2001 From: Yusef Napora Date: Thu, 14 Nov 2019 11:27:23 -0500 Subject: [PATCH 13/17] more detail about verification --- RFC/0003-routing-records.md | 39 +++++++++++++++++++++++-------------- 1 file changed, 24 insertions(+), 15 deletions(-) diff --git a/RFC/0003-routing-records.md b/RFC/0003-routing-records.md index 4f9d6fec2..bf24426ae 100644 --- a/RFC/0003-routing-records.md +++ b/RFC/0003-routing-records.md @@ -128,6 +128,14 @@ epoch time as the `seq` value, however they MUST NOT attempt to interpret a } ``` +A peer SHOULD only include addresses that it believes are routable via the +public internet, ideally having confirmed that this is the case via some +external mechanism such as a successful AutoNAT dial-back. + +In some cases we may want to include localhost or LAN-local address; for +example, when testing the DHT using many processes on a single machine. To +support this, implementations may use a global runtime configuration flag or +environment variable to control whether local addresses will be included. ## Certification / Verification @@ -136,21 +144,22 @@ envelope][envelope-rfc], which lets us issue "self-certified" address records that are signed by the peer that the addresses belong to. To produce a "self-certified" address, a peer will construct a `RoutingState` -containing all of their publicly-reachable listen addresses. A peer SHOULD only -include addresses that it believes are routable via the public internet, ideally -having confirmed that this is the case via some external mechanism such as a -successful AutoNAT dial-back. - -In some cases we may want to include localhost or LAN-local address; for -example, when testing the DHT using many processes on a single machine. 
To -support this, implementations may use a global runtime configuration flag or -environment variable to control whether local addresses will be included. - -Once the `RoutingState` has been constructed, it should be serialized to a byte -string and wrapped in a [signed envelope][envelope-rfc]. The `public_key` field -of the envelope MUST be able to derive the `peer_id` contained in the record. If -the envelope's `public_key` does not match the `peer_id` of the routing record, -the record MUST be rejected as invalid. +containing their listen addresses and serialize it to a byte array using a +protobuf encoder. The serialized records will then be wrapped in a [signed +envelope][envelope-rfc], which is signed with the libp2p peer's private host +key. The corresponding public key MUST be included in the envelope's +`public_key` field. + +When receiving a `RoutingState` wrapped in a signed envelope, a peer MUST +validate the signature before deserializing the `RoutingState` record. If the +signature is invalid, the envelope MUST be discarded without deserializing the +envelope payload. + +Once the signature has been verified and the `RoutingState` has been +deserialized, the receiving peer MUST verify that the `peer_id` contained in the +`RoutingState` matches the `public_key` from the envelope. If the public key in +the envelope cannot derive the peer id contained in the routing state record, +the `RoutingState` MUST be discarded. 
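
The receive-side rules above can be sketched in Python roughly as follows. This is only a sketch under stated assumptions: the `SignedEnvelope` and `RoutingState` classes are plain stand-ins for the protobuf messages, and `verify`, `derive_peer_id` and `decode` are hypothetical callables supplied by the caller (signature check, peer id derivation and protobuf decoding respectively), not real libp2p APIs. The signing buffer follows the envelope RFC's description: each field is preceded by its length as a 64-bit unsigned big-endian integer.

```python
import struct
from dataclasses import dataclass
from typing import Callable, List, Optional

DOMAIN = "libp2p-routing-state"  # domain string required by this RFC


@dataclass
class SignedEnvelope:  # stand-in for the protobuf message
    public_key: bytes
    payload_type: bytes
    payload: bytes
    signature: bytes


@dataclass
class RoutingState:  # stand-in for the protobuf message
    peer_id: bytes
    seq: int
    addresses: List[bytes]


def signing_buffer(domain: str, payload_type: bytes, payload: bytes) -> bytes:
    # Each field is prefixed with its byte length as a uint64 in network order.
    parts = []
    for value in (domain.encode("utf-8"), payload_type, payload):
        parts.append(struct.pack(">Q", len(value)))
        parts.append(value)
    return b"".join(parts)


def open_routing_state(
    env: SignedEnvelope,
    verify: Callable[[bytes, bytes, bytes], bool],   # hypothetical
    derive_peer_id: Callable[[bytes], bytes],        # hypothetical
    decode: Callable[[bytes], RoutingState],         # hypothetical
) -> Optional[RoutingState]:
    # 1. Validate the signature BEFORE touching the payload.
    buf = signing_buffer(DOMAIN, env.payload_type, env.payload)
    if not verify(env.public_key, buf, env.signature):
        return None  # discard without deserializing
    # 2. Only now deserialize the payload.
    state = decode(env.payload)
    # 3. The envelope key must derive the peer id named in the record.
    if derive_peer_id(env.public_key) != state.peer_id:
        return None
    return state
```

Passing the cryptographic pieces in as callables keeps the ordering of the checks visible without committing to a particular key type or protobuf library.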
### Signed Envelope Domain From 536ae936326f1af074f748f68d90a8f9fa78ea39 Mon Sep 17 00:00:00 2001 From: Yusef Napora Date: Thu, 14 Nov 2019 11:28:01 -0500 Subject: [PATCH 14/17] add "exchanging records" section with use cases --- RFC/0003-routing-records.md | 42 ++++++++++++++++++++++++++++++------- 1 file changed, 34 insertions(+), 8 deletions(-) diff --git a/RFC/0003-routing-records.md b/RFC/0003-routing-records.md index bf24426ae..23ed5a8e7 100644 --- a/RFC/0003-routing-records.md +++ b/RFC/0003-routing-records.md @@ -193,7 +193,7 @@ We will need to add a few methods to the peer store: - return the set of self-certified addresses for the given peer id - `SignedRoutingState(peer_id) -> Maybe` - - retrive the signed envelope that was most recently added to the peerstore + - retrieve the signed envelope that was most recently added to the peerstore for the given peer, if any exists. And possibly: @@ -206,6 +206,17 @@ We'll also need a method that constructs a new `RoutingState` containing our listen addresses and wraps it in a signed envelope. This may belong on the Host instead of the peer store, since it needs access to the private signing key. +When adding records to the peerstore, a receiving peer MUST keep track of the +latest `seq` value received for each peer and reject incoming `RoutingState` +messages unless they contain a greater `seq` value than the last received. + +After integrating the information from the `RoutingState` into the peerstore, +implementations SHOULD retain the original signed envelope. This will allow +other libp2p systems to share signed `RoutingState` records with other peers in +the network, preserving the signature of the issuing peer. The [Exchanging +Records](#exchanging-records) section lists some systems that would need +to retrieve the original signed record from the peerstore. 
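The `seq` bookkeeping and envelope retention described above might look something like the sketch below. It assumes the envelope has already been verified before it reaches the store; `CertifiedStore`, `add_certified`, and `signed_record` are hypothetical names chosen for illustration and do not come from any libp2p implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CertifiedStore:
    """Hypothetical peer store slice tracking signed routing records."""
    latest_seq: Dict[bytes, int] = field(default_factory=dict)
    envelopes: Dict[bytes, bytes] = field(default_factory=dict)

    def add_certified(self, peer_id: bytes, seq: int, envelope: bytes) -> bool:
        """Accept the record only if seq is strictly greater than the last seen."""
        last = self.latest_seq.get(peer_id)
        if last is not None and seq <= last:
            return False  # stale or replayed record: reject
        self.latest_seq[peer_id] = seq
        # Keep the original signed bytes so they can be re-shared unchanged.
        self.envelopes[peer_id] = envelope
        return True

    def signed_record(self, peer_id: bytes) -> Optional[bytes]:
        """Return the most recently accepted envelope, if any."""
        return self.envelopes.get(peer_id)
```

Storing the raw envelope bytes, rather than re-serializing the decoded record, is what keeps the issuing peer's signature valid when the record is passed on.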
+ ## Dialing Strategies Once self-certified addresses are available via the peer store, we can update @@ -213,13 +224,27 @@ the dialer to prefer using them when possible. Some systems may want to _only_ dial self-certified addresses, so we should include some configuration options to control whether non-certified addresses are acceptable. -## Changes to core libp2p protocols +## Exchanging Records + +We currently have several systems in libp2p that deal with peer addressing and +which could be updated to use signed routing records: + +- Public peer discovery using [libp2p's DHT][dht-spec] +- Local peer discovery with [mDNS][mdns-spec] +- Direct exchange using the [identify protocol][identify-spec] +- Service discovery via the [rendezvous protocol][rendezvous-spec] +- A proposal for [a public peer exchange protocol][pex-proposal] -How to publish these to the DHT? Are there backward compatibility issues with -older unsigned address records? Maybe we just publish these to a different key -prefix... +Of these, the highest priority for updating seems to be the DHT, since it's +actively used by several deployed systems and is vulnerable to routing attacks +by malicious peers. We should work on extending the `FIND_NODE`, `ADD_PROVIDER`, +and `GET_PROVIDERS` RPC messages to support returning signed records in addition +to the current unsigned address information they currently support. -Should we update identify and mDNS discovery to use signed records? +We should also either define a new "secure peer routing" interface or extend the +existing peer routing interfaces to support signed records, so that we don't end +up with a bunch of similar but incompatible APIs for exchanging signed address +records. ## Future Work @@ -245,8 +270,9 @@ before interacting with them directly. 
This could be added as a new field in the [identify-spec]: ../identify/README.md [peer-id-spec]: ../peer-ids/peer-ids.md +[mdns-spec]: ../discovery/mdns.md +[rendezvous-spec]: ../rendezvous/README.md +[pex-proposal]: https://github.com/libp2p/notes/issues/7 [autonat]: https://github.com/libp2p/specs/issues/180 -[ipld]: https://ipld.io/ -[ipld-schema-schema]: https://github.com/ipld/specs/blob/master/schemas/schema-schema.ipldsch [envelope-rfc]: ./0002-signed-envelopes.md [eip-778]: https://eips.ethereum.org/EIPS/eip-778 From 47606a0da072cf764ee034d7126f8cfc547aae7b Mon Sep 17 00:00:00 2001 From: Yusef Napora Date: Mon, 25 Nov 2019 10:36:32 -0600 Subject: [PATCH 15/17] use varints for length-prefixes in sig buffer --- RFC/0002-signed-envelopes.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/RFC/0002-signed-envelopes.md b/RFC/0002-signed-envelopes.md index bf956dda3..f9be76d90 100644 --- a/RFC/0002-signed-envelopes.md +++ b/RFC/0002-signed-envelopes.md @@ -96,8 +96,8 @@ When signing, a peer will prepare a buffer by concatenating the following: - The length of the `payload` field in bytes - The value of the `payload` field -The length values for each field are encoded as 64-bit unsigned integers in -network order (big-endian). +The length values for each field are encoded as unsigned variable-length +integers as defined in the [multiformats uvarint spec][uvarint]. Then they will sign the buffer according to the rules in the [peer id spec][peer-id-spec] and set the `signature` field accordingly. 
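To illustrate the switch to varint length prefixes, here is a minimal sketch of building the buffer that gets signed. It assumes the buffer consists of a domain string, `payload_type`, and `payload`, each preceded by its length; only the `payload` entries of that list are visible in this hunk, so the first two fields are an assumption. `encode_uvarint` and `signing_buffer` are hypothetical helper names.

```python
def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (LEB128 style)."""
    if n < 0:
        raise ValueError("uvarint cannot encode negative numbers")
    if n >= 1 << 63:
        # The multiformats spec caps uvarints at 9 bytes (63 bits).
        raise ValueError("value too large for a 9-byte uvarint")
    out = bytearray()
    while True:
        byte = n & 0x7F          # low 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte, high bit clear
            return bytes(out)


def signing_buffer(domain: str, payload_type: bytes, payload: bytes) -> bytes:
    """Concatenate each field preceded by its uvarint-encoded byte length."""
    fields = [domain.encode("utf-8"), payload_type, payload]
    return b"".join(encode_uvarint(len(f)) + f for f in fields)
```

Compared with fixed 64-bit big-endian lengths, short fields now cost a single prefix byte, at the price of a slightly more involved encoder.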
From 377f05abe37fca3bfe392b5e07c134da94306c02 Mon Sep 17 00:00:00 2001 From: Jacob Heun Date: Tue, 21 Jul 2020 19:11:38 +0200 Subject: [PATCH 16/17] update signed records rfc to match Go implementation --- RFC/0002-signed-envelopes.md | 4 ++-- RFC/0003-routing-records.md | 19 +++++++++---------- 2 files changed, 11 insertions(+), 12 deletions(-) diff --git a/RFC/0002-signed-envelopes.md b/RFC/0002-signed-envelopes.md index f9be76d90..ed837e972 100644 --- a/RFC/0002-signed-envelopes.md +++ b/RFC/0002-signed-envelopes.md @@ -63,11 +63,11 @@ can use protobuf for this as well and easily embed the key in the envelope: ```protobuf -message SignedEnvelope { +message Envelope { PublicKey public_key = 1; // see peer id spec for definition bytes payload_type = 2; // payload type indicator bytes payload = 3; // opaque binary payload - bytes signature = 4; // see below for signing rules + bytes signature = 5; // see below for signing rules } ``` diff --git a/RFC/0003-routing-records.md b/RFC/0003-routing-records.md index 23ed5a8e7..fbb3394b4 100644 --- a/RFC/0003-routing-records.md +++ b/RFC/0003-routing-records.md @@ -4,7 +4,7 @@ - Related Issues: - [libp2p/issues/47](https://github.com/libp2p/libp2p/issues/47) - [go-libp2p/issues/436](https://github.com/libp2p/go-libp2p/issues/436) - + ## Abstract This RFC proposes a method for distributing peer routing records, which contain @@ -81,8 +81,8 @@ Here's a protobuf that might work: ```protobuf -// RoutingState contains the listen addresses for a peer at a particular point in time. -message RoutingState { +// PeerRecord contains the listen addresses for a peer at a particular point in time. +message PeerRecord { // AddressInfo wraps a multiaddr. In the future, it may be extended to // contain additional metadata, such as "routability" (whether an address is // local or global, etc). @@ -92,12 +92,12 @@ message RoutingState { // the peer id of the subject of the record (who these addresses belong to). 
bytes peer_id = 1; - + // A monotonically increasing sequence number, used for record ordering. uint64 seq = 2; - + // All current listen addresses - repeated AddressInfo addresses = 4; + repeated AddressInfo addresses = 3; } ``` @@ -116,13 +116,12 @@ epoch time as the `seq` value, however they MUST NOT attempt to interpret a { peer_id: "QmAlice...", seq: 1570215229, - addresses: [ { - addr: "/ip4/1.2.3.4/tcp/42/p2p/QmAlice", + multiaddr: "/ip4/1.2.3.4/tcp/42/p2p/QmAlice", }, { - addr: "/ip4/10.0.1.2/tcp/42/p2p/QmAlice", + multiaddr: "/ip4/10.0.1.2/tcp/42/p2p/QmAlice", } ] } @@ -188,7 +187,7 @@ We will need to add a few methods to the peer store: validate the envelope signature & store the envelope for future reference. If any certified addresses already exist for the peer, only accept the new envelope if it has a greater `seq` value than existing envelopes. - + - `CertifiedAddrs(peer_id) -> Set` - return the set of self-certified addresses for the given peer id From e401b14ada894b57d6942cdb282ba161b6899e47 Mon Sep 17 00:00:00 2001 From: Jacob Heun Date: Fri, 13 Nov 2020 13:35:04 +0100 Subject: [PATCH 17/17] fix: correct link in RFC/0002-signed-envelopes.md Co-authored-by: tmakarios --- RFC/0002-signed-envelopes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/RFC/0002-signed-envelopes.md b/RFC/0002-signed-envelopes.md index ed837e972..9e232f973 100644 --- a/RFC/0002-signed-envelopes.md +++ b/RFC/0002-signed-envelopes.md @@ -106,7 +106,7 @@ To verify, a peer will "inflate" the `public_key` into a domain object that can verify signatures, prepare a buffer as above and verify the `signature` field against it. -[addr-records-rfc]: ./0003-address-records.md +[addr-records-rfc]: ./0003-routing-records.md [peer-id-spec]: ../peer-ids/peer-ids.md [multicodec]: https://github.com/multiformats/multicodec [uvarint]: https://github.com/multiformats/unsigned-varint