Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Phase 0 Networking Specifications #763

Merged
merged 11 commits into from
Mar 28, 2019
41 changes: 41 additions & 0 deletions specs/networking/messaging.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
ETH 2.0 Networking Spec - Messaging
===

# Abstract

This specification describes how individual Ethereum 2.0 messages are represented on the wire.

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL”, NOT", “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

# Motivation

This specification seeks to define a messaging protocol that is flexible enough to be changed easily as the ETH 2.0 specification evolves.

# Specification

## Message Structure

An ETH 2.0 message consists of a single byte representing the message version followed by the encoded, potentially compressed body. We separate the message's version from the version included in the `libp2p` protocol path in order to allow encoding and compression schemes to be updated independently of the `libp2p` protocols themselves.
mslipper marked this conversation as resolved.
Show resolved Hide resolved

It is unlikely that more than 255 message versions will need to be supported, so a single byte should suffice.

Visually, a message looks like this:

```
+--------------------------+
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If other comments are accepted, this enveloping can go away.

| version byte |
+--------------------------+
| |
| body |
| |
+--------------------------+
```

Clients MUST ignore messages with mal-formed bodies. The `version` byte MUST be one of the below values:

## Version Byte Values

### `0x01`

- **Encoding Scheme:** SSZ
mslipper marked this conversation as resolved.
Show resolved Hide resolved
- **Compression Scheme:** Snappy
32 changes: 32 additions & 0 deletions specs/networking/node-identification.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
ETH 2.0 Networking Spec - Node Identification
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we don't need to specify anything here as everything's already either part of the referenced EIP or multiaddr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, will remove.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be appropriate to file an EIP to allocate a key for multiaddrs in the pre-defined key/value table in the ENR standard?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc: @fjl

===

# Abstract

This specification describes how Ethereum 2.0 nodes identify and address each other on the network.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL", NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

# Specification

Clients use Ethereum Node Records (as described in [EIP-778](http://eips.ethereum.org/EIPS/eip-778)) to discover one another. Each ENR includes, among other things, the following keys:

- The node's IP.
- The node's TCP port.
- The node's public key.

For clients to be addressable, their ENR responses MUST contain all of the above keys. Client MUST verify the signature of any received ENRs, and disconnect from peers whose ENR signatures are invalid. Each node's public key MUST be unique.

The keys above are enough to construct a [multiaddr](https://github.com/multiformats/multiaddr) for use with the rest of the `libp2p` stack.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One other consideration maybe: ENR (and Discovery v5) is being designed to support multiple types of identity. It is not going to be a hard requirement that secp256k1 EC pubkeys will identify the node. ENRs will describe the identity type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

libp2p peer IDs are derived from the public key protobuf, which is just key type + bytes. Here's the spec: libp2p/specs#100. Both SECIO and TLS 1.3 validate peer IDs against the pubkey, so following the spec is important or connections will fail.

Copy link
Contributor

@arnetheduck arnetheduck Mar 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mention in https://github.com/libp2p/specs/pull/100/files#r266291995 - protobuf is not deterministic, and thus not great for feeding into a hashing function or using to determine an ID, unless you used a modified protobuf version that's locked down.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this be handled at the libp2p layer? Here we're describing how to construct a multiaddr from an ENR; the actual handling of the multiaddr itself and the underlying hash construction would be the responsibiliy of libp2p.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would, but libp2p itself looks broken in this case - we need to keep an eye on that upstream issue so that we don't spread the breakage further.

Does using ENR require decoding RLP in this context?


It is RECOMMENDED that clients set their TCP port to the default of `9000`.

## Peer ID Generation

The `libp2p` networking stack identifies peers via a "peer ID." Simply put, a node's Peer ID is the SHA2-256 `multihash` of the node's public key. `go-libp2p-crypto` contains the canonical implementation of how to hash `secp256k1` keys for use as a peer ID.
mslipper marked this conversation as resolved.
Show resolved Hide resolved

# See Also

- [multiaddr](https://github.com/multiformats/multiaddr)
- [multihash](https://multiformats.io/multihash/)
- [go-libp2p-crypto](https://github.com/libp2p/go-libp2p-crypto)
246 changes: 246 additions & 0 deletions specs/networking/rpc-interface.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,246 @@
ETH 2.0 Networking Spec - RPC Interface
===

# Abstract

The Ethereum 2.0 networking stack uses two modes of communication: a broadcast protocol that gossips information to interested parties via GossipSub, and an RPC protocol that retrieves information from specific clients. This specification defines the RPC protocol.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL", NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

# Dependencies

This specification assumes familiarity with the [Messaging](./messaging.md), [Node Identification](./node-identification), and [Beacon Chain](../core/0_beacon-chain.md) specifications.

# Specification

## Message Schemas

Message body schemas are notated like this:

```
(
field_name_1: type
field_name_2: type
)
```

SSZ serialization is field-order dependent. Therefore, fields MUST be encoded and decoded according to the order described in this document. The encoded values of each field are concatenated to form the final encoded message body. Embedded structs are serialized as Containers unless otherwise noted.
mslipper marked this conversation as resolved.
Show resolved Hide resolved

All referenced data structures can be found in the [0-beacon-chain](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/core/0_beacon-chain.md#data-structures) specification.

## `libp2p` Protocol Names

A "Protocol Name" in `libp2p` parlance refers to a human-readable identifier `libp2p` uses in order to identify sub-protocols and stream messages of different types over the same connection. A client's supported protocol paths are negotiated by the `libp2p` stack at connection time; as such they are not part of individual message bodies.
mslipper marked this conversation as resolved.
Show resolved Hide resolved

## RPC-Over-`libp2p`

To facilitate RPC-over-`libp2p`, a single protocol path is used: `/eth/serenity/rpc/1.0.0`. Remote method calls are wrapped in a "request" structure:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add beacon somewhere in the protocol path? I think this might be useful to distinguish between shard and beacon RPC commands.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the semantic meaning of these long version numbers?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would imagine it's there because over time it will be bugfixed (bugfix version), updated with a commitment to being backward compatible (minor version), and updated with complete disregard for any backward compatibility, for the sake of progress (major version)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the idea is that they follow semver. In practice I'd estimate that the only time we'd change these version numbers is if there was a backwards-incompatible change to the serialization/compression scheme.

See @raulk's point above re: message envelopes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A version number can either be interpreted or not. If we rely on semver, we should specify the correct behavior for clients: consider client a that supports 1.0.0 - should it also accept 1.0.1 messages as valid automatically, or discard them? This matters for forwards compatibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frankly, I'm in favor of simply having integer version numbers and have a blanket statement that sub-protocol version numbers are neither forwards nor backwards compatible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that would be my preference as well, with how the encoding and protocol looks today.

if we had a serialization format that allowed forwards/backwards compatible additions (ie adding fields) we could maybe consider two-level numbering here, where the first number would be the blanket statement, while the second would signal the version with additional fields added, which would still be compatible with previous clients.

Such an encoding is generally a good thing in the wire use case, which would be a reason to look to extensions to SSZ when used outside consensus (a super-set of SSZ for example).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on integers to signal a generation (generation 1, generation 2...). Any reason you wouldn’t have a varint style bitmap in the HELLO message to communicate finer-grained capabilities? @arnetheduck

I would model serialisation format and compression as part of the protocol ID. Then allow Multistream to negotiate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A possible compatible change could be added message types, so I think minor version numbers could be useful in some cases.

Any reason you wouldn’t have a varint style bitmap in the HELLO message to communicate finer-grained capabilities?

Capabilities are known from the discovery protocol/ENRs already (but we need to define what types of capabilities we need). So I don't think we need it in the HELLO message.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@raulk taking a step back, my initial understanding of the libp2p setup was that you would negotiate capabilities with discovery mainly and connect to clients you know you have common ground with - then merely verify the support by signing up to the various streams here and that each stream would be a protocol on its with libp2p dealing with the mulitplexing etc - that has changed now I see, and it looks like there's another layer of protocol negotiation within the stream to discover capabilities - that feels.. redundant, to do the same work twice, and somewhat limiting, because how does a client add a completely new message they want to test or use in some client-specific scenario (for example to establish / evaluate its usefulness) - but it seems I need to reread the newer spec.

I'll be honest though and say that I don't fully understand where the varint would go at this point with the various layers, but integers tend to be harder to negotiate than strings, in a decentralized manner - a string, you just pick one and start using it - if it becomes popular, people will avoid it. Numbers.. you need a registry and the associated maintenance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arnetheduck my intention isn't to propose any changes here; was just curious to hear the rationale of the Eth2.0 community re: finer-grained protocols vs. coarse-grained protocol with capabilities. We also debate this occasionally in the libp2p community ;-) [libp2p supports both patterns].

Re: semver vs. protocol generations. libp2p does not impose a specific version token (if any). When registering a handler, you can attach a predicate to evaluate the match. So handler X could match N versions, and when receiving the callback, you can inspect the protocol pinned on the stream to infer which abilities you activate, etc.

We've traditionally used semver it in libp2p, IPFS, etc., but a few of us are not convinced of its aptness. Semver is good to convey the severity of changes in APIs, but protocol evolution is a different beast.

You generally strive to keep things backwards compatible, yet enable feature upgrades/opt-ins over time that may not be rolling cumulative, e.g. if 1.14.0 and 1.15.0 introduce feature X and Y respectively, how do I convey that a given peer supports Y but not X?

That's where protocol granularity comes into play: potentially make each feature/message/RPC a different protocol, and track "generations/revisions" of those protocols. libp2p supports that design. A few thoughts:

  • We're evolving Multistream to avoid round trips when you're certain the other party supports a protocol (selection vs negotiation), for example, via an ENR.
  • If two messages have timing dependencies (can't send message B until after A), and they segregated across protocols, it may make state-tracking a bit more complicated.

integers tend to be harder to negotiate than strings, in a decentralized manner - a string, you just pick one and start using it - if it becomes popular, people will avoid it.

I meant replacing semver by protocol generations, e.g. /eth/serenity/rpc/v10. Sorry for not being clear!


```
(
id: uint64
method_id: uint16
body: Request
)
```

and their corresponding responses are wrapped in a "response" structure:
mslipper marked this conversation as resolved.
Show resolved Hide resolved

```
(
id: uint64
result: Response
)
```

If an error occurs, a variant of the response structure is returned:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least with SSZ it's not easily possible to distinguish between normal and error responses, as one needs to know the schema before being able to decode the message. What one could do is have a general response format and then an embedded result/error blob that can be decoded in a second step. E.g.:

Response: (
    id: uint64
    status_code: uint64
    data: bytes
)

SuccessData: (
...
)

ErrorData (
...
)

Not really elegant, but I don't really see a better solution (for SSZ that is).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is a good point. SSZ doesn't support null values either - let me think on this one for a little bit and come up with a solution.

Copy link
Contributor Author

@mslipper mslipper Mar 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an is_error boolean field. Note that with SSZ at least you can read the is_error field prior to the contents of the result via offsets. This allows clients to switch the deserialized type based on the is_error value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the alternative would be to use a list - empty if there's no error, and one item if there is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to be clear - when encoding or decoding ssz, there generally exists no provision for skipping fields - even if is_error is false, data must contain bytes. embedding a StatusData in the data field seems to go against the spirit of SSZ generally, as SSZ decoders in general expect to know the exact type of each field, thus would not fit "naturally" in "normal" ssz code.

That said, this issue stems from using SSZ in a wire protocol setting for which it is not.. great.


```
(
id: uint64
error: (
code: uint16
data: bytes
)
)
```

The details of the RPC-Over-`libp2p` protocol are similar to [JSON-RPC 2.0](https://www.jsonrpc.org/specification). Specifically:

1. The `id` member is REQUIRED.
mslipper marked this conversation as resolved.
Show resolved Hide resolved
2. The `id` member in the response MUST be the same as the value of the `id` in the request.
mslipper marked this conversation as resolved.
Show resolved Hide resolved
3. The `method_id` member is REQUIRED.
4. The `result` member is required on success, and MUST NOT exist if there was an error.
5. The `error` member is REQUIRED on errors, and MUST NOT exist if there wasn't an error.

Structuring RPC requests in this manner allows multiple calls and responses to be multiplexed over the same stream without switching.

The "method ID" fields in the below messages refer to the `method` field in the request structure above.

The first 1,000 values in `error.code` are reserved for system use. The following error codes are predefined:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the error codes, this seems very useful (e.g. for block not found or something). Not sure about the examples below though, shouldn't 0, 10, and 20 just result in a disconnect?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sort of like port numbers :)


1. `0`: Parse error.
2. `10`: Invalid request.
3. `20`: Method not found.
4. `30`: Server error.

## Messages

### Hello

**Method ID:** `0`

**Body**:

```
(
network_id: uint8
latest_finalized_root: bytes32
latest_finalized_epoch: uint64
best_root: bytes32
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking into account the backwards sync suggested elsewhere, and that we can use attestations as a (strong) heuristic that a block is valid and useful, it seems prudent to include (some) attestations here - instead of simply supplying some data like best_root that cannot be trusted anyway, a recent attestation would help the connecting client both with head / fork selection and to know with a higher degree of certainty that the root sent "makes sense" and should be downloaded.

The details of this are TBD - but probably we're looking at something like attestations: [Attestation] where it's up to the client to choose a representative and recent set (or none, which is also fine, because then one can listen to broadcasts).

best_slot: uint64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slots are based on wall time - what's the best_slot field for?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pretty sure this is supposed to refer to the slot of the head block. Maybe rename best_root and best_slot to head_root and head_slot (or to be even more clear head_block_root/slot)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think head_block_root and head_slot would be clearer

)
```

Clients exchange `hello` messages upon connection, forming a two-phase handshake. The first message the initiating client sends MUST be the `hello` message. In response, the receiving client MUST respond with its own `hello` message.

Clients SHOULD immediately disconnect from one another following the handshake above under the following conditions:

1. If `network_id` belongs to a different chain, since the client definitionally cannot sync with this client.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should consider spelling out network ID and chain ID as separate fields. Chain ID should be set to a fixed number "1" for ETH, and if others want to run their own chain they can change that ID.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NetworkId vs ChainId +1.
Also, message body compression algorithm indicator.
Also, upgrade paths for SSZ (I get the feeling this might change on the wire)..maybe a sorted list of serialization method preferences, the highest mutual being selected?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not convinced that we actually need a network id at all and not only a chain id. Especially for RPC as arguably this isn't even a network, just a set of bidirectional connections (as opposed to the gossip layer where we actually relay data).

2. If the `latest_finalized_root` shared by the peer is not in the client's chain at the expected epoch. For example, if Peer 1 in the diagram below has `(root, epoch)` of `(A, 5)` and Peer 2 has `(B, 3)`, Peer 1 would disconnect because it knows that `B` is not the root in their chain at epoch 3:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe clarify that this is (because it can only be) checked by the peer with the higher latest finalized epoch. I tried to come up with a one sentence fix, but it's probably better, to rewrite the whole paragraph from the point of view of one node shaking hands with another node (right now it's talking about both at the same time).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool got it, will do.


```
Root A

+---+
|xxx| +----+ Epoch 5
+-+-+
^
|
+-+-+
| | +----+ Epoch 4
+-+-+
Root B ^
|
+---+ +-+-+
|xxx+<---+--->+ | +----+ Epoch 3
+---+ | +---+
|
+-+-+
| | +-----------+ Epoch 2
+-+-+
^
|
+-+-+
| | +-----------+ Epoch 1
+---+
```

Once the handshake completes, the client with the higher `latest_finalized_epoch` or `best_slot` (if the clients have equal `latest_finalized_epoch`s) SHOULD send beacon block roots to its counterparty via `beacon_block_roots` (i.e., RPC method `10`).
mslipper marked this conversation as resolved.
Show resolved Hide resolved

### Goodbye

**Method ID:** `1`

**Body:**

```
(
reason: uint64
)
```

Client MAY send `goodbye` messages upon disconnection. The reason field MUST be one of the following values:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more from the top of my head that might be helpful:

- too many peers
- not helpful
- malicious/faulty
- wrong chain
- sync finished (?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we still be connected after sync finished ? We would still need to propagate any newly proposed blocks to our peers

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally, the standard way to sync in these kinds of "live" protocols is to start listening to broadcasts, then initiate sync.. else you'll miss packets during sync and will have to recover again.


- `1`: Client shut down.
- `2`: Irrelevant network.
- `3`: Irrelevant shard.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This deals with the beacon chain only, so there are no shards. I think we should have a completely separate protocol and separate connections for shard networks.


### Provide Beacon Block Roots

**Method ID:** `10`

**Body:**

```
# BlockRootSlot
(
block_root: HashTreeRoot
mslipper marked this conversation as resolved.
Show resolved Hide resolved
mslipper marked this conversation as resolved.
Show resolved Hide resolved
slot: uint64
)

(
roots: []BlockRootSlot
mslipper marked this conversation as resolved.
Show resolved Hide resolved
)
```

Send a list of block roots and slots to the peer.

### Beacon Block Headers

**Method ID:** `11`

**Request Body**

```
(
start_root: HashTreeRoot
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the root? It seems redundant to me, except for the case of chain reorgs which shouldn't happen frequently at sync (and even then, it's probably better to get blocks from the current chain that we'll be able to use later, instead of getting outdated ones).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ẁe need a mechanism for recovering blocks, in case something is lost or the client goes offline for a short bit and loses a few (computer went to sleep / ISP went down for 10 minutes).

I argue in the original issue (#692 (comment)) that it's often natural to request blocks backwards for this reason: the data structure we're syncing is a singly linked list pointing backwards in time and we receive attestations and blocks that let us discover heads "naturally" by listening to the broadcasts. With a block_root+previous_n_blocks kind of request we can both sync and recover, and for example use attestations to discover "viable" heads to work on, from a sync or recovery perspective. Indeed, negotiating finalized epochs in the handshake is somewhat redundant in that case, albeit a nice optimization (except for the chain id) - we could equally well request blocks from the peer that gossiped us the block or attestation whose parent we're missing - they should not be gossiping attestations they have not linked to a finalized epoch of value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! To summarize my understanding of your comment: Syncing forwards is safer as we can verify each block immediately when we receive it, but syncing backwards is more efficient/doesn't require additional database indexing (and I guess syncing forwards may require a negotiating phase to discover the best shared block). You're proposing to interpret the fact that I see lots of attestations on top of my sync peer's head flying around the network as evidence that their head is valid? And therefore, I'd be pretty safe syncing backwards?

That sounds reasonable. My original concern was that this requires me to know (at least some good fraction of) the validator set as otherwise my sync peer could create lots of fraudulent attestations for free that I have no chance of verifying. But I would notice this if I have at least one single honest peer (if I try to sync from them or compare the attestations coming from them).

Do you think having only a backwards sync is fine or do we need both (e.g. for highly adversarial environments, or resource constrained devices that don't participate in gossiping?).

Copy link
Contributor

@arnetheduck arnetheduck Mar 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more efficient

In terms of network / bandwidth, I'd say it's about the same but there are some nuances:

  • in forward sync, I can ask for "more" slots than already exist, potentially saving round trips - a client could use this to request "all" data at the time of request arrival. consider the following race: A sends request, a new block is produced, B receives request (similar: B starts sending response which takes time, new block is produced).
  • in backward sync, one could have an (latest-head, known_slot_number) request ("give me the block you consider to be the head, and history back to slot N") to alleviate this race, but then the server selects the head.
  • both above races are generally solved by collecting data from broadcasts while syncing (classic subscribe-then-sync pattern) - they are mainly concerns if you don't want to subscribe or want to delay subscribing.
  • in forward sync, I might end up on a different branch / head than I thought I would - the request itself does not point one out

In terms of client implementations, I think of backward sync as biased to make it cheaper for the server: the server already has the data necessary - also because the head is kept hot - while the client has to keep a chain of "unknown" blocks around / can't validate eagerly. An additional rule that the response must be forward-ordered could help the client apply / validate the blocks eagerly.

The backwards sync can be seen as more passive/reactive/lazy while forward sync is more active..

attestations on top of my sync peer's head flying around the network as evidence that their head is valid

right. the assumption rests on several premises (thanks @djrtwo!):

  • honest clients will not propagate invalid data (not signed by a validator they know)
  • there's a slashing condition on creating unviable attestations - there's currently no penalty to create & sign an unviable block so one can perhaps imagine a malicious group of validators creating lots of these and for example spam everyone during important periods "for free". It sounds a bit far fetched though, tbh, to be creating blocks this way - would love to hear thoughts.
  • I've weak-subjectively selected an initial state that contains some validators. I'd primarily look for anything signed by those validators as another heuristic for where to start syncing (even if the validator set might have changed from there).

Do you think having only a backwards sync is fine or do we need both (e.g. for highly adversarial environments, or resource constrained devices that don't participate in gossiping?).

I'm not sure :) I'm curious to hear feedback on this point, but here are some thoughts:

  • it's important that we have a request like Hello to ask for what clients consider to be the head for non-gossiping use cases - but I think that's orthogonal to the sync direction.
  • clients should be free to send that request at any time, not just during the initial negotiation phase
  • direction can be different for request and response - if different, requires a slightly "smarter" server
  • there's a cost for each direction, in terms of implementation. I'd start with one and look for strong motivations before implementing the other, as the returns are not that great. Either direction is sufficient, really.

Copy link
Contributor

@jrhea jrhea Mar 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think having only a backwards sync is fine or do we need both (e.g. for highly adversarial environments, or resource constrained devices that don't participate in gossiping?).

It seems reasonable to sync backwards from the latest received gossiped block (at least as an initial implementation)

in backward sync, one could have an (latest-head, known_slot_number) request ("give me the block you consider to be the head, and history back to slot N") to alleviate this race, but then the server selects the head.

Do we really need start_slot? if we give clients the option to request a block by either start_slot or start_root then that forces us to maintain a lookup or search mechanism for both. if we are saying that both fields (start_slot and start_root) required to sync, then I would disagree. we should be able to simply perform a lookup by block_root and walk the chain backwards until we reach max_headers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

latest received gossiped block

or even better, latest gossiped attestation

Do we really need start_slot?

I would say that if we go with backwards sync, we should not implement forwards sync here or elsewhere unless there's a strong case for that direction. Having to implement both directions negates some of the benefits of backward sync and adds implementation surface.

It is quite possible to add forward sync in a later version of the protocol as well should it prove necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or even better, latest gossiped attestation

I can dig that

@arnetheduck or anyone else. Why do we need start_slot

start_slot: uint64
max_headers: uint64
skip_slots: uint64
)
```

**Response Body:**

```
(
headers: []BlockHeader
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I note BlockHeader is not defined in the beacon chain spec. I opened a PR to define it as a struct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not needed specifically for the spec, we could also just define it here.

)
```

Requests beacon block headers from the peer starting from `(start_root, start_slot)`. The response MUST contain no more than `max_headers` headers. `skip_slots` defines the maximum number of slots to skip between blocks. For example, requesting blocks starting at slots `2` a `skip_slots` value of `2` would return the blocks at `[2, 4, 6, 8, 10]`. In cases where a slot is undefined for a given slot number, the closest previous block MUST be returned. For example, if slot `4` were undefined in the previous example, the returned array would contain `[2, 3, 6, 8, 10]`. If slot three were further undefined, the array would contain `[2, 6, 8, 10]` - i.e., duplicate blocks MUST be collapsed.
mslipper marked this conversation as resolved.
Show resolved Hide resolved
mslipper marked this conversation as resolved.
Show resolved Hide resolved

The function of the `skip_slots` parameter helps facilitate light client sync - for example, in [#459](https://github.com/ethereum/eth2.0-specs/issues/459) - and allows clients to balance the peers from whom they request headers. Client could, for instance, request every 10th block from a set of peers where each per has a different starting block in order to populate block data.
mslipper marked this conversation as resolved.
Show resolved Hide resolved

### Beacon Block Bodies

**Method ID:** `12`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just like in LES, I would use different method ids for requests and responses. So it's possible for me to send you proactively blocks and headers using RPC, and you don't need to know about it in advance.


**Request Body:**

```
(
block_roots: []HashTreeRoot
)
```

**Response Body:**

```
(
block_bodies: []BeaconBlockBody
mslipper marked this conversation as resolved.
Show resolved Hide resolved
)
```

Requests the `block_bodies` associated with the provided `block_roots` from the peer. Responses MUST return `block_roots` in the order provided in the request. If the receiver does not have a particular `block_root`, it must return a zero-value `block_body` (i.e., a `block_body` container with all zero fields).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that when everything is going smoothly, block bodies consist of very few attestations (they should be combined by then), and a few minor items like the transfers etc. has anything looked at the numbers to see how much value there is in having separate requests for headers and bodies? Requesting headers then bodies creates additional round-trips which are a cost on its own.


### Beacon Chain State

**Note:** This section is preliminary, pending the definition of the data structures to be transferred over the wire during fast sync operations.

**Method ID:** `13`

**Request Body:**

```
(
hashes: []HashTreeRoot
)
```

**Response Body:** TBD

Requests contain the hashes of Merkle tree nodes that when merkelized yield the block's `state_root`.

The response will contain the values that, when hashed, yield the hashes inside the request body.