Add Ethereum network indexer (phase 1: blocks only) #1383

Jannis · 2019-11-26T15:22:10Z

I consider this generally ready for review!

This PR implements phase 1 of #297, including the following features:

- A network indexer that indexes blocks from the selected network. (Transactions, logs, receipts, accounts, balances come in the next two phases.) This network indexer handles reorgs to an unlimited depth (bounded only by the memory of the machine the node runs on).
- A `--network-subgraphs` CLI flag to enable network subgraph indexing per network, e.g. with
  ```
  graph-node ... --network-subgraphs ethereum/mainnet ethereum/kovan
  ```
- A built-in GraphQL schema that allows Ethereum network subgraphs to be queried (or subscribed to) at /subgraphs/ethereum/mainnet. Apart from slightly different relationship fields, this schema is heavily based on Geth's GraphQL schema.
- Tests for basic indexing and reorg handling. These should be extended further by also verifying that the blocks are actually written to the store. Right now they only test the Revert and AddBlock events emitted by the indexer (which are only emitted after reverting/writing blocks successfully; however, that doesn't mean the written data is correct).

Database size

Regarding the size of the indexed data: The most recent 4000 mainnet blocks resulted in a Postgres database size of 20MB, i.e. about 5KB per block. That makes ~45GB for all 9,000,000 blocks, assuming they are the same size on average. They are probably smaller on average (older blocks were less busy), so the blocks alone (or rather, their headers) will likely take up noticeably less than that.
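A quick sanity check of this linear extrapolation, assuming decimal units (1GB = 1000MB) and that every block takes up as much space as the average block in the 4000-block sample (`extrapolate_gb` is a hypothetical helper, not part of graph-node):

```rust
/// Linearly extrapolates the database size for `total_blocks` blocks from a
/// measured sample, assuming all blocks are the same size on average.
fn extrapolate_gb(sample_mb: f64, sample_blocks: u64, total_blocks: u64) -> f64 {
    let mb_per_block = sample_mb / sample_blocks as f64; // 20MB / 4000 = 0.005MB = 5KB
    mb_per_block * total_blocks as f64 / 1000.0 // MB -> GB (decimal units)
}

fn main() {
    let gb = extrapolate_gb(20.0, 4000, 9_000_000);
    println!("~{:.0}GB for all blocks", gb);
}
```

Since older blocks are likely smaller, this figure is best read as an upper-end estimate.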

Review guide

I recommend reviewing commit by commit first. I've consolidated the PR into commits that mostly change only one thing at a time, so it's easier to follow what was added when.

It may help to read up on the terminology used across the network indexer here:

graph-node/chain/ethereum/src/network_indexer/network_indexer.rs

Lines 16 to 54 in ec19ad2

    
```rust
/// Terminology used in this component:
///
/// Head / head block:
///   The most recent block of a chain.
///
/// Local head:
///   The block that the network indexer is at locally.
///   We get this from the store.
///
/// Chain head:
///   The block that the network is at.
///   We get this from the Ethereum node(s).
///
/// Common ancestor (during a reorg):
///   The most recent block that two versions of a chain (e.g. the locally
///   indexed version and the latest version that the network recognizes)
///   have in common.
///
///   When handling a reorg, this is the block after which the new version
///   has diverged. All blocks up to and including the common ancestor
///   remain untouched during the reorg. The blocks after the common ancestor
///   are reverted and the blocks from the new version are added after the
///   common ancestor.
///
///   The common ancestor is identified by traversing new blocks from a reorg
///   back to the most recent block that we already have indexed locally.
///
/// Old blocks (during a reorg):
///   Blocks after the common ancestor that are indexed locally but are
///   being removed as part of a reorg. We collect these from the store by
///   traversing from the current local head back to the common ancestor.
///
/// New blocks (during a reorg):
///   Blocks between the common ancestor and the block that triggered the
///   reorg. After reverting the old blocks, these are the blocks that need
///   to be fetched from the network and added after the common ancestor.
///
///   We collect these from the network by traversing from the block that
///   triggered the reorg back to the common ancestor.
```
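To make the common ancestor, old blocks and new blocks concrete, here is a minimal sketch of that traversal on the `a`/`b`/`c`/`x` vs. `d`/`e`/`f` example from the state machine docs. It assumes blocks are reduced to string hashes with parent links and that the network and the store are plain in-memory collections; `Block`, `find_common_ancestor` and `old_blocks` are hypothetical names, not graph-node code.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical minimal block: just a hash and a parent hash.
#[derive(Clone, Debug)]
struct Block {
    hash: &'static str,
    parent: Option<&'static str>,
}

fn block(hash: &'static str, parent: Option<&'static str>) -> Block {
    Block { hash, parent }
}

/// Walks from the block that triggered the reorg back through its parents
/// until it reaches a block that is already indexed locally. Returns that
/// common ancestor plus the new blocks after it (oldest first).
fn find_common_ancestor(
    trigger: &Block,
    network: &HashMap<&'static str, Block>,
    local: &HashSet<&'static str>,
) -> Option<(&'static str, Vec<&'static str>)> {
    let mut new_blocks = vec![];
    let mut current = trigger.clone();
    loop {
        if local.contains(current.hash) {
            new_blocks.reverse();
            return Some((current.hash, new_blocks));
        }
        new_blocks.push(current.hash);
        // Give up (None) if the parent is unknown or we hit genesis.
        current = network.get(current.parent?)?.clone();
    }
}

/// Old blocks: the locally indexed blocks after the common ancestor.
fn old_blocks(local_chain: &[&'static str], ancestor: &str) -> Vec<&'static str> {
    let pos = local_chain
        .iter()
        .position(|h| *h == ancestor)
        .expect("common ancestor must be indexed locally");
    local_chain[pos + 1..].to_vec()
}

fn main() {
    // The network's version of the chain: a <- b <- d <- e <- f.
    let network: HashMap<_, _> = vec![
        block("a", None), block("b", Some("a")), block("d", Some("b")),
        block("e", Some("d")), block("f", Some("e")),
    ]
    .into_iter()
    .map(|b| (b.hash, b))
    .collect();
    // The locally indexed version: a <- b <- c <- x.
    let local_chain = ["a", "b", "c", "x"];
    let local: HashSet<_> = local_chain.iter().copied().collect();

    let (ancestor, new_blocks) =
        find_common_ancestor(&network["f"], &network, &local).unwrap();
    println!("common ancestor: {}", ancestor);
    println!("old blocks: {:?}", old_blocks(&local_chain, ancestor));
    println!("new blocks: {:?}", new_blocks);
}
```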

The state machine for the network indexer is documented here:

graph-node/chain/ethereum/src/network_indexer/network_indexer.rs

Lines 509 to 673 in ec19ad2

    
```rust
/// State machine that handles block fetching and block reorganizations.
#[derive(StateMachineFuture)]
#[state_machine_future(context = "Context")]
enum StateMachine {
    /// The indexer starts in an empty state and immediately moves on
    /// to loading the local head block from the store.
    #[state_machine_future(start, transitions(LoadLocalHead))]
    Start,

    /// This state waits until the local head block has been loaded from the
    /// store. It then moves on to polling the chain head block.
    #[state_machine_future(transitions(PollChainHead, Failed))]
    LoadLocalHead { local_head: LocalHeadFuture },

    /// This state waits until the chain head block has been polled
    /// successfully.
    ///
    /// Based on the (local head, chain head) pair, the indexer then moves
    /// on to fetching and processing a range of blocks starting at
    /// local head + 1 and leading up to the chain head. This is done
    /// in chunks of e.g. 100 blocks at a time for two reasons:
    ///
    /// 1. To limit the number of blocks we keep in memory.
    /// 2. To be able to re-evaluate the chain head and check for reorgs
    ///    frequently.
    #[state_machine_future(transitions(ProcessBlocks, PollChainHead, Failed))]
    PollChainHead {
        local_head: Option<EthereumBlockPointer>,
        chain_head: ChainHeadFuture,
    },

    /// This state takes the next block from the stream. If the stream is
    /// exhausted, it transitions back to polling the chain head block
    /// and deciding on the next chunk of blocks to fetch. If there is still
    /// a block to read from the stream, it's passed on to vetting for
    /// validation and reorg checking.
    #[state_machine_future(transitions(VetBlock, PollChainHead, Failed))]
    ProcessBlocks {
        local_head: Option<EthereumBlockPointer>,
        chain_head: LightEthereumBlock,
        next_blocks: BlockStream,
    },

    /// This state vets incoming blocks with regard to two aspects:
    ///
    /// 1. Does the block have a number and hash? This is a requirement for
    ///    indexing to continue. If not, the indexer re-evaluates the chain
    ///    head and starts over.
    ///
    /// 2. Is the block the successor of the local head block? If yes, move
    ///    on to indexing this block. If not, we have a reorg.
    ///
    /// Notes on the reorg handling:
    ///
    ///   By checking parent/child succession, we ensure that there are no gaps
    ///   in the indexed data (classic mathematical induction). So if the local
    ///   head is `x` and a block `f` comes in that is not a successor/child, it
    ///   must be on a different version/fork of the chain.
    ///
    ///   E.g.:
    ///
    ///   ```ignore
    ///   a---b---c---x
    ///       \
    ///        +--d---e---f
    ///   ```
    ///
    ///   In that case we need to do the following:
    ///
    ///   1. Find the common ancestor of `x` and `f`, which is the block after
    ///      which the two versions diverged (in the above example: `b`).
    ///
    ///   2. Collect old blocks between the common ancestor and (including)
    ///      the local head that need to be reverted (in the above example:
    ///      `c`, `x`).
    ///
    ///   3. Fetch new blocks between the common ancestor and (including) `f`
    ///      that are to be inserted instead of the old blocks in order to
    ///      make the incoming block (`f`) the local head (in the above
    ///      example: `d`, `e`, `f`).
    #[state_machine_future(transitions(FetchNewBlocks, AddBlock, PollChainHead, Failed))]
    VetBlock {
        local_head: Option<EthereumBlockPointer>,
        chain_head: LightEthereumBlock,
        next_blocks: BlockStream,
        block: BlockWithUncles,
    },

    /// This state waits until all new blocks from the incoming block back to
    /// the common ancestor are available. Identifying the common ancestor is
    /// part of this process.
    ///
    /// If successful, the indexer moves on to collecting old blocks and
    /// reverting the indexed data to the common ancestor. If fetching the new
    /// blocks fails, it discards any new information and re-evaluates the chain
    /// head.
    ///
    /// The new blocks that were fetched are prepended to the incoming blocks
    /// stream, so that after reverting blocks the indexer can proceed with these
    /// as if no reorg happened. It still vets these blocks, since it wouldn't
    /// be wise to just index them without further checks.
    ///
    /// Note: This state also carries over the incoming block stream to not lose
    /// its blocks. This is because even if there was a reorg, the blocks following
    /// the current block that made us detect it will likely be valid successors.
    /// So once the reorg has been handled, the indexer should be able to
    /// continue with the remaining blocks on the stream.
    ///
    /// Only when going back to re-evaluating the chain head is the incoming
    /// blocks stream thrown away, in the hope of receiving a better
    /// chain head with different blocks leading up to it.
    #[state_machine_future(transitions(RevertToCommonAncestor, PollChainHead, Failed))]
    FetchNewBlocks {
        local_head: Option<EthereumBlockPointer>,
        chain_head: LightEthereumBlock,
        next_blocks: BlockStream,
        new_blocks: NewBlocksFuture,
    },

    /// This state collects and reverts old blocks in the store. If successful,
    /// the indexer moves on to processing the blocks regularly (at this point,
    /// the incoming blocks stream includes new blocks for the reorg, the
    /// block that triggered the reorg and any blocks that were already in the
    /// stream following the block that triggered the reorg).
    ///
    /// After reverting, the local head is updated to the common ancestor.
    ///
    /// If reverting fails at any block, the local head is updated to the
    /// last block that we managed to revert to. Following that, the indexer
    /// re-evaluates the chain head and starts over.
    ///
    /// Note: failing to revert an old block locally may be something that
    /// the indexer cannot recover from, so it may run into a loop at this
    /// point.
    #[state_machine_future(transitions(ProcessBlocks, PollChainHead, Failed))]
    RevertToCommonAncestor {
        local_head: Option<EthereumBlockPointer>,
        chain_head: LightEthereumBlock,
        next_blocks: BlockStream,
        new_local_head: RevertBlocksFuture,
    },

    /// This state waits until a block has been written and an event for it
    /// has been sent out. After that, the indexer continues processing the
    /// next block. If anything goes wrong at this point, it's back to
    /// re-evaluating the chain head and fetching (potentially) different
    /// blocks for indexing.
    #[state_machine_future(transitions(ProcessBlocks, PollChainHead, Failed))]
    AddBlock {
        chain_head: LightEthereumBlock,
        next_blocks: BlockStream,
        old_local_head: Option<EthereumBlockPointer>,
        new_local_head: AddBlockFuture,
    },

    /// This is unused; the indexing never ends.
    #[state_machine_future(ready)]
    Ready(()),

    /// State for fatal errors that cause the indexing to terminate. This should
    /// almost never happen. If it does, it should cause the entire node to crash
    /// and restart.
    #[state_machine_future(error)]
    Failed(Error),
}
```
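The range selection described for `PollChainHead` (start at local head + 1, stop at the chain head, at most e.g. 100 blocks per chunk) can be sketched with plain block numbers. This is an illustration under those assumptions only; `next_block_range` is a hypothetical helper and the real indexer works with `EthereumBlockPointer` and block futures instead.

```rust
use std::ops::RangeInclusive;

/// Decides which block numbers to fetch next, in chunks of at most
/// `chunk_size` blocks. Returns `None` when the local head has caught up
/// with the chain head.
fn next_block_range(
    local_head: Option<u64>,
    chain_head: u64,
    chunk_size: u64,
) -> Option<RangeInclusive<u64>> {
    // Start right after the local head, or at genesis if nothing is indexed yet.
    let start = local_head.map_or(0, |number| number + 1);
    if start > chain_head || chunk_size == 0 {
        return None;
    }
    // Never go past the chain head.
    let end = chain_head.min(start + chunk_size - 1);
    Some(start..=end)
}

fn main() {
    println!("{:?}", next_block_range(None, 1000, 100));
    println!("{:?}", next_block_range(Some(495), 500, 100));
    println!("{:?}", next_block_range(Some(500), 500, 100));
}
```

Keeping chunks small bounds memory use and makes the indexer return to `PollChainHead` often, which is where it notices a new chain head.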

While this is being reviewed, I'll work on improving the tests to test data correctness and potentially generalizing the network indexer across chains. I've done some initial thinking to identify how the indexer depends on Ethereum right now and I think we can abstract that away.

leoyvens · 2019-11-26T18:49:22Z

> potentially generalizing the network indexer across chains

This is probably quite a diff, would it be done in this PR or a follow up?

Jannis · 2019-11-26T18:51:25Z

@leoyvens I'd be happy to make that a follow up, given the size of this PR.

Jannis · 2019-11-26T18:53:37Z

One thing I'll still do is put metrics back in. I added the utilities (Aggregate and .measure()) to the PR and was using them at some point. But I rewrote this code about three times and dropped the metrics along the way.

Jannis · 2019-11-28T16:04:05Z

I've added extensive instrumentation to enable Grafana dashboards like this one:

Jannis · 2019-12-04T00:00:37Z

Tests generally pass, just sometimes they hang on Travis. I'm not sure yet what causes it.

leoyvens

I haven't yet fully understood the reorg algorithm, but it seems complicated due to the need to find the common ancestor. I wonder if we could use a simpler algorithm: revert a single block if the next block is not a child of the current one, and go back to the starting state. This would naturally find the common ancestor by reverting one block at a time until it is reached. In theory it would be less efficient for large reorgs, but looking at Etherscan statistics, it seems that 95% of reorgs are 1 block deep, and I couldn't even find a 3-block-deep reorg; those must happen only once every full moon. So the simpler algorithm might even be more efficient, because it does less work on 1-block reorgs, which are the common case. My point is that any performance difference would not matter, so we should favor simplicity.
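A minimal sketch of the suggested revert-one-block-at-a-time strategy, assuming an in-memory map of canonical blocks by number stands in for the Ethereum node and a `Vec` stands in for the store (`Block`, `block` and `sync` are hypothetical names, not the code that was eventually merged):

```rust
use std::collections::HashMap;

/// Hypothetical minimal block: number, hash and parent hash.
#[derive(Clone, Debug)]
struct Block {
    number: u64,
    hash: &'static str,
    parent: Option<&'static str>,
}

fn block(number: u64, hash: &'static str, parent: Option<&'static str>) -> Block {
    Block { number, hash, parent }
}

/// Brings `local` (oldest block first) in sync with the network's canonical
/// blocks. Whenever the next block is not a child of the local head, a single
/// block is reverted and the situation is re-evaluated from scratch.
/// Returns the number of reverted blocks.
fn sync(local: &mut Vec<Block>, network: &HashMap<u64, Block>) -> usize {
    let mut reverted = 0;
    loop {
        // Re-evaluate: which block should follow the current local head?
        let next_number = local.last().map_or(0, |head| head.number + 1);
        let next = match network.get(&next_number) {
            Some(block) => block.clone(),
            None => return reverted, // caught up with the chain head
        };
        if next.parent == local.last().map(|head| head.hash) {
            // Common case: the block is a child of the local head.
            local.push(next);
        } else {
            // Reorg: revert only the local head, then start over.
            local.pop().expect("cannot revert past the genesis block");
            reverted += 1;
        }
    }
}

fn main() {
    // Locally indexed: a <- b <- c <- x
    let mut local = vec![
        block(0, "a", None), block(1, "b", Some("a")),
        block(2, "c", Some("b")), block(3, "x", Some("c")),
    ];
    // Network's canonical chain: a <- b <- d <- e <- f
    let network: HashMap<u64, Block> = vec![
        block(0, "a", None), block(1, "b", Some("a")), block(2, "d", Some("b")),
        block(3, "e", Some("d")), block(4, "f", Some("e")),
    ]
    .into_iter()
    .map(|b| (b.number, b))
    .collect();

    let reverted = sync(&mut local, &network);
    let hashes: Vec<_> = local.iter().map(|b| b.hash).collect();
    println!("reverted {} blocks, local chain is now {:?}", reverted, hashes);
}
```

No explicit common-ancestor search is needed: reverting stops by itself once the local head is a parent of the next canonical block. A reorg of depth n costs n revert-and-re-evaluate rounds, which is the trade-off described above.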

chain/ethereum/src/network_indexer/block_writer.rs

leoyvens · 2019-12-04T16:54:31Z

chain/ethereum/src/network_indexer/block_writer.rs

```diff
+
+        Box::new(
+            // Add the block entity
+            self.set_entity(block.as_ref(), Some(vec![("isOmmer", false.into())]))
```


It seems hacky that isOmmer is set here, probably reflecting the fact that this field was the last thing added. It would be nicer if we set that in a data structure when the block is fetched.

Agreed. I've changed this so ommer blocks are wrapped in an Ommer newtype and the isOmmer flag is now set in the TryIntoEntity implementations for Ommer and BlockWithUncles.
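A rough sketch of that design, where the `isOmmer` flag follows from the type being converted rather than from the writer. `Value`, `Entity`, `Block` and this `TryIntoEntity` trait are simplified, hypothetical stand-ins; graph-node's real entity types and trait signatures differ.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for graph-node's entity types.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    String(String),
    Bool(bool),
}

type Entity = HashMap<String, Value>;

#[derive(Clone, Debug)]
struct Block {
    hash: String,
}

/// A regular block together with its uncles.
struct BlockWithUncles {
    block: Block,
    uncles: Vec<Block>,
}

/// Newtype marking a block as an ommer (uncle).
struct Ommer(Block);

trait TryIntoEntity {
    fn try_into_entity(&self) -> Result<Entity, String>;
}

/// Shared conversion; only the `isOmmer` flag differs between the two types.
fn block_entity(block: &Block, is_ommer: bool) -> Entity {
    let mut entity = Entity::new();
    entity.insert("id".into(), Value::String(block.hash.clone()));
    entity.insert("isOmmer".into(), Value::Bool(is_ommer));
    entity
}

impl TryIntoEntity for BlockWithUncles {
    fn try_into_entity(&self) -> Result<Entity, String> {
        Ok(block_entity(&self.block, false))
    }
}

impl TryIntoEntity for Ommer {
    fn try_into_entity(&self) -> Result<Entity, String> {
        Ok(block_entity(&self.0, true))
    }
}

fn main() {
    let block = BlockWithUncles {
        block: Block { hash: "0xaa".into() },
        uncles: vec![Block { hash: "0xbb".into() }],
    };
    let mut entities = vec![block.try_into_entity().unwrap()];
    for uncle in &block.uncles {
        entities.push(Ommer(uncle.clone()).try_into_entity().unwrap());
    }
    for entity in &entities {
        println!("{:?} isOmmer = {:?}", entity["id"], entity["isOmmer"]);
    }
}
```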

leoyvens · 2019-12-04T17:27:02Z

chain/ethereum/src/network_indexer/mod.rs

```diff
+#[derive(Clone, Debug, Default, PartialEq)]
+pub struct BlockWithUncles {
+    pub block: EthereumBlock,
+    pub uncles: Vec<Option<Block<H256>>>,
```


A Vec<Option<_>> is weird, what does None mean?

The docs are not super clear; I think https://github.com/ethereum/wiki/wiki/JSON-RPC#eth_getUncleByBlockHashAndIndex and its link to https://github.com/ethereum/wiki/wiki/JSON-RPC#eth_getblockbyhash suggest that None means the uncle couldn't be found.

We can't allow different nodes to return different uncles though – we need them all for computing block rewards. So I think if an uncle (ommer) is not found, that's a serious reason to fail the network indexer. I'll drop the Option.
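Dropping the `Option` can be done right where the uncles are fetched, by collecting the per-uncle lookup results into a `Result` that fails on the first missing uncle. A sketch, assuming a simplified, hypothetical `Uncle` type in place of the real `Option<Block<H256>>` elements and `String` errors; `require_all_uncles` is likewise a hypothetical name:

```rust
/// Hypothetical stand-in for an uncle block returned by the Ethereum node.
#[derive(Clone, Debug, PartialEq)]
struct Uncle {
    hash: String,
}

/// Turns the results of looking up each uncle into a plain `Vec`, failing
/// with an error if any uncle could not be found (`None`).
fn require_all_uncles(block_hash: &str, uncles: Vec<Option<Uncle>>) -> Result<Vec<Uncle>, String> {
    uncles
        .into_iter()
        .enumerate()
        .map(|(index, uncle)| {
            uncle.ok_or_else(|| format!("Failed to find uncle {} of block {}", index, block_hash))
        })
        // Collecting into `Result<Vec<_>, _>` stops at the first error.
        .collect()
}

fn main() {
    let uncles = vec![Some(Uncle { hash: "0x01".into() }), None];
    match require_all_uncles("0xabc", uncles) {
        Ok(uncles) => println!("found all {} uncles", uncles.len()),
        Err(e) => println!("{}", e),
    }
}
```

With this, the rest of the indexer only ever sees complete uncle lists, and a missing uncle surfaces as an indexing error.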

chain/ethereum/src/network_indexer/mod.rs

chain/ethereum/src/network_indexer/network_indexer.rs

leoyvens · 2019-12-06T16:21:33Z

chain/ethereum/src/network_indexer/network_indexer.rs

```diff
+        // Check whether we have a reorg (parent of the new block != our local head).
+        if block.inner().parent_ptr() != state.local_head {
+            let depth = block.inner().number.unwrap().as_u64()
+                - state.local_head.map_or(0u64, |ptr| ptr.number);
```


I don't see the purpose of this variable, it's only used in logs and metrics, and afaict it's either 0 if block is genesis or 1 otherwise, so not very meaningful.

Eh, you're right, this will never report the real depth of the reorg. I do want that information, but I think I can only log it once we have found the common ancestor.

leoyvens · 2019-12-06T16:30:03Z

chain/ethereum/src/network_indexer/network_indexer.rs

```diff
+                let state = state.take();
+
+                transition!(PollChainHead {
+                    local_head: state.old_local_head,
```


What if we went back to the starting state here? Then we wouldn't need to keep old_local_head.

True, I like that.

chain/ethereum/src/lib.rs

leoyvens · 2019-12-09T21:47:08Z

graph/src/ext/futures.rs

```diff
@@ -224,6 +228,10 @@ impl<F: Future> FutureExtension for F {
            on_cancel,
        }
    }
+
+    fn measure<C: FnOnce(&Self::Item, Duration)>(self, callback: C) -> Measure<Self, C> {
+        Measure::new(self, callback)
```


I think having this helper is unnecessary: it only has one caller from what I can tell, and with async/.await it won't be idiomatic because it's a variation of and_then.

I can move it into the indexer code.

And this is done also.

Jannis · 2019-12-13T11:59:07Z

@leoyvens I think I've addressed all comments with the appropriate changes. Could you take another look?

Jannis · 2019-12-13T12:17:32Z

Rebased on top of master.

Jannis · 2019-12-13T15:48:02Z

Reorg handling was simplified as per @leoyvens's suggestion. Reduced the implementation by about 600 lines and made it a lot easier to follow.

Jannis · 2019-12-13T15:54:54Z

Left to do: Count consecutive reverts to capture and log reorg depths.
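With reverts happening one block at a time, the reorg depth is only known once the indexer adds a block again. One way to capture it is a counter that is bumped on every revert and reported and reset on the next added block. This is a sketch with a hypothetical `ReorgDepthTracker` type, not necessarily what the later "Log reorg depth properly" commit does:

```rust
/// Counts consecutive reverts so the depth of a reorg can be logged once
/// the indexer starts adding blocks again.
#[derive(Default)]
struct ReorgDepthTracker {
    consecutive_reverts: u64,
}

impl ReorgDepthTracker {
    /// Call after a block has been reverted.
    fn on_revert(&mut self) {
        self.consecutive_reverts += 1;
    }

    /// Call after a block has been added. Returns the depth of the reorg
    /// that just finished, if there was one, and resets the counter.
    fn on_add_block(&mut self) -> Option<u64> {
        match std::mem::take(&mut self.consecutive_reverts) {
            0 => None,
            depth => Some(depth),
        }
    }
}

fn main() {
    let mut tracker = ReorgDepthTracker::default();
    tracker.on_revert();
    tracker.on_revert();
    if let Some(depth) = tracker.on_add_block() {
        println!("Handled reorg of depth {}", depth);
    }
}
```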

leoyvens

Code is looking good; I only have a few minor comments. Once this is ready for merging I'll also give it a run locally.

leoyvens · 2019-12-13T18:16:38Z

chain/ethereum/src/network_indexer/mod.rs

```diff
+            difficulty: block.difficulty,
+            total_difficulty: block.total_difficulty,
+            seal_fields: block.seal_fields,
+            uncles: block.uncles,
```


Is it correct and worth it to assert that this is empty?

Good question. I wouldn't expect uncles to ever report uncles of their own (that's not how it works). Asserting this might cause failures, though. I'd rather have references to uncles of uncles in the resulting data that resolve to null.

chain/ethereum/src/network_indexer/network_indexer.rs

lutter · 2019-12-14T02:00:51Z

chain/ethereum/src/network_indexer/ethereum.graphql

```diff
@@ -0,0 +1,83 @@
+""" Block is an Ethereum block."""
+type Block @entity {
+  id: ID!
```


Before we start rolling this out, we should brush up the work I did on making it possible to make ID equivalent to Bytes for a subgraph so that these IDs take only 20 instead of 40 (or 42) bytes to store. It will be transparent to the rest of the code, but requires that we pass a flag to create_subgraph that indicates whether ID should be a String or Bytes.

The code for this is in store::postgres::relational::Layout::new, but it's not exposed to callers and is instead hardcoded to IdType::String. We should expose this up the call stack as arguments so it can be set in create_subgraph and is stored in the database (maybe as a field on deployment_schemas).

How much is there to do? And how much, if any, risk does that support introduce? Does it affect clients, filters, anything?

Besides passing IdType::Bytes or IdType::String through when creating a subgraph, the following needs to be done in the Store:

- Fix up a handful of places in relational_queries.rs where we assume that the id is a String.
- Look at the type of the id column in information_schema.columns when starting a subgraph and decide whether it uses Bytes or String as the id.

At the Entity layer, the block explorer would have to make sure to pass id as a Value::Bytes rather than a Value::String.

At the GraphQL layer, users have to pass the id as something that can be converted to Value::Bytes, and we'd have to do that conversion when coercing values. (I could make it so that we convert a Value::String into a Value::Bytes in relational_queries.rs, but that seems a bit hacky)

I looked a little more, and to avoid changing too much in the code base, I think the best course of action is to keep that distinction within the relational mapping code. That means that code that deals with entity ID's continues to use strings, and the conversion from string to bytea and vice versa all happens in relational_queries. For users of the storage layer, the main change is that they might get a new error when the id is not a string in the form 0xdeadbeef.
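The string-to-bytea conversion that would live in the relational mapping layer could look roughly like this. It is a sketch assuming IDs must be `0x` followed by pairs of hex digits and that bytes are rendered back as lowercase hex; `id_to_bytes` and `bytes_to_id` are hypothetical names, not the actual relational_queries.rs code.

```rust
/// Converts an entity ID of the form `0xdeadbeef` into raw bytes for a
/// `bytea` column. Anything else is rejected with an error.
fn id_to_bytes(id: &str) -> Result<Vec<u8>, String> {
    let invalid = || format!("Failed to parse id `{}`: expected 0x followed by pairs of hex digits", id);
    let hex = id.strip_prefix("0x").ok_or_else(invalid)?;
    if hex.len() % 2 != 0 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err(invalid());
    }
    // All characters are ASCII hex digits, so two-byte slices never split a
    // character and parsing each pair cannot fail.
    (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).map_err(|e| e.to_string()))
        .collect()
}

/// Converts raw bytes from a `bytea` column back into the `0x...` ID string.
fn bytes_to_id(bytes: &[u8]) -> String {
    let hex: String = bytes.iter().map(|b| format!("{:02x}", b)).collect();
    format!("0x{}", hex)
}

fn main() {
    let bytes = id_to_bytes("0xdeadbeef").unwrap();
    println!("{:?} -> {}", bytes, bytes_to_id(&bytes));
    println!("{:?}", id_to_bytes("not-hex"));
}
```

The error branch is the "new error" that users of the storage layer could start seeing for IDs that are not of the form `0xdeadbeef`.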

Those changes should be possible to do in a couple of days. The only change for the block explorer would be to pass IdType::Bytes when creating the schema.

Can we do this separately? We're not going to activate this feature right away.

Yes, totally; we just need to do this before the feature goes live. We can migrate the database after the fact, but it's likely to take long (maybe hours) and during that time, the block explorer would be unavailable.

When this goes into a release, we need to make sure it's behind a feature flag so that users who install that release don't wind up with this data in their database which we would have to migrate.

Opened #1414 to track this properly.

It's already behind a --network-subgraphs CLI flag.


Since it's only used in one place right now (`track_future!` in the network indexer), we can get away with something as simple as

```rust
let start_time = Instant::now();
...
.inspect(move |_| {
    let duration = start_time.elapsed();
    ...
})
```

Squashme: remove measure

Instead of collecting all old and new blocks to find the common ancestor and revert old blocks, we simply revert the local head block one block at a time and re-evaluate the situation (by polling the chain head block again and deciding which blocks to look up next). Eventually, this procedure will revert the local head back to the common ancestor. For deep reorgs, this will be slow, but about 99% of Ethereum reorgs have a depth of one, so this is something we can live with easily.

Jannis · 2019-12-29T11:05:23Z

@leoyvens @Zerim I've made it so that the following routes / subgraph names are used:

The subgraph name itself becomes network/ethereum/mainnet, network/ethereum/kovan.

The routes become: /subgraphs/network/ethereum/mainnet and /subgraphs/network/ethereum/kovan.

Jannis force-pushed the jannis/block-explorer-phase-1-v1 branch from e1de997 to 5a69e25 November 26, 2019 15:28

Jannis requested a review from a team November 26, 2019 18:39

Jannis self-assigned this Nov 26, 2019

Jannis added the chains/ethereum and enhancement labels Nov 26, 2019

Jannis marked this pull request as ready for review November 26, 2019 18:43

Jannis force-pushed the jannis/block-explorer-phase-1-v1 branch from ec19ad2 to 02e93a5 November 26, 2019 18:46

Jannis force-pushed the jannis/block-explorer-phase-1-v1 branch from 25fd80e to b2232a9 November 26, 2019 22:06

Jannis mentioned this pull request Nov 27, 2019

Add Aggregate metric utility #1367

Closed

Jannis force-pushed the jannis/block-explorer-phase-1-v1 branch 2 times, most recently from ef11471 to ae8b81b December 3, 2019 22:52

leoyvens mentioned this pull request Dec 5, 2019

Block handlers affect block scanning too early #1395

Closed

leoyvens requested changes Dec 9, 2019


Jannis mentioned this pull request Dec 11, 2019

Enable time-travel GraphQL queries #1397

Merged

Jannis force-pushed the jannis/block-explorer-phase-1-v1 branch from 3d6eacb to 2f3437e December 12, 2019 14:11

lutter mentioned this pull request Dec 12, 2019

Address certain corner cases of time-travel queries by block hash #1405

Open

Jannis force-pushed the jannis/block-explorer-phase-1-v1 branch from 127bdba to c072697 December 13, 2019 12:17

leoyvens requested changes Dec 13, 2019


lutter reviewed Dec 14, 2019


Jannis force-pushed the jannis/block-explorer-phase-1-v1 branch 2 times, most recently from 99f3dcb to d085330 December 16, 2019 10:24

Jannis added 24 commits December 29, 2019 11:29

chain/ethereum: Add isOmmer flag to blocks

abf9e99

Set it to `false` for regular blocks and to `true` for ommers.

chain/ethereum: Make failed messages more consistent

0ecf5b5

If they all start with `Failed ...`, they are easier to grep for.

chain/ethereum: Remove unnecessary Arc import

7e425a5

chain/ethereum: Distinguish ommers more clearly

4bac506

When writing blocks, set the `isOmmer` entity field based on whether the block being written is an `Ommer` (true) or a `BlockWithOmmers` (false).

chain/ethereum: Retry range if ommers are unavailable

02698af

chain/ethereum: Remove unused method

c3c1dc2

graph, chain/ethereum: Display blocks more idiomatically

5605567

This is more idiomatic, apart from `LightEthereumBlock`, where a new `format()` method is added because `LightEthereumBlock` is a foreign type that we can't implement `Display` for without a wrapper.

chain/ethereum: Remove unnecessary extern crate

a6738f5

graph, chain/ethereum: Move entity conversion traits

d801c94

Move these into `graph` so they can be used in other places as well (like other chain integrations in the future).

chain/ethereum, node: Move subgraph creation into NetworkIndexer

861119b

chain/ethereum: Update network indexer tests

5d32dfd

chain/ethereum: Terminate fetching at the first unavailable block

48b1b08

This avoids dealing with `Option` blocks in the rest of the indexer and therefore simplifies things a bit.

chain/ethereum: Double-check chain lengths

f841014

chain/ethereum: Log reorg depth properly

a9be13e

chain/ethereum: Simplify rolling back when adding a block fails

443f059

chain/ethereum: Explicitly trigger test reorgs

cffb4ed

chain/ethereum: Log inclusive block range in []

4d89059

chain/ethereum: Drop unnecessary scope

158268b

chain/ethereum: Fix BlockWriter code formatting

70cce7c

chain/ethereum: Remove unused import

ee4c1a7

chain/ethereum: Give consecutive reorg tests more time

621bfc8

Cargo: Update lockfile

f92ced3

Jannis force-pushed the jannis/block-explorer-phase-1-v1 branch from f3363e6 to f92ced3 December 29, 2019 10:31

Jannis added 2 commits December 29, 2019 12:00

node: Prefix network subgraphs with network/

092f033

server: Add support for network/ subgraphs

e55a160

leoyvens approved these changes Dec 30, 2019


Jannis merged commit 096dab9 into master Dec 30, 2019

	/// Terminology used in this component:
	///
	/// Head / head block:
	/// The most recent block of a chain.
	///
	/// Local head:
	/// The block that the network indexer is at locally.
	/// We get this from the store.
	///
	/// Chain head:
	/// The block that the network is at.
	/// We get this from the Ethereum node(s).
	///
	/// Common ancestor (during a reorg):
	/// The most recent block that two versions of a chain (e.g. the locally
	/// indexed version and the latest version that the network recognizes)
	/// have in common.
	///
	/// When handling a reorg, this is the block after which the new version
	/// has diverged. All blocks up to and including the common ancestor
	/// remain untouched during the reorg. The blocks after the common ancestor
	/// are reverted and the blocks from the new version are added after the
	/// common ancestor.
	///
	/// The common ancestor is identified by traversing new blocks from a reorg
	/// back to the most recent block that we already have indexed locally.
	///
	/// Old blocks (during a reorg):
	/// Blocks after the common ancestor that are indexed locally but are
	/// being removed as part of a reorg. We collect these from the store by
	/// traversing from the current local head back to the common ancestor.
	///
	/// New blocks (during a reorg):
	/// Blocks between the common ancestor and the block that triggered the
	/// reorg. After reverting the old blocks, these are the blocks that need
	/// to be fetched from the network and added after the common ancestor.
	///
	/// We collect these from the network by traversing from the block that
	/// triggered the reorg back to the common ancestor.
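The terminology above describes two traversals. The common ancestor is found by walking back from the block that triggered the reorg until reaching a block that is already indexed locally, and the old blocks are collected by walking from the local head back to that ancestor. The following is a minimal synchronous sketch of both traversals. It assumes a hypothetical `Block` type, a hypothetical `analyze_reorg` function, and in-memory `HashMap`s that stand in for the store and the Ethereum node(s). It is not graph-node's implementation; the indexer performs these steps asynchronously through the state machine below.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified block header (not graph-node's actual type).
#[derive(Clone, Debug, PartialEq)]
struct Block {
    number: u64,
    hash: String,
    parent_hash: String,
}

/// The pieces needed to handle a reorg, using the terminology above.
#[derive(Debug)]
struct Reorg {
    common_ancestor: Block,
    /// From the local head back to (excluding) the common ancestor, newest first.
    old_blocks: Vec<Block>,
    /// After the common ancestor up to and including the incoming block, oldest first.
    new_blocks: Vec<Block>,
}

/// `local` stands in for the store, `network` for the Ethereum node(s); both are
/// keyed by block hash. Returns `None` if a parent block cannot be found.
fn analyze_reorg(
    local: &HashMap<String, Block>,
    local_head: &Block,
    network: &HashMap<String, Block>,
    incoming: &Block,
) -> Option<Reorg> {
    // New blocks: walk back from the incoming block until we reach a block that
    // is already indexed locally. That block is the common ancestor.
    let mut new_blocks = Vec::new();
    let mut cursor = incoming.clone();
    while !local.contains_key(&cursor.hash) {
        let parent = network.get(&cursor.parent_hash)?.clone();
        new_blocks.push(cursor);
        cursor = parent;
    }
    let common_ancestor = cursor;
    new_blocks.reverse();

    // Old blocks: walk back from the local head to the common ancestor.
    let mut old_blocks = Vec::new();
    let mut cursor = local_head.clone();
    while cursor.hash != common_ancestor.hash {
        let parent = local.get(&cursor.parent_hash)?.clone();
        old_blocks.push(cursor);
        cursor = parent;
    }

    Some(Reorg { common_ancestor, old_blocks, new_blocks })
}

fn blk(number: u64, hash: &str, parent_hash: &str) -> Block {
    Block { number, hash: hash.to_string(), parent_hash: parent_hash.to_string() }
}

fn by_hash(blocks: Vec<Block>) -> HashMap<String, Block> {
    blocks.into_iter().map(|b| (b.hash.clone(), b)).collect()
}

fn hashes(blocks: &[Block]) -> Vec<String> {
    blocks.iter().map(|b| b.hash.clone()).collect()
}

fn main() {
    // Locally indexed: a-b-c-x. The network has switched to a-b-d-e-f.
    let local = by_hash(vec![blk(0, "a", ""), blk(1, "b", "a"), blk(2, "c", "b"), blk(3, "x", "c")]);
    let network = by_hash(vec![
        blk(0, "a", ""), blk(1, "b", "a"), blk(2, "d", "b"), blk(3, "e", "d"), blk(4, "f", "e"),
    ]);

    let reorg = analyze_reorg(&local, &local["x"], &network, &network["f"]).unwrap();
    assert_eq!(hashes(&reorg.old_blocks), vec!["x", "c"]);
    assert_eq!(hashes(&reorg.new_blocks), vec!["d", "e", "f"]);
    // prints "common ancestor: #1 b"
    println!("common ancestor: #{} {}", reorg.common_ancestor.number, reorg.common_ancestor.hash);
}
```

The example reproduces the `a-b-c-x` / `d-e-f` fork from the `VetBlock` documentation. Returning the old blocks newest first matches the order in which they would naturally be reverted back to the common ancestor.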

	/// State machine that handles block fetching and block reorganizations.
	#[derive(StateMachineFuture)]
	#[state_machine_future(context = "Context")]
	enum StateMachine {
		/// The indexer starts in an empty state and immediately moves on
		/// to loading the local head block from the store.
		#[state_machine_future(start, transitions(LoadLocalHead))]
		Start,

		/// This state waits until the local head block has been loaded from the
		/// store. It then moves on to polling the chain head block.
		#[state_machine_future(transitions(PollChainHead, Failed))]
		LoadLocalHead { local_head: LocalHeadFuture },

		/// This state waits until the chain head block has been polled
		/// successfully.
		///
		/// Based on the (local head, chain head) pair, the indexer then moves
		/// on to fetching and processing a range of blocks starting at
		/// local head + 1 and leading up to the chain head. This is done
		/// in chunks of e.g. 100 blocks at a time for two reasons:
		///
		/// 1. To limit the number of blocks we keep in memory.
		/// 2. To be able to re-evaluate the chain head and check for reorgs
		///    frequently.
		#[state_machine_future(transitions(ProcessBlocks, PollChainHead, Failed))]
		PollChainHead {
			local_head: Option<EthereumBlockPointer>,
			chain_head: ChainHeadFuture,
		},

		/// This state takes the next block from the stream. If the stream is
		/// exhausted, it transitions back to polling the chain head block
		/// and deciding on the next chunk of blocks to fetch. If there is still
		/// a block to read from the stream, it's passed on to vetting for
		/// validation and reorg checking.
		#[state_machine_future(transitions(VetBlock, PollChainHead, Failed))]
		ProcessBlocks {
			local_head: Option<EthereumBlockPointer>,
			chain_head: LightEthereumBlock,
			next_blocks: BlockStream,
		},

		/// This state vets incoming blocks with regard to two aspects:
		///
		/// 1. Does the block have a number and hash? This is a requirement for
		///    indexing to continue. If not, the indexer re-evaluates the chain
		///    head and starts over.
		///
		/// 2. Is the block the successor of the local head block? If yes, move
		///    on to indexing this block. If not, we have a reorg.
		///
		/// Notes on the reorg handling:
		///
		/// By checking parent/child succession, we ensure that there are no gaps
		/// in the indexed data (classic mathematical induction). So if the local
		/// head is `x` and a block `f` comes in that is not a successor/child, it
		/// must be on a different version/fork of the chain.
		///
		/// E.g.:
		///
		/// ```ignore
		/// a---b---c---x
		///      \
		///       +--d---e---f
		/// ```
		///
		/// In that case we need to do the following:
		///
		/// 1. Find the common ancestor of `x` and `f`, which is the block after
		///    which the two versions diverged (in the above example: `b`).
		///
		/// 2. Collect old blocks between the common ancestor and (including)
		///    the local head that need to be reverted (in the above example:
		///    `c`, `x`).
		///
		/// 3. Fetch new blocks between the common ancestor and (including) `f`
		///    that are to be inserted instead of the old blocks in order to
		///    make the incoming block (`f`) the local head (in the above
		///    example: `d`, `e`, `f`).
		#[state_machine_future(transitions(FetchNewBlocks, AddBlock, PollChainHead, Failed))]
		VetBlock {
			local_head: Option<EthereumBlockPointer>,
			chain_head: LightEthereumBlock,
			next_blocks: BlockStream,
			block: BlockWithUncles,
		},

		/// This state waits until all new blocks from the incoming block back to
		/// the common ancestor are available. Identifying the common ancestor is
		/// part of this process.
		///
		/// If successful, the indexer moves on to collecting old blocks and
		/// reverting the indexed data to the common ancestor. If fetching the new
		/// blocks fails, it discards any new information and re-evaluates the chain
		/// head.
		///
		/// The new blocks that were fetched are prepended to the incoming blocks
		/// stream, so that after reverting blocks the indexer can proceed with these
		/// as if no reorg happened. It still vets these blocks, since it wouldn't
		/// be wise to just index them without further checks.
		///
		/// Note: This state also carries over the incoming block stream so as not
		/// to lose its blocks. This is because even if there was a reorg, the blocks
		/// following the block that made us detect it will likely be valid
		/// successors. So once the reorg has been handled, the indexer should be
		/// able to continue with the remaining blocks on the stream.
		///
		/// The incoming blocks stream is only thrown away when going back to
		/// re-evaluating the chain head, in the hope of receiving a better chain
		/// head with different blocks leading up to it.
		#[state_machine_future(transitions(RevertToCommonAncestor, PollChainHead, Failed))]
		FetchNewBlocks {
			local_head: Option<EthereumBlockPointer>,
			chain_head: LightEthereumBlock,
			next_blocks: BlockStream,
			new_blocks: NewBlocksFuture,
		},

		/// This state collects and reverts old blocks in the store. If successful,
		/// the indexer moves on to processing the blocks regularly (at this point,
		/// the incoming blocks stream includes new blocks for the reorg, the
		/// block that triggered the reorg and any blocks that were already in the
		/// stream following the block that triggered the reorg).
		///
		/// After reverting, the local head is updated to the common ancestor.
		///
		/// If reverting fails at any block, the local head is updated to the
		/// last block that we managed to revert to. Following that, the indexer
		/// re-evaluates the chain head and starts over.
		///
		/// Note: failing to revert an old block locally may be something that
		/// the indexer cannot recover from, so it may run into a loop at this
		/// point.
		#[state_machine_future(transitions(ProcessBlocks, PollChainHead, Failed))]
		RevertToCommonAncestor {
			local_head: Option<EthereumBlockPointer>,
			chain_head: LightEthereumBlock,
			next_blocks: BlockStream,
			new_local_head: RevertBlocksFuture,
		},

		/// This state waits until a block has been written and an event for it
		/// has been sent out. After that, the indexer continues processing the
		/// next block. If anything goes wrong at this point, it's back to
		/// re-evaluating the chain head and fetching (potentially) different
		/// blocks for indexing.
		#[state_machine_future(transitions(ProcessBlocks, PollChainHead, Failed))]
		AddBlock {
			chain_head: LightEthereumBlock,
			next_blocks: BlockStream,
			old_local_head: Option<EthereumBlockPointer>,
			new_local_head: AddBlockFuture,
		},

		/// This is unused; the indexing never ends.
		#[state_machine_future(ready)]
		Ready(()),

		/// State for fatal errors that cause the indexing to terminate. This should
		/// almost never happen. If it does, it should cause the entire node to crash
		/// and restart.
		#[state_machine_future(error)]
		Failed(Error),
	}
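Two decisions in the state machine above are simple enough to model in isolation. `PollChainHead` picks the next chunk of blocks to fetch (starting at local head + 1, e.g. 100 blocks at a time, never past the chain head), and `VetBlock` checks that a block has a number and a hash and that it is the child of the local head. The sketch below is a synchronous approximation under stated assumptions: `BlockPtr`, `IncomingBlock`, `Vetting`, `vet_block` and `next_chunk` are hypothetical names, not graph-node APIs, and treating only block 0 as a valid successor when nothing has been indexed yet is an assumption of this sketch.

```rust
use std::ops::RangeInclusive;

/// Hypothetical pointer to a block that has been indexed locally.
#[derive(Clone, Debug, PartialEq)]
struct BlockPtr {
    number: u64,
    hash: String,
}

/// Hypothetical incoming block. Number and hash are optional because a block
/// returned by an Ethereum node may lack them (e.g. a pending block).
struct IncomingBlock {
    number: Option<u64>,
    hash: Option<String>,
    parent_hash: String,
}

/// Outcome of vetting, annotated with the state the indexer would move to next.
#[derive(Debug, PartialEq)]
enum Vetting {
    /// Missing number or hash: re-evaluate the chain head (`PollChainHead`).
    Incomplete,
    /// Child of the local head: index it (`AddBlock`).
    Successor,
    /// Not a child of the local head: handle a reorg (`FetchNewBlocks`).
    Reorg,
}

fn vet_block(local_head: Option<&BlockPtr>, block: &IncomingBlock) -> Vetting {
    // 1. The block needs a number and a hash for indexing to continue.
    let number = match (block.number, block.hash.as_ref()) {
        (Some(number), Some(_)) => number,
        _ => return Vetting::Incomplete,
    };
    // 2. Is the block the successor (child) of the local head?
    let is_successor = match local_head {
        // Assumption of this sketch: with nothing indexed yet, only the
        // genesis block (number 0) counts as a successor.
        None => number == 0,
        Some(head) => number == head.number + 1 && block.parent_hash == head.hash,
    };
    if is_successor {
        Vetting::Successor
    } else {
        Vetting::Reorg
    }
}

/// Next chunk of block numbers to fetch: from local head + 1, at most
/// `chunk_size` blocks, never past the chain head. `None` means caught up.
fn next_chunk(local_head: Option<u64>, chain_head: u64, chunk_size: u64) -> Option<RangeInclusive<u64>> {
    let start = local_head.map_or(0, |n| n + 1);
    if chunk_size == 0 || start > chain_head {
        return None;
    }
    Some(start..=chain_head.min(start.saturating_add(chunk_size - 1)))
}

fn main() {
    // Chunking a chain whose head is block 250, 100 blocks at a time.
    assert_eq!(next_chunk(None, 250, 100), Some(0..=99));
    assert_eq!(next_chunk(Some(199), 250, 100), Some(200..=250));
    assert_eq!(next_chunk(Some(250), 250, 100), None);

    // Vetting against a local head `x` at block 3.
    let head = BlockPtr { number: 3, hash: "x".into() };
    let child = IncomingBlock { number: Some(4), hash: Some("y".into()), parent_hash: "x".into() };
    let fork = IncomingBlock { number: Some(4), hash: Some("f".into()), parent_hash: "e".into() };
    let pending = IncomingBlock { number: None, hash: None, parent_hash: "x".into() };

    assert_eq!(vet_block(Some(&head), &child), Vetting::Successor);
    assert_eq!(vet_block(Some(&head), &fork), Vetting::Reorg);
    assert_eq!(vet_block(Some(&head), &pending), Vetting::Incomplete);
}
```

Keeping chunks small bounds memory use and returns control to `PollChainHead` often, which is where a new chain head, and with it a possible reorg, is noticed.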

Add Ethereum network indexer (phase 1: blocks only) #1383

Jannis commented Nov 26, 2019 (edited)

leoyvens commented Nov 26, 2019

Jannis commented Nov 26, 2019

Jannis commented Nov 26, 2019

Jannis commented Nov 28, 2019

Jannis commented Dec 4, 2019

leoyvens left a comment (edited)

Jannis commented Dec 13, 2019

Jannis commented Dec 13, 2019

Jannis commented Dec 13, 2019

Jannis commented Dec 13, 2019

leoyvens left a comment

Jannis commented Dec 29, 2019