# Historical Block Scanning For Dynamic Data Sources #902
This is a use case that's likely to come up again, and we'll want to support it in some way; however, the design and implementation are not simple. It sounds like you can solve this in a different, less ideal way, so I recommend you take that route while we figure this one out.
The three main parts that make this difficult:
Thank you for this breakdown @Jannis, it was super insightful. We have a workaround we'll use for now @leodasvacas. Looking forward to seeing this spec progress. Godspeed!
## Rationale / Use Cases

Data source templates are commonly used in subgraphs that track contracts which serve as registries for other contracts. In many cases a contract was already in use before being added to the registry, and the subgraph would like to process that past data, but currently that's not possible. See the original comment for a description of a use case. A relatively simple way to solve this is to extend the logic that re-processes a single block for a new data source so that it can process any range of past blocks.

## Requirements
## Proposed User Experience

A new method is added to the generated data source class. A caveat is that this will halt indexing of other data sources, and the progress is not stored across restarts of the graph node, so this should only be used if the triggers can be processed in a few minutes.

## Proposed Implementation

Code generation in graph-cli will need to include the new method. The implementation will generalize the logic we have for re-processing the current block in the instance manager, but we will need to better organize our trigger filtering logic so that it can be used in the block stream for the initial filtering and also in the instance manager for processing dynamic data sources. The goal is to refactor the current code so we end up with a re-usable function that looks like:

```rust
fn triggers(from: BlockPtr, to: BlockPtr, current_block: BlockPtr,
            log_filter: LogFilter, call_filter: CallFilter,
            block_filter: BlockFilter) -> Vec<EthereumTrigger>
```

This should internally check the reorg threshold to decide how to safely retrieve the triggers.
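To make that concrete, here is a minimal, self-contained sketch of what the reorg check inside such a function could look like. Everything here is illustrative: the filter and trigger types are empty stand-ins for the real graph-node types, and the threshold value is assumed.

```rust
#[derive(Clone, Copy)]
struct BlockPtr { number: u64 }

struct LogFilter;
struct CallFilter;
struct BlockFilter;
struct EthereumTrigger;

const REORG_THRESHOLD: u64 = 50; // assumed value, for illustration only

fn triggers(from: BlockPtr, to: BlockPtr, current_block: BlockPtr,
            _log_filter: LogFilter, _call_filter: CallFilter,
            _block_filter: BlockFilter) -> Vec<EthereumTrigger> {
    let result = Vec::new();
    // Blocks more than REORG_THRESHOLD behind the chain head are considered
    // final, so they can be fetched in bulk (e.g. one eth_getLogs call over
    // a wide block range).
    let final_up_to = current_block.number.saturating_sub(REORG_THRESHOLD);
    if from.number <= final_up_to {
        // ...bulk-fetch triggers for [from, min(to, final_up_to)]...
    }
    // Anything within the reorg threshold has to go through the careful,
    // reorg-aware path the block stream already uses.
    if to.number > final_up_to {
        // ...fetch triggers for (final_up_to, to] block by block...
    }
    result
}
```

Backfilling a new dynamic data source would then amount to calling `triggers(creation_block, current_block, current_block, ...)` with filters derived from just that data source.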
Using this might simplify our current block stream, since what we do right now is basically:

I don't know what our rationale was for doing things this way, but steps 2-4 seem unnecessary to me; I think we could just get the logs and calls and pass them to the data sources, which could also be a performance win.
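If that holds, the per-range flow could collapse to a single pass, roughly like this (hypothetical, reusing the stand-in types from the sketch above; `run_mapping` is an illustrative stand-in as well):

```rust
// Hypothetical one-pass flow: fetch all triggers for the range once and hand
// them straight to the data sources, with no separate scan/download/re-filter
// steps.
fn process_range(from: BlockPtr, to: BlockPtr, current_block: BlockPtr,
                 log_filter: LogFilter, call_filter: CallFilter,
                 block_filter: BlockFilter) {
    for trigger in triggers(from, to, current_block,
                            log_filter, call_filter, block_filter) {
        run_mapping(trigger);
    }
}

fn run_mapping(_trigger: EthereumTrigger) { /* stand-in */ }
```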
## Proposed Documentation Updates

The section 'Data Source Templates -> Instantiating a Data Source Template' should be updated to document the new method.

## Proposed Tests / Acceptance Criteria

## Tasks
Would be good to simplify the Compound subgraph and allow us not to upgrade it each time a new asset is added.
Is this supported now?
@itopmoon This is still on the radar but it is a difficult one.
Thanks for the update. I just wanted to know if it's available. It seems like a good feature to have.
Do you want to request a feature or report a bug?
Feature.
What is the current behavior?
From #832:
"Whenever we we process a block and mappings request new data sources, these data sources are collected and, after having processed the block, are instantiated from templates. We then process the current block again but only with those new data sources. The entity operations from this are merged into the ones we already have for the block. After that, the dynamic data sources are persisted by adding the data sources to the subgraph instance and by adding entity operations to store them in the db (for 1.)."
What is the desired behavior?
".... We then process all blocks in the range of [contract creation -> current block] for each newly created data source."
More explanation.
In the DAOstack protocol, new DAOs can be created in a number of different ways. In order to support this (and minimize spam), the desired behavior for the subgraph is to only index DAOs that are added to a universal registry of "accepted" DAOs. Given graph-node's current functionality, a lot of information about these newly registered DAOs would never be read/processed/stored, because it took place prior to the current block. In order to support this scenario (which IMO is bound to be commonplace for other projects), being able to process the entire block history for a given contract upon instantiating it as a data source would be ideal.
Thread here: daostack/subgraph#197