Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address certain corner cases of time-travel queries by block hash #1405

Open
lutter opened this issue Dec 12, 2019 · 6 comments
Open

Address certain corner cases of time-travel queries by block hash #1405

lutter opened this issue Dec 12, 2019 · 6 comments

Comments

@lutter
Copy link
Collaborator

lutter commented Dec 12, 2019

Time-travel queries introduced in PR #1397 might still return incorrect results in certain circumstances. A query like query { things(block: { hash: $hash }) { ... } } might return data from different blocks if either

  • [H1] the block with the given hash is not on the chain starting at the subgraph's current head
  • [H2] the block gets removed from the subgraph's chain because of a block reorganization running concurrently with query execution (solved by PR Detect reorgs during execution of a GraphQL query #1801, not an issue anymore, only mentioned for documentation purposes)

Users can make sure they do not encounter this problem by only querying per block hash for blocks that are final, and by ensuring that the block hash is on the main chain.

Problem H1 can happen if we downloaded a block with $hash that was later removed from the subgraph's chain because of a reorganization, and is possible for any block, no matter whether it can be considered final or not. The query should return an error, as the block is not on the main chain, but might return successfully. The data returned will be from the block on the main chain that has the same number as block $hash.

Problem H2 can only happen for blocks that can not be considered final yet; for graph_node, that means that they are within REORG_THRESHOLD of the current chain head, but in practice only affects blocks within two or three blocks of the chain head. The query should return an error, as the block $hash is no longer on the main chain when the query finishes, but will return successfully with data that might come from either the block with $hash or one of the siblings with the same number that graph-node downloaded while it was processing the query.

Solving H1

Solving this issue requires that we have the infrastructure in place to answer the question "Is the block $hash on the chain starting at the subgraph's current head?"

Phase 1 of the block explorer will make it possible to distinguish between blocks on the main chain and ommers. If block $hash is at least REORG_THRESHOLD away from the chain head, we reject the query if the block is an ommer. For blocks $hash within REORG_THRESHOLD of the chain head, we will keep a list of (block_hash, block_number) for each subgraph, tying the subgraph's current head to the finalized chain, and reject the query if $hash is not in that list.

Solving H2

For this, we need to detect that the subgraph's view of the chain was reorganized while the query was running; this is only necessary for blocks that are not final yet, i.e., that are within REORG_THRESHOLD of the chain head. To support this check, we will add a generation attribute to each SubgraphDeployment that is incremented every time the subgraph's chain is reorganized. We retrieve the generation attribute before we start executing the query, and check that it has not changed after we have finished executing the query. If it did change, we report an error to the user.

Time-travel queries by block number

Problem H1 does not affect these queries, but problem H2 could. It might be worth putting these checks in place when doing a time-travel query by block number, i.e. get the hash of the block with the given number before executing the query, and then executing as if the user had given us that hash. That ensures that the result of such a query comes from a consistent state of the chain.

@Jannis Jannis added area/graphql area/store bug Something isn't working labels Dec 12, 2019
@lutter
Copy link
Collaborator Author

lutter commented Dec 12, 2019

I wonder if all this could be simplified to the following:

  1. Check that the block $hash exists and is not an ommer, and use its number in the query
  2. Execute the query
  3. Check again that the block exists and is not an ommer

The thought (and hope) behind this is that the first thing that happens when we know that we have to revert blocks is that the old block is marked as an ommer in the block explorer; as that would happen before the subgraph gets reorged, it would also be quicker in detecting queries against invalid blocks, and it would avoid the complication of storing per subgraph block hashes.

@lutter
Copy link
Collaborator Author

lutter commented Dec 12, 2019

There's one other complication, somewhat unrelated, that the current implementation does not address: it is possible to query at a block that is on the main chain, but that the subgraph hasn't caught up to yet. We need to ensure that queries at a block only happen for blocks before the subgraph's current head.

@lutter
Copy link
Collaborator Author

lutter commented Dec 13, 2019

Here's a simpler way to solve these two problems. For each subgraph, we will track three pieces of data:

  • reorgCount: the number of blocks that have been reverted in the current subgraph. The actual value is unimportant; what's important is that this value increases every time a revert happens. Increment every time a block is reverted.
  • currentReorgDepth: the number of blocks that have been reverted during the current reorg. Increment every time a block is reverted, set to 0 when storing changes for a 'forward' block
  • maxReorgDepth: the maximum number of blocks we ever changed during a reorg. Every time currentReorgDepth is incremented, check if it is now bigger than maxReorgDepth. If it is, set maxReorgDepth to currentReorgDepth.

We then run queries doing the following:

  1. Check that block $hash is not an ommer, and remember reorgCount
  2. Run the query
  3. Check again that block $hash is not an ommer; if it is within maxReorgDepth of the subgraph's current head, also check that reorgCount hasn't changed while we were running the query

It's probably not needed, but reorgCount, currentReorgDepth, and maxReorgDepth can be safely reset to 0 when processing a 'forward' block whenever we know that any query that started before the last reorg has finished. That would for example be the case when no reorg has happened for the last half hour or so (because any query running that long should have been killed along the way anyway)

@wighawag
Copy link

Check that the block $hash exists and is not an ommer, and use its number in the query

If I understand correctly thegraph transform blockHash query into blockNumber query when the blockHash in question is not part of the chain that the graph-node track

If that is the case, I don't think this is the right approach. as a client I need to query consistent data for a particular block.

and so if I use block: {hash: $hash} I need that this query gives me the data for tha particular block. if it does not have that data, it is better that it error out than gives inconsistent data to the client. this is especially important when multiple queries need to be exexuted.

a query for block: {hash: $hash} should not be transformed in a query for a different block

@lutter
Copy link
Collaborator Author

lutter commented Jun 17, 2021

Check that the block $hash exists and is not an ommer, and use its number in the query

If I understand correctly thegraph transform blockHash query into blockNumber query when the blockHash in question is not part of the chain that the graph-node track

That's not quite what can go wrong here; the situation described in this ticket can only arise if we processed a reorg for a certain block; with that, we would translate both the hash for the reorged block and the one for the block that replaced that block (at the same height) into the same block number, and would therefore produce the same query result.

If that is the case, I don't think this is the right approach. as a client I need to query consistent data for a particular block.

With the exception of what I described above, that's what happens, and using a hash that is not the hash of a valid block will cause an error. And we do plan on addressing that corner-case at some point, it's just not entirely straightforward without affecting query performance.

@wighawag
Copy link

Thanks for the reply @lutter

You got me confused with With the exception of what I described above

the main thing important to me is whether a query for a particular blockHash produce the same output in all circumstances (error are fine) ?

Is it the case or not ? in my opinion there should be zero exception for it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants