
Optimize historical event scanning #1130

Closed
Jannis opened this issue Aug 24, 2019 · 4 comments · Fixed by #1260
Jannis (Contributor) commented Aug 24, 2019

Do you want to request a feature or report a bug?

Performance bug.

What is the current behavior?

Scanning for common events like Transfer is currently slow. Scanning for events from many data sources (e.g. created by templates) at once is also slow. Both used in combination... you can guess for yourself.

The reason for this is that all event signatures that we are scanning for are hashed together into the eth_getLogs filter. The more event signatures are hashed together, the less selective the filter is. Also, as soon as more than one contract / data source is involved, the filter we're building is no longer tied to a single contract address and is therefore much less efficient.
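
For illustration, a single broad request of the kind described above looks roughly like this. The addresses and block range are placeholders and the snippet only prints the JSON-RPC payload; it is not graph-node's actual filter-building code:

```rust
// Illustrative only: the JSON-RPC payload of one broad eth_getLogs request that
// covers many addresses and many event signatures at once. Addresses and block
// range are placeholders.
use serde_json::json;

fn main() {
    // keccak256 of the ERC-20 event signatures
    let transfer_topic = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef";
    let approval_topic = "0x8c5be1e5ebec7d5bd14f71427d1e84f3dd0314c0f7b2291e5b200ac8c7c3b925";

    let request = json!({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "eth_getLogs",
        "params": [{
            "fromBlock": "0x7a1200",
            "toBlock": "0x7a2000",
            // every data source's contract address in one filter
            "address": ["0xContractA...", "0xContractB...", "0xContractC..."],
            // topic0 may be any of the listed signatures (logical OR), so the
            // filter gets less selective with every additional event type
            "topics": [[transfer_topic, approval_topic]]
        }]
    });
    println!("{}", serde_json::to_string_pretty(&request).unwrap());
}
```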

What is the expected behavior?

Scanning for any event should be fast, regardless of how many data sources there are. It should not take days to handle Transfer events.

@leoyvens (Collaborator) commented:

Really we should do the bloom filter check locally by first pulling in all block headers, which we probably already do anyways.
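
As a rough sketch of what that local check involves (not graph-node code): the standard Ethereum bloom sets three bits per logged address/topic in the 2048-bit logsBloom of each header, so a block can be ruled out without fetching its logs. The helper name `bloom_may_contain` is hypothetical and the example assumes the `sha3` and `hex` crates:

```rust
// Sketch of a local bloom check against a stored block header's logsBloom
// (standard Ethereum bloom: 3 bits per item in a 2048-bit filter).
use sha3::{Digest, Keccak256};

/// Returns true if `item` (a contract address or an event topic) *may* have
/// produced a log in the block. False positives are possible, false negatives
/// are not, so a `false` lets us skip the block entirely.
fn bloom_may_contain(bloom: &[u8; 256], item: &[u8]) -> bool {
    let hash = Keccak256::digest(item);
    // Three probe bits, taken from byte pairs (0,1), (2,3) and (4,5) of the hash.
    for i in [0usize, 2, 4] {
        let bit = ((hash[i] as usize) << 8 | hash[i + 1] as usize) & 0x7ff;
        // The bloom is a 2048-bit big-endian value: bit 0 lives in the last byte.
        if bloom[255 - bit / 8] & (1u8 << (bit % 8)) == 0 {
            return false;
        }
    }
    true
}

fn main() {
    // An all-zero bloom can never contain anything.
    let empty = [0u8; 256];
    let transfer_topic =
        hex::decode("ddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef").unwrap();
    assert!(!bloom_may_contain(&empty, &transfer_topic));
}
```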

@leoyvens (Collaborator) commented:

Expanding on the previous comment, scanning logs through eth_getLogs has a few issues:

  1. Robustness. We've seen different issues arise from eth_getLogs behaviour changing under our feet, such as "Subgraph gets stuck when eth_getLogs fails because there are too many results" (#776) and "Stuck when a block in the log filter can no longer be found" (#1111). It's a thing Ethereum node services tend to tinker with, because they have more general requirements than ours, so reducing our exposure to it seems worth it.

  2. Performance. eth_getLogs is fast, but we wish it were faster in cases where the filter is broad or when scanning through the initial blocks, which are not relevant for most subgraphs. A local implementation will be at least as fast, more predictable and tunable for our use case; just skipping the roundtrip should be enough to see a nice speedup.

  3. Code simplicity. The eth_getLogs interface takes block numbers, not hashes, which results in this complexity in the code. In order to implement "Historical Block Scanning For Dynamic Data Sources" (#902) we need a function that can scan and re-scan block ranges correctly and quickly. We could probably work with the complexity of the existing situation, but doing a full bottom-up refactoring may be worthwhile.

Proposed Solution

The block ingestor would always keep a complete and consistent chain of block headers in the DB. This way, we can scan the bloom filter locally and find the relevant block pointers. Note that we don't really use the logs returned from eth_getLogs; we only need to know whether a block is potentially relevant. The ethereum_blocks table would be re-created to have the header fields in the DDL instead of in jsonb, plus a unique constraint on the block number. Having a local, consistent view of the chain should be a generally useful thing. Storing all block headers should require under 1 GB, even with indexes.
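
A minimal sketch of that local scan, assuming the header fields are already loaded from ethereum_blocks. `StoredHeader`, `BlockPtr` and `candidate_blocks` are hypothetical names, not graph-node types, and the bloom probe is the one sketched in the previous comment:

```rust
// Hypothetical sketch of the local scan: walk the stored headers and keep only
// the blocks whose bloom says one of our topics/addresses might be present.
// These candidates still need a follow-up call to fetch and confirm the logs.
struct StoredHeader {
    number: u64,
    hash: [u8; 32],
    logs_bloom: [u8; 256],
}

struct BlockPtr {
    number: u64,
    hash: [u8; 32],
}

// Bloom probe from the sketch in the earlier comment; stubbed here so the
// example stands alone.
fn bloom_may_contain(_bloom: &[u8; 256], _item: &[u8]) -> bool {
    unimplemented!()
}

fn candidate_blocks(
    headers: impl Iterator<Item = StoredHeader>,
    items: &[Vec<u8>], // event signature topics and contract addresses
) -> Vec<BlockPtr> {
    headers
        .filter(|h| items.iter().any(|item| bloom_may_contain(&h.logs_bloom, item)))
        .map(|h| BlockPtr { number: h.number, hash: h.hash })
        .collect()
}
```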

A challenge is how to get all the headers into the database. By batching eth_getBlockByNumber calls, I could get a rate of about 200 blocks per second, which is pretty good; at that rate we can fetch the entire mainnet in about 12 hours (a sketch of the batched calls follows the list below). But we don't want a fresh node to have to wait 12 hours before it is usable. A couple of solutions here:

  • Fall back to eth_getLogs while we are syncing. We'll certainly need to do this for private networks with a lot of blocks.
  • Provide a dump of block headers, at least for the mainnet and the main testnets, to jump start the sync.
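
A rough sketch of the batched backfill: one JSON-RPC batch containing many eth_getBlockByNumber calls. The endpoint, batch size and use of reqwest are placeholders, and retries and error handling are omitted:

```rust
// Assumes reqwest (with the "blocking" and "json" features) and serde_json.
use serde_json::{json, Value};

fn fetch_header_batch(
    client: &reqwest::blocking::Client,
    url: &str,
    start: u64,
    count: u64,
) -> Result<Vec<Value>, reqwest::Error> {
    // Build one batch: eth_getBlockByNumber for each block in [start, start + count).
    let batch: Vec<Value> = (start..start + count)
        .map(|n| {
            json!({
                "jsonrpc": "2.0",
                "id": n,
                "method": "eth_getBlockByNumber",
                // `false`: headers only, no full transaction bodies
                "params": [format!("0x{:x}", n), false]
            })
        })
        .collect();

    client.post(url).json(&batch).send()?.json()
}

fn main() -> Result<(), reqwest::Error> {
    let client = reqwest::blocking::Client::new();
    // Hypothetical endpoint; in practice whatever node the block ingestor talks to.
    let headers = fetch_header_batch(&client, "http://localhost:8545", 8_500_000, 100)?;
    println!("fetched {} headers", headers.len());
    Ok(())
}
```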

@leoyvens (Collaborator) commented:

Also, as soon as more than one contract / data source is involved, the filter we're building is no longer tied to a single contract address and is therefore much less efficient.

This part of the issue we can solve while still using eth_getLogs, by separating the calls for each contract.

Jannis (Contributor, Author) commented Sep 26, 2019

Just now we're hitting a (so far undocumented) limit in Alchemy, where expensive eth_getLogs filters cause requests to time out with a 503 Service Unavailable error. Their suggestion was that it could have to do with filters that include many (e.g. 400) contract addresses at once.

So splitting these calls up by contract addresses (or by smaller groups of addresses) and then merging the results afterwards would be a good next step.
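
A sketch of that splitting step: build one eth_getLogs filter per small group of addresses and merge the responses afterwards. The group size of 100 is an arbitrary example, not a documented Alchemy limit, and actually executing the calls and concatenating the returned logs is left to the caller:

```rust
use serde_json::{json, Value};

const ADDRESSES_PER_CALL: usize = 100;

// Build one eth_getLogs filter object per group of addresses, all sharing the
// same block range and topic filter.
fn log_filters_per_group(
    addresses: &[String],
    topics: &[String],
    from_block: u64,
    to_block: u64,
) -> Vec<Value> {
    addresses
        .chunks(ADDRESSES_PER_CALL)
        .map(|group| {
            json!({
                "fromBlock": format!("0x{:x}", from_block),
                "toBlock": format!("0x{:x}", to_block),
                "address": group,
                // same topic filter for every group
                "topics": [topics]
            })
        })
        .collect()
}

fn main() {
    // Hypothetical data-source addresses.
    let addresses: Vec<String> = (0..400).map(|i| format!("0xdatasource{:04}", i)).collect();
    let topics = vec![
        "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef".to_string(),
    ];
    let filters = log_filters_per_group(&addresses, &topics, 8_000_000, 8_001_000);
    // 400 addresses in groups of 100 -> 4 separate eth_getLogs calls to merge.
    println!("{} calls instead of 1", filters.len());
}
```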

Jannis assigned leoyvens and unassigned Jannis on Sep 26, 2019
leoyvens added a commit that referenced this issue Sep 26, 2019
Treat it the same as the Infura log limit, and reduce the range.

I still want to do #1130 and try separating the log calls,
but this is a quick way for affected subgraphs to make progress again,
and is desirable anyways.