
Optimize historical event scanning #1130

Closed
Jannis opened this issue Aug 24, 2019 · 4 comments · Fixed by #1260
Jannis (Contributor) commented Aug 24, 2019

Do you want to request a feature or report a bug?

Performance bug.

What is the current behavior?

Scanning for common events like Transfer is currently slow. Scanning for events from many data sources (e.g. created by templates) at once is also slow. Both used in combination... you can guess for yourself.

The reason for this is that all event signatures that we are scanning for are hashed together into the eth_getLogs filter. The more event signatures are hashed together, the less selective the filter is. Also, as soon as more than one contract / data source is involved, the filter we're building is no longer tied to a single contract address and is therefore much less efficient.
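
For illustration, a single broad request of the kind described above looks roughly like this. The addresses and block range are placeholders and the snippet only prints the JSON-RPC payload; it is not graph-node's actual filter-building code:

```rust
// Illustrative only: the JSON-RPC payload of one broad eth_getLogs request that
// covers many addresses and many event signatures at once. Addresses and block
// range are placeholders.
use serde_json::json;

fn main() {
    // keccak256 of the ERC-20 event signatures
    let transfer_topic = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef";
    let approval_topic = "0x8c5be1e5ebec7d5bd14f71427d1e84f3dd0314c0f7b2291e5b200ac8c7c3b925";

    let request = json!({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "eth_getLogs",
        "params": [{
            "fromBlock": "0x7a1200",
            "toBlock": "0x7a2000",
            // every data source's contract address in one filter
            "address": ["0xContractA...", "0xContractB...", "0xContractC..."],
            // topic0 may be any of the listed signatures (logical OR), so the
            // filter gets less selective with every additional event type
            "topics": [[transfer_topic, approval_topic]]
        }]
    });
    println!("{}", serde_json::to_string_pretty(&request).unwrap());
}
```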

What is the expected behavior?

Scanning for any event should be fast, regardless of how many data sources there are. It should not take days to handle Transfer events.

@leoyvens (Collaborator) commented:

Really we should do the bloom filter check locally by first pulling in all block headers, which we probably already do anyways.
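
As a rough sketch of what that local check involves (not graph-node code): the standard Ethereum bloom sets three bits per logged address/topic in the 2048-bit logsBloom of each header, so a block can be ruled out without fetching its logs. The helper name `bloom_may_contain` is hypothetical and the example assumes the `sha3` and `hex` crates:

```rust
// Sketch of a local bloom check against a stored block header's logsBloom
// (standard Ethereum bloom: 3 bits per item in a 2048-bit filter).
use sha3::{Digest, Keccak256};

/// Returns true if `item` (a contract address or an event topic) *may* have
/// produced a log in the block. False positives are possible, false negatives
/// are not, so a `false` lets us skip the block entirely.
fn bloom_may_contain(bloom: &[u8; 256], item: &[u8]) -> bool {
    let hash = Keccak256::digest(item);
    // Three probe bits, taken from byte pairs (0,1), (2,3) and (4,5) of the hash.
    for i in [0usize, 2, 4] {
        let bit = ((hash[i] as usize) << 8 | hash[i + 1] as usize) & 0x7ff;
        // The bloom is a 2048-bit big-endian value: bit 0 lives in the last byte.
        if bloom[255 - bit / 8] & (1u8 << (bit % 8)) == 0 {
            return false;
        }
    }
    true
}

fn main() {
    // An all-zero bloom can never contain anything.
    let empty = [0u8; 256];
    let transfer_topic =
        hex::decode("ddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef").unwrap();
    assert!(!bloom_may_contain(&empty, &transfer_topic));
}
```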

@leoyvens (Collaborator) commented:

Expanding on the previous comment, scanning logs through eth_getLogs has a few issues:

  1. Robustness. We've seen different issues arise from eth_getLogs behaviour changing under our feet, such as "Subgraph gets stuck when eth_getLogs fails because there are too many results" (#776) and "Stuck when a block in the log filter can no longer be found" (#1111). It's a thing Ethereum node services tend to tinker with, because they have more general requirements than ours, so reducing our exposure to it seems worth it.

  2. Performance. eth_getLogs is fast, but we wish it were faster in cases where the filter is broad or when scanning through the initial blocks, which are not relevant for most subgraphs. A local implementation will be at least as fast, more predictable and tunable for our use case; just skipping the roundtrip should be enough to see a nice speedup.

  3. Code simplicity. The eth_getLogs interface takes block numbers, not hashes, which results in this complexity in the code. In order to implement "Historical Block Scanning For Dynamic Data Sources" (#902) we need a function that can scan and re-scan block ranges correctly and quickly. We could probably work with the complexity of the existing situation, but doing a full bottom-up refactoring may be worthwhile.

Proposed Solution

The block ingestor would always keep a complete and consistent chain of block headers in the DB. This way, we can scan the bloom filter locally and find the relevant block pointers. Note that we don't really use the logs returned from eth_getLogs; we only need to know whether a block is potentially relevant. The ethereum_blocks table would be re-created to have the header fields in the DDL instead of in jsonb, plus a unique constraint on the block number. Having a local, consistent view of the chain should be a generally useful thing. Storing all block headers should require under 1 GB, even with indexes.
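
A minimal sketch of that local scan, assuming the header fields are already loaded from ethereum_blocks. `StoredHeader`, `BlockPtr` and `candidate_blocks` are hypothetical names, not graph-node types, and the bloom probe is the one sketched in the previous comment:

```rust
// Hypothetical sketch of the local scan: walk the stored headers and keep only
// the blocks whose bloom says one of our topics/addresses might be present.
// These candidates still need a follow-up call to fetch and confirm the logs.
struct StoredHeader {
    number: u64,
    hash: [u8; 32],
    logs_bloom: [u8; 256],
}

struct BlockPtr {
    number: u64,
    hash: [u8; 32],
}

// Bloom probe from the sketch in the earlier comment; stubbed here so the
// example stands alone.
fn bloom_may_contain(_bloom: &[u8; 256], _item: &[u8]) -> bool {
    unimplemented!()
}

fn candidate_blocks(
    headers: impl Iterator<Item = StoredHeader>,
    items: &[Vec<u8>], // event signature topics and contract addresses
) -> Vec<BlockPtr> {
    headers
        .filter(|h| items.iter().any(|item| bloom_may_contain(&h.logs_bloom, item)))
        .map(|h| BlockPtr { number: h.number, hash: h.hash })
        .collect()
}
```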

A challenge is how to get all the headers into the database. By batching eth_getBlockByNumber calls, I could get a rate of about 200 blocks per second, which is pretty good; at that rate we can fetch the entire mainnet in about 12 hours (a sketch of the batched calls follows the list below). But we don't want a fresh node to have to wait 12 hours before it is usable. A couple of solutions here:

  • Fall back to eth_getLogs while we are syncing. We'll certainly need to do this for private networks with a lot of blocks.
  • Provide a dump of block headers, at least for the mainnet and the main testnets, to jump start the sync.
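
A rough sketch of the batched backfill: one JSON-RPC batch containing many eth_getBlockByNumber calls. The endpoint, batch size and use of reqwest are placeholders, and retries and error handling are omitted:

```rust
// Assumes reqwest (with the "blocking" and "json" features) and serde_json.
use serde_json::{json, Value};

fn fetch_header_batch(
    client: &reqwest::blocking::Client,
    url: &str,
    start: u64,
    count: u64,
) -> Result<Vec<Value>, reqwest::Error> {
    // Build one batch: eth_getBlockByNumber for each block in [start, start + count).
    let batch: Vec<Value> = (start..start + count)
        .map(|n| {
            json!({
                "jsonrpc": "2.0",
                "id": n,
                "method": "eth_getBlockByNumber",
                // `false`: headers only, no full transaction bodies
                "params": [format!("0x{:x}", n), false]
            })
        })
        .collect();

    client.post(url).json(&batch).send()?.json()
}

fn main() -> Result<(), reqwest::Error> {
    let client = reqwest::blocking::Client::new();
    // Hypothetical endpoint; in practice whatever node the block ingestor talks to.
    let headers = fetch_header_batch(&client, "http://localhost:8545", 8_500_000, 100)?;
    println!("fetched {} headers", headers.len());
    Ok(())
}
```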

@leoyvens (Collaborator) commented:

Also, as soon as more than one contract / data source is involved, the filter we're building is no longer tied to a single contract address and is therefore much less efficient.

This part of the issue we can solve while still using eth_getLogs, by separating the calls for each contract.

Jannis (Contributor, Author) commented Sep 26, 2019

Just now we're hitting a (so far undocumented) limit in Alchemy, where expensive eth_getLogs filters cause requests to time out with a 503 Service Unavailable error. Their suggestion was that it could have to do with filters that include many (e.g. 400) contract addresses at once.

So splitting these calls up by contract addresses (or by smaller groups of addresses) and then merging the results afterwards would be a good next step.
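
A sketch of that splitting step: build one eth_getLogs filter per small group of addresses and merge the responses afterwards. The group size of 100 is an arbitrary example, not a documented Alchemy limit, and actually executing the calls and concatenating the returned logs is left to the caller:

```rust
use serde_json::{json, Value};

const ADDRESSES_PER_CALL: usize = 100;

// Build one eth_getLogs filter object per group of addresses, all sharing the
// same block range and topic filter.
fn log_filters_per_group(
    addresses: &[String],
    topics: &[String],
    from_block: u64,
    to_block: u64,
) -> Vec<Value> {
    addresses
        .chunks(ADDRESSES_PER_CALL)
        .map(|group| {
            json!({
                "fromBlock": format!("0x{:x}", from_block),
                "toBlock": format!("0x{:x}", to_block),
                "address": group,
                // same topic filter for every group
                "topics": [topics]
            })
        })
        .collect()
}

fn main() {
    // Hypothetical data-source addresses.
    let addresses: Vec<String> = (0..400).map(|i| format!("0xdatasource{:04}", i)).collect();
    let topics = vec![
        "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef".to_string(),
    ];
    let filters = log_filters_per_group(&addresses, &topics, 8_000_000, 8_001_000);
    // 400 addresses in groups of 100 -> 4 separate eth_getLogs calls to merge.
    println!("{} calls instead of 1", filters.len());
}
```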

Jannis assigned leoyvens and unassigned Jannis on Sep 26, 2019
leoyvens added a commit that referenced this issue Sep 26, 2019
Treat it the same as the Infura log limit, and reduce the range.

I still want to do #1130 and try separating the log calls,
but this is a quick way for affected subgraphs to make progress again,
and is desirable anyways.