-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding evm_chain_id, address and event_sig to data and topic indexes #12786
Conversation
I see you updated files related to |
3def1d5
to
271f7a2
Compare
I see you added a changeset file but it does not contain a tag. Please edit the text include at least one of the following tags:
|
424461b
to
9f60f80
Compare
on evm.logs (evm_chain_id, address, event_sig, (topics[3])); | ||
|
||
create index evm_logs_idx_topic_four | ||
on evm.logs (evm_chain_id, address, event_sig, (topics[4])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happened to data words one, two, and four.?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed them because data word queries are used only by CCIP and we rely only on 3 and 5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you feel it's too risky we can consider bringing them back. I just wanted to reduce number of redundant indexes from that table
Good idea! Makes sense. I don't think we should remove the indices for data words one, two, and four though. Even in CCIP, I see a place where it presently uses data word four: But we want LogPoller to be a general core service, not something where we have to add a new index whenever a product needs it. I guess only indexing the first five is arbitrary, but at least it's easier to document than "we only support indexing on words three and five". We're looking at ways to make this more dynamic in ChainReader, so that different products can specify which indices should be accelerated in the job spec. But we probably won't bother making that change for evm unless we have to... it will mostly just apply to non-evm chains where we expect indexing to be more complex. |
Totally agree. I missed those other indexes because they are probably not heavily used and I relied on db stats when working on that PR. Let's use 1-5 as you suggested. In the long term, it would be great to define indexes on a product level, as you mentioned. This way every product will have a set of indexes that match its queries |
core/store/migrate/migrations/0232_log_poller_word_topic_indexes.sql
Outdated
Show resolved
Hide resolved
core/store/migrate/migrations/0232_log_poller_word_topic_indexes.sql
Outdated
Show resolved
Hide resolved
Quality Gate failedFailed conditions See analysis details on SonarQube Catch issues before they fail your Quality Gate with our IDE extension SonarLint |
* Adding evm_chain_id, address and event_sig to data and topic indexes smartcontractkit/chainlink#12786 * Improved fetching Commit Reports from database #726
Context
CCIP uses data_word_five for storing sequence numbers which are integers, unique per every OnRamp for fetching CCIPSendRequested (there is a two OnRamps per CCIP lane)
We rely on that heavily, because messages are fetched every OCR2 round during Execution Phase with the following query:
We have indexes for topics and data using a single column only, examples
Postgres uses the
evm_logs_idx_topic_five
index to filter only bymin
andmax
and then goes through the entire result set to filter out not matching records. Unfortunately, indexes for topics and data words are not efficient, when there are a lot of duplicated values for data_word or topic withinevm.logs
table. It works well for a small number of chains and lanes. However, as lanes and chains grow in time, LogPoller’s performance is vastly degraded, because it matches too many records within that index.CCIP example: we use that index for fetching matching onramp messages for the Commit Root that is being executed. CommitRoot has at most 256 leaves (containing up to 256 messages).
2 chains and 2 lanes return 512 tuples from the index. Postgres fetches that dataset from the index and has to scan them and filter by additional filters present in the query
(address, eventSig, evm_chain_id)
to finally return 256 rows. Whereas it’s not a problem for a small number of lanes it degrades over time when more lanes and chains are added40 chains, 400 lanes, Commit Root fully packed (256 messages) returns 200k records that have to be scanned sequentially only to return 256 records. We did test that on 30 chains / 250 lanes, around 15 messages per Commit Root and it caused major index degradation over time. Including evm_chain_id, addr, event_sig to index makes it almost O(1). Please see the execution plans below to better understand that problem.
This degradation is visible in growing number of tuples / selects. That being said, the more logs emitted, the heavier query becomes
Solution
Use compound indexes for topics and data_words. All LogPoller's queries are filtered by (evm_chain_id, addr, event_sig), so including them in the index helps it identify exact words properly. Without that change, Postgres tries to use a single-column index and fetches an extremely large number of tuples that need to be scanned sequentially.
I've also noticed that data_word indexes are used only by CCIP so I've removed ones that are no longer required for querying logs.
Previous chart but when running with new indexes (mind the zeros!)
Current state - lack of word_five index ~ 261ms
Execution plan
Adding index on a single column ~ 96ms
Execution plan
Compound index ~ 1ms
Execution Plan