
Adding evm_chain_id, address and event_sig to data and topic indexes #12786

Merged
merged 9 commits into develop on Apr 16, 2024

Conversation

mateusz-sekara (Collaborator) commented Apr 11, 2024

Context

CCIP uses data_word_five for storing sequence numbers, which are integers unique per OnRamp, to fetch CCIPSendRequested events (there are two OnRamps per CCIP lane):

event CCIPSendRequested(Internal.EVM2EVMMessage message);
struct EVM2EVMMessage {
    uint64 sourceChainSelector; 
    address sender;
    address receiver; 
    uint64 sequenceNumber;
    ...
}
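This is why the sequence number lands in the fifth data word: assuming standard ABI encoding, the event's single dynamic-struct argument is encoded as a 32-byte offset word followed by the struct fields, so sequenceNumber occupies the fifth 32-byte word of data (1-based byte offset 32*4+1 = 129, matching the execution plans below). A sketch of extracting it (:event_sig is a bind parameter):

```sql
-- Sketch: pull the fifth 32-byte word out of the ABI-encoded event data.
-- SQL substring is 1-based, so word N starts at byte 32*(N-1)+1.
SELECT encode(substring(data from 32*4+1 for 32), 'hex') AS sequence_number_hex
FROM evm.logs
WHERE evm_chain_id = 1337
  AND event_sig = :event_sig;
```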

We rely on that heavily, because messages are fetched in every OCR2 round during the Execution Phase with the following query:

SELECT * FROM evm.logs
WHERE evm_chain_id = 1337
  AND address = :address
  AND event_sig = :event_sig
  AND substring(data from 32*4+1 for 32) >= :min
  AND substring(data from 32*4+1 for 32) <= :max
  AND block_number <= %s
ORDER BY (block_number, log_index)

We have indexes for topics and data words using a single column only, for example:

create index evm_logs_idx_data_word_three
    on evm.logs ("substring"(data, 65, 32));
create index evm_logs_idx_topic_two
    on evm.logs ((topics[2]));

Postgres uses the evm_logs_idx_data_word_five index to filter only by min and max, and then scans the entire result set to discard non-matching records. Unfortunately, indexes on topics and data words alone are not efficient when there are many duplicated values for a data word or topic within the evm.logs table. This works well for a small number of chains and lanes. However, as lanes and chains grow over time, LogPoller's performance degrades badly, because far too many records match within that index.

CCIP example: we use that index to fetch the OnRamp messages matching the Commit Root being executed. A Commit Root has at most 256 leaves (containing up to 256 messages).

With 2 chains and 2 lanes, the index returns 512 tuples. Postgres fetches that dataset from the index, scans it, and applies the remaining filters from the query (address, event_sig, evm_chain_id) to finally return 256 rows. While that's not a problem for a small number of lanes, it degrades over time as more lanes and chains are added.

With 40 chains and 400 lanes, a fully packed Commit Root (256 messages) returns 200k records that have to be scanned sequentially only to return 256 rows. We tested this with 30 chains / 250 lanes and around 15 messages per Commit Root, and it caused major index degradation over time. Including evm_chain_id, address, and event_sig in the index makes the lookup almost O(1). Please see the execution plans below to better understand the problem.

This degradation is visible in the growing number of tuples read per select: the more logs emitted, the heavier the query becomes.
[chart image]
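One way to watch this degradation directly is Postgres's standard per-index statistics view, which tracks scans and index tuples read per index; the query below is just a sketch of how such stats can be gathered for the evm.logs table:

```sql
-- Sketch: how often each evm.logs index is scanned, and how many
-- index tuples those scans read (high tuples-per-scan = poor selectivity).
SELECT indexrelname,
       idx_scan,
       idx_tup_read,
       round(idx_tup_read::numeric / nullif(idx_scan, 0), 1) AS tuples_per_scan
FROM pg_stat_user_indexes
WHERE schemaname = 'evm' AND relname = 'logs'
ORDER BY idx_tup_read DESC;
```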

Solution

Use compound indexes for topics and data words. All of LogPoller's queries filter by (evm_chain_id, address, event_sig), so including those columns in the index lets Postgres pinpoint the matching words directly. Without that change, Postgres falls back to a single-column index and fetches an extremely large number of tuples that then have to be scanned sequentially.

I've also noticed that the data_word indexes are used only by CCIP, so I've removed the ones that are no longer required for querying logs.
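A minimal sketch of the change for one data-word index (the actual migration also covers the topic indexes; names follow the existing convention):

```sql
-- Sketch: replace the single-column expression index with a compound one
-- that leads with the columns every LogPoller query filters on.
DROP INDEX IF EXISTS evm.evm_logs_idx_data_word_five;

CREATE INDEX evm_logs_idx_data_word_five
    ON evm.logs (address, event_sig, evm_chain_id, substring(data from 129 for 32));
```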

The previous chart, but when running with the new indexes (mind the zeros!):
[chart image]

Current state: no word_five index (~261 ms)

Execution plan
 Sort  (cost=2147.70..2147.71 rows=3 width=406) (actual time=261.701..261.716 rows=97 loops=1)
   Sort Key: (ROW(logs.block_number, logs.log_index))
   Sort Method: quicksort  Memory: 123kB
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.43..1.21 rows=1 width=16) (actual time=0.061..0.062 rows=1 loops=1)
           ->  Index Scan using idx_evm_log_poller_blocks_order_by_block on log_poller_blocks  (cost=0.43..77036.27 rows=98594 width=16) (actual time=0.060..0.061 rows=1 loops=1)
                 Index Cond: (evm_chain_id = '11155111'::numeric)
   ->  Index Scan using idx_evm_logs_ordered_by_block_and_created_at on logs  (cost=0.56..2146.47 rows=3 width=406) (actual time=0.101..261.614 rows=97 loops=1)
         Index Cond: ((evm_chain_id = '11155111'::numeric) AND (address = '\x81660dc846f0528a7ce003c1f7774d7c4135f344'::bytea) AND (event_sig = '\xd0c3c799bf9e2639de44391e7f524d229b2b55f5b1ea94b2bf7da42f7243dddd'::bytea) AND (block_number <= $0))
         Filter: (("substring"(data, 129, 32) >= '\x0000000000000000000000000000000000000000000000000000000000000001'::bytea) AND ("substring"(data, 129, 32) <= '\x0000000000000000000000000000000000000000000000000000000000000061'::bytea))
         Rows Removed by Filter: 228359
 Planning Time: 0.238 ms
 Execution Time: 261.781 ms
(13 rows)

Adding an index on a single column (~92 ms)

CREATE INDEX evm_logs_idx_data_word_five ON evm.logs (substring(data from 129 for 32));
Execution plan
 Sort  (cost=1190.89..1190.89 rows=3 width=406) (actual time=91.086..91.096 rows=97 loops=1)
   Sort Key: (ROW(logs.block_number, logs.log_index))
   Sort Method: quicksort  Memory: 123kB
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.43..1.21 rows=1 width=16) (actual time=0.068..0.069 rows=1 loops=1)
           ->  Index Scan using idx_evm_log_poller_blocks_order_by_block on log_poller_blocks  (cost=0.43..77036.27 rows=98594 width=16) (actual time=0.067..0.067 rows=1 loops=1)
                 Index Cond: (evm_chain_id = '11155111'::numeric)
   ->  Bitmap Heap Scan on logs  (cost=1177.58..1189.65 rows=3 width=406) (actual time=88.277..90.893 rows=97 loops=1)
         Recheck Cond: ((evm_chain_id = '11155111'::numeric) AND (address = '\x81660dc846f0528a7ce003c1f7774d7c4135f344'::bytea) AND (event_sig = '\xd0c3c799bf9e2639de44391e7f524d229b2b55f5b1ea94b2bf7da42f7243dddd'::bytea) AND (block_number <= $0) AND ("substring"(data, 129, 32) >= '\x0000000000000000000000000000000000000000000000000000000000000001'::bytea) AND ("substring"(data, 129, 32) <= '\x0000000000000000000000000000000000000000000000000000000000000061'::bytea))
         Rows Removed by Index Recheck: 6013
         Heap Blocks: exact=10 lossy=456
         ->  BitmapAnd  (cost=1177.58..1177.58 rows=3 width=0) (actual time=87.654..87.655 rows=0 loops=1)
               ->  Bitmap Index Scan on idx_evm_logs_ordered_by_block_and_created_at  (cost=0.00..53.23 rows=578 width=0) (actual time=82.674..82.674 rows=228456 loops=1)
                     Index Cond: ((evm_chain_id = '11155111'::numeric) AND (address = '\x81660dc846f0528a7ce003c1f7774d7c4135f344'::bytea) AND (event_sig = '\xd0c3c799bf9e2639de44391e7f524d229b2b55f5b1ea94b2bf7da42f7243dddd'::bytea) AND (block_number <= $0))
               ->  Bitmap Index Scan on evm_logs_idx_data_word_five  (cost=0.00..1124.10 rows=73554 width=0) (actual time=0.463..0.463 rows=5566 loops=1)
                     Index Cond: (("substring"(data, 129, 32) >= '\x0000000000000000000000000000000000000000000000000000000000000001'::bytea) AND ("substring"(data, 129, 32) <= '\x0000000000000000000000000000000000000000000000000000000000000061'::bytea))
 Planning Time: 0.414 ms
 Execution Time: 92.380 ms
(18 rows)

Compound index (~1 ms)

create index evm_logs_idx_data_word_five
    on evm.logs (address, event_sig, evm_chain_id, "substring"(data, 129, 32));
Execution Plan
 Sort  (cost=39.24..39.25 rows=3 width=406) (actual time=0.351..0.357 rows=97 loops=1)
   Sort Key: (ROW(logs.block_number, logs.log_index))
   Sort Method: quicksort  Memory: 123kB
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.43..1.21 rows=1 width=16) (actual time=0.057..0.058 rows=1 loops=1)
           ->  Index Scan using idx_evm_log_poller_blocks_order_by_block on log_poller_blocks  (cost=0.43..77036.27 rows=98594 width=16) (actual time=0.056..0.056 rows=1 loops=1)
                 Index Cond: (evm_chain_id = '11155111'::numeric)
   ->  Index Scan using evm_logs_idx_data_word_five on logs  (cost=0.56..38.01 rows=3 width=406) (actual time=0.111..0.264 rows=97 loops=1)
         Index Cond: ((evm_chain_id = '11155111'::numeric) AND (address = '\x81660dc846f0528a7ce003c1f7774d7c4135f344'::bytea) AND (event_sig = '\xd0c3c799bf9e2639de44391e7f524d229b2b55f5b1ea94b2bf7da42f7243dddd'::bytea) AND ("substring"(data, 129, 32) >= '\x0000000000000000000000000000000000000000000000000000000000000001'::bytea) AND ("substring"(data, 129, 32) <= '\x0000000000000000000000000000000000000000000000000000000000000061'::bytea))
         Filter: (block_number <= $0)
 Planning Time: 0.423 ms
 Execution Time: 0.410 ms

Contributor

I see you updated files related to core. Please run pnpm changeset in the root directory to add a changeset.

Contributor

I see you added a changeset file but it does not contain a tag. Please edit the text to include at least one of the following tags:

#nops : For any feature that is NOP facing and needs to be in the official Release Notes for the release.
#added : For any new functionality added.
#changed : For any change to the existing functionality. 
#removed : For any functionality/config that is removed.
#updated : For any functionality that is updated.
#deprecation_notice : For any upcoming deprecation functionality.
#breaking_change : For any functionality that requires manual action for the node to boot.
#db_update : For any feature that introduces updates to database schema.
#wip : For any change that is not ready yet and external communication about it should be held off till it is feature complete.

create index evm_logs_idx_topic_three
on evm.logs (evm_chain_id, address, event_sig, (topics[3]));

create index evm_logs_idx_topic_four
on evm.logs (evm_chain_id, address, event_sig, (topics[4]));
Contributor

What happened to data words one, two, and four?

Collaborator (Author)

I've removed them because data word queries are used only by CCIP, and we rely only on words three and five.

Collaborator (Author)

If you feel it's too risky, we can consider bringing them back. I just wanted to reduce the number of redundant indexes on that table.

@reductionista (Contributor)

Good idea! Makes sense.

I don't think we should remove the indices for data words one, two, and four though. Even in CCIP, I see a place where it presently uses data word four:
https://github.com/smartcontractkit/ccip/blob/ccip-develop/core/services/ocr2/plugins/ccip/internal/ccipdata/v1_2_0/commit_store.go#L286-L293

But we want LogPoller to be a general core service, not something where we have to add a new index whenever a product needs it. I guess only indexing the first five is arbitrary, but at least it's easier to document than "we only support indexing on words three and five".

We're looking at ways to make this more dynamic in ChainReader, so that different products can specify which indices should be accelerated in the job spec. But we probably won't bother making that change for evm unless we have to... it will mostly just apply to non-evm chains where we expect indexing to be more complex.

@mateusz-sekara (Collaborator, Author) commented Apr 12, 2024

Totally agree. I missed those other indexes because they're probably not heavily used, and I relied on db stats when working on this PR. Let's use words 1-5 as you suggested. In the long term, it would be great to define indexes at the product level, as you mentioned; that way every product would have a set of indexes matching its queries.
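Under that agreement, the compound data-word indexes would cover words one through five, along the lines of the sketch below (names and exact form assumed from the patterns elsewhere in this PR):

```sql
-- Sketch: compound indexes for the first five data words.
-- Word N (1-based) starts at byte offset 32*(N-1)+1 of the ABI-encoded data.
CREATE INDEX evm_logs_idx_data_word_one
    ON evm.logs (address, event_sig, evm_chain_id, substring(data from 1 for 32));
CREATE INDEX evm_logs_idx_data_word_two
    ON evm.logs (address, event_sig, evm_chain_id, substring(data from 33 for 32));
CREATE INDEX evm_logs_idx_data_word_three
    ON evm.logs (address, event_sig, evm_chain_id, substring(data from 65 for 32));
CREATE INDEX evm_logs_idx_data_word_four
    ON evm.logs (address, event_sig, evm_chain_id, substring(data from 97 for 32));
CREATE INDEX evm_logs_idx_data_word_five
    ON evm.logs (address, event_sig, evm_chain_id, substring(data from 129 for 32));
```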

@cl-sonarqube-production

Quality Gate failed

Failed conditions
8.89% Technical Debt Ratio on New Code (required ≤ 4%)
B Maintainability Rating on New Code (required ≥ A)

See analysis details on SonarQube


@mateusz-sekara mateusz-sekara added this pull request to the merge queue Apr 16, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 16, 2024
@mateusz-sekara mateusz-sekara added this pull request to the merge queue Apr 16, 2024
Merged via the queue into develop with commit fbb705c Apr 16, 2024
104 of 105 checks passed
@mateusz-sekara mateusz-sekara deleted the lp-smarter-indexes branch April 16, 2024 15:02
RensR added a commit to smartcontractkit/ccip that referenced this pull request Apr 23, 2024
* Adding evm_chain_id, address and event_sig to data and topic indexes
smartcontractkit/chainlink#12786
* Improved fetching Commit Reports from database
#726