Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving deletes performance by limiting number of records to scan #12007

Merged
merged 3 commits into from
Feb 14, 2024

Conversation

mateusz-sekara
Copy link
Collaborator

@mateusz-sekara mateusz-sekara commented Feb 13, 2024

Motivation

We've noticed that Postgres has performance issues when running delete queries during reorgs. Although, the reorgs we noticed were very shallow (e.g. up to 10 blocks), Postgres wasn't able to come up with an efficient execution plan. It seems that the large number of logs in the table (~ 12kk in entire evm.logs), caused problems with estimating how many records will be fetched during that query.
Based on different experiments, applying an upper filter helps Postgres a lot. Besides that, we've also tested the brin index, but it didn't help in that case.

State of the environment during tests

=> select count(*) from evm.logs where evm_chain_id = 11155111;
 4377938
=> select count(*) from evm.log_poller_blocks where evm_chain_id = 11155111;
 100160

Deleting 100 logs from evm.logs

Lack of an upper limit impacts Postgres' query analyzer in a way in which it can't properly estimate the dataset size. Postgres decides to sequentially scan the entire index for specific evm_chain_id to gather matching logs for deletion. (upper bound limit for tests is just the latest (max) block_number from the db)

Before

=> begin;
explain analyze delete from evm.logs where evm_chain_id = 11155111 and block_number >= 5269551;
rollback;
BEGIN
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Delete on logs  (cost=0.00..855138.32 rows=2721241 width=6) (actual time=2645.049..2645.050 rows=0 loops=1)
   ->  Seq Scan on logs  (cost=0.00..855138.32 rows=2721241 width=6) (actual time=2638.169..2644.723 rows=87 loops=1)
         Filter: ((block_number >= 5269551) AND (evm_chain_id = '11155111'::numeric))
         Rows Removed by Filter: 12961268
 Planning Time: 0.114 ms
 Execution Time: 2645.075 ms
(6 rows)

After

=> begin;
explain analyze delete from evm.logs where evm_chain_id = 11155111 and block_number >= 5269551 and block_number <= 5269651;
rollback;
BEGIN
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Delete on logs  (cost=0.56..153.09 rows=41 width=6) (actual time=0.429..0.429 rows=0 loops=1)
   ->  Index Scan using evm_logs_idx on logs  (cost=0.56..153.09 rows=41 width=6) (actual time=0.049..0.109 rows=87 loops=1)
         Index Cond: ((evm_chain_id = '11155111'::numeric) AND (block_number >= 5269551) AND (block_number <= 5269651))
 Planning Time: 0.119 ms
 Execution Time: 0.450 ms
(5 rows)

After (nested query with upper bound)

begin;
explain analyze delete from evm.logs where evm_chain_id = 11155111 and block_number >= 5269551 and block_number <= ((SELECT MAX(block_number) FROM evm.logs WHERE evm_chain_id = 11155111));
rollback;
BEGIN
                                                                                QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Delete on logs  (cost=1295.57..75840.88 rows=21790 width=6) (actual time=0.482..0.483 rows=0 loops=1)
   InitPlan 2 (returns $1)
     ->  Result  (cost=1.18..1.19 rows=1 width=8) (actual time=0.056..0.056 rows=1 loops=1)
           InitPlan 1 (returns $0)
             ->  Limit  (cost=0.56..1.18 rows=1 width=8) (actual time=0.054..0.054 rows=1 loops=1)
                   ->  Index Only Scan Backward using evm_logs_idx on logs logs_1  (cost=0.56..2681558.66 rows=4358040 width=8) (actual time=0.053..0.053 rows=1 loops=1)
                         Index Cond: ((evm_chain_id = '11155111'::numeric) AND (block_number IS NOT NULL))
                         Heap Fetches: 1
   ->  Bitmap Heap Scan on logs  (cost=1294.38..75839.69 rows=21790 width=6) (actual time=0.125..0.158 rows=87 loops=1)
         Recheck Cond: ((evm_chain_id = '11155111'::numeric) AND (block_number >= 5269551) AND (block_number <= $1))
         Heap Blocks: exact=9
         ->  Bitmap Index Scan on evm_logs_idx  (cost=0.00..1288.93 rows=21790 width=0) (actual time=0.114..0.114 rows=87 loops=1)
               Index Cond: ((evm_chain_id = '11155111'::numeric) AND (block_number >= 5269551) AND (block_number <= $1))
 Planning Time: 0.311 ms
 Execution Time: 0.525 ms
(15 rows)

Deleting 100 blocks from evm.log_poller_blocks

There is no big difference for evm.log_poller_block, probably because of the smaller dataset compared to logs. However, based on the query plans, you might spot that with an upper limit, it's easier for Postgres to pick the proper index, and query plan is more efficient. Also, the estimated cost during planning is lower.

Before

=> begin;

explain analyze delete from evm.log_poller_blocks where evm_chain_id = 11155111 and block_number >= 5269551;

rollback;
BEGIN
                                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Delete on log_poller_blocks  (cost=2417.10..18855.81 rows=94114 width=6) (actual time=0.436..0.436 rows=0 loops=1)
   ->  Bitmap Heap Scan on log_poller_blocks  (cost=2417.10..18855.81 rows=94114 width=6) (actual time=0.068..0.093 rows=101 loops=1)
         Recheck Cond: ((evm_chain_id = '11155111'::numeric) AND (block_number >= 5269551))
         Heap Blocks: exact=2
         ->  Bitmap Index Scan on idx_evm_log_poller_blocks_order_by_block  (cost=0.00..2393.57 rows=94114 width=0) (actual time=0.047..0.048 rows=101 loops=1)
               Index Cond: ((evm_chain_id = '11155111'::numeric) AND (block_number >= 5269551))
 Planning Time: 0.091 ms
 Execution Time: 0.463 ms
(8 rows)

After

=> begin;

explain analyze delete from evm.log_poller_blocks where evm_chain_id = 11155111 and block_number >= 5269551 and block_number <= 5269651;

rollback;
BEGIN
                                                                             QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Delete on log_poller_blocks  (cost=0.43..8.45 rows=1 width=6) (actual time=0.469..0.469 rows=0 loops=1)
   ->  Index Scan using idx_evm_log_poller_blocks_order_by_block on log_poller_blocks  (cost=0.43..8.45 rows=1 width=6) (actual time=0.061..0.095 rows=101 loops=1)
         Index Cond: ((evm_chain_id = '11155111'::numeric) AND (block_number >= 5269551) AND (block_number <= 5269651))
 Planning Time: 0.450 ms
 Execution Time: 0.499 ms
(5 rows)

After (nested query)

begin;
explain analyze delete from evm.log_poller_blocks where evm_chain_id = 11155111 and block_number >= 5269551 and block_number <= (select max(block_number) from evm.log_poller_blocks where evm_chain_id = 11155111);
rollback;

                                                                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Delete on log_poller_blocks  (cost=16.01..1764.06 rows=512 width=6) (actual time=0.507..0.508 rows=0 loops=1)
   InitPlan 2 (returns $1)
     ->  Result  (cost=1.04..1.05 rows=1 width=8) (actual time=0.065..0.066 rows=1 loops=1)
           InitPlan 1 (returns $0)
             ->  Limit  (cost=0.43..1.04 rows=1 width=8) (actual time=0.063..0.064 rows=1 loops=1)
                   ->  Index Only Scan using idx_evm_log_poller_blocks_order_by_block on log_poller_blocks log_poller_blocks_1  (cost=0.43..63122.37 rows=102319 width=8) (actual time=0.062..0.062 rows=1 loops=1)
                         Index Cond: ((evm_chain_id = '11155111'::numeric) AND (block_number IS NOT NULL))
                         Heap Fetches: 1
   ->  Bitmap Heap Scan on log_poller_blocks  (cost=14.96..1763.01 rows=512 width=6) (actual time=0.095..0.121 rows=101 loops=1)
         Recheck Cond: ((evm_chain_id = '11155111'::numeric) AND (block_number >= 5269551) AND (block_number <= $1))
         Heap Blocks: exact=2
         ->  Bitmap Index Scan on idx_evm_log_poller_blocks_order_by_block  (cost=0.00..14.83 rows=512 width=0) (actual time=0.087..0.087 rows=101 loops=1)
               Index Cond: ((evm_chain_id = '11155111'::numeric) AND (block_number >= 5269551) AND (block_number <= $1))
 Planning Time: 0.159 ms
 Execution Time: 0.546 ms
(15 rows)

Copy link
Contributor

I see that you haven't updated any CHANGELOG files. Would it make sense to do so?

if err != nil {
o.lggr.Warnw("Unable to clear reorged blocks, retrying", "err", err)
return err
}

_, err = tx.NamedExec(`DELETE FROM evm.logs WHERE block_number >= :start_block AND evm_chain_id = :evm_chain_id`, args)
_, err = tx.NamedExec(`DELETE FROM evm.logs
Copy link
Collaborator Author

@mateusz-sekara mateusz-sekara Feb 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative approach would be to get latest block from logs in nested query (without explicit call to o.SelectLatestBlock(pg.WithQueryer(tx)))

DELETE FROM evm.logs 
   WHERE evm_chain_id = :evm_chain_id 
   AND block_number >= :start_block
   AND block_number <= (SELECT MAX(block_number) FROM evm.logs WHERE evm_chain_id = :evm_chain_id)

Seems to have similar performance, the only difference is that it's more defensive because it gets latest block from logs, instead of log_poller_block. This will handle all the cases when the block is not persisted for some reason (not sure how real this scenario is)
Previous implementation takes a block from evm.log_poller_blocks

=> begin;

explain analyze DELETE FROM evm.logs
   WHERE evm_chain_id = 11155111
   AND block_number >= 5269551
   AND block_number <= (SELECT MAX(block_number) FROM evm.logs WHERE evm_chain_id = 11155111);

rollback;
BEGIN
                                                                                QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Delete on logs  (cost=1295.57..75840.88 rows=21790 width=6) (actual time=0.425..0.426 rows=0 loops=1)
   InitPlan 2 (returns $1)
     ->  Result  (cost=1.18..1.19 rows=1 width=8) (actual time=0.031..0.032 rows=1 loops=1)
           InitPlan 1 (returns $0)
             ->  Limit  (cost=0.56..1.18 rows=1 width=8) (actual time=0.030..0.030 rows=1 loops=1)
                   ->  Index Only Scan Backward using evm_logs_idx on logs logs_1  (cost=0.56..2681558.66 rows=4358040 width=8) (actual time=0.029..0.029 rows=1 loops=1)
                         Index Cond: ((evm_chain_id = '11155111'::numeric) AND (block_number IS NOT NULL))
                         Heap Fetches: 1
   ->  Bitmap Heap Scan on logs  (cost=1294.38..75839.69 rows=21790 width=6) (actual time=0.072..0.106 rows=87 loops=1)
         Recheck Cond: ((evm_chain_id = '11155111'::numeric) AND (block_number >= 5269551) AND (block_number <= $1))
         Heap Blocks: exact=9
         ->  Bitmap Index Scan on evm_logs_idx  (cost=0.00..1288.93 rows=21790 width=0) (actual time=0.056..0.057 rows=87 loops=1)
               Index Cond: ((evm_chain_id = '11155111'::numeric) AND (block_number >= 5269551) AND (block_number <= $1))
 Planning Time: 0.242 ms
 Execution Time: 0.460 ms
(15 rows)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think this way is safer... and maybe even more efficient since it's only 1 network request instead of 2.

@@ -1435,7 +1435,7 @@ func TestInsertLogsInTx(t *testing.T) {
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
// clean all logs and blocks between test cases
defer func() { _ = o.DeleteLogsAndBlocksAfter(0) }()
defer func() { _, _ = db.NamedExec("truncate evm.logs", struct{}{}) }()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

db.Exec("truncate evm.logs") would be simpler

@reductionista
Copy link
Contributor

I'm really surprised that adding the bounds made a difference in the plan, as the planner is supposed to consult the pg_stats table which includes a histogram of values in each table. In principle, it should know what the range of values is in the table... I can't argue with the EXPLAIN ANALYZE though, it clearly makes a difference somehow.

@mateusz-sekara mateusz-sekara added this pull request to the merge queue Feb 14, 2024
Merged via the queue into develop with commit a3bd1f7 Feb 14, 2024
93 checks passed
@mateusz-sekara mateusz-sekara deleted the lp-delete-query-fix branch February 14, 2024 08:56
mateusz-sekara added a commit to smartcontractkit/ccip that referenced this pull request Feb 14, 2024
momentmaker added a commit that referenced this pull request Feb 15, 2024
* develop: (74 commits)
  VRF zero confirmation delay  (#11947)
  add toml configs to paths that can cause e2e tests to run in ci (#12001)
  bump golang.org/x/... (#12042)
  [chore] Replace clock with specialized lib (#12031)
  Update style guide (#12041)
  plugins/cmd/chainlink-mercury: (re)move to chainlink-data-streams repo (#11994)
  bump go-plugin (#12033)
  Adds timeout on fuzz script execution (#12024)
  Add bytes type to abi_type (#12029)
  AUTO-8804: create chain specific modules for l1 gas calculations (#11896)
  VRF-878 Gas Optimization V2 Plus (#11982)
  Improving deletes performance by limiting number of records to scan (#12007)
  core/web: improve health CLI readabilty (#12021)
  Handle a 0 exit code from the remote runner instead of always failing (#12015)
  Add a simple Codec test (#12006)
  Allow for custom config poller onchain codec (LLO support) (#11957)
  Update Sonar properties (#11986)
  golangci-lint: revive: add early-return; fix issues (#12017)
  Implement NewPluginProvider (EVM) (#11995)
  Fix lock file version and minor NPM bumps (#11980)
  ...
asoliman92 pushed a commit that referenced this pull request Jul 31, 2024
…12007)

* Improving deletes performance by limiting number of records to scan

* Post review fixes - replacing direct latestBlock call with nested query
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants