Improving deletes performance by limiting number of records to scan #12007
Conversation
I see that you haven't updated any CHANGELOG files. Would it make sense to do so?
Force-pushed from f60f1a0 to 00b6016
 	if err != nil {
 		o.lggr.Warnw("Unable to clear reorged blocks, retrying", "err", err)
 		return err
 	}
 
-	_, err = tx.NamedExec(`DELETE FROM evm.logs WHERE block_number >= :start_block AND evm_chain_id = :evm_chain_id`, args)
+	_, err = tx.NamedExec(`DELETE FROM evm.logs
An alternative approach would be to get the latest block from logs in a nested query (without an explicit call to o.SelectLatestBlock(pg.WithQueryer(tx))):
DELETE FROM evm.logs
WHERE evm_chain_id = :evm_chain_id
AND block_number >= :start_block
AND block_number <= (SELECT MAX(block_number) FROM evm.logs WHERE evm_chain_id = :evm_chain_id)
Seems to have similar performance; the only difference is that it's more defensive, because it gets the latest block from evm.logs instead of evm.log_poller_blocks. This will handle all the cases where the block is not persisted for some reason (not sure how real this scenario is). The previous implementation takes the block from evm.log_poller_blocks.
=> begin;
explain analyze DELETE FROM evm.logs
WHERE evm_chain_id = 11155111
AND block_number >= 5269551
AND block_number <= (SELECT MAX(block_number) FROM evm.logs WHERE evm_chain_id = 11155111);
rollback;
BEGIN
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Delete on logs (cost=1295.57..75840.88 rows=21790 width=6) (actual time=0.425..0.426 rows=0 loops=1)
InitPlan 2 (returns $1)
-> Result (cost=1.18..1.19 rows=1 width=8) (actual time=0.031..0.032 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.56..1.18 rows=1 width=8) (actual time=0.030..0.030 rows=1 loops=1)
-> Index Only Scan Backward using evm_logs_idx on logs logs_1 (cost=0.56..2681558.66 rows=4358040 width=8) (actual time=0.029..0.029 rows=1 loops=1)
Index Cond: ((evm_chain_id = '11155111'::numeric) AND (block_number IS NOT NULL))
Heap Fetches: 1
-> Bitmap Heap Scan on logs (cost=1294.38..75839.69 rows=21790 width=6) (actual time=0.072..0.106 rows=87 loops=1)
Recheck Cond: ((evm_chain_id = '11155111'::numeric) AND (block_number >= 5269551) AND (block_number <= $1))
Heap Blocks: exact=9
-> Bitmap Index Scan on evm_logs_idx (cost=0.00..1288.93 rows=21790 width=0) (actual time=0.056..0.057 rows=87 loops=1)
Index Cond: ((evm_chain_id = '11155111'::numeric) AND (block_number >= 5269551) AND (block_number <= $1))
Planning Time: 0.242 ms
Execution Time: 0.460 ms
(15 rows)
Yeah, I think this way is safer... and maybe even more efficient since it's only 1 network request instead of 2.
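For illustration, here's a minimal sketch of what the agreed-upon query could look like inside the ORM method, assuming a sqlx transaction and the named parameters from the diff above (the package, function name, and chainID type are illustrative, not the exact PR code):

package ormsketch

import (
	"github.com/jmoiron/sqlx"
)

// deleteLogsFromBlock removes logs for one chain starting at startBlock.
// The nested SELECT MAX(block_number) gives the planner an explicit upper
// bound, so it can restrict the index scan to the reorged range instead of
// estimating over the whole table.
func deleteLogsFromBlock(tx *sqlx.Tx, chainID int64, startBlock int64) error {
	args := map[string]interface{}{
		"evm_chain_id": chainID,
		"start_block":  startBlock,
	}
	_, err := tx.NamedExec(`
		DELETE FROM evm.logs
		WHERE evm_chain_id = :evm_chain_id
		  AND block_number >= :start_block
		  AND block_number <= (SELECT MAX(block_number)
		                       FROM evm.logs
		                       WHERE evm_chain_id = :evm_chain_id)`, args)
	return err
}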
Force-pushed from 00b6016 to a9ed8d6
Force-pushed from a9ed8d6 to 43dd3c1
@@ -1435,7 +1435,7 @@ func TestInsertLogsInTx(t *testing.T) {
 	for _, tt := range tests {
 		t.Run(tt.name, func(t *testing.T) {
 			// clean all logs and blocks between test cases
-			defer func() { _ = o.DeleteLogsAndBlocksAfter(0) }()
+			defer func() { _, _ = db.NamedExec("truncate evm.logs", struct{}{}) }()
db.Exec("truncate evm.logs") would be simpler.
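A sketch of that simpler cleanup, under the same assumptions as the test above (db is the test's *sqlx.DB, and errors are intentionally ignored in the deferred cleanup):

	defer func() { _, _ = db.Exec("truncate evm.logs") }()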
I'm really surprised that adding the bounds made a difference in the plan, as the planner is supposed to consult the […]
Force-pushed from 35dd292 to e4d83ac
Commit message (#12007):
* Improving deletes performance by limiting number of records to scan
* Post review fixes - replacing direct latestBlock call with nested query
Motivation
We've noticed that Postgres has performance issues when running delete queries during reorgs. Although the reorgs we observed were very shallow (e.g. up to 10 blocks), Postgres wasn't able to come up with an efficient execution plan. It seems that the large number of logs in the table (~12kk, i.e. roughly 12 million rows, in the entire evm.logs) caused problems with estimating how many records would be fetched by that query. Based on different experiments, applying an upper filter helps Postgres a lot. Besides that, we've also tested a BRIN index, but it didn't help in that case.
State of the environment during tests
Deleting 100 logs from evm.logs
The lack of an upper limit impacts Postgres' query analyzer in a way in which it can't properly estimate the dataset size: Postgres decides to sequentially scan the entire index for the specific evm_chain_id to gather matching logs for deletion. (The upper bound limit used for the tests is just the latest (max) block_number from the db.)
Before
After
After (nested query with upper bound)
Deleting 100 blocks from evm.log_poller_blocks
There is no big difference for evm.log_poller_blocks, probably because of the smaller dataset compared to logs. However, based on the query plans, you might spot that with an upper limit it's easier for Postgres to pick the proper index, and the query plan is more efficient. Also, the estimated cost during planning is lower.
Before
After
After (nested query)
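For completeness, a minimal sketch (not the exact PR code) of what the analogous bounded delete for evm.log_poller_blocks could look like, reusing the same tx and args as in the evm.logs sketch above and assuming the same column names shown in the query plans:

	// Same pattern as for evm.logs: the nested MAX(block_number) subquery
	// bounds the range of rows the planner has to consider.
	_, err = tx.NamedExec(`
		DELETE FROM evm.log_poller_blocks
		WHERE evm_chain_id = :evm_chain_id
		  AND block_number >= :start_block
		  AND block_number <= (SELECT MAX(block_number)
		                       FROM evm.log_poller_blocks
		                       WHERE evm_chain_id = :evm_chain_id)`, args)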