
Reduce the amount of IO that LedgerCleanupService performs #29239

Merged · 1 commit merged on Jan 23, 2023

Conversation

steviez (Contributor) commented on Dec 13, 2022

Problem

Currently, the cleanup service counts the number of shreds in the database by iterating the entire SlotMeta column and reading the number of received shreds for each slot. This gives us a fairly accurate count at the expense of performing a good amount of IO.
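
For illustration, here is a rough sketch of what that per-slot counting looks like. This is simplified and hedged, not the service's exact code; it assumes the `Blockstore::slot_meta_iterator` API and the `SlotMeta::received` field from `solana-ledger`:

```rust
use solana_ledger::blockstore::Blockstore;

// Hedged sketch, not the service's exact code: walk every SlotMeta entry
// and add up the shreds each slot reports as received. Every cleanup pass
// ends up reading the whole SlotMeta column from disk.
fn count_shreds_via_slot_meta(blockstore: &Blockstore) -> u64 {
    blockstore
        .slot_meta_iterator(0) // start at slot 0, i.e. iterate the full column
        .expect("failed to create SlotMeta iterator")
        .map(|(_slot, slot_meta)| slot_meta.received)
        .sum()
}
```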

Summary of Changes

Instead of counting the individual slots, use the live_files() rust-rocksdb entrypoint that we expose in Blockstore. This API allows us to get the number of entries (shreds) in the data shred column family by reading file metadata. This is much more efficient from an IO perspective.

Fixes #28403
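
For contrast, a minimal sketch of the metadata-based counting, written against rust-rocksdb directly rather than the Blockstore wrapper; the "data_shred" column family name and the overall shape are assumptions here, not the PR's exact code:

```rust
use rocksdb::{Error, DB};

// Hedged sketch, not the PR's exact code: sum the per-SST entry counts that
// RocksDB tracks in its file metadata for the data shred column family.
// No keys or values are read, so the IO cost is negligible.
fn estimate_data_shred_count(db: &DB) -> Result<u64, Error> {
    let count = db
        .live_files()? // metadata for every live SST file
        .iter()
        .filter(|file| file.column_family_name == "data_shred")
        .map(|file| file.num_entries)
        .sum();
    Ok(count)
}
```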

steviez (Contributor, Author) commented on Dec 13, 2022

I have a node running right now. It was previously running the tip of master and got to a "full" ledger, such that LedgerCleanupService actually needed to do something. At the moment, I have about 1 day of runtime with the new change; the graphs below show 4 days of runtime total, so the new behavior kicks in at Dec 12, 10:00.

The first graph here shows the returned shred count (the old approach counted via SlotMeta; the new one uses metadata from the RocksDB API). The new reported number has a little more variation; I think some of this can be attributed to the API returning data about SSTs only (and not the memtables). This is just an observation, not a problem in my eyes.
[graph: reported shred count, old SlotMeta counting vs. new RocksDB file metadata]

The second graph shows the disk utilization before (pink) and after (blue) cleanup runs. There is a little bit of a dip around the crossover point, but there is also some variation in the graph before I started using the new behavior. The y-axis scale is also pretty small, so we're talking about only a couple of GB here. As the comment in the code calls out, the new behavior is marginally more aggressive in cleaning, so I would expect a vertical offset of a couple of GB.
[graph: disk utilization before (pink) and after (blue) cleanup runs]

@steviez steviez requested a review from yhchiang-sol December 13, 2022 10:16
@steviez steviez marked this pull request as ready for review December 13, 2022 10:16
@steviez steviez force-pushed the lcs_file_meta branch 2 times, most recently from f9be872 to 3cb3baf on December 16, 2022 07:09
@github-actions github-actions bot added the stale label on Jan 2, 2023
@steviez steviez removed the stale label on Jan 3, 2023
@steviez steviez force-pushed the lcs_file_meta branch 2 times, most recently from 83b7d6f to 35533ef on January 17, 2023 00:20
yhchiang-sol (Contributor) left a comment

The PR looks good! Only minor comments.

Btw, do you happen to have updated numbers from your experiments? Or are the previous numbers already from the updated PR?

core/src/ledger_cleanup_service.rs (4 resolved review threads)
steviez (Contributor, Author) commented on Jan 18, 2023

Btw, do you happen to have updated numbers from your experiments? Or are the previous numbers already from the updated PR?

I was planning on re-running after rebasing on latest, utilizing the nodes I had been using for testing the SlotMeta change. I consider getting another data point a hard requirement before shipping.

It'd be nice to get a graph with I/O isolated to the blockstore (like we did on the GCP nodes when we put everything on separate drives), but with the dev servers having everything on one drive, it is harder to get a clean measurement. I don't consider this a hard requirement for shipping this PR though; from inspection, it is very obvious that we are saving IO by not reading the SlotMeta column.

yhchiang-sol previously approved these changes on Jan 18, 2023
yhchiang-sol (Contributor) left a comment
The PR looks good! Thanks for adding tests! (The test is especially tricky to write since there are things that are still in memtables; a manual flush in the tests is a good workaround.)

I don't consider this a hard requirement for shipping this PR though; from inspection, it is very obvious that we are saving IO by not reading the SlotMeta column.

Definitely not a hard requirement, but it would be great if we could explicitly say how much this PR improves things.
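
To illustrate the flush-before-count trick mentioned above, here is a minimal standalone sketch using plain rust-rocksdb (plus the tempfile crate). It is not the PR's actual test, just the general idea that live_files() only sees flushed SSTs, not the memtable:

```rust
use rocksdb::{Options, DB};
use tempfile::TempDir;

fn main() {
    let dir = TempDir::new().unwrap();
    let mut opts = Options::default();
    opts.create_if_missing(true);
    let db = DB::open(&opts, dir.path()).unwrap();

    // Write some keys; they land in the memtable, not in any SST file yet,
    // so live_files() reports nothing for them.
    for i in 0u64..100 {
        db.put(i.to_be_bytes(), b"payload").unwrap();
    }
    assert!(db.live_files().unwrap().is_empty());

    // Force the memtable into an SST file; now the file metadata reflects
    // the written entries.
    db.flush().unwrap();
    let entries: u64 = db
        .live_files()
        .unwrap()
        .iter()
        .map(|file| file.num_entries)
        .sum();
    assert_eq!(entries, 100);
}
```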

steviez (Contributor, Author) commented on Jan 18, 2023

Definitely not a hard requirement, but it would be great if we could explicitly say how much this PR improves things.

Agreed. We do know how large the SlotMeta column will be, and we do know how often the scans occur, so we can quantify how much and how often we would be reading. I did the math for that in the GH issue that this PR will close.
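
The shape of that back-of-the-envelope math looks roughly like the sketch below; the inputs are placeholders, not the actual figures from the linked issue:

```rust
// Hedged sketch: how much SlotMeta data the old approach reads over a day.
// All three inputs are placeholders for the numbers worked out in the issue.
fn slot_meta_bytes_scanned_per_day(
    slots_retained: u64,      // slots kept before cleanup kicks in
    avg_slot_meta_bytes: u64, // serialized size of one SlotMeta entry
    scans_per_day: u64,       // how often LedgerCleanupService does its count
) -> u64 {
    slots_retained * avg_slot_meta_bytes * scans_per_day
}
```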

@mergify mergify bot dismissed yhchiang-sol’s stale review January 20, 2023 21:41

Pull request has been modified.

Currently, the cleanup service counts the number of shreds in the
database by iterating the entire SlotMeta column and reading the number
of received shreds for each slot. This gives us a fairly accurate count
at the expense of performing a good amount of IO.

Instead of counting the individual slots, use the live_files()
rust-rocksdb entrypoint that we expose in Blockstore. This API allows us
to get the number of entries (shreds) in the data shred column family by
reading file metadata. This is much more efficient from an IO perspective.
steviez (Contributor, Author) commented on Jan 23, 2023

I had my test node running over the weekend. I let it get the ledger up to capacity on the tip of master, and then updated the node to use this branch on 2023-01-22 @ 19:00 UTC. Here is the disk usage for the weekend; it shows the ramp-up as well as the fact that total disk utilization is pretty flat before/after the change:
[graph: total disk usage over the weekend, before/after switching to this branch]

There is just a slight increase in total space over this 24-hour period; however, that is a function of block size, as a control node (the purple trace) shows a similar upward trend:
[graph: total disk usage, test node vs. control node (purple)]

The total number of shreds found before cleanup is slightly noisier, which is an expected result of using a much cheaper estimate instead of an exact count of shreds:
[graph: total shreds found before cleanup]

However, the variation observed here is an extra 250k shreds, which is 0.125% of the 200M-shred default ledger size. The inconsequential nature of this variation shows in the previous graphs, where our total ledger size is still pretty similar.

So, the data still looks good! I improved the error handling slightly (for a case that should never exist, where highest_slot < lowest_slot from the Blockstore functions), and additionally promoted the warns I added to errors. I'm restarting my validator and letting this run through CI again; these changes are minor enough that I'll push if the validator looks good with the change reflected.
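
For reference, a hypothetical sketch of the kind of guard described above; the names and structure are illustrative, not the PR's actual code:

```rust
use log::error;

// Hedged sketch: treat an inconsistent answer (highest_slot < lowest_slot)
// as an empty span instead of underflowing, and shout about it in the logs.
fn slot_span(lowest_slot: u64, highest_slot: u64) -> u64 {
    match highest_slot.checked_sub(lowest_slot) {
        Some(diff) => diff + 1,
        None => {
            error!(
                "highest_slot {highest_slot} is less than lowest_slot {lowest_slot}; \
                 treating the slot span as empty"
            );
            0
        }
    }
}
```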

Successfully merging this pull request may close these issues.

LedgerCleanupService reads the entire SlotMeta column with frequency