
feat: badger: add a has check before writing to reduce duplicates #10680

Merged
merged 1 commit on Apr 26, 2023

Conversation

Stebalien
Member

@Stebalien commented Apr 17, 2023

Related Issues

fixes #10457

Proposed Changes

Check whether we already have blocks before writing them (in badger only). By default, badger simply appends the block, which grows the datastore until it is garbage collected. Worse, badger garbage collection is expensive and not fully effective.
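
For illustration, here is a minimal sketch of the approach in Go (the helper name putIfAbsent and the standalone wiring are assumptions for this example, not the actual lotus blockstore code; the import path assumes badger v2):

package example

import (
	badger "github.com/dgraph-io/badger/v2"
)

// putIfAbsent checks for the key in a read-only transaction and only
// performs the write when the key is not already present.
func putIfAbsent(db *badger.DB, key, value []byte) error {
	err := db.View(func(txn *badger.Txn) error {
		_, err := txn.Get(key)
		return err
	})
	switch err {
	case nil:
		// Already stored; writing again would only bloat the value log.
		return nil
	case badger.ErrKeyNotFound:
		// Not present; fall through to the write.
	default:
		return err
	}
	return db.Update(func(txn *badger.Txn) error {
		return txn.Set(key, value)
	})
}

Because the read and the write happen in separate transactions, this matches the racy-but-cheap behaviour described under "Additional Info" below.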

Additional Info

Downsides:

  1. This code is racy: if we write the same block multiple times in parallel, we'll still end up with multiple versions. We could prevent this with transactions, but that has other performance implications.
  2. This adds a serial read before every write, reducing performance.

However, the average block sync time (on my machine) is 10.5s before and after this patch, so it shouldn't matter.

Checklist

Before you mark the PR ready for review, please make sure that:

  • Commits have a clear commit message.
  • PR title is in the form of <PR type>: <area>: <change being made>
    • example: fix: mempool: Introduce a cache for valid signatures
    • PR type: fix, feat, build, chore, ci, docs, perf, refactor, revert, style, test
    • area, e.g. api, chain, state, market, mempool, multisig, networking, paych, proving, sealing, wallet, deps
  • Tests exist for new functionality or change in behavior
  • CI is green

@Stebalien requested a review from a team as a code owner April 17, 2023 17:40
@Stebalien
Member Author

I'm seeing 9-10s block times, which seem normal. But I'll continue testing to see if anything unusual happens.

@Stebalien
Member Author

Looks good. Average 10.5s block times both ways.

@Stebalien
Member Author

Actually, I'm seeing 10.2s block times with this patch, so it may even be speeding things up a bit by not writing duplicate state.

@Stebalien requested review from arajasek and vyzo April 17, 2023 18:43
Contributor

@vyzo left a comment

Can we do the update and return a sentinel error if the key already exists?
It might be more efficient, as it saves a round trip in badger.

@Stebalien
Member Author

Not sure I understand. This patch is checking if we have something before writing it. That's not an error, just something to be optimized.

@vyzo
Contributor

vyzo commented Apr 18, 2023

Not an error; I was suggesting using badger.Update and doing the existence check in there, doing nothing (and possibly returning a sentinel error) if the key already exists.

Basically, move the existence check into the Update that follows this code.
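
Roughly, the suggestion amounts to something like the following sketch (errKeyExists and putUnlessExists are hypothetical names for this example, not existing lotus identifiers; the import path assumes badger v2):

package example

import (
	"errors"

	badger "github.com/dgraph-io/badger/v2"
)

// errKeyExists is a sentinel used to abort the transaction when the key
// is already present; returning a non-nil error from Update discards it.
var errKeyExists = errors.New("key already exists")

func putUnlessExists(db *badger.DB, key, value []byte) error {
	err := db.Update(func(txn *badger.Txn) error {
		switch _, err := txn.Get(key); err {
		case nil:
			return errKeyExists // already stored, skip the write
		case badger.ErrKeyNotFound:
			// not present; proceed with the write below
		default:
			return err
		}
		return txn.Set(key, value)
	})
	if errors.Is(err, errKeyExists) {
		return nil // treat "already have it" as success
	}
	return err
}

This does the check and the write in a single transaction, at the cost of the read/write dependency discussed in the review below.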

Comment on lines +735 to +748
// Check if we have it before writing it.
switch err := db.View(func(txn *badger.Txn) error {
	_, err := txn.Get(k)
	return err
}); err {
case badger.ErrKeyNotFound:
case nil:
	// Already exists, skip the put.
	return nil
default:
	return err
}

// Then write it.
Member

It should be slightly more efficient to do this check inside the db.Update below and discard the transaction if the key exists, instead of starting a separate transaction. Although in practice it probably doesn't matter much.

Member Author

I considered that, but it forces a read/write dependency (a transaction). That does guarantee no overwrites, but I was concerned about performance (IIRC, transactions in badger had some scaling issues).

@Stebalien
Member Author

I've been running this for several days now and haven't had any issues.

@magik6k
Contributor

magik6k commented Apr 20, 2023

  • why is this check needed, isn’t vm Flush code only saving new cids? (I guess it’s dag diff, not blockstore based?)
  • Can we cache the has checks?
  • Can we test this on a large full node, where the index doesn’t fit in memory?

@Stebalien
Member Author

why is this check needed, isn’t vm Flush code only saving new cids? (I guess it’s dag diff, not blockstore based?)

The VM flush used to stop flushing once it encountered a block already present in the blockstore, but that changed with the FVM. DAG traversal now happens on the FVM side, so we instead flush all reachable, newly written blocks without checking the store. We could check whether we already have them, but that would require crossing the Rust/Go FFI boundary, which tends to be very slow.

I also significantly prefer the new VM flush logic as it's resilient to missing blocks.

Can we test this on a large full node, where the index doesn’t fit in memory?

You're right, we should. I'll try to find someone with such a node.

@raulk
Member

raulk commented Apr 20, 2023

why is this check needed, isn’t vm Flush code only saving new cids? (I guess it’s dag diff, not blockstore based?)

The store bloat comes from putting those new CIDs over and over again in Badger. Badger is an LSM tree, and it will happily write the entry to L0/L1 and the value log again even if it already exists. For mutable data this makes sense (eventual compaction takes care of preserving the latest value, or as many values as you ask it to keep), but for immutable data it never makes sense to rewrite an entry. It's entirely redundant.

Can we cache the has checks?

The write workloads are extremely user-dependent here. A has cache would make sense if we think that the combination of blockstore write workloads from StateCompute, StateReplay, and block validation will produce a good hit rate. Because the first two are entirely user-driven, I'm not convinced. What would a good cache size be?
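
If we did want to try it, a has cache could be as simple as an LRU of recently seen keys in front of the existence check. The sketch below is purely illustrative (the hasWithCache helper, the use of hashicorp/golang-lru, and the wiring are assumptions, not anything proposed in this PR):

package example

import (
	badger "github.com/dgraph-io/badger/v2"
	lru "github.com/hashicorp/golang-lru"
)

// hasWithCache consults an in-memory LRU of recently seen keys before
// asking badger, so hot keys skip the disk read entirely.
func hasWithCache(db *badger.DB, seen *lru.Cache, key []byte) (bool, error) {
	if seen.Contains(string(key)) {
		return true, nil
	}
	err := db.View(func(txn *badger.Txn) error {
		_, err := txn.Get(key)
		return err
	})
	switch err {
	case nil:
		seen.Add(string(key), struct{}{})
		return true, nil
	case badger.ErrKeyNotFound:
		return false, nil
	default:
		return false, err
	}
}

The cache itself would be created with something like lru.New(1 << 20); what a good size is remains the open question above.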

Can we test this on a large full node, where the index doesn’t fit in memory?

You're worried that the cost of the has check will add significant latency where the index spills to disk? Intuitively I think the risk is small or non-existent, as Badger will prioritise keeping the top levels of the index (i.e. the most recent data) in memory, and that is exactly what block validation will touch. It may be noticeable for StateCompute and StateReplay, but those are user workloads. Nevertheless, we should test, yep!

@ankit-gautam23 left a comment

LGTM

Contributor

@magik6k left a comment

Probably fine to merge and see how it performs.

Ideally we'd check on a full node first (I can give you access to one if you don't already have one).

@Stebalien
Member Author

Ideally we'd check on a full node first (I can give you access to one if you don't already have one).

I've tested it on a snapshot node, but not a node with significant historical state. I don't think this will slow that node down, but I'm happy to test if you give me access.
