Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

move to native badger blockstore; leverage zero-copy View() to deserialize in-place #4681

Merged
merged 47 commits into from
Nov 10, 2020

Conversation

raulk
Copy link
Member

@raulk raulk commented Nov 1, 2020

This PR introduces a native badger blockstore that bypasses the datastore abstraction, thus removing layers of indirection, overhead, and allocs from one of the hottest paths in the system.

The native badger blockstore implements the View method, which allows us to directly access an mmap value to deserialize in-place, thus avoiding unnecessary byte slice allocs and copies. See ipfs/go-ipld-cbor#75 for related change.

All usages of the chain datastore have been updated to use the chain blockstore. Low-level commands like state-prune have been adjusted to operate on badger directly.

TODO/decide

  • Benchmark/comparison. See gist: https://gist.github.com/raulk/1d2a2856b8a77d06cd4a1ba2b68a8fdc/.
  • With this change, we can no longer use the "measure datastore" wrapper.
    • Was this being used at all?
    • If yes, we should add a blockstore metrics wrapper (which is anyway planned).
  • Ideally kill the "/blocks/" prefix/namespace. This increases the footprint for nothing. Also, do not serialize keys in base32. Unfortunately, both these things require a migration.
    • @magik6k proposed introducing a version entry on the blockstore so that the code is ready. We'd introduce a lotus shed tool to rewrite the blockstore. That might work, but there are other strategies that might allow us to do this in-place.
    • Keeping off-scope of this issue for now.
  • Future: do we want to separate the state and the blockstores?
    • @raulk: not now; consider when we introduce the two-tier blockstore.
  • Implement a CBOR object cache, likely using Ristretto, or Freecache. See logical segregation of blockstores + freecache-cached chain and state blockstores #4771.

@raulk raulk marked this pull request as draft November 1, 2020 13:21
// Reduce this from 64MiB to 16MiB. That means badger will hold on to
// 20MiB by default instead of 80MiB. This does not appear to have a
// significant performance hit.
opts.MaxTableSize = 16 << 20
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue with this is the number of TableFiles that will be created. Lotus for a while now run with default from badger (64M) instead of 20M because of that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our badger2 datastore adapter set this value here:

https://github.com/ipfs/go-ds-badger2/blob/a4499bb2f2431c7c430d8a5ffe65c69339e52a00/datastore.go#L107

But it turns out we take those defaults and then entirely override them here:

opts.Options = dgbadger.DefaultOptions("").WithTruncate(true).

This also means that we lose other defaults that made sense, e.g. ValueLogLoadingMode = FileIO instead of mmap, because we reverted to badger defaults:

https://pkg.go.dev/github.com/dgraph-io/badger#Options.WithValueLogLoadingMode

This could explain some things...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restored the MaxTableSize value here: 0b8a21e.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find the ValueLogLoadingMode = FileIO having any benefit (and the results that suggested it are super old) so unless you have some results confirminig it, I would stay with dgraph/badger defaults.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FileIO sabotages the benefits of View(), so we're back to mmap.

lib/blockstore/badger/badger_test_suite.go Outdated Show resolved Hide resolved
lib/blockstore/badger/badger.go Outdated Show resolved Hide resolved
@raulk raulk marked this pull request as ready for review November 3, 2020 21:13
Copy link
Contributor

@Kubuxu Kubuxu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGWM but lack of support of View from CachingBS and IDWrapperBS will stop us from using the View.

@raulk
Copy link
Member Author

raulk commented Nov 9, 2020

@Kubuxu is right, and this is evident in profiles. This PR is incomplete until I add the View() method to the various overlay blockstore concoctions we have.

image

@raulk raulk marked this pull request as draft November 10, 2020 09:09
lib/bufbstore/buf_bstore.go Outdated Show resolved Hide resolved
@raulk raulk marked this pull request as ready for review November 10, 2020 17:51
bs.chargeGas(bs.pricelist.OnIpldGet())
return v.View(c, func(b []byte) error {
// we have successfully retrieved the value; charge for it, even if the user-provided function fails.
bs.chargeGas(newGasCharge("OnIpldViewEnd", 0, 0).WithExtra(len(b)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two are only signaling charges, they have no consensus importantce so it is fine if they are inside the View.

Copy link
Contributor

@magik6k magik6k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just 1 comment

chain/vm/vm.go Outdated Show resolved Hide resolved
Copy link
Contributor

@arajasek arajasek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved magiks suggestion

chain/vm/vm.go Outdated Show resolved Hide resolved
@arajasek arajasek merged commit 735c04f into master Nov 10, 2020
@arajasek arajasek deleted the badger-viewable branch November 10, 2020 23:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

chain/state store improvements: remove unnecessary indirection; leverage zero-copy mmap access to values
5 participants