chaindag: don't keep backfill block table in memory #3429

Merged 3 commits on Feb 26, 2022


arnetheduck
Member

This PR names and documents the concept of the archive: a range of slots
for which we have degraded functionality in terms of historical access -
in particular:

  • we don't support rewinding to states in this range
  • we don't keep an in-memory representation of the block dag

The archive already exists de facto in a node synced from a trusted node,
but this PR gives it a name and drops the in-memory digest index.

In order to satisfy GetBlocksByRange requests, we ensure that we have
blocks for the entire archive period via backfill. Future versions may
relax this further, adding a "pre-archive" period that is fully pruned.

During by-slot searches in the archive (both for libp2p and REST
requests), an extra database lookup is used to convert the given slot
to a root - future versions will avoid this using era files, which
are natively indexed by slot. That said, the lookup is quite
fast compared to the actual block loading given how trivial the table
is - it's hard to measure, even.
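
To illustrate the two-step lookup described above, here is a minimal sketch in Python using SQLite. It is not the chaindag code from this PR: the table names (`slot_roots`, `blocks`), the helper names and the schema are all hypothetical, chosen only to show a by-slot request first resolving the slot to a root and then loading the block by root, with empty slots skipped when serving a range.

```python
import sqlite3
from typing import List, Optional, Tuple


def open_archive(path: str = ":memory:") -> sqlite3.Connection:
    """Open a toy database with a slot->root table and a root->block table.

    Both table names are assumptions made for this sketch.
    """
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS slot_roots "
        "(slot INTEGER PRIMARY KEY, root BLOB NOT NULL)"
    )
    db.execute(
        "CREATE TABLE IF NOT EXISTS blocks "
        "(root BLOB PRIMARY KEY, data BLOB NOT NULL)"
    )
    return db


def put_block(db: sqlite3.Connection, slot: int, root: bytes, data: bytes) -> None:
    """Store a block and record which root occupies its slot."""
    db.execute("INSERT OR REPLACE INTO slot_roots VALUES (?, ?)", (slot, root))
    db.execute("INSERT OR REPLACE INTO blocks VALUES (?, ?)", (root, data))


def root_at_slot(db: sqlite3.Connection, slot: int) -> Optional[bytes]:
    """The extra lookup: convert a slot to a root, or None for an empty slot."""
    row = db.execute(
        "SELECT root FROM slot_roots WHERE slot = ?", (slot,)
    ).fetchone()
    return row[0] if row else None


def block_at_slot(db: sqlite3.Connection, slot: int) -> Optional[bytes]:
    """Resolve slot -> root, then load the block data by root."""
    root = root_at_slot(db, slot)
    if root is None:
        return None
    row = db.execute("SELECT data FROM blocks WHERE root = ?", (root,)).fetchone()
    return row[0] if row else None


def blocks_by_range(
    db: sqlite3.Connection, start_slot: int, count: int
) -> List[Tuple[int, bytes]]:
    """Serve a by-range style request, skipping slots that have no block."""
    found = []
    for slot in range(start_slot, start_slot + count):
        data = block_at_slot(db, slot)
        if data is not None:
            found.append((slot, data))
    return found
```

In this sketch the slot table holds only an integer key and a fixed-size root per row, which is in the spirit of the "trivial table" remark: the first query touches far less data than loading the block itself.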

A collateral benefit of this PR is that checkpoint-synced nodes will see
100-200MB memory usage savings, thanks to the dropped in-memory cache -
future pruning work will bring this benefit to full nodes as well.

  • document chaindag storage architecture and assumptions
  • look up parent using block id instead of full block in clearance
    (future-proofing the code against a future in which blocks come from era
    files)
  • simplify finalized block init, always writing the backfill portion to
    db at startup (to ensure lookups work as expected)
  • preallocate some extra memory for finalized blocks, to avoid immediate
    realloc


github-actions bot commented Feb 22, 2022

Unit Test Results

     12 files  ±0     821 suites  ±0   37m 47s ⏱️ + 5m 15s
1 671 tests ±0  1 625 ✔️ ±0    46 💤 ±0  0 ±0 
9 755 runs  ±0  9 655 ✔️ ±0  100 💤 ±0  0 ±0 

Results for commit a109dfc. ± Comparison against base commit 7de3f00.

♻️ This comment has been updated with latest results.

@arnetheduck arnetheduck merged commit 40a4c01 into unstable Feb 26, 2022
@arnetheduck arnetheduck deleted the hello-archive branch February 26, 2022 18:16