Pruned full/bridge nodes: minimum viable pruning #2615

Open
musalbas opened this issue Aug 26, 2023 · 4 comments
Labels
external Issues created by non node team members kind:discussion

Comments

@musalbas
Member

musalbas commented Aug 26, 2023

#OperationSaveStorageSpace

This issue describes the minimum viable pruning needed for full and bridge nodes, which are milestones 1 and 2 in #2033.

First, let us define a storage window as a number of blocks, such that blocks older than the window aren't kept. For example, a storage window of 172,800 blocks is 30 days' worth of blocks, assuming a 15-second block time. Defining it in days would be better but may not be trivial.

Light nodes should only sample blocks within the storage window.
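As a quick check of the arithmetic above, here is a minimal sketch of converting a window expressed in days into a block count. The function name `storage_window_blocks` is hypothetical, and the sketch assumes a constant block time, which is exactly the simplification that makes a days-based definition non-trivial in practice.

```python
from datetime import timedelta


def storage_window_blocks(window: timedelta, block_time: timedelta) -> int:
    """Number of blocks produced during `window`, assuming a constant block time."""
    if block_time <= timedelta(0):
        raise ValueError("block_time must be positive")
    # Floor-dividing one timedelta by another yields an int.
    return window // block_time


# 30 days at a 15-second block time, as in the example above.
blocks = storage_window_blocks(timedelta(days=30), timedelta(seconds=15))
```

With the values from the example, `blocks` comes out to 172,800.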

For full and bridge nodes, we should add a flag --pruned [true/false] that persists a config setting that is true by default on init.

When this flag is true, only blocks within the storage window are sampled, and the CAR files of blocks outside of the storage window are automatically deleted from the store. Headers can still be kept and synced.
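The pruning rule could be sketched roughly as follows. This is an illustration only: the in-memory dictionaries stand in for the CAR file store and the header store, and the names `prune_cutoff` and `prune_block_data` are hypothetical, not part of any existing API.

```python
def prune_cutoff(head_height: int, window_blocks: int) -> int:
    """Lowest height still inside the storage window (heights start at 1)."""
    return max(1, head_height - window_blocks + 1)


def prune_block_data(block_data: dict, head_height: int, window_blocks: int) -> list:
    """Delete block data below the cutoff and return the pruned heights, sorted."""
    cutoff = prune_cutoff(head_height, window_blocks)
    pruned = sorted(h for h in block_data if h < cutoff)
    for h in pruned:
        del block_data[h]
    return pruned


# Stand-ins for the header store and the CAR file store (assumed shapes).
headers = {h: f"header-{h}" for h in range(1, 11)}
block_data = {h: b"car-bytes" for h in range(1, 11)}

# With the head at height 10 and a window of 4 blocks, heights 1..6 are pruned.
pruned = prune_block_data(block_data, head_height=10, window_blocks=4)
```

Headers are deliberately left untouched, so the node can still serve and verify the full header chain after block data is deleted.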

The most complicated part will be pruning the badger inverted index. To do so, we can split badger into multiple databases, one for each rolling storage window (e.g., a database for the last 100k blocks, and another database for the 100k blocks before that). This would mean that getting entries from the inverted index would require up to two reads, but that is okay as the reads can be parallelized, and we plan to remove the inverted index anyway. This also means that it might not be possible to switch a non-pruned node to a pruned node, because the inverted indexes are stored differently, unless we add logic for this, but it's not necessary to support this for minimum viable pruning.
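To make the rolling-database idea concrete, here is a small sketch that uses plain dictionaries in place of separate badger databases. The class `RollingIndex` and its methods are hypothetical. It assumes each bucket covers a fixed range of heights and that only the two newest buckets are kept, which is why a lookup needs at most two reads.

```python
class RollingIndex:
    """Inverted index split into fixed-size height buckets (one 'database' each)."""

    def __init__(self, bucket_size: int):
        if bucket_size <= 0:
            raise ValueError("bucket_size must be positive")
        self.bucket_size = bucket_size
        self.buckets = {}  # bucket id -> {key: height}

    def _bucket_id(self, height: int) -> int:
        return height // self.bucket_size

    def put(self, key: str, height: int) -> None:
        """Record that `key` belongs to the block at `height`."""
        self.buckets.setdefault(self._bucket_id(height), {})[key] = height

    def get(self, key: str, head_height: int):
        """Look up `key` in the two newest buckets only: at most two reads."""
        newest = self._bucket_id(head_height)
        for bucket_id in (newest, newest - 1):
            bucket = self.buckets.get(bucket_id)
            if bucket is not None and key in bucket:
                return bucket[key]
        return None

    def drop_expired(self, head_height: int) -> list:
        """Drop every bucket older than the two newest; return the dropped ids."""
        newest = self._bucket_id(head_height)
        expired = sorted(b for b in self.buckets if b < newest - 1)
        for b in expired:
            del self.buckets[b]
        return expired


index = RollingIndex(bucket_size=100)
index.put("share-a", 50)
index.put("share-b", 150)
index.put("share-c", 250)
dropped = index.drop_expired(head_height=250)
```

Pruning becomes a matter of dropping a whole bucket rather than deleting individual keys. Note that under these assumptions the retained history varies between one and two bucket sizes, so the bucket size would need to be at least the storage window for the two live buckets to always cover it.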

Pruned nodes should advertise themselves on a new discovery topic for pruned full nodes. Then, shrexeds/nd should have logic to discover non-pruned nodes in addition to pruned nodes if they try to access data from blocks older than the storage window.
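One possible reading of this peer selection logic is sketched below. The topic names and the functions `topics_for_height` and `candidate_peers` are assumptions made for illustration. The sketch also assumes that requests for recent blocks may go to either kind of node, while requests for older blocks must go to non-pruned nodes.

```python
PRUNED_TOPIC = "pruned-full"  # hypothetical discovery topic for pruned nodes
ARCHIVAL_TOPIC = "full"       # hypothetical discovery topic for non-pruned nodes


def topics_for_height(height: int, head_height: int, window_blocks: int) -> list:
    """Discovery topics whose peers are expected to hold the block at `height`."""
    if head_height - height < window_blocks:
        return [PRUNED_TOPIC, ARCHIVAL_TOPIC]
    return [ARCHIVAL_TOPIC]


def candidate_peers(peers_by_topic: dict, height: int, head_height: int,
                    window_blocks: int) -> list:
    """Peers to try for `height`, de-duplicated, in topic order."""
    seen = set()
    result = []
    for topic in topics_for_height(height, head_height, window_blocks):
        for peer in peers_by_topic.get(topic, []):
            if peer not in seen:
                seen.add(peer)
                result.append(peer)
    return result


peers = {PRUNED_TOPIC: ["p1", "p2"], ARCHIVAL_TOPIC: ["a1"]}
recent = candidate_peers(peers, height=95, head_height=100, window_blocks=10)
old = candidate_peers(peers, height=50, head_height=100, window_blocks=10)
```

Here `recent` includes both pruned and non-pruned peers, while `old` contains only the non-pruned peer.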

github-actions bot added the needs:triage and external labels on Aug 26, 2023
@musalbas
Member Author

Bridge nodes should probably sync only block headers from core nodes for blocks outside of the storage window.
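A rough sketch of how a pruned bridge node might split its sync range under this rule. The function `plan_bridge_sync` is hypothetical, and the sketch assumes heights start at 1 and the window is measured back from the current head.

```python
def plan_bridge_sync(start: int, head: int, window_blocks: int):
    """Split [start, head] into a header-only range and a full-block range."""
    # First height that falls inside the storage window.
    cutoff = max(start, head - window_blocks + 1)
    header_only = range(start, cutoff)     # fetch and store headers only
    full_blocks = range(cutoff, head + 1)  # fetch headers and store block data
    return header_only, full_blocks


header_only, full_blocks = plan_bridge_sync(start=1, head=100, window_blocks=30)
```

With a head at height 100 and a 30-block window, heights 1 to 70 are synced as headers only and heights 71 to 100 are synced with full block data.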

@musalbas
Member Author

> This also means that it might not be possible to switch a non-pruned node to a pruned node, because the inverted indexes are stored differently, unless we add logic for this, but it's not necessary to support this for minimum viable pruning.

Correction: non-pruned nodes could also prune these indexes, given that the sampling window is 30 days and indexes should only be needed for sampling (assuming we remove or reconstruct the index for shrex-ND).

@distractedm1nd
Collaborator

Bridge nodes should always sync from height 1 regardless of the recency window, correct?

@musalbas
Member Author

musalbas commented Sep 22, 2023

No, pruned bridge nodes shouldn't Put() blocks outside of the recency window.

What do you mean by "sync" in this case though?
