Mitigate archival node data size growth #6119
Labels
A-storage
Area: storage and databases
C-enhancement
Category: An issue proposing an enhancement or a PR with one.
Node
Node team
P-high
Priority: High
T-node
Team: issues relevant to the node experience team
Comments
bowenwang1996
added
C-enhancement
Category: An issue proposing an enhancement or a PR with one.
A-storage
Area: storage and databases
T-node
Team: issues relevant to the node experience team
labels
Jan 19, 2022
mina86
added a commit
to mina86/rust-rocksdb
that referenced
this issue
Jan 21, 2022
Weirdly, RocksDB C interface does not have a way to just set the `enabled` flag of the bottommost compression options so I’ve decided to replicate that interface in Rust as well. As far as I understand, this unfortunately means that to set bottommost compression one needs to set the type and then also use one of the other methods for configuring the compression. Issue: rust-rocksdb#260 Issue: near/nearcore#6119
near-bulldozer bot
pushed a commit
that referenced
this issue
Mar 21, 2022
Add recompress-storage command which reads all the data from storage and saves it in a new location effectively recompressing all the SST files. This is meant as a temporary tool to speed up migration to zstd compression. Without it, especially on archival nodes, recompressing data may take a long time as most SST files can stay untouched for epochs. Issue: #6119
mina86
added a commit
to mina86/nearcore
that referenced
this issue
Mar 21, 2022
mina86
added a commit
to mina86/nearcore
that referenced
this issue
Mar 22, 2022
mina86
added a commit
to mina86/nearcore
that referenced
this issue
Mar 22, 2022
Add --clean-partial-chunks and --clear-trie-changes options to clear out the two respective columns. Data in ColPartialChunks can be recomputed and data in ColTrieChanges is only used by non-archival nodes and can be deleted when running archival node. Issue: near#6119
mina86
added a commit
to mina86/nearcore
that referenced
this issue
Mar 22, 2022
Add --clean-partial-chunks and --clear-trie-changes options to clear out the two respective columns. Data in ColPartialChunks can be recomputed and data in ColTrieChanges is only used by non-archival nodes and can be deleted when running archival node. Issue: near#6119 Issue: near#6242 Issue: near#6250
mina86
added a commit
to mina86/nearcore
that referenced
this issue
Mar 23, 2022
near-bulldozer bot
pushed a commit
that referenced
this issue
Apr 7, 2022
When recompressing database of an archival node, skip ColPartialChunks, ColInvalidChunks and ColTrieChanges columns which can be safely deleted. Data in the first one can be reconstructed from ColChunks, ColInvalidChunks is only needed at head and the last is never read by archival nodes. Mostly for testing, if someone wants to keep those columns, offer --keep-partial-chunks, --keep-invalid-chunks and --keep-trie-changes switches. They are always on when dealing with non-archival node. Issue: #6119 Issue: #6242 Issue: #6250
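The column-skipping rule described in this commit message can be sketched roughly as follows. This is a minimal illustration, not nearcore's implementation: the `Column` enum, the `KeepFlags` struct, the `should_copy` and `recompress` functions, and the in-memory `Db` type are all hypothetical stand-ins, and a real database would be RocksDB rather than a `BTreeMap`.

```rust
use std::collections::BTreeMap;

/// Hypothetical subset of database columns. The names echo the commit
/// message, but this enum is illustrative only.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Column {
    Chunks,
    PartialChunks,
    InvalidChunks,
    TrieChanges,
}

/// Hypothetical stand-ins for the --keep-partial-chunks,
/// --keep-invalid-chunks and --keep-trie-changes switches.
#[derive(Clone, Copy, Default)]
struct KeepFlags {
    partial_chunks: bool,
    invalid_chunks: bool,
    trie_changes: bool,
}

/// Decide whether a column is copied into the recompressed database.
fn should_copy(col: Column, is_archival: bool, keep: KeepFlags) -> bool {
    // On a non-archival node every keep switch is treated as on.
    if !is_archival {
        return true;
    }
    match col {
        Column::PartialChunks => keep.partial_chunks,
        Column::InvalidChunks => keep.invalid_chunks,
        Column::TrieChanges => keep.trie_changes,
        _ => true,
    }
}

/// In-memory stand-in for a database: (column, key) -> raw value.
type Db = BTreeMap<(Column, Vec<u8>), Vec<u8>>;

/// Copy raw key-value pairs from `src` into a fresh database, skipping
/// every column that `should_copy` rejects.
fn recompress(src: &Db, is_archival: bool, keep: KeepFlags) -> Db {
    src.iter()
        .filter(|((col, _), _)| should_copy(*col, is_archival, keep))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}
```

With default flags, an archival source holding one entry in each of the four columns would come out with only the `Chunks` entry, while a non-archival source would be copied in full.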
mina86
added a commit
to mina86/nearcore
that referenced
this issue
Apr 7, 2022
This is commit 2030c75 upstream. Add recompress-storage command which reads all the data from storage and saves it in a new location effectively recompressing all the SST files. This is meant as a temporary tool to speed up migration to zstd compression. Without it, especially on archival nodes, recompressing data may take a long time as most SST files can stay untouched for epochs. Issue: near#6119
mina86
added a commit
to mina86/nearcore
that referenced
this issue
Apr 7, 2022
mina86
added a commit
to mina86/nearcore
that referenced
this issue
Apr 7, 2022
mina86
added a commit
to mina86/nearcore
that referenced
this issue
Apr 7, 2022
This is commit da7a465 upstream. When recompressing database of an archival node, skip ColPartialChunks, ColInvalidChunks and ColTrieChanges columns which can be safely deleted. Data in the first one can be reconstructed from ColChunks, ColInvalidChunks is only needed at head and the last is never read by archival nodes. Mostly for testing, if someone wants to keep those columns, offer --keep-partial-chunks, --keep-invalid-chunks and --keep-trie-changes switches. They are always on when dealing with non-archival node. Issue: near#6119 Issue: near#6242 Issue: near#6250
mina86
added a commit
to mina86/nearcore
that referenced
this issue
Apr 8, 2022
The target is used by recompress_storage. Initially the command used no target but then target was added which caused the log lines to be ignored by default. Issue: near#6119
mina86
added a commit
to mina86/nearcore
that referenced
this issue
Apr 8, 2022
Store::iter strips reference count from reference counted columns and writing returned value to a new database leads to corrupted data since the count is no longer present. The correct way to deal with raw values saved in the database is to use Store::iter_without_rc_logic. Switch recompress_storage to do that. Issue: near#6119
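Why copying stripped values corrupts the new database can be shown with a small model. The sketch below assumes, purely for illustration, that a reference-counted value is stored as its payload followed by an 8-byte little-endian count; the helper names `encode_with_rc`, `decode_with_rc` and `strip_rc` are hypothetical and are not the nearcore API.

```rust
/// Assumed encoding for this sketch: payload followed by an 8-byte
/// little-endian reference count.
fn encode_with_rc(payload: &[u8], rc: i64) -> Vec<u8> {
    let mut raw = payload.to_vec();
    raw.extend_from_slice(&rc.to_le_bytes());
    raw
}

/// Split a raw value into (payload, rc). Returns None when the value is
/// too short to contain a count at all.
fn decode_with_rc(raw: &[u8]) -> Option<(&[u8], i64)> {
    if raw.len() < 8 {
        return None;
    }
    let (payload, tail) = raw.split_at(raw.len() - 8);
    let mut buf = [0u8; 8];
    buf.copy_from_slice(tail);
    Some((payload, i64::from_le_bytes(buf)))
}

/// What a refcount-aware iterator hands back in this model: payload only.
fn strip_rc(raw: &[u8]) -> Option<&[u8]> {
    decode_with_rc(raw).map(|(payload, _)| payload)
}
```

In this model, writing `strip_rc(raw)` to the destination and later calling `decode_with_rc` on it either fails (short payloads) or misreads the last 8 payload bytes as a count, whereas copying `raw` unchanged round-trips correctly. That is the distinction the commit draws between `Store::iter` and `Store::iter_without_rc_logic`.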
near-bulldozer bot
pushed a commit
that referenced
this issue
Apr 8, 2022
Store::iter strips reference count from reference counted columns and writing returned value to a new database leads to corrupted data since the count is no longer present. The correct way to deal with raw values saved in the database is to use Store::iter_without_rc_logic. Switch recompress_storage to do that. Issue: #6119
near-bulldozer bot
pushed a commit
that referenced
this issue
Apr 11, 2022
The target is used by recompress_storage. Initially the command used no target but then target was added which caused the log lines to be ignored by default. Issue: #6119
mina86
added a commit
that referenced
this issue
Apr 13, 2022
Introduce a test which verifies that recompress_storage command does not corrupt the database and that nodes can continue working after it’s used. Issue: #6119
near-bulldozer bot
pushed a commit
that referenced
this issue
Apr 16, 2022
Introduce a test which verifies that recompress-storage command doesn’t corrupt the database and that nodes can continue working after it’s used. Issue: #6119
mina86
added a commit
to near/node-docs
that referenced
this issue
Apr 20, 2022
exalate-issue-sync
bot
added
T-nodeX
and removed
T-node
Team: issues relevant to the node experience team
labels
Jun 28, 2022
matklad
added
T-node
Team: issues relevant to the node experience team
and removed
T-nodeX
labels
Aug 4, 2022
Michał Nazarewicz commented: Last point is now tracked in https://pagodaplatform.atlassian.net/browse/ND-162
Context can be found [here](https://near.zulipchat.com/#narrow/stream/308695-nearinc.2Fprivate/topic/archival.20node.20data.20size). TL;DR: archival node data usage grows very quickly, and because we require node operators to run archival nodes on SSDs, this growth places a considerable financial burden on them, which is undesirable and unsustainable. Some ideas to mitigate the growth include the following: