Archival nodes should GC ColPartialChunks #6242

Closed
mina86 opened this issue Feb 3, 2022 · 0 comments
Labels: A-storage (Area: storage and databases), Node (Node team), T-node (Team: issues relevant to the node experience team)

mina86 commented Feb 3, 2022

ColPartialChunks is the largest column in archival nodes, while at the same time all the information stored in it is available in the ColChunks column. It would be a big win if we could GC the column in archival nodes.

bowenwang1996 added the A-storage (Area: storage and databases) and T-node (Team: issues relevant to the node experience team) labels Feb 4, 2022
mina86 self-assigned this Feb 8, 2022
mina86 added a commit to mina86/nearcore that referenced this issue Feb 28, 2022
mina86 added a commit to mina86/nearcore that referenced this issue Feb 28, 2022
Extract two more methods from `process_partial_encoded_chunk_request`
which correspond to `if` bodies that used to be in it.  This makes each
function shorter and thus easier to read, especially as more branches
will be added to the method in the future.

Furthermore, move sending of the message into the method, changing
`maybe_send_partial_encoded_chunk_response` into a method which only
prepares the response.

Issue: near#6242
near-bulldozer bot pushed a commit that referenced this issue Mar 1, 2022
Extract two more methods from `process_partial_encoded_chunk_request`
which correspond to `if` bodies that used to be in it.  This makes each
function shorter and thus easier to read, especially as more branches
will be added to the method in the future.

Furthermore, move sending of the message into the method, changing
`maybe_send_partial_encoded_chunk_response` into a method which only
prepares the response.

This is a pure refactoring with no changes to the behaviour.

Issue: #6242
mina86 added a commit that referenced this issue Mar 2, 2022
Replace the BaseNode.get_all_heights method with BaseNode.get_all_blocks,
which returns hashes alongside heights of all the blocks known to a node.
This feature will be used in a future commit.

Issue: #6242
near-bulldozer bot pushed a commit that referenced this issue Mar 2, 2022
…#6370)

Extend the sanity/block_sync_archival.py test documentation to describe
in more detail what the test does and to mention that both nodes are
archival.  Furthermore, avoid starting the observer node at the
beginning just to kill it immediately; it’s now started only once the
validator has generated the blocks.  Finally, add an explicit comparison
of all the blocks in the validator and observer nodes.

Issue: #6242
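The explicit block comparison mentioned in this commit could look roughly like the following sketch. It assumes each node's blocks are available as `(height, hash)` pairs, in the spirit of what `BaseNode.get_all_blocks` is described as returning; `compare_blocks` and `assert_same_blocks` are hypothetical helpers, not the test's actual code.

```python
from typing import Iterable, List, Tuple

# Assumed shape (hypothetical): one (height, block_hash) pair per known block.
Block = Tuple[int, str]


def compare_blocks(validator: Iterable[Block],
                   observer: Iterable[Block]) -> Tuple[List[Block], List[Block]]:
    """Return (blocks missing on the observer, blocks only the observer has).

    Comparing sets of pairs rather than a height->hash dict avoids collapsing
    forks, i.e. several blocks known at the same height.
    """
    v, o = set(validator), set(observer)
    return sorted(v - o), sorted(o - v)


def assert_same_blocks(validator: Iterable[Block],
                       observer: Iterable[Block]) -> None:
    """Fail with a readable message if the two nodes disagree."""
    missing, extra = compare_blocks(validator, observer)
    assert not missing and not extra, (
        f'missing on observer: {missing}; only on observer: {extra}')
```

For example, `compare_blocks([(1, 'a'), (2, 'b')], [(1, 'a'), (2, 'x')])` returns `([(2, 'b')], [(2, 'x')])`, which pinpoints the height where the nodes diverge.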
mina86 added a commit that referenced this issue Mar 2, 2022
The ShardManager keeps a cache of encoded chunks going back 1024 heights,
which means that nodes requesting partial chunks for a recent block are
served from the cache.  Since only a hundred blocks are generated in the
block_sync_archival.py test, the code path where data is read from
storage is never executed.

Extend the test so that it generates 1500 blocks to make sure that both
code paths are executed.

Issue: #6242
mina86 added a commit that referenced this issue Mar 2, 2022
…6376)

The ShardManager keeps a cache of encoded chunks going back 1024 heights,
which means that nodes requesting partial chunks for a recent block are
served from the cache.  Since only a hundred blocks are generated in the
block_sync_archival.py test, the code path where data is read from
storage is never executed.

Extend the test so that it generates 1500 blocks to make sure that both
code paths are executed.

Issue: #6242
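The arithmetic behind raising the block count can be checked with a simplified model. This sketch assumes a request is answered from the cache whenever the requested height is less than 1024 heights behind the head; the exact boundary and the function names are assumptions, not the ShardManager's real logic.

```python
# Depth of the ShardManager's encoded-chunk cache, per the commit message.
CACHE_DEPTH = 1024


def served_from_cache(head: int, height: int, depth: int = CACHE_DEPTH) -> bool:
    """Simplified model: a request is answered from the in-memory cache if
    the requested height is fewer than `depth` heights behind the head.
    Whether the boundary is < or <= is an assumption."""
    return head - height < depth


def storage_reads(num_blocks: int, depth: int = CACHE_DEPTH) -> int:
    """Count heights 1..num_blocks whose requests would have to read from
    storage once the chain head has reached height num_blocks."""
    head = num_blocks
    return sum(1 for h in range(1, num_blocks + 1)
               if not served_from_cache(head, h, depth))
```

Under this model, `storage_reads(100)` is 0, so the storage path is never exercised, while `storage_reads(1500)` is 476 (heights 1 to 476), so both paths are.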
mina86 added a commit that referenced this issue Mar 2, 2022
Add the ability to respond to PartialEncodedChunkRequest from ShardChunk
objects in addition to PartialEncodedChunk.  In practice this is currently
dead code, since there is no scenario in which the former is in storage
while the latter isn’t, but the plan is to start garbage collecting the
ColPartialChunks column, at which point we’ll have to serve requests from
data in the ColChunks column.

Issue: #6242
mina86 added a commit that referenced this issue Mar 9, 2022
…unk (#6377)

Add the ability to respond to PartialEncodedChunkRequest from ShardChunk
objects in addition to PartialEncodedChunk.  In practice this is currently
dead code, since there is no scenario in which the former is in storage
while the latter isn’t, but the plan is to start garbage collecting the
ColPartialChunks column, at which point we’ll have to serve requests from
data in the ColChunks column.

Issue: #6242
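The lookup order this commit enables (stored partial chunk first, full chunk as a fallback) can be sketched as follows. The columns are modeled as plain dicts and all names are hypothetical. The real node presumably has to re-encode the full chunk into erasure-coded parts (and also handles receipts); `split_into_parts` is only a toy stand-in that slices the bytes so the control flow can be shown.

```python
from typing import Dict, List, Optional, Tuple

# Toy stand-ins for the two columns, keyed by chunk hash (hypothetical):
#   ColPartialChunks -> {part_ord: part_bytes}
#   ColChunks        -> full chunk payload
PartialChunks = Dict[str, Dict[int, bytes]]
Chunks = Dict[str, bytes]


def split_into_parts(data: bytes, num_parts: int) -> List[bytes]:
    """Hypothetical stand-in for erasure coding: plain equal-size slices."""
    size = -(-len(data) // num_parts)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(num_parts)]


def answer_request(chunk_hash: str, part_ords: List[int],
                   partial_chunks: PartialChunks, chunks: Chunks,
                   num_parts: int) -> Tuple[str, Optional[Dict[int, bytes]]]:
    """Return (source, parts), where source names the column that was used."""
    stored = partial_chunks.get(chunk_hash)
    if stored is not None:
        # Fast path: the partial chunk has not been garbage collected.
        return 'partial', {o: stored[o] for o in part_ords if o in stored}
    data = chunks.get(chunk_hash)
    if data is not None:
        # Fallback: recompute the requested parts from the full chunk.
        parts = split_into_parts(data, num_parts)
        return 'chunk', {o: parts[o] for o in part_ords if 0 <= o < len(parts)}
    return 'none', None
```

Once ColPartialChunks is garbage collected, requests for old chunks take the `'chunk'` branch instead of failing.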
mina86 added a commit to mina86/nearcore that referenced this issue Mar 14, 2022
Add the near_partial_encoded_chunk_request_processing_time metric, which
measures how long processing partial encoded chunk requests takes.  The
metric is split by the method used to create a response and by whether
a response was prepared in the end.

Issue: near#6242
near-bulldozer bot pushed a commit that referenced this issue Mar 15, 2022
…6431)

Add the near_partial_encoded_chunk_request_processing_time metric, which
measures how long processing partial encoded chunk requests takes.  The
metric is split by the method used to create a response and by whether
a response was prepared in the end.

Issue: #6242
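A metric split this way needs its labels filled in only after processing finishes, because the method that produced the response is known only at the end. The sketch below is a stdlib-only stand-in for a labeled histogram. The class name and the label values (`'partial'`, `'chunk'`, `'ok'`, `'failed'`) are assumptions, not the metric's actual labels.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator, Tuple


class ProcessingTimeMetric:
    """Toy histogram-like metric: count and total seconds per label pair.

    Labels are (method, outcome); names and values are hypothetical."""

    def __init__(self) -> None:
        self.count: Dict[Tuple[str, str], int] = defaultdict(int)
        self.total_secs: Dict[Tuple[str, str], float] = defaultdict(float)

    @contextmanager
    def observe(self) -> Iterator[dict]:
        # The caller records which method produced the response and whether
        # one was prepared; labels are only known once processing finishes.
        result = {'method': 'none', 'prepared': False}
        start = time.perf_counter()
        try:
            yield result
        finally:
            key = (result['method'], 'ok' if result['prepared'] else 'failed')
            self.count[key] += 1
            self.total_secs[key] += time.perf_counter() - start
```

Usage: wrap request handling in `with metric.observe() as r:` and set `r['method']` and `r['prepared']` inside the block; the timing is recorded even if handling raises.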
mina86 added a commit that referenced this issue Mar 16, 2022
Specify max_block_production_delay in addition to the min delay in the
node’s configuration in the block_sync_archival.py test.  This speeds up
block generation by the node and shortens the test’s run time.

Issue: #6242
mina86 added a commit to mina86/nearcore that referenced this issue Mar 17, 2022
The archive_gc_partial_chunks setting was mistakenly added to the test
in the previous commit that changed it.  The option is meant for future
commits and currently causes test failures.  Fix that.

Issue: near#6242
bowenwang1996 pushed a commit that referenced this issue Mar 17, 2022
The archive_gc_partial_chunks setting was mistakenly added to the test
in the previous commit that changed it.  The option is meant for future
commits and currently causes test failures.  Fix that.

Issue: #6242
mina86 added a commit that referenced this issue Mar 17, 2022
Add code for observing the partial chunks request processing time metrics
to make sure that the expected code paths are executed when handling the
request.

Issue: #6242
near-bulldozer bot pushed a commit that referenced this issue Mar 18, 2022
Add code for observing the partial chunks request processing time metrics
to make sure that the expected code paths are executed when handling the
request.

Issue: #6242
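A test that observes these metrics has to read them from the node's Prometheus text output and check which code paths handled requests. This sketch parses the `_count` samples of a histogram from Prometheus text exposition. The label name `method` and its values are assumptions, and the helper names are hypothetical, not the test's actual code.

```python
import re
from typing import Dict, Tuple

METRIC = 'near_partial_encoded_chunk_request_processing_time'
# Matches e.g. `<metric>_count{method="chunk",success="ok"} 3`,
# optionally followed by a timestamp.
_COUNT_LINE = re.compile(r'^(\w+)_count\{([^}]*)\}\s+(\S+)(?:\s+\S+)?$')
_LABEL = re.compile(r'(\w+)="([^"]*)"')

Labels = Tuple[Tuple[str, str], ...]


def request_counts(text: str, metric: str = METRIC) -> Dict[Labels, float]:
    """Extract `<metric>_count` samples, keyed by sorted label pairs."""
    counts: Dict[Labels, float] = {}
    for line in text.splitlines():
        m = _COUNT_LINE.match(line.strip())
        if m is None or m.group(1) != metric:
            continue  # comments, buckets, sums and other metrics
        labels = tuple(sorted(_LABEL.findall(m.group(2))))
        counts[labels] = counts.get(labels, 0.0) + float(m.group(3))
    return counts


def methods_used(counts: Dict[Labels, float]) -> set:
    """Methods that handled at least one request (assumes a `method` label)."""
    return {dict(k).get('method') for k, v in counts.items() if v > 0}
```

A test could then assert that the storage-backed method appears in `methods_used(...)` after requesting partial chunks for old blocks.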
mina86 added a commit to mina86/nearcore that referenced this issue Mar 22, 2022
Add --clean-partial-chunks and --clear-trie-changes options to clear out
the two respective columns.  Data in ColPartialChunks can be recomputed,
and data in ColTrieChanges is only used by non-archival nodes, so it can
be deleted when running an archival node.

Issue: near#6119
Issue: near#6242
Issue: near#6250
near-bulldozer bot pushed a commit that referenced this issue Apr 6, 2022
Start garbage collecting ColPartialChunks and ColInvalidChunks on
archival nodes.  The former is quite a sizeable column and its data can
be recovered from ColChunks.  The latter is only needed when operating
at head.

Note that this is likely insufficient for the garbage collection to
complete in reasonable time, since with the current default options
we’re garbage collecting only two heights at a time.  It’s best to
clean out the two columns.

Issue: #6242
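A back-of-envelope model shows why two heights at a time is slow on a node with a long history. It assumes (hypothetically) one GC pass per new block and a chain that grows by one height per block, so the backlog shrinks by only one height per block. Only the two-heights figure comes from the commit message; everything else is an assumption.

```python
def blocks_to_catch_up(backlog_heights: int,
                       heights_per_pass: int = 2,
                       passes_per_block: int = 1,
                       new_heights_per_block: int = 1) -> int:
    """Blocks needed for GC to clear a backlog while the chain keeps growing.

    Only the two-heights-per-pass default comes from the commit message;
    the other parameters are assumptions.
    """
    net = heights_per_pass * passes_per_block - new_heights_per_block
    if net <= 0:
        raise ValueError('garbage collection never catches up')
    return -(-backlog_heights // net)  # ceiling division
```

Under these assumptions, a backlog of N heights takes about N more blocks to clear, roughly as long as the chain took to produce it, which is why clearing the columns directly is the faster route.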
near-bulldozer bot pushed a commit that referenced this issue Apr 7, 2022
When recompressing the database of an archival node, skip the
ColPartialChunks, ColInvalidChunks and ColTrieChanges columns, which
can be safely deleted.  Data in the first can be reconstructed from
ColChunks, ColInvalidChunks is only needed at head, and the last is
never read by archival nodes.

Mostly for testing, offer --keep-partial-chunks, --keep-invalid-chunks
and --keep-trie-changes switches for anyone who wants to keep those
columns.  They are always on when dealing with a non-archival node.

Issue: #6119
Issue: #6242
Issue: #6250
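The column-selection rule described here can be modeled in a few lines. This is a Python sketch of the rule only, not the actual recompression code. The function name is hypothetical, and the column list in the usage note is illustrative.

```python
from typing import Iterable, List


def columns_to_copy(columns: Iterable[str], archive: bool,
                    keep_partial_chunks: bool = False,
                    keep_invalid_chunks: bool = False,
                    keep_trie_changes: bool = False) -> List[str]:
    """Columns a recompression pass would copy into the new database.

    Non-archival nodes keep every column (the --keep-* switches are
    effectively always on).  Archival nodes skip the three droppable
    columns unless the matching --keep-* switch is given.
    """
    keep = {
        'ColPartialChunks': keep_partial_chunks or not archive,
        'ColInvalidChunks': keep_invalid_chunks or not archive,
        'ColTrieChanges': keep_trie_changes or not archive,
    }
    return [c for c in columns if keep.get(c, True)]
```

For instance, with `['ColBlock', 'ColChunks', 'ColPartialChunks', 'ColInvalidChunks', 'ColTrieChanges']` an archival node copies only `ColBlock` and `ColChunks` by default.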
mina86 added a commit to mina86/nearcore that referenced this issue Apr 7, 2022
…6356)

Extract two more methods from `process_partial_encoded_chunk_request`
which correspond to `if` bodies that used to be in it.  This makes each
function shorter and thus easier to read, especially as more branches
will be added to the method in the future.

Furthermore, move sending of the message into the method, changing
`maybe_send_partial_encoded_chunk_response` into a method which only
prepares the response.

This is a pure refactoring with no changes to the behaviour.

This is commit 62aa75a upstream.

Issue: near#6242
mina86 added a commit to mina86/nearcore that referenced this issue Apr 7, 2022
…unk (near#6377)

This is commit 09041ec upstream.

Add the ability to respond to PartialEncodedChunkRequest from ShardChunk
objects in addition to PartialEncodedChunk.  In practice this is currently
dead code, since there is no scenario in which the former is in storage
while the latter isn’t, but the plan is to start garbage collecting the
ColPartialChunks column, at which point we’ll have to serve requests from
data in the ColChunks column.

Issue: near#6242
mina86 added a commit to mina86/nearcore that referenced this issue Apr 7, 2022
…ear#6431)

This is commit e92e894 upstream.

Add the near_partial_encoded_chunk_request_processing_time metric, which
measures how long processing partial encoded chunk requests takes.  The
metric is split by the method used to create a response and by whether
a response was prepared in the end.

Issue: near#6242
mina86 added a commit to mina86/nearcore that referenced this issue Apr 7, 2022
This is commit 6be2e0e upstream.

Start garbage collecting ColPartialChunks and ColInvalidChunks on
archival nodes.  The former is quite a sizeable column and its data can
be recovered from ColChunks.  The latter is only needed when operating
at head.

Note that this is likely insufficient for the garbage collection to
complete in reasonable time, since with the current default options
we’re garbage collecting only two heights at a time.  It’s best to
clean out the two columns.

Issue: near#6242
mina86 added a commit to mina86/nearcore that referenced this issue Apr 7, 2022
This is commit da7a465 upstream.

When recompressing the database of an archival node, skip the
ColPartialChunks, ColInvalidChunks and ColTrieChanges columns, which
can be safely deleted.  Data in the first can be reconstructed from
ColChunks, ColInvalidChunks is only needed at head, and the last is
never read by archival nodes.

Mostly for testing, offer --keep-partial-chunks, --keep-invalid-chunks
and --keep-trie-changes switches for anyone who wants to keep those
columns.  They are always on when dealing with a non-archival node.

Issue: near#6119
Issue: near#6242
Issue: near#6250
mina86 added a commit to mina86/nearcore that referenced this issue Apr 8, 2022
Since commit 6be2e0e: ‘gc partial chunks on archival nodes (near#6439)’,
archival nodes set chunk_tail without setting tail.  However, store
validation expects both of those to be set or unset.  Change the code
to allow unset tail on archival nodes.

Issue: near#6242
near-bulldozer bot pushed a commit that referenced this issue Apr 11, 2022
#6563)

Since commit 6be2e0e: ‘gc partial chunks on archival nodes (#6439)’,
archival nodes set chunk_tail without setting tail.  However, store
validation expects both of those to be set or unset.  Change the code
to allow unset tail on archival nodes.

Issue: #6242
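The relaxed consistency rule can be stated as a small predicate. This sketch covers only the tail/chunk_tail consistency check described in the commit message; the function name is hypothetical and the real store validation checks much more.

```python
from typing import Optional


def check_tails(tail: Optional[int], chunk_tail: Optional[int],
                archive: bool) -> None:
    """Raise ValueError if the tail markers are inconsistent.

    Non-archival nodes must have both markers set or both unset.  Archival
    nodes may additionally have chunk_tail set while tail is unset, since
    they now garbage collect partial chunks without garbage collecting
    blocks.
    """
    if (tail is None) == (chunk_tail is None):
        return  # both set or both unset: always acceptable
    if archive and tail is None:
        return  # archival GC advanced chunk_tail only
    raise ValueError(
        f'inconsistent tails: tail={tail}, chunk_tail={chunk_tail}')
```

Note that the relaxation is one-directional: a set tail with an unset chunk_tail is still rejected on archival nodes.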
mina86 added a commit to mina86/nearcore that referenced this issue Apr 14, 2022
…ation

Partial encoded chunks can be calculated on the fly when requested and we
are now garbage collecting them in archival nodes.  There’s no point in
populating the column during 9→10 database version migration.

Issue: near#6242
pompon0 pushed a commit that referenced this issue Apr 15, 2022
#6563)

Since commit 6be2e0e: ‘gc partial chunks on archival nodes (#6439)’,
archival nodes set chunk_tail without setting tail.  However, store
validation expects both of those to be set or unset.  Change the code
to allow unset tail on archival nodes.

Issue: #6242
near-bulldozer bot pushed a commit that referenced this issue Apr 16, 2022
…ation (#6615)

Partial encoded chunks can be calculated on the fly when requested and we
are now garbage collecting them in archival nodes.  There’s no point in
populating the column during 9→10 database version migration.

Issue: #6242
mina86 closed this as completed Apr 20, 2022
gmilescu added the Node (Node team) label Oct 19, 2023