This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

db: Add delayed blocks pruning #12497

Closed · wants to merge 28 commits

Conversation

@lexnv (Contributor) commented Oct 14, 2022

Block pinning is a feature required by the RPC Spec V2.
The block pinning feature should be used for subscribers of the chainHead_unstable_follow method.

This PR adds the ability to delay block pruning by a constant amount.
Blocks that were supposed to get pruned at finalization N will instead get pruned at finalization N + delay.

It also exposes a new pruning option in the CLI. The number of blocks by which pruning
is delayed is not within the user's control, as the RPC API should provide uniform
characteristics. The delayed option is marked as the default to facilitate the RPC spec.

The current delay is 32 blocks, which means that users have approximately
3 minutes (32 blocks * 6 seconds per finalization) to fetch block bodies / headers and to
make storage and runtime calls.
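
A minimal sketch of the delayed-pruning arithmetic described above; the constant and helper names are illustrative and not taken from the PR code:

```rust
// Illustrative only: real code lives in client/db and sc-state-db.
const PRUNING_DELAY: u64 = 32; // number of blocks by which pruning is delayed

/// Blocks displaced at finalization of block `n` are only pruned once
/// block `n + PRUNING_DELAY` is finalized.
fn prune_at(finalized: u64) -> u64 {
    finalized + PRUNING_DELAY
}

fn main() {
    let finalized = 1_000;
    println!(
        "blocks displaced at finalization #{finalized} are pruned at #{}",
        prune_at(finalized)
    );
    // With ~6 seconds per finalization this leaves roughly
    // 32 * 6 = 192 seconds (~3 minutes) to fetch bodies/headers
    // and make storage or runtime calls.
    println!("grace period ~= {} seconds", PRUNING_DELAY * 6);
}
```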

Migrating to the new delayed pruning is compatible (i.e. updating the substrate/polkadot
node to include this option by default is compatible). However, to facilitate going back
and forth between --delay-pruning true and --delay-pruning false, this patch
ensures the canonicalization gap is filled on startup:

  • because of the delay-pruning true option, there can be a gap of ~32 blocks between the
    last canonicalized and the last finalized block
  • this gap needs to be filled if the node is then started with delay-pruning false, because it will try to
    canonicalize block N while the last canonicalized block is N - 32 (see the sketch below).
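
An illustrative sketch of the startup gap fill, with hypothetical names; the real backend calls into the state-db rather than printing:

```rust
// Canonicalize every block between the last canonicalized and the last
// finalized block, so a node restarted with delayed pruning disabled
// does not find a ~32 block gap.
fn fill_canonicalization_gap(last_canonicalized: u64, last_finalized: u64) {
    for number in (last_canonicalized + 1)..=last_finalized {
        // In the real code this would call state_db.canonicalize_block
        // for the block hash at `number`; here we only print the intent.
        println!("canonicalizing block #{number}");
    }
}

fn main() {
    // Example: delayed pruning left canonicalization 32 blocks behind finality.
    fill_canonicalization_gap(968, 1_000);
}
```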

Testing

Memory Consumption

The following data was obtained from a substrate --dev node.
From a separate process, ps -aux is queried to obtain VSZ and RSS memory,
and localhost:9615/metrics is queried to obtain a detailed memory footprint in bytes.
The queries are submitted once every 6 seconds; a sketch of the sampling loop is shown below.
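
A minimal sketch of such a sampling loop, assuming the node's PID is known and that ps and curl are available; the PID and the metric filter are placeholders:

```rust
use std::process::Command;
use std::thread;
use std::time::Duration;

fn main() {
    let node_pid = "12345"; // placeholder PID of the substrate --dev node
    loop {
        // VSZ and RSS (in KiB) of the node process.
        let ps = Command::new("ps")
            .args(["-o", "vsz=,rss=", "-p", node_pid])
            .output()
            .expect("failed to run ps");
        println!("vsz/rss: {}", String::from_utf8_lossy(&ps.stdout).trim());

        // Detailed memory footprint from the Prometheus endpoint.
        let metrics = Command::new("curl")
            .args(["-s", "http://localhost:9615/metrics"])
            .output()
            .expect("failed to run curl");
        for line in String::from_utf8_lossy(&metrics.stdout).lines() {
            // Keep only the memory-related metrics for the plots.
            if line.contains("state_cache") || line.contains("memory") {
                println!("{line}");
            }
        }

        thread::sleep(Duration::from_secs(6));
    }
}
```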

The following graphs illustrate the memory consumption in bytes, over multiple iterations.

RSS

[plot: RSS memory usage over time]

Substrate State Cache Metric

[plot: state cache size over time]

State DB Non Canonical Cache Metric

[plot: state-db non-canonical cache size over time]

Closes #12476.
Closes #12475.

Part of #12071.

CC: @paritytech/tools-team.

polkadot companion: paritytech/polkadot#6250
cumulus companion: paritytech/cumulus#1825

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
@lexnv lexnv added A0-please_review Pull request needs code review. B3-apinoteworthy C1-low PR touches the given topic and has a low impact on builders. labels Oct 14, 2022
@lexnv lexnv self-assigned this Oct 14, 2022
@arkpar (Member) commented Oct 17, 2022

Would be good to compare memory usage for a polkadot node running with and without the delay.

@@ -30,12 +30,16 @@ pub struct PruningParams {
/// or for all of the canonical blocks (i.e 'archive-canonical').
#[clap(alias = "pruning", long, value_name = "PRUNING_MODE")]
pub state_pruning: Option<String>,
Review comment (Member):

I think there needs to be a separate option for this. It does not really control pruning history length, but rather how soon unfinalized branches are discarded. @bkchr wdyt?

Reply (Contributor, Author):

I think we could make that work. Do you mean to remove the BlocksPruning::Delayed and add it to the PruningMode from state-db?

We could then, in the db, use if let PruningMode::Delayed(delayed) = self.storage.state_db.pruning_mode() to determine the delay value with which to call into state_db.canonicalize_block and to prune blocks.

That would also mean we'd have to add a new pruning id here:

const PRUNING_MODE_DELAYED: &[u8] = b"delayed";
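
A rough sketch of this idea, using a simplified, hypothetical PruningMode; the real sc-state-db enum and the merged code differ:

```rust
// Simplified stand-in for the state-db pruning mode, with a hypothetical
// `Delayed` variant carrying the canonicalization delay.
#[allow(dead_code)]
enum PruningMode {
    ArchiveAll,
    ArchiveCanonical,
    Constrained(u32),
    /// Keep displaced (non-canonical) blocks for this many extra blocks.
    Delayed(u32),
}

/// How far behind finality canonicalization and pruning should lag.
fn canonicalization_delay(mode: &PruningMode) -> u32 {
    if let PruningMode::Delayed(delay) = mode {
        *delay
    } else {
        0
    }
}

fn main() {
    let mode = PruningMode::Delayed(32);
    println!("delay = {}", canonicalization_delay(&mode));
}
```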

client/db/src/lib.rs: outdated review comment (resolved)
@arkpar (Member) commented Oct 17, 2022

Looks good modulo some naming and CLI options.

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
@tomaka (Contributor) commented Oct 19, 2022

My two cents, which you can ignore: the client code shouldn't contain references to the JSON-RPC API in its documentation.

The way it should be IMO is: the client provides features, and these features happen "coincidentally" to be what the JSON-RPC API server needs.

If you write in the client's documentation that a certain feature exists for the JSON-RPC API, what tends to happen is that the client code gets modified to suit the precise needs of the JSON-RPC API, even when it wouldn't make sense if you took a step back. As a consequence, the client gets harder to modify, because in order to modify the client you also need to be familiar with how the JSON-RPC API works.

Since I'm not working on the client, feel free to ignore this concern.

@arkpar (Member) commented Oct 26, 2022

I think the delay should rather be a separate setting. Otherwise we'd need separate delay options for --blocks-pruning and --state-pruning. You should be able to have 1000 blocks of state history, but discard unfinalized blocks after 32 blocks. So for state pruning at least, those are two separate values.

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
///
/// This option is enabled by default.
#[clap(alias = "delayed-pruning", long)]
pub delayed_canonicalization: Option<bool>,
Review comment (Contributor):

Nit: you could consider making this just bool, using ArgAction::SetFalse, and renaming it to something like --disable-delayed-pruning. We get rid of the Option, and since it is enabled by default, passing --delayed-pruning true is of limited use anyway.
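
A hedged sketch of this suggestion using clap's derive API; the flag and field names are illustrative:

```rust
use clap::{ArgAction, Parser};

#[derive(Parser, Debug)]
struct PruningParams {
    /// Disable delayed pruning (it is enabled by default).
    #[clap(long = "disable-delayed-pruning", action = ArgAction::SetFalse)]
    pub delayed_pruning: bool,
}

fn main() {
    // Passing `--disable-delayed-pruning` flips the field to `false`;
    // omitting it keeps the implicit default of `true`, so no Option<bool>
    // is needed.
    let params = PruningParams::parse();
    println!("delayed pruning enabled: {}", params.delayed_pruning);
}
```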

// Mark block 6 as finalized.
// Because we delay prune by 2, when we finalize block 6 we are actually
// pruning at block 4. The displaced leaves for block 4 are computed
// at hight (block number - 1) = 3. This is the time when the fork
Review comment (Contributor):

Suggested change
// at hight (block number - 1) = 3. This is the time when the fork
// at height (block number - 1) = 3. This is the time when the fork

@bkchr (Member) commented Nov 8, 2022

I don't really get what the difference to --state-pruning 32 --blocks-pruning 32 is?

In general I don't get how this is related to pinning?

@arkpar (Member) commented Nov 9, 2022

I don't really get what the difference to --state-pruning 32 --blocks-pruning 32 is?

--state-pruning sets how many recent finalized states are available
--blocks-pruning sets how many recent finalized block bodies are available

This PR however is not related to finalized state/bodies at all. This PR adds a delay at which unfinalized states and bodies are discarded. This concerns blocks that are on alternative forks which will never be finalized. Instead of discarding such branches immediately on finality, they are now discarded with a delay that should be sufficient for RPC clients to finish processing these soon-to-be-discarded blocks.

In general I don't get how this is related to pinning?

It is not strictly pinning, but it covers the same use case. It eliminates the issue where the node notifies over RPC about a new block and that block is discarded a moment later because its sibling gets finalized.

@bkchr (Member) commented Nov 9, 2022

--state-pruning sets how many recent finalized states are available
--blocks-pruning sets how many recent finalized block bodies are available

Thank you for explaining this again! I looked into the code, but missed that we always prune displaced stuff directly.

In general I still don't think that this is a great solution. Yes, it works, but it will break the moment someone plays around with the CLI args. This is just hoping that everything works and that nobody plays around with this stuff; I think that is a bad approach. As written in your other PR, @lexnv, I would favor keeping every block pinned as long as it is referenced by some notification. This would solve the problem for the RPC server and also for everything else that may be interested in a block but couldn't process a notification fast enough. Yes, this opens the possibility of somewhat higher memory usage, but in general it shouldn't be much more than what we have with this solution.

@arkpar (Member) commented Nov 9, 2022

In general I still don't think that this is a great solution. Yes it works, but it will break the moment someone plays around with the CLI args or whatever. This is just hoping that everything works and no one plays around with this stuff.

I don't get this argument. It's not like all other CLI options are 100% idiot proof.

I'd argue that if you let the clients pin the blocks as they please with no control of used memory, they are more likely to screw the node up and "break things".

With this solution, if they use the wrong CLI option, the worst that can happen is that block data is not returned, an error that the clients must be able to handle in any case.

@bkchr (Member) commented Nov 9, 2022

I don't get this argument. It's not like all other CLI options are 100% idiot proof.

Yes for sure.

I'd argue that if you let the clients pin the blocks as they please with no control of used memory, they are more likely to screw the node up and "break things".

I was advocating for a more general solution with regards to pinning, i.e. we pin blocks internally in the node when we send out a notification to components running in the node. The RPC server would be one consumer of these notifications. Any component that currently listens for block notifications could run into the problem of receiving a notification about a block that is already pruned by the time it tries to process it. Imagine something like an offchain indexing service that takes a bit longer for a certain block.

That the RPC server should not hold these pinned blocks internally for too long is an orthogonal problem to me. It can be solved with a timer or similar that removes the pinning after some time, but that is a problem of the RPC server implementation, nothing the DB should be aware of. The DB should be a "dumb store" of blocks that prunes stuff when finality kicks in. However, you should have the possibility to prevent pruning of certain blocks because your application logic requires it. Maybe at some point someone needs to keep a block pinned for longer than 32 blocks' worth of time (which is also a bad measure, because not every chain has 6-second blocks).

@lexnv (Contributor, Author) commented Nov 9, 2022

I was more advocating for a more general solution in regards to pinning.

Could this "delayed pruning" solution be considered an intermediate trade-off, that helps us deliver the MVP
functionality of the RPC spec v2? And immediately after releasing the RPC V2, I'll take it to come back
with the more general approach regarding pinning? Would this be an appropriate tradeoff?

I don't necessarily think that this solution is a one-way door that could not be improved / reverted to make room
for the proper pinning API. What do you think wrt this aspect?

@bkchr (Member) commented Nov 9, 2022

Could this "delayed pruning" solution be considered an intermediate trade-off, that helps us deliver the MVP
functionality of the RPC spec v2?

Do we have any kind of pressure for delivering this? If not, I would favor doing it the correct way directly. For delivering the MVP, it should probably also be fine to make the pinning RPC API a no-op and just use an archive node.

@tomaka (Contributor) commented Nov 9, 2022

Do we have any kind of pressure for delivering this?

Well, our entire frontend story is basically blocked on this.

@bkchr (Member) commented Nov 9, 2022

Okay, but for the MVP an archive node should be fine or not?

@bkchr (Member) commented Nov 9, 2022

@lexnv could also continue working on the RPC layer, and I'll find someone to do the pinning work in parallel.

@lexnv (Contributor, Author) commented Nov 9, 2022

@bkchr That would be amazing indeed, we'd achieve the best of both worlds in terms of time deliverables and feature completeness.
Thanks a lot for allocating resources for this endeavor.

That being said, I don't have any strong opinions about this PR if we manage to implement the pinning API properly. What do you think wrt this?

@tomaka (Contributor) commented Nov 9, 2022

Okay, but for the MVP an archive node should be fine or not?

I'm not sure I understand what you're asking, but if only archive nodes implement the new JSON-RPC API then it's a bit crappy. The idea is that the chainHead API is available no matter the node you connect to.

@lexnv mentioned this pull request Nov 9, 2022
@jsdw (Contributor) commented Nov 9, 2022

My vote would be to merge this (assuming you're happy with the code) as a first step which unblocks the rest of the work, and refine our approach to pinning in a future iteration. It may take time and thought, and increase code complexity, to "do it properly" (as well as the time taken to reach consensus if the approach is more complex).

Ultimately, getting some iteration of the chainHead API out there sooner will allow us to actually implement and try it out in client libraries like Subxt sooner too.

@bkchr (Member) commented Nov 9, 2022

Okay, but for the MVP an archive node should be fine or not?

I'm not sure I understand what you're asking, but if only archive nodes implement the new JSON-RPC API then it's a bit crappy. The idea is that the chainHead API is available no matter the node you connect to.

If we are speaking here about an MVP, something to play around with and test the API etc., I don't see why requiring an archive node for this is a bad idea. I'm not saying that we should stick to this forever, only for now, to move on. AFAIK we don't even know how much we need to change in the new JSON API; all of that is something we want to find out by having the implementation in Substrate. So, my proposal:

  1. Close this pr.
  2. @lexnv continues working on the other APIs.
  3. I find someone to implement the pinning in the backend.
  4. Pinning is implemented, and then we can also make the RPC API use this pinning in the backend (before that, the RPC API would be a no-op).

@tomaka (Contributor) commented Nov 9, 2022

If we are speaking here about a MVP, something to play around/test the api etc, I don't see why requiring to run an archive node for this is a bad idea

Smoldot has implemented this API for nearly a year now. If people want to play around with the API, they can just use smoldot. Nobody does it. Why? Because why would you bother rewriting your frontend code to use the new API if it doesn't work 95% of the time?
The entire point of implementing this new API in Substrate is that it works 100% of the time. Implementing this in archive nodes only is IMO completely pointless.

@bkchr (Member) commented Nov 9, 2022

@tomaka I don't know why no one is using the api. I was just commenting here based on what I had read/heard around implementing the api.

Nevertheless, the point I'm trying to get across is that we continue implementing the API. AFAIK there are still RPC APIs missing. When this is done and someone maybe finally uses the API for a UI, they will maybe find some shortcomings. I can already foresee that stuff like this will be missing: paritytech/json-rpc-interface-spec#21

While all of the above is happening, someone will implement the pinning in the backend in parallel. I don't think that it is so complicated.

I don't get what the problem is with what I'm saying above. I just don't want to move some "delayed pruning" feature into the db because some upper-lying application logic may use it. Putting all these different kinds of behaviors into the backend and hoping that everything sticks together in the way we want is, I think, a bad idea.

@stale bot commented Dec 10, 2022

Hey, is anyone still working on this? Due to the inactivity this issue has been automatically marked as stale. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the A5-stale Pull request did not receive any updates in a long time. No review needed at this stage. Close it. label Dec 10, 2022
@lexnv mentioned this pull request Dec 19, 2022
4 tasks
@stale stale bot closed this Dec 24, 2022
Labels
A0-please_review Pull request needs code review. A5-stale Pull request did not receive any updates in a long time. No review needed at this stage. Close it. C1-low PR touches the given topic and has a low impact on builders.
Development

Successfully merging this pull request may close these issues.

Add API for block pinning
7 participants