Add missing fields to the light sync state #7225
Conversation
Ok(sc_chain_spec::LightSyncState {
    header
    finalized_block_header: finalized_header,
    babe_finalized_block1_slot_number: 0,
Need a way to get this as well.
Not sure this is the way to go. I'll try to provide context on what the problem is and how we should go about fixing it.
Both BABE and GRANDPA maintain some data structures relating to the management of epochs (i.e. periods in which a given set of authorities is active). There's no need to go into details on the logic behind those data structures, but the key point is that they are populated as part of processing blocks. The data structures are:
- BABE - EpochChanges
- GRANDPA - AuthoritySet
In the scope of the light client, where we skip the initial sync and start from a block defined in the chain spec, this means that these data structures will not be populated with any data. As such, both BabeBlockImport and GrandpaBlockImport will fail to validate subsequent blocks (e.g. the errors you were seeing regarding BABE not finding epoch data). The solution is that we need to persist the data that exists in EpochChanges and AuthoritySet when we are creating the "light sync state", which we can then use when restoring from that state.
IMO as an initial implementation we can start by just dumping the data structures completely and persisting the whole thing (they are scale-encodable as we are already persisting them to the database). Once that is done we can prune these data structures to remove any unnecessary data (so that we don't persist everything). Both the data structures are easily available as part of service initialization: https://github.com/paritytech/substrate/blob/master/bin/node/cli/src/service.rs#L120 and https://github.com/paritytech/substrate/blob/master/bin/node/cli/src/service.rs#L112.
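As a rough illustration of that "dump everything" shape, here is a sketch of roughly what the sync state could contain once both structures are included. The field names follow the snippets elsewhere in this PR; the consensus types are simplified stand-ins (not the real sc_chain_spec, BABE, or GRANDPA definitions), and the block-weight type is an assumption.

// Sketch only: stand-in types to show the shape of the "dump everything"
// approach. In the real code these would be BABE's EpochChanges and GRANDPA's
// AuthoritySet, which are already SCALE-encodable because the node persists
// them to the database.
#[derive(Clone)]
struct BabeEpochChanges(Vec<u8>);    // stand-in for BABE's EpochChanges
#[derive(Clone)]
struct GrandpaAuthoritySet(Vec<u8>); // stand-in for GRANDPA's AuthoritySet

#[derive(Clone)]
struct LightSyncState<Header> {
    /// Header of the finalized block the light client starts from.
    finalized_block_header: Header,
    /// BABE epoch data needed to verify slot claims in later blocks.
    babe_epoch_changes: BabeEpochChanges,
    /// BABE block weight of the finalized block (assumed u32 here).
    babe_finalized_block_weight: u32,
    /// GRANDPA authority set needed to verify finality justifications.
    grandpa_authority_set: GrandpaAuthoritySet,
}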
We could persist all the data that is needed in separate fields (as seems to be the intention in this PR so far), but we'd still need to create some
@andresilva done. The sync state increases the size of the chain spec by about 700kb, but that's not too bad given that it's 3.8mb to begin with.
@andresilva How should we be getting the babe slot number for the first block? I thought that you'd be able to get this from the epoch changes, but you can't because it gets pruned.
Yeah, we'll have at least two lists of authority sets with public keys in there (BABE and GRANDPA). We might be able to prune some of the data you're seeing though (but let's worry about this later).
We can get it from BABE's runtime module https://github.com/paritytech/substrate/blob/master/frame/babe/src/lib.rs#L169. But do we need it? I know that Pierre mentioned this in the data we need to persist, but if we are persisting the latest epoch data I'm not sure we need this. At least the client code in substrate does not use it in any way; I believe this would only be needed in the
@andresilva we're trying to support substrate-lite here, but @tomaka if we don't use it in substrate, why is it needed in substrate-lite?
Also @tomaka am I right in thinking that
I use it in substrate-lite to determine at which slot an epoch starts and ends.
Indeed. If a block uses the last slot of an epoch, we know that the next block will always be in a different epoch. However, since not all slots necessarily have a block, it is possible for a block to not use the last slot of an epoch and yet have the next block be in a different epoch anyway.
Yep: https://substrate.dev/rustdocs/v2.0.0-rc6/sc_consensus_babe/struct.Epoch.html#structfield.start_slot
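For illustration, the arithmetic substrate-lite relies on here, as a small sketch (names are made up; the real values come from BABE's epoch data such as Epoch::start_slot and the epoch duration):

// Last slot of an epoch, given its start slot and its length in slots.
fn last_slot_of_epoch(start_slot: u64, epoch_duration: u64) -> u64 {
    start_slot + epoch_duration - 1
}

// A block authored in `slot` belongs to a later epoch than the one starting at
// `start_slot`, even if no block was authored in that epoch's last slot.
fn is_past_epoch(slot: u64, start_slot: u64, epoch_duration: u64) -> bool {
    slot > last_slot_of_epoch(start_slot, epoch_duration)
}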
This is becoming off-topic, but I would need to learn a bit more about the long-term plan concerning BABE slots. I know we're supposed to have some small NTP-like protocol, or something like that, but I don't know the details. Using the slot of block 1 also assumes that the number of slots per epoch never changes. This is true at the moment, but maybe we'll add this possibility in the future?
Anyway, my point is that this PR should do the thing that makes sense, not the thing that brings compatibility with substrate-lite. It's substrate-lite that has to adjust. But it's unclear to me what makes sense.
Ok, I'm making progress getting the sync state to load in substrate (not lite). Now I'm getting:
@andresilva Should we be saving babe block weights into the sync state as well?
@andresilva on
Ok, so I've updated
It is still panicking on a second chain spec on branch
This PR is ready and I'd like to get it merged soon. The two things to check in reviews are:
just a minor nit but overall lgtm.
@@ -53,7 +53,7 @@ impl<H, N> Clone for SharedAuthoritySet<H, N> {

 impl<H, N> SharedAuthoritySet<H, N> {
 	/// Acquire a reference to the inner read-write lock.
-	pub(crate) fn inner(&self) -> &RwLock<AuthoritySet<H, N>> {
+	pub fn inner(&self) -> &RwLock<AuthoritySet<H, N>> {
Do you think we can instead add a method clone_inner() or something like that, which clones the inner AuthoritySet and returns it? It is safer not to expose the lock outside this crate.
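A minimal sketch of the suggested clone_inner(), assuming the shared set wraps the authority set in an Arc<RwLock<...>> (parking_lot) behind a field called inner; the types below are simplified stand-ins, not the actual sc-finality-grandpa code:

use std::sync::Arc;
use parking_lot::RwLock;

// Simplified stand-in for GRANDPA's AuthoritySet, which is Clone.
#[derive(Clone)]
pub struct AuthoritySet<H, N> {
    pub pending_changes: Vec<(H, N)>, // placeholder contents
}

pub struct SharedAuthoritySet<H, N> {
    inner: Arc<RwLock<AuthoritySet<H, N>>>,
}

impl<H: Clone, N: Clone> SharedAuthoritySet<H, N> {
    /// Clone the inner `AuthoritySet` and return it, so the lock itself
    /// never has to leave the crate.
    pub fn clone_inner(&self) -> AuthoritySet<H, N> {
        self.inner.read().clone()
    }
}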
👍
There can be multiple pending changes on the unfinalized chain (if GRANDPA has stalled). On Kusama we will be rotating the session every hour, which potentially leads to an authority set change in GRANDPA. Since the changes are only enacted on finality, if GRANDPA stops working for 6h we can end up with 6 pending authority set changes all on the same branch (which we need to finalize in order afterwards). Probably I didn't do a very good job at explaining to you how this works previously (which is why I need to write some docs).
But this PR is only about returning the state of the finalized chain, no? A data structure containing the state of the finalized chain doesn't have to contain more than one pending change.
@tomaka Well no, at the moment we just encode the whole
Ok, I'd like to get another approval and then I'll merge this.
To get something working I think it was easier to just persist BABE's
tldr: the actual sync state data will probably change after this PR. IMO this is fine, as we can get something working in the meantime without having to deal with all the details of what needs to be persisted.
Actually, I suppose that's fine.
bot merge
Trying merge.
If we want to make it easier to be consumed by
finalized_block_header: finalized_header,
babe_epoch_changes: self.shared_epoch_changes.lock().clone(),
babe_finalized_block_weight: finalized_block_weight,
grandpa_authority_set: self.shared_authority_set.clone_inner(),
Can this be pruned (the GRANDPA set or BABE epochs) concurrently? Btw, even if it's possible, the race seems highly improbable. (Probably when the code gets generic there will be a trait method like synch_state_at(block_hash) -> Option<S> for Substrate components that require sync state, and then this will not be a possibility.)
It should be fine since both BABE and GRANDPA will acquire a lock when writing to these data structures (e.g. if there's a block being imported).
So maybe we should lock both structures before getting the finalized block hash? (It would make things easier to read, at least.)
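For illustration, a toy sketch of that ordering (stand-in types and made-up names, not the actual builder code in this PR): take both consensus locks first, then read the finalized block hash, so the snapshot can't straddle a concurrent import that prunes either structure.

use std::sync::{Arc, Mutex};

// Toy stand-ins; the real structures are BABE's EpochChanges and GRANDPA's
// AuthoritySet, both of which get pruned as blocks are finalized.
#[derive(Clone)]
struct EpochChanges;
#[derive(Clone)]
struct AuthoritySet;

struct SyncStateSnapshot {
    finalized_hash: [u8; 32],
    babe_epoch_changes: EpochChanges,
    grandpa_authority_set: AuthoritySet,
}

struct SyncStateBuilder {
    shared_epoch_changes: Arc<Mutex<EpochChanges>>,
    shared_authority_set: Arc<Mutex<AuthoritySet>>,
    finalized_hash: Arc<Mutex<[u8; 32]>>,
}

impl SyncStateBuilder {
    // Lock both consensus structures before reading the finalized hash, so the
    // exported state is a consistent snapshot.
    fn snapshot(&self) -> SyncStateSnapshot {
        let epoch_changes = self.shared_epoch_changes.lock().unwrap();
        let authority_set = self.shared_authority_set.lock().unwrap();
        let finalized_hash = *self.finalized_hash.lock().unwrap();
        SyncStateSnapshot {
            finalized_hash,
            babe_epoch_changes: epoch_changes.clone(),
            grandpa_authority_set: authority_set.clone(),
        }
    }
}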
Part of #6804.
polkadot companion: paritytech/polkadot#1801