elastic scaling: add core selector to cumulus #5372

alindima · 2024-08-15T10:14:04Z

Partially implements #5048

adds a core selection runtime API to cumulus and a generic way of configuring it for a parachain
modifies the slot based collator to utilise the claim queue and the generic core selection

What's left to be implemented (in a follow-up PR):

add the UMP signal for core selection into the parachain-system pallet

View the RFC for more context: polkadot-fellows/RFCs#103

…api if available

alindima · 2024-08-15T10:15:54Z

cumulus/primitives/core/src/lib.rs

@@ -332,6 +332,10 @@ pub mod rpsr_digest {
 	}
 }

+/// The default claim queue offset to be used if it's not configured/accessible in the parachain
+/// runtime
+pub const DEFAULT_CLAIM_QUEUE_OFFSET: u8 = 1;


open question: should this be 0 or 1.

0 would preserve backwards compatibility with the current collators (they are all building right at the top of the claim queue).
1 would be what core-sharing parachains should use.

1 should be the default value. With 0 they can't fully use the 2s of execution if the core is shared and can't find any reason why this would be useful in practice.

I think the default needs to be configurable by the parachain ... or we derive it from async backing settings. In other words, we either can make a default that does the right thing for all chains or we should not provide one, but make it configurable with simple doc explaining when to use what. (async backing/6s block times yet or not?)

bkchr · 2024-08-15T12:00:30Z

adds a claim queue offset storage item to the parachain-system pallet

Can you please explain why we need this? AFAIR the parachain runtime does not really cares on which core it is build, it is just more about ensuring that someone is not sending a block build for core X to be included on core Y.

bkchr · 2024-08-15T12:19:00Z

IMO passing a pre-digest that contains the core would be a better way here. This would also make it quite easy to find out which core a block was build for.

alindima · 2024-08-15T12:21:23Z

adds a claim queue offset storage item to the parachain-system pallet

Can you please explain why we need this? AFAIR the parachain runtime does not really cares on which core it is build, it is just more about ensuring that someone is not sending a block build for core X to be included on core Y.

The RFC has more details about it. @sandreim may have more inside info but from my understanding:

The parachain runtime will indirectly specify the core on which the candidate is valid on (to prevent the situation you mentioned). The way it specifies the core is by sending a UMP message with a sequence number (which can be the parachain block number) and a claim queue offset. The relay chain and the collator node will then use this info to determine the core index. This was done to avoid passing the claim queue snapshot into the parachain's inherent data.

So indeed the parachain doesn't really care on which core it is built, but it needs to somehow make sure that it commits to one core. That's one reason why we don't specify the core in the UMP message.
But this sequence number is not enough, because the claim queue has multiple entries for each core. So the parachain should pick on which depth of the claim queue it's building

cumulus/client/consensus/aura/src/collators/mod.rs

sandreim · 2024-08-15T11:15:57Z

cumulus/primitives/core/src/lib.rs

@@ -332,6 +332,10 @@ pub mod rpsr_digest {
 	}
 }

+/// The default claim queue offset to be used if it's not configured/accessible in the parachain
+/// runtime
+pub const DEFAULT_CLAIM_QUEUE_OFFSET: u8 = 1;


1 should be the default value. With 0 they can't fully use the 2s of execution if the core is shared and can't find any reason why this would be useful in practice.

sandreim · 2024-08-15T12:30:31Z

IMO passing a pre-digest that contains the core would be a better way here. This would also make it quite easy to find out which core a block was build for.

Can you detail how it would work ? The only way the runtime could know the exact core index would require to see the claim queue assignments and that is more expensive in terms of storage proof compared to the current proposal.

bkchr · 2024-08-15T12:36:47Z

IMO passing a pre-digest that contains the core would be a better way here. This would also make it quite easy to find out which core a block was build for.

Can you detail how it would work ? The only way the runtime could know the exact core index would require to see the claim queue assignments and that is more expensive in terms of storage proof compared to the current proposal.

The block author just provides a pre-digest. Like for example the BABE randomness or the AURA author index. The runtime can read it from there. From there it will be able to expose the information at validation. I don't see the need for having the parachain runtime know anything about the claim queue.

sandreim · 2024-08-15T13:25:48Z

IMO passing a pre-digest that contains the core would be a better way here. This would also make it quite easy to find out which core a block was build for.

Can you detail how it would work ? The only way the runtime could know the exact core index would require to see the claim queue assignments and that is more expensive in terms of storage proof compared to the current proposal.

The block author just provides a pre-digest. Like for example the BABE randomness or the AURA author index. The runtime can read it from there. From there it will be able to expose the information at validation. I don't see the need for having the parachain runtime know anything about the claim queue.

I am not familiar with that code, but I will look into it. If we do this, does the collator actually have any control over the core index ? Also, how would the parachain runtime verify if the core index is correct ?

bkchr · 2024-08-15T14:47:14Z

Also, how would the parachain runtime verify if the core index is correct ?

I first thought it doesn't need, but we need to have at least the selector. I mean in the runtime doesn't need to care about the index as long as it outputs the correct index. I left some questions on the RFC. I think if they are answered, I have a better understanding. 👍

paritytech-cicd-pr · 2024-08-16T12:12:51Z

The CI pipeline was cancelled due to failure one of the required jobs.
Job name: test-linux-stable 1/3
Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/7055802

paritytech-cicd-pr · 2024-08-16T12:12:51Z

The CI pipeline was cancelled due to failure one of the required jobs.
Job name: test-linux-stable 2/3
Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/7055803

…en-collator-set

skunert · 2024-08-16T14:16:36Z

I am also having some problems fully wrapping my head around what the impact of a claim queue offset is.
Let say we look at para A with elastic scaling and 2s block times. Claim qeueue offset of 2. If I have the following claim queue:

Core	Pos 0	Pos 1	Pos 2	Pos 3
0	Para B	Para B	Para A	Para A
1	Para A	Para A	Para A	Para A
2	Para B	Para B	Para A	Para A

Lets imagine an offset of 2. During authoring I check the claim queue at position 2 and see that I have cores 0, 1, 2. Now I submit 3 blocks at this relay parent. If for some reason, previous validators skipped a block, would the claims for core 1 at position 0 and 1 interfere with the candidate chain? Basically collators before me skipped some blocks and one of the blocks I submit on core 1 get an earlier claim but might depend on blocks I submitted on core 0 and 2? Or my mental model does not match what is actually happening?

alindima · 2024-08-16T14:28:45Z

I am also having some problems fully wrapping my head around what the impact of a claim queue offset is. Let say we look at para A with elastic scaling and 2s block times. Claim qeueue offset of 2. If I have the following claim queue:

Core Pos 0 Pos 1 Pos 2 Pos 3
0 Para B Para B Para A Para A
1 Para A Para A Para A Para A
2 Para B Para B Para A Para A
Lets imagine an offset of 2. During authoring I check the claim queue at position 2 and see that I have cores 0, 1, 2. Now I submit 3 blocks at this relay parent. If for some reason, previous validators skipped a block, would the claims for core 1 at position 0 and 1 interfere with the candidate chain? Basically collators before me skipped some blocks and one of the blocks I submit on core 1 get an earlier claim but might depend on blocks I submitted on core 0 and 2? Or my mental model does not match what is actually happening?

Collators don't submit to a specific core. They submit to the nth assigned core in the claim queue.
But the relay chain block author only backs a candidate on claims in the claim queue which are in the top position (offset 0)
So the candidates you authored for cores 0 and 2 will be backed sooner than you expected (both on core 1).

So if the previous collators skipped authoring, you can use their slots in the claim queue.

@sandreim correct me if I'm wrong

…en-collator-set

burdges · 2024-09-17T13:06:30Z

all elastic scaling cores are handled by the same collator

That'd be a serious limit on elastic scaling. You can make a parachain with faster collators, but you cannot speed up validators so easily. I suppose the conversation moved on from there though.

sandreim

LGTM but left a few comments.

cumulus/client/consensus/aura/src/collators/slot_based/block_builder_task.rs

sandreim · 2024-09-17T15:25:34Z

cumulus/client/consensus/aura/src/collators/slot_based/block_builder_task.rs

+				self.last_data = Some((relay_parent, data));
+				Ok(&mut self.last_data.as_mut().expect("last_data was just set above").1)


This feels a bit clunky. Can we just check/update if needed before and then in this match we'd just return a mutable reference to the inner ?

I've tried making this look nicer but this was the best I could :D

cumulus/client/consensus/aura/src/collators/slot_based/block_builder_task.rs

sandreim · 2024-09-17T15:35:54Z

prdoc/pr_5372.prdoc

+  - name: rococo-parachain-runtime
+    bump: minor
+  - name: polkadot-parachain-bin
+    bump: major


How is this a major bump ? Anyone using the binary is broken ?

it's a major bump on the polkadot-parachain-lib because we add a new trait bound to a public trait.
When I initially created this prdoc the polkadot-parachain-lib was part of the bin crate so that's why I bumped the bin crate. Fixed it now

sandreim · 2024-09-17T15:37:27Z

cumulus/pallets/parachain-system/src/lib.rs

@@ -186,6 +191,25 @@ pub mod ump_constants {
 	pub const MESSAGE_SIZE_FEE_BASE: FixedU128 = FixedU128::from_rational(1, 1000); // 0.001
 }

+/// Trait for selecting the next core to build the candidate for.
+pub trait SelectCore {
+	fn select_core_for_child() -> (CoreSelector, ClaimQueueOffset);


Why for_child suffix ?

so that it better expresses what this API returns. It doesn't return the core selector for the current block. It returns the core selector for the next block.

skunert

Looks really good, only nits. I like the trait based config approach we have here now.

cumulus/pallets/parachain-system/src/lib.rs

cumulus/client/consensus/aura/src/collators/slot_based/block_builder_task.rs

skunert · 2024-09-18T18:42:22Z

cumulus/client/consensus/aura/src/collators/slot_based/block_builder_task.rs

+			if !claimed_cores.insert(*core_index) {
+				tracing::debug!(
+					target: LOG_TARGET,
+					"Core {:?} was already claimed at this relay chain slot",


This is basically the condition that is described above when we have not enough cores scheduled to support the paras slot duration. (at least without merging of para blocks)

cumulus/client/consensus/aura/src/collators/slot_based/block_builder_task.rs

cumulus/client/consensus/aura/src/collators/mod.rs

skunert · 2024-09-18T19:20:29Z

cumulus/client/relay-chain-interface/src/lib.rs

@@ -363,4 +374,11 @@ where
 	async fn version(&self, relay_parent: PHash) -> RelayChainResult<RuntimeVersion> {
 		(**self).version(relay_parent).await
 	}
+
+	async fn claim_queue(


With the introduction of this can we remove availability_cores from the interface? Did not check, but should not be used anymore now right?

yes, we could. But is there a good reason for that? There's still information that the availability-cores API contains that you cannot get elsewhere: like when a core is occupied or not. Maybe this will come in handy at some point in the future.

…en-collator-set

alindima added 4 commits August 14, 2024 15:34

expose claim queue on cumulus

c6c166d

add hardcoded claim queue offset and switch to using the claim queue …

d7c595e

…api if available

continue

bcae16d

logs

7b3594a

alindima added I5-enhancement An additional feature request. T9-cumulus This PR/Issue is related to cumulus. labels Aug 15, 2024

alindima commented Aug 15, 2024

View reviewed changes

alindima requested review from eskimor, skunert and sandreim August 15, 2024 10:45

sandreim reviewed Aug 15, 2024

View reviewed changes

alindima added 6 commits August 16, 2024 14:20

add runtime API to westend/rococo system parachains

94bf1df

use ClaimQueueSnapshot

12aea2a

rollback changes to rococo-parachain

f1e2721

add prdoc

f9268dd

fix prdoc

b825c22

fix compilation

7e2dc9a

alindima added 2 commits August 16, 2024 15:27

fix prdoc again

48d19b9

Merge remote-tracking branch 'origin/master' into alindima/cumulus-op…

6d0fbc1

…en-collator-set

alindima added 5 commits September 13, 2024 14:20

Merge remote-tracking branch 'origin/master' into alindima/cumulus-op…

e5c366c

…en-collator-set

fix compilation

8305048

address some comments

982ce7f

unused imports

dec01b7

make the core selection logic generic

4c6fb64

alindima changed the title ~~elastic scaling: add claim queue offset to cumulus~~ elastic scaling: add core selector to cumulus Sep 16, 2024

alindima added 3 commits September 16, 2024 16:02

fix compilation

6ab29d9

Merge remote-tracking branch 'origin/master' into alindima/cumulus-op…

2c0c003

…en-collator-set

fix compilation again

e8c7e96

alindima requested review from sandreim, eskimor and bkchr September 16, 2024 14:44

alindima added 4 commits September 17, 2024 09:42

fix compilation take 3

a2c4d9b

rollback weight generation

67b17fa

fix compilation take 4

b568cf4

remove unused

010f7b9

sandreim approved these changes Sep 17, 2024

View reviewed changes

skunert approved these changes Sep 18, 2024

View reviewed changes

alindima added 5 commits September 19, 2024 11:56

Merge remote-tracking branch 'origin/master' into alindima/cumulus-op…

08cd77d

…en-collator-set

feedback and prdoc

b6d1fb1

Merge remote-tracking branch 'origin/master' into alindima/cumulus-op…

99ff6dd

…en-collator-set

update prdoc

46fcf45

Merge branch 'master' into alindima/cumulus-open-collator-set

c268696

alindima enabled auto-merge September 23, 2024 07:10

alindima added this pull request to the merge queue Sep 23, 2024

Merged via the queue into master with commit b9eb68b Sep 23, 2024
199 of 209 checks passed

alindima deleted the alindima/cumulus-open-collator-set branch September 23, 2024 08:17

redzsina assigned sandreim Oct 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

elastic scaling: add core selector to cumulus #5372

elastic scaling: add core selector to cumulus #5372

alindima commented Aug 15, 2024 •

edited

Loading

alindima Aug 15, 2024

sandreim Aug 15, 2024

eskimor Sep 4, 2024

bkchr commented Aug 15, 2024

bkchr commented Aug 15, 2024

alindima commented Aug 15, 2024

sandreim Aug 15, 2024

sandreim commented Aug 15, 2024 •

edited

Loading

bkchr commented Aug 15, 2024

sandreim commented Aug 15, 2024

bkchr commented Aug 15, 2024

paritytech-cicd-pr commented Aug 16, 2024

paritytech-cicd-pr commented Aug 16, 2024

skunert commented Aug 16, 2024

alindima commented Aug 16, 2024

burdges commented Sep 17, 2024

sandreim left a comment

sandreim Sep 17, 2024

alindima Sep 20, 2024

sandreim Sep 17, 2024

alindima Sep 20, 2024

sandreim Sep 17, 2024

alindima Sep 20, 2024

skunert left a comment

skunert Sep 18, 2024

alindima Sep 20, 2024

skunert Sep 18, 2024

alindima Sep 20, 2024

		self.last_data = Some((relay_parent, data));
		Ok(&mut self.last_data.as_mut().expect("last_data was just set above").1)

elastic scaling: add core selector to cumulus #5372

elastic scaling: add core selector to cumulus #5372

Conversation

alindima commented Aug 15, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bkchr commented Aug 15, 2024

bkchr commented Aug 15, 2024

alindima commented Aug 15, 2024

Choose a reason for hiding this comment

sandreim commented Aug 15, 2024 • edited Loading

bkchr commented Aug 15, 2024

sandreim commented Aug 15, 2024

bkchr commented Aug 15, 2024

paritytech-cicd-pr commented Aug 16, 2024

paritytech-cicd-pr commented Aug 16, 2024

skunert commented Aug 16, 2024

alindima commented Aug 16, 2024

burdges commented Sep 17, 2024

sandreim left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

skunert left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alindima commented Aug 15, 2024 •

edited

Loading

sandreim commented Aug 15, 2024 •

edited

Loading