
Elastic scaling: use up to 12 cores #6858

Open
sandreim opened this issue Dec 12, 2024 · 6 comments
@sandreim (Contributor)
We need to run some tests with 12 cores and 500ms blocks.

There are no explicit limits in the runtime on how many cores can be assigned to a parachain, but on the node side there is at least this one.

@alexggh (Contributor) commented Dec 12, 2024

A first try that doesn't completely work can be found here: master...alexggh/hackaton.

@alindima alindima self-assigned this Dec 16, 2024
@alindima alindima moved this from Backlog to In Progress in parachains team board Dec 16, 2024
@alindima (Contributor)

The current status with this branch is that the collator builds 12 blocks but is not able to use async backing: somehow it does not end up backing 12 new candidates on the very next relay chain block, so it builds 12 blocks and then waits 6 seconds to back another set of 12.

I managed to debug this, and it boils down to this constant not being big enough to handle this many unincluded candidates. As a result, once the first set of unincluded candidates is built, subsequent parent searches keep returning the same candidate until the previous set is included. And because the core selector uses the block number to choose the core, all further candidates would use the same core, so the slot-based collator skips them.
Bumping this constant to 24 fixed it. We should configure this limit dynamically based on the maximum parachain velocity.
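A minimal sketch of what a dynamically derived limit could look like, assuming (hypothetically) that the capacity needs to cover one full batch of unincluded candidates per core while the previous batch is still pending inclusion. The function and parameter names here are illustrative, not actual polkadot-sdk identifiers:

```rust
/// Hypothetical sketch: derive the unincluded-segment capacity from the
/// maximum parachain velocity (blocks authored per relay chain block)
/// instead of hard-coding a constant.
///
/// `pending_batches` is an assumed parameter: how many full "velocity"
/// batches can be in flight before the oldest one is included.
fn unincluded_segment_capacity(max_velocity: u32, pending_batches: u32) -> u32 {
    max_velocity * pending_batches
}

fn main() {
    // With 12 cores and one batch built while the previous batch is
    // awaiting inclusion, the capacity matches the bumped constant of 24.
    let capacity = unincluded_segment_capacity(12, 2);
    assert_eq!(capacity, 24);
    println!("{capacity}");
}
```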

@alindima (Contributor)

CC: @skunert 🔝

@alindima (Contributor)

I discovered another issue that surfaces from time to time and makes the throughput drop for a relay chain block:

We're using a u8 for the core selector, so it wraps around every 256 parachain blocks. To choose the core index to build on, we take the core selector modulo the number of assigned cores (in this case 12). In this specific scenario of building 12 blocks per relay chain block, it can happen that we hit core index 0 multiple times within the same relay chain block. Example:

| block number | core index |
| --- | --- |
| 250 | 10 |
| 251 | 11 |
| 252 | 0 |
| 253 | 1 |
| 254 | 2 |
| 255 | 3 |
| 0 (wrapped around) | 0 |

Once we've wrapped around, we will not build any more blocks on this relay chain block, because we already built a block for core 0 (at block number 252).

I think the impact of this is pretty low and acceptable, since it only happens once every 256 blocks.
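The wraparound above can be reproduced with a small sketch (not the actual cumulus code; the selector-to-core mapping is simplified to a plain modulo, per the description above):

```rust
/// Simplified core selection: the u8 selector modulo the number of
/// assigned cores. The real node-side logic lives in cumulus; this is
/// only an illustration of the wraparound behaviour.
fn core_index(selector: u8, num_cores: u8) -> u8 {
    selector % num_cores
}

fn main() {
    let num_cores = 12u8;
    // Parachain block numbers 250..=255, then the u8 selector wraps to 0.
    let selectors: [u8; 7] = [250, 251, 252, 253, 254, 255, 0];
    for s in selectors {
        println!("{s} -> core {}", core_index(s, num_cores));
    }
    // Core 0 is chosen both at selector 252 and again after the wrap,
    // so the collator stops building early for this relay chain block.
    assert_eq!(core_index(252, num_cores), core_index(0, num_cores));
}
```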

@alindima (Contributor)

I've also seen the occasional known throughput decrease when building on relay chain forks: #4547

@alindima (Contributor)

PR that bumps this limit and adds a zombienet-sdk test: #6983
