Constrain parachain block validity on a specific core #103
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
Nice work!
Looking good!
At present, misbehaving collator nodes, or anyone who has acquired a valid collation, can prevent a parachain from effectively using elastic scaling by providing the same collation to all backing groups assigned to the parachain. This happens before the next parachain block is authored and prevents the chain of candidates from being formed, reducing the throughput of the parachain to a single core.

The session index of candidates is important for the disputes protocol, as it is used to look up validator keys and check dispute vote signatures. By adding a `SessionIndex` to the `CandidateDescriptor`, validators no longer have to trust the `SessionIndex` provided by the validator raising a dispute. It can happen that the dispute concerns a relay chain block not yet imported by a validator. In this case validators can safely assume the session index refers to the session the candidate has appeared in, otherwise the chain would have rejected the candidate.
Considering that we have now changed this RFC to also add the session index, I would change the title of the RFC (which only mentions the core index commitment).
Either that, or I would mention here that this change is not needed for elastic scaling, but that we are taking the opportunity to bundle these unrelated changes into one RFC.
Yeah, and the motivation reads a little bit bumpy. Just add a short introduction saying that you are trying to solve two issues.
> dispute concerns a relay chain block not yet imported by a validator

If this is the case, doesn't this also mean that the `CandidateDescriptor` is unknown and thus we need to trust the validator to give us a valid descriptor? And the attack here is basically that the validator might check against the wrong set?
Can you not just slash the validator for asking you to check an invalid session index? It signs the dispute, and if the session index is wrong, we should be able to prove this to the runtime, right?
By having the `SessionIndex` in the descriptor, it is either valid or it does not exist on any chain (as the chain checks on import). If it does not exist on any chain, then the dispute is just spam and won't resolve anyway, hence there is no risk.

> Can you not just slash the validator for asking you to check an invalid session index? It signs the dispute, and if the session index is wrong, we should be able to prove this to the runtime, right?

Not easily. You would need to be able to prove the session of a block of another fork.

> And the attack here is basically that the validator might check against the wrong set?

Yes.

It is also just "nice" as it makes candidates more self-contained: with the `SessionIndex` provided in the descriptor you can validate the state transition without any knowledge of the fork it appeared in.
It is true that the `SessionIndex` could be made up, but so could the persisted validation data. Essentially, what a checker checks when validating is: this is valid, assuming you find a chain that would accept or has accepted the candidate. Both the persisted validation data and the `SessionIndex` are verified on chain, thus it is not necessary to prove their validity off-chain: "Assuming this actually exists, I can confirm it is valid."
I am mostly sanity checking myself here. If there are any more concerns, please bring them up. @burdges @rphmeier thoughts?
The UMP queue layout is changed to allow the relay chain to receive both the XCM messages and `UMPSignal` messages. An empty message (empty `Vec<u8>`) is used to mark the end of XCM messages and the start of `UMPSignal` messages.

This way of representing the new messages has been chosen over introducing an enum wrapper to minimize breaking changes of XCM message decoding in tools like Subscan for example.
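
To illustrate the layout described above, here is a minimal Rust sketch of how a consumer could split a para's upward messages at the first empty message. The function name `split_ump` is hypothetical and not part of the RFC or of any Polkadot crate, and the payloads after the separator are kept as opaque bytes because the `UMPSignal` encoding is not shown in this thread.

```rust
/// Splits raw upward messages into (XCM messages, `UMPSignal` payloads).
///
/// Assumption for this sketch: the first empty message is the separator and
/// belongs to neither group. If there is no separator, everything is XCM.
fn split_ump(messages: &[Vec<u8>]) -> (&[Vec<u8>], &[Vec<u8>]) {
    match messages.iter().position(|m| m.is_empty()) {
        // Everything before the separator is XCM, everything after is signals.
        Some(sep) => (&messages[..sep], &messages[sep + 1..]),
        // No separator: no signals, so return an empty tail slice.
        None => (messages, &messages[messages.len()..]),
    }
}
```

A decoder that is unaware of the separator would still see the empty message and the signal bytes as ordinary queue entries, which is the breakage discussed in the comments on this paragraph.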
It will still break for them. They will not be able to decode the rest.
Nice job!
| Core 2 | **Para A** | Para B | **Para A** |
| Core 3 | Para B | **Para A** | **Para A** |

The purpose of `ClaimQueueOffset` is to select the column from the above table.
Does Polkadot currently allow using a claim further ahead in the queue? I still don't really get why we need it. The parachain runtime should ensure that a new core is always being used, and for that the selector is enough.
Do we currently use the relay chain block the PoV was built on to determine the claim queue and then the core? Or what are we doing right now?
Right now, the core index is determined by the backers who have voted. We map validator index to group index to core index.
The claim queue offset is required because of the `CoreSelector % number_of_assignments` operation that selects the core. Because parachains can have a varying number of cores assigned to them (on-demand) at different offsets in the claim queue, the core index you get might be different depending on which claim is used.
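
A rough Rust sketch of the selection described in this comment, to show why the result depends on the offset. The type aliases, the `select_core` name, and the representation of the claim queue as a per-core queue of para ids are assumptions made for illustration; the RFC text itself is the reference for the actual rule.

```rust
use std::collections::{BTreeMap, VecDeque};

type CoreIndex = u32;
type ParaId = u32;

/// Hypothetical claim queue: for each core, the upcoming claims, front first.
type ClaimQueue = BTreeMap<CoreIndex, VecDeque<ParaId>>;

/// Picks a core for `para` using `core_selector % number_of_assignments`,
/// where the assignments are taken from one column (depth) of the claim queue.
fn select_core(
    claim_queue: &ClaimQueue,
    para: ParaId,
    core_selector: u8,
    claim_queue_offset: u8,
) -> Option<CoreIndex> {
    // Cores whose claim at the chosen depth belongs to `para`, in core order.
    let assigned: Vec<CoreIndex> = claim_queue
        .iter()
        .filter(|(_, claims)| claims.get(claim_queue_offset as usize) == Some(&para))
        .map(|(core, _)| *core)
        .collect();

    if assigned.is_empty() {
        // The para has no claim at this depth.
        return None;
    }
    Some(assigned[core_selector as usize % assigned.len()])
}
```

With the two table rows shown above (Core 2: A, B, A and Core 3: B, A, A), the same selector value maps to Core 2 at offset 0 and to Core 3 at offset 1, because the set of assigned cores differs per column.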
I mean I get this. However, you want to use all possible claims? So, you probably don't want to waste any cores. Which means you will build on all the cores. With prospective parachains you can also calculate which claim should be used next. Thus, we should not require the offset?
> I mean I get this. However, you want to use all possible claims? So, you probably don't want to waste any cores. Which means you will build on all the cores.

Yes.

> With prospective parachains you can also calculate which claim should be used next. Thus, we should not require the offset?

That would not be enough. Let's see what happens. In backing we need to determine whether the candidate is valid on the core the backers are assigned to. We take the core selector, compute the core index, and assume we start filling the claims at the top of the queue, which is exactly like claim queue offset 0. Alternatively, we could have a more evolved tracking mechanism that knows which is the next free claim at a higher queue depth.
The problem is that the collator has to provide the core index in the descriptor. Keep in mind that the modulo `core_selector` operation gives different results depending on the number of claims. So the parachain runtime has to actually pick a claim. That is why we need this offset, so that validators know what the parachain has picked. If we don't have this parameter, collators need to make the same assumptions as validators and the runtime, otherwise they get a different core index.
I thought about this a little bit more and I think I found a way to not require the `offset`. My main problem with this `offset` is that it is static. You also write in the RFC itself that it isn't that great.
So, to my proposal. I would propose we store the latest `core_selector` of each parachain in the relay chain state. Then we can use this `latest_core_selector` (fetched at the candidate context relay chain block) together with the `claim_queue` to determine if the `core_index` in the candidate is correct. First we would calculate `core_selector_diff = core_selector.wrapping_sub(latest_core_selector)`. Then we would fetch the `claim_queue` at the candidate context relay chain block. We would count all cores assigned to our parachain until the count reaches `core_selector_diff`, and then we would have our `core_index`.
I think we can then also get rid of `max_candidate_len` from the async backing parameters, as the maximum length would be determined by the "claim queue lookahead".
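
The proposal in this comment could be sketched roughly as follows. This is only one reading of it: the comment does not say in which order claims are counted, so the sketch assumes claims are visited depth by depth (all cores at depth 0, then depth 1, and so on) and that a difference of 0 means no new claim. The type aliases and the function name `core_for_selector` are hypothetical.

```rust
use std::collections::{BTreeMap, VecDeque};

type CoreIndex = u32;
type ParaId = u32;

/// Hypothetical claim queue: for each core, the upcoming claims, front first.
type ClaimQueue = BTreeMap<CoreIndex, VecDeque<ParaId>>;

/// Returns the core of the `core_selector_diff`-th claim of `para`,
/// counting claims depth by depth (an assumption of this sketch).
fn core_for_selector(
    claim_queue: &ClaimQueue,
    para: ParaId,
    core_selector: u8,
    latest_core_selector: u8,
) -> Option<CoreIndex> {
    // Wrapping subtraction, so the selector may overflow past 255.
    let core_selector_diff = core_selector.wrapping_sub(latest_core_selector) as usize;
    if core_selector_diff == 0 {
        // Same selector as the latest one stored on the relay chain.
        return None;
    }

    let max_depth = claim_queue.values().map(|c| c.len()).max().unwrap_or(0);
    let mut count = 0usize;
    for depth in 0..max_depth {
        for (core, claims) in claim_queue {
            if claims.get(depth) == Some(&para) {
                count += 1;
                if count == core_selector_diff {
                    return Some(*core);
                }
            }
        }
    }
    // The selector points past the claim queue lookahead.
    None
}
```

In this reading, a selector that is further ahead than the number of claims in the queue yields `None`, which matches the remark that the maximum length would be bounded by the claim queue lookahead.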
Keeping a claim queue snapshot at valid relay parents in the runtime allows us to compute the core index correctly.
Snapshots are fine. Possible future optimization (if necessary): Accompany each entry in the claim queue with the block number it came into existence. Then by only knowing the block number of the relay parent, we can filter to only the relevant entries.
> One easy fix is to duplicate the claim for an order if the claim queue is empty. The parachain gets two chances if it uses cq offset `0` and has enough time to do well with cq offset `1`.

And if the claim queue is not empty? Do they lose their claim?
> One easy fix is to duplicate the claim for an order if the claim queue is empty. The parachain gets two chances if it uses cq offset `0` and has enough time to do well with cq offset `1`.
>
> And if the claim queue is not empty? Do they lose their claim?

If the parachain is using offset 1, no. They have enough time to build the candidate.
If the parachain is using offset 0, then it can only do synchronous backing, if my understanding is right.
> And if the claim queue is not empty? Do they lose their claim?

No. The claim would not be lost even without the duplication on empty queues. But if the queue was empty before, your claim immediately goes to position 0, which means only synchronous backing. Position 0 is therefore kind of "special".
There is no such problem if the queue was not empty. If at least one item was present before, you start at position 1, which already provides the capability for full blocks.
I'm fine with the general structure of the RFC. I'm only requesting some more clarifications, especially around the offset/selector. Thank you for the work!
I also want to see that the `UMPSignal` is optional. Maybe we will never need it, but there is also no harm in having it, especially as for single core chains there is no downside in omitting it.
the start of `UMPSignal` messages.

This way of representing the new messages has been chosen over introducing an enum wrapper to minimize breaking changes of XCM message decoding in tools like Subscan for example.
I don't think this is true. They will probably just skip the empty message and choke on the unknown `UMPSignal`. So, they will need some way to handle this anyway.
This doesn't say things don't break, only that the impact of breakage is smaller compared to the alternative.
## Drawbacks

The only drawback is that further additions to the descriptor are limited to the amount of
Not really. As long as the start of the descriptor until the version field stays the same, we can implement some custom decoder.
Yes, this is true, but the RFC assumes all future changes we make to the descriptor are backward compatible to not break other parts of the stack.
/rfc propose
Hey @bkchr, here is a link you can use to create the referendum aiming to approve this RFC number 0103. Instructions
It is based on commit hash 01719f7b8e74839056a3285e722658823743c81f. The proposed remark text is:
Voting for this referendum is ongoing. Vote for it here
PR can be merged. Write the following command to trigger the bot
/rfc process 0xdcbb1a70e58737edfbfdb0b866cf977bebafcea08479808340ae03e492922b3e
The on-chain referendum has approved the RFC.
Partially implements #5048

- adds a core selection runtime API to cumulus and a generic way of configuring it for a parachain
- modifies the slot based collator to utilise the claim queue and the generic core selection

What's left to be implemented (in a follow-up PR):

- add the UMP signal for core selection into the parachain-system pallet

View the RFC for more context: polkadot-fellows/RFCs#103

---------

Co-authored-by: command-bot <>
Following the discussion on #92, this is a proposal to introduce the required core index commitments to make elastic scaling work securely with open collator sets.