-
Notifications
You must be signed in to change notification settings - Fork 688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not force collators to update after enabling async backing #1920
Conversation
polkadot/node/network/collator-protocol/src/validator_side/mod.rs
Outdated
Show resolved
Hide resolved
polkadot/node/network/collator-protocol/src/validator_side/mod.rs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. I think this should actually work. For @rphmeier comment, I don't think we need to check the parent here for synchronous backing as it is checked in candidate validation anyway. This requires fetching the collation, but adversaries also have other means for letting us fetch a collation that turns out invalid, so checking this beforehand does not help that much.
(Nothing gets worse than it was before.)
This pull request has been mentioned on Polkadot Forum. There might be relevant details there: https://forum.polkadot.network/t/polkadot-digest-18-oct-2023/4318/1 |
if !per_relay_parent.collations.is_seconded_limit_reached(relay_parent_mode) { | ||
return Err(AdvertisementError::SecondedLimitReached) | ||
} | ||
|
||
if let Some((candidate_hash, parent_head_data_hash)) = prospective_candidate { | ||
let is_seconding_allowed = !relay_parent_mode.is_enabled() || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
!relay_parent_mode.is_enabled() ||
is there in case two nodes communicate with V2 but async backing is still off, meaning we don't need to check for can_second
-- there's only one slot for candidate per relay parent anyway.
Also, seems like this branch won't be executed at all even with async backing enabled if collator chooses to communicate with v1, which opens some room for spamming the node with validation work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, seems like this branch won't be executed at all even with async backing enabled if collator chooses to communicate with v1, which opens some room for spamming the node with validation work.
You mean because we would add the collation without further checks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the previous version of this PR use_non_prospective_chains_mode
was defined based on what collator told us, in the original code it relies on the runtime request from validators.
You mean because we would add the collation without further checks?
The idea is to only request collation once the parent parachain block becomes backable. Until then, we don't do any unnecessary work. With this code collator can
- Choose to use v1 networking
- Send maximum number of sequentially based collations and validators will start requesting & validating without waiting for the "backable parent" part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, in the v1 networking version it should only accept parent_head_data_hash == current_head_hash
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, in the v1 networking version it should only accept
parent_head_data_hash == current_head_hash
.
Should be done now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think this line should be reverted:
polkadot-sdk/polkadot/node/core/backing/src/lib.rs
Lines 1230 to 1232 in 21b3284
} else { | |
// Relay parent is unknown or async backing is disabled. | |
false |
If validator and collator both use v2 while async backing is disabled this will block all advertisements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done @slumber
The CI pipeline was cancelled due to failure one of the required jobs. |
The CI pipeline was cancelled due to failure one of the required jobs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it works based on insert_advertisement
call in handle_advertisement
.
if !per_relay_parent.collations.is_seconded_limit_reached(relay_parent_mode) { | ||
return Err(AdvertisementError::SecondedLimitReached) | ||
} | ||
|
||
if let Some((candidate_hash, parent_head_data_hash)) = prospective_candidate { | ||
let is_seconding_allowed = !relay_parent_mode.is_enabled() || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think this line should be reverted:
polkadot-sdk/polkadot/node/core/backing/src/lib.rs
Lines 1230 to 1232 in 21b3284
} else { | |
// Relay parent is unknown or async backing is disabled. | |
false |
If validator and collator both use v2 while async backing is disabled this will block all advertisements.
…ch/polkadot-sdk into bkchr-not-force-collator-update
(CollationVersion::V1, _) => | ||
request_persisted_validation_data( | ||
ctx.sender(), | ||
candidate_receipt.descriptor().relay_parent, | ||
candidate_receipt.descriptor().para_id, | ||
) | ||
.await?, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously we would've requested pvd from chain even for V2 protocol in case async backing is disabled.
This is correct, because prospective parachains subsystem doesn't do any job otherwise.
With new version this is not the case anymore, and here it implies that V2 -> async backing is enabled, which is wrong assumption.
So it should be
(CollationVersion::V2, Some(ProspectiveCandidate { parent_head_data_hash, .. }))
if relay_parent_mode.is_enabled() => // <-- this
request_prospective_validation_data(
ctx.sender(),
relay_parent,
parent_head_data_hash,
pending_collation.para_id,
)
.await?,
* tsv-disabling: (39 commits) Handling of disabled validators in backing subsystem (#1259) Switch trie cache random seed (#1935) Expose prometheus metrics for minimal-relay-chain node in collators (#1942) Do not force collators to update after enabling async backing (#1920) Pin PRDoc image to v0.0.5 until we are ready for v0.0.6 (#1947) [prdoc] Start BEEFY gadget by default for Polkadot nodes (#1945) Update bridges subtree (#1944) bump zombienet version (#1931) [FRAME] Message Queue use proper overweight limit (#1873) Cumulus: Allow aura to use initialized collation request receiver (#1911) Use prebuilt try-runtime binary in CI (#1898) Update kusama/polkadot bootnodes (#1895) Introduce XcmFeesToAccount fee manager (#1234) upgraded review bot to v2.1.0 (#1908) Trading trait and deal with metadata in Mutate trait for nonfungibles_v2 (#1561) Add Runtime Missing Crate Descriptions (#1909) Switch to the release env (#1910) Bump paritytech/review-bot from 2.0.1 to 2.1.0 (#1924) Bump actions/checkout from 4.1.0 to 4.1.1 (#1925) Start BEEFY client by default for Polkadot nodes (#1913) ...
The validators are checking if async backing is enabled by checking the version of the runtime api. If the runtime api is upgraded by a runtime upgrade, the validators start to also enable the async backing logic. However, just because async backing is enabled, it doesn't mean that all collators and parachain runtimes have upgraded. This pull request fixes an issue about advertising collations to the relay chain when it has async backing enabled, but the collator is still using the old networking protocol. The implementation is actually backwards compatible as we can not expect that everyone directly upgrades. However, the collation advertisement logic was requiring V2 networking messages after async backing was enabled, which was wrong. This is now fixed by this pull request. Closes: #1923 --------- Co-authored-by: eskimor <eskimor@users.noreply.github.com>
The validators are checking if async backing is enabled by checking the version of the runtime api. If the runtime api is upgraded by a runtime upgrade, the validators start to also enable the async backing logic. However, just because async backing is enabled, it doesn't mean that all collators and parachain runtimes have upgraded. This pull request fixes an issue about advertising collations to the relay chain when it has async backing enabled, but the collator is still using the old networking protocol. The implementation is actually backwards compatible as we can not expect that everyone directly upgrades. However, the collation advertisement logic was requiring V2 networking messages after async backing was enabled, which was wrong. This is now fixed by this pull request.
Closes: #1923