Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collation fetching fairness #4880

Open
wants to merge 106 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
106 commits
Select commit Hold shift + click to select a range
f4738dc
Collation fetching fairness
tdimitrov Jun 26, 2024
c7074da
Comments
tdimitrov Jun 26, 2024
73eee87
Fix tests and add some logs
tdimitrov Jun 26, 2024
fa321ce
Fix per para limit calculation in `is_collations_limit_reached`
tdimitrov Jun 27, 2024
96392a5
Fix default `TestState` initialization: claim queue len should be equ…
tdimitrov Jun 27, 2024
0f28aa8
clippy
tdimitrov Jun 27, 2024
e5ea548
Update `is_collations_limit_reached` - remove seconded limit
tdimitrov Jun 28, 2024
9abc898
Fix pending fetches and more tests
tdimitrov Jul 1, 2024
c07890b
Remove unnecessary clone
tdimitrov Jul 1, 2024
e50440e
Comments
tdimitrov Jul 1, 2024
42b05c7
Better var names
tdimitrov Jul 1, 2024
2f5a466
Fix `pick_a_collation_to_fetch` and add more tests
tdimitrov Jul 2, 2024
ff96ef9
Fix test: `collation_fetching_respects_claim_queue`
tdimitrov Jul 2, 2024
e837689
Add `collation_fetching_fallback_works` test + comments
tdimitrov Jul 2, 2024
91cdd13
More tests
tdimitrov Jul 3, 2024
9f2d59b
Fix collation limit fallback
tdimitrov Jul 3, 2024
a10c86d
Separate `claim_queue_support` from `ProspectiveParachainsMode`
tdimitrov Jul 3, 2024
b39858a
Fix comments and add logs
tdimitrov Jul 3, 2024
b30f340
Update test: `collation_fetching_prefer_entries_earlier_in_claim_queue`
tdimitrov Jul 3, 2024
c0f18b9
Fix `pick_a_collation_to_fetch` and more tests
tdimitrov Jul 3, 2024
703ed6d
Merge branch 'master' into tsv-collator-proto-fairness
tdimitrov Jul 3, 2024
fba7ca6
Fix `pick_a_collation_to_fetch` - iter 1
tdimitrov Jul 4, 2024
d4f4ce2
Fix `pick_a_collation_to_fetch` - iter 2
tdimitrov Jul 4, 2024
5f52712
Remove a redundant runtime version check
tdimitrov Jul 4, 2024
6c73e24
formatting and comments
tdimitrov Jul 4, 2024
752f3cc
pr doc
tdimitrov Jul 4, 2024
f0069f1
add license
tdimitrov Jul 4, 2024
6b9f0b3
clippy
tdimitrov Jul 4, 2024
5f6dcdd
Merge branch 'master' into tsv-collator-proto-fairness
tdimitrov Jul 4, 2024
b8c1b85
Update prdoc/pr_4880.prdoc
tdimitrov Jul 5, 2024
f26362f
Limit collations based on seconded count instead of number of fetches
tdimitrov Jul 7, 2024
d6857fc
Undo rename: is_seconded_limit_reached
tdimitrov Jul 7, 2024
cde28cd
fix collation tests
tdimitrov Jul 8, 2024
4c3db2a
`collations_fetching_respects_seconded_limit` test
tdimitrov Jul 8, 2024
b2bbdfe
nits
tdimitrov Jul 8, 2024
e220cb4
Merge branch 'master' into tsv-collator-proto-fairness
tdimitrov Aug 26, 2024
01d121e
Remove duplicated dependency after merge
tdimitrov Aug 26, 2024
7b3c002
Remove `ProspectiveParachainsMode` from collator-protocol, validator-…
tdimitrov Jul 10, 2024
5dffdde
Fix compilation errors in collation tests
tdimitrov Jul 10, 2024
1c1744b
`is_seconded_limit_reached` uses the whole view
tdimitrov Jul 11, 2024
aaccab1
Fix `is_seconded_limit_reached` check
tdimitrov Aug 30, 2024
b1df2e3
Trace logs useful for debugging tests
tdimitrov Sep 11, 2024
ce3a95e
Handle unconnected candidates
tdimitrov Sep 11, 2024
fe3c09d
Rework pre-prospective parachains tests to work with claim queue
tdimitrov Sep 11, 2024
b9ab579
Fix `collation_fetches_without_claimqueue`
tdimitrov Sep 12, 2024
fe623bc
Test - `collation_fetching_prefer_entries_earlier_in_claim_queue`
tdimitrov Sep 13, 2024
d216689
Remove collations test file - all tests are moved in prospective_para…
tdimitrov Sep 13, 2024
ea99c7a
fixup - collation_fetching_prefer_entries_earlier_in_claim_queue
tdimitrov Sep 16, 2024
ee155f5
New test - `collation_fetching_considers_advertisements_from_the_whol…
tdimitrov Sep 16, 2024
55b7902
Merge branch 'master' into tsv-collator-proto-fairness
tdimitrov Sep 17, 2024
515a784
Update PRdoc and comments
tdimitrov Sep 17, 2024
4ef6919
Combine `seconded_per_para` and `claims_per_para` from collations in …
tdimitrov Sep 17, 2024
bd7174f
No need to handle missing claim queue anymore
tdimitrov Sep 17, 2024
df6165e
Remove dead code and fix some comments
tdimitrov Sep 17, 2024
4c5c271
Remove `is_seconded_limit_reached` and use the code directly due to t…
tdimitrov Sep 17, 2024
b0e4627
Fix comments
tdimitrov Sep 17, 2024
d1cf41d
`pending_for_para` -> `is_pending_for_para`
tdimitrov Sep 17, 2024
df3a215
Fix `0011-async-backing-6-seconds-rate.toml` - set `lookahead` to 3 o…
tdimitrov Sep 19, 2024
b70807b
Set `lookahead` in polkadot/zombienet_tests/elastic_scaling/0002-elas…
tdimitrov Sep 20, 2024
f047036
paras_now -> assigned_paras
tdimitrov Sep 30, 2024
94e4fc3
Remove a duplicated parameter in `update_view`
tdimitrov Sep 30, 2024
386488b
Remove an outdated comment
tdimitrov Sep 30, 2024
ff312c9
Fix `seconded_and_pending_for_para_in_view`
tdimitrov Oct 2, 2024
88d0307
`claim_queue_state` becomes `unfulfilled_claim_queue_entries` - the b…
tdimitrov Oct 2, 2024
af78352
For consistency use `chain_ids` only from `test_state`
tdimitrov Oct 2, 2024
d636091
Limit the number of advertisements accepted by each peer for spam pro…
tdimitrov Oct 3, 2024
2bb82eb
Zombienet test
tdimitrov Oct 7, 2024
c782058
Rearrange imports
tdimitrov Oct 7, 2024
903f7f4
Newline and outdated comment
tdimitrov Oct 7, 2024
cefbce8
Undo `lookahead = 3` in zombienet tests
tdimitrov Oct 14, 2024
cb69361
Consider what's scheduled on the core when determining assignments
tdimitrov Oct 14, 2024
1142a90
Merge branch 'master' into tsv-collator-proto-fairness
tdimitrov Oct 14, 2024
4438349
Fix a clippy warning
tdimitrov Oct 14, 2024
e82c386
Update PRdoc
tdimitrov Oct 15, 2024
4b2d4c5
Apply suggestions from code review
tdimitrov Oct 15, 2024
1c91371
".git/.scripts/commands/fmt/fmt.sh"
Oct 15, 2024
be34132
Code review feedback
tdimitrov Oct 15, 2024
5c7b2ac
Fix a typo in prdoc
tdimitrov Oct 15, 2024
62c6473
`seconded_and_pending_for_para_in_view` looks up to the len of the cl…
tdimitrov Oct 16, 2024
9e3f62d
Merge branch 'master' into tsv-collator-proto-fairness
tdimitrov Oct 16, 2024
a4bc21f
rerun CI
tdimitrov Oct 16, 2024
6c103df
Fix zombienet test
tdimitrov Oct 18, 2024
15e3a74
Merge branch 'master' into tsv-collator-proto-fairness
tdimitrov Oct 18, 2024
d6b35ca
Relax expected block counts for each para
tdimitrov Oct 18, 2024
586b56b
Bump lookahead and decrease timeout
tdimitrov Oct 18, 2024
a04d480
Fix ZN pipeline - try 1
tdimitrov Oct 18, 2024
13d5d15
Fix ZN pipeline - try 2
tdimitrov Oct 18, 2024
86870d0
Fix ZN pipeline - try 3
tdimitrov Oct 18, 2024
7b822af
Fix ZN pipeline - try 4
tdimitrov Oct 18, 2024
558c82e
Merge branch 'master' into tsv-collator-proto-fairness
tdimitrov Oct 21, 2024
06c0fd0
Merge branch 'master' into tsv-collator-proto-fairness
tdimitrov Oct 24, 2024
ade7f9b
Rename ZN test
tdimitrov Oct 24, 2024
8ba2a80
Merge branch 'master' into tsv-collator-proto-fairness
tdimitrov Oct 28, 2024
ab70567
Handle merge conflicts
tdimitrov Oct 28, 2024
d24fdc1
When counting occupied slots from the claim queue consider relay pare…
tdimitrov Nov 6, 2024
f55390e
Add a test
tdimitrov Nov 7, 2024
a2093ee
Small style fixes in tests
tdimitrov Nov 7, 2024
ded6fb5
Fix a todo
tdimitrov Nov 7, 2024
cda9330
Fix `paths_to_relay_parent`
tdimitrov Nov 7, 2024
505eb24
Additional test for `paths_to_relay_parent`
tdimitrov Nov 8, 2024
94f573a
Simplifications
tdimitrov Nov 8, 2024
fa82404
Merge branch 'master' into tsv-collator-proto-fairness
tdimitrov Nov 8, 2024
55e7fb2
Resolve merge conflicts
tdimitrov Nov 8, 2024
e27ddd4
Fix todos
tdimitrov Nov 8, 2024
a10c0c1
Comment
tdimitrov Nov 8, 2024
ee11c6a
Remove unneeded log line
tdimitrov Nov 8, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions .gitlab/pipeline/zombienet/polkadot.yml
Original file line number Diff line number Diff line change
Expand Up @@ -252,6 +252,17 @@ zombienet-polkadot-functional-0018-shared-core-idle-parachain:
--local-dir="${LOCAL_DIR}/functional"
--test="0018-shared-core-idle-parachain.zndsl"

zombienet-polkadot-functional-0019-coretime-collation-fetching-fairness:
extends:
- .zombienet-polkadot-common
before_script:
- !reference [ .zombienet-polkadot-common, before_script ]
- cp --remove-destination ${LOCAL_DIR}/assign-core.js ${LOCAL_DIR}/functional
script:
- /home/nonroot/zombie-net/scripts/ci/run-test-local-env-manager.sh
--local-dir="${LOCAL_DIR}/functional"
--test="0019-coretime-collation-fetching-fairness.zndsl"

zombienet-polkadot-smoke-0001-parachains-smoke-test:
extends:
- .zombienet-polkadot-common
Expand Down
3 changes: 3 additions & 0 deletions polkadot/node/network/collator-protocol/src/error.rs
Original file line number Diff line number Diff line change
Expand Up @@ -70,6 +70,9 @@ pub enum Error {

#[error("Response receiver for claim queue request cancelled")]
CancelledClaimQueue(oneshot::Canceled),

#[error("No state for the relay parent")]
RelayParentStateNotFound,
}

/// An error happened on the validator side of the protocol when attempting
Expand Down
203 changes: 121 additions & 82 deletions polkadot/node/network/collator-protocol/src/validator_side/collation.rs
Original file line number Diff line number Diff line change
Expand Up @@ -18,16 +18,25 @@
//!
//! Usually a path of collations is as follows:
//! 1. First, collation must be advertised by collator.
//! 2. If the advertisement was accepted, it's queued for fetch (per relay parent).
//! 3. Once it's requested, the collation is said to be Pending.
//! 4. Pending collation becomes Fetched once received, we send it to backing for validation.
//! 5. If it turns to be invalid or async backing allows seconding another candidate, carry on
//! 2. The validator inspects the claim queue and decides if the collation should be fetched
//! based on the entries there. A parachain can't have more fetched collations than the
//! entries in the claim queue at a specific relay parent. When calculating this limit the
//! validator counts all advertisements within its view not just at the relay parent.
//! 3. If the advertisement was accepted, it's queued for fetch (per relay parent).
//! 4. Once it's requested, the collation is said to be Pending.
//! 5. Pending collation becomes Fetched once received, we send it to backing for validation.
//! 6. If it turns to be invalid or async backing allows seconding another candidate, carry on
//! with the next advertisement, otherwise we're done with this relay parent.
//!
//! ┌──────────────────────────────────────────┐
//! └─▶Advertised ─▶ Pending ─▶ Fetched ─▶ Validated

use std::{collections::VecDeque, future::Future, pin::Pin, task::Poll};
//! ┌───────────────────────────────────┐
//! └─▶Waiting ─▶ Fetching ─▶ WaitingOnValidation

use std::{
collections::{BTreeMap, VecDeque},
future::Future,
pin::Pin,
task::Poll,
};

use futures::{future::BoxFuture, FutureExt};
use polkadot_node_network_protocol::{
Expand All @@ -36,9 +45,7 @@ use polkadot_node_network_protocol::{
PeerId,
};
use polkadot_node_primitives::PoV;
use polkadot_node_subsystem_util::{
metrics::prometheus::prometheus::HistogramTimer, runtime::ProspectiveParachainsMode,
};
use polkadot_node_subsystem_util::metrics::prometheus::prometheus::HistogramTimer;
use polkadot_primitives::{
vstaging::CandidateReceiptV2 as CandidateReceipt, CandidateHash, CollatorId, Hash, HeadData,
Id as ParaId, PersistedValidationData,
Expand Down Expand Up @@ -187,12 +194,10 @@ pub struct PendingCollationFetch {
pub enum CollationStatus {
/// We are waiting for a collation to be advertised to us.
Waiting,
/// We are currently fetching a collation.
Fetching,
/// We are waiting that a collation is being validated.
WaitingOnValidation,
/// We have seconded a collation.
Seconded,
/// We are currently fetching a collation for the specified `ParaId`.
Fetching(ParaId),
/// We are waiting that a collation is being validated for the specified `ParaId`.
WaitingOnValidation(ParaId),
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally and before async backing once we've reached Seconded status we knew we are done here. Right now we can only be satisfied with collations once we reach the seconding limit but I see that we do not have a corresponding status here.

When we reach the seconding limit and no more ocllations are expected what will be the status here?

Or is it nonsensical to even refer to a CollationStatus when we reach the limit? (Dont think so)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do it. The lifetime right now is Waiting -> Fetching -> WaitingOnValidation and back to Waiting. We can have an additional Satisfied or something similar which indicates that we can't accept any more collations at this relay parent.

Then on each seconded collation we can call seconded_and_pending_for_para_in_view and if we have reached the claim queue limit we can set the state to Satisfied and know that nothing can be accepted at this relay parent.

The benefits are that once the relay parent 'is complete' (all claim queue entries are filled) we won't need to run any checks in order to reject an incoming collation.

The drawbacks are that on each seconded collation we'll have to run the same function. So unless a collator is spamming us the benefit is minimal. And since we are limiting the number of advertisements per collator we can't get spammed too much.

We can explore this further but it better be a follow up task.


impl Default for CollationStatus {
Expand All @@ -201,23 +206,16 @@ impl Default for CollationStatus {
}
}

impl CollationStatus {
/// Downgrades to `Waiting`, but only if `self != Seconded`.
fn back_to_waiting(&mut self, relay_parent_mode: ProspectiveParachainsMode) {
match self {
Self::Seconded =>
if relay_parent_mode.is_enabled() {
// With async backing enabled it's allowed to
// second more candidates.
*self = Self::Waiting
},
_ => *self = Self::Waiting,
}
}
/// The number of claims in the claim queue and seconded candidates count for a specific `ParaId`.
#[derive(Default, Debug)]
struct CandidatesStatePerPara {
/// How many collations have been seconded.
pub seconded_per_para: usize,
// Claims in the claim queue for the `ParaId`.
pub claims_per_para: usize,
}

/// Information about collations per relay parent.
#[derive(Default)]
pub struct Collations {
/// What is the current status in regards to a collation for this relay parent?
pub status: CollationStatus,
Expand All @@ -226,75 +224,116 @@ pub struct Collations {
/// This is the currently last started fetch, which did not exceed `MAX_UNSHARED_DOWNLOAD_TIME`
/// yet.
pub fetching_from: Option<(CollatorId, Option<CandidateHash>)>,
/// Collation that were advertised to us, but we did not yet fetch.
pub waiting_queue: VecDeque<(PendingCollation, CollatorId)>,
/// How many collations have been seconded.
pub seconded_count: usize,
/// Collation that were advertised to us, but we did not yet request or fetch. Grouped by
/// `ParaId`.
waiting_queue: BTreeMap<ParaId, VecDeque<(PendingCollation, CollatorId)>>,
/// Number of seconded candidates and claims in the claim queue per `ParaId`.
candidates_state: BTreeMap<ParaId, CandidatesStatePerPara>,
}

impl Collations {
pub(super) fn new(group_assignments: &Vec<ParaId>) -> Self {
let mut candidates_state = BTreeMap::<ParaId, CandidatesStatePerPara>::new();

for para_id in group_assignments {
candidates_state.entry(*para_id).or_default().claims_per_para += 1;
}

Self {
status: Default::default(),
fetching_from: None,
waiting_queue: Default::default(),
candidates_state,
}
}

/// Note a seconded collation for a given para.
pub(super) fn note_seconded(&mut self) {
self.seconded_count += 1
pub(super) fn note_seconded(&mut self, para_id: ParaId) {
tdimitrov marked this conversation as resolved.
Show resolved Hide resolved
self.candidates_state.entry(para_id).or_default().seconded_per_para += 1;
gum::trace!(target: LOG_TARGET, ?para_id, new_count=self.candidates_state.entry(para_id).or_default().seconded_per_para, "Note seconded.");
}

/// Returns the next collation to fetch from the `waiting_queue`.
/// Adds a new collation to the waiting queue for the relay parent. This function doesn't
/// perform any limits check. The caller should assure that the collation limit is respected.
pub(super) fn add_to_waiting_queue(&mut self, collation: (PendingCollation, CollatorId)) {
self.waiting_queue.entry(collation.0.para_id).or_default().push_back(collation);
}

/// Picks a collation to fetch from the waiting queue.
/// When fetching collations we need to ensure that each parachain has got a fair core time
/// share depending on its assignments in the claim queue. This means that the number of
/// collations seconded per parachain should ideally be equal to the number of claims for the
/// particular parachain in the claim queue.
///
/// This will reset the status back to `Waiting` using [`CollationStatus::back_to_waiting`].
/// To achieve this each seconded collation is mapped to an entry from the claim queue. The next
/// fetch is the first unfulfilled entry from the claim queue for which there is an
/// advertisement.
///
/// Returns `Some(_)` if there is any collation to fetch, the `status` is not `Seconded` and
/// the passed in `finished_one` is the currently `waiting_collation`.
pub(super) fn get_next_collation_to_fetch(
/// `unfulfilled_claim_queue_entries` represents all claim queue entries which are still not
/// fulfilled.
pub(super) fn pick_a_collation_to_fetch(
&mut self,
finished_one: &(CollatorId, Option<CandidateHash>),
relay_parent_mode: ProspectiveParachainsMode,
unfulfilled_claim_queue_entries: Vec<ParaId>,
) -> Option<(PendingCollation, CollatorId)> {
// If finished one does not match waiting_collation, then we already dequeued another fetch
// to replace it.
if let Some((collator_id, maybe_candidate_hash)) = self.fetching_from.as_ref() {
// If a candidate hash was saved previously, `finished_one` must include this too.
if collator_id != &finished_one.0 &&
maybe_candidate_hash.map_or(true, |hash| Some(&hash) != finished_one.1.as_ref())
gum::trace!(
target: LOG_TARGET,
waiting_queue=?self.waiting_queue,
candidates_state=?self.candidates_state,
"Pick a collation to fetch."
);

for assignment in unfulfilled_claim_queue_entries {
// if there is an unfulfilled assignment - return it
if let Some(collation) = self
.waiting_queue
.get_mut(&assignment)
.and_then(|collations| collations.pop_front())
{
gum::trace!(
target: LOG_TARGET,
waiting_collation = ?self.fetching_from,
?finished_one,
"Not proceeding to the next collation - has already been done."
);
return None
return Some(collation)
}
}
self.status.back_to_waiting(relay_parent_mode);

None
}

// Returns `true` if there is a pending collation for the specified `ParaId`.
fn is_pending_for_para(&self, para_id: &ParaId) -> bool {
match self.status {
// We don't need to fetch any other collation when we already have seconded one.
CollationStatus::Seconded => None,
CollationStatus::Waiting =>
if self.is_seconded_limit_reached(relay_parent_mode) {
None
} else {
self.waiting_queue.pop_front()
},
CollationStatus::WaitingOnValidation | CollationStatus::Fetching =>
unreachable!("We have reset the status above!"),
CollationStatus::Fetching(pending_para_id) if pending_para_id == *para_id => true,
CollationStatus::WaitingOnValidation(pending_para_id)
if pending_para_id == *para_id =>
true,
_ => false,
}
}

/// Checks the limit of seconded candidates.
pub(super) fn is_seconded_limit_reached(
&self,
relay_parent_mode: ProspectiveParachainsMode,
) -> bool {
let seconded_limit =
if let ProspectiveParachainsMode::Enabled { max_candidate_depth, .. } =
relay_parent_mode
{
max_candidate_depth + 1
} else {
1
};
self.seconded_count >= seconded_limit
// Returns the number of seconded and likely soon to be seconded collations for the specified
// `ParaId`.
pub(super) fn seconded_and_pending_for_para(&self, para_id: &ParaId) -> usize {
let seconded_for_para = self
.candidates_state
.get(&para_id)
.map(|state| state.seconded_per_para)
.unwrap_or_default();
let pending_for_para = self.is_pending_for_para(para_id) as usize;

gum::trace!(
target: LOG_TARGET,
?para_id,
seconded_for_para,
pending_for_para,
"Seconded and pending for para."
);

seconded_for_para + pending_for_para
}

// Returns the number of claims in the claim queue for the specified `ParaId`.
pub(super) fn claims_for_para(&self, para_id: &ParaId) -> usize {
self.candidates_state
.get(para_id)
.map(|state| state.claims_per_para)
.unwrap_or_default()
}
}

Expand Down
Loading
Loading