Fix additional edge cases in fast sync process #5663

Open

wants to merge 3 commits into base: master
Changes from 2 commits
35 changes: 27 additions & 8 deletions substrate/client/network/sync/src/strategy/chain_sync.rs
@@ -146,6 +146,7 @@ impl Metrics {
}
}

+#[derive(Debug, Clone)]
enum AllowedRequests {
Some(HashSet<PeerId>),
All,
@@ -1555,18 +1556,21 @@ where
let attrs = self.required_block_attributes();
let blocks = &mut self.blocks;
let fork_targets = &mut self.fork_targets;
-let last_finalized =
-std::cmp::min(self.best_queued_number, self.client.info().finalized_number);
+let info = self.client.info();
+let best_number = info.best_number;
+let best_hash = info.best_hash;
+let last_finalized = std::cmp::min(self.best_queued_number, info.finalized_number);
let best_queued = self.best_queued_number;
let client = &self.client;
let queue_blocks = &self.queue_blocks;
-let allowed_requests = self.allowed_requests.take();
+let allowed_requests = self.allowed_requests.clone();
let max_parallel = if is_major_syncing { 1 } else { self.max_parallel_downloads };
let max_blocks_per_request = self.max_blocks_per_request;
let gap_sync = &mut self.gap_sync;
let disconnected_peers = &mut self.disconnected_peers;
let metrics = self.metrics.as_ref();
-self.peers
+let requests = self
+.peers
.iter_mut()
.filter_map(move |(&id, peer)| {
if !peer.state.is_available() ||
@@ -1610,6 +1614,8 @@ where
max_blocks_per_request,
last_finalized,
best_queued,
+best_number,
+best_hash,
) {
peer.state = PeerSyncState::DownloadingNew(range.start);
trace!(
@@ -1665,7 +1671,13 @@ where
None
}
})
-.collect()
+.collect::<Vec<_>>();
+
+if !requests.is_empty() {
+self.allowed_requests.take();
+}
+
+requests
}

/// Get a state request scheduled by sync to be sent out (if any).
@@ -2040,16 +2052,23 @@ fn peer_block_request<B: BlockT>(
max_parallel_downloads: u32,
max_blocks_per_request: u32,
finalized: NumberFor<B>,
-best_num: NumberFor<B>,
+best_queued_num: NumberFor<B>,
+best_number: NumberFor<B>,
+best_hash: B::Hash,
) -> Option<(Range<NumberFor<B>>, BlockRequest<B>)> {
-if best_num >= peer.best_number {
+// Nothing to download from the peer via normal block requests.
+if best_number == peer.best_number && best_hash == peer.best_hash {
Contributor

What if peer.best_number > best_number? The fix won't work in that case, and the duplicate block request will still be issued?

Contributor Author

Good point. The fix is incomplete in this case: the block requests would include both the duplicate and the new blocks, so the gap sync will still be interrupted unexpectedly. I'll need some more time to think about it.

Contributor Author

I'm running out of ideas on how to completely avoid duplicate block requests. The challenge lies in how duplicate block processing interferes with the gap sync state.

Why duplicate blocks pollute the gap sync state

  1. Block request confusion: The main issue in the current chain sync is that multiple types of block requests are made, but the handler can't distinguish between them. As a result, blocks requested from one type of request can interfere with another sync component. For example, when duplicate blocks in the range [finalized_number, best_number] are requested via peer_block_request(), once these blocks are received and enqueued, gap_sync.best_queued_number is updated to best_number.

    if let Some(gap_sync) = &mut self.gap_sync {
        if number > gap_sync.best_queued_number && number <= gap_sync.target {
            gap_sync.best_queued_number = number;
        }
    }

    This causes a problem because no further blocks will be requested from peer_gap_block_request() as gap_sync.best_queued_number == peer.best_number, leading to a failure in downloading the block history.

  2. Impact on gap_sync_complete detection: The detection of whether gap_sync is complete can be affected by importing duplicate blocks, as I explained in the second commit message.

The second point can be updated to let gap_sync_complete = self.gap_sync.is_some() && self.client.info().block_gap.is_none(); which is more reliable too. However, I still haven’t found a solution for the issue with gap_sync.best_queued_number outlined in the first point.
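As a rough illustration of that suggested check, here is a minimal, self-contained sketch; the stand-in types below (Info, Client, ChainSync reduced to only these fields) are assumptions made for this example, not the real strategy code:

    // Hypothetical, simplified stand-ins for the real ChainSync and client info types.
    struct Info<N> {
        block_gap: Option<(N, N)>,
    }

    struct Client<N> {
        info: Info<N>,
    }

    struct ChainSync<N> {
        gap_sync: Option<()>,
        client: Client<N>,
    }

    impl<N> ChainSync<N> {
        /// Gap sync counts as complete only when a gap sync was active and the
        /// backend no longer reports a block gap.
        fn gap_sync_complete(&self) -> bool {
            self.gap_sync.is_some() && self.client.info.block_gap.is_none()
        }
    }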

To ensure gap sync functions as expected, I suggest detecting whether a block response contains duplicate blocks in gap sync and preventing those duplicates from being processed further. This would prevent pollution of the gap sync state and allow block history to be downloaded correctly. Thoughts? @dmitry-markin
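For illustration, a minimal sketch of the duplicate filtering suggested above, under the assumption that the block-response handler can see the gap sync state before blocks are enqueued; the GapSync and IncomingBlock stand-ins and the filter_gap_sync_duplicates helper are hypothetical names for this example, not the actual chain_sync.rs API:

    // Hypothetical sketch: drop blocks the gap sync does not need before they are
    // enqueued, so duplicates cannot advance gap_sync.best_queued_number.
    struct GapSync<N> {
        best_queued_number: N,
        target: N,
    }

    struct IncomingBlock<N> {
        number: N,
    }

    /// Keep only blocks the gap sync still needs: strictly above what has already
    /// been queued and not beyond the gap target.
    fn filter_gap_sync_duplicates<N: Ord + Copy>(
        gap_sync: &GapSync<N>,
        blocks: Vec<IncomingBlock<N>>,
    ) -> Vec<IncomingBlock<N>> {
        blocks
            .into_iter()
            .filter(|b| b.number > gap_sync.best_queued_number && b.number <= gap_sync.target)
            .collect()
    }

Filtering at this point would keep gap_sync.best_queued_number from being advanced by blocks that arrive via ordinary peer_block_request() responses.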

+return None;
+}
+
+if best_queued_num >= peer.best_number {
// Will be downloaded as alternative fork instead.
return None;
} else if peer.common_number < finalized {
trace!(
target: LOG_TARGET,
"Requesting pre-finalized chain from {:?}, common={}, finalized={}, peer best={}, our best={}",
-id, peer.common_number, finalized, peer.best_number, best_num,
+id, peer.common_number, finalized, peer.best_number, best_queued_num,
);
}
let range = blocks.needed_blocks(