
Share peers between syncing strategies #2814

Closed · wanted to merge 34 commits

Conversation

@dmitry-markin (Contributor) commented on Dec 26, 2023:

Introduce the ability to share peers between syncing strategies and reserve them for requests. This is needed to run `GapSync` as a separate strategy and, ultimately, to run Sync 2.0 alongside `ChainSync`.

@dmitry-markin added the T0-node label ("This PR/Issue is related to the topic 'node'.") on Dec 26, 2023
Base automatically changed from dm-warp-sync-strategy to master January 12, 2024 16:47
@dmitry-markin changed the title from "[draft] Share peers between syncing strategies" to "Share peers between syncing strategies" on Jan 30, 2024
@dmitry-markin marked this pull request as ready for review on January 30, 2024 14:24
@altonen (Contributor) left a comment:

Did a first pass and left some comments but I will have to go over it again.

I'm not super excited about `peer_best_blocks`, or about the fact that we store peers both in `PeerPool` and in each strategy and then have to deal at runtime with all the possible inconsistencies that follow from that. Ideally `PeerPool` would store strategy-specific data, but I don't know how feasible that is and I'll do some testing. I also think we may run into as-yet-unknown issues if `GapSync`, `ChainSync` and Sync 2.0 (or whatever its name will be) have independent (and potentially differing) views of peers. If the peer's best and common numbers were stored in `PeerPool`, all strategies could query and update them and we'd get rid of `peer_best_blocks`. What do you think?

I also think `allowed_requests` has to go, because `ChainSync` is now checking the availability of a peer three times, and the likelihood that at some point during future refactorings one of them goes out of sync with the others is non-zero.
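For illustration, a minimal sketch of a shared `PeerPool` along the lines suggested above, i.e. one that also owns the per-peer best/common numbers so all strategies query and update the same view. The `try_reserve_peer`/`free_peer` names mirror the PR, but the fields and `update_common_number` are hypothetical, and `PeerId` is stood in by a plain `u64` so the snippet compiles on its own:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

type PeerId = u64; // primitive stand-in so the sketch compiles on its own

/// Per-peer data shared between all syncing strategies.
struct SharedPeer {
    best_number: u64,
    common_number: u64,
    reserved: bool,
}

/// A pool that owns the peers' best/common numbers, so every strategy queries
/// and updates the same view instead of keeping its own copy.
#[derive(Default)]
struct PeerPool {
    peers: HashMap<PeerId, SharedPeer>,
}

impl PeerPool {
    fn add_peer(&mut self, peer_id: PeerId, best_number: u64) {
        self.peers
            .insert(peer_id, SharedPeer { best_number, common_number: 0, reserved: false });
    }

    /// Reserve a peer for a request; returns `false` if it is already in use.
    fn try_reserve_peer(&mut self, peer_id: &PeerId) -> bool {
        match self.peers.get_mut(peer_id) {
            Some(peer) if !peer.reserved => {
                peer.reserved = true;
                true
            },
            _ => false,
        }
    }

    /// Free a peer once its request has completed or been canceled.
    fn free_peer(&mut self, peer_id: &PeerId) {
        if let Some(peer) = self.peers.get_mut(peer_id) {
            peer.reserved = false;
        }
    }

    /// Any strategy can bump the common number it has negotiated with the peer.
    fn update_common_number(&mut self, peer_id: &PeerId, number: u64) {
        if let Some(peer) = self.peers.get_mut(peer_id) {
            peer.common_number = peer.common_number.max(number);
        }
    }
}

fn main() {
    let pool = Arc::new(Mutex::new(PeerPool::default()));
    pool.lock().unwrap().add_peer(1, 100);
    assert!(pool.lock().unwrap().try_reserve_peer(&1));
    assert!(!pool.lock().unwrap().try_reserve_peer(&1)); // already reserved by a strategy
    pool.lock().unwrap().update_common_number(&1, 50);
    pool.lock().unwrap().free_peer(&1);
}
```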

state: Option<StateStrategy<B>>,
chain_sync: Option<ChainSync<B, Client>>,
peer_pool: Arc<Mutex<PeerPool>>,
peer_best_blocks: HashMap<PeerId, (B::Hash, NumberFor<B>)>,
Contributor:

Why is this here?

@dmitry-markin (Author):

It's needed to seed the peers when switching between strategies. Otherwise, for example, the state strategy won't be aware of the best hash/number (and won't know which peer to request the state from) until a block announcement is received from the peer.
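For context, a rough sketch (not the PR's code) of how `peer_best_blocks` can seed a newly activated strategy when switching, assuming an `add_peer`-style entry point on the strategy; all types and the `switch_to_state_strategy` helper are illustrative stand-ins:

```rust
use std::collections::HashMap;

// Primitive stand-ins so the sketch compiles on its own.
type PeerId = u64;
type Hash = [u8; 32];
type Number = u64;

struct StateStrategy;

impl StateStrategy {
    fn new(_target: Hash) -> Self {
        StateStrategy
    }

    /// The freshly created strategy learns about already-connected peers here,
    /// instead of waiting for their next block announcement.
    fn add_peer(&mut self, _peer_id: PeerId, _best_hash: Hash, _best_number: Number) {}
}

/// Switch to the state strategy, seeding it with the peers recorded in `peer_best_blocks`.
fn switch_to_state_strategy(
    peer_best_blocks: &HashMap<PeerId, (Hash, Number)>,
    target: Hash,
) -> StateStrategy {
    let mut state = StateStrategy::new(target);
    for (peer_id, (best_hash, best_number)) in peer_best_blocks {
        state.add_peer(*peer_id, *best_hash, *best_number);
    }
    state
}

fn main() {
    let peer_best_blocks = HashMap::from([(1u64, ([0u8; 32], 100u64))]);
    let _state = switch_to_state_strategy(&peer_best_blocks, [0u8; 32]);
}
```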

#[must_use]
fn add_peer_inner(
/// Process new peers assigning proper states and initiating requests.
fn handle_new_peers(
Contributor:

It's unclear to me why this function is needed and why the new peers can't be handled in `ChainSync::block_requests()`.

@dmitry-markin (Author):

I'll look into whether this can be simplified.

Comment on lines 1367 to 1368
self.actions.push(ChainSyncAction::CancelRequest { peer_id });
self.peer_pool.lock().free_peer(&peer_id);
Contributor:

I wonder if there's a race condition here. The peer is immediately freed, but the cancellation is postponed until `SyncingEngine` processes the event, so if some other strategy selects the peer while the request hasn't been canceled yet, it could result in two in-flight requests. Maybe the strategy/`SyncingStrategy` should be notified by `SyncingEngine` once the request is actually canceled, so that freeing the peer is safe.

@dmitry-markin (Author):

It's even worse than that. When another strategy initiates a request, the old one is automatically dropped. And when we finally cancel the request, we can cancel a legitimate request of another strategy.

I'm not super happy with any kind of ACKing and bookkeeping, but it looks like we need to keep track of pending cancellations in every strategy and free peers in `PeerPool` when something like `on_request_cancelled()` is heard from `SyncingEngine`.

Contributor:

Is there a reason why pending cancellations couldn't be stored in `SyncingStrategy`, so that once the request has been canceled, `SyncingEngine` calls `SyncingStrategy`, which releases the peer?

@dmitry-markin (Author):

This means intercepting request cancellation actions in `SyncingStrategy` on their way to `SyncingEngine`. Not the best design ever, but it's probably better than dealing with it in every strategy.
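A rough sketch of that interception idea, purely illustrative: `SyncingStrategy` records pending cancellations on their way to `SyncingEngine` and only frees the peer once an `on_request_cancelled()`-style acknowledgement comes back. Names and structure are assumptions, not the actual implementation:

```rust
use std::collections::HashSet;

type PeerId = u64; // primitive stand-in

/// What the strategies emit; `SyncingStrategy` inspects these before forwarding
/// them to `SyncingEngine`.
enum Action {
    CancelRequest { peer_id: PeerId },
    // ... other actions elided
}

#[derive(Default)]
struct PeerPool {
    reserved: HashSet<PeerId>,
}

impl PeerPool {
    fn free_peer(&mut self, peer_id: &PeerId) {
        self.reserved.remove(peer_id);
    }
}

#[derive(Default)]
struct SyncingStrategy {
    peer_pool: PeerPool,
    /// Cancellations forwarded to `SyncingEngine` but not yet acknowledged.
    pending_cancellations: HashSet<PeerId>,
}

impl SyncingStrategy {
    /// Intercept actions on their way to `SyncingEngine`: remember which peers
    /// have a cancellation in flight instead of freeing them immediately.
    fn process_action(&mut self, action: &Action) {
        if let Action::CancelRequest { peer_id } = action {
            self.pending_cancellations.insert(*peer_id);
        }
    }

    /// Called by `SyncingEngine` once the request has really been canceled;
    /// only now is it safe to hand the peer back to the pool.
    fn on_request_cancelled(&mut self, peer_id: PeerId) {
        if self.pending_cancellations.remove(&peer_id) {
            self.peer_pool.free_peer(&peer_id);
        }
    }
}

fn main() {
    let mut strategy = SyncingStrategy::default();
    strategy.peer_pool.reserved.insert(7);
    strategy.process_action(&Action::CancelRequest { peer_id: 7 });
    assert!(strategy.peer_pool.reserved.contains(&7)); // still reserved until the ACK
    strategy.on_request_cancelled(7);
    assert!(!strategy.peer_pool.reserved.contains(&7));
}
```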

Contributor:

If we stored the peer's best and common blocks in `PeerPool`, we'd get rid of `peer_best_blocks`, allow `SyncingEngine` to replace its `HashMap<PeerId, Peer<B>>` with `PeerPool`, and also allow it to free the peer once the request is canceled.

But now that I'm thinking about this whole approach again, is there any fundamental reason why `GapSync` and `ChainSync` couldn't send a request to the same peer at the same time? I believe this limitation only applies to the sending end, namely the strategies, and it made sense before, when everything was one state machine. The request handlers shouldn't care how many requests from the same peer are in the queue. If `GapSync` and `ChainSync` are separate, is there a valid reason why they couldn't send simultaneous requests to a peer? We must ascertain that two copies of the same request are not sent, because that'd get us banned, which implies the existence of some kind of `PeerPool` for shared data, but is there a reason why they couldn't send two different requests at the same time? So the limitation of one request per peer would still apply, but per strategy.

@dmitry-markin (Author):

> is there any fundamental reason why `GapSync` and `ChainSync` couldn't send a request to the same peer at the same time?

Valid point: there is nothing in the block request handler that forbids "simultaneous" requests. But if we get rid of the global `PeerPool`, we'll need to introduce a way to cancel specific requests, as otherwise strategies could cancel each other's requests.

Contributor:

Each strategy could be identified by a unique key, and when a strategy returns `StrategyAction::SendRequest { PeerId, Request }`, `SyncingStrategy` would convert it to `SyncingAction::SendRequest { PeerId, Key, Request }`, and `PendingResponses` would keep track of responses keyed by `(PeerId, Key)`.

@dmitry-markin (Author) commented on Feb 5, 2024:

Yes, that's what I had in mind — attaching strategy IDs to requests.
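A small sketch of what attaching strategy keys to requests could look like; the `StrategyKey` variants and the action shapes here are illustrative assumptions, not the final API:

```rust
use std::collections::HashMap;

type PeerId = u64; // primitive stand-in
type Request = Vec<u8>;

/// Identifies which strategy a request (and its eventual response) belongs to,
/// so one strategy canceling its own request cannot clobber another's.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum StrategyKey {
    Warp,
    State,
    ChainSync,
    GapSync,
}

/// Emitted by an individual strategy, without a key.
struct StrategyAction {
    peer_id: PeerId,
    request: Request,
}

/// What `SyncingStrategy` forwards to `SyncingEngine`, tagged with the key.
struct SyncingAction {
    peer_id: PeerId,
    key: StrategyKey,
    request: Request,
}

fn tag(action: StrategyAction, key: StrategyKey) -> SyncingAction {
    SyncingAction { peer_id: action.peer_id, key, request: action.request }
}

fn main() {
    // Pending responses are tracked per (peer, strategy) pair, so ChainSync and
    // GapSync can each have one request in flight to the same peer.
    let mut pending_responses: HashMap<(PeerId, StrategyKey), Request> = HashMap::new();

    let a = tag(StrategyAction { peer_id: 1, request: vec![0xaa] }, StrategyKey::ChainSync);
    let b = tag(StrategyAction { peer_id: 1, request: vec![0xbb] }, StrategyKey::GapSync);

    pending_responses.insert((a.peer_id, a.key), a.request);
    pending_responses.insert((b.peer_id, b.key), b.request);
    assert_eq!(pending_responses.len(), 2);
}
```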

common_number: Zero::zero(),
best_hash,
best_number,
state: PeerSyncState::New,
@dmitry-markin (Author):

A new peer and an Available peer are handled differently in the code, and the idea was to postpone the peer initialization that can lead to requests (`handle_new_peers()`) until `actions()`, so that we can call the strategies in a specific order and implement priorities for reserving peers (e.g., Sync 2.0, then `ChainSync`, then `GapSync`).
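A toy sketch of that ordering: `actions()` hands the new peers to the strategies in a fixed priority order, so a higher-priority strategy reserves peers before lower-priority ones see them. All names here are stand-ins, not the PR's code:

```rust
type PeerId = u64; // primitive stand-in

struct ChainSync;
impl ChainSync {
    /// Reserve (here: simply take) every new peer this strategy wants.
    fn handle_new_peers(&mut self, new_peers: &mut Vec<PeerId>) {
        new_peers.clear();
    }
}

struct GapSync;
impl GapSync {
    /// Lower priority: only sees whatever the strategies above left over.
    fn handle_new_peers(&mut self, _new_peers: &mut Vec<PeerId>) {}
}

/// `actions()` initializes new peers in a fixed priority order, so that
/// higher-priority strategies get to reserve peers first.
fn actions(new_peers: &mut Vec<PeerId>, chain_sync: &mut ChainSync, gap_sync: &mut GapSync) {
    // e.g. Sync 2.0 would be called first here, then ChainSync, then GapSync.
    chain_sync.handle_new_peers(new_peers);
    gap_sync.handle_new_peers(new_peers);
}

fn main() {
    let mut new_peers = vec![1, 2, 3];
    actions(&mut new_peers, &mut ChainSync, &mut GapSync);
    assert!(new_peers.is_empty());
}
```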


(Some(PeerSyncState::Available), None)
} else {
if self.peer_pool.try_reserve_peer(&peer_id) {
Contributor:

Nit: Can we move this `if` into the `else if` one line above? All the indentation in this file makes it a bit hard to follow.

as the peer is reserved by another syncing strategy.",
);

(None, None)
Contributor:

Nit: I think you could directly return `None` here and then get rid of the `if let Some()` a few lines below.

Ok(None)
},
}
self.allowed_requests.add(&peer_id);
Contributor:

Hmm, not directly related to this PR, but how does this `allowed_requests` work? It looks like it contains peers that we could potentially send a block request to. However, I don't fully understand why we regularly reset it to `All`, like here.

@dmitry-markin (Author):

I don't completely understand the logic behind it either; it looks like the only useful thing it does is block block requests during the state download in fast sync.

let mut matcher = self.extra_justifications.matcher();
std::iter::from_fn(move || {
if let Some((peer, request)) = matcher.next(peers) {
if let Some((peer_id, request)) = matcher.next(peers, peer_pool) {
// TODO: reserve the peer in `PeerPool`.
Contributor:

Leftover todo?

Edit: Ah, it looks like it's already done in the `next` call.

for mut available_peer in self.peer_pool.lock().available_peers() {
let peer_id = available_peer.peer_id();
if let Some(peer) = self.peers.get_mut(&peer_id) {
if peer.state.is_available() && peer.common_number >= sync.target_number() {
Contributor:

Here we check `peer.state.is_available()` even though the peer is in the `peer_pool` as available. I understand that the pool is shared between the strategies, but I am not sure whether it is legal for a peer to be available in the peer pool but not in `chain_sync`.

@dmitry-markin (Author):

Yes, it should not be needed.

@dmitry-markin (Author):

@skunert Thanks for reviewing the PR, but I'm about to publish another one that should completely supersede it, so please don't spend more time reviewing this one for now.

@dmitry-markin (Author):

Closing in favor of #3224.

github-merge-queue bot pushed a commit that referenced this pull request Feb 13, 2024
This PR should supersede #2814 and accomplish the same with fewer changes. It's needed to run sync strategies in parallel, like running `ChainSync` and `GapSync` as independent strategies, and running `ChainSync` and Sync 2.0 alongside each other.

The difference with #2814
is that we allow simultaneous requests to remote peers initiated by
different strategies, as this is not tracked on the remote node in any
way. Therefore, `PeerPool` is not needed.

CC @skunert

---------

Co-authored-by: Sebastian Kunert <skunert49@gmail.com>
bgallois pushed a commit to duniter/duniter-polkadot-sdk that referenced this pull request Mar 25, 2024
Labels: T0-node ("This PR/Issue is related to the topic 'node'.")
Project status: Blocked ⛔️
3 participants