Share peers between syncing strategies #2814
Conversation
For block and state requests. No justification requests yet.
3664ba2 to 02259bc (compare)
Co-authored-by: Aaro Altonen <48052676+altonen@users.noreply.github.com>
This reverts commit 4bb9a7b.
Did a first pass and left some comments but I will have to go over it again.
I'm not super excited about `peer_best_blocks`, or the fact that we store peers both in `PeerPool` and in each strategy and then deal with all the possible inconsistencies that follow from that at runtime. Ideally `PeerPool` would store strategy-specific data, but I don't know how feasible that is and I'll do some testing. I also think we may experience yet-unknown issues if `GapSync`, `ChainSync` and `Sync2` (or whatever its name will be) have independent (and potentially differing) views of peers. If peer best and common number were stored in `PeerPool`, all strategies could query and update them and we'd get rid of `peer_best_blocks`. What do you think?
I also think `allowed_requests` has to go, because `ChainSync` is now checking the availability of a peer three times, and the likelihood that at some point during future refactorings one of them goes out of sync with the others is non-zero.
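As a minimal sketch of the idea above of keeping the peers' best/common numbers in the shared pool (all names and types here are simplified stand-ins, not the actual `PeerPool` API):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

type PeerId = u64;
type BlockNumber = u64;
type BlockHash = [u8; 32];

#[allow(dead_code)]
struct SharedPeerRecord {
    best_hash: BlockHash,
    best_number: BlockNumber,
    common_number: BlockNumber,
    reserved: bool,
}

#[derive(Default)]
struct PeerPool {
    peers: HashMap<PeerId, SharedPeerRecord>,
}

impl PeerPool {
    fn add_peer(&mut self, peer_id: PeerId, best_hash: BlockHash, best_number: BlockNumber) {
        self.peers.insert(
            peer_id,
            SharedPeerRecord { best_hash, best_number, common_number: 0, reserved: false },
        );
    }

    /// Any strategy can update the shared view, e.g. on a block announcement.
    fn update_best(&mut self, peer_id: &PeerId, hash: BlockHash, number: BlockNumber) {
        if let Some(rec) = self.peers.get_mut(peer_id) {
            rec.best_hash = hash;
            rec.best_number = number;
        }
    }

    fn best_number(&self, peer_id: &PeerId) -> Option<BlockNumber> {
        self.peers.get(peer_id).map(|rec| rec.best_number)
    }
}

fn main() {
    let pool = Arc::new(Mutex::new(PeerPool::default()));
    pool.lock().unwrap().add_peer(1, [0u8; 32], 100);
    // A block announcement observed by any strategy updates the shared record.
    pool.lock().unwrap().update_best(&1, [1u8; 32], 101);
    // ChainSync, GapSync, the state strategy, etc. would all read the same
    // record, making a separate `peer_best_blocks` map unnecessary.
    assert_eq!(pool.lock().unwrap().best_number(&1), Some(101));
}
```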
state: Option<StateStrategy<B>>,
chain_sync: Option<ChainSync<B, Client>>,
peer_pool: Arc<Mutex<PeerPool>>,
peer_best_blocks: HashMap<PeerId, (B::Hash, NumberFor<B>)>,
Why is this here?
It's needed to seed the peers when switching between the strategies. Otherwise, for example, the state strategy won't be aware of the best hash/number (and won't know what peer to request a state from) until the block announcement is received from a peer.
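For illustration, a toy sketch of that seeding step; `StateStrategy::new`, the field names, and the types are hypothetical simplifications, not the real constructor:

```rust
use std::collections::HashMap;

type PeerId = u64;
type BlockHash = [u8; 32];
type BlockNumber = u64;

struct StateStrategy {
    peers: HashMap<PeerId, (BlockHash, BlockNumber)>,
}

impl StateStrategy {
    /// Seed the freshly created strategy with everything we already know.
    fn new(known_peers: &HashMap<PeerId, (BlockHash, BlockNumber)>) -> Self {
        Self { peers: known_peers.clone() }
    }

    /// Without the seeding above, this would return `None` until the first
    /// block announcement arrives after the strategy switch.
    fn best_peer_for_state_request(&self) -> Option<PeerId> {
        self.peers
            .iter()
            .max_by_key(|(_, (_, number))| *number)
            .map(|(id, _)| *id)
    }
}

fn main() {
    let mut peer_best_blocks: HashMap<PeerId, (BlockHash, BlockNumber)> = HashMap::new();
    peer_best_blocks.insert(7, ([0u8; 32], 1_000));
    peer_best_blocks.insert(9, ([0u8; 32], 1_200));

    // Switching strategies: the new one is immediately aware of peer bests.
    let state = StateStrategy::new(&peer_best_blocks);
    assert_eq!(state.best_peer_for_state_request(), Some(9));
}
```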
#[must_use]
fn add_peer_inner(
/// Process new peers assigning proper states and initiating requests.
fn handle_new_peers(
It's unclear to me why this function is needed and why the new peers can't be handled in `ChainSync::block_requests()`.
I'll look into whether this can be simplified.
self.actions.push(ChainSyncAction::CancelRequest { peer_id });
self.peer_pool.lock().free_peer(&peer_id);
I wonder if there's a race condition here. The peer is immediately freed, but the cancellation is postponed until `SyncingEngine` processes the event, so if some other strategy selects the peer while the request hasn't been canceled yet, it could result in two in-flight requests. Maybe the strategy/`SyncingStrategy` should be notified by `SyncingEngine` once the request is actually canceled, so that freeing the peer is safe.
It's even worse than that. When another strategy initiates a request, the old one is automatically dropped. And when we finally cancel the request, we can cancel a legitimate request of another strategy. I'm not super happy with any kind of ACKing and bookkeeping, but it looks like we need to keep track of pending cancellations in every strategy and free peers in `PeerPool` when something like `on_request_cancelled()` is heard from `SyncingEngine`.
Is there a reason why pending cancellations couldn't be stored in `SyncingStrategy`, and once the request has been canceled, `SyncingEngine` calls `SyncingStrategy`, which releases the peer?
This means intercepting request cancellation actions in `SyncingStrategy` on their way to `SyncingEngine`. Not the best design ever, but it's probably better than dealing with it in every strategy.
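A rough sketch of that interception, with hypothetical names (`SyncingStrategyProxy`, `forward_action`, `on_request_cancelled`) standing in for whatever the real hooks would be: the proxy remembers the pending cancellation and frees the peer only once the engine confirms the request is really gone.

```rust
use std::collections::HashSet;

type PeerId = u64;

#[derive(Default)]
struct PeerPool {
    reserved: HashSet<PeerId>,
}

impl PeerPool {
    fn free_peer(&mut self, peer_id: &PeerId) {
        self.reserved.remove(peer_id);
    }
}

enum Action {
    CancelRequest { peer_id: PeerId },
}

#[derive(Default)]
struct SyncingStrategyProxy {
    peer_pool: PeerPool,
    pending_cancellations: HashSet<PeerId>,
}

impl SyncingStrategyProxy {
    /// Called on an action's way from a concrete strategy to `SyncingEngine`.
    fn forward_action(&mut self, action: &Action) {
        match action {
            Action::CancelRequest { peer_id } => {
                // Do NOT free the peer yet: the request is still in flight.
                self.pending_cancellations.insert(*peer_id);
            },
        }
    }

    /// Called by the engine once the in-flight request is really gone.
    fn on_request_cancelled(&mut self, peer_id: PeerId) {
        if self.pending_cancellations.remove(&peer_id) {
            self.peer_pool.free_peer(&peer_id);
        }
    }
}

fn main() {
    let mut proxy = SyncingStrategyProxy::default();
    proxy.peer_pool.reserved.insert(42);

    proxy.forward_action(&Action::CancelRequest { peer_id: 42 });
    assert!(proxy.peer_pool.reserved.contains(&42)); // still reserved

    proxy.on_request_cancelled(42);
    assert!(!proxy.peer_pool.reserved.contains(&42)); // safe to reuse now
}
```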
If we store the peer's best and common blocks in `PeerPool`, we'd get rid of `peer_best_blocks`, allow `SyncingEngine` to replace its `HashMap<PeerId, Peer<B>>` with `PeerPool`, and also allow it to free the peer once the request is canceled.

But now that I'm thinking about this whole approach again, is there any fundamental reason why `GapSync` and `ChainSync` couldn't send a request to the same peer at the same time? I believe this limitation only applies to the sending end, namely the strategies, and it made sense before, when everything was one state machine. The request handlers shouldn't care how many requests from the same peer are in the queue. If `GapSync` and `ChainSync` are separate, is there a valid reason why they couldn't send simultaneous requests to a peer? We must make sure that two copies of the same request are not sent, because that would get us banned, which implies the existence of some kind of `PeerPool` for shared data, but is there a reason why they couldn't send two different requests at the same time? So the limitation of one request per peer would still apply, but now per strategy.
> is there any fundamental reason why `GapSync` and `ChainSync` couldn't send a request to the same peer at the same time?
Valid point, there is nothing in the block request handler that forbids "simultaneous" requests. But if we get rid of the global `PeerPool`, we'll need to introduce a way to cancel specific requests, as otherwise strategies could cancel each other's requests.
Each strategy could be identified by a unique key, and when a strategy returns `StrategyAction::SendRequest { PeerId, Request }`, `SyncingStrategy` would convert it to `SyncingAction::SendRequest { PeerId, Key, Request }`, and `PendingResponses` would keep track of responses keyed by `(PeerId, Key)`.
Yes, that's what I had in mind — attaching strategy IDs to requests.
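To make the agreed idea concrete, a self-contained sketch of strategy-keyed requests; the type and method names (`StrategyKey`, `PendingResponses::start`, and so on) are illustrative, not the actual polkadot-sdk types:

```rust
use std::collections::HashMap;

type PeerId = u64;
type Request = String;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum StrategyKey {
    ChainSync,
    GapSync,
}

// What a concrete strategy emits.
enum StrategyAction {
    SendRequest { peer_id: PeerId, request: Request },
}

// What the `SyncingStrategy` proxy forwards to `SyncingEngine`.
enum SyncingAction {
    SendRequest { peer_id: PeerId, key: StrategyKey, request: Request },
}

/// In-flight requests tracked per (peer, strategy) instead of per peer only.
#[derive(Default)]
struct PendingResponses {
    in_flight: HashMap<(PeerId, StrategyKey), Request>,
}

impl PendingResponses {
    fn start(&mut self, peer_id: PeerId, key: StrategyKey, request: Request) {
        self.in_flight.insert((peer_id, key), request);
    }

    /// Cancels only the given strategy's request to this peer.
    fn cancel(&mut self, peer_id: PeerId, key: StrategyKey) {
        self.in_flight.remove(&(peer_id, key));
    }
}

/// The proxy tags each strategy action with that strategy's key.
fn tag(key: StrategyKey, action: StrategyAction) -> SyncingAction {
    match action {
        StrategyAction::SendRequest { peer_id, request } =>
            SyncingAction::SendRequest { peer_id, key, request },
    }
}

fn main() {
    let mut pending = PendingResponses::default();

    // Two strategies request different block ranges from the same peer.
    let actions = vec![
        tag(StrategyKey::ChainSync, StrategyAction::SendRequest { peer_id: 1, request: "blocks 100..110".into() }),
        tag(StrategyKey::GapSync, StrategyAction::SendRequest { peer_id: 1, request: "blocks 1..10".into() }),
    ];
    for action in actions {
        match action {
            SyncingAction::SendRequest { peer_id, key, request } => pending.start(peer_id, key, request),
        }
    }

    // `GapSync` canceling its own request leaves the `ChainSync` one intact.
    pending.cancel(1, StrategyKey::GapSync);
    assert!(pending.in_flight.contains_key(&(1, StrategyKey::ChainSync)));
    assert!(!pending.in_flight.contains_key(&(1, StrategyKey::GapSync)));
}
```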
Co-authored-by: Aaro Altonen <48052676+altonen@users.noreply.github.com>
common_number: Zero::zero(),
best_hash,
best_number,
state: PeerSyncState::New,
A new peer and an `Available` peer are handled differently in the code, and the idea was to postpone peer initialization that can lead to requests (`handle_new_peers()`) until `actions()`, so that we can call the strategies in a specific order and implement priorities for reserving peers (e.g., Sync 2.0, then `ChainSync`, then `GapSync`).
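A toy model of that ordering, with heavily simplified stand-in types: peers are only queued in `add_peer()` and handed out inside `actions()`, so the strategy called first gets first pick when reserving peers.

```rust
use std::collections::HashSet;

type PeerId = u64;

struct Strategy {
    name: &'static str,
    wants: usize, // how many peers this strategy wants to reserve
    reserved: Vec<PeerId>,
}

impl Strategy {
    fn handle_new_peers(&mut self, free: &mut HashSet<PeerId>) {
        while self.reserved.len() < self.wants {
            let Some(&peer) = free.iter().next() else { break };
            free.remove(&peer);
            self.reserved.push(peer);
        }
    }
}

struct SyncingStrategyProxy {
    new_peers: HashSet<PeerId>, // peers waiting in a `New`-like state
    strategies: Vec<Strategy>,  // kept in priority order
}

impl SyncingStrategyProxy {
    fn add_peer(&mut self, peer_id: PeerId) {
        // Nothing request-triggering happens here; new peers are processed
        // later, inside `actions()`.
        self.new_peers.insert(peer_id);
    }

    fn actions(&mut self) {
        // Strategies run in priority order, e.g. Sync 2.0, then ChainSync,
        // then GapSync, so the higher-priority one reserves peers first.
        for strategy in &mut self.strategies {
            strategy.handle_new_peers(&mut self.new_peers);
        }
    }
}

fn main() {
    let mut proxy = SyncingStrategyProxy {
        new_peers: HashSet::new(),
        strategies: vec![
            Strategy { name: "Sync 2.0", wants: 1, reserved: vec![] },
            Strategy { name: "ChainSync", wants: 2, reserved: vec![] },
            Strategy { name: "GapSync", wants: 2, reserved: vec![] },
        ],
    };
    for peer in 1..=3 {
        proxy.add_peer(peer);
    }
    proxy.actions();
    // Sync 2.0 gets one peer, ChainSync the remaining two, GapSync none.
    for strategy in &proxy.strategies {
        println!("{}: {:?}", strategy.name, strategy.reserved);
    }
}
```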
(Some(PeerSyncState::Available), None)
} else {
if self.peer_pool.try_reserve_peer(&peer_id) {
Nit: Can we move this `if` into an `else if` one line above? All the indentation in this file makes it a bit hard to follow.
as the peer is reserved by another syncing strategy.",
);

(None, None)
Nit: I think you could directly return `None` here, and then get rid of the `if let Some()` a few lines below.
Ok(None)
},
}
self.allowed_requests.add(&peer_id);
Hmm, not directly related to this PR, but how does this `allowed_requests` work? It looks like it contains peers that we could potentially send a block request to. However, I don't fully understand why we regularly reset it to `All`, like here:
self.allowed_requests.set_all();
I don't completely understand the logic behind it either; it looks like the only useful thing it does is block block requests during state download in fast sync.
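For readers following along, a rough guess at the semantics of such an allow-list, simplified and not the exact upstream type: it either names specific peers that became eligible for a new block request, or is reset to "all peers" after events like a change of our best block.

```rust
use std::collections::HashSet;

type PeerId = u64;

enum AllowedRequests {
    // Only these peers may be sent a new block request right now.
    Some(HashSet<PeerId>),
    // Any peer may be sent a block request.
    All,
}

impl AllowedRequests {
    fn add(&mut self, peer_id: &PeerId) {
        if let Self::Some(set) = self {
            set.insert(*peer_id);
        }
    }

    fn set_all(&mut self) {
        *self = Self::All;
    }

    fn contains(&self, peer_id: &PeerId) -> bool {
        match self {
            Self::Some(set) => set.contains(peer_id),
            Self::All => true,
        }
    }
}

fn main() {
    let mut allowed = AllowedRequests::Some(HashSet::new());
    assert!(!allowed.contains(&1));

    allowed.add(&1); // e.g. after a response from peer 1 frees it up again
    assert!(allowed.contains(&1));

    allowed.set_all(); // e.g. after importing blocks changes our best block
    assert!(allowed.contains(&2));
}
```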
let mut matcher = self.extra_justifications.matcher();
std::iter::from_fn(move || {
if let Some((peer, request)) = matcher.next(peers) {
if let Some((peer_id, request)) = matcher.next(peers, peer_pool) {
// TODO: reserve the peer in `PeerPool`.
Leftover TODO?

Edit: Ah, it looks like it's already done in the `next` call.
for mut available_peer in self.peer_pool.lock().available_peers() {
let peer_id = available_peer.peer_id();
if let Some(peer) = self.peers.get_mut(&peer_id) {
if peer.state.is_available() && peer.common_number >= sync.target_number() {
Here we check for `peer.state.is_available()` even though the peer is listed as available in the `peer_pool`. I understand that the pool is shared between the strategies, but I am not sure whether it is legal for the peer to be available in the peer pool but not in `chain_sync`.
Yes, should not be needed.
@skunert Thanks for reviewing the PR, but I'm about to publish another one that should completely supersede it. So please don't spend more time reviewing this one for now.
Closing in favor of #3224.
This PR should supersede #2814 and accomplish the same with fewer changes. It's needed to run sync strategies in parallel, like running `ChainSync` and `GapSync` as independent strategies, and running `ChainSync` and Sync 2.0 alongside each other. The difference from #2814 is that we allow simultaneous requests to remote peers initiated by different strategies, as this is not tracked on the remote node in any way. Therefore, `PeerPool` is not needed. CC @skunert

---------

Co-authored-by: Sebastian Kunert <skunert49@gmail.com>
Introduce the ability to share peers between syncing strategies and reserve them for requests. This is needed to run `GapSync` as a separate strategy and, ultimately, to run Sync 2.0 alongside `ChainSync`.