Prepare syncing for parallel sync strategies #3224

dmitry-markin · 2024-02-06T11:37:28Z

This PR should supersede #2814 and accomplish the same with fewer changes. It is needed to run sync strategies in parallel, for example running ChainSync and GapSync as independent strategies, or running ChainSync and Sync 2.0 alongside each other.

The difference from #2814 is that we allow simultaneous requests to remote peers initiated by different strategies, as this is not tracked on the remote node in any way. Therefore, PeerPool is not needed.
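A minimal sketch of the idea above, assuming in-flight requests are tracked per (peer, strategy) pair so that two strategies can each have a request open to the same peer. `PendingRequests`, the integer `PeerId`, the `u64` request ID, and the one-request-per-pair rule are all hypothetical stand-ins for illustration, not the types or rules used in this PR.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

// Stand-in for a real peer identifier (assumption).
type PeerId = u32;

// Hypothetical keys; the PR only states that requests are tagged by strategy keys.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum StrategyKey {
    Chain,
    Gap,
}

// Tracks in-flight requests by (peer, strategy) instead of by peer alone.
#[derive(Default)]
struct PendingRequests {
    in_flight: HashMap<(PeerId, StrategyKey), u64>,
}

impl PendingRequests {
    // Registers a request; returns false only if the same strategy
    // already has a request in flight to the same peer.
    fn insert(&mut self, peer: PeerId, key: StrategyKey, request_id: u64) -> bool {
        match self.in_flight.entry((peer, key)) {
            Entry::Occupied(_) => false,
            Entry::Vacant(slot) => {
                slot.insert(request_id);
                true
            }
        }
    }

    // Called when a response arrives; the key says which strategy gets it.
    fn complete(&mut self, peer: PeerId, key: StrategyKey) -> Option<u64> {
        self.in_flight.remove(&(peer, key))
    }
}
```

Because the key includes the strategy, no shared pool is needed to arbitrate which strategy owns a peer.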

Builds upon #2467.

CC @skunert

substrate/client/network/sync/src/pending_responses.rs

substrate/client/network/sync/src/strategy.rs

substrate/client/network/sync/src/engine.rs


skunert

The logic LGTM! Just left some questions and minor remarks.

substrate/client/network/sync/src/strategy.rs

skunert · 2024-02-13T09:45:40Z


While reading, I was asking myself whether it would be feasible to use a message-based system for the individual strategies too. Instead of the syncing engine calling individual on_XY handlers of the strategy, it could pass a message down to the individual strategies, and each strategy would decide whether it wants to handle it.

This would get rid of strategy knowing which substrategy is interested in the individual results. But yeah, this is more of an educational question for me (and maybe there are cases where this does not even work), nothing to act upon.

dmitry-markin · 2024-02-13

Firstly, I'd like to comment on the feasibility of a fully async interface with strategies. When we implemented a bidirectional async interface between ProtocolController and Notifications, it turned out that it broke the constraints of the originally half-synchronous interface (where we call on_XY handlers on ProtocolController and poll it as a stream for actions), requiring some tricks with handling of duplicate messages and discarding some "invalid" messages. We decided to just live with some smaller inconsistencies, because a proper implementation would require a complex ACKing system with state machines in both ProtocolController and Notifications, each with lots of states. In syncing, our goal is to focus on the syncing state machines rather than on message-passing state machines; this is why we got rid of all the polling in the strategies.

On the other hand, what could work is introducing a synchronous subscriber-like system, where we call a generic on_event(event: Event) handler that dispatches the events to specific strategies down the tree. Logically, this would be the same as calling specific on_XY handlers, but it could replace the manual matches on active strategies with a subscriber-like system, where each strategy registers itself for specific events. I'm not sure, though, whether it's possible to have a "proper" Rust implementation of this without downcasting abstract events when they reach specific strategies; otherwise, it looks like event matching would just move to the strategies, with the burden of them knowing about all the event types.
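A rough sketch of the synchronous subscriber-like dispatch described above, assuming a shared `Event` enum rather than downcasting. Every name here (`Event`, `EventKind`, `Dispatcher`, `ChainLike`, `StateLike`) is hypothetical. It also illustrates the stated concern: each strategy still has to match on the shared event type.

```rust
// Hypothetical event type; the variants are illustrative only.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    BlockResponse { blocks: usize },
    StateResponse { complete: bool },
    WarpProofResponse,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EventKind {
    Block,
    State,
    WarpProof,
}

impl Event {
    fn kind(&self) -> EventKind {
        match self {
            Event::BlockResponse { .. } => EventKind::Block,
            Event::StateResponse { .. } => EventKind::State,
            Event::WarpProofResponse => EventKind::WarpProof,
        }
    }
}

trait Strategy {
    fn name(&self) -> &'static str;
    // Event kinds this strategy registers itself for.
    fn subscriptions(&self) -> &'static [EventKind];
    // The strategy still matches on the shared `Event` enum here.
    fn on_event(&mut self, event: &Event);
}

#[derive(Default)]
struct ChainLike {
    imported: usize,
}

impl Strategy for ChainLike {
    fn name(&self) -> &'static str {
        "chain"
    }
    fn subscriptions(&self) -> &'static [EventKind] {
        &[EventKind::Block]
    }
    fn on_event(&mut self, event: &Event) {
        if let Event::BlockResponse { blocks } = event {
            self.imported += *blocks;
        }
    }
}

#[derive(Default)]
struct StateLike {
    done: bool,
}

impl Strategy for StateLike {
    fn name(&self) -> &'static str {
        "state"
    }
    fn subscriptions(&self) -> &'static [EventKind] {
        &[EventKind::State]
    }
    fn on_event(&mut self, event: &Event) {
        if let Event::StateResponse { complete } = event {
            self.done = *complete;
        }
    }
}

struct Dispatcher {
    strategies: Vec<Box<dyn Strategy>>,
}

impl Dispatcher {
    // Synchronously forwards the event to every subscribed strategy and
    // returns the names of the strategies that received it.
    fn on_event(&mut self, event: &Event) -> Vec<&'static str> {
        let kind = event.kind();
        let mut delivered = Vec::new();
        for strategy in self.strategies.iter_mut() {
            if strategy.subscriptions().contains(&kind) {
                strategy.on_event(event);
                delivered.push(strategy.name());
            }
        }
        delivered
    }
}

fn example_dispatcher() -> Dispatcher {
    let mut strategies: Vec<Box<dyn Strategy>> = Vec::new();
    strategies.push(Box::new(ChainLike::default()));
    strategies.push(Box::new(StateLike::default()));
    Dispatcher { strategies }
}
```

An event that no strategy subscribed to, such as `WarpProofResponse` in this sketch, is simply not delivered to anyone.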

@altonen do you have something to add?

altonen · 2024-02-13

What Sebastian is proposing is similar to the trait approach I've been harping on about. If we store the active strategies in HashMap<StrategyKey, Box<dyn Strategy>>, SyncingStrategy could iterate over all active strategies when handling an event, and each strategy can decide whether it wants to handle that particular event. Of course, if a key is provided, e.g., when a response is received, then SyncingStrategy would only call the specified strategy. This would clean up much of the code in SyncingStrategy and would allow plugging in custom syncing implementations.

Like you described in the second paragraph, I don't see how Sebastian's proposal necessarily implies any async code, though. I think the code would still work the way it does now, but instead of SyncingStrategy checking explicitly whether a strategy could be interested in an event, it would not make any assumptions: it would pass the event to the strategy, and if the strategy is not interested, it would just ignore it.
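The trait-object approach described above could look roughly like the following sketch. Only the `HashMap<StrategyKey, Box<dyn Strategy>>` shape comes from the comment; the `Event` variants, the `bool` return value, and the `Chain` and `State` structs are assumptions made for illustration.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
enum StrategyKey {
    Chain,
    State,
}

// Hypothetical events: one broadcast-style, one that arrives with a key.
#[derive(Debug)]
enum Event {
    NewPeer(u32),
    Response { payload: Vec<u8> },
}

trait Strategy {
    // Returns true if the strategy chose to handle the event.
    fn on_event(&mut self, event: &Event) -> bool;
}

struct Chain {
    peers: Vec<u32>,
    bytes: usize,
}

impl Strategy for Chain {
    fn on_event(&mut self, event: &Event) -> bool {
        match event {
            Event::NewPeer(peer) => self.peers.push(*peer),
            Event::Response { payload } => self.bytes += payload.len(),
        }
        true
    }
}

struct State {
    bytes: usize,
}

impl Strategy for State {
    fn on_event(&mut self, event: &Event) -> bool {
        match event {
            Event::Response { payload } => {
                self.bytes += payload.len();
                true
            }
            // Not interested: ignore the event.
            _ => false,
        }
    }
}

struct SyncingStrategy {
    active: HashMap<StrategyKey, Box<dyn Strategy>>,
}

impl SyncingStrategy {
    // No key given: offer the event to every active strategy and report
    // which ones handled it (sorted, since HashMap order is unspecified).
    fn broadcast(&mut self, event: &Event) -> Vec<StrategyKey> {
        let mut handled: Vec<StrategyKey> = self
            .active
            .iter_mut()
            .filter_map(|(key, strategy)| strategy.on_event(event).then_some(*key))
            .collect();
        handled.sort();
        handled
    }

    // Key given (e.g. a response): call only the specified strategy.
    fn route(&mut self, key: StrategyKey, event: &Event) -> bool {
        match self.active.get_mut(&key) {
            Some(strategy) => strategy.on_event(event),
            // The strategy is no longer active, so the event is dropped.
            None => false,
        }
    }
}

fn example() -> SyncingStrategy {
    let mut active: HashMap<StrategyKey, Box<dyn Strategy>> = HashMap::new();
    active.insert(StrategyKey::Chain, Box::new(Chain { peers: Vec::new(), bytes: 0 }));
    active.insert(StrategyKey::State, Box::new(State { bytes: 0 }));
    SyncingStrategy { active }
}
```

In this sketch, `broadcast(&Event::NewPeer(7))` reaches both strategies but only `Chain` reports handling it, while `route` with a key for a removed strategy returns `false` without calling anything.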

skunert · 2024-02-13

Yes, the second paragraph captures pretty well what I meant. When we at some point add more strategies with different response handlers, we would just add another message instead of adding a new on_XY handler to Strategy and to the concrete implementation.

dmitry-markin · 2024-02-13

If I got it right, what @altonen is proposing implies that all strategies handle all event types, i.e., each should provide on_XY handlers for all XY by implementing some generic Strategy trait, even though it may not be interested in all XY.
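One way the burden mentioned above is sometimes reduced in Rust is with default method bodies on the trait, so a strategy overrides only the handlers it cares about. The sketch below is an assumption about how that could look, not something proposed in this thread; `BlockResponse`, `StateResponse`, and `GapLike` are hypothetical.

```rust
// Hypothetical response types, standing in for the real ones.
struct BlockResponse {
    blocks: usize,
}

struct StateResponse {
    complete: bool,
}

trait Strategy {
    // Default bodies are no-ops, so implementors may skip any handler.
    fn on_block_response(&mut self, _response: &BlockResponse) {}
    fn on_state_response(&mut self, _response: &StateResponse) {}
}

#[derive(Default)]
struct GapLike {
    blocks: usize,
}

impl Strategy for GapLike {
    // Only block responses are of interest to this strategy;
    // on_state_response falls back to the no-op default.
    fn on_block_response(&mut self, response: &BlockResponse) {
        self.blocks += response.blocks;
    }
}
```

The trait itself still lists every on_XY handler, so adding a new event type still means touching the trait, even if existing strategies need no changes.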


dmitry-markin added 5 commits February 6, 2024 11:21

Prepare SyncingStrategy for parallel strategies

ed077bd

Implement From for strategy actions conversion

6ade1a9

Refactor ChainSync::restart and request cancellation

4fbe8c7

Rename extra_requests.rs -> justification_requests.rs, add docs

5455027

Tag requests by strategy keys and route responses using these keys

c85c669

dmitry-markin added the R0-silent (Changes should not be mentioned in any release notes) and T0-node (This PR/Issue is related to the topic “node”) labels Feb 6, 2024

dmitry-markin requested a review from altonen February 6, 2024 11:37

Rename Key->StrategyKey, get rid of Unpin impl

0d245f1

altonen approved these changes Feb 6, 2024


Apply review suggestions

d75df9c

dmitry-markin requested a review from skunert February 12, 2024 11:15

dmitry-markin mentioned this pull request Feb 12, 2024

Share peers between syncing strategies #2814

Closed

Check for active strategy in on_state_response & `on_warp_proof_response`

81b7ae6

skunert approved these changes Feb 13, 2024


dmitry-markin and others added 3 commits February 13, 2024 13:05

Apply suggestions from code review

abe9ea2

Co-authored-by: Sebastian Kunert <skunert49@gmail.com>

Apply review suggestions

63a1fbc

Merge remote-tracking branch 'origin/master' into dm-parallel-sync-strategies

8738160

dmitry-markin added this pull request to the merge queue Feb 13, 2024

Merged via the queue into master with commit 96ebb30 Feb 13, 2024
129 of 130 checks passed

dmitry-markin deleted the dm-parallel-sync-strategies branch February 13, 2024 20:25

nazar-pc mentioned this pull request Apr 5, 2024

Update Substrate to 1.8.0 autonomys/subspace#2667

Merged

1 task
