This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Integrate Prospective Parachains Subsystem into Backing: Part 2 #5618

Closed

Conversation

rphmeier
Contributor

@rphmeier rphmeier commented May 31, 2022

Follows on to #5557 (currently based upon it, but I will rebase on the feature branch once that's merged)

Closes #5055 . More details about the intended changeset can be found in this issue.

At a high level, the goals of the networking changes for asynchronous backing are as follows. We are coordinating two upgrades: a runtime API upgrade (v2 -> v3) and a network message upgrade (v1 -> v2). The idea is to do the network protocol upgrade first and make it compatible with the messages needed for both runtime v2 and v3. Peers which are both on net-v2 will send each other messages and continue operating even after the runtime API upgrade, while peers still on net-v1 will be useless for statement distribution after the runtime API upgrade. Until the runtime API upgrade, nodes running net-v2 will continue circulating statements to peers on net-v1, because the format is backwards compatible. We will do something similar in the collator-protocol as well. Nothing changes in bitfield distribution, availability distribution, availability recovery, approval-distribution, or dispute-distribution, so outdated nodes can still continue interoperating on those protocols.
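The fallback rule above amounts to "speak the highest protocol version both peers support." A minimal sketch of that negotiation, with illustrative names (`ProtocolVersion`, `choose_version`) that are not the actual Polkadot network-bridge API:

```rust
// Hypothetical sketch of the version-fallback rule: prefer vstaging,
// fall back to v1 for outdated peers. Names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum ProtocolVersion {
    V1,
    VStaging, // becomes net-v2 once stabilized
}

/// Pick the highest protocol version both peers support, if any.
fn choose_version(
    ours: &[ProtocolVersion],
    theirs: &[ProtocolVersion],
) -> Option<ProtocolVersion> {
    ours.iter().copied().filter(|v| theirs.contains(v)).max()
}

fn main() {
    // A vstaging-capable node keeps v1 as a fallback for outdated peers.
    let ours = [ProtocolVersion::V1, ProtocolVersion::VStaging];
    assert_eq!(
        choose_version(&ours, &[ProtocolVersion::V1]),
        Some(ProtocolVersion::V1)
    );
    assert_eq!(
        choose_version(&ours, &[ProtocolVersion::V1, ProtocolVersion::VStaging]),
        Some(ProtocolVersion::VStaging)
    );
    // No common version: the peer is useless for this subsystem.
    assert_eq!(choose_version(&[ProtocolVersion::VStaging], &[]), None);
}
```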

The main changes that asynchronous backing makes to backing are that (1) validators can legally second more than one candidate per relay-parent, and (2) candidates can stick around for longer than the relay-parent remains a leaf in the block-tree. This has numerous implications for spam prevention, which are described in detail in #5055.

Work in this PR:

  • Introduces a new network protocol, vstaging, which is identical to v1 except for the statement distribution changes. This will become net-v2 later.
  • Enables vstaging as the default, with v1 as a fallback, under the 'network-protocol-staging' feature flag
  • Adapts the network bridge to handle vstaging network messages properly
  • Updates network subsystems to gracefully handle vstaging
  • Updates statement distribution to handle asynchronous backing, and updates the vstaging network protocol accordingly. This is the bulk of the work.
  • Tests the new statement distribution logic

@rphmeier
Contributor Author

rphmeier commented Jun 6, 2022

Posting some notes on a couple of questions I was thinking about: large statements, spam, and topology. At the end there is a proposed solution to all of the issues.

  • How do we handle large statements for asynchronous backing?
    • In the current form of statement distribution, we pile up all statements depending on a large CommittedCandidateReceipt and wait until we've fetched the candidate before processing them. Detecting dependent statements is easy because we can just check the candidate hash in the compact statement.
    • With asynchronous backing, we may have candidates that depend on other candidates. We might get a candidate and check where it might appear in the fragment tree and find nothing. This could be spam, or it could be dependent on some Seconded statement that we've yet to fetch the candidate for.
    • The best solution I can think of is to detect spam lazily, at least when it comes to Seconded statements. The rule is that we can't detect spam candidates until all of the fetches that were ongoing for the same para when the candidate was received have concluded. When we complete the last pending fetch started before the candidate was received, then we can definitively say whether the candidate is spam. For Valid statements, we can simply use the current spam detection mechanism of disallowing them unless the Seconded statement is known.
    • We don't import or forward anything until we definitively know it's not spam. Furthermore, we don't initiate fetches for unconfirmed large-statements until we know they're not spam. This may mean we'll only make one large-statement request per para at a time under certain circumstances.
    • We'll need some kind of upper-bound threshold on the number of potentially-spam candidates we're willing to field at any given time, both in general and from each peer. It should be n_validators * (max_depth + 1) per relay-parent, uniformly distributed - that is, max_depth + 1 per validator per relay-parent. This is an over-estimate, and we should also be careful to actually import only one per depth per validator per active leaf. If a peer ever sends us more than one candidate at the same depth under an active leaf, then it's spam. This also means that we have to do large-statement fetches just in order to determine whether candidates we're receiving are spam.
  • Gossip Topology for asynchronous backing statement distribution
    • With asynchronous backing, statement distribution gets a few more restrictions on which messages can be sent at any time - i.e. we can't send statements about a candidate Y until its parent X is known by the peer. The only way we learn whether a candidate is known by a peer is if they send it to us or we send it to them.
    • This is potentially incompatible with the gossip topology: the peer A who sends some target peer P the candidate X will likely not be the same as B who wants to send P the candidate Y. If X and Y have different authors and therefore appear elsewhere in the topology, this is almost certainly the case.
    • What we can do is ensure that every peer makes the other peers in its row and column aware of which candidates it has.
    • We can introduce a new message "NoteAware" which is sent to peers (in their row/column) for every candidate they're now aware of, and which is not forwarded. This will add some traffic overhead (mostly in the number of messages) which scales linearly in the number of validators and proportionally to sqrt(n_peers). Peers shouldn't send candidates which build on some other candidate until they've received a "NoteAware" from the intended recipient. This would also solve the spam prevention issue described above.
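The per-peer spam bound described above (at most max_depth + 1 unconfirmed candidates per validator per relay-parent) can be sketched as a simple counter table. All names here (`UnconfirmedTracker`, `try_note`) are hypothetical, not the actual subsystem's types:

```rust
use std::collections::HashMap;

// Illustrative sketch of the unconfirmed-candidate bound: field at most
// max_depth + 1 candidates per (relay-parent, validator) before treating
// further ones as spam. Hashes are stand-in u64s for brevity.
type RelayParent = u64;
type ValidatorIndex = u32;

struct UnconfirmedTracker {
    max_depth: usize,
    // (relay_parent, validator) -> number of unconfirmed candidates fielded.
    counts: HashMap<(RelayParent, ValidatorIndex), usize>,
}

impl UnconfirmedTracker {
    fn new(max_depth: usize) -> Self {
        Self { max_depth, counts: HashMap::new() }
    }

    /// Returns true if we can field another unconfirmed candidate from this
    /// validator under this relay-parent; false means treat it as spam.
    fn try_note(&mut self, relay_parent: RelayParent, validator: ValidatorIndex) -> bool {
        let entry = self.counts.entry((relay_parent, validator)).or_insert(0);
        if *entry >= self.max_depth + 1 {
            false
        } else {
            *entry += 1;
            true
        }
    }
}

fn main() {
    let mut tracker = UnconfirmedTracker::new(1); // max_depth = 1 -> 2 allowed
    assert!(tracker.try_note(0xAA, 7));
    assert!(tracker.try_note(0xAA, 7));
    assert!(!tracker.try_note(0xAA, 7)); // third from same (leaf, validator): spam
    assert!(tracker.try_note(0xBB, 7)); // a different relay-parent counts separately
}
```

Counts here would be decremented (not shown) once a fetch concludes and the candidate is confirmed or discarded, which is the "lazy" part of the detection.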

@slumber slumber force-pushed the rh-async-backing-integration-2 branch from 1300b20 to 5c8e048 on June 22, 2022 19:01
@burdges
Contributor

burdges commented Sep 8, 2022

Are Views also tracking nodes assigned roles? We'd presumably send initial backing statements only to other backers, and only gossip double backing statements to everybody, for example.

@rphmeier
Contributor Author

rphmeier commented Sep 9, 2022

> Are Views also tracking nodes assigned roles? We'd presumably send initial backing statements only to other backers, and only gossip double backing statements to everybody, for example.

They don't track roles directly, but we can achieve the same effect based on node network public keys plus chain state, which is what we usually do.
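A minimal sketch of that indirection, deriving a peer's role from its network identity plus chain state; the types (`View`, `is_backer`, the u64 stand-ins for peer and authority keys) are illustrative, not the real node's API:

```rust
use std::collections::HashMap;

// Hypothetical sketch: the View doesn't store roles, but a peer's network
// identity maps (via authority discovery) to an authority key, and chain
// state says which authorities are in the backing group.
type PeerId = u64;      // stand-in for a libp2p PeerId
type AuthorityId = u64; // stand-in for an authority-discovery key

struct View {
    // Learned from the network: which authority a peer identifies as.
    peer_authority: HashMap<PeerId, AuthorityId>,
    // From chain state: the authorities assigned to the backing group.
    backing_group: Vec<AuthorityId>,
}

impl View {
    /// A peer counts as a backer if its network identity maps to an
    /// authority in the current backing group.
    fn is_backer(&self, peer: PeerId) -> bool {
        self.peer_authority
            .get(&peer)
            .map_or(false, |a| self.backing_group.contains(a))
    }
}

fn main() {
    let view = View {
        peer_authority: HashMap::from([(1, 100), (2, 200)]),
        backing_group: vec![100],
    };
    assert!(view.is_backer(1));  // maps to authority 100, in the group
    assert!(!view.is_backer(2)); // known authority, but not a backer
    assert!(!view.is_backer(3)); // unknown peer
}
```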

Labels
A3-in_progress Pull request is in progress. No review needed at this stage.
3 participants