Optimize Backing Networking #929

Open · rphmeier opened this issue Oct 11, 2021 · 1 comment
Labels
I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task.


@rphmeier (Contributor)

This assumes paritytech/polkadot#3779.

I did some back-of-the-envelope calculations of the network bandwidth used under some ideal parameters (16MB PoV, 1000 validators, 100 parachain cores), and it's clear that bandwidth is dominated by backing.

  • Parachain Networking Figures
    • Relating PoV Size, Number of Validators (N_V), Number of Backers (N_B), Number of Parachains (N_P), and Checkers Required (N_C)
    • Validators have to fetch the PoV from a collator in T_C and distribute it to other validators in their group in T_V
    • Backing:
      • With contextual execution, we have a 12 second window.
      • 12 seconds: collator sends -> seconder validates -> seconder sends -> others validate -> gossip
      • 2s -> 3s -> 2s -> 3s -> 1s
      • Transferring in 2s requires PoV/2 MBps or (PoV * 8 / 2)Mbps between validator and collator
        • With 16MB PoVs, that's 64Mbps upload by the collator and download by the validator
      • Transferring in 2s to other members of the group requires another PoV*(N_B-1)*8/2 Mbps up.
        • With 16MB PoVs and backing groups of size 5, that's another 256Mbps up from the seconder
      • This gets a lot better with Torrent-style fetching for PoVs: high-level #968, as it ensures that bandwidth isn't wasted.
    • Availability:
      • Every 6s, each validator needs to fetch N_P chunks of size (PoV/N_V)*3.
        • At 16MB and 1000 validators, that's roughly 50KB per chunk. At 100 parachains, that's about 5MB downloaded overall, so 40Mb/6s or ~7Mbps down.
      • The backers each need to serve N_V/N_B chunks of size (PoV/N_V)*3
        • At 16MB, 1000 validators, 100 cores, that's 200 chunks of 50KB, so another ~10MB up, assuming that validators distribute their requests to backers randomly. This is every 6 seconds, so that's 80/6 Mbps or ~13Mbps.
    • Approval:
      • Every 6s, each validator needs to recover data for ~N_P*(N_C/N_V) cores, although this is bursty, as it follows a Poisson distribution.
      • As a counterpart, every 6s each validator needs to provide chunks for N_P*(N_C/N_V)/3 requesters.
      • With 1000 validators, 20 checkers, 100 parachains, and 16MB PoV, we're looking at validators recovering about 100*(1/50)*16MB or 32MB of chunks. That'd be 256Mb over 6 seconds, or another ~43Mbps down. This can be bursty.
      • For upload with the same parameters: 100*(1/50)/3 ≈ 0.67 chunks served on average every 6 seconds, with each chunk having 3*(16MB/1000) ≈ 48KB size. That's around 32KB of upload per 6 seconds, or roughly 43Kbps.
    • Disputes are rare, so shouldn't require much extra bandwidth. Basically an occasional 16MB download/upload by each validator.
    • Overall, with desired parameters, we're looking at:
      • Validator Upload: 256Mbps (backing) + ~13Mbps (availability) + ~0.04Mbps (approval) = ~270Mbps (dominated by backing!)
      • Validator Download: 64Mbps (backing, seconding) + 64Mbps (backing, group download) + ~7Mbps (availability) + ~43Mbps (approval) = ~178Mbps
    • This doesn't account for latency at all, but as backing is both the most bandwidth-intensive and the most latency-sensitive component, it makes the most sense to optimize our networking for the backing pipeline. Contextual execution will help this substantially. (The arithmetic above is double-checked in the sketch below.)
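
A minimal Rust sketch double-checking the arithmetic above, under the stated parameters (16MB PoV, N_V = 1000, N_B = 5, N_P = 100, N_C = 20) and the 3x erasure-coding overhead. This is just the bullets transcribed into code, not anything read from the implementation:

```rust
// Back-of-the-envelope check of the bandwidth figures above. All formulas
// mirror the bullets; the parameters and the 3x erasure-coding overhead are
// the stated assumptions, not values read from the runtime.
fn main() {
    let pov = 16.0_f64; // PoV size, MB
    let n_v = 1000.0; // validators
    let n_b = 5.0; // backing group size
    let n_p = 100.0; // parachain cores
    let n_c = 20.0; // approval checkers per candidate

    // Backing: fetch the PoV from the collator in 2s, then redistribute it
    // to the other N_B - 1 group members in another 2s.
    let backing_down_mbps = pov * 8.0 / 2.0; // 64
    let backing_up_mbps = pov * (n_b - 1.0) * 8.0 / 2.0; // 256

    // Availability: every 6s each validator fetches its own chunk of each of
    // the N_P candidates; a chunk is (PoV / N_V) * 3 due to erasure coding.
    let chunk = pov / n_v * 3.0; // ~0.048 MB
    let avail_down_mbps = n_p * chunk * 8.0 / 6.0; // ~6.4
    // Each backer serves N_V / N_B chunks per candidate its group backed.
    let avail_up_mbps = (n_v / n_b) * chunk * 8.0 / 6.0; // ~12.8

    // Approval: every 6s a validator recovers ~N_P * (N_C / N_V) full PoVs
    // and serves chunks for about a third as many recoveries.
    let approval_down_mbps = n_p * (n_c / n_v) * pov * 8.0 / 6.0; // ~42.7
    let approval_up_mbps = n_p * (n_c / n_v) / 3.0 * chunk * 8.0 / 6.0; // ~0.04

    println!("up:   backing {backing_up_mbps:.0} + availability {avail_up_mbps:.1} + approval {approval_up_mbps:.2} Mbps");
    println!("down: backing {backing_down_mbps:.0} + availability {avail_down_mbps:.1} + approval {approval_down_mbps:.1} Mbps");
}
```
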
@pepyakin (Contributor)

Leaving this here for persistence:

  1. Off-chain XCMP, in the current line of thought, will also be a consumer of bandwidth. For the most part, we can think of it as using pretty much the same resources as the PoV, so we can just say that it will eat up some of the 16 MiB of the PoV. On top of that, however, there is additional bandwidth for the collators of the receiving chains to recover the messages. We were thinking that there should be a cooperative case, i.e. that the collators of the sending network would serve the messages; however, that doesn't help us here, since we still need to reserve bandwidth for that anyway. (A toy accounting sketch follows after this list.)
  2. Off-chain code. AFAIU, this will require hand-offs between the validators of the current and the next set. This doesn't seem too bad, taking into account the large session window.
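
A toy sketch of the accounting in point 1, assuming (purely for illustration) that XCMP messages occupy an `xcmp_mib` share of the 16 MiB PoV and that a receiving chain's collator downloads its inbound messages once per 6s block; none of these names or numbers come from the protocol:

```rust
// Toy accounting for off-chain XCMP bandwidth (point 1 above). The split
// below is hypothetical; `xcmp_mib` is an assumed share, not a protocol value.
fn main() {
    let pov_mib = 16.0_f64; // total PoV budget, MiB
    let xcmp_mib = 2.0; // assumed share of the PoV carrying XCMP messages

    // Messages ride inside the PoV, so the validator-side figures above are
    // unchanged; the parachain simply has less of the budget for itself.
    let left_for_parachain = pov_mib - xcmp_mib;

    // Bandwidth a receiving chain's collator must reserve to recover its
    // inbound messages every 6s block, even if sending-chain collators
    // cooperatively serve them most of the time.
    let recovery_mbps = xcmp_mib * 8.0 / 6.0;

    println!("PoV left for parachain use: {left_for_parachain} MiB");
    println!("receiving collator reserve: ~{recovery_mbps:.1} Mbps");
}
```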

Sophia-Gold transferred this issue from paritytech/polkadot on Aug 24, 2023
the-right-joyce added the I9-optimisation label and removed the I10-optimisation label on Aug 25, 2023