Optimize Backing Networking #929

Open · rphmeier opened this issue Oct 11, 2021 · 1 comment
Labels
I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task.


@rphmeier (Contributor)

This assumes paritytech/polkadot#3779.

I did some back-of-the-envelope calculations of the network bandwidth used under some ideal parameters (16MB PoV, 1000 validators, 100 parachain cores), and it's clear that bandwidth is dominated by backing.

  • Parachain Networking Figures
    • Relating PoV Size, Number of Validators (N_V), Number of Backers (N_B), Number of Parachains (N_P), and Checkers Required (N_C)
    • Validators have to fetch the PoV from a collator in T_C and distribute it to other validators in their group in T_V
    • Backing:
      • With contextual execution, we have a 12 second window.
      • 12 seconds: collator sends -> seconder validates -> seconder sends -> others validate -> gossip
      • 2s -> 3s -> 2s -> 3s -> 1s
      • Transferring in 2s requires PoV/2 MBps or (PoV * 8 / 2)Mbps between validator and collator
        • With 16MB PoVs, that's 64Mbps upload by the collator and download by the validator
      • Transferring in 2s to other members of the group requires another PoV*(N_B-1)*8/2 Mbps up.
        • With 16MB PoVs and backing groups of size 5, that's another 256Mbps up from the seconder
      • This gets a lot better with Torrent-style fetching for PoVs: high-level #968, as it ensures that bandwidth isn't wasted.
    • Availability:
      • Every 6s, each validator needs to fetch N_P chunks of size (PoV/N_V)*3.
        • At 16MB and 1000 validators, that's roughly 50KB per chunk. At 100 parachains, that's about 5MB downloaded overall, so 40Mb/6s or ~7Mbps down.
      • The backers each need to serve N_V/N_B chunks of size (PoV/N_V)*3
        • At 16MB, 1000 validators, 100 cores, that's 200 chunks of 50KB, so another ~10MB up, assuming that validators distribute their requests to backers randomly. This is every 6 seconds, so that's 80/6 Mbps or ~13Mbps.
    • Approval:
      • Every 6s, each validator needs to recover data for ~N_P*(N_C/N_V) cores, although this is bursty, as it follows a Poisson distribution.
      • As a counterpart, every 6s each validator needs to provide chunks for N_P*(N_C/N_V)/3 requesters.
      • With 1000 validators, 20 checkers, 100 parachains, and 16MB PoV, we're looking at validators recovering about 100*(1/50)*16MB or 32MB of chunks. That'd be 256Mb over 6 seconds, or another ~43Mbps down. This can be bursty.
      • For upload with the same parameters: 100*(1/50)/3 ≈ 0.67 chunks served on average every 6 seconds, with each chunk having 3*(16MB/1000) ≈ 48KB size. That's around 32KB of upload per 6 seconds, or roughly 43Kbps.
    • Disputes are rare, so shouldn't require much extra bandwidth. Basically an occasional 16MB download/upload by each validator.
    • Overall, with desired parameters, we're looking at:
      • Validator Upload: 256Mbps (backing) + ~13Mbps (availability) + ~0.04Mbps (approval) = ~270Mbps (dominated by backing!)
      • Validator Download: 64Mbps (backing, seconding) + 64Mbps (backing, group download) + ~7Mbps (availability) + ~43Mbps (approval) = ~178Mbps
    • This doesn't account for latency at all, but as backing is both the most bandwidth-intensive and the most latency-sensitive component, it makes the most sense to optimize our networking for the backing pipeline. Contextual execution will help this substantially. (The arithmetic above is double-checked in the sketch below.)
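
A minimal Rust sketch double-checking the arithmetic above, under the stated parameters (16MB PoV, N_V = 1000, N_B = 5, N_P = 100, N_C = 20) and the 3x erasure-coding overhead. This is just the bullets transcribed into code, not anything read from the implementation:

```rust
// Back-of-the-envelope check of the bandwidth figures above. All formulas
// mirror the bullets; the parameters and the 3x erasure-coding overhead are
// the stated assumptions, not values read from the runtime.
fn main() {
    let pov = 16.0_f64; // PoV size, MB
    let n_v = 1000.0; // validators
    let n_b = 5.0; // backing group size
    let n_p = 100.0; // parachain cores
    let n_c = 20.0; // approval checkers per candidate

    // Backing: fetch the PoV from the collator in 2s, then redistribute it
    // to the other N_B - 1 group members in another 2s.
    let backing_down_mbps = pov * 8.0 / 2.0; // 64
    let backing_up_mbps = pov * (n_b - 1.0) * 8.0 / 2.0; // 256

    // Availability: every 6s each validator fetches its own chunk of each of
    // the N_P candidates; a chunk is (PoV / N_V) * 3 due to erasure coding.
    let chunk = pov / n_v * 3.0; // ~0.048 MB
    let avail_down_mbps = n_p * chunk * 8.0 / 6.0; // ~6.4
    // Each backer serves N_V / N_B chunks per candidate its group backed.
    let avail_up_mbps = (n_v / n_b) * chunk * 8.0 / 6.0; // ~12.8

    // Approval: every 6s a validator recovers ~N_P * (N_C / N_V) full PoVs
    // and serves chunks for about a third as many recoveries.
    let approval_down_mbps = n_p * (n_c / n_v) * pov * 8.0 / 6.0; // ~42.7
    let approval_up_mbps = n_p * (n_c / n_v) / 3.0 * chunk * 8.0 / 6.0; // ~0.04

    println!("up:   backing {backing_up_mbps:.0} + availability {avail_up_mbps:.1} + approval {approval_up_mbps:.2} Mbps");
    println!("down: backing {backing_down_mbps:.0} + availability {avail_down_mbps:.1} + approval {approval_down_mbps:.1} Mbps");
}
```
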
@pepyakin (Contributor)

Leaving this here for persistence:

  1. Off-chain XCMP, in the current line of thought, will also be a consumer of bandwidth. For the most part, we can think of it as using pretty much the same resources as the PoV, so we can just say that it will eat up some of the 16 MiB of the PoV. On top of that, however, there is additional bandwidth for the collators of the receiving chains to recover the messages. We were thinking that there should be a cooperative case, i.e. that the collators of the sending network would serve the messages; however, that doesn't help us here, since we still need to reserve bandwidth for that anyway. (A toy accounting sketch follows after this list.)
  2. Off-chain code. AFAIU, this will require hand-offs between the validators of the current and the next set. This doesn't seem too bad, taking into account the large session window.
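
A toy sketch of the accounting in point 1, assuming (purely for illustration) that XCMP messages occupy an `xcmp_mib` share of the 16 MiB PoV and that a receiving chain's collator downloads its inbound messages once per 6s block; none of these names or numbers come from the protocol:

```rust
// Toy accounting for off-chain XCMP bandwidth (point 1 above). The split
// below is hypothetical; `xcmp_mib` is an assumed share, not a protocol value.
fn main() {
    let pov_mib = 16.0_f64; // total PoV budget, MiB
    let xcmp_mib = 2.0; // assumed share of the PoV carrying XCMP messages

    // Messages ride inside the PoV, so the validator-side figures above are
    // unchanged; the parachain simply has less of the budget for itself.
    let left_for_parachain = pov_mib - xcmp_mib;

    // Bandwidth a receiving chain's collator must reserve to recover its
    // inbound messages every 6s block, even if sending-chain collators
    // cooperatively serve them most of the time.
    let recovery_mbps = xcmp_mib * 8.0 / 6.0;

    println!("PoV left for parachain use: {left_for_parachain} MiB");
    println!("receiving collator reserve: ~{recovery_mbps:.1} Mbps");
}
```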

Sophia-Gold transferred this issue from paritytech/polkadot on Aug 24, 2023
the-right-joyce added the I9-optimisation label and removed the I10-optimisation label on Aug 25, 2023