eth, p2p/msgrate: move peer QoS tracking to its own package and use it for snap #22876
Way back when we wrote the downloader in 2015, we added various mechanisms to try and estimate the live throughput of our connected peers. The rationale was that a node will always have both faster and slower peers, so assigning the same amount of retrieval work to each would result in wildly different response times. This is problematic in chain syncing, because a very slow peer can block faster ones from delivering, as we need to import (verify) the data as a stream. By live-tracking the capacities of the different peers, we could make delivery times the same independent of how fast a peer was, so chain data import was fully stabilized.
A second important feature that this idea turned into was automatic adjustment to local resources (e.g. bandwidth). If a node has many peers connected, those might have enough capacity to overload the local node. By dynamically tracking how much data peers can deliver to us, we implicitly also ensure that we never request more than what we can download from them globally. This makes the downloader more or less stable across various connection types, and it can work even on satellite links with huge latencies.
Whilst the mechanism was effective, it was hacked in all over the downloader. With snap sync being implemented completely outside of the original downloader, it became obvious that we need to abstract out all that capacity estimation and rate limiting logic if we want to reuse it elsewhere too. Thus this PR, which does exactly that. It introduces a
p2p/msgrate
package to track and limit requests based on delivery times and amounts; it refactors the downloader to use the new package instead of the old mechanisms; and it also adds the new traffic shaping to snap sync. Performance-wise, snap syncing with the old code vs. the new code takes the same amount of time.
What we can notice with this PR is that, whilst previously snap sync had quite random serving times (at 100KB request caps) across the different peers, this PR stabilizes them.
The serving times now hover around the same mean and follow a nice normal distribution, all whilst allowing packet sizes to range between 64KB and 512KB.
Packet counts also go down as the PR can detect that larger packets can also be fulfilled within the same time limits, so there's no need for so many round trips.