Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eth/protocols, prp/tracker: add support for req/rep rtt tracking #22608

Merged
merged 4 commits into from
Apr 22, 2021

Conversation

karalabe
Copy link
Member

@karalabe karalabe commented Apr 1, 2021

This PR is a counterpart to #22581, adding metrics to the request/reply RTTs too.

Given that we (currently) do not have a centralized dispatcher which matches requests with replies, this PR introduces a global request tracker (per protocol). The tracker can be notified of outbound requests and inbound replies, and it will match them up and update the relevant metrics.

Since the request/reply pattern is stateful, there are a few caveats in the tracker:

  • The tracker maintains a list of all currently pending requests. The only way to remove an item from this list is to either deliver a response; or for the request to time out. The timeout however is not the same as the downloader might use, rather a global number per protocol (5 min for eth, 1 min for snap). This allows us to a) detect loooong deliveries even if the calling code timed out and discarded the request; b) track expiries with a doubly linked list, making deletions and expiry O(1).
  • Given that we track every send request, these can accumulate quite a bit if we have a huge peer count. In theory it shouldn't go over the number of peers / packet type, but it can temporarily on sync restarts or peer churn. I've added a metric to display these too to avoid surprises, but we might consider some hard enforced limit to make sure we can't go OOM due to some bug.

Open questions:

  • I've used one singleton global tracker per protocol. Creating a scoped tracker in package eth (backend) and propagating it to all relevant call sites would become quite nasty; and our metrics can't differentiate between 2 node instances running within the same process either way. May or may not be a goo idea, wanted to emphasize it.
  • We could fairly easily add the possibility to track request/reply sizes too. Unsure if this would be more help or more noise.

TL;DR

Screenshot from 2021-04-01 10-25-52

Copy link
Contributor

@holiman holiman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, if you add some kind of hard limit to the number of tracked requests (it can be large though)

@karalabe
Copy link
Member Author

@holiman PTAL

Added a cap, looks like this when hit (I set it to 1 to test it):

Screenshot from 2021-04-21 14-20-16

@karalabe karalabe added this to the 1.10.3 milestone Apr 21, 2021
@karalabe karalabe merged commit 1fb9a6d into ethereum:master Apr 22, 2021
atif-konasl pushed a commit to frozeman/pandora-execution-engine that referenced this pull request Oct 15, 2021
…ereum#22608)

* eth/protocols, prp/tracker: add support for req/rep rtt tracking

* p2p/tracker: sanity cap the number of pending requests

* pap/tracker: linter <3

* p2p/tracker: disable entire tracker if no metrics are enabled
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants