Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.18: Add in metrics for detecting Redundant Pulls (backport of #199) #251

Merged
merged 1 commit into from
Mar 14, 2024

Conversation

mergify[bot]
Copy link

@mergify mergify bot commented Mar 14, 2024

Problem

We had previously added in a metric for tracking gossip push messages through the network in PR: #32725. However, this metric does not account for redundant pull requests.
Redundant Pull: A node receives a message via PullResponse and then receives the same message via Push.
Redundant Pulls prevent us from accurately calculating how well messages are propagating via Push.

Summary of Changes

Add in a metric to report when we receive a Push for a message we already (and first) received via PullResponse

  • num_push_dups changed to num_push_recv and now tracks the number of times we have received a push message.
  • add a new metric to cluster_info_crds_stats called num_redundant_pull_responses. It counts the number of times a unique message is received via a Redundant Pull

Identifying redundant Pulls:

  1. Receive a message via PullRequest that successfully updates crds.table, set the num_push_recv of this message to 0. Set num_push_recv to 1 if it is a PushMessage
  2. Receive the same message again but via Push, it will fail to insert. Since the already existing entry has num_push_recv == 0, we know this is a Redundant Pull.
  3. Increment CrdsStats.num_redundant_pull_responses

Calculating fraction of messages received via Redundant Pull:

num_redundant_pull_responses / (num_redundant_pull_responses + all-push)

Monogon Simulation: % of redundant pulls in a 100 validator cluster

We see a mean Redundant Pull percentage of ~0.2%. But it does seem to increase with the number of validators
Screenshot 2024-03-11 at 8 09 13 PM

@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 95.65217% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 81.6%. Comparing base (585a413) to head (4cb9fec).

Additional details and impacted files
@@           Coverage Diff           @@
##            v1.18     #251   +/-   ##
=======================================
  Coverage    81.6%    81.6%           
=======================================
  Files         827      827           
  Lines      224480   224496   +16     
=======================================
+ Hits       183262   183297   +35     
+ Misses      41218    41199   -19     

@gregcusack gregcusack merged commit 04356e7 into v1.18 Mar 14, 2024
33 checks passed
@gregcusack gregcusack deleted the mergify/bp/v1.18/pr-199 branch March 14, 2024 20:59
anwayde pushed a commit to firedancer-io/agave that referenced this pull request Jul 23, 2024
…-xyz#199) (anza-xyz#251)

Add in metrics for detecting Redundant Pulls (anza-xyz#199)

(cherry picked from commit d49ceb0)

Co-authored-by: Greg Cusack <greg.cusack@anza.xyz>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants