v1.18: Add in metrics for detecting Redundant Pulls (backport of #199) #251
Problem
We previously added a metric for tracking gossip push messages through the network in PR #32725. However, that metric does not account for redundant pulls.
Redundant Pull: A node receives a message via `PullResponse` and then receives the same message via `Push`.

Redundant Pulls prevent us from accurately calculating how well messages are propagating via `Push`.

Summary of Changes
- Add a metric to report when we receive a `Push` for a message we already (and first) received via `PullResponse`.
- `num_push_dups` changed to `num_push_recv` and now tracks the number of times we have received a push message.
- New metric in `cluster_info_crds_stats` called `num_redundant_pull_responses`. It counts the number of times a unique message is received via a Redundant Pull.

Identifying redundant Pulls:
1. On a `PullRequest` that successfully updates `crds.table`, set the `num_push_recv` of this message to 0. Set `num_push_recv` to 1 if it is a `PushMessage`.
2. If the same message then arrives via `Push`, it will fail to insert. Since the already existing entry has `num_push_recv == 0`, we know this is a Redundant Pull.
3. Increment `CrdsStats.num_redundant_pull_responses`.
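
The identification steps above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual gossip code: `Route`, `Entry`, `Stats`, `Table`, the string keys, and the `version` field are all hypothetical stand-ins for the real CRDS types. Only `num_push_recv` and `num_redundant_pull_responses` are names taken from this PR, and the sketch assumes a value that arrives by pull starts with `num_push_recv == 0`.

```rust
use std::collections::HashMap;

/// How a value reached this node (hypothetical, simplified routes).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Route {
    PullResponse,
    Push,
}

/// Per-message bookkeeping (hypothetical). `num_push_recv` mirrors the
/// counter described in this PR.
#[derive(Debug)]
struct Entry {
    version: u64,
    num_push_recv: u64,
}

#[derive(Debug, Default)]
pub struct Stats {
    pub num_redundant_pull_responses: u64,
}

/// Simplified stand-in for the CRDS table (assumption, not the real type).
#[derive(Debug, Default)]
pub struct Table {
    entries: HashMap<String, Entry>,
    pub stats: Stats,
}

impl Table {
    /// Returns true if the value updated the table, false if the insert
    /// failed because an equal or newer value is already stored.
    pub fn insert(&mut self, key: &str, version: u64, route: Route) -> bool {
        if let Some(entry) = self.entries.get_mut(key) {
            if entry.version >= version {
                if route == Route::Push {
                    // First push for a value that originally arrived by
                    // pull: count it once as a Redundant Pull.
                    if entry.num_push_recv == 0 {
                        self.stats.num_redundant_pull_responses += 1;
                    }
                    entry.num_push_recv += 1;
                }
                return false;
            }
        }
        // Successful update: 0 if it came by pull, 1 if it came by push.
        let num_push_recv = match route {
            Route::PullResponse => 0,
            Route::Push => 1,
        };
        self.entries
            .insert(key.to_string(), Entry { version, num_push_recv });
        true
    }
}
```

In this sketch, inserting a key via `Route::PullResponse` and then offering the same key and version via `Route::Push` makes the second call return `false` and bumps `stats.num_redundant_pull_responses` once. Later pushes of that value only increase `num_push_recv`, so each unique message is counted at most once.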
Calculating fraction of messages received via Redundant Pull:
Monogon Simulation: % of redundant pulls in a 100 validator cluster
We see a mean Redundant Pull percentage of ~0.2%, though it does seem to increase with the number of validators.
![Screenshot 2024-03-11 at 8 09 13 PM](https://private-user-images.githubusercontent.com/11299967/311930199-a9f2626b-e510-4d18-9d30-7e9f7e8a65a2.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkzODgwNTcsIm5iZiI6MTczOTM4Nzc1NywicGF0aCI6Ii8xMTI5OTk2Ny8zMTE5MzAxOTktYTlmMjYyNmItZTUxMC00ZDE4LTlkMzAtN2U5ZjdlOGE2NWEyLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEyVDE5MTU1N1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTQ2YWE3NWMwOTBiYjE3NGFlM2E2NGIyNmY4OTU5NWZlMDg4NWZkMzQwNDg3ZTU4NjEzMDg4ZWRlYWRiMmJhNzMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.oK3TG5G04y3AwsmMmEtHilWhb9Qki4ozl7UaTm7A4y8)