Fix: snappy downloader #5393

Merged
merged 42 commits into from
Nov 7, 2024

Conversation

jcnelson
Member

@jcnelson jcnelson commented Oct 28, 2024

This fixes a few bugs in the relayer and networking stack:

  • It removes a convoy effect that can happen when the node is under load. Before, the channel between the p2p thread and relayer thread could grow unbounded if the relayer couldn't keep up with bursts of NetworkResults. In this PR, the p2p thread merges outstanding NetworkResults into a single NetworkResult and drops / consolidates obsolete data, which both minimizes the relayer's total workload and minimizes the time between receiving a data-bearing message and processing it.

  • It fixes the block downloader so that it detects and deprioritizes unhealthy replicas during block download. As a result, most of the time the node only queries replicas that can serve it data. It also improves error and retry logging in the downloader.

  • To stress-test the downloader, it adds an option to disable block-push altogether, so the node is forced to download everything.

  • It fixes an off-by-one error in the p2p stack which was preventing it from caching reward sets. Instead, the p2p stack would always fetch reward sets from disk, which led to performance degradation.
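
To illustrate the first point, here is a minimal sketch of the merge-before-send idea. It is not the code in this PR: `PendingResult`, `CoalescingSender`, and their fields are hypothetical stand-ins, and the real NetworkResult carries much more data. The sketch assumes a bounded channel with one slot; whatever cannot be sent is held back and merged with the next result, so the receiver never sees a backlog.

```rust
use std::collections::BTreeSet;
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical stand-in for a network result (assumption, not the real type).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct PendingResult {
    /// IDs of blocks received; duplicates collapse when results are merged.
    pub block_ids: BTreeSet<u64>,
    /// Highest chain tip height observed so far.
    pub tip_height: u64,
}

impl PendingResult {
    /// Fold another result into this one: union the block IDs, keep the highest tip.
    pub fn merge(&mut self, newer: PendingResult) {
        self.block_ids.extend(newer.block_ids);
        self.tip_height = self.tip_height.max(newer.tip_height);
    }
}

/// Hypothetical sender that keeps at most one result queued and one held back.
pub struct CoalescingSender {
    tx: SyncSender<PendingResult>,
    unsent: Option<PendingResult>,
}

impl CoalescingSender {
    /// Create the sender together with the receiving end of a one-slot channel.
    pub fn new() -> (Self, Receiver<PendingResult>) {
        let (tx, rx) = sync_channel(1);
        (Self { tx, unsent: None }, rx)
    }

    /// Merge `result` with any held-back result and try to send the merged value.
    /// Returns true if it was queued, false if it was held back because the
    /// channel is full (or dropped because the receiver is gone).
    pub fn send(&mut self, result: PendingResult) -> bool {
        let merged = match self.unsent.take() {
            Some(mut held) => {
                held.merge(result);
                held
            }
            None => result,
        };
        match self.tx.try_send(merged) {
            Ok(()) => true,
            Err(TrySendError::Full(back)) => {
                // Receiver is busy: keep the merged result for the next attempt.
                self.unsent = Some(back);
                false
            }
            Err(TrySendError::Disconnected(_)) => false,
        }
    }
}
```

In this sketch the producer never blocks, and a slow consumer sees one consolidated result on its next read instead of a queue of stale ones.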

@jcnelson jcnelson requested a review from a team as a code owner October 28, 2024 20:55
jferrant
jferrant previously approved these changes Oct 28, 2024
jcnelson and others added 20 commits October 28, 2024 17:48
… so that we only forward results that contain blocks (drop tx and stackerdb messages)
…ded), and merge un-sent NetworkResult's in order to keep the queue length bound to at most one outstanding NetworkResult
… and clean out completed tenures based on whether or not we see them processed
@jcnelson jcnelson changed the title Fix: drain relayer channel Fix: snappy downloader Nov 1, 2024
@jcnelson jcnelson requested a review from jferrant November 1, 2024 21:13
obycode
obycode previously approved these changes Nov 5, 2024
Contributor

@obycode obycode left a comment

LGTM

@jferrant
Collaborator

jferrant commented Nov 5, 2024

I think this breaks the simple_neon_integration test. I don't see this failing anywhere else (it passes on develop with prom metrics enabled). It seems there was a change to the prometheus metric in this PR that is screwing it up.

Collaborator

@jferrant jferrant left a comment

Will reapprove once simple_neon_integration test is fixed.

Member

@kantai kantai left a comment

LGTM -- will approve once the prom test issue is resolved

… reward set for nakamoto prepare phases eagerly, and pass the stacks tip height via NetworkResult to the relayer so it can update prometheus
stackslib/src/net/p2p.rs (outdated, resolved)
Member

@kantai kantai left a comment

LGTM, just some logging comments

@jcnelson jcnelson requested a review from kantai November 7, 2024 18:53
Contributor

@obycode obycode left a comment

LGTM!

@jcnelson jcnelson dismissed jferrant’s stale review November 7, 2024 20:39

All tests are passing

@jcnelson jcnelson added this pull request to the merge queue Nov 7, 2024
Merged via the queue into develop with commit 9c5d822 Nov 7, 2024
1 check passed
@blockstack-devops
Contributor

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@stacks-network stacks-network locked as resolved and limited conversation to collaborators Nov 15, 2024