Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update epoch slots to include all missing slots #8276

Merged
merged 8 commits into from
Feb 17, 2020

Conversation

pgarg66
Copy link
Contributor

@pgarg66 pgarg66 commented Feb 14, 2020

Problem

Some shreds could be missing across the cluster (e.g. due to turbine failure). Such shreds will be repaired by all/most nodes. Need a mechanism to detect that most of the cluster is missing shreds/slots, so that the such shreds could be pushed thru turbine again.

Summary of Changes

This change augments epoch_slots information that's pushed thru gossip. It'll include information about all incomplete slots in the current epoch. Since the list could be large, it's getting compressed using gzip.

@codecov
Copy link

codecov bot commented Feb 14, 2020

Codecov Report

Merging #8276 into master will increase coverage by <.1%.
The diff coverage is 96.8%.

@@           Coverage Diff            @@
##           master   #8276     +/-   ##
========================================
+ Coverage    80.6%   80.6%   +<.1%     
========================================
  Files         253     253             
  Lines       55247   55386    +139     
========================================
+ Hits        44550   44683    +133     
- Misses      10697   10703      +6

.expect("expected to find at least one slot");
let last_slot = incomplete_slots
.iter()
.last()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this iterator implements DoubleEndedIterator , next_back() is a more efficient way to get the last item without iterating over the entire set

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LMK when you are done with the review. I'll update the PR with the required changes.

aeyakovenko
aeyakovenko previously approved these changes Feb 14, 2020
Copy link
Member

@aeyakovenko aeyakovenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you guys are fast!

@mergify mergify bot dismissed aeyakovenko’s stale review February 14, 2020 20:32

Pull request has been modified.

@pgarg66
Copy link
Contributor Author

pgarg66 commented Feb 17, 2020

@carllin , the latest patch should start limiting the cache size, instead of limiting it based on roots.

@aeyakovenko , there are two separate changes we can make down the line.

  1. Split cache (last N completed slots), and stash (missing slots in last epoch, that are older than cache) in separate gossip keys
  2. If the cache/stash grows beyond MTU size, design a mechanism to fragment/reassemble these messages via multiple MTU sized chunks.

@aeyakovenko
Copy link
Member

@pgarg66 we just need to do 1. And replace the oldest stash. This would also reduce the churn in gossip

@pgarg66
Copy link
Contributor Author

pgarg66 commented Feb 17, 2020

@pgarg66 we just need to do 1. And replace the oldest stash. This would also reduce the churn in gossip

Cool. We are already pruning the old stash in current PR. I'll merge this PR. We can do leftover from #1 as a separate PR

@pgarg66 pgarg66 merged commit 0d5c123 into solana-labs:master Feb 17, 2020
@pgarg66 pgarg66 deleted the fix-repair branch February 17, 2020 22:36
@pgarg66 pgarg66 added the v0.23 label Feb 20, 2020
mergify bot pushed a commit that referenced this pull request Feb 20, 2020
* Update epoch slots to include all missing slots

* new test for compress/decompress

* address review comments

* limit cache based on size, instead of comparing roots

(cherry picked from commit 0d5c123)

# Conflicts:
#	Cargo.lock
pgarg66 added a commit to pgarg66/solana that referenced this pull request Feb 20, 2020
* Update epoch slots to include all missing slots

* new test for compress/decompress

* address review comments

* limit cache based on size, instead of comparing roots
pgarg66 added a commit that referenced this pull request Feb 20, 2020
* Update epoch slots to include all missing slots

* new test for compress/decompress

* address review comments

* limit cache based on size, instead of comparing roots
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants