Shred Repair Request #34771
Conversation
Force-pushed from 61eacd0 to 5f7dc3d
validator/src/main.rs (Outdated)
@@ -1691,20 +1709,6 @@ pub fn main() {
    } else {
        (None, None)
    };
    admin_rpc_service::run(
Curious why the RPC service is initialized before Node::new. I wonder if we break anything by moving it after?
Yeah, I was wary of this. I haven't been able to spot any problems or notice any negative effects in my testing.
Here's what now happens before the Admin RPC service starts up that used to happen after (a sketch follows the list):
- Initialize gossip host IP address (from command line param, entrypoints, or localhost)
- Initialize gossip address (from gossip host IP + command line port or available port 0/1)
- Initialize TPU address (from command line param)
- Initialize TPU forward address (from command line param)
- Initialize cluster entrypoints contact info vector
- Initialize node struct w/ contact and socket info
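To make that ordering concrete, here is a minimal, self-contained sketch of the reordered flow. `Node` and `start_admin_rpc` are simplified stand-ins for the real `Node::new` and `admin_rpc_service::run`, and every address below is hypothetical:

```rust
use std::net::{IpAddr, Ipv4Addr, SocketAddr};

// Illustrative stand-in for the validator's Node struct, not the real type.
#[allow(dead_code)]
struct Node {
    gossip: SocketAddr,
    tpu: Option<SocketAddr>,
    tpu_forwards: Option<SocketAddr>,
    entrypoints: Vec<SocketAddr>,
}

fn start_admin_rpc(node: &Node) {
    // Stand-in for admin_rpc_service::run; by this point the node's
    // contact and socket info already exists.
    println!("admin rpc can see gossip addr {}", node.gossip);
}

fn main() {
    // 1. Gossip host IP (command-line param, entrypoints, or localhost).
    let gossip_host: IpAddr = IpAddr::V4(Ipv4Addr::LOCALHOST);
    // 2. Gossip address (gossip host IP + configured or available port).
    let gossip = SocketAddr::new(gossip_host, 8001);
    // 3./4. TPU and TPU-forward addresses (command-line params).
    let tpu = Some(SocketAddr::new(gossip_host, 8003));
    let tpu_forwards = Some(SocketAddr::new(gossip_host, 8004));
    // 5. Cluster entrypoints contact-info vector.
    let entrypoints = vec![SocketAddr::new(gossip_host, 8000)];
    // 6. Node struct with contact and socket info.
    let node = Node { gossip, tpu, tpu_forwards, entrypoints };
    // Only now does the admin RPC service start, whereas it previously
    // started before any of the above.
    start_admin_rpc(&node);
}
```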
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decided to move the repair socket to the post-init structure for consistency, which also lets us avoid changing the order here.
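A minimal sketch of that idea, assuming the validator's post-init metadata pattern (the real `AdminRpcRequestMetadataPostInit` carries more fields than shown here): the admin RPC service starts with an empty slot for the repair socket, which is filled in after node initialization, so the startup order itself does not have to change.

```rust
use std::net::UdpSocket;
use std::sync::{Arc, RwLock};

// Hypothetical post-init holder; field set is illustrative only.
#[derive(Default)]
struct AdminRpcRequestMetadataPostInit {
    repair_socket: Option<Arc<UdpSocket>>,
}

fn main() -> std::io::Result<()> {
    // The admin RPC service starts with an empty post-init slot...
    let post_init = Arc::new(RwLock::new(AdminRpcRequestMetadataPostInit::default()));
    // ...and the repair socket is installed once the node has been built.
    let repair_socket = Arc::new(UdpSocket::bind("127.0.0.1:0")?);
    post_init.write().unwrap().repair_socket = Some(repair_socket);
    assert!(post_init.read().unwrap().repair_socket.is_some());
    Ok(())
}
```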
Codecov Report

@@            Coverage Diff            @@
##           master   #34771     +/-   ##
=========================================
- Coverage    81.8%    81.7%     -0.1%
=========================================
  Files         825      825
  Lines      223160   223249       +89
=========================================
+ Hits       182553   182593       +40
- Misses      40607    40656       +49
Things appear to be working as expected after adding outstanding repair requests. I'm seeing the following logs:
Running with the following command:
Performed testing with the following temp logging diff:
Force-pushed from 61c54a7 to 5cbaeb0
Force-pushed from 5cbaeb0 to fca1ad8
Awesome, test results and code look good.
Problem
It would be nice to have the ability to manually trigger a shred repair.
Summary of Changes
Adds an admin RPC/CLI interface for manually initiating shred repair. Inputs include the pubkey of the node to request repair from, the slot number, and the shred index.
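As a hedged sketch of the interface shape (the function name, types, and the slot/shred values below are assumptions, not the PR's literal API):

```rust
type Slot = u64;

// Inputs named in the summary: the peer to request repair from, the slot,
// and the shred index.
fn repair_shred_from_peer(pubkey: &str, slot: Slot, shred_index: u64) {
    // Translate the inputs into a repair-protocol packet and send it over
    // the validator's repair socket (elided in this sketch).
    println!("requesting shred {shred_index} of slot {slot} from {pubkey}");
}

fn main() {
    // Peer pubkey taken from the testnet test described below; the slot
    // and shred index are hypothetical.
    repair_shred_from_peer(
        "9YVpEeZf8uBoUtzCFC6SSFDDqPt16uKFubNhLvGxeUDy",
        12_345,
        7,
    );
}
```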
Added a test to confirm that the serialization/translation from inputs to packet to repair request object works.
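A round-trip test in that spirit might look like the sketch below, assuming the serde and bincode crates for the wire encoding (which the repair protocol uses); `RepairRequest` is a simplified stand-in for the real repair-protocol type:

```rust
use serde::{Deserialize, Serialize};

// Hypothetical repair-request shape; the real repair protocol carries more
// context (headers, sender/recipient identities) than shown here.
#[derive(Debug, PartialEq, Serialize, Deserialize)]
struct RepairRequest {
    slot: u64,
    shred_index: u64,
}

#[test]
fn inputs_round_trip_through_packet_bytes() {
    let request = RepairRequest { slot: 12_345, shred_index: 7 };
    // Serialize into the bytes that would ride in a packet...
    let packet_bytes = bincode::serialize(&request).unwrap();
    // ...and confirm they deserialize back to the same request.
    let decoded: RepairRequest = bincode::deserialize(&packet_bytes).unwrap();
    assert_eq!(decoded, request);
}
```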
Tested that this was working on testnet by spinning up a node, issuing a repair call to 9YVpEeZf8uBoUtzCFC6SSFDDqPt16uKFubNhLvGxeUDy, and logging the incoming repaired shred through TVU.