on restart miner shouldn't dial client #463
75e7831 to dc6bb03
I definitely have no problem removing the provider restart
Regarding the "move restart to data transfer" -- this will be more clear when you're in the guts of data transfer, but there is a qualitative difference between restarting in the libp2p protocol and the concept of "stalled".
The current retry logic you identify is for the data transfer libp2p protocol. However, data transfer is both its own libp2p protocol and go-graphsync. go-graphsync has its own retry logic (which, side note, needs some work). A "stall" occurs when go-graphsync experiences a network error -- which occurs when it exhausts its own internal retry logic, which is, among other things, fairly short. At that point, the graphsync request fails, and to "restart" data transfer it actually has to internally make a new graphsync request (this is when you call "Restart Data Transfer").
So we have:
- retrying the data transfer libp2p protocol (internal to go-data-transfer)
- retrying the go-graphsync libp2p protocol (internal to go-graphsync)
- retrying the go-graphsync request (internal to go-data-transfer but currently ONLY when you externally call RestartDataTransfer)
I just want to identify that there are some layers of complexity here -- not saying they are necessary, but to make a clean solution you probably need to understand what they are :)
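To make the difference between an internal retry and an external restart concrete, here is a toy model in Go. It is only a sketch: the `transfer` type, its fields, and the `retry` helper are hypothetical names invented for illustration and are not the go-data-transfer or go-graphsync APIs.

```go
package main

import (
	"errors"
	"fmt"
)

// errNetwork stands in for any network-level failure.
var errNetwork = errors.New("network error")

// retry calls op up to attempts times and returns the last error if all fail.
func retry(attempts int, op func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
	}
	return err
}

// transfer is a hypothetical stand-in for one data transfer channel.
type transfer struct {
	failuresLeft int  // how many upcoming sends will fail
	requests     int  // how many requests have been issued so far
	stalled      bool // set once a request has exhausted its retries
}

// send models a single network send.
func (tr *transfer) send() error {
	if tr.failuresLeft > 0 {
		tr.failuresLeft--
		return errNetwork
	}
	return nil
}

// request models one request with a short internal retry loop.
// When the loop is exhausted the request fails and the transfer is stalled.
func (tr *transfer) request() error {
	tr.requests++
	if err := retry(2, tr.send); err != nil {
		tr.stalled = true
		return err
	}
	tr.stalled = false
	return nil
}

// restart models an explicit, externally triggered restart:
// it does not resume the failed request, it issues a new one.
func (tr *transfer) restart() error {
	return tr.request()
}

func main() {
	tr := &transfer{failuresLeft: 3}
	fmt.Println("first request:", tr.request(), "stalled:", tr.stalled)
	fmt.Println("after restart:", tr.restart(), "stalled:", tr.stalled)
	fmt.Println("requests issued:", tr.requests)
}
```

The point of the sketch is that the inner loop never recovers on its own once it gives up; only the outer call creates a second request.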
@@ -265,6 +265,7 @@ const (
	ProviderEventRestart

	// ProviderEventDataTransferRestartFailed means a data transfer that was restarted by the provider failed
	// Deprecated: this event is no longer used
Just want to flag a thing for the future: at some point we are going to want to re-arrange these events and clean up deprecated ones.
When this happens, two things will be needed:
- a migration - see our whole migrations process using go-ds-migrations
- a protocol upgrade - meaning that we will need to define a new version of the protocol, and translate event numbers back and forth for older protocol versions
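A rough sketch of what translating event numbers between protocol versions could look like. The event names, their numbering, and the `translate` helper below are all hypothetical and chosen only for illustration; they are not the actual go-fil-markets constants.

```go
package main

import "fmt"

// Hypothetical event codes for the older protocol version, where a
// deprecated event still occupies a slot in the numbering.
const (
	oldEventRestart       uint64 = iota
	oldEventRestartFailed        // deprecated
	oldEventCompleted
)

// Hypothetical event codes after the deprecated event has been removed.
const (
	newEventRestart uint64 = iota
	newEventCompleted
)

// oldToNew maps every old code that still exists to its new code.
var oldToNew = map[uint64]uint64{
	oldEventRestart:   newEventRestart,
	oldEventCompleted: newEventCompleted,
}

// newToOld is the reverse mapping, used when talking to older peers.
var newToOld = invert(oldToNew)

// invert swaps the keys and values of a one-to-one mapping.
func invert(m map[uint64]uint64) map[uint64]uint64 {
	out := make(map[uint64]uint64, len(m))
	for k, v := range m {
		out[v] = k
	}
	return out
}

// translate looks a code up in the given table and reports an error for
// codes that have no equivalent in the target version.
func translate(table map[uint64]uint64, code uint64) (uint64, error) {
	out, ok := table[code]
	if !ok {
		return 0, fmt.Errorf("event %d has no equivalent in the target version", code)
	}
	return out, nil
}

func main() {
	fmt.Println(translate(oldToNew, oldEventCompleted))
	fmt.Println(translate(newToOld, newEventCompleted))
	fmt.Println(translate(oldToNew, oldEventRestartFailed))
}
```

Keeping the mapping explicit, rather than relying on arithmetic offsets, makes it obvious which deprecated events have no counterpart in the newer version.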
NOTE: this is the markets libp2p protocol here FWIW
Thanks for the detailed explanation @hannahhoward Looking at the code for go-data-transfer it seems like it will fire the
As you mentioned, in both go-data-transfer
Edit: My mistake, there is retry logic in go-data-transfer and go-fil-markets at the network layer, but the retry logic in go-graphsync is at a higher layer
I had misunderstood how the retry logic works in go-graphsync, in summary:
Therefore
I think to keep things moving forward we should keep the original intention of this PR: to prevent the miner from failing the deal if the miner can't contact the client on restart. We can tackle the case of the client reconnecting to the miner in a separate PR.
This PR is currently blocked by an issue with go-graphsync: ipfs/go-graphsync#124
Essentially graphsync is not propagating some network errors up the stack, which means that when the client attempts to make a data transfer, the transfer fails silently and the deal gets stuck in
This PR is currently blocked because we need to write a test that
When the client wants to send data to a provider:
If the miner goes down in step 3, currently the client will not automatically try to restart the channel.
Proposal
After the client opens a "push" channel to the provider
Default config params:
Edit: This proposal is implemented in filecoin-project/go-data-transfer#127
dc6bb03 to fb00aa7
fb00aa7 to 5bf526e
Codecov Report
@@ Coverage Diff @@
## master #463 +/- ##
==========================================
- Coverage 65.58% 64.93% -0.64%
==========================================
Files 46 49 +3
Lines 3198 3259 +61
==========================================
+ Hits 2097 2116 +19
- Misses 875 916 +41
- Partials 226 227 +1
LGTM, couple of suggestions. Like the ready manager refactor -- shoulda added this a while ago!
FromMany(storagemarket.StorageDealTransferring, storagemarket.StorageDealProviderTransferAwaitRestart).
ToJustRecord().
Action(func(deal *storagemarket.MinerDeal) error {
	deal.Message = "data transfer appears to be stalled, awaiting reconnect from client"
just want to flag that based on how I'm reading this, it looks like you want to delete the miner restart data transfer commands in Lotus when you merge this.
I mean, I guess they can still get called if it's manual -- at least the miner bears responsibility if they do. Also, they would probably never actually do that anyway.
Yeah that's a good point, we will need to think through if we still need those commands
Addresses filecoin-project/lotus#4991
Depends on filecoin-project/go-data-transfer#127
I think it may make more sense to let go-data-transfer take care of restarts because it already has retry logic, so
To that end in this PR:
This PR does not change the existing behaviour on the miner when the connection (but not the miner itself) goes down:
StorageDealTransferring
(don't error out the deal)
This PR does not change the existing behaviour on the client when the client restarts:
Current retry behaviour in go-data-transfer is:
I suggest
1s + 5s + 25s + 125s + 5m + (10 x 5m) ~= 1 hour
Note: This will be configured in lotus (not in go-fil-markets)
Implemented in "Automatically restart push channel" (go-data-transfer#127)
Notes:
TODO: