Fetching cache layers times out, resets sector back to PC1 #3563
Comments
I feel your pain, Rob. PC1 completely finished, then it moved on to PreCommit2, which fails due to the transfer issue. Not sure who wrote it like that or why, but it's ridiculous.
And that's the 4th failure:
Here you can clearly see something glitched up during transfer, yet the other ones were solid:
And again:
This still happens in 0.6.0:
Worker log:
The retries don't really work in this case; see my comment here: #3720
I'm seeing the same thing, also after upgrading my network. Is the solution to this year-old bug really to downgrade the network? Transferring cache files between workers takes hours at 1 Gbps.
Transferring 400 GB of data at 1 Gbps (125 MB/s) should take about an hour.
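For reference, the arithmetic behind that estimate can be sketched as below. This is only a rough model: it assumes an ideal link with no protocol overhead, takes the 400 GB figure from this comment, and assumes 10 Gbps for the upgraded SFP+ link, which nobody in this thread has actually stated. The function name `transfer_seconds` is made up for illustration.

```python
def transfer_seconds(size_bytes: float, link_bits_per_s: float) -> float:
    """Ideal time to move size_bytes over a link, ignoring protocol overhead."""
    bytes_per_s = link_bits_per_s / 8  # 1 Gbps -> 125 MB/s
    return size_bytes / bytes_per_s


cache_bytes = 400e9  # ~400 GB of cache layers (figure from this thread)

# The 10 Gbps entry is an assumption about the SFP+ link speed.
for name, bits_per_s in [("1 Gbps", 1e9), ("10 Gbps", 10e9)]:
    minutes = transfer_seconds(cache_bytes, bits_per_s) / 60
    print(f"{name}: {minutes:.1f} min")
# prints:
# 1 Gbps: 53.3 min
# 10 Gbps: 5.3 min
```

Real transfers will be somewhat slower than this because of protocol and disk overhead, so "about an hour" at 1 Gbps is consistent with the ideal 53 minutes.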
Yes, I have seen a cache transfer take around an hour when there's only one, plus another 5-10 min each for the 32G sealed and unsealed sector files. What I was probably thinking of is something I recently started seeing, which is that jobs sometimes end up being shuttled among workers. If multiple sectors start sealing around the same time and one inevitably fails (I haven't yet been able to identify why), it's taken up by a different worker. And if more than one fails, there can be multiple simultaneous transfers from or to a single worker sharing the 1 Gbps connection, making each of them take 2+ hours.
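A back-of-the-envelope sketch of why concurrent transfers stretch out like that. It assumes the link is split evenly between transfers, that all of them start together and are the same size, and that there is no protocol overhead; none of that is measured here, and `shared_transfer_minutes` is a hypothetical helper, not anything from lotus.

```python
def shared_transfer_minutes(size_bytes: float, link_bits_per_s: float,
                            concurrent: int) -> float:
    """Ideal minutes per transfer when `concurrent` equal transfers share one link."""
    if concurrent < 1:
        raise ValueError("concurrent must be at least 1")
    # Assumption: each transfer gets an equal share of the link bandwidth.
    bytes_per_s_each = link_bits_per_s / 8 / concurrent
    return size_bytes / bytes_per_s_each / 60


cache_bytes = 400e9  # ~400 GB of cache layers per sector (figure from this thread)
for n in (1, 2, 3):
    minutes = shared_transfer_minutes(cache_bytes, 1e9, n)
    print(f"{n} concurrent transfer(s): about {minutes:.0f} min each")
```

Under those assumptions two simultaneous cache transfers already come out at roughly 107 minutes each, so 2+ hours with real-world overhead is about what you'd expect.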
Hi @RobQuistNL, thanks for the report. This issue is a bit outdated, and with the new scheduler improvements in the upcoming release I think it will be resolved completely. Closing the ticket for now. Should it still be an issue, please let me know and I will re-open the ticket and triage it again. Thank you!
Describe the bug
Since I now have a remote PC2 worker, I've added SFP+ cards to all my machines and connected them. Ever since I did so, I have had 1 successful PC2 and 3 failed ones, each of which reset the sector state from PC2 back to PC1.
To Reproduce
Connect lotus-miner to a worker which does PC2
Expected behavior
The fetching completes normally.
On a failure, it retries, instead of deleting the cache layers and re-starting PC1.
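To illustrate the behavior I'd expect, here is a minimal sketch of retrying a fetch with exponential backoff. This is not how lotus implements fetching: `fetch_with_retries`, its parameters, and the `fetch` callable are all hypothetical. The point is only that a timed-out fetch is attempted again and nothing on the source side gets deleted in between.

```python
import time
from typing import Callable


def fetch_with_retries(fetch: Callable[[], None],
                       attempts: int = 5,
                       base_delay_s: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> int:
    """Call fetch() until it succeeds; return the number of attempts used.

    Only timeouts and connection errors are retried. The last error is
    re-raised once `attempts` is exhausted. Nothing is cleaned up here, so
    the source data (the cache layers) stays in place for the next try.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            fetch()
            return attempt
        except (TimeoutError, ConnectionError):
            if attempt == attempts:
                raise  # out of retries: surface the error to the caller
            # Exponential backoff: base, 2*base, 4*base, ...
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Demo with a simulated fetch that times out twice, then succeeds.
state = {"calls": 0}

def simulated_fetch() -> None:
    state["calls"] += 1
    if state["calls"] < 3:
        raise TimeoutError("simulated fetch timeout")

used = fetch_with_retries(simulated_fetch, sleep=lambda s: None)  # no real waiting
print(f"succeeded after {used} attempts")
```

The `sleep` parameter is injectable only so the demo and any tests don't have to wait for real delays.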
Miner sector log
Worker logs
Version 0.5.10
I'm probably going to have to reset it all back to a 1 Gbit connection, because slow transfers that succeed are better than fast transfers that throw away 4 hours of PC1 work :(