-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Constant upload and repair failures/alerts #1779
Comments
I have a few questions.
cc @n8maninger it seems like migrations fail because all hosts return "internal error" when executing the read sector instruction. Any thoughts on what the renter might do wrong on all these hosts simultaneously to cause that error within the |
I can provide the debug log files in a few hours |
Ok so the issue template is updated and the curl examples should work now. Logs might still be helpful but the odd issue here is all the hosts returning "internal error". In theory it's pretty clear what that error means in the context of reading a sector (it's not the renter's fault) but the odd thing is that so many hosts return the same error at the same time. We updated |
I just checked how often uploads fail. Seems like there are a lot more failures than I anticipated. This is a little more than a week of constantly (trying to) upload; |
That is very useful. Probably won't get to it today but I did grep my way through them to see what we can tackle next week. Ok so 48742 lines of logs total
cc @peterjan |
I would like to emphasize the error I mentioned before.
An upload can get stuck on this state indefinitely without throwing an alert in the UI and the API call doesn't get interrupted. Even with smaller files, and waiting for over a day. Here's the debug log from genenis up till now. It might be able to add some context that's missing now. EDIT: I could share the log file of the other node as well, though it looks like more of the same |
Current Behavior
I'm currently uploading on two different Renterd nodes. Both of them keep getting stuck on uploads via the API every now and then, and both of them have reoccuring failures with uploads and repairs. Here's a current list of active alerts of one of them;
https://gist.github.com/CtrlAltDefeat94/7f3a180705390b8b6b97c7521dc21d79
(gist because of size limit)
Expected Behavior
.
Steps to Reproduce
I'm uploading data via
/worker/objects/{path}?bucket=default
Version
V1.1.1
What operating system did the problem occur on (e.g. Ubuntu 22.04, macOS 12.0, Windows 11)?
Docker
Autopilot Config
{ "contracts": { "set": "autopilot", "amount": 50, "allowance": "4909550423995516048300000000", "period": 6048, "renewWindow": 2016, "download": 1400000000000, "upload": 1400000000000, "storage": 5000000000000, "prune": true }, "hosts": { "allowRedundantIPs": false, "maxDowntimeHours": 336, "minProtocolVersion": "1.6.0", "maxConsecutiveScanFailures": 10, "scoreOverrides": null } }
Bus Config
These all give a 404 and I cant figure all the new endpoints. I dont think they are too related to my issue. But I can provide the results of the corrent endpoints if need be.
Anything else?
The config on the other node with this issue is identical aside from the 30/60 redundancy with 90 hosts.
Unrelated note; The autopilot config endpoint was updated to http://localhost:9980/api/autopilot/config.
The text was updated successfully, but these errors were encountered: