connection closed by remote #636
Idea for maybe getting more info from the server side: instead of relying on the log messages that come over the wire (or rather, don't come if the connection gets killed), configure borg on the server to log to a local file. logging.conf: https://paste.thinkmo.de/Nu7rdXvz#logging.conf, then export BORG_LOGGING_CONF=/path/to/that/logging.conf |
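For illustration, here is a guess at what such a logging.conf could look like. The pastebin content isn't reproduced here, so the file below is an assumption. As far as I know, borg hands the file named in BORG_LOGGING_CONF to Python's `logging.config.fileConfig`, so this sketch writes an INI config with a file handler and a timestamped format and loads it the same way. The log path, the `timestamped` formatter name and the `borg.remote` logger in the demo are placeholders.

```python
import logging
import logging.config
import os
import tempfile

# Hypothetical logging.conf (an assumption, not the pastebin file).
# fileConfig reads the "format" value raw, so %(asctime)s needs no escaping.
LOGGING_CONF_TEMPLATE = """\
[loggers]
keys=root

[handlers]
keys=logfile

[formatters]
keys=timestamped

[logger_root]
level=DEBUG
handlers=logfile

[handler_logfile]
class=FileHandler
level=DEBUG
formatter=timestamped
args=({logfile!r}, 'a')

[formatter_timestamped]
format=%(asctime)s %(process)d %(levelname)s %(name)s: %(message)s
"""


def write_logging_conf(conf_path: str, logfile: str) -> None:
    """Write a logging.conf that sends all records to `logfile` with timestamps."""
    with open(conf_path, "w") as f:
        f.write(LOGGING_CONF_TEMPLATE.format(logfile=logfile))


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    conf = os.path.join(workdir, "logging.conf")
    logfile = os.path.join(workdir, "borg-serve.log")
    write_logging_conf(conf, logfile)
    # Load the file via fileConfig (assumed to be what borg does with BORG_LOGGING_CONF);
    # disable_existing_loggers=False keeps loggers created earlier working.
    logging.config.fileConfig(conf, disable_existing_loggers=False)
    logging.getLogger("borg.remote").info("test message")
    logging.shutdown()
    with open(logfile) as f:
        print(f.read(), end="")
```

Each line then starts with a timestamp and the process ID, which is what you'd need to line up server-side events with the moment the client reports the disconnect.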
Sadly, the borg log doesn't contain any useful information; it just says:
Here is the log from the client side, running with BORG_RSH="ssh -v"
|
ok, if that one line is all you have in the server log, we at least know it is not the borg process on the server side that encounters some exception or other error; that would be in that log. so i guess you have to go through the list of suspicions from my first post, especially the ones relating to ssh itself and the network layer. another idea is to enhance logging on both sides with a logging.conf that includes timestamps, so we get timing information. it would also be interesting to add another piece of information to the experiment notes:
|
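Once both sides log with timestamps, one thing worth checking is how long it was quiet right before the disconnect: a long silence could hint at an idle timeout somewhere (router, NAT, firewall), though that is only a guess. A minimal sketch, assuming each line starts with Python logging's default `asctime` format (e.g. `2024-05-01 12:00:03,517`); `largest_gap` is a hypothetical helper, not part of borg. To compare client and server logs, both clocks should be NTP-synced.

```python
from datetime import datetime, timedelta

# Assumed timestamp prefix: logging's default asctime, e.g. "2024-05-01 12:00:03,517".
TS_LEN = 23
TS_FMT = "%Y-%m-%d %H:%M:%S,%f"


def parse_ts(line: str) -> datetime:
    """Parse the timestamp at the start of a log line (raises ValueError if absent)."""
    return datetime.strptime(line[:TS_LEN], TS_FMT)


def largest_gap(lines):
    """Return (gap, line_before, line_after) for the longest silence between log lines."""
    best = (timedelta(0), None, None)
    prev = None  # (timestamp, line) of the last timestamped line seen
    for line in lines:
        try:
            ts = parse_ts(line)
        except ValueError:
            continue  # skip lines without a leading timestamp, e.g. traceback lines
        if prev is not None and ts - prev[0] > best[0]:
            best = (ts - prev[0], prev[1], line)
        prev = (ts, line)
    return best
```

Usage: `largest_gap(open("borg-serve.log"))` returns the biggest gap together with the two lines around it.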
To confirm, I do indeed have
in my /etc/ssh/ssh_config file; however, the issue still occurs.
To address these questions, I created 10 random files and then tried to back up these files (twice) while running mtr. Here are the results:
and the output from mtr
My router is an Asus RT-AC87R running AsusWRT-Merlin; DoS protection is disabled and QoS is also off. |
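For anyone wanting to repeat the random-files test, a minimal sketch of generating the files. The file count matches the comment above; the 10 MiB size, the file names and the `make_random_files` helper are my own assumptions. Random data doesn't compress, so each fresh set really has to be uploaded. Note that borg deduplicates a second backup of the *same* files, so generating a new set per run keeps every run doing real network transfer.

```python
import os


def make_random_files(directory: str, count: int = 10, size: int = 10 * 1024 * 1024) -> list[str]:
    """Create `count` files of `size` random bytes each in `directory`; return their paths."""
    os.makedirs(directory, exist_ok=True)
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"random_{i:02d}.bin")
        with open(path, "wb") as f:
            f.write(os.urandom(size))  # incompressible, non-duplicate content
        paths.append(path)
    return paths
```

Then run `borg create` on that directory while mtr runs against the server in another terminal.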
@Azelphur just as a weird idea: could you remove these ssh options (both on server and on client side, restart sshd afterwards) and run your random-files test again? |
jborg/attic#337 http://z9.io/2008/12/10/how-to-fix-ssh-timeout-problems/ (esp. first comment of Kimmo) |
yes, those settings can help, but if there's no response, the session can still be terminated. I would also suggest removing the settings, leaving an ssh session connected outside of borg and seeing what the results are. |
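One way to run that "outside of borg" test is to keep an otherwise idle ssh session open and record when and how it ends. A minimal sketch; `time_session` is a hypothetical helper, and the host and sleep duration in the usage note are placeholders.

```python
import subprocess
import time


def time_session(cmd: list[str]) -> tuple[int, float]:
    """Run cmd to completion; return (exit code, elapsed seconds)."""
    start = time.monotonic()
    rc = subprocess.run(cmd).returncode
    return rc, time.monotonic() - start
```

Usage: `time_session(["ssh", "user@backup.example.org", "sleep", "7200"])`. Per the ssh man page, ssh exits with the remote command's status, or with 255 if an error occurred, so a 255 well before 7200 s means the connection itself was dropped, independent of borg.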
Ok...here I am, 6 months later. I finally solved it. There was an issue with my router, I was running an app on the router which does speedtests every 15 minutes. After disabling this app, the problem no longer occurs. Not sure exactly why the issue occurred beyond that, but if anyone else gets this issue and happens to be running some services on their router - disable those services and see if you still get it. :) |
@Azelphur thanks for the feedback. |
Adding to this topic (unless you prefer a new issue) that killing the ssh/gpg-agent will make |
That's strange. We tested this with killing the ssh client, which works as expected (i.e. terminates with an error message). |
@drzraf please give the precise command how you do the kill. |
Sorry for the false flag. The timeouts I saw were not a sign of hanging (it was indeed uploading): the process was still ongoing. In case these "select() timeout" messages are of interest anyway, here is my command:
(I use such a backup name in order to "resume" from the previous The fact I observed was that remote repository size was not growing after multiple resumed |
@drzraf you do not need to use a special name to "resume" a past backup. just create a new backup with any name you like and it will use what is already in the repo (so this feels like a resume operation, but technically it is not). |
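Why any archive name works: the repository is content-addressed, so a new backup only uploads chunks the repo doesn't already have. The toy model below illustrates this with fixed 4-byte chunks and plain SHA-256; real borg uses content-defined chunking and its own chunk-ID hash, so treat `ToyRepo` as a simplified, hypothetical illustration. It also assumes the earlier data actually made it into the repo (e.g. via a checkpoint archive).

```python
import hashlib

CHUNK_SIZE = 4  # toy size; borg uses content-defined (not fixed-size) chunking


class ToyRepo:
    """Simplified content-addressed store: chunks are keyed by their hash."""

    def __init__(self):
        self.chunks = {}    # chunk id -> chunk bytes
        self.archives = {}  # archive name -> list of chunk ids

    def create(self, name: str, data: bytes) -> int:
        """Store an archive under `name`; return how many chunks had to be uploaded."""
        ids, uploaded = [], 0
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            cid = hashlib.sha256(chunk).hexdigest()
            if cid not in self.chunks:  # only unknown chunks are transferred
                self.chunks[cid] = chunk
                uploaded += 1
            ids.append(cid)
        self.archives[name] = ids
        return uploaded
```

If an interrupted run stored the first half of the data, a new archive with a completely different name uploads only the second half; the name plays no role in deduplication.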
IMHO you cannot prevent this type of error. As one user pointed out, it may have to do with the router or, e.g., with VPNs, proxies or whatever... In any case, I would much prefer it if Borg could work around this issue by automatically retrying the backup (maybe 3 times, or a configurable number of times). |
I am getting this error consistently, even when directly connected to a gigabit switch. It doesn't seem to happen on the same file, though: I just reran it, and it redid a bunch of files and went past the one it hung on before.
BTW I actually made a small shell wrapper script for borg, which circumvents this issue by just retrying the backup (as I suggested before). See: https://github.com/rugk/borg-cron-helper |
Thank you @rugk. I am also affected by randomly dropped connections, so your script is very helpful. If anyone wants to automate their backups without an additional dependency, the most bare-bones approach would be: If you want to control how often to restart the backup or include some exponential backoff time between attempts, look at https://github.com/kadwanev/retry, which is a very small bash retry script that is generally useful. |
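A minimal retry-with-exponential-backoff sketch, assuming borg's classic return codes (0 = success, 1 = warning, 2 = error, 128+N = killed by signal N). It deliberately does not retry on rc 1, since a warning means the backup reached its normal end. Newer borg versions can report more specific exit codes, so check the docs for yours. `run_with_retry` and its defaults (3 attempts, 60 s base delay) are hypothetical; `sleep` is injectable only to make testing easy.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=3, base_delay_s=60.0, sleep=time.sleep):
    """Run cmd, retrying with exponential backoff on errors; return the last exit code.

    Assumes classic borg return codes: 0 = success, 1 = warning, 2 = error,
    128+N = killed by signal N. Success and warnings are not retried.
    """
    rc = None
    for attempt in range(attempts):
        rc = subprocess.run(cmd).returncode
        if rc in (0, 1):
            return rc
        if attempt < attempts - 1:
            sleep(base_delay_s * 2 ** attempt)  # 60 s, 120 s, 240 s, ...
    return rc
```

Usage: `run_with_retry(["borg", "create", "user@host:repo::{hostname}-{now}", "/home"])`, with the repo location being a placeholder.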
My comment on issue #3988 might also help. Specifically: In the server's (the machine running
This will cause the server to send a keepalive to the client every 10 seconds. If 30 consecutive keepalives are sent without a response, the connection will be terminated. If you then run your borg commands with |
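The numbers in that comment work out as follows. The original config snippet didn't survive, so the option names are my assumption: `ClientAliveInterval`/`ClientAliveCountMax` in the server's sshd_config, and the client-side counterparts `ServerAliveInterval`/`ServerAliveCountMax` in ssh_config. These keepalives travel inside the encrypted ssh channel, unlike `TCPKeepAlive`. The hypothetical `keepalive_config` helper just emits matching lines and the resulting timeout.

```python
def keepalive_config(interval_s: int = 10, count_max: int = 30) -> dict:
    """Return matching sshd/ssh keepalive lines and the approximate dead-peer timeout."""
    return {
        # server side: /etc/ssh/sshd_config (restart sshd afterwards)
        "sshd_config": [f"ClientAliveInterval {interval_s}", f"ClientAliveCountMax {count_max}"],
        # client side: ~/.ssh/config or /etc/ssh/ssh_config
        "ssh_config": [f"ServerAliveInterval {interval_s}", f"ServerAliveCountMax {count_max}"],
        # a silent peer is dropped after roughly interval * count seconds
        "timeout_s": interval_s * count_max,
    }
```

With 10 s and 30 probes, a dead connection is torn down after roughly 10 × 30 = 300 s (5 minutes), after which `borg serve` exits and its repository lock goes away.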
Add "SSH Configuration" section: add an "SSH Configuration" section to the "borg serve" documentation, outlining the ssh/sshd configuration that prevents borg serve from keeping a lock on a repo when the ssh connection is abnormally disconnected. In response to issues #3988, #636 and #4485 (and probably others).
The documentation has been updated, I think this issue can now be closed? |
yup, thanks! |
I am seeing this issue on BorgBase when syncing the indexes of an existing repo - could it be that it isn't configured correctly? |
@m3nu ^ |
Thanks for the pointer, @ThomasWaldmann. I did NOT have those values in my configs, only the defaults, which are:
Just updated this for current and future setups. Hope this solves the issue @akvadrako is observing. |
Thanks @m3nu - it turns out I was seeing these errors while on a VPN. If I see them again, I'll let you know. |
I think the actual documentation of the ssh fix referred to here is
I'm trying out these steps now... |
Not sure if enforcing |
Yeah, turns out the ssh tweaks above didn't help me. Sigh... not sure what to try next - have raised #7313 |
Yeah, same for me: I tried all the above tweaks, but it still raises the "Connection closed by remote host" error after a while, no matter what |
Borgmatic has a retry setting that will just retry the backup. Decrease the checkpoint interval if you want. I keep a long FAQ entry on the issue, which may give you some new ideas: https://docs.borgbase.com/faq/#my-ssh-connection-breaks-after-a-long-backup-or-prune-operation |
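Regarding the checkpoint interval: in borg 1.x this is `borg create --checkpoint-interval SECONDS` (default 1800 s). Data sent after the last checkpoint is lost when the connection drops, so a shorter interval means less to re-send. The sketch builds such a command line and gives a rough upper bound on what might need re-uploading. The helper names, repository URL, archive name and upload rate are all made up for illustration.

```python
def borg_create_cmd(repo: str, archive: str, paths: list[str], checkpoint_interval_s: int = 300) -> list[str]:
    """Build a borg 1.x 'borg create' command line with a custom checkpoint interval."""
    return ["borg", "create", "--checkpoint-interval", str(checkpoint_interval_s),
            f"{repo}::{archive}", *paths]


def worst_case_resend_bytes(checkpoint_interval_s: int, upload_bytes_per_s: float) -> float:
    """Rough upper bound on data to re-upload after a disconnect: what was sent since the last checkpoint."""
    return checkpoint_interval_s * upload_bytes_per_s
```

At 1 MB/s upload, the default 1800 s interval risks re-sending up to about 1.8 GB after a drop, versus about 300 MB with a 300 s interval; the trade-off is more frequent checkpoint commits.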
@m3nu That retry script, does it retry on rc==1? That would be bad... |
Good point. Borgmatic doesn't, but the general retry script treats it as error and retries 10x. I've added a note on this in the FAQ. Good to mention it. |
Thanks a lot! To be honest, it's a bit janky, but it's working, I guess. I actually ended up setting TCP keepalive to no because it gave me better results, but my tests were not that precise, so I may be wrong
sometimes a borg backup gets interrupted with "connection closed by remote".
it is unclear why this happens or whether this is a problem in borg, in the ssh configuration or on the network layer.
Suspicions:
See also: http://www.cyberciti.biz/tips/open-ssh-server-connection-drops-out-after-few-or-n-minutes-of-inactivity.html