Replies: 11 comments 8 replies
-
I am familiar with the rsync+hardlinks method; I also used such scripts for at least a decade. But at some point I felt "there should be something better than this by now" and started to search. Then I found attic, and a bit later borg was forked. :-) Rsync with hardlinks doesn't do much: it just copies files to the backup IF they are modified, or creates a hardlink if not. And (IIRC) the modification check by default does not even involve a checksum, but only compares the mtime (or ctime?) timestamp. Borg has a lot more to do, and that's the reason why it is slower (for the first backup) and uses more CPU and RAM:
The key/value store being used as storage is the reason why you can't switch off deduplication in borg - it is not able to store data without deduplicating it. But nobody would want that anyway. What you can try:
Much more important than the first backup run are the many runs after that, and these are usually much faster, as a lot of data did not change and is already in the repository. Thus, borg can usually process about 1000 files per second in a backup run and finish in only a few minutes (depends on a lot of factors, of course). "Freezing on a file":
WiFi: a lot of people use borg over WiFi (including me). Slow for the first backup, fast for all other backups. Of course a LAN connection would be faster (esp. for the first backup), so that is another option you have. Kinds of borg deduplication:
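For a rough side-by-side, this is roughly what the two approaches boil down to (the paths, repo location and archive name below are placeholders, not taken from anyone's setup here):

```sh
# rsync+hardlinks style: copy changed files, hard-link unchanged ones
# to the previous backup directory
rsync -a --delete \
    --link-dest=/backups/2022-03-01 \
    /home/ /backups/2022-03-02/

# borg: read changed files, cut them into chunks, hash each chunk,
# optionally compress and encrypt, and upload only chunks the
# repository has never seen before
borg create --stats --compression lz4 \
    ssh://user@nas/./backups/repo::'{hostname}-{now}' \
    /home
```

The extra chunking and hashing is exactly the CPU and RAM cost mentioned above; the payoff is that duplicate data is stored only once, no matter where it lives.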
-
From a recent backup (using borg 1.2.0rc1+):
Notable:
Setup: workstation class machine --> ~30Mbit/s internet connection --> small and rather old mini server
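(Numbers like these presumably come from borg's own statistics output; for anyone wanting the same view of their repo, a minimal example with placeholder repo and archive names:)

```sh
# per-archive statistics: original, compressed and deduplicated size,
# number of files, duration
borg info ssh://user@server/./repo::workstation-2022-03-02

# or print the same summary at the end of a backup run
borg create --stats ssh://user@server/./repo::'{hostname}-{now}' /home
```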
-
First, thank you for the intelligent response. That's much appreciated. I'm getting tired of CSRs that tell me that if I have a timezone problem in such and such device, then I need to rebuild my house. 😬 I don't think it is worth going into the finer points right now because I'm thinking my CPU may be outdated for borg. I see this in
AFAIK it does not have AES or SHA built in. I built this computer in April 2015, as far as I can tell. And to tell you the truth, I did not remember what I put in. Maybe it is time to swap the board and CPU for something more powerful?? I am definitely going to replace the disks soon because they are quite old (older than the server itself, actually) and maybe make other changes to the server at the same time. I have a server rack coming today. I have maybe a couple of things that are currently rack mount and was planning to put stuff on shelves until I rotate my hardware out and replace it with rack mount... but maybe I should swap the file server for a rack mount server. Suggestions appreciated, if there are any suggestions to make. That server is only used for storing backups. I'm not planning to have a media server in my house any time soon, and I run Home Assistant and Zoneminder on their own servers.
-
That's an older Atom family CPU - rather cheap and weak (even as a new part and by back-then standards, 2013). No AES acceleration and (of course) also no sha256 acceleration (which is only present in some rather new CPUs). But, on the repo server (== where borg serve is executed), neither AES nor sha256 are used by borg. The most CPU-heavy operations there are crc32 computations (borg check, repository part), and the sshd process also uses quite some CPU. So it might be fine as long as the repo index and borg fit into RAM. On the client (== where borg create is executed) you have a decent CPU with AES hw acceleration, but no sha256 hw acceleration (I guess, check the cpu flags). Most of borg's processing happens client-side. So, use repokey-blake2 for a bit more speed there. Also make sure the files cache, chunks index and borg fit into RAM. RAM usage / swap activity can be checked with atop or a similar tool while borg is running.
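A quick sketch of how to check those CPU flags and act on the advice (the repo URL is a placeholder; repokey-blake2 is available in borg 1.1 and later):

```sh
# look for AES-NI and SHA extensions on the client CPU
grep -o -w -e aes -e sha_ni /proc/cpuinfo | sort -u

# no sha_ni? pick BLAKE2b for the id/auth hash when creating the repo,
# it is faster than sha256 in software
borg init --encryption=repokey-blake2 ssh://user@nas/./backups/repo

# watch RAM usage and swap activity while a backup is running
atop 5
```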
-
@lddubeau You shouldn't have any issues with your cpus. Definitely keep compression and deduplication enabled. The first backup might be slow, but after that things should be much faster. You could try initially backing up subsets of your source. Once they are done, a full backup should be fast, and you can delete the partial backups. I don't think you mentioned how big your source is...
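A sketch of that subset approach, with made-up paths and archive names:

```sh
REPO=ssh://user@nas/./backups/repo

# seed the repository piece by piece; each run only uploads chunks
# that are not in the repo yet
borg create --stats "$REPO"::seed-pictures  /home/user/Pictures
borg create --stats "$REPO"::seed-documents /home/user/Documents

# once seeding is done, a full run mostly hits already-known chunks
borg create --stats "$REPO"::'{hostname}-{now}' /home/user

# the seed archives can then be dropped; chunks still referenced by
# the full backup stay in the repository
borg delete "$REPO"::seed-pictures
borg delete "$REPO"::seed-documents
```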
-
Ok, the plot deepens... I might be dealing with wonky equipment??? This morning I forgot that I should be checking on my backup. I got up, made coffee, got my breakfast, watched last night's Colbert show, YouTube, what-have-you, and I had zero problems with the streaming. Then I remembered I had started a backup and went to my laptop to see if it was running, and it was. It is still running as we speak. I'm baffled. All my previous backups would be killed by me in the morning because I couldn't watch TV. As soon as I killed the backup, I could stream. At first, I was not using
The only things that changed between my previous test and the backup I have running now:
In other news:
I'm also concerned about what my wife would do if I were to become incapacitated or die. It is more than theoretical. I'm doing pretty well for the moment, but I was diagnosed with a primary CNS lymphoma and almost died from it. Again, I'm fine now, but relapse is possible. My current setup is held together by snot and seaweed. I suspect a Ubiquiti setup would be more likely to stay up and running for longer if I were to kick the bucket, and would be more easily manageable by someone else. (Like my stepson, who is a software engineer like I am.)
-
Success! But I still need to fine-tune things. I'll start with questions:
And now the results of my first backups... I was using
[Sensitive information changed to protect the innocent. 🤣] Last night the first backup after the initial backup ran. I think this is the interesting portion of
Almost 5 hours is slow, but I was still using the rate limit flag. I think I'll drop it tonight and see how fast it goes. (Also, there are going to be fewer modifications because it will just have one day of modifications instead of 2-3 days.)
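For reference, the rate limiting discussed here is a borg option; a hedged example, assuming borg 1.2 where it is called --upload-ratelimit (borg 1.1 used --remote-ratelimit), with a placeholder repo:

```sh
# cap upload bandwidth at 500 KiB/s so the WiFi stays usable
borg create --stats --upload-ratelimit 500 \
    ssh://user@nas/./backups/repo::'{hostname}-{now}' /home

# dropping the flag (or passing 0) removes the limit
```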
-
@lddubeau With just one day's data and no rate limit, you should be fine having it complete during the night, so there will be no need to dynamically change the rate limit or get fancy with your network. Note that 8500000/500/60/60 = 4.72 hours = 4 hours 43.2 minutes, so the rate limit you set is the only reason that it took almost five hours to upload 8.5GB.
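(The same arithmetic as a quick check, for anyone plugging in their own numbers:)

```sh
# 8,500,000 KiB at 500 KiB/s, in hours
echo "scale=2; 8500000 / 500 / 3600" | bc   # -> 4.72
```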
-
Here is the latest backup that ran last night after I went to bed. This was run without a rate limit. About four and a half minutes.
It should be even faster tonight because I did not do anything today. I got a 4th shot of vaccine yesterday and I don't feel great today. (Immunocompromised folks get more boosters than the general population.) I guess that's great because it means my immune system responded robustly to the vaccine. 🥳
-
Added some material from this discussion to our FAQ. Let me know if there is anything important that's still missing with regard to performance.
-
Merged: https://borgbackup.readthedocs.io/en/master/faq.html#what-s-the-expected-backup-performance |
-
Since at least 2008 I've been running a very idiosyncratic script to perform backups from my laptop to a NAS in my home. The backups are encrypted but I do not have deduplication or compression. It is essentially the `rsync` with hard links method that some were promoting a few years back. Unfortunately, this script is super idiosyncratic. I translated it from Python 2 to 3 essentially by running a tool on it a while back and answering y/n to its questions. No intelligence went into the conversion. I'm still using it, but it would need a rewrite. At any rate, backing up an entire home used to take about 6 hours.
Enter borg, which does just about everything I want, and has deduplication and compression. So far so good, but I ran tests that got me super concerned about performance. I was never able to get it to do a backup in 6 hours. After 6 hours I have a small portion of my home backed up. At first I was using borgmatic, but I removed it and now just use my own sh scripts to run borg. I then turned off compression entirely. My own backup script that I mention above did not use compression. Still too slow. Is my next step to turn off the deduplication algorithm??? I've seen `--chunker-params` but I don't know if that's what I should use to turn off deduplication. I did not see a flag that would just be an off switch for deduplication. Note that I could live with borg without compression and deduplication, because my script did not have these features. But I'm not sure how to turn off deduplication and whether it is going to make much of a difference.
At first I thought borg was just freezing on a file, but after I turned off compression I find that it stops at another file when I kill it. So I do not think it freezes.
I'd very much like to put my backup script to rest, but the current performance of borg gives me pause. And I truly cannot say that my comparison is unfair. In both cases (my script and borg) the backup is done over WiFi, and no, I don't plan to run cable from the basement to the 2nd floor in a house that wasn't built for a network.
ETA: Oops. I made a mistake above. My script did have deduplication between backups but not inside a backup. If you did two backups in a row and nothing changed in your files, the new backup would take a minimal amount of new disk space, but most of the backup would be a bunch of hard links to the previous backup. However, if you copied file `/home/foo/a` a bunch of times and backed all of that up, it would not detect that duplication and do anything about it. I believe borg detects this kind of duplication.
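(A later note on that last point: --chunker-params only tunes how files are cut into chunks; it is not an off switch for deduplication. Borg's within-backup deduplication is easy to observe in the backup statistics. A small sketch reusing the example path above, with a hypothetical copy name and a placeholder repo:)

```sh
# make a duplicate of a file inside the backup source
cp /home/foo/a /home/foo/a-copy

# the "Deduplicated size" line in the stats shows that the copy adds
# (almost) nothing to the repository
borg create --stats ssh://user@nas/./repo::'{hostname}-{now}' /home/foo
```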