read relative exclude paths from files within backup source #641
Comments
#937 suggested supporting .gitignore which is somehow the same basic idea. |
It might be good to think about creating some plug points in the file system scanner code, so that users can roll their own "exclude this, too" code if the base exclusions aren't enough. Either that, or implement reading the list of paths to process from a file, and let users implement whatever inclusion/exclusion strategy they want entirely out of process. The downsides of the latter are the increased chance of stale data for users who don't use snapshots, and blowing the FS cache (some metadata is read to generate the file list, but the rest of the metadata and the data are not read in the same pass). |
--files-from is #841 |
Is there currently a way to specify paths in an |
@guedressel don't think so. |
Would this be difficult to add? Any pointers on how to get started? I am also in a situation where I want to back up the home folders of all the users on our storage cluster, and want the users to be in control of what files get backed up. |
@ostrokach maybe have a look at the "--pattern" changes in master branch. |
As
Now include that pattern set in the runtime config. You should be fine. I've made something similar in a bash script that runs before borg and generates an exclude list. But I use |
There really should be a way to specify the filtering via .gitignore ... |
How about this: we add two options, |
Note that borgmatic does support specifying |
By the way: I use pathspec for a small script of mine. Works fine! |
Regarding my suggestion above: It would change how tagged files work. Instead of saying "hey, I'm tagged, exclude me", This change is backward-compatible because the old marker files become marker files that ignore everything in that directory. (It would even make the option to keep the marker files themselves obsolete). But this feature could be implemented completely independent of the other as well. |
I'd rather not call a script. borg often runs as root, and calling external scripts can be a security issue. |
Maybe "calling a script" is a bad way to phrase it. What I mean with it is to have the possibility to call an external command that does the job like with |
Maybe I misunderstood your suggestion. Calling one specific, admin-configured script is not a problem usually (as the admin is responsible for having safe permissions on that), but if we would discover such scripts on the fs like we do with the exclude tags, that might easily become a security issue. |
Say I set
to standard out which then will be ignored. (Note that everything here is just an idea and I'm absolutely open on the details of the implementation) |
I am going to abandon this feature for now. Since I do not have the brain power to process borg's core backup code yet to add a new feature, I will hack together a solution using a preprocessor that generates a custom exclusion file by walking the file tree before calling borg. If someone wants to implement this feature, I will be happy to help as much as I can. |
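A preprocessor along those lines can be quite small. The following is only a sketch under stated assumptions: `.borg-exclude` is a made-up file name (borg itself does not look for it), `emit_excludes` is a hypothetical helper, and the output uses the `pp:` (path prefix) pattern style. It reads path lists only and never executes anything it finds in the backup source, and it skips absolute paths and anything containing `..` so that a user can only exclude things below the directory that holds the file.

```shell
#!/bin/bash
# Hypothetical helper: collect per-directory exclude files into one list.
# ".borg-exclude" is an assumed name, not a file borg knows about.
emit_excludes() {
    local root=$1 name=${2:-.borg-exclude}
    find "$root" -type f -name "$name" -print0 |
    while IFS= read -r -d '' file; do
        dir=$(dirname "$file")
        # Each remaining line is a path relative to the file's directory.
        while IFS= read -r line || [[ -n $line ]]; do
            # Skip blank lines and comments.
            [[ -z $line || $line == \#* ]] && continue
            # Skip absolute paths and attempts to climb out of the directory.
            [[ $line == /* || $line == *..* ]] && continue
            printf 'pp:%s/%s\n' "$dir" "$line"
        done < "$file"
    done
}
```

The output can be redirected to a file and passed to borg with --exclude-from, the same way the other scripts in this thread do it.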
If anyone's interested: I've written a small script to exclude gitignored files:
#!/bin/bash
# Arguments: a path to search for git repositories
# Output: all ignored files and folders in all git repositories in the input folder, as borg exclude patterns
# Iterate through all directories that contain a .git folder.
# Warning: this results in invalid patterns if a folder is not a valid git repository (grep for "fatal" to find them)
# Note: repository paths containing whitespace are still split by the for loop.
for p in $(find "$1" -name ".git" | xargs dirname)
do
    # Remember the last excluded folder to skip redundant subfolder exclusions
    LASTFOLDER="$p/.foldernamethatwonteverexist/"
    # List all files of the current repository and ask git whether they are ignored
    tree -f -i -x --noreport "$p" | git -C "$p" check-ignore --stdin | while read -r q
    do
        # Skip anything below the last excluded folder; print the final result to stdout
        if [[ $q == "$LASTFOLDER"* ]]; then
            continue
        elif [[ -d $q ]]; then
            # Trailing slash so that a sibling such as "$q-old" is not skipped by mistake
            LASTFOLDER="$q/"
            echo "pp:$q/"
        else
            echo "pf:$q"
        fi
    done
done
Given a path as argument, it recursively searches for git projects in it. It then lists all files in those git projects and keeps only the ones that are gitignored. The remaining paths are turned into a borg exclude file written to stdout. On my Documents folder, it takes only three seconds to run, which is acceptable for me. |
Why would I want to back up my dependencies in my git repos? It's pretty gross that this doesn't work. Does nobody from borg use JavaScript / have a node_modules folder? |
Sorry to bump this issue, but I came here by searching for something like "borg ignore directories by name", and I have to handle lots of identically named directories (like the mentioned node_modules) that I want to exclude from my backup.
tree backup_test/
backup_test/
├── node_modules
│ └── test_in_node_modules.txt
├── subdir
│ ├── node_modules
│ │ └── test_in_node_modules.txt
│ ├── subsubdir
│ │ ├── node_modules
│ │ │ └── test_in_node_modules.txt
│ │ └── subsub_test.txt
│ └── sub_test.txt
└── test.txt
By using the create command with the following exclude option I was able to exclude all node_modules directories:
Backup lists as follows:
borg list borgrepo::1
drwxr-xr-x user users 0 Sun, 2019-10-20 22:27:24 backup_test
drwxr-xr-x user users 0 Sun, 2019-10-20 22:27:32 backup_test/subdir
drwxr-xr-x user users 0 Sun, 2019-10-20 22:27:46 backup_test/subdir/subsubdir
-rw-r--r-- user users 0 Sun, 2019-10-20 22:27:46 backup_test/subdir/subsubdir/subsub_test.txt
-rw-r--r-- user users 0 Sun, 2019-10-20 22:27:32 backup_test/subdir/sub_test.txt
-rw-r--r-- user users 0 Sun, 2019-10-20 22:27:24 backup_test/test.txt
I'm not sure if I'm missing something, but I think for my primitive use case that seems to be enough.
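If you would rather see exactly which directories will be skipped before borg runs, one option is to generate the list up front and review it. This is a minimal sketch, not the option used above: `list_named_dirs` is a made-up helper name, and the output uses the same `pp:` (path prefix) pattern style as the gitignore script earlier in this thread.

```shell
#!/bin/bash
# Hypothetical helper: print one path-prefix pattern per directory called
# NAME below ROOT. -prune stops find from descending into a match, so
# copies nested inside an already listed directory are not printed again.
list_named_dirs() {
    local root=$1 name=$2
    find "$root" -type d -name "$name" -prune -print |
    while IFS= read -r dir; do
        printf 'pp:%s/\n' "$dir"
    done
}
```

Called as list_named_dirs backup_test node_modules, it prints one line per node_modules directory in the tree shown above; the result can be saved to a file and handed to borg with --exclude-from.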
|
I came up with a few lines of bash in my backup script that do just that, if anyone is interested:
find /home/user/Workspace -type f -name ".gitignore" -printf "%h\n" | \
    xargs -I '{}' bash -c "egrep -v '^(\s*|#.*)$' \"{}/.gitignore\" | awk '{print \"{}/\" \$0}' " \
    > /tmp/exclude-backup
borg create [...] \
    /home/user \
    --exclude-from /tmp/exclude-backup \
This creates a file at |
@biocrypto730: It isn't safe to exclude all folders named
If Borg has to be specifically told to honor version control ignore files, and the documentation specifically warns not to use that option if you semi-install things as described in item 4, then it's safe to do that. But it's not safe as a default behavior. It should always be safe to read exclude paths from files within the backup source, if the file is named something like |
I currently back up my home machine with rsync and make use of its -F option, combined with the file ~/.rsync-filter, with the following contents:
This has several advantages over Borg's existing features for specifying exclusion paths:
CACHEDIR.TAG files throughout my home directory, because when I clear out a cache directory by removing it, I don't have to remember to recreate it and its CACHEDIR.TAG file.
/mnt/bsnap/home, rather than /home directly.