
Ability to create multiple backup.tar.gz based on the input array of directories #29

Closed
larskruckow opened this issue Oct 23, 2021 · 10 comments
Labels
enhancement New feature or request

Comments

@larskruckow

As far as I can tell, it works by taking everything in /backup/* and adding it to a single tar.gz file.
My use case is as follows:
I run this in docker-compose separately from the running containers.
All my containers' data dirs are located in, let's say, /docker-volumes/app1, /docker-volumes/app2, etc.
I have grouped them by backup schedule, e.g. nightly, weekly, etc., and have an instance of docker-volume-backup running per schedule, mapping the relevant volumes to it.
I've gotten the label feature to work brilliantly: stopping all the "nightly" ones at night, backing them up, then starting them back up.
I can't seem to figure out an easy way to have this do the backups "one by one".
What I would like to see is an option to have every dir in /backup/* be treated as a separate backup, so that I could have
App1, App2, App3 in separate tar.gz files instead of a single tar.gz with App1, App2, App3 inside.
Is the only alternative to run multiple of these containers?

Thanks for a great little container!

@larskruckow
Author

The reason I'm avoiding multiple sidecars is the logistics of scheduling the backups, as I don't think it's a good idea to have multiple sidecars backing up at the same time.

@m90
Member

m90 commented Oct 23, 2021

I personally run a setup where we have a host running 7 applications that are getting backed up by 7 distinct containers running on 7 distinct schedules (all of them are daily, so it's pretty arbitrary), which is probably the reason things work the way they do right now.


Your proposal makes sense, however. I think it could be implemented in a way that if you set BACKUP_SOURCES=/backup/app1,/backup/app2 you'd get two archive files (one per item in the given list) instead of one. The default behavior would still be to back up all of /backup.

BACKUP_SOURCES is already there:

BackupSources string `split_words:"true" default:"/backup"`
so all it would need is making it a comma-separated list instead of a single string, and also making the file that is handled a list of files instead.
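
For illustration, a minimal sketch of how that splitting could work (hypothetical names and placeholder output, not the actual implementation):

```go
// Hypothetical sketch: split a comma-separated BACKUP_SOURCES value and
// create one archive per entry instead of a single archive for everything.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	sources := os.Getenv("BACKUP_SOURCES")
	if sources == "" {
		sources = "/backup" // current default: archive everything below /backup
	}
	for _, source := range strings.Split(sources, ",") {
		source = strings.TrimSpace(source)
		if source == "" {
			continue
		}
		// The existing tar.gz logic would be invoked once per source here;
		// this sketch only prints what would happen.
		fmt.Printf("would archive %s into %s.tar.gz\n", source, filepath.Base(source))
	}
}
```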

Would this API make sense to you?


The downside in this scenario would be that all apps are down while the archives are being created, and you no longer have any granularity in starting / stopping containers while their data is being backed up. This is the reason I ended up using the setup described at the beginning, by the way: one of my volumes is so big that it takes ~2 minutes to create the archive, and I didn't want all of my services to be down during that time.

@m90 m90 added the enhancement New feature or request label Oct 23, 2021
@larskruckow
Author

Yeah, you have a point there; some of my backups are in the tens of GBs, so that would take a while. And as you say, the rest of my "daily" group would be down while this was going on.
What I am really looking for is a way to:

  • take app1 down
  • back up app1
  • bring app1 up
  • take app2 down
  • back up app2
  • bring app2 up

all from the same starting schedule, e.g. at 1 AM.

Not sure if it's possible to combine your suggestion with multiple matching labels?
E.g. BACKUP_SOURCES=/backup/app1,/backup/app2 and docker-volume-backup.stop-during-backup=app1,app2
But maybe that is a bit too fragile.

@larskruckow
Author

But otherwise, for smaller, less important containers, your suggestion makes sense, and I could definitely use that functionality.
I run 12 "production" containers and 15-20 test containers that are not required to be online at night, so they could all be off for 8 hours if that's what it took to back them up.

@m90
Member

m90 commented Oct 23, 2021

Considering you can already run multiple containers in parallel in case you need granularity on stopping, I think coming up with a rather brittle / confusing API that conflates backup sources and stop labels isn't really a good option here.

Adding support for multiple sources is a good idea though, and I'd like to implement it as soon as I find the time to do so.

If you (or someone else) would like to work on this, let me know and I'm happy to help with getting this merged.

@m90 m90 added the help wanted Extra attention is needed label Oct 23, 2021
@m90
Member

m90 commented Oct 28, 2021

I was thinking about this further and there is another API challenge to solve when implementing this:

Backing up multiple sources into multiple archives means we also need to define a way of specifying multiple target filenames. Right now consumers set BACKUP_FILENAME="backup-%Y-%m-%dT%H-%M-%S.tar.gz" but this won't work when there are multiple files being saved.

I can see two options right now:

Making BACKUP_FILENAME a list of templates

This list would simply map to the sources defined in BACKUP_SOURCES. This is relatively easy to implement, but might be complicated to configure, as consumers now need to keep those two configuration values in sync. It also raises the question of what the script should do when it encounters lists of different lengths (probably bail).
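
A rough sketch of that pairing and length check, with made-up example values:

```go
// Hypothetical sketch: pair each entry in BACKUP_SOURCES with the template
// at the same index in BACKUP_FILENAME and bail on a length mismatch.
package main

import (
	"fmt"
	"log"
	"strings"
)

func main() {
	sources := strings.Split("/backup/app1,/backup/app2", ",")
	templates := strings.Split("app1-%Y-%m-%d.tar.gz,app2-%Y-%m-%d.tar.gz", ",")

	if len(sources) != len(templates) {
		log.Fatalf(
			"got %d sources but %d filename templates, refusing to continue",
			len(sources), len(templates),
		)
	}
	for i, source := range sources {
		fmt.Printf("source %s -> filename template %s\n", source, templates[i])
	}
}
```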

Interpolating the source into the BACKUP_FILENAME

BACKUP_FILENAME could start interpolating a special token, something like {SOURCE}, so that consumers would define BACKUP_FILENAME="backup-{SOURCE}-%Y-%m-%dT%H-%M-%S.tar.gz" and the token would be replaced with a cleaned, filename-safe version of the source (see the sketch after the list below). Open questions would be:

  • can we guarantee that those source names are unique once they are made safe to use in a filename?
  • what happens when {SOURCE} is not given in the filename?
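
For illustration, a sketch of what the interpolation could look like (the token name and the sanitizing rule are assumptions, not a finished design):

```go
// Hypothetical sketch: replace a {SOURCE} token in BACKUP_FILENAME with a
// filename-safe version of the source path, e.g. /backup/app1 -> backup-app1.
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// unsafeChars matches anything we would not want to see in a filename.
var unsafeChars = regexp.MustCompile(`[^a-zA-Z0-9_.-]+`)

func interpolateSource(template, source string) string {
	safe := unsafeChars.ReplaceAllString(strings.Trim(source, "/"), "-")
	return strings.ReplaceAll(template, "{SOURCE}", safe)
}

func main() {
	fmt.Println(interpolateSource("backup-{SOURCE}-%Y-%m-%dT%H-%M-%S.tar.gz", "/backup/app1"))
	// prints: backup-backup-app1-%Y-%m-%dT%H-%M-%S.tar.gz
}
```

With a rule like this, two different sources such as /backup/app 1 and /backup/app-1 would collapse to the same safe name, which is exactly the uniqueness question above.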

@larskruckow
Author

I think this again makes the complexity a bit too high; I quite like how simple this image is to use.
What about a config file mounted into the container for "advanced" stuff like this? Then this issue and others could potentially be handled more easily without necessarily compromising the simplicity of the environment args.
Either way, I got it solved by spreading out my backup schedule a bit, and actually running some at the same time by choosing different target drives to back up to (then the write performance is fine).

@m90
Member

m90 commented Dec 5, 2021

After moving this around in the back of my head for a little longer, I think this is how it could work:

  • consumers can mount an arbitrary number of configuration files into the container's /etc/backup.d directory (e.g. /etc/backup.d/10app.env and /etc/backup.d/20database.env)
  • the entrypoint script checks whether that directory exists
  • if it does, it creates a crontab entry for each of the files and tells the backup binary to read configuration values from that file in addition to the defined environment variables
  • if it doesn't, a single crontab entry is created from environment variables (i.e. the current behavior), as sketched below
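
A rough sketch of that branching in Go, with placeholder behavior (the real entrypoint and its invocation may look different):

```go
// Hypothetical sketch: if /etc/backup.d exists, schedule one backup run per
// configuration file found there; otherwise fall back to the single run
// configured through environment variables (the current behavior).
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	const confDir = "/etc/backup.d"
	files, err := os.ReadDir(confDir)
	if err != nil || len(files) == 0 {
		fmt.Println("no config directory found: scheduling a single run from the environment")
		return
	}
	for _, f := range files {
		if f.IsDir() {
			continue
		}
		// In the design above, the schedule and all other settings would be
		// read from this file (in addition to the environment) before a
		// crontab entry for it is written.
		fmt.Printf("would schedule a backup run for %s\n", filepath.Join(confDir, f.Name()))
	}
}
```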

Open questions:

  • this means that when configuration files change, the container will need to be restarted. Is that common practice or possibly unexpected behavior?

@m90 m90 mentioned this issue Dec 30, 2021
@rpatel3001
Contributor

Something like https://github.com/fsnotify/fsnotify could trigger a config reload.
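
For reference, a minimal sketch of how fsnotify could be wired up to watch the config directory (the reload itself is just a placeholder here):

```go
// Hypothetical sketch: watch /etc/backup.d with fsnotify and trigger a
// configuration reload when a file in it is written or created.
package main

import (
	"log"

	"github.com/fsnotify/fsnotify"
)

func main() {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		log.Fatal(err)
	}
	defer watcher.Close()

	if err := watcher.Add("/etc/backup.d"); err != nil {
		log.Fatal(err)
	}
	for {
		select {
		case event, ok := <-watcher.Events:
			if !ok {
				return
			}
			if event.Op&(fsnotify.Write|fsnotify.Create) != 0 {
				// Placeholder: this is where the crontab entries would be
				// rebuilt from the changed configuration files.
				log.Printf("config changed (%s), reloading schedules", event.Name)
			}
		case err, ok := <-watcher.Errors:
			if !ok {
				return
			}
			log.Println("watch error:", err)
		}
	}
}
```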

@m90 m90 removed the help wanted Extra attention is needed label Mar 4, 2022
@m90
Member

m90 commented Mar 4, 2022

This is now possible as of v2.14.0. While I already had it working, I did not implement inotify for file changes, as it would require adding the openrc package to the container, which adds slightly more than 2 MB to the image size. In case this turns out to be a problem / a requested feature for users, it can always be added in a later step.
