Python script for moving files from a cache disk to a backing mergerfs disk pool.
More information in this blog post:
https://blog.muffn.io/posts/part-4-100tb-mini-nas/ (if that link doesn't work, it's not released yet.)
The script operates by checking the disk usage of the cache directory. If the usage is above the threshold percentage defined in the configuration file (`config.yml`), it moves the oldest files out to the backing storage location until usage falls below the defined target percentage. Empty directories are cleaned up after files are moved.
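For illustration, the core loop looks roughly like this (a simplified sketch of the behaviour described above, not the script's actual code; the names are mine, and the real implementation also handles ownership, permissions, and parallel moves):

```python
import os
import shutil

def usage_percent(path):
    """Disk usage of the filesystem containing path, as a percentage."""
    total, used, _free = shutil.disk_usage(path)
    return used / total * 100

def files_oldest_first(cache_path):
    """Every file under cache_path, sorted by modification time, oldest first."""
    paths = [os.path.join(root, name)
             for root, _dirs, names in os.walk(cache_path)
             for name in names]
    return sorted(paths, key=os.path.getmtime)

def move_until_target(cache_path, backing_path, threshold, target):
    """Move the oldest files to backing storage until usage drops below target."""
    if usage_percent(cache_path) < threshold:
        return  # under the threshold, nothing to do this run
    for src in files_oldest_first(cache_path):
        if usage_percent(cache_path) <= target:
            break
        dst = os.path.join(backing_path, os.path.relpath(src, cache_path))
        os.makedirs(os.path.dirname(dst), exist_ok=True)  # recreate directory structure
        shutil.move(src, dst)
```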
The script uses a configuration file to manage settings such as paths, thresholds, and system parameters. It also checks for other running instances of itself to prevent concurrent operations, in case a move from a previous run is still in progress, whether because you are using slow storage, running the script too often, or both.
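A common way to implement that kind of guard (not necessarily how this script does it) is an advisory lock that a second instance fails to acquire; the lock path here is illustrative:

```python
import fcntl
import sys

def acquire_single_instance_lock(lock_path="/tmp/cache-mover.lock"):
    """Exit if another instance already holds the lock."""
    lock_file = open(lock_path, "w")
    try:
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        sys.exit("Another instance is already running; exiting.")
    return lock_file  # keep a reference so the lock is held for the process lifetime
```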
The script logs its operations, including moved files, errors, and warnings. Logs are rotated based on the file size and backup count defined in `config.yml`.
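Size-based rotation with a backup count maps directly onto the standard library's `logging.handlers.RotatingFileHandler`; a minimal sketch, assuming the values come from `config.yml` (the script's actual logging setup may differ):

```python
import logging
from logging.handlers import RotatingFileHandler

# Illustrative values; in practice these come from config.yml.
MAX_LOG_SIZE_MB = 10
BACKUP_COUNT = 5

handler = RotatingFileHandler(
    "/var/log/cache-mover.log",
    maxBytes=MAX_LOG_SIZE_MB * 1024 * 1024,
    backupCount=BACKUP_COUNT,
)
logging.basicConfig(level=logging.INFO, handlers=[handler])
```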
- Python 3.6 or higher
- PyYAML (installed from `requirements.txt`)
- To get started, clone the repository to your local machine using the following command:
git clone https://github.com/MonsterMuffin/mergerfs-cache-mover.git
- Install the required Python package using pip:
pip install -r requirements.txt
Copy `config.example.yml` to `config.yml` and set up your `config.yml` with the appropriate values:
- `CACHE_PATH`: The path to your cache directory. !!THIS IS YOUR CACHE DISK ROOT, NOT THE MERGERFS CACHE MOUNT!!
- `BACKING_PATH`: The path to the backing storage where files will be moved.
- `LOG_PATH`: The path for the log file generated by the script.
- `THRESHOLD_PERCENTAGE`: The usage percentage of the cache directory that triggers the file-moving process.
- `TARGET_PERCENTAGE`: The target usage percentage to achieve after moving files.
- `MAX_WORKERS`: The maximum number of parallel file-moving operations.
- `MAX_LOG_SIZE_MB`: The maximum size of the log file before it is rotated.
- `BACKUP_COUNT`: The number of backup log files to maintain.
- `USER`: The username that should have ownership of the moved files.
- `GROUP`: The group that should have ownership of the moved files.
- `FILE_CHMOD`: The permissions to set for the specified user/group on all moved files, provided as a string (e.g., '770').
- `DIR_CHMOD`: The permissions to set for the specified user/group on all directories created during the move, provided as a string (e.g., '770').
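Putting it together, a filled-in `config.yml` might look like the following; every value is illustrative, and `config.example.yml` remains the authoritative template for key names and layout:

```yaml
CACHE_PATH: /mnt/cache-disk        # cache disk root, not the mergerfs cache mount
BACKING_PATH: /mnt/media-pool
LOG_PATH: /var/log/cache-mover.log
THRESHOLD_PERCENTAGE: 70
TARGET_PERCENTAGE: 50
MAX_WORKERS: 4
MAX_LOG_SIZE_MB: 10
BACKUP_COUNT: 5
USER: muffin
GROUP: muffin
FILE_CHMOD: '770'
DIR_CHMOD: '770'
```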
To run the script, use the following command from your terminal:
python3 cache-mover.py --console-log
Of course, this is meant to be run automatically...
Use either a systemd timer or a crontab entry. I have been moving from crontab to systemd timers myself, but you live your life how you see fit.
- Create a systemd service file `/etc/systemd/system/cache_mover.service`. Change `/path/to/cache-mover.py` to where you downloaded the script, obviously.
[Unit]
Description=Cache Mover Script
[Service]
Type=oneshot
ExecStart=/usr/bin/python3 /path/to/cache-mover.py
- Create a systemd timer file `/etc/systemd/system/cache_mover.timer`. The `OnCalendar` format is not the usual crontab syntax; see `man systemd.time` if you need help.
[Unit]
Description=Runs Cache Mover Script Daily at 3AM
[Timer]
OnCalendar=*-*-* 03:00:00
Persistent=true
[Install]
WantedBy=timers.target
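- Reload systemd so it picks up the new unit files:
systemctl daemon-reload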
- Enable and start the timer:
systemctl enable cache_mover.timer
systemctl start cache_mover.timer
- Check timer status:
systemctl list-timers
- Open crontab file for editing:
sudo crontab -e
- Add a line to run the script. The following example runs the script daily at 3AM; you can adjust this using a site such as crontab.guru. Change `/path/to/cache-mover.py` to where you downloaded the script, obviously.
0 3 * * * /usr/bin/python3 /path/to/cache-mover.py
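By default, cron mails any output to the local user rather than writing it to a file. The script already logs to `LOG_PATH`, but if you also want anything printed to stdout/stderr captured, you can redirect it (the log path here is illustrative):
0 3 * * * /usr/bin/python3 /path/to/cache-mover.py >> /var/log/cache_mover_cron.log 2>&1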
This has been working well for me, but always take care.