Archiving
Many of us have multiple drives that we want to fill with plots.
If we're lucky, we are even adding to the pile to keep our plotters busy.
The dst drive must be picked hours ahead of time, when the chia plot process is launched.
The more plotting activity and drive swapping there is, the harder it becomes to predict which plot process should target which drive.
When the prediction is wrong, you end up having to intervene manually and move plots to unjam the stuck plotting processes.
Plotman's archiving operates on completed plots to avoid the need to predict the future.
The final drives are selected to make sure they all get filled.
Specific plots are chosen in an effort to use more of the available bandwidth of the dst drives.
Configuration options are also provided to largely avoid IO contention at the receiving end of the transfer.
Most users will want to either configure archiving or provide their own plot distribution mechanism.
Regardless of whether the dst paths are dedicated drives or configured to be the same as the tmp drives, they are used as a buffer between plot creation and archiving.
In the past, archiving required rsyncd and ssh, which made it cumbersome to use with local plot storage drives. That mode is still available and well suited for remote storage, but you can also set up local archiving in as little as four lines of YAML. Usually, the only external setup required is mounting the drives in the expected manner and having an rsync client installed.
Archiving is configured by selecting a target definition and specifying the parameters it requires. Builtin target definitions are provided for the common rsyncd target for remote archiving, as well as a new local rsync target. For local rsync, the only required parameter is the path to a directory that contains the mount points of the plot storage drives. You can also write your own target definition in your configuration file if you want to adjust one of the builtins or develop an entirely custom transfer mechanism. Each target definition is composed of two activities: drive identification and the actual file transfer. Each activity is defined by a script, which can be written in a language of your choice.
Instructions and comments are based on a standard Ubuntu Server installation. Commands and file locations may differ for other Linux distributions.
We will start with the simple setup first. This may be used for any locally accessible path, including directly mounted internal or external drives as well as network mounts via NFS, SMB, or other means. Network mounts are not explicitly recommended, but are mentioned for completeness.
archiving:
  target: local_rsync
  env:
    site_root: /mnt/farm
This selects the builtin target definition named local_rsync and configures its site_root parameter to refer to /mnt/farm.
In this configuration, candidate drives would be mounted inside the specified directory, such as at /mnt/farm/plots1 and /mnt/farm/plots2.
Since all drives in that directory are considered, it is generally best not to use /mnt itself or /media/username, because these are general-purpose mount points that will often contain other mounted drives.
rsync will be used to transfer the completed plots to their final resting places using local paths.
No rsyncd server is required, and any rsyncd server that is configured will not be used.
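The rule that every drive in site_root is a candidate can be shown with a minimal Python sketch. It lists the mount points directly inside a directory together with their free bytes. The function name candidate_drives is hypothetical. This is not how the builtin target works, because that target filters df output with a shell script. The sketch also looks only one level deep.

```python
import os
import shutil


def candidate_drives(site_root):
    """Return {mount_point: free_bytes} for mounts directly inside site_root.

    Hypothetical helper for illustration only; the builtin local_rsync
    target uses a df-based shell script instead.
    """
    drives = {}
    with os.scandir(site_root) as entries:
        for entry in entries:
            # Only subdirectories that are themselves mount points count as drives.
            if entry.is_dir(follow_symlinks=False) and os.path.ismount(entry.path):
                # disk_usage().free is the space available to unprivileged users.
                drives[entry.path] = shutil.disk_usage(entry.path).free
    return drives
```

On the example layout, candidate_drives("/mnt/farm") would report /mnt/farm/plots1 and /mnt/farm/plots2. Pointing it at /mnt instead would also pick up every unrelated mount there, which is why a dedicated directory is recommended.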
Setting up remote archiving is a bit more involved. The configuration file is different, as is the code backing it, but the overall functionality is the same as historically provided by plotman, except that it is more configurable. The details of rsyncd and ssh setup are covered later. Here is a basic setup:
archiving:
  target: rsyncd
  env:
    site_root: /mnt/farm
    user: username
    host: plot.storage.ip
    rsync_port: 12000
    site: sites
This selects the builtin target definition named rsyncd.
ssh will be used to connect to plot.storage.ip as the user username to check which drives are mounted inside /mnt/farm and how much space they have available.
Once a plot is available and a target drive has been selected, rsync will be used to connect to rsyncd on the remote system to transfer the plot.
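The mapping from a drive path found over ssh to an rsync daemon URL can be sketched as follows. The sketch assumes that site names the rsyncd module whose path is site_root, so the drive's path relative to site_root is appended to the module name. The helper name rsyncd_destination_url is hypothetical, and the builtin target's actual script may build the URL differently.

```python
def rsyncd_destination_url(user, host, rsync_port, site, site_root, drive):
    """Map a drive path reported over ssh to an rsync:// URL (hypothetical helper).

    Assumes the rsyncd module named `site` exports `site_root`, so
    /mnt/farm/plots2 becomes <module>/plots2.
    """
    root = site_root.rstrip("/")
    if drive != root and not drive.startswith(root + "/"):
        raise ValueError(f"{drive!r} is not inside site_root {site_root!r}")
    relative = drive[len(root):]  # e.g. "/plots2"
    return f"rsync://{user}@{host}:{rsync_port}/{site}{relative}"
```

With the example configuration, the drive /mnt/farm/plots2 maps to rsync://username@plot.storage.ip:12000/sites/plots2.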
You can define your own target definitions in your plotman.yaml configuration file.
You can write the two required scripts either inline in the configuration or as external scripts that you maintain separately and reference by path.
You could duplicate the builtin local_rsync target definition as follows.
This is meant as an example only.
Presumably you would only do this if you were going to modify it in some way.
You can define multiple target definitions, though presently you can only select and use one at a time.
archiving:
  target: my_target
  env:
    site_root: /mnt/farm
  target_definitions:
    my_target:
      env:
        command: rsync
        options: --preallocate --remove-source-files --skip-compress plot --whole-file
        site_root: null
      disk_space_script: |
        #!/bin/bash
        df -BK | grep " ${site_root}/" | awk '{ gsub(/K$/,"",$4); print $6 ":" $4*1024 }'
      transfer_script: |
        #!/bin/bash
        "${command}" ${options} "${source}" "${destination}"
      transfer_process_name: "{command}"
      transfer_process_argument_prefix: "{site_root}"
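The disk_space_script above chains df, grep and awk. For readers less familiar with awk, the same transformation is written below in Python. It is purely an illustration, and the function name is hypothetical. It assumes `df -BK` output, where sizes carry a trailing K meaning 1024-byte units, field 4 is Available and field 6 is the mount point. It also assumes that mount points contain no spaces.

```python
def df_to_disk_space_lines(df_output, site_root):
    """Python transcription of the df | grep | awk pipeline (illustration only)."""
    needle = f" {site_root}/"  # the same substring the grep matches
    result = []
    for line in df_output.splitlines():
        if needle not in line:
            continue
        fields = line.split()
        available = fields[3]  # awk's $4: Available, e.g. "91941516K"
        if available.endswith("K"):  # mirrors gsub(/K$/,"",$4)
            available = available[:-1]
        # awk's $6 is the mount point; multiply KiB by 1024 to get bytes.
        result.append(f"{fields[5]}:{int(available) * 1024}")
    return result
```

Note that the match requires a trailing slash, so a drive mounted at /mnt/farm itself is ignored. Only mounts below it are reported.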
The env: section under my_target: defines parameters that will be made available to the scripts as environment variables.
You either provide a default string value or specify null to make the parameter mandatory.
For example, site_root is something we cannot make any sensible guess for.
The user must specify it in the archiving: env: section.
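The rule that null makes a parameter mandatory can be sketched as a merge of the definition's defaults with the values from archiving: env:. The helper resolve_target_env is hypothetical. This page does not say whether plotman also rejects keys that the definition does not declare, so the sketch simply passes them through.

```python
def resolve_target_env(definition_env, archiving_env):
    """Merge target-definition defaults with the user's archiving env (sketch).

    A value of None (YAML null) marks a mandatory parameter, so any None
    left after merging is reported as an error.
    """
    merged = {**definition_env, **archiving_env}
    missing = sorted(name for name, value in merged.items() if value is None)
    if missing:
        raise ValueError(f"missing required archiving env: {', '.join(missing)}")
    # Environment variables must be strings, e.g. rsync_port: 12000 -> "12000".
    return {name: str(value) for name, value in merged.items()}
```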
The output of the disk space script must be a single line per disk, with the path and the available byte count separated by a colon. If you are writing your own disk space script, you can select directories however you like.
/mnt/farm/plots1:94148112384
/mnt/farm/plots2:39723638784
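A consumer of this output needs to parse each path:bytes line and then choose a destination. The sketch below does both. The helper names are hypothetical. The policy of picking the drive with the most free space that still fits the plot is a stand-in chosen for illustration, not plotman's documented selection logic.

```python
def parse_disk_space(output):
    """Parse 'path:bytes' lines into {path: free_bytes}."""
    drives = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # rpartition splits on the last colon, so a colon inside the path is harmless.
        path, sep, free = line.rpartition(":")
        if not sep:
            raise ValueError(f"malformed disk space line: {line!r}")
        drives[path] = int(free)
    return drives


def pick_destination(drives, plot_bytes):
    """Hypothetical policy: the drive with the most free space that fits the plot."""
    fitting = {path: free for path, free in drives.items() if free >= plot_bytes}
    if not fitting:
        return None  # no drive can take this plot right now
    return max(fitting, key=fitting.get)
```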
The transfer script receives two extra environment variables.
source will be an absolute path to the plot that needs to be transferred.
destination will be one of the paths reported by the disk space script, for example /mnt/farm/plots2.
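One way to run a transfer script with its parameters and the two extra variables is sketched below using subprocess. The sketch rests on several assumptions. run_transfer is a hypothetical name. The script is passed to a shell with -c rather than executed through its shebang. Output is captured, which you might not want for long rsync runs.

```python
import os
import subprocess


def run_transfer(script, params, source, destination, shell="bash"):
    """Run a transfer script with params plus source/destination in its environment.

    Sketch only: plotman's own launcher may differ.
    """
    env = dict(os.environ)
    env.update({name: str(value) for name, value in params.items()})
    env["source"] = source            # absolute path of the plot to move
    env["destination"] = destination  # one path from the disk space script
    # check=True raises CalledProcessError if the transfer exits non-zero.
    return subprocess.run([shell, "-c", script], env=env, check=True,
                          capture_output=True, text=True)
```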
transfer_process_name is used as the first filter when discovering existing archive transfer processes.
It should be written as a Python format string.
The names to be interpolated are the environment variables defined as parameters.
transfer_process_argument_prefix is the second filter.
We scan the arguments of any process matching transfer_process_name to see whether any argument starts with the specified prefix.
If both requirements are satisfied, we consider it an active archival transfer.
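The two filters can be expressed as a small function over a list of processes. To keep the sketch self-contained, it takes (pid, name, args) tuples instead of querying the system. A library such as psutil could supply real process names and command lines. The function name is hypothetical.

```python
def find_active_transfers(processes, params, name_template, prefix_template):
    """Return (pid, matching_args) for processes that look like archive transfers.

    `processes` is an iterable of (pid, name, args) tuples. The templates are
    Python format strings filled from the target's env parameters.
    """
    name = name_template.format(**params)      # e.g. "{command}" -> "rsync"
    prefix = prefix_template.format(**params)  # e.g. "{site_root}" -> "/mnt/farm"
    active = []
    for pid, proc_name, args in processes:
        if proc_name != name:  # first filter: process name
            continue
        matching = [arg for arg in args if arg.startswith(prefix)]
        if matching:  # second filter: some argument starts with the prefix
            active.append((pid, matching))
    return active
```

Because the second filter is a plain prefix test, a site_root of /mnt/farm would also match an argument such as /mnt/farmer/x. A distinctive site_root keeps such false positives unlikely.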
If you prefer, you can maintain the scripts as separate files and specify their paths as follows.
archiving:
  target: my_target
  env:
    site_root: /mnt/farm
  target_definitions:
    my_target:
      env:
        command: rsync
        options: --preallocate --remove-source-files --skip-compress plot --whole-file
        site_root: null
      disk_space_path: /some/where/disk_space
      transfer_path: /some/where/transfer
      transfer_process_name: "{command}"
      transfer_process_argument_prefix: "{site_root}"
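Whether a script is written inline or kept in a file, it ends up as script text. The hypothetical helper below resolves either form. Rejecting a definition that sets both forms, or neither, is an assumption of this sketch and not documented plotman behavior.

```python
from pathlib import Path


def load_script(definition, kind):
    """Return script text for kind 'disk_space' or 'transfer' (hypothetical helper).

    Reads the inline '<kind>_script' key or the file named by '<kind>_path'.
    """
    inline = definition.get(f"{kind}_script")
    path = definition.get(f"{kind}_path")
    # Exactly one of the two keys must be set (an assumption of this sketch).
    if (inline is None) == (path is None):
        raise ValueError(f"set exactly one of {kind}_script or {kind}_path")
    return inline if inline is not None else Path(path).read_text()
```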
There are two main pieces to plotting: creating the plots and getting them to where you want them to be farmed. In some cases both happen on the same machine; in other cases there will be one or more dedicated plotters with a separate farmer. We will start by setting up the Plot Storage and then configure everything on the Plotter.
On your Plot Storage machine, make sure that all the storage drives are mounted, the rsync daemon is running, and SSH is set up to accept incoming connections from your Plotter.
Archiving expects a directory that contains the mounts for the drives you want to archive to, and no other drives mounted there.
/mnt itself is unlikely to be a good choice, since it is a standard place to mount anything.
In this tutorial the following mount points will be used:
/mnt/farm/plots1
/mnt/farm/plots2
If you are using the rsyncd target definition described above, or a similar ssh/rsyncd custom setup, then you will need to configure and run the rsync daemon on the Plot Storage system.
- Install rsync (Ubuntu Server comes with it preinstalled).
- Create /etc/rsyncd.conf:
    lock file = /var/run/rsync.lock
    log file = /var/log/rsyncd.log
    pid file = /var/run/rsyncd.pid
    # don't change the port, plotman (as of version 0.2) has the port hard coded
    port = 12000
    # rsync module name
    [chia]
    # Path with your mounted drives
    path = /mnt/farm
    comment = Chia
    # Use the username that you log into Ubuntu with or create a new one
    uid = username
    # User group (by default same as username)
    gid = username
    read only = no
    list = yes
    # don't uncomment this,
    #auth users = none
    # plotman does not work with authentication
    #secrets file = none
    # since we don't use auth, only accept connections from the plotter's ip
    hosts allow = plotter.ip.address
- Start the rsync daemon:
    sudo systemctl start rsync
- If you want the daemon to start automatically after a reboot, run:
    sudo systemctl enable rsync
The path variable is the directory in which all your storage drives are mounted.
So in our example, /mnt/farm has two drives mounted in it, namely plots1 and plots2.
If you only have a single drive mounted for archiving (e.g. /media/username/plots), your path should not point to /media/username/plots, but rather to /media/username.
Note that any other mount points in that directory will also be considered for archiving.
This should generally lead you to use explicit mount points instead of default locations.
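Once the daemon is running, a quick way to confirm that the Plotter can reach it is to open a TCP connection to its port. An rsync daemon speaks first, sending a greeting line that begins with @RSYNCD:. The helper name in the sketch below is hypothetical, and its default port is taken from the rsyncd.conf above. It only checks connectivity and does not test the module, write permissions or hosts allow rules.

```python
import socket


def rsyncd_greeting(host, port=12000, timeout=5.0):
    """Connect to an rsync daemon and return its '@RSYNCD: ...' greeting line."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = b""
        # The daemon sends its greeting first; read until the end of that line.
        while b"\n" not in data and len(data) < 256:
            chunk = sock.recv(256)
            if not chunk:
                break
            data += chunk
    line = data.split(b"\n", 1)[0].decode("ascii", errors="replace").strip()
    if not line.startswith("@RSYNCD:"):
        raise ConnectionError(f"unexpected greeting from {host}:{port}: {line!r}")
    return line
```

Running rsyncd_greeting("plot.storage.ip") from the Plotter should return a line such as "@RSYNCD: 31.0". A timeout or a refused connection points at the daemon, the port or a firewall rather than at plotman.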
Make sure you configure ssh so that you can connect from your Plotter without having to use a password or key passphrase.
The best way to do this is to create an SSH key without a passphrase on the Plotter and copy the public key to your Plot Storage.
Update the archive section of your plotman.yaml file (Default Configuration File).
If the Plotter and the Plot Storage are on the same machine, you can use rsyncd_host: localhost.
In our example the config would look as follows:
archive:
  rsyncd_module: chia # Module name specified in the Plot Storage's rsyncd.conf
  rsyncd_path: /mnt/farm # Path where your storage drives are mounted (same as in rsyncd.conf)
  rsyncd_bwlimit: 100000 # Bandwidth limit in KB/s
  rsyncd_host: plot.storage.ip # IP address or hostname of your Plot Storage, localhost if local
  rsyncd_user: username # Username that can ssh into your Plot Storage
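The arithmetic below estimates the minimum transfer time per plot, to give a feel for what rsyncd_bwlimit means in practice. It assumes the value is passed to rsync's --bwlimit, whose unsuffixed unit is 1024 bytes per second. It also assumes that a k32 plot is about 101.4 GiB. Disk and network speed can make real transfers slower than this bound.

```python
GIB = 1024 ** 3


def transfer_seconds(plot_bytes, bwlimit_kib_per_s):
    """Lower bound on transfer time when rsync is capped at --bwlimit (KiB/s)."""
    return plot_bytes / (bwlimit_kib_per_s * 1024)


# Assumption: a k32 plot is roughly 101.4 GiB; actual sizes vary slightly.
k32_bytes = 101.4 * GIB
seconds = transfer_seconds(k32_bytes, 100_000)
print(f"{seconds:.0f} s (~{seconds / 60:.1f} min) per k32 plot at 100000 KiB/s")
```

With the example limit of 100000, this comes to roughly 1063 seconds, a little under 18 minutes per plot.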
Before starting plotman, make sure that both SSH and rsync are set up correctly.
If you can't successfully run the tests below, plotman's archiving will not work.
The machines should be set up so that you can SSH from your Plotter to your Plot Storage without having to enter a password. To do this, use an SSH public/private key pair that doesn't require a passphrase.
To test whether this is set up correctly, run the following command on your Plotter:
    ssh username@plot.storage.ip df -aBK | grep /mnt/farm/
The command above should give you a list of all the mounted drives of your Plot Storage. If it doesn't, or if it asks for a password or a passphrase then SSH is not set up as required by plotman and archiving will not work.
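The same check can be scripted, for example as part of a pre-flight test. The sketch adds ssh's `-o BatchMode=yes`, so a missing key makes the command fail instead of prompting. It filters the df output locally, the way `grep /mnt/farm/` does. The function names are hypothetical.

```python
import subprocess


def ssh_df_command(user, host):
    """argv that runs `df -aBK` on the Plot Storage without any prompts.

    BatchMode=yes makes ssh fail instead of asking for a password or
    passphrase, which is exactly the situation archiving cannot handle.
    """
    return ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", "df", "-aBK"]


def remote_farm_mounts(user, host, site_root, timeout=30):
    """Run the check and return df lines under site_root (like grep /mnt/farm/)."""
    result = subprocess.run(ssh_df_command(user, host), capture_output=True,
                            text=True, timeout=timeout, check=True)
    needle = site_root.rstrip("/") + "/"
    return [line for line in result.stdout.splitlines() if needle in line]
```

An empty list, a non-zero exit (CalledProcessError) or a timeout from remote_farm_mounts("username", "plot.storage.ip", "/mnt/farm") means the same thing as a failed manual test.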
To tunnel rsync through ssh, add -e ssh or --rsh ssh to the rsync options:
    rsync -Pe ssh testfile.test rsync://username@plot.storage.ip:12000/chia/plots1
    rsync -P --rsh ssh testfile.test rsync://username@plot.storage.ip:12000/chia/plots1
- Create a test file on your Plotter using:
    echo "testing" > testfile.test
- Enter the following on your Plotter:
    rsync -P testfile.test rsync://username@plot.storage.ip:12000/chia/plots1
- Check your Plot Storage and make sure testfile.test exists in /mnt/farm/plots1.
If both of the tests above pass but archiving still doesn't work, you can look at the rsync output in your console.
- Start plotman interactive.
- Locate the rsync line in the Log section at the bottom of the screen, e.g.:
    05-03 08:37:46 Starting archive: rsync --bwlimit=80000 --remove-source-files -P /mnt/dst1/plot-k32-2021-05-03-01-50-b4271f88a74b36b516c242151e00fdda20e3f31ce1f8624465bf05a195009ecd.plot rsync://username@plot.storage.ip:12000/chia/plots2
- Copy the part after "05-03 08:37:46 Starting archive:". In our example that would be:
    rsync --bwlimit=80000 --remove-source-files -P /mnt/dst1/plot-k32-2021-05-03-01-50-b4271f88a74b36b516c242151e00fdda20e3f31ce1f8624465bf05a195009ecd.plot rsync://username@plot.storage.ip:12000/chia/plots2
- Run the command in your terminal and use its output to find any errors in your configuration.
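Copying the command by hand is error-prone because the plot file names are long, so the extraction step can be scripted. The hypothetical helper below splits a log line at the "Starting archive:" marker and tokenizes the rest with shlex. The result can be passed straight to subprocess.run.

```python
import shlex

MARKER = "Starting archive: "


def archive_argv_from_log(line):
    """Extract the archive command from a plotman log line as an argv list."""
    _, sep, command = line.partition(MARKER)
    if not sep:
        raise ValueError(f"no {MARKER.strip()!r} marker in: {line!r}")
    # shlex.split tokenizes the command the way a POSIX shell would.
    return shlex.split(command)
```

Running `subprocess.run(argv, check=True)` on the result repeats the transfer in the foreground, so any rsync error is printed directly to your terminal.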
To disable archiving, completely comment out the corresponding archive: section in your .config/plotman/plotman.yaml.
Users should either use archiving or provide their own plot distribution mechanism.
The dst directories are not intended to be the final storage location for plots.