Remote executors
Note: This is documentation for an experimental feature that is under active development. It should not be used in production environments.
dvc machine provides a set of DVC commands for provisioning and managing remote machines which will eventually be used for executing DVC experiments.
Currently, the dvc machine implementation uses https://github.com/iterative/terraform-provider-iterative and requires the terraform client to be installed and available in your PATH.
- (Optional) Download & install the terraform client for your platform
- (Optional) Install the latest tpi from master (pip install -e)
- Install DVC deps (preferably using pip install -e from master): pip install dvc[terraform]
  - This will install tpi from PyPI if you did not already install it from source
  Note: If you do not install a terraform client yourself, it will be downloaded and installed for you (via tpi)
- Enable the dvc machine feature (either per-repo or globally):
  dvc config [--global] feature.machine true
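Before enabling the feature, it can help to confirm which of the tools mentioned in the steps above are already on your PATH. The following is a minimal sketch, not part of DVC: the helper names have_cmd and report_tools are made up for illustration, and a missing terraform is only reported rather than treated as an error, since tpi is expected to download it.

```shell
#!/bin/sh
# Hypothetical helper (not a DVC command): succeeds if a command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print one line per tool mentioned in the installation steps.
report_tools() {
    for tool in dvc terraform; do
        if have_cmd "$tool"; then
            echo "$tool: found"
        else
            echo "$tool: not found"
        fi
    done
}
```

Calling report_tools after sourcing the snippet prints the status of each tool.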
Machines are configured similarly to DVC remotes, and configuration usage generally mirrors dvc remote add/modify/remove.
- dvc machine add - adds a machine to your repo configuration (note that no machine instance will actually be created until dvc machine create is run).
- dvc machine modify - modifies the configuration of an existing machine. For a full list of available options, refer to the documentation for https://github.com/iterative/terraform-provider-iterative#machine
- dvc machine list - lists the configuration of one or all machines.
- dvc machine remove - removes a machine from your repo configuration (note that any running machine instances should be destroyed with dvc machine destroy before removing the machine from your repo configuration).
- dvc machine rename - renames a machine. This also affects the instances related to this machine.
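As an illustration of the remote-like workflow, the sketch below shows how a machine might be added and configured from the command line. It assumes the positional forms dvc machine add <name> <cloud> and dvc machine modify <name> <option> <value>. The wrapper name configure_machine is hypothetical, and how DVC stores relative script paths in .dvc/config is not confirmed here, so it is worth reviewing the output of dvc machine list afterwards.

```shell
#!/bin/sh
# Sketch only: configure a machine the way one would configure a DVC remote.
# configure_machine is a hypothetical wrapper, not a DVC command.
# Usage: configure_machine <name> <cloud> <startup_script> <setup_script>
configure_machine() {
    name=$1
    cloud=$2
    startup=$3
    setup=$4
    # Adds the machine to the repo config; no instance is created yet.
    dvc machine add "$name" "$cloud" || return 1
    # Options are set one at a time (assumed to mirror `dvc remote modify`).
    dvc machine modify "$name" startup_script "$startup" || return 1
    dvc machine modify "$name" setup_script "$setup" || return 1
    # Print the resulting configuration for review.
    dvc machine list
}
```

For example, configure_machine aws-test aws startup.sh env-setup.sh would issue the four commands in order and stop at the first failure.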
- dvc machine create - create and start an instance of a configured machine.
- dvc machine status - list the running status of the instances from one specified or all machines.
- dvc machine destroy - stop and destroy a previously created machine instance.
- dvc machine ssh - connect to a machine via SSH.
  - Your default ssh client will be used if available in your PATH.
  - Otherwise a limited functionality client session will be provided via asyncssh.
    - Note that interactive programs (particularly line editors like vi) may not work as expected when run in this shell session.
- Very basic exp execution can be done over SSH via dvc exp run --machine <machine_name> (see also: https://github.com/iterative/dvc/pull/7173).
- The runtime execution environment for the remote machine can be configured via the setup_script machine configuration option.
  - setup_script should be a shell script, and will be sourced from the root of the user's Git repository prior to running an experiment (i.e. it is sourced before executing dvc exp run).
  - Note that this is separate from the startup_script terraform configuration, which is executed at boot time and meant for installing system packages.
- Detached/unattended execution is not currently supported. Killing or interrupting the dvc exp run --machine command will also terminate the exp execution on the remote machine.
- Also note that the default iterative-machine image uses Ubuntu 18.04 and Python 3.6 as the system python, which is not supported in DVC. The default startup_script also installs DVC from the latest .deb package, which will not include the latest changes/fixes related to dvc machine and dvc exp run --machine. It is recommended to override the default startup_script to install a more recent Python and to install DVC from source, rather than the .deb package.
  - Overridden startup scripts should end by generating the file /var/log/dvc-machine-init.log (it can be an empty file). This is used by DVC as a signal that the startup script has completed execution (since iterative-machine does not provide a built-in way to do this).
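To check by hand whether an overridden startup script has finished (for example from a dvc machine ssh session), one can poll for the signal file. The sketch below is not how DVC itself performs the check. wait_for_file and its timeout and interval arguments are illustrative names.

```shell
#!/bin/sh
# Hypothetical helper: wait until a file exists, or give up after a timeout.
# Usage: wait_for_file <path> <timeout_seconds> [interval_seconds]
wait_for_file() {
    path=$1
    timeout=$2
    interval=${3:-1}
    waited=0
    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for $path" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}
```

On the remote machine this could be called as wait_for_file /var/log/dvc-machine-init.log 600 5, which returns as soon as the file exists and fails after roughly ten minutes otherwise.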
Example .dvc/config:
['machine "aws-test"']
cloud = aws
startup_script = ../startup.sh
setup_script = ../env-setup.sh
Example startup.sh (run at machine boot time):
#!/bin/bash
# Install latest python3.9 + pip from deadsnakes PPA
# NOTE: deadsnakes PPA python requires debian/ubuntu system python3-pip (rather than separate PPA python3.x-pip)
sudo add-apt-repository --yes ppa:deadsnakes/ppa
sudo apt-get update
sudo apt-get install --yes python3.9 python3.9-dev python3.9-venv python3-pip
sudo -u ubuntu python3.9 -m pip install --upgrade pip --user
sudo -u ubuntu python3.9 -m pip install --upgrade setuptools --user
# Install DVC from source
sudo -u ubuntu python3.9 -m pip install "git+https://github.com/iterative/dvc.git#egg=dvc[all]" --user
# Write signal/log file
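# (sudo must apply to the write itself; with "sudo echo ... > file" the redirect runs as the calling user)
echo "OK" | sudo tee /var/log/dvc-machine-init.log > /dev/null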
Example env-setup.sh (sourced at exp runtime):
#!/bin/bash
python3.9 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -r src/requirements.txt
To run on a remote machine:
$ dvc machine create aws-test
$ dvc exp run --machine aws-test
$ dvc machine destroy aws-test
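Because detached execution is not supported and a created instance keeps running until it is destroyed, it may be worth wrapping these three commands so that the machine is destroyed even when the experiment fails. A minimal sketch, using only the commands shown above; run_on_machine is a hypothetical wrapper name.

```shell
#!/bin/sh
# Sketch: create a machine, run one experiment on it, then always destroy it.
# Usage: run_on_machine <machine_name>
run_on_machine() {
    machine=$1
    dvc machine create "$machine" || return 1
    # Run the experiment; remember its exit status instead of aborting.
    dvc exp run --machine "$machine"
    status=$?
    # Destroy the instance whether or not the experiment succeeded.
    dvc machine destroy "$machine"
    return "$status"
}
```

Note that this only covers a failing dvc exp run. If the wrapper itself is interrupted (for example with Ctrl-C), the destroy step is skipped and dvc machine destroy still has to be run by hand.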