Remote executors

Peter Rowlands (변기호) edited this page Jan 13, 2022 · 12 revisions

Machine management

Note: This is documentation for an experimental feature that is under active development. It should not be used in production environments.

dvc machine provides a set of DVC commands for provisioning and managing remote machines which will eventually be used for executing DVC experiments.

Currently, the dvc machine implementation uses https://github.com/iterative/terraform-provider-iterative and requires the terraform client to be installed and available in your PATH.

Installation/Configuration

  • (Optional) Download & install terraform client for your platform

  • (Optional) Install latest tpi from master (pip install -e)

  • Install DVC deps (preferably using pip install -e from master):

    pip install "dvc[terraform]"
    
    • This will install tpi from pypi if you did not already install it from source

Note: If you do not install a terraform client yourself, one will be downloaded and installed for you (via tpi).

  • Enable the dvc machine feature (either per-repo or globally):
dvc config [--global] feature.machine true
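
Since installing terraform yourself is optional, it can help to check what is already on your PATH before enabling the feature. The following is a minimal sketch; find_tool is a hypothetical helper written for this page, not part of DVC or tpi.

```shell
#!/bin/sh
# Hypothetical helper (not part of DVC): report whether a tool is on PATH.
find_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found"
    fi
}

# If terraform is not found, tpi is expected to download it (see the note above).
find_tool terraform
find_tool dvc
```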

Machine configuration

Machines are configured similarly to DVC remotes, and configuration usage generally mirrors dvc remote add/modify/remove.

  • dvc machine add - add a machine to your repo configuration (note that no machine instance is actually created until dvc machine create is run).
  • dvc machine modify - modify the configuration of an existing machine. For a full list of available options, refer to the documentation at https://github.com/iterative/terraform-provider-iterative#machine
  • dvc machine list - list the configuration of one or all machines.
  • dvc machine remove - remove a machine from your repo configuration (note that any running machine instances should be destroyed with dvc machine destroy before the machine is removed from your repo configuration).
  • dvc machine rename - rename a machine. This also affects the instances related to that machine.
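
The configuration commands above could be strung together roughly as follows. This is a sketch rather than a definitive recipe: the machine names aws-test and aws-train, the argument order, the region option with its value us-west, and the run dry-run helper are all assumptions made for illustration. Check dvc machine --help and the terraform-provider-iterative documentation for what your version accepts.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints each command instead of running it
# unless DRY_RUN=0 is set, so pasting this sketch changes nothing.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

MACHINE=aws-test   # assumed machine name

run dvc machine add "$MACHINE" aws                  # assumed order: <name> <cloud>
run dvc machine modify "$MACHINE" region us-west    # assumed option and value
run dvc machine list
run dvc machine rename "$MACHINE" aws-train
run dvc machine remove aws-train
```

Run it with DRY_RUN=0 only after reviewing the printed commands.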

Instance management

  • dvc machine create - create and start an instance of a configured machine.
  • dvc machine status - list the running status of the instances of one specified machine, or of all machines.
  • dvc machine destroy - stop and destroy a previously created machine instance.
  • dvc machine ssh - connect to a machine via SSH.
    • Your default ssh client will be used if it is available in your PATH.
    • Otherwise, a limited-functionality client session will be provided via asyncssh. Note that interactive programs (particularly line editors like vi) may not work as expected when run in this shell session.

Remote experiment execution

  • Very basic exp execution can be done over SSH via dvc exp run --machine <machine_name> (see also: https://github.com/iterative/dvc/pull/7173).
  • Runtime execution environment for the remote machine can be configured via the setup_script machine configuration option.
    • setup_script should be a shell script, and will be sourced from the root of the user's Git repository prior to running an experiment (i.e. it is sourced before executing dvc exp run).
    • Note that this is separate from the startup_script terraform configuration, which is executed at boot time and meant for installing system packages.
  • Detached/unattended execution is not currently supported. Killing or interrupting the dvc exp run --machine command will also terminate the exp execution on the remote machine.
  • Also note that the default iterative-machine image uses Ubuntu 18.04 and Python 3.6 as the system python, which is not supported in DVC. The default startup_script also installs DVC from the latest .deb package, which will not include the latest changes/fixes related to dvc machine and dvc exp run --machine. It is recommended to override the default startup_script to install a more recent Python and to install DVC from source, rather than the .deb package.
    • Overridden startup scripts should end by generating the file /var/log/dvc-machine-init.log (it can be an empty file). This is used by DVC as a signal that the startup script has completed execution (since iterative-machine does not provide a built-in way to do this).
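
When debugging an overridden startup script, it can be useful to check for the signal file by hand, for example from a dvc machine ssh session. The sketch below only tests that the file exists; how DVC itself performs this check is not described here, and startup_finished and the INIT_LOG override are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Path of the signal file named above; overridable only to make the sketch testable.
INIT_LOG="${INIT_LOG:-/var/log/dvc-machine-init.log}"

# Hypothetical helper: succeeds once the signal file exists (an empty file counts).
startup_finished() {
    [ -e "$INIT_LOG" ]
}

if startup_finished; then
    echo "startup script finished"
else
    echo "startup script still running (or it never wrote $INIT_LOG)"
fi
```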

Example .dvc/config:

['machine "aws-test"']
    cloud = aws
    startup_script = ../startup.sh
    setup_script = ../env-setup.sh

Example startup.sh (run at machine boot time):

#!/bin/bash
# Install latest python3.9 + pip from deadsnakes PPA
# NOTE: deadsnakes PPA python requires debian/ubuntu system python3-pip (rather than separate PPA python3.x-pip)
sudo add-apt-repository --yes ppa:deadsnakes/ppa
sudo apt-get update
sudo apt-get install --yes python3.9 python3.9-dev python3.9-venv python3-pip
sudo -u ubuntu python3.9 -m pip install --upgrade pip --user
sudo -u ubuntu python3.9 -m pip install --upgrade setuptools --user

# Install DVC from source
sudo -u ubuntu python3.9 -m pip install "git+https://github.com/iterative/dvc.git#egg=dvc[all]" --user

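# Write signal/log file (the redirection itself must run as root, so use tee rather than "sudo echo ... >")
echo "OK" | sudo tee /var/log/dvc-machine-init.log > /dev/null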

Example env-setup.sh (sourced at exp runtime):

#!/bin/bash
python3.9 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -r src/requirements.txt
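
The distinction between sourcing and executing matters here: a sourced script runs in the current shell, so changes such as an activated virtualenv persist for the commands that follow. A small self-contained illustration, using a temporary stand-in file rather than the real env-setup.sh (the DEMO_ENV variable is made up for this sketch):

```shell
#!/bin/sh
# Write a tiny stand-in for env-setup.sh to a temporary file.
demo=$(mktemp)
echo 'DEMO_ENV=activated' > "$demo"

# Executing the script in a child shell leaves the current shell unchanged.
sh "$demo"
echo "after executing: ${DEMO_ENV:-unset}"

# Sourcing runs it in the current shell, so the variable persists.
. "$demo"
echo "after sourcing: ${DEMO_ENV:-unset}"

rm -f "$demo"
```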

To run on remote machine:

$ dvc machine create aws-test
$ dvc exp run --machine aws-test
$ dvc machine destroy aws-test
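
Because a running instance keeps existing until it is destroyed, one might wrap the three commands so that the machine is destroyed even when the experiment fails. This is a sketch under assumptions: run and run_on_machine are hypothetical helpers, the wrapper defaults to a dry run that only prints the commands, and it handles a failed dvc exp run but not signals such as Ctrl-C.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create the machine, run one experiment, then destroy the machine
# regardless of whether the experiment succeeded.
run_on_machine() {
    machine="$1"
    run dvc machine create "$machine" || return 1
    run dvc exp run --machine "$machine"
    status=$?
    run dvc machine destroy "$machine"
    return "$status"
}

run_on_machine aws-test
```

The function returns the exit status of the experiment, so a calling script can still detect a failed run after cleanup.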

Example: asciicast