Why?
The cascading matrix
Install
Hello world
Environments
Variants
Blacklisting and whitelisting
Preparing and restoring
Functions
Rebooting
Timeouts
Fast iterations with reuse
Debugging
Repeating tasks
Passwords and usernames
Including, excluding, and renaming files
Selecting which tasks to run
Disabling unless manually selected
Fetching artifacts
LXD backend
QEMU backend
Google backend
Linode backend
AdHoc backend
More on parallelism
Repacking and delta uploads
Because our integration test machinery was unreasonably frustrating. It was slow, very unstable, hard to make sense of the output, impossible to debug, hard to write tests for, hard to run on multiple environments, and parallelism was not a thing.
Spread came out as a plesant way to fix that. A few simple and concrete concepts that are fun to play with and fix the exact piece missing in the puzzle. It's not Jenkins, it's not Travis, it's not a library, not a language, and it's not even specific to testing. It's a simple way to express what to run and where, what to do before and after it runs, and how to duplicate jobs with minor variations without copy & paste.
Each individual job in Spread has a:
-
Project - There's a single one of those. This is your source code repository or whatever the top-level thing is that you want to run tasks for.
-
Backend - A few lines expressing how to obtain machines for tasks to run on. Add as many of these as you want, as long as it's one of the supported backend types (bonus points for contributing a new backend type - it's rather easy).
-
System - This is the name and version of the operating system that the task will run on. Each backend specifies a set of systems, and optionally the number of machines per system (more parallelism for the impatient).
-
Suite - A couple more lines and you have a group of tasks which can share some settings. Defining this and all the above is done in
spread.yaml
or.spread.yaml
in your project base directory. -
Task - What to run, effectively. All tasks for a suite live under the same directory. One directory per suite, one directory per task, and
task.yaml
inside the latter. -
Variants - All of the above is nice, but this is the reason of much rejoice. Variants are a mechanism that replicates tasks with minor variations with no copy & paste and no trouble. See below for details.
Again, each job in spread has a single one of each of these. If you have two systems and one task, there will be two jobs running in parallel on each of the two systems. If you have two systems and three tasks, you have six jobs, three in parallel with three. See where this is going? You can easily blacklist specific cases too, but this is the basic idea.
Any time you want to see how your matrix looks like and all the jobs that would
run, use the -list
option. It will show one entry per line in the format:
backend:system:suite/task:variant
Please install spread
using Go install method:
go install github.com/snapcore/spread/cmd/spread@latest
Two tiny files and you are in business:
$PROJECT/spread.yaml
project: hello-world
backends:
lxd:
systems: [ubuntu-16.04]
suites:
examples/:
summary: Simple examples
path: /remote/path
$PROJECT/examples/hello/task.yaml
summary: Greet the planet
execute: |
echo "Hello world!"
exit 1
This example uses the LXD backend on the local sytem and thus requires Ubuntu 16.04 or later. If you want to distribute the tasks over to a remote system, try the Linode backend with:
$PROJECT/spread.yaml
backends:
linode:
key: $(HOST:echo $LINODE_API_KEY)
systems: [ubuntu-16.04]
Then run the example with $ spread
from anywhere inside your project tree
for instant gratification. The echo will happen on the remote machine and
system specified, and you'll see the output locally since the task failed
(-vv
to see the output nevertheless).
The default output for the example should look similar to this:
2016/06/06 09:59:34 Allocating server lxd:ubuntu-16.04...
2016/06/06 09:59:55 Waiting for LXD container spread-1-ubuntu-16-04 to have an address...
2016/06/06 09:59:59 Allocated lxd:ubuntu-16.04 (spread-1-ubuntu-16-04).
2016/06/06 09:59:59 Connecting to lxd:ubuntu-16.04 (spread-1-ubuntu-16-04)...
2016/06/06 10:00:04 Connected to lxd:ubuntu-16.04 (spread-1-ubuntu-16-04).
2016/06/06 10:00:04 Sending data to lxd:ubuntu-16.04 (spread-1-ubuntu-16-04)...
2016/06/06 10:00:05 Error executing lxd:ubuntu-16.04:examples/hello:
-----
+ echo Hello world!
Hello world!
+ exit 1
-----
2016/06/06 10:00:05 Discarding lxd:ubuntu-16.04 (spread-1-ubuntu-16-04)...
2016/06/06 10:00:06 Successful tasks: 0
2016/06/06 10:00:06 Aborted tasks: 0
2016/06/06 10:00:06 Failed tasks: 1
- lxd:ubuntu-16.04:examples/hello
Pretty much everything in Spread can be customized with environment variables.
$PROJECT/examples/hello/task.yaml
summary: Greet the planet
environment:
SUBJECT: world
GREETING: Hello $SUBJECT!
execute: |
echo "$GREETING"
exit 1
The values defined for those variables are evaluated at the remote system and
may contain references to other variables as well as commands using the typical
$(...)
shell syntax. As a special case, executing such commands locally at
the host running Spread is also possible via the syntax $(HOST:...)
. This is
handy to feed local details such as API keys out of files or local environment
variables as was done on the Linode example.
Common variables and defaults are possible by defining them in the suite or earlier:
$PROJECT/spread.yaml
(...)
suites:
examples/:
summary: Simple examples
environment:
SUBJECT: sanity
The cascading happens in the following order:
- Project => Backend => System => Suite => Task
All of these can have an equivalent environment field and their variables will be ordered accordingly on executed scripts.
The cascading logic explained is nice, but a great deal of the convenience offered by Spread comes from variants.
If you understood how the environment cascading takes place, watch this:
$PROJECT/spread.yaml
(...)
suites:
examples/:
summary: Simple examples
environment:
SUBJECT/foo: sanity
SUBJECT/bar: lunacy
Now every task under this suite will run twice1 - once for each variant key defined.
Then, let's redefine the examples/hello task like this:
$PROJECT/examples/hello/task.yaml
summary: Greet the planet
environment:
GREETING: Hello
GREETING/bar: Goodbye
SUBJECT/baz: world
execute: |
echo "$GREETING $SUBJECT!"
exit 1
This task under that suite will spawn three independent jobs, producing the following outputs:
- Variant "foo": Hello sanity!
- Variant "bar": Goodbye lunacy!
- Variant "baz": Hello world!
Some key takeaways here:
- Each variant key produces a single job per task.
- It's okay to declare the same variable with and without a variant suffix. The bare one becomes the default.
- The variant key suffix may be comma-separated for multiple definitions at
once (
SUBJECT/foo,bar
).
1 Actually, times two. It's an N-dimensional matrix.
The described cascading and multiplication semantics of that matrix offers plenty of comfort for reproducing the same tasks with minor or major variations, but the real world is full of edge cases. Often things will be mostly smooth except for that one case that doesn't quite make sense on such and such situations.
For fine tuning, Spread has a convenient mechanism for blacklisting or whitelisting particular cases across most axis of the matrix.
For example, let's avoid lunacy altogether by blacklisting the bar variant out of the particular task:
variants:
- -bar
Alternatively, let's reach the same effect by explicitly stating which variants
to use for the task (no -
or +
prefix):
variants:
- foo
- baz
Finally, we can also append another key to current set of variants without replacing the existing ones:
variants:
- +buz
We've been talking about tasks, but this same field works at the project, backend, suite, and task levels. This is also not just about variants either. The following fields may be defined with equivalent add/remove or replace semantics:
backends: [...]
(suite and task)systems: [...]
(suite and task)variants: [...]
(project, backend, suite, and task)
So what if you don't want to run a specific task or whole suite on ubuntu-14.04? Just add this to the task or suite body:
systems: [-ubuntu-14.04]
Cascading also takes place for these settings - each level can add/remove/replace what the previous level defined, again with the ordering:
- Project => Backend => System => Suite => Task
A similar group of tasks will often depend on a similar setup of the system. Instead of copying & pasting logic, suites can define scripts for tasks under them to execute before running, and also scripts that will restore the system to its original state so that follow up logic will find a (hopefuly? :) unmodified base:
$PROJECT/spread.yaml
(...)
suites:
examples/:
summary: Simple examples
prepare: |
echo Preparing...
restore: |
echo Restoring...
The prepare script is called once before any of the tasks inside the suite are run, and the restore script is called once after all of the tasks inside the suite finish running.
Note that the restore script is called even if the prepare or execute scripts failed at any point while running, and it's supposed to do the right job of cleaning up the system even then for follow up logic to find a pristine state. If the restore script itself fails to execute, the whole system is considered broken and follow up jobs will be aborted. If the restore script does a bad job silently, you may instead lose your sleep over curious issues.
By now you may already be getting used to this, but the prepare
and restore
fields are not in fact exclusive of suites. They are available at the project,
backend, suite, and task levels. In addition to those, the project, backend, and
suite levels may also hold prepare-each
and restore-each
fields, which are
run before and after each task executes.
Assuming two tasks available under one suite, one task under another suite, and no failures, this is the ordering of execution:
project prepare
backend1 prepare
suite1 prepare
project prepare-each
backend1 prepare-each
suite1 prepare-each
task1 prepare; task1 execute; task1 restore
suite1 restore-each
backend1 restore-each
project restore-each
project prepare-each
backend1 prepare-each
suite1 prepare-each
task2 prepare; task2 execute; task2 restore
suite restore-each
backend1 restore-each
project restore-each
suite1 restore
suite2 prepare
project prepare-each
backend1 prepare-each
suite2 prepare-each
task3 prepare; task3 execute; task3 restore
suite2 restore-each
backend2 restore-each
project restore-each
suite2 restore
backend1 restore
project restore
Typically only a few of those script slots will be used.
In addition to preparing and restoring scripts, debug
and debug-each
scripts may also be defined in the same places. These are only run when other
scripts fail, and their purpose is to display further information which might
be helpful when trying to understand what went wrong.
A few helper functions are available for scripts to use:
- REBOOT - Reboot the system. See below for details.
- MATCH - Run
grep -q -e
on stdin. Without match, print error including content. - NOMATCH - Assert no match on stdin. If match found, print error including content.
- ERROR - Fail script with provided error message only instead of script trace.
- FATAL - Similar to ERROR, but prevents retries. Specific to adhoc backend.
- ADDRESS - Set allocated system address. Specific to adhoc backend.
Scripts can reboot the system at any point by simply running the REBOOT
function at the exact point the reboot should happen. The system will
then reboot and the same script will be re-executed with the
$SPREAD_REBOOT
environment variable set to the number of times the
script has rebooted the system.
$PROJECT/examples/hello/task.yaml
execute: |
if [ $SPREAD_REBOOT = 0 ]; then
echo "Before reboot"
REBOOT
fi
echo "After reboot"
Alternatively the REBOOT function may also be called with a single
parameter which will be used as the value of $SPREAD_REBOOT
after
the system reboots, instead of the count.
Every 5 minutes a warning will be issued including the operation output since the last warning. If the operation does not finish within 15 minutes, it is killed and considered an error per the usual rules of whatever is being run. For example, a killed task is considered failed, but a killed restore script will render the whole system broken (see Preparing and restoring).
These timings may be tweaked at the project, backend, suite, and task level,
by defining the warn-timeout
and kill-timeout
fields with a value such as
30s
, 1m30s
, 10m
, or 1.5h
. A value of -1
means disable the timeout
altogether.
For fast iterations during development or debugging, it's best to keep the
servers around so they're not allocated and discarded on every run. To do
that just provide the -reuse
flag. On any successful allocation the server
details are immediately written down for tracking. Then, just provide -reuse
again on follow up runs and servers already alive will be used whenever
possible, and new ones will be allocated and also tracked down if necessary to
perform follow up operations.
Without the -resend
flag, the project files previously sent are also left
alone and reused on the next run. That said, the spread.yaml
and task.yaml
content considered is actually the one in the local machine, so any updates to
those will always be taken in account on re-runs.
Once you're done with the servers, throw them away with -discard
. Reused
systems will remain running for as long as desired by default, which may run
the pool out of machines. With Linode you may define the
halt-timeout
option to allow Spread itself to shutdown those systems and use
them, without destroying the data.
The obvious caveat when reusing machines like this is that failing restore scripts or bogus ones may leave the server in a bad state which affects the next run improperly. In such cases the restore scripts should be fixed to be correct and more resilient.
Debugging such remote tasking systems is generally quite boring, and Spread
offers a good hand to make the problem less painful. Just add -debug
to
whatever set of options is in use and it will stop and open a shell at the
exact point you get a failure, with the exact same environment as the script
had. Exit the shell and the process continues, until the next failure, no
matter which backend or system it was in.
Similarly, -shell
, -shell-before
, and -shell-after
allow running an
interactive shell instead of, before, and after the original task script,
respectively, for every job that was selected to run. You'll most probably want
to filter down what is being run when using that mode, to avoid having a
troubling sequence of shells opened.
If you'd prefer to debug by logging in from an independent ssh session, the
-abend
option will abruptly stop the execution on failures, without running
any of the restore scripts. You'll probably want to pair that with the -reuse
option so the server is not discarded, and after you're done with the debugging
in this style it may be necessary to do a run with the -restore
flag, to
clean up the state left behind by the task.
Besides manual debugging through those flags, it's often handy to have more
details displayed once something does break. Next to
(preparing and restoring)[#preparing] scripts, Spread supports
specifying debug
scripts that are run in trace mode and have their output
reported when a failure happens:
$PROJECT/examples/hello/task.yaml
execute: |
echo "Something went wrong."
exit 1
debug: |
dmesg | tail
In a similar way to prepare and restore scripts, these can also be defined
as a debug-each
script at the project, backend, and suite levels, so they
are aggregated and repeated for every task under them.
The order of tasks on every run is random by default, so that it becomes visible when the correctness of some tasks depends on unspecified side effects of prior tasks.
When breakages related to ordering occur, Spread can attempt to reproduce the
ordering used via the -seed
parameter. On every run, the required seed to
reproduce the order utilized will be logged in the output. Note that when
several workers are being used, they will steal pending work from a common
queue based on timing, which means the order may not be exactly the same.
In some cases, it may also be useful to explicitly prioritize some tasks. For example, if there are two workers and one long task, it's best if that known long task starts first, so that the workers can share more of the load. If the long task comes last the two workers will share all the smaller tasks, then one worker will pick the long task, and the other worker will stop since there's nothing else to do. The outcome is a longer total run time.
To define the priority of a task, suite, system, or backend, simply specify the priority field in the desired context:
priority: 100
The larger the priority, the earlier it will be scheduled. The default priority is zero, and negative priorities are supported too.
Reproducing an error may be a very boring experience, and Spread has a way to simplify that process by reexecuting the tasks as many times as desired until the task fails.
To do that there is an option -repeat
which receives an integer indicating
the number of reexecutions to do, being 0 the default value.
To keep things simple and convenient, Spread prepares systems to connect over SSH
as the root user using a single password for all systems. Unless explicitly defined
via the -pass
command line option, the password will be random and different on
each run.
Some of the supported backends may be unable to provide an image with the correct password in place, or with the correct SSH configuration for root to connect. In those cases, the system "username" and "password" fields may be used to tell Spread how to perform the SSH connection:
$PROJECT/spread.yaml
backends:
qemu:
systems:
- debian-sid:
password: mypassword
- ubuntu-16.04:
username: ubuntu
password: ubuntu
If the password field is defined without a username, it specifies the password for root to connect over SSH. If both username and password are provided, the credentials will be used to connect to the system, and password-less sudo must be available for the provided user.
In all cases the end result is the same: a system that executes scripts as root.
The following fields define what is pushed to the remote servers after each server is allocated and where that content is placed:
$PROJECT/spread.yaml
(...)
path: /remote/path
include:
- src/*
exclude:
- src/*.o
rename:
- s,path/one,path/two,
The path
option must be provided to define the base directory where the
content will live under, while include
defaults to a single entry with *
which causes everything inside the project directory to be sent over.
The remote tree will usually look the same as the local one, but that
may be changed using the rename
field. It takes a list of regular
expressions that act on the full relative name of each entry, and also
on the target of symbolic links. To avoid touching the target of symbolic
links append the S
modifier as a suffix of the expression.
Note that Spread will still expect tasks to live in the same directories as they do locally, so these directories cannot be moved.
Often times you'll want to iterate over a single task or a few of these, or a given suite, or perhaps select a specific backend to run on instead of doing them all.
For that you can pass additional arguments indicating what to run:
$ spread my-suite/task-one my-suite/task-two
These arguments are matched against the Spread job name which uniquely identifies it, looking like this:
1. lxd:ubuntu-16.04:mysuite/task-one:variant-a
2. lxd:ubuntu-16.04:mysuite/task-two:variant-b
The provided parameter must match one of the name components exactly,
optionally making use of the ...
wildcard for that. As an exception, the task
name may be matched partially as long as the slash is present as a prefix or
suffix. Matching multiple components at once is also possible separating them
with a colon; they don't have to be consecutive as long as the ordering is
correct.
For example, assuming the two jobs above, these parameters would all match at least one of them:
- lxd
- lxd:mysuite/
- ubuntu-16.04
- ubuntu-...
- mysuite/
- /task-one
- /task...
- mysu...one
- lxd:ubuntu-16.04:variant-a
The -list
option is useful to see what jobs would be selected by a given
filter without actually running them.
It may be useful to have a task written down as part of the suite without it
being run all the time together with the usual tasks. For that, just add a
manual: true
field, and it will only be run when explicitly selected. This is
equivalent to disabing the task, except it may still be run when manually
selected.
For example:
$PROJECT/examples/manually-run/task.yaml
summary: This task only runs manually.
manual: true
...
The logic for explicit selection is the following: if the provided arguments match any non-manual tasks at all, the manual tasks are not run, even if they match the arguments.
Besides tasks, the same logic works for backends, systems, and suites.
Just add the manual
field to their definition and they will only run
when explicitly selected, following the same logic described above: if
the provided arguments match any non-manual suite, matching manual
suites won't run, and so on.
Note that it's fine to have manual and non-manual tasks inside a manual
suite, and so forth. Play with -list
to get a clear idea of what is
selected to run.
Content generated by tasks may easily be retrieved after the task completes
by registering the desired content under the artifacts
field:
$PROJECT/examples/hello/task.yaml
summary: Generate some useful content.
artifacts:
- some/file
- some/dir/
...
The provided directory or file paths are relative to the task directory, and
they are only considered when Spread is run with the -artifacts
flag pointing
to the target directory where content will be downloaded into.
For example, consider the following command:
$ spread -artifacts=./artifacts lxd:ubuntu-16.04:mysuite/task-one:variant-a
Assuming the given task has residual content registered, the directory
./artifacts/lxd:ubuntu-16.04:mysuite/task-one:variant-a
would be created to
hold it after the job is executed.
Residual content is fetched whether the job finishes successfully or not, and even if some of the provided paths are missing.
The LXD backend depends on the LXD container hypervisor available on Ubuntu 16.04 or later, and allows you to run tasks using the local system alone.
Setup LXD there with:
sudo apt update
sudo apt install lxd
sudo lxd init
Then, make sure your local user has access to the lxc
client tool. If you
can run lxc list
without errors, you're good to go. If not, you'll probably
have to logout and login again, or manually change your group with:
$ newgrp lxd
Then, setting up the backend in your project file is as trivial as:
backends:
lxd:
systems:
- ubuntu-16.04
System names are mapped into LXD images the following way:
- ubuntu-16.04 => ubuntu:16.04
- debian-sid => images:debian/sid/amd64
- fedora-8 => images:fedora/8/amd64
- etc
Alternatively they may also be provided explicitly as:
backends:
lxd:
systems:
- ubuntu-16.04:
image: ubuntu:16.04.1
That's it. Have fun with your self-contained multi-system task runner.
The QEMU backend depends on the QEMU emulator available from various sources and allows you to run tasks using the local system alone even if those tasks depend on low-level features not avaliable under LXD.
Setting up the QEMU backend looks similar to:
$PROJECT/spread.yaml
backends:
qemu:
systems:
- ubuntu-16.04:
username: ubuntu
password: ubuntu
For this example to work, a QEMU image must be made available under
~/.spread/qemu/ubuntu-16.04.img
, and when run this image must open
an SSH daemon on port 22 using the provided credentials.
During the initial setup, spread will enable root access over SSH, and
will set its password to the current global password in use for the
running session as usual for every other backend (random by default,
see the -pass
command line option).
The QEMU backend is run with the -nographic
option by default. This
may be changed with export SPREAD_QEMU_GUI=1
.
Note that at the moment QEMU is run via the kvm
script, which enables
the KVM performance optimizations for the local architecture. This will
not work for other architectures, though. This problem may be easily
addressed in the future when use cases show up.
As a hint if you are using Ubuntu, here is an easy way to get a suitable QEMU image:
sudo apt install qemu-kvm autopkgtest
adt-buildvm-ubuntu-cloud
When done move the downloaded image into the location described above.
The Google backend is easy to setup and use, and allows distributing your tasks to remote infrastructure in Google Compute Engine (GCE).
$PROJECT/spread.yaml
(...)
backends:
google:
key: $(HOST:echo $GOOGLE_JSON_FILENAME)
location: yourproject/southamerica-east1-a
systems:
- ubuntu-16.04
# Extended syntax:
- another-system:
image: some-other-image
workers: 3
With these settings the Google backend in Spread will pick credentials from
the JSON file pointed to in $GOOGLE_JSON_FILENAME
environment variable
(we don't want that content inside spread.yaml
itself). If no key is
explicitly provided, Spread will attempt to use the "application default"
credentials as traditional in the Google platform. You can set those up by
using either a service account:
$ gcloud auth application-default activate-service-account --key-file=$GOOGLE_JSON_FILENAME
or your own credentials:
$ gcloud auth application-default login
Service accounts are best as they can be further constrained and not be associated with your overall authenticated access. Do not ship your own credentials to remote systems.
Images are located by first attempting to match the provided value exactly
against the image name, and then some processing is done to verify if an
image with the individual tokens in its description exists. Images are
first searched for in the project itself, and then if the prefix is a
recognized name for which a public image project exists (e.g. ubuntu-*
is searched for in the ubuntu-os-cloud
project too). An explicit image
project may also be requested by prefixing the image name with a project
name, as in "ubuntu-os-cloud/ubuntu-16.04-64".
When these machines terminate running, they will be removed. If anything happens that prevents the immediate removal, they will remain in the account and need to be removed by hand.
For long term use, a dedicated project in the Google Cloud Platform is recommended to prevent automated manipulation of important machines.
The Linode backend is very simple to setup and use as well, and allows distributing your tasks over into remote infrastructure runing in Linode's data centers.
$PROJECT/spread.yaml
(...)
backends:
linode:
key: $(HOST:echo $LINODE_API_KEY)
systems:
- ubuntu-16.04
With these settings the Linode backend in Spread will pick the API key from
the local $LINODE_API_KEY
environment variable (we don't want that in
spread.yaml
), and look for a powered-off server available on that user
account that. When it finds one, it creates a brand new configuration and
disk set to run the tasks. That means you can even reuse existing servers to
run the tasks, if you wish. When discarding the server, assuming no -reuse
or
-debug
, it will power off the server and remove the created configuration
and disks, leaving it ready for the next run.
The root disk is built out of a Linode-supported distribution or a custom image available in the user account. The system name is mapped into an image or distribution label the following way:
- ubuntu-16.04 => Ubuntu 16.04 LTS
- debian-8 => Debian 8
- arch-2015-08 => Arch Linux 2015.08
- etc
Images have user-defined labels, so they're also searched for using the Spread system name itself.
Alternatively, the extended system syntax may be used to define these details:
(...)
backends:
linode:
key: (...)
systems:
- ubuntu-16.04:
image: Ubuntu 16.04
kernel: GRUB 2
The image
value is matched case-insensitively as a prefix of one of the
Linode-supported distributions or a custom
image available in the user account. The kernel
value is
similarly matched against the available kernels.
Both fields are optional. Image defaults to the behavior based on system name described above, and the kernel defaults to the latest recommended Linode kernel.
Reused systems will remain running for as long as desired by default,
which may run the pool out of machines. Define the halt-timeout
option to allow
Spread itself to shutdown those systems and use them, without destroying the data:
$PROJECT/spread.yaml
backends:
linode:
key: (...)
halt-timeout: 6h
systems:
- ubuntu-16.04
The Linode backend can also allocate systems dynamically. For that, just define these two fields specifying which plan you'd like to use for the new machines, and which datacenter to allocate them on:
backends:
linode:
key: (...)
plan: 4GB
location: newark
When these machines terminate running, they will be removed. If anything happens that prevents the immediate removal, they will remain in the account and then be reused by follow up runs and removed when done, effectively garbage collecting what's left behind. System reuse works as explained above too.
Note that in Linode you can create additional users inside your own account that have limited access to a selection of servers only, and with limited permissions on them. You should use this even if your account is entirely dedicated to Spread, because it allows you to constrain what the key in use is allowed to do on your account. Note that you'll need to login with the sub-user to obtain the proper key.
Some links to make your life easier:
The AdHoc backend allows scripting the procedure for allocating and deallocating systems directly in the body of the backend:
$PROJECT/spread.yaml
backends:
adhoc:
allocate: |
echo "Allocating $SPREAD_SYSTEM..."
ADDRESS disposable.machine.address:22
discard:
echo "Discarding $SPREAD_SYSTEM..."
systems:
- ubuntu-16.04
The AdHoc scripts have the following custom commands available:
- ADDRESS addr[:port] - Inform SSH address of machine allocated.
- ERROR message - Exit with error message. Operation may be retried.
- FATAL message - Exit with fatal message. Operation won't be retried.
A failing script (non-zero exit) is equivalent to calling ERROR, but rather than displaying a nice message, the whole script trace and output will be shown.
The following environment variables are available for the scripts to do their job:
- SPREAD_BACKEND - Name of current backend.
- SPREAD_SYSTEM - Name of the system being allocated.
- SPREAD_PASSWORD - Password root will use to connect to the allocated system. Not available if the system has a custom username or password defined.
- SPREAD_SYSTEM_USERNAME - Username Spread will connect as for initial system setup.
- SPREAD_SYSTEM_PASSWORD - Password Spread will connect as for initial system setup.
- SPREAD_SYSTEM_ADDRESS - Address of the allocated system. Only available for discard.
The system allocated by the allocate script must return a system that Spread can connect to over SSH. The system must be either setup to be accessible as root using the session password (random or specified with -pass), or be accessible with the username and password details specified under the system name (see passwords and usernames).
Note that the system returned by adhoc, although it can point to anything accessible over SSH, is supposed to be a disposable system oriented towards running the specified tasks only. It's atypical and dangerous for Spread to be run against important systems, as it will fiddle with their configuration.
The systems
entry under each backend contains a list of systems that will be
allocated on that backend for running tasks concurrently.
Consider these settings:
$PROJECT/spread.yaml
(...)
backends:
linode:
systems:
- ubuntu-14.04
- ubuntu-16.04:
workers: 2
This will cause three different machines to be allocated for running tasks concurrently: one running Ubuntu 14.04 and two 16.04.
Systems share a single job pool generated out of the variable matrix, and will run through it observing the constraints specified. For example, if there is a backend with one ubuntu-16.04 and one ubuntu-16.10 system, and there's one suite with 100 tasks, there will be 200 jobs and each system will run exactly 100 tasks because the 200 jobs were generated precisely so that both systems could be exercised. On the other hand, if that same backend has instead two ubuntu-16.04 systems, there will be only 100 jobs matching the 100 tasks, and each system will run approximately half of them each, assuming similar task execution duration.
Spread can also take multiple backends of the same type. In that case the backend name will not match the backend type and thus the latter must be provided explicitly:
backends:
linode-a:
type: linode
(...)
linode-b:
type: linode
(...)
This is generally not necessary, but may be useful when fine-tuning control over the use of sets of remote machines.
When working over slow networks even small uploads tend to take a bit too long. Spread offers a general "repacking" mechanism that may be used to transform the data being delivered in arbitrary ways, even into a delta of content that the remote servers can obtain by themselves, for example.
The repack
script is run with file descriptors 3 and 4 used as pipes for
the specified project content into and out of the script,
respectively, in tar format. In other words, the original specified
project content may be read from file descriptor 3, and the new project
content must be writen into file descriptor 4.
To illustrate, the following repack script will preserve content unchanged:
$PROJECT/spread.yaml
repack: |
cat <&3 >&4
As a more complex example, the following setup explores that feature to ship a delta from a GitHub repository, computed on top of a tag or commit reference that is knowingly part of the history for all clients:
environment:
DELTA_REF: v1.23
rename:
- s,^,$DELTA_REF,S
exclude:
- .git
repack: |
trap "rm -f delta-ref.tar current.delta" EXIT
git archive -o delta-ref.tar --format=tar --prefix=$DELTA_PREFIX $DELTA_REF
xdelta3 -s delta-ref.tar <&3 > current.delta
tar c current.delta >&4
prepare: |
apt-get install xdelta3
curl -s -o - https://codeload.github.com/myrepo/myproject/tar.gz/$DELTA_REF | gunzip > delta-ref.tar
xdelta3 -d -s delta-ref.tar current.delta | tar x --strip-components=1
rm -f delta-ref.tar current.delta
The rename
and exclude
settings used above ensure that the tarball that goes
into repack
looks like the one offered by GitHub.