Hyperparameter search with cluster computing

This Python package makes it easy to launch lots of jobs with different parameters to be evaluated on a computer cluster. It also has features to monitor the status of running or completed jobs, aggregate their output, and create visualizations.

Dependencies

numpy
scipy
pandas
parse
tqdm

Basic usage

Here is how you can launch an experiment on a Slurm cluster in about 10 lines of code:

import param_search as ps

# define a basic job template and name format
template = '''\
#!/bin/bash
#SBATCH --job-name={job_name}
#SBATCH -o %J.stdout
#SBATCH -e %J.stderr
pwd
product=`python3 -c "print({term1}*{term2})"`
quotient=`python3 -c "print({term1}/{term2})"`

echo "product quotient" > {job_name}.metrics
echo "$product $quotient" >> {job_name}.metrics
'''
name = 'job_{term1}_{term2}'

# define the ranges of parameters to evaluate
param_space = ps.ParamSpace(term1=range(4), term2=range(4))

# submit jobs for each parameter setting
jobs = ps.submit(template, name, param_space, use='slurm')

# check the stdout and stderr
print(jobs.iloc[0].stdout)
print(jobs.iloc[0].stderr)

# read in output metrics
metrics = ps.metrics(jobs)

# plot metrics against parameters
fig = ps.plot(metrics, x=['term1', 'term2'], y=['product', 'quotient'])
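The template above writes a two-line, space-separated metrics file for each job: a header line followed by one line of values. As a rough illustration of how such a file could be read back and combined with the job's parameters, here is a minimal sketch using only the standard library. The `read_metrics` helper is hypothetical and is not the implementation behind `ps.metrics`.

```python
import os
import tempfile

def read_metrics(path):
    # Hypothetical helper: the first line holds space-separated column names,
    # the second line holds the corresponding values.
    with open(path) as f:
        header = f.readline().split()
        values = f.readline().split()
    return {key: float(val) for key, val in zip(header, values)}

# Write a metrics file shaped like the one the template produces, then read it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'job_2_3.metrics')
    with open(path, 'w') as f:
        f.write('product quotient\n6 0.6666666666666666\n')
    # Join the parameters and the metrics into one record.
    row = {'term1': 2, 'term2': 3, **read_metrics(path)}

print(row)
```

One record per job, holding both parameters and metrics, is the shape that makes it straightforward to plot metrics against parameters.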

Job templates and name formats

These are simple Python format strings. When jobs are submitted, the template and the name format are filled in with the parameter settings of each job:

job_hash = hash(job_params)
job_name = name.format(**job_params, hash=job_hash)
job_content = template.format(**job_params, job_name=job_name, hash=job_hash)
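To make the substitution concrete, here is a minimal sketch built on the built-in `str.format`. The `short_hash` and `fill` helpers are hypothetical stand-ins for illustration; in particular, how the package actually computes `hash` is an assumption.

```python
import hashlib

def short_hash(params):
    # Hypothetical helper: a stable digest of the sorted parameter items.
    text = repr(sorted(params.items()))
    return hashlib.md5(text.encode()).hexdigest()[:8]

def fill(template, name, params):
    # Fill the name format first, because the template may refer to {job_name}.
    job_hash = short_hash(params)
    job_name = name.format(**params, hash=job_hash)
    job_content = template.format(**params, job_name=job_name, hash=job_hash)
    return job_name, job_content

template = 'echo "{job_name}: {term1} * {term2}"\n'
name = 'job_{term1}_{term2}'
job_name, job_content = fill(template, name, {'term1': 2, 'term2': 3})
print(job_name)
print(job_content)
```

Because these are ordinary format strings, any literal brace in a template has to be doubled. For example, a shell variable written as `${HOME}` in the final script must appear as `${{HOME}}` in the template.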

Parameter spaces

A parameter space is a set of parameters and the ranges of values they can take on. ParamSpace is a subclass of collections.OrderedDict whose keys are the parameter names and whose values are the ranges of values for each parameter.

param_space = ps.ParamSpace(
	a=range(10),
	b=1e-3,
	c='hello',
	d=[True, False],
)

Note that on creation, any value that is not already a non-string iterable (such as the float and the string above) is wrapped in a singleton list, so that every value supports iteration.
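The promotion rule can be sketched as a small function. This is an illustration of the rule as described here, not the package's own code, and the name `as_value_range` is hypothetical.

```python
def as_value_range(value):
    # Strings are iterable, but here they count as single values.
    if isinstance(value, str):
        return [value]
    try:
        iter(value)
    except TypeError:
        # Not iterable (e.g. a number): wrap it in a singleton list.
        return [value]
    # Already a non-string iterable: keep it unchanged.
    return value

print(as_value_range(1e-3))
print(as_value_range('hello'))
print(as_value_range([True, False]))
```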

A ParamSpace can be iterated over to produce parameter assignments from the Cartesian product of the value ranges, or it can be randomly sampled, with or without replacement. The iterates are Params objects, which have the same keys as the ParamSpace but each value is a single element of the value range.

for p in param_space:
	print(p)

assert len(param_space) == 20
assert len(param_space.sample(5, replace=False)) == 5
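For intuition, the same iteration and sampling behavior can be approximated with the standard library alone. The sketch below uses a plain dict in place of a ParamSpace and plain dicts in place of Params objects. It is an assumption about the behavior, not the package's implementation.

```python
import itertools
import random

# Value ranges matching the example above, already promoted to iterables.
space = {'a': range(10), 'b': [1e-3], 'c': ['hello'], 'd': [True, False]}

def iter_assignments(space):
    # Yield one dict per element of the Cartesian product of the value ranges.
    keys = list(space)
    for values in itertools.product(*space.values()):
        yield dict(zip(keys, values))

assignments = list(iter_assignments(space))

# Sampling without replacement draws distinct assignments.
sample = random.sample(assignments, k=5)

# Sampling with replacement may repeat an assignment.
sample_with_replacement = random.choices(assignments, k=5)

print(len(assignments), len(sample))
```

The total count is the product of the range lengths, here 10 * 1 * 1 * 2 = 20.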

Parameter spaces can also be combined with algebraic operations of addition and multiplication, which allows certain subsets of parameters to be grouped together when iterating or sampling.

param_space_a = ps.ParamSpace(type='a', param=[1,2,3])
param_space_b = ps.ParamSpace(type='b', param=[4,5,6])
param_space_c = ps.ParamSpace(other_param=1.5)

# addition iterates over the param spaces sequentially
#   which requires that the keys be identical
param_space_ab = param_space_a + param_space_b

# scalar multiplication just repeats the parameter ranges
#   which can be useful for balanced sampling of two subspaces
param_space_ab = 10 * param_space_ab

# multiplying two spaces produces their Cartesian product
#   which requires that the keys be disjoint sets
param_space_abc = param_space_ab * param_space_c

assert len(param_space_abc) == 60
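The count of 60 follows from (3 + 3) * 10 * 1. The sketch below mimics the three operations on plain lists of dicts so that the arithmetic can be checked by hand. The function names `expand`, `add`, `repeat`, and `product` are hypothetical and do not belong to the package.

```python
import itertools

def expand(**ranges):
    # Enumerate the Cartesian product of the value ranges as a list of dicts.
    keys = list(ranges)
    return [dict(zip(keys, vals)) for vals in itertools.product(*ranges.values())]

def add(x, y):
    # Addition: iterate over x, then y. The keys must be identical.
    if x and y and set(x[0]) != set(y[0]):
        raise ValueError('keys must be identical')
    return x + y

def repeat(n, x):
    # Scalar multiplication: repeat the assignments n times.
    return x * n

def product(x, y):
    # Multiplication: pair every assignment in x with every one in y.
    # The keys must be disjoint.
    if x and y and not set(x[0]).isdisjoint(y[0]):
        raise ValueError('keys must be disjoint')
    return [{**p, **q} for p in x for q in y]

space_a = expand(type=['a'], param=[1, 2, 3])
space_b = expand(type=['b'], param=[4, 5, 6])
space_c = expand(other_param=[1.5])

space_ab = add(space_a, space_b)                    # 3 + 3 = 6 assignments
space_abc = product(repeat(10, space_ab), space_c)  # 6 * 10 * 1 = 60
print(len(space_abc))
```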

Submitting jobs to a queue

The following queues are supported: LocalQueue, SlurmQueue, and TorqueQueue.

See the Jupyter notebook example.ipynb for a walkthrough.

mattragoza/param_search
