A tool to facilitate creating a bunch of compute jobs
If you use either PBS or SLURM and you want to run a series of jobs that are very similar but differ in 1 or more parameter values or input files, this tool could help.
Install using pip install create_jobs
. Click here to view this on the Python Package Index site. If you don't have pip
on your computer, then download Anaconda. You don't need root permissions. If you're on Linux, the download will give you a bash
script that you just run using something like bash Anaconda2-2.4.0-Linux-x86_64.sh
. This will give you a python distribution that includes pip
.
If you're in the Riggleman lab and using our research cluster (rrlogin
), then I already have the Anaconda install script ready to go. You just need to run bash /opt/share/lindsb/Anaconda3-4.4.0-Linux-x86_64.sh
. Make sure to let it add the line to alter your PATH
in .bashrc
, then source your .bashrc
file with source .bashrc
. Then you can type pip install create_jobs
.
This tool operates on the concept of input files, a job submission file, and a table of parameters for each job. The table is just a space-separated text file with columns representing variables and rows representing simulations.
Here's a sample workflow. Let's say I have some simulation code that reads a file called bcp.input
, which specifies a nanorod length, and other input parameters. I want to generate 3 folders, each of which will run a simulation with a different nanorod length. My bcp.input
file looks something like this:
1000 # Number of iterations
60 # Polymer length
1 # Nanorod radius
{NRLENGTH} # Nanorod length
Note that I put a variable NRLENGTH
inside brackets {}
. You can have as many variables like this in as many input files as you want as well as your submission script. My submission script, sub.sh
, looks something like this:
#!/bin/sh
#PBS -N {JOB_NAME}
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00,mem=2gb
cd $PBS_O_WORKDIR
# Run code that happens to look for bcp.input in the current directory
mpirun $HOME/code/awesome_code.exe
I put the variable JOB_NAME
in brackets here too, so that each simulation will have a unique name when examining your jobs with something like qstat
or squeue
. (Note: JOB_NAME
is a special variable. More on that in the next section.) Finally, I need to make a file that contains a space-separated table where each row represents a simulation and each column represents a variable. I'll call it trials.txt
. Here's what it looks like:
JOB_NAME NRLENGTH
lnr-4 4
lnr-5 5
lnr-6 6
With those files in place, I just need to call create_jobs
and pass those files as arguments. The command looks like this:
$ create_jobs -i bcp.input -s sub.sh -t trials.txt
Copying files to ./lnr-4 and replacing vars
submitting ./lnr-4/sub.sh
3719921.rrlogin.internal
Copying files to ./lnr-5 and replacing vars
submitting ./lnr-5/sub.sh
3719922.rrlogin.internal
Copying files to ./lnr-6 and replacing vars
submitting ./lnr-6/sub.sh
3719923.rrlogin.internal
The JOB_NAME
variable is actually a special variable that determines the name of the simulation folder. It is generated automatically if not provided in the table (trials.txt
in this case). If it's not provided, the JOB_NAME
column will be internally created by copying the first column it sees with unique values for each row. So in this case, we could simply use the following for trials.txt
:
NRLENGTH
4
5
6
and the simulation folders would just be 4
, 5
, and 6
. {JOB_NAME}
in sub.sh
would also be replaced with either 4
, 5
, or 6
in each simulation folder.
The default submission script name and parameter table file names looked for with this tool are sub.sh
and trials.txt
, respectively. So if you happen to be using those file names for your templates, you can simply type create_jobs -i bcp.input
and skip the -s
and -t
flags.
- You can put as many input files as you want after the
-i
flag. - If you want to generate all the folders and files but not submit the jobs as a dry-run, you can add the
-n
flag to your command. - This tool can be used to help generate some more complex workflows as well, like including changing file names when they're copied like including variables within input file names. I haven't added documentation for those capabilities, but email me or raise an issue if you'd like to know how to use them.