Using Farm
Farm is the College of Agriculture and Environmental Sciences computer cluster that runs Ubuntu Linux version 18.04. It's our primary high performance computing resource in the Ross-Ibarra lab.
This document serves as a reference for the most commonly used SLURM commands and workflows. Setup and other topics can be found on the pages listed below:
- Farm Account Set Up: Generating SSH keys.
- Farm Data Transfer: Logging in, moving files (see also Basics of SSH and SCP).
- Farm Software Installation: Compiling software, using Java programs, and using module.
- Farm Interactive Use: Working interactively on Farm.
- Farm Tips and Tricks: Getting around issues like argument limits.
- Farm Productivity Tips: Tips to make working with Farm more productive.
- Farm Emailing Help: How to get help.
Farm has a head node, which controls the cluster, and compute nodes, which are where the action happens. Farm runs a cluster workload management system called Slurm. For the most part, you interact with Farm using scripts to launch jobs on the compute nodes; you don't run processes on the head node and you don't log into the compute nodes directly. The only tasks that are acceptable on the head node are:
- Downloading files (with wget or curl)
- Compiling/building files
- Installing R packages
- Submitting or checking on jobs
The address is username@agri.cse.ucdavis.edu, where username is your UCD Kerberos ID. You will have to have generated an SSH key (see this section) and given the public key part (do not share the private key!) to CSE Help.
Make your life a little easier by adding the following to ~/.ssh/config:
Host farm
HostName agri.cse.ucdavis.edu
User username
Replace username with your username. This will allow you to ssh to Farm with just ssh farm in the future.
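With that in place, connecting is simply:
$ ssh farm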
Slurm is a lot like SGE: you submit jobs via batch scripts. These batch scripts have common headers; we will see one below.
First, we can get a sense of our lovely cluster with sinfo:
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
hi up infinite 4 drain* c8-[25,39-40],c9-35
hi up infinite 64 mix c8-[22-24,30-38,45,54-58,60-61,84-87]
hi up infinite 6 idle c8-[42-44,59],c9-[94,97]
serial up infinite 2 down* c10-[12,40]
serial up infinite 15 mix c10-[8-9,11,13-22,41-42]
serial up infinite 18 idle c10-[10,23-39]
bigmeml up infinite 1 mixed bigmem2
bigmeml up infinite 5 mix bigmem[1,3-6]
bigmemm up infinite 1 mixed bigmem2
bigmemm up infinite 5 mix bigmem[1,3-6]
bigmemh up infinite 1 mixed bigmem2
bigmemh up infinite 5 mix bigmem[1,3-6]
Here we see our bigmems (more on those later), and all of their inferior but still useful friends.
Note that there is a STATE column, which indicates the state of each machine. A better way of looking at what's going on on each machine is with squeue, which shows the job queue.
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1541913 bigmemh some_job someone R 6-04:27:02 1 bigmem6
1613530 bigmemm some_job someone PD 0:00 1 (Resources)
1544863 bigmeml some_job someone R 5-10:59:32 1 bigmem2
1472908 hi some_job someone R 5-22:16:23 1 c8-22
1477386 serial some_job someone R 14-03:19:29 1 c10-11
This shows each job ID (very important), the partition the job is running on, and the name of the person running the job. Also note TIME, which is how long a job has been running.
This queue is very important: it can tell us who is running what where, and how long it's been running. Also, if we realize that we're accidentally doing something silly like mapping maize reads to the human genome, we can use squeue to find the job ID, allowing us to cancel the job with scancel. Let's kill vince251's job eva:
$ scancel 5370
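A couple of other standard Slurm options are handy here (username is a placeholder):
$ squeue -u username     # show only your own jobs
$ scancel -u username    # cancel all of your jobs
$ scancel --name some_job  # cancel jobs by job name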
It's that easy! Slurm is pretty boring so far; all we can do is look at the cluster and try to kill jobs. Let's see how to submit jobs.
We wrap our jobs in little batch scripts, which is nice because these
also help make steps reproducible. We'll see how to write batch
scripts for Slurm in the next section, but suppose we had one written
called steve.sh. To keep your directory organized, I usually keep a scripts/ directory (or even slurm-scripts/ if you have lots of other little scripts).
I like to organize each of my projects in its own directory in a general ~/projects/ directory. In each project directory, I make a directory called slurm-log for Slurm's logs. Tip: use these logs, as they are very helpful in debugging. I separate them from my project because they fill up directories rather quickly.
Let's look at an example batch script header for a job called steve (which is run with the script steve.sh) in a project directory named your-cool-project (you're going to change these parts).
#!/bin/bash -l
#SBATCH -D /home/vince251/projects/your-cool-project/
#SBATCH -o /home/vince251/projects/your-cool-project/slurm-log/steve-stdout-%j.txt
#SBATCH -e /home/vince251/projects/your-cool-project/slurm-log/steve-stderr-%j.txt
#SBATCH -J steve
#SBATCH -t 24:00:00
set -e
set -u
# insert your script here
- -D sets your project directory.
- -o sets where standard output (of your batch script) goes.
- -e sets where standard error (of your batch script) goes.
- -J sets the job name.
- -t sets the time limit for the job; 24:00:00 indicates 24 hours.
Note that the programs in your batch script can redirect their output however they like, which is something you will likely want to do; -o and -e only capture the standard output and standard error of the batch script itself.
Also note that these directories must already exist; Slurm will not create them if they don't. If they don't exist, sbatch will fail silently (since there's no place to write standard error). If you keep trying something and it doesn't log the error, make sure all these directories exist.
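For example, you can create the project and log directories from the header above in one go before the first submission (substitute your own paths):
$ mkdir -p /home/vince251/projects/your-cool-project/slurm-log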
As mentioned, the job name is how you distinguish your jobs in squeue. If we ran this, we'd see "steve" in the NAME column. Note that the %j in the error/out file names will be replaced with the job number, not the name.
The time limit for the job should be greater than the estimated time to complete your job; time-and-a-half or twice as much time as you think it will take are good rules of thumb. If your job reaches this time limit, it will be killed, and it's frustrating to lose a job because you underestimated the time. Alternatively, you can set the limit with the --time flag instead of -t (e.g. --time=1-00:00 sets a time limit of one day).
Try running this test script:
#!/bin/bash -l
#SBATCH -D /home/USERNAME
#SBATCH -J bob
#SBATCH -o /home/USERNAME/out-%A.%a.txt
#SBATCH -e /home/USERNAME/error-%A.%a.txt
#SBATCH -t 24:00:00
#SBATCH --array=0-8
bob=( 1 1 1 2 2 2 3 3 3 )
sue=( 1 2 3 1 2 3 1 2 3 )
block=${bob[$SLURM_ARRAY_TASK_ID]}
min=${sue[$SLURM_ARRAY_TASK_ID]}
echo "$block is $min"
Make sure you substitute your user name for USERNAME. You should see a bunch of files named "error" and "out" show up in your home directory. %A will become the job number, and %a the index of the particular iteration of the array. Try launching this using sbatch -p bigmem and sbatch -p med to make sure you have access to both queues. More info on array jobs can be found below.
Array jobs can also process a fixed set of files. The list of files can be streamed into sed, which grabs the appropriate line (given by the array task ID) and passes it to the pipeline. It's wise to use sort (with -s for a stable sort) to ensure your input order does not change. For example:
find data/aln -name "*.bam" | xargs -n1 -I{} basename {} .bam | \
sort -s | sed -n "$SLURM_ARRAY_TASK_ID"p | \
xargs -n1 -I{} samtools sort -@ 10 -m2G data/aln/{}.bam $ODIR/{}.sorted
Do not use this if you have spaces in your file names. We could use -print0, but that would interfere with other commands.
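Here is a minimal sketch of how the pipeline above could be wrapped in an array batch script, assuming 24 BAM files in data/aln and an existing data/sorted output directory (the paths, job name, array bounds, and resource numbers are placeholders to adapt):
#!/bin/bash -l
#SBATCH -D /home/USERNAME/projects/your-cool-project/
#SBATCH -o /home/USERNAME/projects/your-cool-project/slurm-log/sort-stdout-%A_%a.txt
#SBATCH -e /home/USERNAME/projects/your-cool-project/slurm-log/sort-stderr-%A_%a.txt
#SBATCH -J sort-bams
#SBATCH -t 12:00:00
#SBATCH --ntasks=10          # enough CPU for samtools sort's 10 threads (see the section on tasks below)
#SBATCH --array=1-24         # one task per BAM file; match this to your file count
set -e
set -u

ODIR=data/sorted             # output directory; must already exist

# grab the Nth BAM name (N = this array task's ID) from a stably sorted list, then sort it
find data/aln -name "*.bam" | xargs -n1 -I{} basename {} .bam | \
    sort -s | sed -n "$SLURM_ARRAY_TASK_ID"p | \
    xargs -n1 -I{} samtools sort -@ 10 -m2G data/aln/{}.bam $ODIR/{}.sorted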
We share the cluster. There is a queue established with multiple users submitting jobs, which means SLURM will often allocate resources to your jobs only when nodes in your desired partition are open and not occupied by other users' jobs. In order to submit a job (covered in the next section), you must specify which partition your job(s) will run on. Based on your memory and CPU requirements, you can choose from the following partition options:
- bigmeml (low): Your job might be killed at any time (but then restarted). Great for soaking up unused cycles with short jobs; a particularly good fit for large array jobs with short run times.
- bigmemm (medium): Your jobs might be suspended, but will resume when a high priority job finishes. Good for long jobs that need lots of CPU, but NOT recommended for MPI jobs. Up to 100% of idle resources can be used.
- bigmemh (high, or "hell yeah, I want this to run"): Your job will kill/suspend lower priority jobs. High priority means your jobs will keep the allocated hardware until they finish or there's a system or power failure. Limited to the number of CPUs your group contributed. Please check with Jeff or ask others before running anything using more than 16 CPU on bigmemh, as we only have 128 slots on this queue.
- med: These are the older parallel nodes. For many jobs not needing high memory, the parallel queue is still the way to go. There are many hundreds of CPU there, but we only have access to them at medium priority. Note that each node in this queue has only 24 CPU and 32G of RAM, so plan your scripts accordingly.
- high2: These are the nodes Jeff paid for. Users in Jeff's group can use up to 512GB RAM and 128 CPUs.
- med2: A "fair" share of idle nodes; jobs might be suspended. Priority = 20, which lets you get 2/8 of the idle nodes if contended, and up to 100% of the 8 nodes if not.
- low2: A "fair" share of idle nodes; jobs might be killed and rescheduled. Priority = 20, which lets you get 2/8 of the idle nodes if contended, and up to 100% of the 8 nodes if not.
To use these, we specify the partition either with sbatch or in the batch script itself. To do so with sbatch, do:
$ sbatch -p bigmemh steve.sh
Or, we can do so in the batch script itself by adding a line:
#SBATCH --partition=bigmemh
You can also submit jobs to a specific node in bigmem (say, bigmem9) by including -p bigmemh --nodelist=bigmem9 in your sbatch command line.
We submit our batch scripts with sbatch. For example, we can submit the steve.sh job (assuming it's in a scripts/ directory):
$ sbatch -p bigmemm -t 2:00:00 scripts/steve.sh
The -p and -t flags are both necessary. Most people will want to submit to the bigmemm partition, indicated by -p bigmemm. You must also set a time limit, after which your job will be automatically terminated, using -t; here we have chosen two hours by indicating hours, minutes, and seconds (-t 2:00:00). Alternatively, you can use the --time flag, which works similarly. It is good to pick a time beyond what you think you need (e.g. twice as much time as you think the job needs).
It's that easy! After submitting jobs, check with squeue that they're still running (and didn't immediately fail due to a syntax error, a program not being in your $PATH, or a module not being loaded). If you don't see a steve job in squeue, then it's time to debug. Use the standard output and standard error files in the slurm-log/ directory to figure out what happened. I use ls -lrt (ls with reverse time sort) to see the most recent Slurm logs, i.e.:
$ ls -lrt slurm-log/
-rw-rw-r-- 1 vince251 vince251 0 Sep 25 11:07 dwgeval-stderr-5370.txt
drwxrwxr-x 2 vince251 vince251 12288 Sep 25 11:07 .
-rw-rw-r-- 1 vince251 vince251 47 Sep 25 11:34 dwgeval-stderr-5368.txt
Hey cool, that's the name of the last Slurm job (but double-check against the job ID that sbatch gives you when you run it). Then look at this file in less or something.
If you're using a process that requires more than one CPU/thread/process, you'll need to allocate more tasks. It's very important that you do this; otherwise the cluster manager won't know how much CPU your program is using, and will allocate resources thinking it has more available than it does. This really gums up the works, so don't do it. To specify the number of tasks (roughly, the number of processes), use sbatch --ntasks=x, where x is the number.
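For example, if steve.sh runs a program that uses 8 threads, a submission might look like this (the partition and time limit here are just placeholders):
$ sbatch -p med -t 4:00:00 --ntasks=8 scripts/steve.sh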
Farm assigns each CPU/task 8GB of memory by default on all bigmem queues, and 2.6GB of memory per CPU/task on the parallel queue. In order to suspend jobs on the bigmemm and bigmeml queues, Farm has changed how memory is requested: now, even if you need only 1 CPU and 128GB of memory, 16 CPUs will be counted against the lab's allotment. Warn others accordingly if you will be monopolizing lab resources!
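As a sketch of what this looks like in practice (using sbatch's standard --mem flag, which is not otherwise covered here; the numbers are placeholders), a single-CPU job requesting 128GB on bigmemm would, under the 8GB-per-CPU accounting above, count as 16 CPUs against the lab's allotment:
$ sbatch -p bigmemm -t 24:00:00 --ntasks=1 --mem=128G scripts/steve.sh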
Slurm supports job arrays, which make it easy to submit many jobs of a similar type at once. The basic syntax is:
$ sbatch --array=<start>-<end>
You can also specify this in the script you are submitting via sbatch:
#!/bin/bash -l
#SBATCH --array=0-50
This will submit all 51 jobs simultaneously, and they will run as resources become available. You might not want this if you're submitting on bigmemh, for example. Luckily, you can limit how many jobs of an array are running simultaneously. Do this with:
--array=0-50%10
This allows 10 jobs to run at the same time.
Slurm also comes with some built-in variables for managing file names. SLURM_ARRAY_JOB_ID will be set to the first job ID of the array, and SLURM_ARRAY_TASK_ID will be set to the index of the current job within the array. You can use these in log file names by inserting %A or %a into the #SBATCH -o and -e lines of your script; they will be replaced with SLURM_ARRAY_JOB_ID and SLURM_ARRAY_TASK_ID respectively. For example, slurm-%A_%a.out will become slurm-7358_1.out. You can also use these variables to index arrays in your script, or to keep track of output file names. For example, if I used an array for all 10 maize chromosomes, --array=1-10, I could write to a file within my script with:
outfile.chr${SLURM_ARRAY_TASK_ID}.txt
More information is available here and an example script is above.
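A minimal sketch of that per-chromosome pattern follows; my_analysis is a hypothetical command standing in for whatever you actually run, and the paths are placeholders:
#!/bin/bash -l
#SBATCH -D /home/USERNAME/projects/your-cool-project/
#SBATCH -o /home/USERNAME/projects/your-cool-project/slurm-log/chrom-stdout-%A_%a.txt
#SBATCH -e /home/USERNAME/projects/your-cool-project/slurm-log/chrom-stderr-%A_%a.txt
#SBATCH -J by-chrom
#SBATCH -t 2:00:00
#SBATCH --array=1-10         # one task per maize chromosome
set -e
set -u

CHR=$SLURM_ARRAY_TASK_ID
# my_analysis is hypothetical; replace it with your actual per-chromosome command
my_analysis --chromosome $CHR > outfile.chr${CHR}.txt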
You can monitor jobs a few ways:
- squeue: see if they are still running.
- Watching your files grow.
- Advanced: ssh to a node and use top, but do not run anything on the nodes this way. Every time you do this, Bill, Jeff, or I will have to ruthlessly strangle a young kitten. So if you're not sure what this section means, save the kittens and don't ssh to the nodes.
As an extension of monitoring stuff, you should also monitor the amount of space your data and files are taking up.
Good cluster practice involves realizing that many things on the cluster are shared, including storage. Currently, there is ~80 Tb of space for the entire lab. That sounds like a lot, but after data and outputs are generated from the many jobs you are bound to run, the cluster will begin to feel a little too clustered. (As an example, fastq files from 8 maize individuals sequenced to 30X will already take close to 1 Tb of space if not compressed.)
Use du -sh to check the amount of storage being used in any given directory: just cd to the directory of your choice and run that command. Check directory sizes regularly.
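For example (the project path here is just a placeholder):
$ cd ~/projects/your-cool-project
$ du -sh .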
Please limit your home directory (~/, which equals /home/yourusernamehere/) to 1 Tb of disk space.
If you need more space, we have access to several shared drives, including /group/jrigrp2, /group/jrigrp4, /group/jrigrp7, and several other numbers.
If you are starting a project that you anticipate will take over 500 Gb of space, check the shared drives (e.g. df -h /group/jrigrp7) to find one with enough space, and make a new directory for your project there. Let others know on Slack (#farm_issues) how much space you anticipate using.
Use df -h to check the size of a drive, how much of its storage is being used, how much space you have left, and where that drive is mounted.
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 854G 638G 173G 79% /
none 4.0K 0 4.0K 0% /sys/fs/cgroup
udev 32G 4.0K 32G 1% /dev
tmpfs 6.3G 1.4M 6.3G 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 32G 0 32G 0% /run/shm
none 100M 0 100M 0% /run/user
nas-8-0:/export/1/sbhadral 11T 7.9T 2.3T 78% /home/sbhadral
Each node has a /scratch/ directory, which you can use to write intermediate files. But as this space is shared across all Farm users, you need to remove your files when you are finished. Usually, you can do this with rm within your submission script, but sometimes files can be left behind, for example if your script encounters errors before erasing them.
If this happens in an array job, you might not know which nodes to clean up. First, find all nodes that might be affected with sinfo, then loop through the nodelist to ssh in and remove the files:
for i in `scontrol show hostname c8-[62-64,67-77,87-89,91-96]`; do echo $i; ssh $i rm -rf /scratch/mstitzer/*; done
In addition, we have a group scratch directory, /group/jriscratch. It is RAID0, so there is no protection or backup. The idea is to use it as a scratch drive when running something super I/O intensive; it should be many times faster than writing to local scratch. Please note it's only 2 Tb for the whole lab, so you can NOT store anything there; it is just for scratch purposes. It's probably a good idea to mention in the #farm_issues channel on Slack if you plan to use it, so it doesn't fill up.
When transferring files to/from a server, scp is a nice command for small data, as those transfers are quick and do not require babysitting (enter man scp on the command line for more details). With larger files, transfers can become problematic as they take quite some time, and no one wants to sit and watch files transfer. Solution: screen and rsync.
screen allows you to enter an overlay of your terminal. The screen terminal environment is nearly identical to your previous command line, but you can now simply "detach" the screen and have it run in the background. There is no risk of connection drops, and you can shut down your computer and still have your tasks running. To get back to your screen, just "reattach" and continue (refer to man screen for more details).
Examples:
Enter screen:
$ screen
To detach: press "ctrl + a" then "d". This will return you to your regular command-line.
To reattach:
$ screen -r
This will recover your screen immediately if you only have 1 running. If you have multiple screens running, this will return a list of those screens and their respective IDs in this format [pid.tty.host].
With multiple sessions to choose from, reattach with the following.
$ screen -D -r [4137.pts-65.farm]
(NOTE: check up on your active screen sessions with screen -list.)
To terminate a screen, enter the respective screen and enter exit.
(NOTE: From personal experience, when you enter a screen, it is possible to enter a screen within that screen, and a screen within that screen, etc. You can get lost in screen limbo. If you have watched the film Inception (2010), you will understand. Just terminate each subscreen with exit; you will know you're out when you see [screen is terminating] at the top of your window.)
Now that you know how screen works, you can use it in conjunction with rsync.
rsync copies/transfers files quickly and reliably, with lots of options and versatility (refer to man rsync). After entering screen, rsync the files of your choosing from their location to the target destination on Farm.
Example:
$ screen
$ rsync -avz <my files> farm:/home/sbhadral/Projects/Rice_project/fixed_fastqs/
Detach the screen and check up on your transfer whenever you like.
(NOTE: You cannot transfer from server to server remotely; you need to be logged into one of the servers, e.g. rilab@169.237.206.32:/home/blah/.. to farm:/home/sbhadral/.. )
Often, we need to work with R interactively on a server. To do this, we use srun with the following options:
$ srun -p your-partition -t 2:00:00 --pty R
This will drop you into an interactive R session on the partition specified by -p. --pty launches srun in pseudoterminal mode, which makes R behave as it would on your local machine. With srun you still have to set a time limit.
If you want to render R plots to your local computer through X11, here is a solution:
$ srunx your-partition
Make sure your X11 is open before typing the command.
Using ssh -Y with X11 may seem like a good idea, but this does not specify a partition to work interactively in, so you will end up running things on the head node. Bad idea.
Do not run anything on the headnode except cluster management tools (squeue, sbatch, etc.), compilation tasks (but usually ask CSE Help for big apps), or downloading files. If you run anything on
the headnode, you will disgrace your lab. How embarrassing is this?
Imagine if you had to give your QE dressed up like
Richard Simmons. It's that
embarrassing.
Back up your stuff, as Farm does not back up your stuff. Using git is a good option.
Monitor your disk space, as it can fill up quickly.