This repo contains various analysis pipelines for the lab. Here are the basic rules:
- each folder contains the pipelines for a particular combination of analysis and data type
- pipelines are Nextflow workflows
- each pipeline comes with conda environment files that provide the required software
Pipelines will usually operate from a top-level project directory structured in the following way:
[project root]
├─ [pipeline].nf
├─ data
│  ├─ raw
│  │  ├─ sample1_R1.fastq.gz
│  │  ├─ sample1_R2.fastq.gz
│  │  └─ ...
│  ├─ figures
│  │  ├─ fig1.png
│  │  └─ ...
│  └─ ...
└─ refs
   ├─ eggnog
   └─ kraken2
The initial raw data lives in data/raw, and all analysis artifacts should be written into data/ as well. Figures go into data/figures/.
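To scaffold this layout for a new project, something like the following works (directory names as in the tree above):
mkdir -p data/raw data/figures refs/eggnog refs/kraken2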
The first step is to copy or symlink the pipeline files into the top-level project directory.
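For example, assuming the repo is checked out at /path/to/pipelines (the same placeholder path used further below) and you want the metagenomics pipeline (an illustrative folder name):
ln -s /path/to/pipelines/metagenomics/* .
After that you can set up a conda environment that includes all software for the pipeline (see the individual pipelines for variations on this):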
conda env create -f conda.yml
Either activate the environment (usually named after the pipeline):
conda activate metagenomics
or run the pipeline with the -with-conda /my/envs/metagenomics option (required for HPC).
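For example, combined with the run command used later in this README:
nextflow run [WORKFLOW].nf -with-conda /my/envs/metagenomics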
You may also create a Nextflow config, either in the project directory as nextflow.config or in your user HOME as ~/.nextflow/config. A template config is included in this repo. If you are a lab member, please use the optimized version from the wiki.
To install it as a global configuration, first create the config directory on the server:
mkdir -p ~/.nextflow
After that, edit and copy the config:
cp /path/to/pipelines/nextflow.config ~/.nextflow/config
Add your token if you want to use Nextflow Tower to track your pipeline runs.
For SLURM, substitute the placeholder partition name default with your cluster's SLURM partition.
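For orientation, the relevant parts of the config might look like the sketch below; the actual template in the repo (or the wiki version for lab members) is authoritative, and the token and partition values are placeholders:
tower {
  enabled = true
  accessToken = 'YOUR_TOWER_TOKEN'  // placeholder token
}
profiles {
  slurm {
    process.executor = 'slurm'
    process.queue = 'default'  // replace with your SLURM partition
  }
}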
After setup you can test the pipeline with:
nextflow run [WORKFLOW].nf -profile local -resume
By default this will use up to 12 CPUs and 128 GB of RAM unless specified otherwise in your personal Nextflow config.
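If your machine has fewer resources, you can cap the defaults in your personal config; a minimal sketch (the values below are examples only):
process {
  cpus = 4
  memory = '16 GB'
}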