Nextflow implementation of a metagenomics analysis with MetaPhlAn v4.
This workflow runs MetaPhlAn v4 to perform taxonomic profiling and then extracts the most abundant clades. For these clades, it also reports the abundance ranges observed in the microbiomes of healthy individuals, using the curatedMetagenomicData resource. The healthy-reference data currently used are raw data kindly provided by Paolo Manghi and are compatible with MetaPhlAn v4; the workflow also supports MetaPhlAn v3.
Clone the repository:
git clone git@github.com:zhanyinx/metagenomics.git
Download the MetaPhlAn databases from here.
The control databases can be downloaded from: (link missing)
Update the configuration file (nextflow.config) with the paths to the databases:
- bowtie2db: path to the MetaPhlAn databases (downloadable from here)
- control_db: path to the control databases (link missing)
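Instead of editing nextflow.config directly, the two database paths can also be kept in a small separate config file and passed with `-c` (the `yourconfig` slot in the run command). This is a minimal sketch: the file name `paths.config` and both directory paths are placeholders, and only the parameter names `bowtie2db` and `control_db` come from this README.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder locations (assumption): replace with where you unpacked the databases.
METAPHLAN_DB="/data/db/metaphlan"
CONTROL_DB="/data/db/control"

# Write a minimal Nextflow config that only overrides the two database params.
cat > paths.config <<EOF
params {
    bowtie2db  = '${METAPHLAN_DB}'
    control_db = '${CONTROL_DB}'
}
EOF

cat paths.config
```

It can then be passed as `nextflow run path_to/main.nf -c paths.config -profile singularity --input samplesheet.csv --outdir outdir`. Nextflow also accepts pipeline parameters directly on the command line (e.g. `--bowtie2db /data/db/metaphlan`), and those take precedence over values set in config files.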
To run the pipeline:
nextflow run path_to/main.nf -c yourconfig -profile singularity --input samplesheet.csv --outdir outdir
The pipeline takes as input a CSV samplesheet with three columns.
IMPORTANT: the header row is required.
| patient | sample_path | population |
|---|---|---|
| patient1 | full path to fastq.gz file(s) | Europe |
| ... | ... | ... |
sample_path must be an absolute (full) path, not a relative path.
population is the population the individual belongs to. Available options: Europe, Asia, North_America, South_America, Africa, Oceania, all (all populations merged).
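The samplesheet rules above (mandatory header, absolute paths, a fixed set of population labels) can be enforced with a small helper. This is a hypothetical sketch, not part of the repository: the function name `make_samplesheet` is made up, the header `patient,sample_path,population` is assumed from the table column names, and it assumes one `<patient>.fastq.gz` file per patient, all from the same population.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: build a samplesheet from a directory holding one
# <patient>.fastq.gz file per patient, all from the same population.
make_samplesheet() {
    local fastq_dir="$1" population="$2" out_csv="${3:-samplesheet.csv}"

    # Reject labels that are not in the README's list of populations.
    case "$population" in
        Europe|Asia|North_America|South_America|Africa|Oceania|all) ;;
        *) echo "unknown population: $population" >&2; return 1 ;;
    esac

    # The header row is mandatory (column names assumed from the README table).
    echo "patient,sample_path,population" > "$out_csv"

    local fq patient abs_path
    for fq in "$fastq_dir"/*.fastq.gz; do
        [ -e "$fq" ] || continue   # no match: the glob stays literal, skip it
        patient="$(basename "$fq" .fastq.gz)"
        # Resolve to an absolute path, since relative paths are not accepted.
        abs_path="$(cd "$(dirname "$fq")" && pwd)/$(basename "$fq")"
        echo "${patient},${abs_path},${population}" >> "$out_csv"
    done
}
```

Usage: `make_samplesheet ./fastq Europe samplesheet.csv`. Samples from different populations would need separate calls (with the rows merged under a single header) or manual editing of the CSV.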
Output structure:
params.outdir
|-- date
|   `-- patient
|       |-- patient.profiled_metagenome.txt
|       `-- patient.topXX.csv
For each patient, the pipeline outputs two files:
- patient.profiled_metagenome.txt: the full MetaPhlAn taxonomic profile
- patient.topXX.csv: the top XX clades, together with the abundances observed for those clades in healthy individuals from the same population.
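To take a quick look at the most abundant species in a patient.profiled_metagenome.txt file, a short shell function is enough. This is only an inspection sketch: it is not how the pipeline builds patient.topXX.csv (which also adds the healthy-population ranges), and the function name `top_species` is hypothetical. It assumes the standard MetaPhlAn profile layout: `#`-prefixed header lines, the `|`-separated clade name in column 1 and the relative abundance in column 3.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the N most abundant species-level clades
# ("abundance<TAB>clade") from a MetaPhlAn profile.
top_species() {
    local profile="$1" n="${2:-10}"
    # Skip '#' header lines and keep rows whose last rank is s__
    # (this drops higher ranks and MetaPhlAn 4 t__SGB rows).
    awk -F'\t' '!/^#/ && $1 ~ /[|]s__[^|]*$/ { print $3 "\t" $1 }' "$profile" \
        | sort -t$'\t' -k1,1gr \
        | awk -v n="$n" 'NR <= n'   # reads all input, so no SIGPIPE under pipefail
}
```

Usage: `top_species outdir/date/patient/patient.profiled_metagenome.txt 5`.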