Description of the metagenomic pipeline we use to analyse 16S and 18S amplicon NGS data.
At the Microbial Genomics Laboratory we have developed a simplified pipeline to analyse NGS sequences coming from the Illumina platform. The aim of this pipeline is to provide a very simple, highly standardised workflow to quickly process many samples. To achieve this, we have fixed a set of parameters that is enough to get a good result. If you want to change any of these parameters, you will have to edit the code; since it is written in bash, this is quite straightforward.
Not all scripts have been uploaded to GitHub yet, but they will be soon.
The pipeline has four simple bash scripts that have to be run in the following order:
- pair-end_cleaner to unzip, clean, assemble, and convert Illumina paired-end FASTQ files in all subdirectories for 16S amplicon data (V3, V4 and V3-V4 regions).
- chimera_detector to detect and eliminate chimeric sequences based on their similarity with a selected database.
- mg_classifier, a very fast script that taxonomically classifies sequences.
- metagenomic_reporter, a script that collects reports and log files from the previous three scripts and produces a comprehensive report.
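The run order above can be sketched as a small wrapper that stops at the first failing step. This is only an illustration: `run_steps` is a hypothetical helper, and it assumes that each script is on your PATH under the name used in this README and needs no arguments. Check the header of each script for its real usage before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: run a list of commands in order and stop at the first failure.
# Assumption: the pipeline scripts are on PATH and take no arguments;
# run_steps is a hypothetical helper, not part of the pipeline itself.

run_steps() {
    local step
    for step in "$@"; do
        echo ">> running $step"
        if ! "$step"; then
            echo "!! $step failed, stopping" >&2
            return 1
        fi
    done
}
```

Under those assumptions, the whole pipeline would be started with `run_steps pair-end_cleaner chimera_detector mg_classifier metagenomic_reporter`. Stopping on the first error avoids classifying sequences that were never cleaned or chimera-checked.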
Each script describes the dependencies it needs, but basically all of them are already included in any Ubuntu distribution. You will also need the following programs and data, so please install them first:
- CUTADAPT v1.13
- PEAR v0.9.8
- VSEARCH v2.5.0
- Several databases
The preformatted databases must be placed in a directory named /opt/mg_pipeline/databases/
More info about the databases here.
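Before running the pipeline, a quick check like the following can confirm that the requirements listed above are in place. It is a minimal sketch: the function names are hypothetical, it assumes the programs are installed under the executable names `cutadapt`, `pear` and `vsearch`, and it does not verify the version numbers.

```shell
#!/usr/bin/env bash
# Sketch: check that the external programs and the database directory exist.
# Assumptions: executable names cutadapt, pear and vsearch; versions are not
# checked. All function names here are hypothetical.

DB_DIR="${DB_DIR:-/opt/mg_pipeline/databases}"

# Print each program from the argument list that is not found on PATH.
missing_programs() {
    local prog
    for prog in "$@"; do
        command -v "$prog" >/dev/null 2>&1 || echo "$prog"
    done
}

# Succeed only if the given directory exists and is not empty.
has_databases() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Report everything that is missing; return non-zero if anything is.
check_prerequisites() {
    local status=0 missing
    missing="$(missing_programs cutadapt pear vsearch)"
    if [ -n "$missing" ]; then
        echo "Missing programs:" $missing >&2
        status=1
    fi
    if ! has_databases "$DB_DIR"; then
        echo "No databases found in $DB_DIR" >&2
        status=1
    fi
    return "$status"
}
```

After sourcing the snippet, `check_prerequisites && echo "ready"` prints `ready` only when all three programs are found and the database directory is not empty.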