Comprehensive Analysis of Mycoplasma Pneumoniae
CAMPneu is a Nextflow bioinformatics pipeline that is reproducible, scalable, and suitable for a wide range of computational environments. While extensible, early versions of CAMPneu target Illumina paired-end sequence data, with the following objectives:
- Determining if the specimen belongs to the M. pneumoniae species
- Classification of the subtype (type1 or type2) of M. pneumoniae
- Identification of known macrolide-resistance SNPs present in the sample
System Requirements:
CAMPneu requires systems to have the following installed/available:
- Conda
- Singularity
- Nextflow (to be used with the singularity profile)
CAMPneu is designed to work with both Conda and Singularity containers, offering flexibility and reproducibility across computational environments.
CONDA:
Conda excels at managing dependencies and creating isolated environments. Conda is also easy to use across different operating systems and is ideal for setting up reproducible environments on local machines.
- Installation using Conda
conda create -n campneu -c bioconda -c conda-forge -c appliedbinf campneu
conda activate campneu
- Run command
CAMPneu.nf --input <fastq_reads_dir> --output <output_dir> -profile conda
- Help message
CAMPneu.nf --help
SINGULARITY:
Singularity ensures consistency and portability across systems and is tailored for high-performance computing (HPC) environments offering enhanced efficiency.
The Conda-installed version of CAMPneu can also be run with Singularity. Users without access to Conda can instead clone the Git repository (Nextflow is required for this approach).
- Git installation
git clone https://github.com/appliedbinf/CAMPneu.git
- Run the program from the project repository:
nextflow run CAMPneu.nf --input <fastq_reads_dir> --output <output_dir> -profile singularity
- Help message
nextflow run CAMPneu.nf --help
Script Input Requirements
Required arguments:
--input Path to the Paired Fastq Reads directory
--output Directory where process outputs are saved
Optional arguments:
--snpFile Path to the custom SNP bed file
--help Print this message and exit
- Kraken2 Taxonomic Classification: Classifies input sequences based on a pre-built database.
- Quality Control with Fastp: Profiles and filters reads to ensure high-quality data.
- Coverage Assessment with Samtools: Calculates mean depth to evaluate sequencing coverage.
- De Novo Assembly with Unicycler: Reconstructs microbial genomes without a reference.
- ANI Calculation: Determines the best match by comparing the assembled genomes to reference genomes.
- Alignment with Minimap2: Aligns reads to the best-matched reference genome.
- Variant Calling with FreeBayes: Identifies SNPs and genetic variations against a type 1 reference.
- Macrolide-Resistant SNP Identification: Detects SNPs associated with macrolide resistance.
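To make the first step more concrete, here is a minimal Python sketch of how the percentage of reads assigned to M. pneumoniae could be read from a Kraken2 report. It is not CAMPneu's own code. It assumes the standard six-column, tab-separated Kraken2 report (percentage, clade reads, taxon reads, rank code, NCBI taxonomy ID, name) and uses taxonomy ID 2104 for M. pneumoniae; confirm the ID against the database you use. The function name `percent_assigned` and the sample report text are hypothetical.

```python
def percent_assigned(report_lines, taxid):
    """Return the percentage of reads in the clade rooted at `taxid`.

    Assumes the standard six-column Kraken2 report layout; returns 0.0
    when the taxon does not appear in the report.
    """
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        pct, _clade_reads, _taxon_reads, _rank, line_taxid, _name = fields[:6]
        if line_taxid.strip() == str(taxid):
            return float(pct)
    return 0.0


# Illustrative report text (made-up numbers, not real pipeline output).
example_report = (
    "  1.50\t150\t150\tU\t0\tunclassified\n"
    " 98.50\t9850\t9850\tS\t2104\t      Mycoplasma pneumoniae\n"
)

mp_percent = percent_assigned(example_report.splitlines(), 2104)
print(mp_percent)
```

The returned value can then be compared against whatever cutoff the run uses for species confirmation.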
The pipeline sets specific thresholds for input paired reads/samples. Any reads or samples that do not meet these thresholds are marked as failed.
- Kraken2 percentage of reads assigned to M. pneumoniae > 90%
- Average Q score > 30
- Coverage > 10x
- ANI to reference > 95%
- SNP call quality > 100; Depth > 10
- Illumina paired-end sequences
- 23SsnpAnalysis.py: Python script for VCF manipulation and analysis
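As a rough illustration of how the QC thresholds listed above might be applied to a single sample, the following Python sketch checks each metric and reports which checks failed. It is a sketch under stated assumptions, not the pipeline's implementation: the `SampleMetrics` class, its field names, and the check labels are hypothetical, and the strict `>` comparisons simply follow the thresholds as written.

```python
from dataclasses import dataclass


@dataclass
class SampleMetrics:
    # Hypothetical container; field names are not taken from CAMPneu.
    kraken_percent: float  # % of reads assigned to M. pneumoniae
    mean_q: float          # average Q score after fastp
    mean_depth: float      # mean depth of coverage (x)
    ani: float             # ANI to the best reference (%)


def failed_checks(m):
    """Return the names of the sample-level checks that `m` fails."""
    checks = {
        "kraken2": m.kraken_percent > 90,
        "qscore": m.mean_q > 30,
        "coverage": m.mean_depth > 10,
        "ani": m.ani > 95,
    }
    return [name for name, ok in checks.items() if not ok]


def snp_passes(quality, depth):
    """Per-variant filter: call quality > 100 and depth > 10."""
    return quality > 100 and depth > 10


good = SampleMetrics(kraken_percent=98.5, mean_q=35.2, mean_depth=42.0, ani=99.1)
shallow = SampleMetrics(kraken_percent=98.5, mean_q=35.2, mean_depth=8.0, ani=99.1)

print(failed_checks(good))     # no failed checks
print(failed_checks(shallow))  # fails the coverage check
```

Returning the list of failed checks, rather than a single pass/fail flag, keeps the reason for a failure visible in any report built from it.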
The script generates an output directory for each process, containing the files that process produced:
- Kraken: kraken reports and kraken summaries for all the paired end reads
- fastp: fastp reports and quality filtered paired end reads
- Coverage_check: samtools coverage report and coverage filtered paired end reads
- assembly: assembled fasta of the QC filtered samples and empty fasta of the failed samples
- fastANI: fastANI report
- bestReference: fastANI report with only the subtyped reference for the sample
- Sample_reports: Reports for each sample summarizing QC and type information
- Summary: Report for the entire run summarizing which samples passed or failed QC and the SNPs identified for macrolide resistance
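For readers who want to inspect the coverage reports themselves, the sketch below shows one way to pull the mean depth per reference sequence out of `samtools coverage` output in Python. It assumes the report is the unmodified default tabular output of `samtools coverage`, whose header includes the `#rname` and `meandepth` columns; the exact file names and layout inside the Coverage_check directory are not specified here, and the function name `mean_depths`, the contig name, and the sample values are hypothetical.

```python
import csv
import io


def mean_depths(coverage_text):
    """Map each reference name to its mean depth.

    Expects the default tab-separated `samtools coverage` table,
    including its header line.
    """
    reader = csv.DictReader(io.StringIO(coverage_text), delimiter="\t")
    return {row["#rname"]: float(row["meandepth"]) for row in reader}


# Illustrative table (made-up values, not real pipeline output).
example_coverage = (
    "#rname\tstartpos\tendpos\tnumreads\tcovbases\tcoverage\t"
    "meandepth\tmeanbaseq\tmeanmapq\n"
    "ref_contig\t1\t1000\t250\t990\t99.0\t37.5\t36.2\t60\n"
)

depths = mean_depths(example_coverage)
print(depths)
```

To use it on a real report, read the file's contents into a string and pass that string to `mean_depths`.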