Nextflow pipeline for analysis of Nanopore Whole Genome Sequencing

dhslab/nextflow-nanopore

Nanopore WGS analysis pipeline

nextflow-nanopore is a Nextflow pipeline for analysis of Nanopore Whole Genome Sequencing data.

Pipeline summary

  1. Basecalling (Guppy) - with GPU run option
  2. Basecalling QC (PycoQC)
  3. Alignment (Guppy with minimap2)
  4. Merge all aligned bam files into a single file (samtools)
  5. Haplotyping and phased variant calling (PEPPER-Margin-DeepVariant)
  6. Depth calculation (mosdepth)
  7. Aggregated QC report (MultiQC) for basecalling (PycoQC) and depth (mosdepth)

Usage

Input files:

  1. Raw fast5 reads: the full path to the directory containing all fast5 files, given either in a configuration file or as a command line parameter (--input path/to/fast5).
  2. Reference genome: the path to the reference genome fasta file, given either in a configuration file or as a command line parameter (--genome_fasta path/to/genome.fasta).
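
As a minimal sketch of the command line form, the two parameters above could be passed as shown here. The helper name build_run_cmd is hypothetical and only prints the command rather than running it; the --input and --genome_fasta parameters, the repository name and "-r main" are taken from this README. On the RIS cluster the command would still need "-profile ris" and the bsub wrapper shown in the next section.

```shell
# Sketch: assemble the nextflow command for a fast5 directory and a reference fasta.
# build_run_cmd is a hypothetical helper; it prints the command, it does not run it.
build_run_cmd() {
    local fast5_dir="$1"
    local genome_fasta="$2"

    # The README asks for a full (absolute) path to the fast5 directory.
    case "$fast5_dir" in
        /*) ;;
        *)
            echo "error: fast5 directory must be a full path: $fast5_dir" >&2
            return 1
            ;;
    esac
    if [ ! -d "$fast5_dir" ]; then
        echo "error: fast5 directory not found: $fast5_dir" >&2
        return 1
    fi
    if [ ! -f "$genome_fasta" ]; then
        echo "error: reference fasta not found: $genome_fasta" >&2
        return 1
    fi

    printf 'nextflow run dhslab/nextflow-nanopore -r main --input %s --genome_fasta %s\n' \
        "$fast5_dir" "$genome_fasta"
}
```

For example, build_run_cmd /path/to/fast5 /path/to/genome.fasta prints the command only if both inputs exist, which catches path mistakes before a cluster job is submitted.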

Running the pipeline on an LSF cluster (configured for the WashU RIS cluster environment)

Example: Test run on the RIS cluster

  • This test run takes as input:
    • a few ".fast5" files (6 files, ~3.5 GB)
    • chr22.fasta as the reference genome

1) Running directly from GitHub:

LSF_DOCKER_VOLUMES="/storage1/fs1/dspencer/Active:/storage1/fs1/dspencer/Active $HOME:$HOME" bsub -g /dspencer/nextflow -G compute-dspencer -q dspencer -e nextflow_launcher.err -o nextflow_launcher.log -We 2:00 -n 2 -M 12GB -R "select[mem>=16000] span[hosts=1] rusage[mem=16000]" -a "docker(mdivr/centos:v0.1)" "NXF_HOME=${PWD}/.nextflow ; nextflow run dhslab/nextflow-nanopore -r main -profile ris,dhslab_test"

2) Alternatively, clone the repository and run the pipeline from the local directory:

git clone https://github.com/dhslab/nextflow-nanopore.git
cd nextflow-nanopore/
LSF_DOCKER_VOLUMES="/storage1/fs1/dspencer/Active:/storage1/fs1/dspencer/Active $HOME:$HOME" bsub -g /dspencer/nextflow -G compute-dspencer -q dspencer -e nextflow_launcher.err -o nextflow_launcher.log -We 2:00 -n 2 -M 12GB -R "select[mem>=16000] span[hosts=1] rusage[mem=16000]" -a "docker(mdivr/centos:v0.1)" "NXF_HOME=${PWD}/.nextflow ; nextflow run main.nf -profile ris,dhslab_test"

Note:

When running the pipeline from a local clone, instead of:

nextflow run main.nf -profile ris,dhslab_test

you can run:

nextflow run main.nf -profile ris -c conf/dhslab_test.config

The two commands above are interchangeable: the dhslab_test profile (defined in the nextflow.config file) simply includes the conf/dhslab_test.config file, adding its settings to the pipeline configuration.

In both cases "-profile ris" is still required, because it defines the LSF runtime configuration.

  • Output:

    • "results/" contains the output of the test run
    • "work/" is the working directory for all tasks; it can be removed once the pipeline has completed successfully
  • Example results output for sample "aml476081" in the test workflow

results/
├── aligned_bams
│   ├── aml476081.bam
│   └── aml476081.bam.bai
├── basecall
│   └── fastq
│       └── aml476081.fastq.gz
├── multiqc
│   ├── multiqc_data
│   │   ├── mosdepth_cov_dist.txt
│   │   ├── mosdepth_cumcov_dist.txt
│   │   ├── mosdepth_perchrom.txt
│   │   ├── multiqc.log
│   │   ├── multiqc_citations.txt
│   │   ├── multiqc_data.json
│   │   ├── multiqc_general_stats.txt
│   │   ├── multiqc_sources.txt
│   │   └── pycoqc.txt
│   └── multiqc_report.html
├── pepper
│   ├── haplotagged_bam
│   │   ├── aml476081.haplotagged.bam
│   │   └── aml476081.haplotagged.bam.bai
│   └── vcf
│       ├── aml476081.phased.vcf.gz
│       ├── aml476081.phased.vcf.gz.tbi
│       ├── aml476081.vcf.gz
│       └── aml476081.vcf.gz.tbi
└── pipeline_info
    ├── pipeline_report.html
    └── pipeline_timeline.html
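
As a rough sketch, a finished run could be checked against the tree above with a small shell function. The name check_results is hypothetical, and the list of relative paths is copied from the example tree (only the per-sample files and the MultiQC report), so it assumes other samples follow the same layout.

```shell
# Sketch: report which of the expected output files are missing for one sample.
# check_results is a hypothetical helper; the relative paths come from the tree above.
check_results() {
    local results_dir="$1"
    local sample="$2"
    local missing=0
    local rel

    for rel in \
        "aligned_bams/${sample}.bam" \
        "aligned_bams/${sample}.bam.bai" \
        "basecall/fastq/${sample}.fastq.gz" \
        "multiqc/multiqc_report.html" \
        "pepper/haplotagged_bam/${sample}.haplotagged.bam" \
        "pepper/haplotagged_bam/${sample}.haplotagged.bam.bai" \
        "pepper/vcf/${sample}.phased.vcf.gz" \
        "pepper/vcf/${sample}.phased.vcf.gz.tbi" \
        "pepper/vcf/${sample}.vcf.gz" \
        "pepper/vcf/${sample}.vcf.gz.tbi"
    do
        if [ ! -e "${results_dir}/${rel}" ]; then
            echo "missing: ${rel}"
            missing=$((missing + 1))
        fi
    done

    # Return a non-zero status if any expected file is missing.
    [ "$missing" -eq 0 ]
}
```

For the test run this would be called as check_results results aml476081. Running it before deleting "work/" gives a quick confirmation that the run produced the expected files.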
