wgsnano

Whole Genome Sequencing by Nanopore data analysis

Introduction

nf-core-wgsnano is a bioinformatics best-practice analysis pipeline for Nanopore Whole Genome Sequencing.

The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers, which makes installation trivial and results highly reproducible.

Pipeline summary

Basecalling (Guppy) - with GPU run option
Basecalling QC (PycoQC)
Alignment (Guppy with minimap2)
Merge all aligned bam files into a single file (samtools)
Haplotyping and phased variant calling (PEPPER-Margin-DeepVariant)
Methylation calls extraction from bam to bed files (modbam2bed)
Depth calculation (mosdepth)
MultiQC (MultiQC) for Basecalling (PycoQC) and Depth (mosdepth)

Quick Start

Install Nextflow (>=22.10.1)
Install any of Docker, Singularity (you can follow this tutorial), Podman, Shifter or Charliecloud for full pipeline reproducibility (this pipeline can NOT be run with conda). This step is not needed when running the pipeline on the WashU RIS cluster.

Download the pipeline and test it on a minimal dataset with a single command:

nextflow run dhslab/nf-core-wgsnano -profile test,YOURPROFILE(S) --outdir <OUTDIR>

Start running your own analysis!

nextflow run dhslab/nf-core-wgsnano --input samplesheet.csv --fasta <FASTA> -profile <docker/singularity/podman/shifter/charliecloud/institute> --outdir <OUTDIR>

Usage

Required parameters:

Input samplesheet.csv, which provides directory paths for the fast5 raw reads and their metadata. This can be provided either in a configuration file or as the --input path/to/samplesheet.csv command line parameter. An example sheet is located in assets/samplesheet.csv.
Reference genome fasta file, either in a configuration file or as --fasta path/to/genome.fasta command line parameter.
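
When both required parameters are supplied through a configuration file, a minimal sketch could look like the following. The file name params.config and both paths are placeholders chosen for illustration, not files shipped with the pipeline; the params scope and the -c option are standard Nextflow.

```shell
# Write a small Nextflow config holding the two required parameters.
# params.config and both paths are placeholders; replace them with real ones.
cat > params.config <<'EOF'
params {
    input = 'path/to/samplesheet.csv'
    fasta = 'path/to/genome.fasta'
}
EOF

# The file is then passed to Nextflow with -c, for example:
#   nextflow run dhslab/nf-core-wgsnano -c params.config -profile <PROFILE> --outdir <OUTDIR>
```

Values given on the command line (--input, --fasta) take precedence over the same keys in a config file.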

Nanopore and runtime default parameters:

The following parameters are set to the default values shown, but should be modified when required, either on the command line or in user-provided config files:

--basecall_config dna_r10.4.1_e8.2_400bps_modbases_5mc_cg_sup.cfg -> Guppy's config file for basecalling
--nanopore_reads_type ont_r10_q20 -> PEPPER's reads-type option
--use_gpu true -> Enable GPU for basecalling (it can be disabled, but basecalling will take significantly longer on CPU)
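
As a sketch of overriding these defaults on the command line, the snippet below assembles a run that switches basecalling to CPU and restates the other two defaults explicitly. The input paths, the docker profile and the results directory are placeholder assumptions; only the three parameter names and their default values come from the list above.

```shell
# Placeholder inputs; replace with real paths.
SAMPLESHEET=samplesheet.csv
FASTA=genome.fasta

# Override one default (CPU basecalling) and restate the other two explicitly.
# Backslash-newline inside double quotes joins the lines into a single command.
CMD="nextflow run dhslab/nf-core-wgsnano \
--input $SAMPLESHEET --fasta $FASTA -profile docker --outdir results \
--basecall_config dna_r10.4.1_e8.2_400bps_modbases_5mc_cg_sup.cfg \
--nanopore_reads_type ont_r10_q20 \
--use_gpu false"

# Print the assembled command so it can be reviewed before it is run.
echo "$CMD"
```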

Running a pipeline test in LSF cluster (configured to WashU RIS cluster environment)

1) Directly from GitHub:

NXF_HOME=${PWD}/.nextflow LSF_DOCKER_VOLUMES="/storage1/fs1/dspencer/Active:/storage1/fs1/dspencer/Active $HOME:$HOME" bsub -g /dspencer/nextflow -G compute-dspencer -q dspencer -e nextflow_launcher.err -o nextflow_launcher.log -We 2:00 -n 2 -M 12GB -R "select[mem>=16000] span[hosts=1] rusage[mem=16000]" -a "docker(ghcr.io/dhslab/docker-nextflow)" nextflow run dhslab/nf-core-wgsnano -r dev -profile test,ris,dhslab --outdir results

Notice that three profiles are used here:

test -> provides input and fasta paths for the test run
ris -> sets general configuration for the RIS LSF cluster
dhslab -> sets lab-specific cluster configuration

2) Alternatively, clone the repository and run the pipeline from local directory:

git clone https://github.com/dhslab/nf-core-wgsnano.git
cd nf-core-wgsnano/
chmod +x bin/*
LSF_DOCKER_VOLUMES="/storage1/fs1/dspencer/Active:/storage1/fs1/dspencer/Active $HOME:$HOME" bsub -g /dspencer/nextflow -G compute-dspencer -q dspencer -e nextflow_launcher.err -o nextflow_launcher.log -We 2:00 -n 2 -M 12GB -R "select[mem>=16000] span[hosts=1] rusage[mem=16000]" -a "docker(ghcr.io/dhslab/docker-nextflow)" "NXF_HOME=${PWD}/.nextflow ; nextflow run main.nf -profile test,ris,dhslab --outdir results"

Directory tree for test run output:

.
├── multiqc
│   ├── multiqc_data
│   └── multiqc_plots
│       ├── pdf
│       ├── png
│       └── svg
├── pipeline_info
└── samples
    ├── sample_1
    │   ├── fastq
    │   ├── methylation_calls
    │   │   ├── accumulated
    │   │   └── stranded
    │   ├── pepper
    │   │   ├── haplotagged_bam
    │   │   └── vcf
    │   └── qc
    │       ├── mosdepth
    │       └── pycoqc
    └── sample_2
        ├── fastq
        ├── methylation_calls
        │   ├── accumulated
        │   └── stranded
        ├── pepper
        │   ├── haplotagged_bam
        │   └── vcf
        └── qc
            ├── mosdepth
            └── pycoqc
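
To confirm that a finished run produced the layout shown above, a small helper can check each sample folder. This is only a sketch: check_sample_outputs is a hypothetical helper, not part of the pipeline, and the subdirectory list is copied from the tree above.

```shell
# check_sample_outputs OUTDIR SAMPLE
# Prints one line per expected subdirectory that is missing for SAMPLE
# and returns 1 if any is missing, 0 otherwise.
check_sample_outputs() {
    outdir=$1
    sample=$2
    missing=0
    for sub in fastq \
               methylation_calls/accumulated methylation_calls/stranded \
               pepper/haplotagged_bam pepper/vcf \
               qc/mosdepth qc/pycoqc; do
        if [ ! -d "$outdir/samples/$sample/$sub" ]; then
            echo "missing: samples/$sample/$sub"
            missing=1
        fi
    done
    return "$missing"
}
```

For the test run it would be called as check_sample_outputs results sample_1, where results is the directory given to --outdir.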

Notes:

The pipeline is developed and optimized to run on the WashU RIS (LSF) HPC, but could be deployed in any HPC environment supported by Nextflow.
The pipeline does NOT support conda because some of the tools used are not available as conda packages.
The pipeline can NOT be fully tested on a personal computer because the basecalling step is computationally intensive even for small test files. For testing/development purposes, the pipeline can be run in stub (dry-run) mode (see below).

Stub (dry-run) for testing and development purposes

The stub run requires the AWS CLI and Docker (or any other containerization software).
Steps:
1. Download the pipeline.
2. Download the stub-data results generated from a pre-run test analysis (requires the AWS CLI). They should be downloaded into the pipeline directory (nf-core-wgsnano/).
3. Run the pipeline in stub mode.

git clone https://github.com/dhslab/nf-core-wgsnano.git
cd nf-core-wgsnano/
aws s3 sync s3://davidspencerlab/nextflow/wgsnano/test-datasets/stub-test/ stub-test/ --no-sign-request
nextflow run main.nf -stub -profile stub,docker --outdir results
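
Before starting the stub run, it can help to confirm that the command line tools used in the commands above are on the PATH. require_tools is a hypothetical helper written for this sketch, not something the pipeline provides; swap docker for whichever container engine is in use.

```shell
# require_tools TOOL...
# Reports every tool that cannot be found on the PATH (on stderr)
# and returns 1 if at least one is missing, 0 otherwise.
require_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            status=1
        fi
    done
    return "$status"
}

# Tools used by the stub-run commands above.
require_tools git aws docker nextflow ||
    echo "install the missing tools before running the stub test" >&2
```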
