jts/ncov2019-artic-nf

ncov2019-artic-nf

A Nextflow pipeline for running the ARTIC network's fieldbioinformatics tools (https://github.com/artic-network/fieldbioinformatics), with a focus on ncov2019.

This version was forked from the COG-UK pipeline and customized for CanCOGeN-VirusSeq by adding a dehosting step, switching the variant caller from ivar to freebayes, and adding extra artifact filtering steps. It is specialized for the Illumina workflow; nanopore support is retained unchanged from the COG-UK version. This documentation focuses on the differences in functionality from the COG-UK version.

Introduction


This Nextflow pipeline automates the ARTIC network nCoV-2019 novel coronavirus bioinformatics protocol. It is being developed to aid the harmonisation of the analysis of sequencing data generated by the COG-UK project. It turns SARS-CoV-2 sequencing data (Illumina or Nanopore) into consensus sequences and provides other helpful outputs to assist the project's sequencing centres with submitting data.

Quick-start

Illumina
nextflow run /path/to/repo/ncov2019-artic-nf [-profile conda,singularity,docker,slurm,lsf] \
             --illumina \
             --prefix "output_file_prefix" \
             --directory /path/to/reads \
             --bed /path/to/resources/nCoV-2019_v3_fixed.bed \
             --primer_pairs_tsv /path/to/resources/nCoV-2019_outer_primernames.tsv \
             --ref /path/to/resources/nCoV-2019.reference.fasta \
             --composite_ref /path/to/resources/composite_human_virus_reference.fasta \
             --viral_contig_name MN908947.3 \
             --cpus 8

The composite_ref and viral_contig_name options control the dehosting process. The composite reference genome should be created by merging the SARS-CoV-2 reference genome with the human reference genome then indexing it with bwa index. The primer_pairs_tsv argument is a simple two-column tab-delimited file describing the outer pair of primers for each amplicon. This allows additional amplification artifact filtering.
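As a rough sketch of the composite reference step described above, the following concatenates a human reference and the SARS-CoV-2 reference, checks that the value passed as --viral_contig_name matches a sequence header in the merged file, and then runs bwa index if bwa is available. The tiny FASTA files and temporary paths are toy stand-ins so the snippet can run anywhere; they are assumptions for illustration, not files shipped with the pipeline.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy stand-ins: replace these with your real human and SARS-CoV-2 FASTA files.
workdir=$(mktemp -d)
human_ref="$workdir/human.fasta"                     # hypothetical path
viral_ref="$workdir/nCoV-2019.reference.fasta"
composite_ref="$workdir/composite_human_virus_reference.fasta"
viral_contig_name="MN908947.3"

printf '>chr1\nACGTACGTAC\n' > "$human_ref"
printf '>%s\nTTGACCGTAA\n' "$viral_contig_name" > "$viral_ref"

# Merge the two references into one composite FASTA.
cat "$human_ref" "$viral_ref" > "$composite_ref"

# Count headers whose first field is exactly ">" plus the viral contig name.
viral_hits=$(awk -v name=">$viral_contig_name" '$1 == name {n++} END {print n+0}' "$composite_ref")
echo "viral contig headers found: $viral_hits"

# Index the composite reference; skipped when bwa is not installed.
if command -v bwa >/dev/null 2>&1; then
    bwa index "$composite_ref"
else
    echo "bwa not found; skipping indexing" >&2
fi
```

If the header count is not exactly one, the name given to --viral_contig_name probably does not match the viral sequence header in the composite file.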

Installation

An up-to-date version of Nextflow is required because the pipeline is written in DSL2. Following the instructions at https://www.nextflow.io/ to download and install Nextflow should get you a recent-enough version.

Conda

The repo contains an environment.yml file, which is used to automatically build the correct conda env if -profile conda is specified in the command. Although you'll need conda installed, this is probably the easiest way to run this pipeline.

Config

Common configuration options are set in conf/base.config. Workflow-specific configuration options are set in conf/nanopore.config and conf/illumina.config. They are described in those files and set to sensible defaults (as suggested in the nCoV-2019 novel coronavirus bioinformatics protocol).

Workflows
Illumina

Use --illumina to run the Illumina workflow. Use --directory to point to an Illumina output directory usually coded something like: <date>_<machine_id>_<run_no>_<some_zeros>_<flowcell>. The workflow will recursively grab all fastq files under this directory, so be sure that what you want is in there, and what you don't, isn't!
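Because the search is recursive, it can help to preview which files sit under the directory before launching a run. The sketch below uses find to list candidate FASTQ files. The filename patterns (.fastq, .fastq.gz, .fq.gz) are an assumption and only approximate whatever patterns the workflow actually matches, and the directory layout is a toy example created for the demonstration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy run directory; replace run_dir with the path you pass to --directory.
run_dir=$(mktemp -d)
mkdir -p "$run_dir/Data/Intensities/BaseCalls" "$run_dir/old_run"
touch "$run_dir/Data/Intensities/BaseCalls/sampleA_S1_L001_R1_001.fastq.gz"
touch "$run_dir/Data/Intensities/BaseCalls/sampleA_S1_L001_R2_001.fastq.gz"
touch "$run_dir/old_run/leftover_R1.fastq.gz"   # a file you may not want included
touch "$run_dir/SampleSheet.csv"                # not a FASTQ file, so not listed

# Recursively list files that look like FASTQ files (assumed extensions).
fastq_list=$(find "$run_dir" -type f \
    \( -name '*.fastq' -o -name '*.fastq.gz' -o -name '*.fq.gz' \) | sort)
printf '%s\n' "$fastq_list"

# Count the non-empty lines in the listing.
fastq_count=$(printf '%s\n' "$fastq_list" | grep -c . || true)
echo "FASTQ files found: $fastq_count"
```

In this toy layout the leftover file under old_run shows up in the listing, which is the kind of stray input worth moving out of the directory before running.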

Important config options are:

| Option | Description |
|---|---|
| allowNoprimer | Allow reads that don't have primer sequence? Ligation prep = false, nextera = true |
| illuminaKeepLen | Length of Illumina reads to keep after primer trimming |
| illuminaQualThreshold | Sliding window quality threshold for keeping reads after primer trimming (Illumina) |
| mpileupDepth | Mpileup depth for ivar |
| varFreqThreshold | Frequency threshold for variants |
| varMinDepth | Minimum coverage depth to call variants |
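One way to override these options without editing the files under conf/ is to put them in a small extra config file and pass it with Nextflow's -c option. The sketch below assumes the options live in the pipeline's params scope, and every value shown is a placeholder chosen for illustration, not a recommended or default setting. The final command is only printed, not executed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write a hypothetical override file; the file name and location are arbitrary.
custom_config="$(mktemp -d)/custom.config"
cat > "$custom_config" <<'EOF'
// Placeholder values for illustration only
params {
    allowNoprimer = true
    illuminaKeepLen = 50
    illuminaQualThreshold = 20
    varFreqThreshold = 0.25
    varMinDepth = 10
}
EOF

# Build the run command that would pick up the overrides (printed, not run).
run_cmd="nextflow run /path/to/repo/ncov2019-artic-nf -profile conda -c $custom_config --illumina --prefix output_file_prefix --directory /path/to/reads"
echo "$run_cmd"
```

Single options can also be set directly on the command line with a double dash, for example --varMinDepth 10, which is how Nextflow exposes pipeline parameters in general.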

Output

A subdirectory for each process in the workflow is created in --outdir.
