This script strings together the following software:
- ONT Albacore Sequencing Pipeline Software (version 2.3.1)
- Porechop (0.2.0)
- NanoPlot (1.13.0)
- LAST (926)
- minimap2 (2.10-r761)
- NanoSV (1.1.2)
The script is based on David Coffey's variant, with the following changes:
- NanoPlot, LAST, minimap2, and NanoSV all run on every barcode-separated bin created by Porechop.
- Added options to choose LAST, minimap2, or both.
- New organization: a folder is created, named with the run name (taken as input) and the timestamp of when the script was started. Directories within it are labeled accordingly.
- The settings for each program have been tuned to run on 12 cores with 250 GB of memory.
- Parallel processing
The biggest improvement to this pipeline comes from running the programs that follow Porechop's demultiplexing in parallel. In testing, this cut the runtime by roughly a factor of four. All of the following runs were performed on 3 Gbp of reads with the programs NanoPlot, LAST/NanoSV, and minimap2/NanoSV:
- The basic pipeline script, which runs every program on every demultiplexed barcode one after another with no background processes, takes 235m25.799s.
- An individually parallelized version, where the programs run one after another but each program processes all barcodes in parallel, takes 69m0.263s.
- A fully parallel version, where all programs run at the same time and each also processes its barcodes in parallel, takes 56m46.236s.
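The three scheduling strategies compared above can be sketched roughly as follows. This is a minimal illustration in Python, not the script's actual implementation. The stage labels, the barcode names (`BC01` and so on), the `run_stage` placeholder, and the `workers=12` default (chosen to mirror the 12 cores mentioned earlier) are all assumptions made for the example. A real pipeline would launch the external tool inside `run_stage` instead of sleeping.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stage lists for the aligner option (LAST, minimap2, or both).
# Each "aligner/NanoSV" label stands for "align, then call SVs" on one barcode.
STAGES = {
    "last": ["NanoPlot", "LAST/NanoSV"],
    "minimap2": ["NanoPlot", "minimap2/NanoSV"],
    "both": ["NanoPlot", "LAST/NanoSV", "minimap2/NanoSV"],
}


def run_stage(stage, barcode, delay=0.01):
    """Placeholder for one program on one barcode bin (assumption: the real
    script would start the external tool here and wait for it to finish)."""
    time.sleep(delay)
    return f"{stage}:{barcode}"


def run_sequential(stages, barcodes):
    """Basic pipeline: every stage on every barcode, one task at a time."""
    return [run_stage(s, b) for s in stages for b in barcodes]


def run_per_program(stages, barcodes, workers=12):
    """Stages run one after another; barcodes run in parallel within a stage."""
    results = []
    for stage in stages:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Leaving the with-block waits for this stage before the next starts.
            results.extend(pool.map(lambda b, s=stage: run_stage(s, b), barcodes))
    return results


def run_fully_parallel(stages, barcodes, workers=12):
    """All (stage, barcode) pairs are submitted to one pool at once."""
    jobs = [(s, b) for s in stages for b in barcodes]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda job: run_stage(*job), jobs))


if __name__ == "__main__":
    barcodes = [f"BC{i:02d}" for i in range(1, 5)]  # hypothetical barcode bins
    for runner in (run_sequential, run_per_program, run_fully_parallel):
        start = time.perf_counter()
        done = runner(STAGES["both"], barcodes)
        elapsed = time.perf_counter() - start
        print(f"{runner.__name__}: {len(done)} tasks in {elapsed:.2f}s")
```

Threads are sufficient in this sketch because the heavy work would happen in external processes. The per-program variant leaves workers idle whenever one barcode takes much longer than the others within a stage, which may be part of why the fully parallel run is faster in the timings above.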