schneiderthomas/OTA-pipeline

OTA-pipeline

Open source tumor amplicon pipeline, i.e. an alternative bioinformatic pipeline for AmpliconDS that works for any AmpliconDS library given a proper manifest file.

This program is designed to run through a NextSeq or MiSeq run directory, looking for fastq files located in ${current directory or specified directory}/Data/Intensities/BaseCalls/. Note: MiSeq folders start with the structure YYMMDD_machinename_NNNN, where machinename starts with an 'M' (ex: 140729_M01382_0050_000000000-AAE8K), and NextSeq folders start with the same structure, where machinename starts with an 'N' (ex: 140729_N01382_0050_000000000-AAE8K). AltAmpDs expects YYMMDD_machinename with machinename starting with 'M' or 'N'. To run on HiSeq output, either modify the code or rename the folders.
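The naming convention above can be checked mechanically. The following is a minimal sketch, not code from this repository: detect_sequencer is a hypothetical helper name, and reporting every other prefix (for example a HiSeq folder) as unknown is an assumption.

```bash
# Hypothetical helper (not part of the repository): classify a run folder
# by the YYMMDD_machinename prefix described above.
detect_sequencer() {
    run_name=$(basename "$1")
    case "$run_name" in
        [0-9][0-9][0-9][0-9][0-9][0-9]_M*) echo "MiSeq" ;;
        [0-9][0-9][0-9][0-9][0-9][0-9]_N*) echo "NextSeq" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

detect_sequencer "140729_M01382_0050_000000000-AAE8K"   # prints MiSeq
detect_sequencer "140729_N01382_0050_000000000-AAE8K"   # prints NextSeq
```

A case pattern is used instead of a bash-only regex so the sketch also runs under a plain POSIX sh.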

The main script file to run is runAltPipeline.sh.

example usage

bash /<OTA-pipeline directory>/runAltPipeline.sh -h #to get help and see the different parameters
bash /<OTA-pipeline directory>/runAltPipeline.sh -s /<OTA-pipeline directory>/trusight_tumor_pipeline.sh > output_alt_pipeline_run.txt 2>&1 &
nohup bash /<OTA-pipeline directory>/runAltPipeline.sh -debugging true -validation true > output_alt_pipeline_run.txt 2>&1 &

It is highly recommended to create script aliases to make running the pipeline easier

cd ~
vim ./.bashrc

In the .bashrc file, under the # User specific aliases and functions section, add the following (modify as appropriate for your machine):

alias runAltPipeline='nohup bash /<OTA-pipeline directory>/runAltPipeline.sh > output_alt_pipeline_run.txt 2>&1 &'
alias debugRunAltPipeline='nohup bash /<OTA-pipeline directory>/runAltPipeline.sh -debugging true -validation true > output_alt_pipeline_run.txt 2>&1 &'
alias validationRunAltPipeline='nohup bash /<OTA-pipeline directory>/runAltPipeline.sh -validation true > output_alt_pipeline_run.txt 2>&1 &'

Here runAltPipeline is the default. debugRunAltPipeline and validationRunAltPipeline do not remove temporary files, and debugRunAltPipeline also has fewer restrictions on region depth (for use when testing the pipeline with very small artificial FASTQs).

Note: when running the commands above, the user needs to be in the top directory of a NextSeq or MiSeq run folder, because a home_dir was not specified.

Parameters

PIPELINE_DIR: this variable needs to be set in the ~/.bashrc file

cd ~
vim ./.bashrc
###add this under the alias section, modify to point to the main directory folder of this repository
PIPELINE_DIR=/home/ec2-user/ampDsTs;export PIPELINE_DIR
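
Before launching a run, it can help to confirm that the variable is visible in the current shell. This is a minimal sketch rather than part of the repository: check_pipeline_dir is a hypothetical name, and the only thing it assumes is that runAltPipeline.sh sits directly inside PIPELINE_DIR.

```bash
# Hypothetical sanity check (not part of the repository).
check_pipeline_dir() {
    if [ -z "${PIPELINE_DIR:-}" ]; then
        echo "PIPELINE_DIR is not set; add it to ~/.bashrc and re-source the file" >&2
        return 1
    fi
    # Assumption: the main script lives at the top level of PIPELINE_DIR.
    if [ ! -f "$PIPELINE_DIR/runAltPipeline.sh" ]; then
        echo "runAltPipeline.sh not found in $PIPELINE_DIR" >&2
        return 1
    fi
    echo "PIPELINE_DIR looks usable: $PIPELINE_DIR"
}
```

After editing ~/.bashrc, run source ~/.bashrc (or open a new terminal) and then call check_pipeline_dir.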

THREADS - integer: number of threads to use when calling functions that support a multi-threaded workflow (default 25)
this parameter can be changed by specifying the -threads parameter when calling runAltPipeline.sh
MEMORY - integer: the amount of memory for the Java virtual machine to use (default 16)
active_case_limit - integer: number of cases to process at one time (default 8)

nohup bash $PIPELINE_DIR/runAltPipeline.sh -threads 25 -memory 16 -active_case_limit 8 > output_alt_pipeline_run.txt 2>&1 &
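
To illustrate how the three options and their documented defaults fit together, here is a minimal sketch of single-dash option parsing. It is an assumption about the general approach, not the parsing code in runAltPipeline.sh; parse_pipeline_args and the variable ACTIVE_CASE_LIMIT are hypothetical names.

```bash
# Hypothetical sketch (not the repository's parser): apply the documented
# defaults, then override them from -threads / -memory / -active_case_limit.
parse_pipeline_args() {
    THREADS=25
    MEMORY=16
    ACTIVE_CASE_LIMIT=8
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -threads|-memory|-active_case_limit)
                # Every supported option takes exactly one value.
                if [ "$#" -lt 2 ]; then
                    echo "missing value for $1" >&2
                    return 1
                fi
                ;;
            *)
                echo "unrecognized option: $1" >&2
                return 1
                ;;
        esac
        case "$1" in
            -threads)           THREADS=$2 ;;
            -memory)            MEMORY=$2 ;;
            -active_case_limit) ACTIVE_CASE_LIMIT=$2 ;;
        esac
        shift 2
    done
}

parse_pipeline_args -threads 12
echo "threads=$THREADS memory=$MEMORY active_case_limit=$ACTIVE_CASE_LIMIT"
```

Options that are not given keep their defaults, so the call above changes only the thread count.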

Dependencies

There is a script file called download_dependencies.sh that will help you download all of these programs if running on Ubuntu; similar code for Red-hat is commented out, and the comment markers can be removed if necessary. Please note that this file will NOT download GATK and Annovar, as those programs have license agreements. This simple script downloads all dependencies into the directory it currently resides in. To launch it, type the following in the terminal:

bash download_dependencies.sh

bash: variables need to be set in the ~/.bashrc file, and some of the code uses bash syntax, so make sure bash is installed on your Linux distribution

To install, proceed to install in the order below

git

sudo yum install git #Red-hat
sudo apt-get install git #ubuntu

unzip

sudo yum install unzip #Red-hat
sudo apt-get install unzip #ubuntu

java

sudo yum install java-1.8.0-openjdk-devel #Red-hat
sudo apt-get install openjdk-8-jdk #Ubuntu

wget

sudo yum install wget #Red-hat
sudo apt-get install wget #Ubuntu

gcc

sudo yum install gcc #red-hat
sudo apt-get install gcc #ubuntu

python-devel

sudo yum install python-devel #Red-hat
sudo apt-get install python-dev #Ubuntu

zlib

sudo yum install zlib-devel #Red-hat
sudo apt-get install zlib1g-dev #ubuntu

g++

sudo yum install gcc-c++ #red-hat
sudo apt-get install g++ #ubuntu

curses

sudo apt-get install libncurses5-dev libncursesw5-dev #ubuntu
sudo yum install ncurses-devel ncurses #red-hat

download the git repository

git clone https://github.com/schneiderthomas/AltAmpDs
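
Once the packages above are installed, a quick check can confirm that the command-line tools are on the PATH. This is a minimal sketch, not part of the repository; missing_tools is a hypothetical helper, and it only covers tools that provide an executable (libraries such as zlib or ncurses are not checked).

```bash
# Hypothetical helper (not part of the repository): print the name of every
# requested tool that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Prints nothing when all of the executables listed above are available.
missing_tools git unzip java wget gcc g++
```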

Python Dependencies

pip

wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm #red-hat
sudo yum install epel-release-latest-7.noarch.rpm #red-hat
sudo yum install python-pip #red-hat
sudo apt-get install python-pip #ubuntu

biopython (1.66)

sudo pip install biopython==1.66

pysam (0.8.4)

sudo pip install pysam==0.8.4

pyvcf (0.6.7)

sudo pip install pyvcf==0.6.7

Pandas (0.16.2)

sudo pip install pandas==0.16.2

regex (2015.3.18)

sudo pip install regex==2015.3.18
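
As an alternative to running the five commands one by one, the same pins can be collected in a requirements file. This is a minimal sketch, not something the repository ships: write_requirements and the file name requirements.txt are hypothetical, while the package versions are the ones listed above.

```bash
# Hypothetical helper (not part of the repository): write the pinned Python
# dependencies listed above to the file given as the first argument.
write_requirements() {
    cat > "$1" <<'EOF'
biopython==1.66
pysam==0.8.4
pyvcf==0.6.7
pandas==0.16.2
regex==2015.3.18
EOF
}

# Usage:
#   write_requirements requirements.txt
#   sudo pip install -r requirements.txt
```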

Linux/Shell Dependencies

Zenity

to display dialog boxes from shell script (to let tech know that processing is done)

sudo yum install zenity #red-hat
sudo apt-get install zenity #ubuntu

xterm

sudo yum install xterm #red-hat
sudo yum install xorg-x11-xauth.x86_64 xorg-x11-server-utils.x86_64 dbus-x11.x86_64 #red-hat
sudo apt-get install xterm xorg dbus #ubuntu

bcl2fastq (v2.17) to convert files from bcl to fastq files

#optional, already provided as zip
#Red-hat
wget 'ftp://webdata2:webdata2@ussd-ftp.illumina.com/downloads/software/bcl2fastq/bcl2fastq2-v2.17.1.14-Linux-x86_64.zip'
unzip bcl2fastq2-v2.17.1.14-Linux-x86_64.zip
sudo yum localinstall bcl2fastq2-v2.17.1.14-Linux-x86_64.rpm
#Ubuntu
sudo apt-get install alien dpkg-dev debhelper build-essential #needed for Ubuntu
sudo alien bcl2fastq2-v2.17.1.14-Linux-x86_64.rpm
sudo dpkg -i bcl2fastq2-v2.17.1.14-Linux-x86_64.deb

Major Program dependencies

These programs need to be downloaded and/or compiled, and their resulting directories need to be placed in this directory. A more recent version may be used, but there may be some compatibility issues with the pipeline as it is.

GATK 3.5 (VERY IMPORTANT AT LEAST 3.5)
annovar - 2014-11-12

freebayes v0.9.20
bcftools-1.2
FastQC v0.11.3
htslib-1.2.1
IGVTools 2.3.57
picard 2.10
samtools_1.2
snpeff 4.1g 2015-05-17
varscan v2.3.9
bwa 0.7.10
vcflib v.1.0.0
CoverageQC - for debugging
bedtools2 -> Version 2.26.0
Trimmomatic 0.33

Please note that annovar and GATK have license agreements that must be accepted before you download them, and therefore they cannot be downloaded using the above script. Instructions to download these programs are below:

annovar

Please download annovar; version 2014-11-12 was used originally (and is therefore the preferred version to ensure compatibility). To download Annovar click here. After downloading annovar, place the folder entitled "annovar" in the current directory. Note: the original splicing threshold for annovar is 2; this can be changed by opening the file table_annovar.pl and modifying the line

$sc = "annotate_variation.pl -geneanno -buildver $buildver -dbtype $protocol -hgvs -outfile $tempfile.$protocol -exonsort $queryfile $dbloc";

to

$sc = "annotate_variation.pl -geneanno -buildver $buildver -dbtype $protocol -splicing_threshold 5 -hgvs -outfile $tempfile.$protocol -exonsort $queryfile $dbloc";
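
The same edit can be scripted. The following is a minimal sketch, not part of the repository: patch_splicing_threshold is a hypothetical helper, and it assumes that the text "-dbtype $protocol -hgvs" occurs in table_annovar.pl only on the line shown above, which may not hold for every annovar release. It writes a patched copy instead of editing in place so the result can be compared with diff before replacing the original.

```bash
# Hypothetical helper (not part of the repository).
# $1 = original table_annovar.pl, $2 = path for the patched copy.
patch_splicing_threshold() {
    # Single quotes keep $protocol literal; \$ matches a literal dollar sign.
    sed 's/-dbtype \$protocol -hgvs/-dbtype $protocol -splicing_threshold 5 -hgvs/' "$1" > "$2"
}

# Usage:
#   patch_splicing_threshold annovar/table_annovar.pl table_annovar.patched.pl
#   diff annovar/table_annovar.pl table_annovar.patched.pl
```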

GATK

Version 3.5 is used for this pipeline. Get the latest software here. If you download a version higher than 3.5, change line 34 in amplicon_ds_pipeline.sh as appropriate.

Extra

In this repository there is a folder called ART; in it you will find shell scripts that can be used to create artificial FASTQ files similar to an ampliconDS run. ART version ChocolateCherryCake-03-19-2015 was used in these scripts.

Download the latest ART program here.

Reference Files

hg19

Will be downloaded if you use the download_dependencies.sh script.

ANNOVAR reference files

download_dependencies.sh will install clinvar, cosmic, exac, snp and 1000g in the annovar directory; see download_dependencies.sh for details.

NOTES ON MAJOR FILES

runAltPipeline.sh

- the shell script which runs through the current directory (unless another is given) and feeds files to the pipeline shell script. The location of the pipeline script can be specified with the -s option; the default parameters are at the top of the shell script and can be changed if one moves the directory.

ASSUMPTIONS:

  • the directory has a folder structure
    BaseDirectory -> Data -> Intensities -> BaseCalls
    will exit if this is not seen

  • there needs to be an even number of fastq files (not including the Undetermined FASTQ files), because there will always be either two fastq files (or 8 for a NextSeq folder with no lane splitting) in the Amplicon DS pipeline; will exit if it does not see this

  • if no FASTQ files are present, then there need to be tiffs in the BaseDirectory -> Images folder or bcl files in BaseFolder -> Data -> Intensities -> BaseCalls -> L001 & L002 & L003 & L004, so bcl2fastq can turn the images or bcl files into fastq files

  • The BaseFolder name has to start like 160518_N or 160518_M, where the first part is a number followed by an underscore and either the letter N or the letter M (this tells the script whether it is dealing with a NextSeq or MiSeq folder); if the name does not start like this, it should exit with an error
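
The first two assumptions can be checked before starting a run. This is a minimal sketch, not the checks implemented in runAltPipeline.sh: check_run_folder is a hypothetical helper, and it assumes the FASTQ files sit directly in BaseCalls with names matching *.fastq* (for example .fastq or .fastq.gz).

```bash
# Hypothetical pre-flight check (not part of the repository).
# $1 = top directory of the run folder.
check_run_folder() {
    base_calls="$1/Data/Intensities/BaseCalls"
    if [ ! -d "$base_calls" ]; then
        echo "missing directory: $base_calls" >&2
        return 1
    fi
    # Count FASTQ files, skipping the Undetermined ones.
    count=$(find "$base_calls" -maxdepth 1 -type f -name '*.fastq*' \
            ! -name 'Undetermined*' | wc -l | tr -d ' ')
    if [ "$count" -eq 0 ]; then
        echo "no FASTQ files found; bcl2fastq conversion would be needed"
        return 2
    fi
    if [ $((count % 2)) -ne 0 ]; then
        echo "odd number of FASTQ files: $count" >&2
        return 1
    fi
    echo "$count FASTQ files found"
}

# Usage, from the top directory of a run folder:
#   check_run_folder .
```

The return code 2 for an empty BaseCalls folder is a choice made for this sketch, so a caller can tell "needs conversion" apart from a real error.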
