Requirements
- UNIX-based OS
- MacOS
All of the following dependencies are installed by default when running the install.sh script (see the Installation section):
- Python 2.7 (www.python.org; provided within the Anaconda distribution, https://www.continuum.io/downloads)
- Samtools (https://sourceforge.net/projects/samtools/files/samtools/)
- ZLIB (http://zlib.net/)
- GATK (optional, needed only if the user wants to run GATK IndelRealigner; https://software.broadinstitute.org/gatk/download/). The user is asked to place the GATK package into the MToolBox/ext_tools folder:

      mv GenomeAnalysisTK.jar /path/to/MToolBox/MToolBox/ext_tools/
By default, MToolBox adopts the RSRS (Reconstructed Sapiens Reference Sequence, PMID: 22482806) as the mitochondrial reference genome and hg19 as the nuclear reference genome. Alternatively, users can choose the rCRS (revised Cambridge Reference Sequence).
The FASTA files of the hg19 nuclear genome and of the mitochondrial DNA used by MToolBox have been uploaded to https://sourceforge.net/projects/mtoolbox/ for users' convenience. However, as of v1.0 of the pipeline, they are downloaded and installed by default by the MToolBox install.sh script, together with the required GSNAP databases.
The MToolBox folder includes the MITOMAP_HMTDB_known_indels.vcf file, containing 127 known indels annotated in MITOMAP and HmtDB, and the related intervals_file.list used by the GATK GenomeAnalysisTK.jar module.
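To check that a local copy of the known-indels VCF matches the description above, one can count its data records, i.e. the lines that are not `#` header lines. The bash sketch below is illustrative only: `count_vcf_records` is a hypothetical helper, and the path in the usage comment assumes the VCF sits in the same MToolBox folder that contains ext_tools.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of MToolBox): count the data records in a VCF.
# VCF header lines start with '#'; every other non-empty line is one record.
count_vcf_records() {
  local vcf="$1"
  # -v inverts the match, -c counts: lines matching neither pattern are records.
  grep -v -c -e '^#' -e '^$' "$vcf"
}

# Usage (path is an assumption about where the file is installed):
#   count_vcf_records /path/to/MToolBox/MToolBox/MITOMAP_HMTDB_known_indels.vcf
```

If the file is the one described above, the printed count should be 127.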
The MToolBox folder also includes two tab-separated files, patho_table.txt and sitevar_modified.txt, containing variant-specific and site-specific information, respectively, used in the annotation step.
For each sample, the basename of the output folder and files is parsed from the input filename.
- BAM|SAM files: BAM or SAM files MUST be renamed as <sample_name>.ext (.bam or .sam), e.g.:

      mv HG00096.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam HG00096.bam

  and "HG00096" will be the output basename.
- FASTQ files: FASTQ files MUST be renamed as <sample_name>.R1.fastq and <sample_name>.R2.fastq for PAIRED-END data, and <sample_name>.fastq for SINGLE-END data. Compressed FASTQ input files can be supplied with the *.fastq.gz extension.
IMPORTANT: MToolBox cannot recognize more than one PAIRED-END pair of FASTQ files (R1+R2) and more than one SINGLE-END FASTQ file per sample.
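The naming rules above can be turned into a small pre-flight check before launching the pipeline. The bash sketch below is an illustration under assumptions, not MToolBox's own parsing code: `sample_basename` and `check_fastq_set` are hypothetical helpers. The first strips the documented suffixes (.bam, .sam, .R1/.R2.fastq, .fastq and their .gz variants) to recover the sample name. The second checks that a folder holds at most one R1/R2 pair and one single-end FASTQ file for a given sample, counting plain and .gz copies together.

```bash
#!/usr/bin/env bash

# Hypothetical helper: derive the sample basename from an input filename
# following the documented naming convention. More specific patterns come
# first because `case` stops at the first match.
sample_basename() {
  local name
  name=$(basename "$1")
  case "$name" in
    *.R1.fastq.gz) echo "${name%.R1.fastq.gz}" ;;
    *.R2.fastq.gz) echo "${name%.R2.fastq.gz}" ;;
    *.R1.fastq)    echo "${name%.R1.fastq}" ;;
    *.R2.fastq)    echo "${name%.R2.fastq}" ;;
    *.fastq.gz)    echo "${name%.fastq.gz}" ;;
    *.fastq)       echo "${name%.fastq}" ;;
    *.bam)         echo "${name%.bam}" ;;
    *.sam)         echo "${name%.sam}" ;;
    *) echo "unrecognized input name: $name" >&2; return 1 ;;
  esac
}

# Print how many of the given paths exist.
count_existing() {
  local n=0 f
  for f in "$@"; do
    if [ -e "$f" ]; then n=$((n + 1)); fi
  done
  echo "$n"
}

# Hypothetical helper: fail if a sample has more than one R1, R2 or
# single-end FASTQ file (plain or .gz), or an R1 without its R2 (or vice versa).
check_fastq_set() {
  local dir="$1" sample="$2" r1 r2 se
  r1=$(count_existing "$dir/$sample.R1.fastq" "$dir/$sample.R1.fastq.gz")
  r2=$(count_existing "$dir/$sample.R2.fastq" "$dir/$sample.R2.fastq.gz")
  se=$(count_existing "$dir/$sample.fastq" "$dir/$sample.fastq.gz")
  if [ "$r1" -gt 1 ] || [ "$r2" -gt 1 ] || [ "$se" -gt 1 ]; then
    echo "more than one FASTQ file of the same kind for $sample" >&2
    return 1
  fi
  if [ "$r1" -ne "$r2" ]; then
    echo "unpaired R1/R2 FASTQ file for $sample" >&2
    return 1
  fi
  return 0
}
```

For example, `sample_basename HG00096.R1.fastq.gz` prints HG00096. `check_fastq_set /data HG00096` returns non-zero if, say, both HG00096.R1.fastq and HG00096.R1.fastq.gz are present in /data.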