Roberto Preste edited this page Nov 16, 2019 · 9 revisions

SYSTEM REQUIREMENTS

  • UNIX-based OS
  • MacOS

All of the following dependencies are installed by default when running the install.sh script (see the Installation section):

OTHER FILES REQUIRED

By default, MToolBox adopts the RSRS (Reconstructed Sapiens Reference Sequence, PMID: 22482806) as the mitochondrial reference genome and hg19 as the nuclear reference genome. Alternatively, users can choose the rCRS (revised Cambridge Reference Sequence). FASTA files of the hg19 nuclear genome and of the mitochondrial DNA used by MToolBox are available at https://sourceforge.net/projects/mtoolbox/, where they have been uploaded for users' convenience. However, as of v.1.0 of the pipeline, they are downloaded and installed by default by the MToolBox install.sh script, together with the required GSNAP databases.

The MToolBox folder includes the MITOMAP_HMTDB_known_indels.vcf file, which contains 127 known indels annotated in MITOMAP and HmtDB, and the related intervals_file.list used by GATK's GenomeAnalysisTK.jar module. The MToolBox folder also includes two tab-separated files, patho_table.txt and sitevar_modified.txt, which contain variant-specific and site-specific information, respectively, and are used in the annotation step.

NOTE ON FILE NAMES

For each sample, the basename of the output folder and files is parsed from the input filename.

  • BAM|SAM files: BAM or SAM files MUST be renamed as <sample_name>.ext, e.g.:

    mv HG00096.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam HG00096.bam
    

    and "HG00096" will be the output basename.

  • FASTQ files: FASTQ files MUST be renamed as <sample_name>.R1.fastq and <sample_name>.R2.fastq for PAIRED-END data, and as <sample_name>.fastq for SINGLE-END data. Compressed FASTQ input files are accepted with the *.fastq.gz extension.
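
The naming rules above can be checked before launching a run. The following is a minimal sketch of such a pre-flight check, not part of MToolBox itself: the function name is_valid_input_name is hypothetical, and the check assumes that <sample_name> contains no dots, as in the HG00096 example. Relax that part if your sample names legitimately contain dots.

```shell
# Hypothetical pre-flight check (not shipped with MToolBox).
# Returns 0 if the file name follows one of the accepted patterns:
#   <sample_name>.bam | <sample_name>.sam
#   <sample_name>.R1.fastq[.gz] | <sample_name>.R2.fastq[.gz]
#   <sample_name>.fastq[.gz]
# Assumption: <sample_name> itself contains no dots.
is_valid_input_name() {
    file=$(basename "$1")
    case "$file" in
        *.R1.fastq|*.R2.fastq|*.R1.fastq.gz|*.R2.fastq.gz)
            sample=${file%%.R[12].fastq*} ;;   # strip .R1/.R2 suffix
        *.fastq|*.fastq.gz)
            sample=${file%%.fastq*} ;;         # strip single-end suffix
        *.bam|*.sam)
            sample=${file%.*} ;;               # strip final extension
        *)
            return 1 ;;                        # unsupported extension
    esac
    # Reject empty sample names and names that still contain a dot.
    case "$sample" in
        ""|*.*) return 1 ;;
    esac
    return 0
}

# Report which files in the current directory would need renaming.
for f in *.bam *.sam *.fastq *.fastq.gz; do
    [ -e "$f" ] || continue                    # skip unmatched globs
    if is_valid_input_name "$f"; then
        echo "ok:     $f"
    else
        echo "rename: $f"
    fi
done
```

With this check, a name such as HG00096.bam passes, while the original 1000 Genomes name from the example above is flagged for renaming.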

IMPORTANT: Please note that, for each sample, MToolBox recognizes at most one PAIRED-END pair of FASTQ files (R1+R2) and one SINGLE-END FASTQ file.
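
If a sample was sequenced across several lanes or runs, one possible way to satisfy this limit is to concatenate the files into a single file per read direction before running the pipeline. The sketch below is an assumption about how a user might prepare the data, not an MToolBox feature: the helper merge_fastq and the lane file names are made up for illustration, and the approach is only appropriate when the inputs belong to the same sample and are listed in the same order for R1 and R2.

```shell
# Hypothetical helper (not part of MToolBox): concatenate several
# uncompressed FASTQ files into the single file the naming rules expect.
# Usage: merge_fastq <output> <input> [<input> ...]
merge_fastq() {
    out=$1
    shift
    [ "$#" -ge 1 ] || return 1   # need at least one input file
    cat "$@" > "$out"
}

# Example with made-up lane file names. Keep the lane order identical
# for R1 and R2 so that read pairs stay in step:
# merge_fastq HG00096.R1.fastq lane1_R1.fastq lane2_R1.fastq
# merge_fastq HG00096.R2.fastq lane1_R2.fastq lane2_R2.fastq
```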
