Claudia Calabrese edited this page Feb 25, 2019 · 5 revisions

MTOOLBOX OUTPUTS

MToolBox default outputs are:

  • VCF_file.vcf contains all the mitochondrial variant positions against RSRS/rCRS and other meta-information.
  • mt_classification_best_results.csv reports the best haplogroup prediction for each sequence. If the sorting yields more than one best haplogroup prediction with equal probability, the output lists all of them.
  • prioritized_variants.txt contains annotation only for prioritized variants for each sample analyzed, defined as variants recognized by the three reference sequences (rCRS, RSRS and MHCS), sorted by increasing nucleotide variability.
  • summary_<date_time>.txt reports a brief summary of the selected options, the predicted haplogroups, the number of total and prioritized variants for each sample and, for NGS data only, the coverage of the reconstructed genomes and the number of homoplasmic and heteroplasmic variants.
  • OUT_<sample_name> folders containing:
  • outmt.sam: reads mapped onto the human mitochondrial DNA;
  • logmt.txt: GSNAP log file for mitochondrial DNA mapping;
  • outmt.fastq: fastq file of single reads extracted from outmt.sam file;
  • outmt1.fastq: fastq file of paired reads extracted from outmt.sam file;
  • outmt2.fastq: fastq file of paired reads extracted from outmt.sam file;
  • outhumanS.sam: single reads mapped onto the entire human genome;
  • loghumanS.txt: GSNAP log file for human genome mapping (single reads);
  • outhumanP.sam: paired reads mapped onto the entire human genome;
  • loghumanP.txt: GSNAP log file for human genome mapping (paired reads);
  • OUT.sam: alignments of reads uniquely mapped on mitochondrial genome;
  • OUT2.sam: alignments of reads uniquely mapped on mitochondrial genome, after processing with IndelRealigner and/or MarkDuplicates. This file is generated even if both of these processes have been disabled;
  • mtDNAassembly-table.txt: the main table describing the assembly position by position;
  • mtDNAassembly-Contigs.fasta: a fasta file including all reconstructed contigs;
  • mtDNAassembly-coverage.txt: a text file including the coverage per contig and per mitochondrial known annotation;
  • logassemble.txt: log file of the assembleMTgenome.py script;
  • sorted.csv contains a table with each haplogroup whose prediction is > 90%. It contains the following fields:
    1. N = the number of SNPs in the fragment sequence vs RSRS;
    2. Nph = the number of SNPs (among N) mapped in Phylotree;
    3. Nph_tot = the number of SNPs defining the haplogroup in the whole genome;
    4. Nph_exp = the number of SNPs defining the haplogroup in the fragment region;
    5. P_Hg = the prediction percentage value for the haplogroup (Nph/Nph_exp*100);
    6. Missing sites = the mutation events that are not present in the query genome but were expected from its respective path to the RSRS. These mutations may also point to a sequencing error;
  • merged_diff.csv reports the SNPs between the query genome and each of the three sequences RSRS, rCRS and hg_MHCS (Macro-Haplogroup Consensus Sequence);
  • <sample_name>.csv contains a table reporting, for every haplogroup present in Phylotree Build 15, the same data as the file <sequence_name>.sorted.csv, except for the Missing sites field;
  • annotation.csv is a further elaboration of the file merged_diff.csv, providing, for each mt variant allele between the query genome and each of the three sequences RSRS, rCRS and hg_MHCS, several annotations:
    1. Sample = sample name;
    2. Variant Allele = nucleotide position in mitochondrial genome followed by the variant allele;
    3. HF = Heteroplasmic Fraction, as reported in the VCF file;
    4. CI_lower;CI_upper = lower and upper limits of the confidence interval of the heteroplasmic fraction;
    5. RSRS = if "yes", the variant is recognized by RSRS;
    6. MHCS = if "yes", the variant is recognized by the Macro-Haplogroup Consensus Sequence;
    7. rCRS = if "yes", the variant is recognized by rCRS;
    8. Haplogroup = the best predicted haplogroup;
    9. Other Haplogroups = if "+", the variant defines other haplogroups besides the sample-specific haplogroup;
    10. Locus = mitochondrial gene locus;
    11. Nt Variability = SiteVar variability value;
    12. Codon Position = nucleotide position within the codon;
    13. Aa Change = amino acid change;
    14. Aa variability = MitVarProt amino acid variability value;
    15. tRNA annotation = specific information regarding mitochondrial tRNA genes (position in tRNA; tRNA type; cloverleaf secondary region; mature nucleotide; involvement of the specific position in tRNA folding);
    16. Disease score = an overall Disease Score, generated as a weighted average of pathogenicity prediction scores for non-synonymous variants. For details, see the related publication (PMID: 26621530);
    17. RNA predictions = score added for 49 variants in rRNA genes (PMID: 24092330) and 207 variants in tRNA genes (PMID: 21882289; PMID: 23696415). Scores were correlated on a scale from 0 to 1. Threshold for rRNAs = 0.51. Threshold for tRNAs = 0.31. Scores below these fixed thresholds indicate low pathogenicity;
    18. MutPred pred = MutPred predictions (High pathogenicity, Low pathogenicity);
    19. MutPred Score = MutPred Pathogenicity Score (0.000-1.000);
    20. PolyPhen-2 HumDiv Pred = Polyphen-2 HumDiv predictions (Benign, Possibly damaging, Probably damaging, Unknown);
    21. PolyPhen-2 HumDiv Prob = Polyphen-2 HumDiv probabilities (0.000-1.000);
    22. PolyPhen-2 HumVar Pred = Polyphen-2 HumVar predictions (Benign, Possibly damaging, Probably damaging, Unknown);
    23. PolyPhen-2 HumVar Prob = Polyphen-2 HumVar probabilities (0.000-1.000);
    24. PANTHER Pred = PANTHER predictions (Neutral, Disease, Unclassified) by SNPs&GO software;
    25. PANTHER Prob = PANTHER probabilities (0.000-1.000) by SNPs&GO software;
    26. PhD-SNP Pred = PhD-SNP predictions (Neutral, Disease, Unclassified) by SNPs&GO software;
    27. PhD-SNP Prob = PhD-SNP probabilities (0.000-1.000) by SNPs&GO software;
    28. SNPs&GO Pred = SNPs&GO predictions (Neutral, Disease, Unclassified) by SNPs&GO software;
    29. SNPs&GO Prob = SNPs&GO probabilities (0.000-1.000) by SNPs&GO software;
    30. Mitomap Associated Disease(s) = MITOMAP annotation of disease-associated mutations;
    31. Mitomap Homoplasmy = MITOMAP annotation of homoplasmy condition;
    32. Mitomap Heteroplasmy = MITOMAP annotation of heteroplasmy condition;
    33. Somatic Mutations = MITOMAP annotation of cell or tissue type for somatic mutations;
    34. SM Homoplasmy = MITOMAP annotation of homoplasmy condition in somatic mutations;
    35. SM Heteroplasmy = MITOMAP annotation of heteroplasmy condition in somatic mutations;
    36. ClinVar = ClinVar annotation of associated disease(s);
    37. OMIM Link = link to OMIM entry;
    38. dbSNP ID = rs ID reported in dbSNP database;
    39. Mamit-tRNA link = link to Mamit-tRNA site annotation;
    40. PhastCons20Way = PhastCons conservation score calculated on 20 vertebrates using hg38+rCRS as reference sequence;
    41. PhyloP20Way = PhyloP conservation score calculated on 20 vertebrates using hg38+rCRS as reference sequence;
    42. AC/AN 1000 Genomes = Ratio between allele count and allele number of possibly pathogenic variants found in 1000 Genomes using MToolBox;
    43. 1000 Genomes Homoplasmy = annotation of homoplasmy status in 1000 Genomes variants;
    44. 1000 Genomes Heteroplasmy = annotation of the heteroplasmy status in 1000 Genomes variants.
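
As a worked example of the P_Hg formula listed for sorted.csv (Nph/Nph_exp*100), the minimal sketch below computes the prediction percentage and applies the > 90% cutoff. The function name, the haplogroup labels and the counts are hypothetical and chosen only for illustration; this is not MToolBox's own code.

```python
def prediction_percentage(nph: int, nph_exp: int) -> float:
    """Return P_Hg = Nph / Nph_exp * 100, as defined for sorted.csv."""
    if nph_exp <= 0:
        raise ValueError("Nph_exp must be a positive count")
    return nph / nph_exp * 100


# Hypothetical counts: (haplogroup label, Nph, Nph_exp)
candidates = [("H1", 19, 20), ("H", 18, 18), ("U5", 9, 20)]

# Keep only the haplogroups whose prediction is above 90%
kept = []
for haplogroup, nph, nph_exp in candidates:
    p_hg = prediction_percentage(nph, nph_exp)
    if p_hg > 90:
        kept.append((haplogroup, p_hg))

for haplogroup, p_hg in kept:
    print(f"{haplogroup}\t{p_hg:.1f}")
```

With these made-up counts, the third candidate (9 of 20 expected SNPs) falls below the cutoff and is dropped, while the other two are retained.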

WARNING! Please note that the heteroplasmy fractions and the related confidence interval will be reported only for variants found against the reference sequence chosen for read mapping.
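
The prioritization rule (variants recognized by all three reference sequences, sorted by increasing nucleotide variability) and the warning above can be illustrated on annotation.csv-like rows. This is a minimal sketch under several assumptions: a comma-separated layout, a missing HF stored as an empty string, and invented sample values. The column names are taken from the field list above, but the delimiter and the missing-value encoding of the real file should be checked before reusing this.

```python
import csv
import io

# Hypothetical excerpt using a subset of the annotation.csv columns.
SAMPLE = """Sample,Variant Allele,HF,RSRS,MHCS,rCRS,Nt Variability
s1,73G,,no,no,yes,0.0123
s1,3243G,0.35,yes,yes,yes,0.0004
s1,8993G,0.98,yes,yes,yes,0.0001
"""


def prioritized(rows):
    """Keep variants flagged "yes" for RSRS, MHCS and rCRS,
    sorted by increasing nucleotide variability."""
    hits = [r for r in rows
            if r["RSRS"] == "yes" and r["MHCS"] == "yes" and r["rCRS"] == "yes"]
    return sorted(hits, key=lambda r: float(r["Nt Variability"]))


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
top = prioritized(rows)

# HF may be absent for variants not found against the mapping reference
# (assumed here to appear as an empty field).
with_hf = [r for r in rows if r["HF"] != ""]

for r in top:
    print(r["Sample"], r["Variant Allele"], r["HF"], r["Nt Variability"])
```

Checking for a missing HF before converting it to a number avoids errors on rows that carry annotations but no heteroplasmy estimate.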
