Are there any gold-standard variant calls for mitochondria? #92

Open
mtorralvo opened this issue Feb 27, 2020 · 1 comment

@mtorralvo

Hi,

I've been trying to compare several variant calling methods for mitochondrial NGS data.
I've seen that you test your pipeline with HG00119, and I was wondering whether you had curated a set of high-confidence variants from this or other 1000 Genomes individuals. I have not been able to find anything similar anywhere.

Thanks in advance,
María

@clody23
Member

clody23 commented Feb 27, 2020

Hi,

The 1000 Genomes Consortium has also released a Phase 3 mtDNA variant call set (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/) that you can use for comparison, although I am not aware of the quality control applied to that call set. We do not currently provide a curated set of MToolBox variant calls for 1000 Genomes, but we can definitely recommend the following:

  • remove mtDNA calls with a poor Phred-like quality score (e.g. below 20-30)
  • remove mtDNA variants showing strand bias, keeping only variants supported by reads on both strands
  • double-check the alignment of variants in homopolymeric stretches (including by visual inspection of the BAM alignment in IGV or similar tools) and remove suspicious calls likely caused by alignment or sequencing errors
  • set a sensible heteroplasmy threshold, tuned to sequencing metrics (e.g. read depth)
  • perform haplogroup prediction with different tools, e.g. MToolBox and Haplogrep2, compare the predictions to make sure they match, and flag samples with discordant predictions. Discordant predictions may result from poor sequencing quality or from low mtDNA coverage, which leaves few informative SNPs available for prediction. Multiple MToolBox haplogroup predictions for the same sample are also usually indicative of poor sequencing quality or low coverage.
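The recommendations above could be sketched roughly as follows. This is a minimal illustration, not MToolBox code: the `MtVariant` record and its fields are hypothetical, and the defaults (quality 20, at least 2 alt-supporting reads per strand, a homopolymer run of 5+ bases, a 3% heteroplasmy floor raised to 10 alt reads / depth at low coverage) are assumptions to tune on your own data. Calls in homopolymers are returned for manual review (e.g. in IGV) rather than dropped, and haplogroup labels are compared as exact strings, which may be too strict if the tools use different haplogroup nomenclature versions (e.g. PhyloTree builds).

```python
from dataclasses import dataclass


@dataclass
class MtVariant:
    """Hypothetical per-variant record; field names are illustrative, not MToolBox columns."""
    pos: int        # 1-based position on the mtDNA reference
    ref: str
    alt: str
    qual: float     # Phred-like quality score
    fwd_alt: int    # alt-supporting reads on the forward strand
    rev_alt: int    # alt-supporting reads on the reverse strand
    depth: int      # total read depth at the position
    hf: float       # heteroplasmic fraction, 0-1


def in_homopolymer(ref_seq: str, pos: int, min_run: int = 5) -> bool:
    """True if 1-based `pos` lies in a run of >= min_run identical reference bases."""
    i = pos - 1
    start = end = i
    while start > 0 and ref_seq[start - 1] == ref_seq[i]:
        start -= 1
    while end < len(ref_seq) - 1 and ref_seq[end + 1] == ref_seq[i]:
        end += 1
    return end - start + 1 >= min_run


def hf_cutoff(depth: int, base: float = 0.03, min_alt_reads: int = 10) -> float:
    """Assumed depth-aware threshold: at low depth, require >= min_alt_reads alt reads."""
    return max(base, min_alt_reads / depth) if depth > 0 else 1.0


def qc_filter(variants, ref_seq, min_qual=20.0, min_strand_reads=2):
    """Split variants into (kept, needs_manual_review); failing calls are dropped."""
    kept, review = [], []
    for v in variants:
        if v.qual < min_qual:
            continue  # poor Phred-like quality
        if v.fwd_alt < min_strand_reads or v.rev_alt < min_strand_reads:
            continue  # not supported by both strands (possible strand bias)
        if v.hf < hf_cutoff(v.depth):
            continue  # below the depth-aware heteroplasmy threshold
        # Calls in homopolymeric stretches are set aside for inspection in IGV
        (review if in_homopolymer(ref_seq, v.pos) else kept).append(v)
    return kept, review


def discordant_samples(mtoolbox: dict, haplogrep: dict) -> list:
    """Flag samples with multiple/missing MToolBox calls or a Haplogrep2 mismatch.

    mtoolbox maps sample -> list of predicted haplogroups;
    haplogrep maps sample -> a single predicted haplogroup.
    """
    flagged = []
    for sample in sorted(set(mtoolbox) | set(haplogrep)):
        calls = mtoolbox.get(sample, [])
        other = haplogrep.get(sample)
        if len(calls) != 1 or other is None or calls[0] != other:
            flagged.append(sample)
    return flagged
```

Keeping homopolymer calls in a separate list, instead of discarding them, mirrors the advice to double-check them visually before deciding.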

There are also a couple of useful readings on this topic, e.g.:
https://www.ncbi.nlm.nih.gov/pubmed/30098421
https://www.ncbi.nlm.nih.gov/pubmed/20696290
https://www.ncbi.nlm.nih.gov/pubmed/25319266
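For the comparison against the 1000 Genomes Phase 3 mtDNA call set mentioned above, one simple option is to treat each call as a `(position, ref, alt)` tuple and count shared and unique calls. The sketch below rests on several assumptions: it parses plain VCF text lines using only the standard library, assumes `GT` is the first FORMAT key, accepts `MT`/`chrM`/`chrMT` as the mitochondrial contig name, and assumes both call sets use the same reference coordinates (e.g. rCRS) and the same indel representation, since otherwise identical variants will not match.

```python
import re


def parse_vcf(lines, sample=None):
    """Collect mtDNA (pos, ref, alt) tuples from VCF text lines.

    If `sample` is given, keep only alleles present in that sample's GT
    (assumed to be the first FORMAT key); otherwise keep every ALT allele.
    """
    variants = set()
    col = None
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            if sample is not None:
                col = fields.index(sample)  # ValueError if the sample is absent
            continue
        chrom, pos, _vid, ref, alts = fields[:5]
        if chrom not in ("MT", "chrM", "chrMT"):
            continue
        alt_list = alts.split(",")
        if sample is None:
            carried = range(1, len(alt_list) + 1)
        else:
            gt = fields[col].split(":")[0]
            # Allele indices carried by the sample; "0" (ref) and "." (missing) skipped
            carried = {int(a) for a in re.split(r"[/|]", gt) if a not in (".", "0")}
        for idx in carried:
            alt = alt_list[idx - 1]
            if alt not in (".", "*"):
                variants.add((int(pos), ref, alt))
    return variants


def concordance(test, truth):
    """Count shared/unique calls and derive precision and recall against `truth`."""
    tp = len(test & truth)
    fp = len(test - truth)
    fn = len(truth - test)
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

A compressed VCF can be read with `gzip.open(path, "rt")` and passed straight to `parse_vcf`, e.g. `concordance(parse_vcf(my_lines), parse_vcf(kg_lines, sample="HG00119"))`, where `my_lines` and `kg_lines` are hypothetical line iterables for your calls and the 1000 Genomes file. Because the quality control behind the 1000 Genomes call set is uncertain, these numbers measure agreement between call sets rather than accuracy; discordant sites are good candidates for the manual checks listed above.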

Finally, feedback and suggestions on how to improve MToolBox variant calling quality control are very much appreciated.

Best regards,
Claudia
