Questions about insertions benchmarking #7

Open
MaestSi opened this issue Dec 11, 2018 · 3 comments

@MaestSi
MaestSi commented Dec 11, 2018

Dear all,
I would like to use the preliminary reference SV calls from GIAB to benchmark some SV calling tools.
For the NA24385 sample, I am planning to use the GIAB VCF file ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/NIST_SVs_Integration_v0.6/HG002_SVs_Tier1_v0.6.vcf.gz, and I could perhaps compare it against software predictions using Truvari.
For the NA12878 sample, I have found two GIAB BED files:
ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/technical/svclassify_Manuscript/Supplementary_Information/Personalis_1000_Genomes_deduplicated_deletions.bed
ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/technical/svclassify_Manuscript/Supplementary_Information/Spiral_Genetics_insertions.bed
Since there is no reference VCF file for NA12878, I think I should use bedtools intersect to compare these BED files with my predictions.
My questions are:

  • Since the Spiral_Genetics_insertions.bed file contains only complex insertions, are duplications available anywhere?
  • For an insertion, does the third column of the file represent the start coordinate plus the insertion length? That is my guess, since VCF files usually represent insertions with the start coordinate equal to the end coordinate.
  • Since all variants in the NA24385 VCF file have SVTYPE equal to DEL or INS, do the INS variants also include duplications, or are they only complex insertions?

Thank you very much,
Simone
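
For readers following along, the BED-against-BED comparison described above can be sketched in plain Python. This is only an illustration of the reciprocal-overlap idea, similar in spirit to running bedtools intersect with a minimum reciprocal overlap fraction. The function names, the 0.5 threshold, and the three-column BED assumption are all hypothetical choices, not something stated in this thread.

```python
from collections import defaultdict


def parse_bed(lines):
    """Parse BED lines into {chrom: [(start, end), ...]}, skipping headers and blanks."""
    intervals = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        # Assumption: the first three columns are chrom, start, end.
        intervals[fields[0]].append((int(fields[1]), int(fields[2])))
    return intervals


def reciprocal_overlap(a, b):
    """Overlap length divided by the longer interval's length (0.0 if disjoint)."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    longest = max(a[1] - a[0], b[1] - b[0])
    if overlap <= 0 or longest <= 0:
        return 0.0
    return overlap / longest


def count_matches(reference, predictions, min_overlap=0.5):
    """Count reference intervals matched by at least one prediction.

    min_overlap=0.5 is an assumed threshold, not a GIAB recommendation.
    """
    matched = 0
    for chrom, ref_intervals in reference.items():
        pred_intervals = predictions.get(chrom, [])
        for ref in ref_intervals:
            if any(reciprocal_overlap(ref, p) >= min_overlap for p in pred_intervals):
                matched += 1
    return matched
```

Dividing by the longer interval is equivalent to requiring the overlap fraction on both intervals. Note that zero-length intervals (for example insertions written with start equal to end) never match here, so insertions would likely need a distance-based comparison instead.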

@jzook
jzook commented Dec 11, 2018

In general, I would recommend only using NA24385 as a benchmark for SV calls at this time, since the NA12878 data is a few years old. For NA24385 you should use both the Tier 1 vcf and Tier 1 bed file with truvari to get FPs and FNs, as suggested in the README here:
ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/NIST_SVs_Integration_v0.6/
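
To make the role of the Tier 1 bed file concrete, here is a rough Python sketch of counting TPs, FPs, and FNs only inside benchmark regions. It is not Truvari's matching algorithm: the tuple layout `(chrom, pos, svtype)`, the 500 bp distance threshold, and the helper names are assumptions made for illustration, and real comparisons should follow the README mentioned above.

```python
def in_regions(call, regions):
    """True if the call position falls inside a benchmark region on its chromosome.

    Simplification: only the start position is tested, and the difference
    between 0-based BED and 1-based VCF coordinates is ignored.
    """
    chrom, pos = call[0], call[1]
    return any(start <= pos < end for c, start, end in regions if c == chrom)


def same_event(a, b, max_dist):
    """Assumed matching rule: same chromosome, same SVTYPE, nearby position."""
    return a[0] == b[0] and a[2] == b[2] and abs(a[1] - b[1]) <= max_dist


def benchmark(truth, calls, regions, max_dist=500):
    """Count TP, FP, and FN among calls restricted to the benchmark regions."""
    truth = [t for t in truth if in_regions(t, regions)]
    unmatched = [c for c in calls if in_regions(c, regions)]
    tp = fn = 0
    for t in truth:
        hit = next((c for c in unmatched if same_event(t, c, max_dist)), None)
        if hit is None:
            fn += 1  # truth call with no matching prediction
        else:
            tp += 1
            unmatched.remove(hit)  # each prediction matches at most one truth call
    return {"TP": tp, "FP": len(unmatched), "FN": fn}
```

Calls outside the bed regions are dropped before counting, which is why a prediction in an unassessed region is not penalised as an FP in this sketch.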

@MaestSi
MaestSi commented Dec 11, 2018

OK, so I can use the VCF file in NIST_SVs_Integration_v0.6, even though the Truvari documentation says that "This currently only works with GIAB SV v0.5", right?
Thank you very much for the kind information.

@jzook
jzook commented Dec 11, 2018 via email
