✏️ added docs for benchmarking #188
docs/SOMATIC_SNV_BENCHMARK.md
A helpful [YouTube video](https://www.youtube.com/watch?v=pDsEo0xdHWA&t=1s) also exists to explain their methodology.
## Comparison to "Gold Standard" Dataset
BAM files relevant to our workflow (BWA-aligned) were called using our standard [somatic workflow](https://github.com/kids-first/kf-somatic-workflow/releases/tag/v4.4.2), followed by our [consensus caller](https://github.com/kids-first/kf-somatic-workflow/blob/v4.3.5/workflow/kfdrc_consensus_calling.cwl). Synthetic BAM files needed their read groups corrected, because the `SM` identifier had been left matching the normal sample used for spike-in instead of being modified for a proper tumor-normal comparison. Gold standard VCFs provided by the authors have all samples aggregated. To make 1:1 comparisons, we did the following:
1. Gold standard VCFs were downloaded from https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/seqc/Somatic_Mutation_WG/release/v1.2/ with [SNV](https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/seqc/Somatic_Mutation_WG/release/v1.2/high-confidence_sSNV_in_HC_regions_v1.2.vcf.gz) and INDEL VCFs merged into a single VCF
1. Gold standard VCFs were downloaded from https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/seqc/Somatic_Mutation_WG/release/v1.2/ with [SNV](https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/seqc/Somatic_Mutation_WG/release/v1.2/high-confidence_sSNV_in_HC_regions_v1.2.vcf.gz) and INDEL VCFs merged into a single VCF
1. Gold standard VCFs were downloaded from https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/seqc/Somatic_Mutation_WG/release/v1.2/ with [SNV](https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/seqc/Somatic_Mutation_WG/release/v1.2/high-confidence_sSNV_in_HC_regions_v1.2.vcf.gz) and [INDEL](https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/seqc/Somatic_Mutation_WG/release/v1.2/high-confidence_sINDEL_in_HC_regions_v1.2.vcf.gz) VCFs merged into a single VCF
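For readers who want to reproduce the "merged into a single VCF" step, here is a minimal sketch of what such a merge could look like. It is not the command that was actually run for the benchmark (the doc does not say which tool was used; a dedicated tool such as `bcftools concat` is the more usual choice). The function name `merge_vcfs` and the output filename are hypothetical, and the sketch assumes both inputs share the same column header and that contigs can be ordered by first appearance.

```python
import gzip


def merge_vcfs(paths, out_path):
    """Combine records from several VCFs into one position-sorted VCF.

    Illustrative only: assumes every input has the same #CHROM header line.
    Returns the number of records written.
    """
    meta, column_header, records = [], None, []
    for path in paths:
        # Accept both gzip-compressed and plain-text VCFs
        opener = gzip.open if str(path).endswith(".gz") else open
        with opener(path, "rt") as handle:
            for line in handle:
                line = line.rstrip("\n")
                if line.startswith("##"):
                    # Keep each meta-information line once, in first-seen order
                    if line not in meta:
                        meta.append(line)
                elif line.startswith("#CHROM"):
                    column_header = column_header or line
                elif line:
                    records.append(line.split("\t"))

    # Rank contigs by first appearance, then sort by position within a contig
    contig_rank = {}
    for rec in records:
        contig_rank.setdefault(rec[0], len(contig_rank))
    records.sort(key=lambda rec: (contig_rank[rec[0]], int(rec[1])))

    with open(out_path, "w") as out:
        for line in meta:
            out.write(line + "\n")
        out.write(column_header + "\n")
        for rec in records:
            out.write("\t".join(rec) + "\n")
    return len(records)


# Example call (output name is a placeholder):
# merge_vcfs(
#     ["high-confidence_sSNV_in_HC_regions_v1.2.vcf.gz",
#      "high-confidence_sINDEL_in_HC_regions_v1.2.vcf.gz"],
#     "gold_standard_merged.vcf",
# )
```

Sorting by first-seen contig order keeps the sketch dependency-free; a real pipeline would more likely sort against the reference's contig list.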
Also, did we use the high confidence VCFs or did we just use the raw superset for these comparisons?
I am going to say High Conf VCFs, because those are the links I had here: https://www.notion.so/d3b/SNV-Benchmarking-Project-352f4bb803d04498858cf1a6403c33c8?pvs=4#bb64dfe23cbe4b9d9023bda0a21c9e87 and I think that when I wrote this doc, I was much closer to the work being done.
Yeah, but looking at the comparisons, we have vs superset (high + med + low + unclassified). I don't think there are med, low, and unclassified variants in the high confidence VCF.
Ah, good point. Perhaps started off with high conf, but then bounced around in order to do some extra analyses. High conf file does have the medium in it:
PASS;HighConf
PASS;MedConf
but yeah, the low ones are not. I can update the link
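One way to settle which confidence tiers a given VCF actually contains is to tally its FILTER column. The following is a minimal sketch, assuming a standard tab-delimited VCF with FILTER as the seventh column; the function name `tally_filters` is hypothetical and this is not part of the benchmarking workflow itself.

```python
import gzip
from collections import Counter


def tally_filters(vcf_path):
    """Count how often each FILTER value occurs in a VCF (plain or gzipped)."""
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    counts = Counter()
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            # Skip header lines and blank lines
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            counts[fields[6]] += 1  # FILTER is the 7th VCF column
    return counts


# Example call on the SNV file linked in the doc:
# for label, n in tally_filters(
#         "high-confidence_sSNV_in_HC_regions_v1.2.vcf.gz").most_common():
#     print(label, n)
```

Running this on both the high confidence file and the superset would show directly whether labels such as `PASS;HighConf` and `PASS;MedConf` are the only ones present in the former.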
Co-authored-by: Dan Miller <dmiller15@users.noreply.github.com>
Description
To close out https://d3b.atlassian.net/browse/BIXU-2124, adding a README to the somatic repo that goes over our benchmarking process and results so that they are publicly available.
Fixes # (issue)
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
Test Configuration:
Checklist: