✏️ added docs for benchmarking #188

Merged: 5 commits merged into master on Jun 28, 2024

@migbro migbro commented Jun 27, 2024

Description

To close out https://d3b.atlassian.net/browse/BIXU-2124, this PR adds a README to the somatic repo that covers our benchmarking process and results so that they are publicly available.

Fixes # (issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • Test A
  • Test B

Test Configuration:

  • Environment:
  • Test files:

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings
  • I have committed any related changes to the PR

@migbro added the documentation and bix-dev labels Jun 27, 2024
@migbro self-assigned this Jun 27, 2024
docs/SOMATIC_SNV_BENCHMARK.md
A helpful [YouTube video](https://www.youtube.com/watch?v=pDsEo0xdHWA&t=1s) also exists to explain their methodology.
## Comparison to "Gold Standard" Dataset
BAM files relevant to our workflow (BWA-aligned) were called using our standard [somatic workflow](https://github.com/kids-first/kf-somatic-workflow/releases/tag/v4.4.2), followed by our [consensus caller](https://github.com/kids-first/kf-somatic-workflow/blob/v4.3.5/workflow/kfdrc_consensus_calling.cwl). The synthetic BAM files needed their read groups corrected: the `SM` identifier had been left matching the normal sample used for the spike-in rather than being modified for a proper tumor-normal comparison. The gold standard VCFs provided by the authors have all samples aggregated. To make 1:1 comparisons, we did the following:
1. Gold standard VCFs were downloaded from https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/seqc/Somatic_Mutation_WG/release/v1.2/ with [SNV](https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/seqc/Somatic_Mutation_WG/release/v1.2/high-confidence_sSNV_in_HC_regions_v1.2.vcf.gz) and INDEL VCFs merged into a single VCF
Contributor

Suggested change
1. Gold standard VCFs were downloaded from https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/seqc/Somatic_Mutation_WG/release/v1.2/ with [SNV](https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/seqc/Somatic_Mutation_WG/release/v1.2/high-confidence_sSNV_in_HC_regions_v1.2.vcf.gz) and INDEL VCFs merged into a single VCF
1. Gold standard VCFs were downloaded from https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/seqc/Somatic_Mutation_WG/release/v1.2/ with [SNV](https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/seqc/Somatic_Mutation_WG/release/v1.2/high-confidence_sSNV_in_HC_regions_v1.2.vcf.gz) and [INDEL](https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/seqc/Somatic_Mutation_WG/release/v1.2/high-confidence_sINDEL_in_HC_regions_v1.2.vcf.gz) VCFs merged into a single VCF
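The PR does not say which tool performed the SNV/INDEL merge, so the following is only a minimal pure-Python sketch of the idea, not the method actually used. The function names (`chrom_sort_key`, `merge_vcf_lines`) and the toy records are hypothetical, and the sketch assumes both files share the same column layout and compatible headers.

```python
def chrom_sort_key(chrom):
    """Order numeric contigs first (1, 2, ..., 22), then others alphabetically."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    if name.isdigit():
        return (0, int(name), "")
    return (1, 0, name)


def merge_vcf_lines(snv_lines, indel_lines):
    """Merge two VCFs (given as lists of lines) into one position-sorted list.

    Assumption: the header is taken from the first input only, so both
    files must share the same columns and compatible meta-information.
    """
    header = [line for line in snv_lines if line.startswith("#")]
    records = [
        line
        for line in list(snv_lines) + list(indel_lines)
        if line and not line.startswith("#")
    ]

    def record_key(line):
        # CHROM and POS are the first two tab-separated VCF columns.
        chrom, pos = line.split("\t", 2)[:2]
        return (chrom_sort_key(chrom), int(pos))

    return header + sorted(records, key=record_key)


# Made-up records for illustration; not taken from the SEQC2 files.
HEADER = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]
snv = HEADER + [
    "chr1\t100\t.\tA\tT\t.\tPASS;HighConf\t.",
    "chr2\t50\t.\tG\tC\t.\tPASS;HighConf\t.",
]
indel = HEADER + [
    "chr1\t150\t.\tAT\tA\t.\tPASS;HighConf\t.",
    "chrX\t10\t.\tG\tGA\t.\tPASS;MedConf\t.",
]
merged = merge_vcf_lines(snv, indel)
print("\n".join(merged))
```

In practice a dedicated VCF tool is the more usual choice for this step, since it also validates headers and handles compressed, indexed input.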

Contributor

Also, did we use the high confidence VCFs or did we just use the raw superset for these comparisons?

Collaborator Author

I am going to say High Conf VCFs because those are the links I had here: https://www.notion.so/d3b/SNV-Benchmarking-Project-352f4bb803d04498858cf1a6403c33c8?pvs=4#bb64dfe23cbe4b9d9023bda0a21c9e87 and I think when I wrote this doc, I was much closer to the work being done

Contributor

Yeah but looking at the comparisons, we have vs superset (high + med + low + unclassified). I don't think there are med, low, and unclassified variants in the high confidence VCF.

Collaborator Author

Ah, good point. Perhaps started off with high conf, but then bounced around in order to do some extra analyses. High conf file does have the medium in it:

PASS;HighConf
PASS;MedConf

but yeah, the low ones are not. I can update the link
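One way to check which confidence tiers a given VCF actually contains is to tally its FILTER column (the seventh tab-separated field). The sketch below is illustrative only: `tally_filter_values` is a hypothetical helper, and the records are made up rather than taken from the SEQC2 release.

```python
from collections import Counter


def tally_filter_values(vcf_lines):
    """Count each distinct FILTER value (7th VCF column) among data records."""
    counts = Counter()
    for line in vcf_lines:
        line = line.rstrip("\n")
        # Skip blank lines and header lines (## meta-information and #CHROM).
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 7:
            continue  # malformed record; a real script might raise instead
        counts[fields[6]] += 1
    return counts


# Tiny made-up example; positions and alleles are illustrative only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tT\t.\tPASS;HighConf\t.",
    "chr1\t200\t.\tG\tC\t.\tPASS;MedConf\t.",
    "chr2\t300\t.\tT\tG\t.\tPASS;HighConf\t.",
]
print(tally_filter_values(example))
```

For a gzipped VCF, the same function can be fed a file object opened with `gzip.open(path, "rt")`, since it only iterates over text lines.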

migbro and others added 2 commits June 28, 2024 10:52
Co-authored-by: Dan Miller <dmiller15@users.noreply.github.com>
migbro and others added 2 commits June 28, 2024 11:10
Co-authored-by: Dan Miller <dmiller15@users.noreply.github.com>
@migbro migbro requested a review from dmiller15 June 28, 2024 15:11
@migbro migbro merged commit 1203884 into master Jun 28, 2024
1 check passed