Added a new suite of tools for variant filtering based on site-level annotations. #7954

samuelklee · 2022-07-21T02:35:33Z

This adds the following tools, which supplant the VQSR workflow: ExtractVariantAnnotations, TrainVariantAnnotationsModel, and ScoreVariantAnnotations. See meta issue #7724.

codecov · 2022-07-21T03:59:16Z

Codecov Report

Merging #7954 (32ce261) into master (f1e7265) will decrease coverage by 0.030%.
The diff coverage is 83.582%.

❗ Current head 32ce261 differs from pull request most recent head 9de8b1c. Consider uploading reports for the commit 9de8b1c to get more accurate results

@@               Coverage Diff               @@
##              master     #7954       +/-   ##
===============================================
- Coverage     86.689%   86.659%   -0.030%     
- Complexity     38394     38771      +377     
===============================================
  Files           2308      2328       +20     
  Lines         180119    181659     +1540     
  Branches       19823     19946      +123     
===============================================
+ Hits          156143    157424     +1281     
- Misses         17036     17239      +203     
- Partials        6940      6996       +56

Impacted Files	Coverage Δ
...hellbender/tools/copynumber/CollectReadCounts.java	`85.484% <ø> (ø)`
...ools/copynumber/CreateReadCountPanelOfNormals.java	`89.831% <ø> (ø)`
...e/hellbender/tools/copynumber/utils/HDF5Utils.java	`79.787% <ø> (ø)`
...scalable/modeling/BGMMVariantAnnotationsModel.java	`0.000% <0.000%> (ø)`
...calable/modeling/BGMMVariantAnnotationsScorer.java	`0.000% <0.000%> (ø)`
...oadinstitute/hellbender/utils/NaturalLogUtils.java	`77.143% <0.000%> (ø)`
...ls/clustering/BayesianGaussianMixtureModeller.java	`0.000% <0.000%> (ø)`
.../tools/walkers/vqsr/scalable/data/VariantType.java	`60.000% <60.000%> (ø)`
.../walkers/vqsr/scalable/SystemCommandUtilsTest.java	`60.870% <60.870%> (ø)`
.../scalable/data/LabeledVariantAnnotationsDatum.java	`72.222% <72.222%> (ø)`
... and 20 more
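
The headline percentages in the report above can be reproduced from the Hits and Lines rows of the table. A minimal sketch, assuming coverage is simply hits divided by tracked lines; the function name `coverage_percent` is illustrative only and not part of Codecov:

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Line coverage as a percentage: covered lines divided by tracked lines."""
    return 100.0 * hits / lines


# Figures copied from the Hits and Lines rows of the coverage table.
master = coverage_percent(156143, 180119)
head = coverage_percent(157424, 181659)
change = head - master

print(f"master: {master:.3f}%")
print(f"head:   {head:.3f}%")
print(f"change: {change:+.3f}%")
```

Note that the separately reported diff coverage (83.582%) is computed over the changed lines only, so it cannot be derived from these totals.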

src/main/java/org/broadinstitute/hellbender/utils/MathUtils.java

samuelklee · 2022-07-28T13:59:27Z

I still need to finish up the tool-level Javadocs for the TrainVariantAnnotationsModel tool. But since I'll be off on vacation until the end of the week, I wanted to go ahead and open this up for review.

There's a lot here, but not too much of it is production code (<2k LOC). I've split things up into commits that should hopefully make it easier to review. The first commit contains the WDL added in #7932 and has already been reviewed by me, although it may benefit from a second pass. The second commit updates that WDL to account for some changes I added after review.

There are TODOs scattered throughout the code, but some of them are intentionally left as an exercise for future developers. See the meta issue linked above to get an idea of what might be appropriate to leave to future work. Also note that tools are marked BETA, so there’s certainly room for improvement or changes!

There are also stubs throughout for the BGMM implementation, which will be added in a separate PR. Hopefully we can get some ML club reviewers then.

@meganshand @droazen @davidbenjamin mind taking a look or suggesting other reviewers? I would hope that we can get this in by the next release after the other flow-based methods are released, since the IsolationForest filtering method added here is also used in that pipeline. It would also be nice to get this merged by the next release to keep us on track on the malaria side.
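
For readers unfamiliar with the isolation-forest idea mentioned above, here is a rough pure-Python sketch of the general technique. It is not the implementation added in this PR, whose details are not shown in this thread; every function name and the toy two-annotation data are hypothetical. The idea is that points which random axis-aligned splits isolate quickly (short average path) are treated as anomalous.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n: int) -> float:
    """Expected path length of an unsuccessful binary-search-tree lookup among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively partition rows with random axis-aligned splits."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    dim = rng.randrange(len(rows[0]))
    lo = min(r[dim] for r in rows)
    hi = max(r[dim] for r in rows)
    if lo == hi:  # cannot split on a constant column
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[dim] < split]
    right = [r for r in rows if r[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Depth at which x lands in a leaf, plus a correction for unsplit leaf points."""
    if "size" in node:
        return depth + average_path_length(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def fit_forest(rows, n_trees=100, sample_size=64, seed=0):
    """Build n_trees trees, each on a random subsample of the training rows."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    return trees, sample_size


def anomaly_score(x, forest):
    """Score in (0, 1); in this sketch, higher means more anomalous."""
    trees, sample_size = forest
    mean_depth = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / average_path_length(sample_size))


# Toy data: two made-up annotation values per training site.
data_rng = random.Random(1)
training = [(data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)) for _ in range(200)]
forest = fit_forest(training)

typical = anomaly_score((0.0, 0.0), forest)
outlier = anomaly_score((8.0, 8.0), forest)
print(f"typical site: {typical:.3f}, outlying site: {outlier:.3f}")
```

In this toy setup the site far from the training cluster should receive the higher anomaly score. How a real tool maps such scores onto a filtering decision is a separate design choice that this sketch does not cover.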

build.gradle

samuelklee · 2022-07-28T14:23:31Z

Also, if the WDL-generation and tab-completion tests continue to fail, I’ll address it after I get back. I think this has something to do with non-ASCII characters in the Javadocs, but I thought I had gotten all of them…

meganshand

This looks great @samuelklee! I thought the layout of classes was very clear and the testing was thorough. I had a few questions and comments. Also, I noticed a few places where there were temp files to read and write data within the tool and I was surprised that was necessary.

scripts/vcf_site_level_filtering_wdl/JointVcfFiltering.wdl

...ava/org/broadinstitute/hellbender/tools/walkers/vqsr/scalable/ExtractVariantAnnotations.java

...institute/hellbender/tools/walkers/vqsr/scalable/ScoreVariantAnnotationsIntegrationTest.java

...tute/hellbender/tools/walkers/vqsr/scalable/TrainVariantAnnotationsModelIntegrationTest.java

...stitute/hellbender/tools/walkers/vqsr/scalable/ExtractVariantAnnotationsIntegrationTest.java

davidbenjamin

This looks really clean. The documentation is excellent.

src/main/java/org/broadinstitute/hellbender/tools/walkers/vqsr/scalable/data/VariantType.java

...g/broadinstitute/hellbender/tools/walkers/vqsr/scalable/LabeledVariantAnnotationsWalker.java

.../java/org/broadinstitute/hellbender/tools/walkers/vqsr/scalable/ScoreVariantAnnotations.java

.../org/broadinstitute/hellbender/tools/walkers/vqsr/scalable/TrainVariantAnnotationsModel.java

samuelklee · 2022-08-08T16:21:09Z

OK, thanks for the thorough reviews, @meganshand and @davidbenjamin! I think I've addressed everything or left them as TODOs; I'll break these out into issues (or at least add them to the meta issue) later today. I also added the docs for the training tool.

Apologies for the slight delay, I had to get my eyes dilated on Friday morning and was completely useless for the rest of the day!

meganshand

👍 Thanks @samuelklee! The added documentation for the training tool looks great.

Added a new suite of tools for variant filtering based on site-level annotations. (#7954)

* Adds wdl that tests joint VCF filtering tools (#7932)
* adding filtering wdl
* renaming pipeline
* addressing comments
* added bash
* renaming json
* adding glob to extract for extra files
* changing dollar signs
* small comments
* Added changes for specifying model backend and other tweaks to WDLs and environment.
* Added classes for representing a collection of labeled variant annotations.
* Added interfaces for modeling and scoring backends.
* Added a new suite of tools for variant filtering based on site-level annotations.
* Added integration tests.
* Added test resources and expected results.
* Miscellaneous changes.
* Removed non-ASCII characters.
* Added documentation for TrainVariantAnnotationsModel and addressed review comments.

Co-authored-by: meganshand <mshand@broadinstitute.org>

* Added a new suite of tools for variant filtering based on site-level annotations. (#7954)
* Adds wdl that tests joint VCF filtering tools (#7932)
* adding filtering wdl
* renaming pipeline
* addressing comments
* added bash
* renaming json
* adding glob to extract for extra files
* changing dollar signs
* small comments
* Added changes for specifying model backend and other tweaks to WDLs and environment.
* Added classes for representing a collection of labeled variant annotations.
* Added interfaces for modeling and scoring backends.
* Added a new suite of tools for variant filtering based on site-level annotations.
* Added integration tests.
* Added test resources and expected results.
* Miscellaneous changes.
* Removed non-ASCII characters.
* Added documentation for TrainVariantAnnotationsModel and addressed review comments.

Co-authored-by: meganshand <mshand@broadinstitute.org>

* Added toggle for selecting resource-matching strategies and miscellaneous minor fixes to new annotation-based filtering tools. (#8049)
* Adding use_allele_specific_annotation arg and fixing task with empty input in JointVcfFiltering WDL (#8027)
* Small changes to JointVCFFiltering WDL
* making default for use_allele_specific_annotations
* addressing comments
* first stab
* wire through WDL changes
* fixed typo
* set model_backend input value
* add gatk_override to JointVcfFiltering call
* typo in indel_annotations
* make model_backend optional
* tabs and spaces
* make all model_backends optional
* use gatk 4.3.0
* no point in changing the table names as this is a POC
* adding new branch to dockstore
* adding in branching logic for classic VQSR vs VQSR-Lite
* implementing the separate schemas for the VQSR vs VQSR-Lite branches, including Java changes necessary to produce the different tsv files
* passing classic flag to indel run of CreateFilteringFiles
* Update GvsCreateFilterSet.wdl cleaning up verbiage
* Removed mapping error rate from estimate of denoised copy ratios output by gCNV and updated sklearn. (#7261)
* cleanup up sloppy comment

---------

Co-authored-by: samuelklee <samuelklee@users.noreply.github.com>
Co-authored-by: meganshand <mshand@broadinstitute.org>
Co-authored-by: Rebecca Asch <rasch@broadinstitute.org>

samuelklee mentioned this pull request Jul 25, 2022

(DO NOT MERGE) Draft PR for reimplementation of annotation-based filtering tools. #7659

Draft

broadinstitute deleted a comment from gatk-bot Jul 28, 2022

Adds wdl that tests joint VCF filtering tools (#7932)

5b65be0

* adding filtering wdl
* renaming pipeline
* addressing comments
* added bash
* renaming json
* adding glob to extract for extra files
* changing dollar signs
* small comments

samuelklee force-pushed the sl_sklearnvarianttrain_scalable branch 2 times, most recently from 1cbd55b to 3e00758 Compare July 28, 2022 13:44

samuelklee added 3 commits July 28, 2022 09:49

Added changes for specifying model backend and other tweaks to WDLs and environment.

85f0213

Added classes for representing a collection of labeled variant annotations.

da2fa17

Added interfaces for modeling and scoring backends.

7bd4f27

samuelklee force-pushed the sl_sklearnvarianttrain_scalable branch from 3e00758 to 1e0da0e Compare July 28, 2022 13:49

samuelklee commented Jul 28, 2022

src/main/java/org/broadinstitute/hellbender/utils/MathUtils.java

samuelklee commented Jul 28, 2022

build.gradle

samuelklee added 4 commits July 28, 2022 10:04

Added a new suite of tools for variant filtering based on site-level annotations.

0d53285

Added integration tests.

8921abc

Added test resources and expected results.

335c743

Miscellaneous changes.

122cb18

samuelklee force-pushed the sl_sklearnvarianttrain_scalable branch from 1e0da0e to 122cb18 Compare July 28, 2022 14:04

meganshand self-requested a review July 28, 2022 14:26

Removed non-ASCII characters.

c507f96

broadinstitute deleted a comment from gatk-bot Jul 28, 2022

meganshand reviewed Aug 1, 2022

davidbenjamin approved these changes Aug 1, 2022

broadinstitute deleted a comment from gatk-bot Aug 3, 2022

broadinstitute deleted a comment from gatk-bot Aug 5, 2022

samuelklee force-pushed the sl_sklearnvarianttrain_scalable branch from 32ce261 to b3d6fec Compare August 8, 2022 16:19

samuelklee force-pushed the sl_sklearnvarianttrain_scalable branch from b3d6fec to d853337 Compare August 8, 2022 16:23

Added documentation for TrainVariantAnnotationsModel and addressed review comments.

9de8b1c

samuelklee force-pushed the sl_sklearnvarianttrain_scalable branch from d853337 to 9de8b1c Compare August 9, 2022 15:31

meganshand approved these changes Aug 9, 2022

samuelklee mentioned this pull request Aug 9, 2022

New tools for annotation-based filtering. #7724

Open

samuelklee merged commit 05a7634 into master Aug 9, 2022

samuelklee deleted the sl_sklearnvarianttrain_scalable branch August 9, 2022 17:11

samuelklee mentioned this pull request Feb 2, 2023

Performed a round of ablation on new annotation-based filtering tools. #8131

Merged
