-
Notifications
You must be signed in to change notification settings - Fork 593
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added numerical-stability tests and updated test data for all ModelSegments single-sample and multiple-sample modes. #7652
Conversation
f6f44a4
to
532533a
Compare
Hmm, tests are failing on Travis. To be precise, the allelic-count-only multisample segmentation test is failing---but only in Java 11 (it passes in Java 8). "Write once, run anywhere" indeed! This is one reason why I’m not a fan of exact-match numerical tests… (Another is that they often impede future development by defining behavior too rigidly or adding maintenance costs.) Let me look into it and get back to you. |
OK, relaxed the exact match to a delta of 1E-6 (chosen because doubles are formatted in somatic CNV outputs as Another interesting note: I tried to clean up the offending use of log10factorial in AlleleFractionLikelihoods, but this introduced numerical differences at the ~1E-3 level. I think all of the round tripping between log and log10 actually adds up. Some digging revealed that this was introduced way back in gatk-protected in broadinstitute/gatk-protected@aeec297#diff-34bd76cb2a416a212e25cbfb11298207265fb9cced775918aefcdb6b91ebc247. Despite the fact that we could easily replace the use of log10factorial with a private logGamma cache, at this point I think it makes more sense to freeze the current behavior. But if similar numerical changes are introduced to ModelSegments in the future, then it might make sense to clean this up at that point as well. Anyway, changed the title of the PR to reflect this update. Should be ready to go! |
aef0006
to
ade55a9
Compare
820daf1
to
fc02f7a
Compare
@tmelman I almost forgot about this PR. Any chance we could get it in before it gets too stale? Thanks! |
Reminder to myself: I’ll add a flag to overwrite the expected results, what with the possible numerical changes in #6351 linked above. Should be quick. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, as long as the tests pass consistently
…o zero when running ModelSegments in matched-normal mode.
…ple-sample modes, including new tests of downstream single-sample scatters using joint segmentation.
…unt for Java 8 and Java 11 differences on Travis.
…lleleFractionLikelihoods.
fc02f7a
to
6590c26
Compare
@asmirnov239 I've borrowed the CopyNumberTestUtils class from #7889 into which you moved the method for detecting deltas in the doubles. I'm going to merge this PR once tests pass, so just be aware of this when rebasing your branch if you make any further changes. We might consider adding a simple test of the test method itself. I'll let you do it in your branch, or we can file an issue and tackle it after everything is merged. |
Codecov Report
@@ Coverage Diff @@
## master #7652 +/- ##
===============================================
+ Coverage 86.936% 86.998% +0.062%
- Complexity 36945 36970 +25
===============================================
Files 2221 2221
Lines 173803 173911 +108
Branches 18775 18790 +15
===============================================
+ Hits 151097 151299 +202
+ Misses 16075 15977 -98
- Partials 6631 6635 +4
|
In light of the discovery of the (relatively minor) numerical differences caused by changes to non-CNV code outlined in #7649, and because we are still awaiting coverage from pipeline-level/CARROT testing, I decided to go ahead and add these exact-match tests. This essentially freezes current ModelSegments behavior, which has been exactly stable since #5814; that is, from sometime between 4.1.0.0/4.1.1.0 almost 3 years ago up to 4.2.4.1 today.
Note that the original test files were generated from the test BAMs (e.g., src/test/resources/large/cnv_somatic_workflows_test_files/HCC1143-t1-chr20-downsampled.deduplicated.bam), since these BAMs have been used in the past to consistently generate test files for other tools in the ModelSegments and GermlineCNVCaller pipelines. However, these original test files contained insufficient data to activate the changes found in #7649, even had exact-match tests been present.
I thus took some old HCC1143T 100% WES data that I had and snippeted it to chr20. I've confirmed that the added tests with these files would've picked up the regression of log10factorial seen in #7649 for all relevant modes (i.e., all those that take in the allele counts as input, since that regression only affected allele-fraction MCMC sampling).
Tests take maybe an additional minute to run and there was about ~12MB of additional large resources checked in, but I didn't try too hard to bring either down.
I also added some early-fail parameter validation to check that the minimum total allele count in the case sample is zero in matched-normal mode. There are actually some open questions in my mind as to what the best behavior should be here, but given some of the discussion in #6499 and possible plans for using joint segmentation to do filtering of germline events, I think it's best to enforce that all het sites coming out of the genotyping step are the same across all samples.
Recall that we added this parameter in #5556 because some users were running matched normals with much lower depth than their cases. This meant that many normal sites fell below the default threshold of 30 counts and were thus not pulled from the case, even though the latter had much higher depth.
It's conceivable that there will be some use cases for which we might want to relax this and allow a non-zero case threshold; for example, if the case is low depth and there's significant noise in the allele fractions, which would affect the segmentation. But for now, I just added a suggestion that the allele counts be dropped entirely, in that case. Hopefully, for typical usage, case samples will always be much higher depth.
@tmelman hope you don't mind reviewing, as we discussed. There are a lot of new lines here, but mostly from test data (there are only ~150 non-test LOC added) and I've split up the work into separate commits, which will hopefully be easier to review. Happy to answer any questions!