Checkpoint of necessary changes for DRAGEN v3.7.8 concordance #8083

Merged
jamesemery merged 9 commits into master from je_DRAGEN3.7.8ConcordanceReleaseCandidate
Mar 21, 2023

jamesemery
Collaborator

This code is now fairly concordant, and a new meta-argument has been added for enabling the DRAGEN code. However, there are still two algorithms that we know are necessary for final concordance.

@gatk-bot

GitHub Actions tests reported job failures from actions build 3365045711.
Failures in the following jobs:

| Test Type | JDK | Job ID | Logs |
| --- | --- | --- | --- |
| variantcalling | 8 | 3365045711.2 | logs |

Comment on lines 150 to 160
@Argument(fullName= PILEUP_DETECTION_FILTER_ASSEMBLY_HAPS_THRESHOLD, doc = "If enabled (set to non-zero), will apply the \"badness\" filter to compatable assembled haplotypes.", optional = true)
public double assemblyBadReadThreshold = 0.4;
Collaborator Author

Note to self: this default is dangerous. There should almost certainly be an explicit opt-in before we start filtering assembly haplotypes in the normal pileupcaller mode under any circumstances!
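
One way such an opt-in could look, as a minimal sketch only: `PileupFilterSettings`, `enableAssemblyHaplotypeFiltering`, `effectiveThreshold`, and `isFilteringActive` are hypothetical names that do not appear in this PR, and the sketch deliberately avoids GATK's argument annotations so that it stays self-contained. The only idea taken from the comment above is that the threshold should do nothing unless the user has asked for it.

```java
/**
 * Hypothetical sketch, not GATK code: the badness threshold only takes effect
 * once the caller has explicitly opted in, so a default run filters nothing.
 */
final class PileupFilterSettings {
    // Assumed opt-in flag (hypothetical name); intended to be false unless the user sets it.
    private final boolean enableAssemblyHaplotypeFiltering;
    // Threshold value, consulted only when the flag above is true.
    private final double assemblyBadReadThreshold;

    PileupFilterSettings(final boolean enableAssemblyHaplotypeFiltering,
                         final double assemblyBadReadThreshold) {
        this.enableAssemblyHaplotypeFiltering = enableAssemblyHaplotypeFiltering;
        this.assemblyBadReadThreshold = assemblyBadReadThreshold;
    }

    /** Returns 0.0 (the "disabled" value) unless the user opted in. */
    double effectiveThreshold() {
        return enableAssemblyHaplotypeFiltering ? assemblyBadReadThreshold : 0.0;
    }

    /** True only when the user opted in and supplied a non-zero threshold. */
    boolean isFilteringActive() {
        return effectiveThreshold() != 0.0;
    }
}
```

With this shape, a non-zero default such as 0.4 is harmless on its own: `new PileupFilterSettings(false, 0.4).isFilteringActive()` is false until the flag is set.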

final AssemblyResultSet assemblyResultSet, final List<VariantContext> pileupAllelesFoundShouldFilter,
final List<VariantContext> pileupAllelesPassingFilters, final boolean debug) {
List<Haplotype> haplotypesWithFilterAlleles = new ArrayList<>();
if (!pileupAllelesFoundShouldFilter.isEmpty() && !argumentCollection.pileupDetectionArgs.generatePDHaplotypes) {
Collaborator Author

This logic is messy and dangerous. There should be an explicit opt-in, and probably a user warning printed if we are actually throwing away haplotypes here. The non-PDHMM version of this code should really be considered an off-label approach; it wasn't extensively tested except as a stepping stone to the full PDHMM implementation, so maybe we should re-evaluate whether we want it at all.
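
To make the concern concrete, here is a rough sketch of haplotype-level filtering with an opt-in and a warning. Everything in it is an assumption made for illustration: haplotypes are reduced to names mapped to the allele keys they carry, `HaplotypeFilterSketch` and `retainedHaplotypes` are hypothetical, and `java.util.logging` stands in for whatever logger the real code uses. It is not the logic in this PR.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.logging.Logger;

/** Hypothetical sketch of opt-in haplotype filtering that warns when it discards anything. */
final class HaplotypeFilterSketch {
    private static final Logger LOG = Logger.getLogger("HaplotypeFilterSketch");

    /**
     * @param allelesByHaplotype haplotype name mapped to the allele keys it carries
     * @param allelesToFilter    allele keys that failed the pileup filter
     * @param optIn              nothing is discarded unless this is true
     * @return names of the haplotypes that survive, in the map's iteration order
     */
    static List<String> retainedHaplotypes(final Map<String, Set<String>> allelesByHaplotype,
                                           final Set<String> allelesToFilter,
                                           final boolean optIn) {
        final List<String> kept = new ArrayList<>();
        int discarded = 0;
        for (final Map.Entry<String, Set<String>> entry : allelesByHaplotype.entrySet()) {
            // A haplotype is dropped if it carries any filtered allele, even when it
            // also carries other alleles that passed.
            final boolean carriesFiltered = !Collections.disjoint(entry.getValue(), allelesToFilter);
            if (optIn && carriesFiltered) {
                discarded++;
            } else {
                kept.add(entry.getKey());
            }
        }
        if (discarded > 0) {
            LOG.warning("Discarded " + discarded + " assembled haplotype(s) carrying filtered pileup alleles");
        }
        return kept;
    }
}
```

Note what happens to a haplotype that carries both a filtered allele and a passing one: it is dropped whole, taking its support for the passing allele with it. That is the sense in which this approach can damage phased variants.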

Contributor

@droazen droazen left a comment

@jamesemery Back to you with my comments

// .filter(v -> PileupBasedAlleles.passesFilters(argumentCollection.pileupDetectionArgs, v))
// .collect(Collectors.toList());
//
// applyPileupEventsAsForcedAlleles(region, argumentCollection, aligner, refHaplotype, assemblyResultSet, pileupAllelesFoundShouldFilter, pileupAllelesPassingFilters);
Contributor

You removed the forcedPileupAlleles arg, but the call into what I assume is your replacement for this functionality (applyPileupEventsAsForcedAlleles()) is commented out. Should it be uncommented? Or is this replaced by the pdhmm?

Collaborator Author

Ahh this looks worse than it is....

The old GGA mode and the PileupCallerAllele modes were both located here (after assembly), but at some point @davidbenjamin moved the GGA allele application to after assembly trimming because it was causing edge-case problems. At the time I elected to leave the pileup-calling application where it was, but it was causing me too many headaches here, so I moved it to the same place as the GGA code. The new home for this call is here: HaplotypeCaller.java 739. It is, however, not in Mutect...

* @param pileupAllelesPassingFilters
* @return
*/
public static AssemblyResultSet applyPileupEventsAsForcedAlleles(final AssemblyRegion region, final AssemblyBasedCallerArgumentCollection argumentCollection,
Contributor

Does this method have a unit test? Or is there any code path covered by integration tests that exercises it?

If it's unused in the pdhmm codepath, can we just remove it? Or do we actually need it for some reason?

Collaborator Author

It's covered by some of the HaplotypeCallerIntegrationTest consistent-with-past-results tests. It's the old method for pileup calling that Bhanu added for the bacterial pipeline. I'm leaving it in for legacy reasons, as it's a reasonable alternative to the PDHMM-dependent approach and is more usable by users other than DRAGEN-GATK.

// we resort to a messy approach where we filter alleles by throwing away every haplotype supporting an allele. This is
// very dangerous since this could easily destroy phased variants with the haplotype.
if (!pileupAllelesFoundShouldFilter.isEmpty() && !argumentCollection.pileupDetectionArgs.generatePDHaplotypes) {
// TODO this is a bad algorithm for bad people
Contributor

But is it faster than the Java version of the pdhmm? Does it do effectively the same thing?

Collaborator Author

It's slow, though still faster than the PDHMM by a long shot, because the Java PDHMM is currently very slow.

// for (VariantContext v : variants) {
// VariantContext actualVC = resultEMap.get(v.getStart());
// Assert.assertNotNull(actualVC);
// Assert.assertEquals(actualVC.getAlleles(), v.getAlleles());
Contributor

Are these assertions commented out because they fail? Is this test still valid?

Collaborator Author

This test failed because the tests I wrote above have alternative representations, and realignment was causing the event map to mismatch on them. I've rewritten it to at least somewhat account for this. It's not perfect, but it's alright and better than not testing the event set at all.

}
//TODO TESTS TO MAKE:
// ASSSERT IT FAILS IF STARTS BEFORE OR AFTER
// ASSERT it gives a good exception if the order is incorrect
Contributor

Maybe now is the time to add these tests?
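
For what those TODO tests might check, here is a speculative sketch. The excerpt does not show the method under test, so this assumes a hypothetical validator that takes a span and a list of event start positions; `EventOrderValidationSketch`, `validateEvents`, and `rejectionMessage` are illustrative names, and the real method's signature and exception type may well differ.

```java
/** Hypothetical stand-in for the method under test; not the code from this PR. */
final class EventOrderValidationSketch {

    /** Rejects events that start outside [spanStart, spanEnd] or that are not sorted by start. */
    static void validateEvents(final int spanStart, final int spanEnd, final int[] eventStarts) {
        int previous = Integer.MIN_VALUE;
        for (final int start : eventStarts) {
            if (start < spanStart || start > spanEnd) {
                throw new IllegalArgumentException(
                        "Event at " + start + " starts outside the span [" + spanStart + ", " + spanEnd + "]");
            }
            if (start < previous) {
                throw new IllegalArgumentException(
                        "Events must be sorted by start position: " + start + " follows " + previous);
            }
            previous = start;
        }
    }

    /** Returns the rejection message, or null when the events are accepted. */
    static String rejectionMessage(final int spanStart, final int spanEnd, final int[] eventStarts) {
        try {
            validateEvents(spanStart, spanEnd, eventStarts);
            return null;
        } catch (final IllegalArgumentException e) {
            return e.getMessage();
        }
    }
}
```

The three TODO cases then become three checks: an event before the span, an event after it, and a pair given out of order, each asserting on the message text so that a "good exception" is actually verified rather than just any exception.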


//This test is here for the sake of them being related operations
//Test some real cases where we were seeing failures:
// (62,Rlen=1,[C])->(82,Rlen=1,[C])->(84,Rlen=13,[C])
Contributor

This comment is a little difficult to understand...

@@ -1978,6 +1978,120 @@ public void testPileupCallingDRAGENModeConsistentWithPastResults(final String in
}
}

@Test(dataProvider="HaplotypeCallerTestInputs")
public void testPileupCallingDRAGEN378ModeConsistentWithPastResults(final String inputFileName, final String referenceFileName) throws Exception {
Contributor

In addition to testing for consistency with past results, could we have a test that checks for concordance vs. DRAGEN output (within some tolerance)?

Collaborator Author

I'm going to punt on this for now, as it's going to take some time to regenerate those and I would like to hand this off soon.
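
For reference, a tolerance-based check of the kind suggested above could be sketched roughly as follows. This is an illustration under stated assumptions, not a proposal for the actual test: calls are reduced to string keys, the metric is a simple shared-over-union ratio chosen for brevity (not necessarily how DRAGEN concordance is measured), and `ConcordanceSketch`, `concordance`, and `assertConcordant` are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of a concordance-within-tolerance check; not GATK code. */
final class ConcordanceSketch {

    /** Shared calls divided by the union of calls; 1.0 when both sets are empty. */
    static double concordance(final Set<String> actualCalls, final Set<String> expectedCalls) {
        final Set<String> union = new HashSet<>(actualCalls);
        union.addAll(expectedCalls);
        if (union.isEmpty()) {
            return 1.0;
        }
        final Set<String> shared = new HashSet<>(actualCalls);
        shared.retainAll(expectedCalls);
        return (double) shared.size() / union.size();
    }

    /** Fails with a descriptive message when concordance drops below the tolerance. */
    static void assertConcordant(final Set<String> actualCalls, final Set<String> expectedCalls,
                                 final double minConcordance) {
        final double observed = concordance(actualCalls, expectedCalls);
        if (observed < minConcordance) {
            throw new AssertionError("Concordance " + observed + " is below the required " + minConcordance);
        }
    }
}
```

Unlike a consistent-with-past-results test, a check like this keeps passing through small benign changes in output and only fails when agreement with the reference call set drops below the chosen threshold.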

@jamesemery jamesemery force-pushed the je_DRAGEN3.7.8ConcordanceReleaseCandidate branch from a70603e to 1fec7b8 Compare March 17, 2023 21:42
@jamesemery jamesemery merged commit 9afa998 into master Mar 21, 2023
@jamesemery jamesemery deleted the je_DRAGEN3.7.8ConcordanceReleaseCandidate branch March 21, 2023 17:32