fixed contamination calculation, added error bars to output #3385

Merged: 2 commits, Jul 31, 2017

davidbenjamin
Contributor

@takutosato This corrects the math error you pointed out in your code review of the docs. While we're at it, it also outputs the error bars according to the formula given in the docs. I have tested it on an in silico contamination series and it improves results slightly.

davidbenjamin added this to the "Popularize Mutect 2 at the Broad" milestone on Jul 31, 2017
codecov-io commented Jul 31, 2017

Codecov Report

Merging #3385 into master will decrease coverage by 0.007%.
The diff coverage is 95.455%.

@@               Coverage Diff               @@
##              master     #3385       +/-   ##
===============================================
- Coverage     80.475%   80.468%   -0.007%     
- Complexity     17507     17508        +1     
===============================================
  Files           1173      1173               
  Lines          63376     63384        +8     
  Branches        9878      9878               
===============================================
+ Hits           51002     51004        +2     
- Misses          8426      8432        +6     
  Partials        3948      3948

| Impacted Files | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| ...ols/walkers/contamination/ContaminationRecord.java | 91.111% <100%> (+1.111%) | 6 <3> (+1) | ⬆️ |
| .../walkers/contamination/CalculateContamination.java | 91.379% <90.909%> (+0.47%) | 20 <1> (+1) | ⬆️ |
| ...e/hellbender/engine/spark/SparkContextFactory.java | 66.667% <0%> (-3.333%) | 10% <0%> (ø) | |
| ...oadinstitute/hellbender/utils/gcs/BucketUtils.java | 72.368% <0%> (-1.974%) | 36% <0%> (ø) | |
| ...der/tools/walkers/contamination/PileupSummary.java | 91.176% <0%> (-1.471%) | 13% <0%> (-1%) | |

Contributor

takutosato left a comment

A couple comments, no need to pass it back to me.

final double proportionRefReads = (double) contaminationRefCount / totalReadCount;
final double proportionOfContaminatingReadsThatAreRef = homAltSites.stream().mapToDouble(PileupSummary::getRefFrequency).average().getAsDouble();
final double contamination = proportionRefReads / proportionOfContaminatingReadsThatAreRef;
final double totalDepthWeightedByRefFraction = homAltSites.stream()

Suggest totalDepthWeightedByRefFrequency instead of totalDepthWeightedByRefFraction.


Eh, fraction is probably just as good. You use fraction in the docs. Your call.

"Therefore, we estimate a contamination of %.3f", proportionRefReads, proportionOfContaminatingReadsThatAreRef, contamination));

return contamination;
logger.info(String.format("Based on population data, we would expect %d reference reads in a contaminant with the same depths at these sites.", (long) totalDepthWeightedByRefFraction));

"as these sites"? Alternatively, maybe we can say that totalDepthWeightedByRefFraction is the expected number of ref reads if we had 100% contamination.
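
To make the "expected number of ref reads at 100% contamination" reading concrete, here is a minimal Java sketch, not the code from this PR. The `Site` class, the method names, and the sample numbers are hypothetical. The depth-weighted estimate assumes that the final formula divides the observed ref count by `totalDepthWeightedByRefFraction`, which the diff excerpt above does not show in full. The error-bar formula from the docs is not reproduced here.

```java
import java.util.List;

public class ContaminationSketch {

    /** Hypothetical stand-in for a pileup summary at one hom-alt site. */
    public static final class Site {
        final int totalDepth;       // all reads covering the site
        final int refCount;         // reads showing the reference allele
        final double refFrequency;  // population frequency of the reference allele

        public Site(int totalDepth, int refCount, double refFrequency) {
            this.totalDepth = totalDepth;
            this.refCount = refCount;
            this.refFrequency = refFrequency;
        }
    }

    /** Estimate as in the removed lines: overall ref proportion divided by the unweighted mean ref frequency. */
    public static double unweightedEstimate(List<Site> sites) {
        long refReads = 0;
        long totalReads = 0;
        double sumRefFrequency = 0.0;
        for (Site s : sites) {
            refReads += s.refCount;
            totalReads += s.totalDepth;
            sumRefFrequency += s.refFrequency;
        }
        double proportionRefReads = (double) refReads / totalReads;
        double meanRefFrequency = sumRefFrequency / sites.size();
        return proportionRefReads / meanRefFrequency;
    }

    /** Ref reads we would expect if every read at these sites came from a contaminant. */
    public static double expectedRefReadsAtFullContamination(List<Site> sites) {
        return sites.stream().mapToDouble(s -> s.totalDepth * s.refFrequency).sum();
    }

    /** Assumed form of the corrected estimate: observed ref reads over the expectation at 100% contamination. */
    public static double depthWeightedEstimate(List<Site> sites) {
        long refReads = sites.stream().mapToLong(s -> s.refCount).sum();
        return refReads / expectedRefReadsAtFullContamination(sites);
    }

    public static void main(String[] args) {
        // Made-up sites with unequal depth, so the two estimates differ.
        List<Site> sites = List.of(new Site(100, 2, 0.5), new Site(300, 3, 0.9));
        System.out.printf("unweighted estimate:     %.4f%n", unweightedEstimate(sites));
        System.out.printf("depth-weighted estimate: %.4f%n", depthWeightedEstimate(sites));
    }
}
```

When every site has the same depth the two estimates coincide. They diverge only when deep sites and shallow sites have different ref frequencies, which is the case the unweighted average handles incorrectly.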

3 participants