-
Notifications
You must be signed in to change notification settings - Fork 593
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fixed contamination calculation, added error bars to output #3385
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3385 +/- ##
===============================================
- Coverage 80.475% 80.468% -0.007%
- Complexity 17507 17508 +1
===============================================
Files 1173 1173
Lines 63376 63384 +8
Branches 9878 9878
===============================================
+ Hits 51002 51004 +2
- Misses 8426 8432 +6
Partials 3948 3948
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple comments, no need to pass it back to me.
final double proportionRefReads = (double) contaminationRefCount / totalReadCount; | ||
final double proportionOfContaminatingReadsThatAreRef = homAltSites.stream().mapToDouble(PileupSummary::getRefFrequency).average().getAsDouble(); | ||
final double contamination = proportionRefReads / proportionOfContaminatingReadsThatAreRef; | ||
final double totalDepthWeightedByRefFraction = homAltSites.stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
totalDepthWeightedByRefFrequency
instead of RefFraction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eh fraction is probably just as good. You use fraction in the docs. Your call
"Therefore, we estimate a contamination of %.3f", proportionRefReads, proportionOfContaminatingReadsThatAreRef, contamination)); | ||
|
||
return contamination; | ||
logger.info(String.format("Based on population data, we would expect %d reference reads in a contaminant with the same depths at these sites.", (long) totalDepthWeightedByRefFraction)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as* these sites? Alternatively, maybe we can say that totalDepthWeightedByRefFraction
is the expected number of ref reads if we had 100% contamination
@takutosato This corrects the math error you pointed out in your code review of the docs. While we're at it, it also outputs the error bars according to the formula given in the docs. I have tested it on an in silico contamination series and it improves results slightly.