INDEL realigner binary search conditional is flipped #1402
Comments
The conditional says: if the read starts before the start of the indel realignment target (which is the head of the tail after splitting the targets in half), then return the head; else return the tail. Seems right to me, unless the conditional in |
Ah see, the trouble is, what you wrote is |
And yeah, I think the method scaladoc is wrong. |
Lo and behold, the |
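To make the conditional under discussion concrete, here is a minimal sketch of a binary search over sorted targets. It is not the ADAM code: it is written in Python rather than Scala, and the `Target` type, the `find_target` name, and the "target contains the read's start position" matching rule are all assumptions made for illustration. It also assumes the targets are sorted by start and do not overlap.

```python
from typing import NamedTuple, Optional, Sequence


class Target(NamedTuple):
    """Hypothetical stand-in for a realignment target: [start, end)."""
    start: int
    end: int


def find_target(read_start: int, targets: Sequence[Target]) -> Optional[Target]:
    """Return the target whose interval contains read_start, or None.

    Assumes `targets` is sorted by start and non-overlapping.
    """
    lo, hi = 0, len(targets)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # targets[mid] is the head of the tail half.
        if read_start < targets[mid].start:
            hi = mid  # read starts before the tail begins: keep the head half
        else:
            lo = mid  # otherwise: keep the tail half
    if lo < hi:
        candidate = targets[lo]
        if candidate.start <= read_start < candidate.end:
            return candidate
    return None


targets = [Target(10, 20), Target(30, 40), Target(50, 60)]
hit = find_target(35, targets)
miss = find_target(25, targets)
```

If the comparison in the loop is flipped, the search keeps the half that cannot contain the read, so whether the direction is right depends entirely on how the "less than" comparison between target and read is defined.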
Resolves bigdatagenomics#1402. Includes fixes to consensus generator and reference scorer.
Improve INDEL realigner performance:
* Exit early when realigning will not yield a better score.
* Eliminate substring call in sweep over reference.
* Change datastructures to be immutable wherever possible.
* Add bound checking and other error checking.
* Rewrite target association code to use array instead of set, and improve load balancing.
* Delete high coverage targets with reduceByKey.
Additionally:
* Improve telemetry/logging to sort out load balance issue.
* Support using reference file in INDEL realignment.
* Log reads with negative alignment sizes.
* Improved test coverage for insertion realignment.
* Fix CIGARs on reads that partially overlap INDEL.
* Soft clip reads that partially align to an insertion.
* Eliminate non-determinism.
Resolves bigdatagenomics#1402. Includes fixes to consensus generator and reference scorer.
Improve INDEL realigner performance:
* Exit early when realigning will not yield a better score.
* Eliminate substring call in sweep over reference.
* Change datastructures to be immutable wherever possible.
* Add bound checking and other error checking.
* Rewrite target association code to use array instead of set, and improve load balancing.
* Delete high coverage targets with reduceByKey.
Additionally:
* Improve telemetry/logging to sort out load balance issue.
* Support using reference file in INDEL realignment.
* Log reads with negative alignment sizes.
* Improved test coverage for insertion realignment.
* Fix CIGARs on reads that partially overlap INDEL.
* Soft clip reads that partially align to an insertion.
* Eliminate non-determinism.
* Fixed reference file.
* Serialization fixes and debug.
* Fix bad score.
* Clean up clipping code?
* Unclip clipped reads.
This is a bad one. See https://github.com/bigdatagenomics/adam/blame/master/adam-core/src/main/scala/org/bdgenomics/adam/rdd/read/realignment/RealignIndels.scala#L88-L93. However, what's weird with this is that it should indicate that the realigner is borked, and that isn't consistent with validation data, nor our instrumentation telemetry. I've started to pull in some additional debug infrastructure to capture what's going on at each target in order to suss this out.