INDEL realigner binary search conditional is flipped #1402

fnothaft · 2017-02-25T00:30:28Z

This is a bad one. See https://github.com/bigdatagenomics/adam/blame/master/adam-core/src/main/scala/org/bdgenomics/adam/rdd/read/realignment/RealignIndels.scala#L88-L93. What's weird is that a flipped conditional there should mean the realigner is borked, and that isn't consistent with our validation data or our instrumentation telemetry. I've started to pull in some additional debug infrastructure to capture what's going on at each target in order to suss this out.

heuermh · 2017-02-25T20:46:25Z

The conditional says: after splitting the targets in half, if the read starts before the start of the indel realignment target at the head of the tail, then return the head, else return the tail.

Seems right to me, unless the conditional in TargetOrdering.lt doesn't match its method scaladoc (target.readRange.compareTo(_) < 0).

fnothaft · 2017-02-25T22:04:56Z

Ah see, the trouble is, what you wrote is read < target. What TargetOrdering.lt(tail.head._1, read) implements is target < read.
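To make the argument-order point concrete, here is a minimal sketch of a binary search over sorted, non-overlapping targets. It is not ADAM's code: `Region`, `lt`, `overlaps`, `find`, `intended`, and `flipped` are hypothetical names, and `lt` here compares whole intervals, which may differ from what `TargetOrdering.lt` and `ReferenceRegion.compareTo` actually do. The only thing it is meant to show is that `lt(read, pivot)` and `lt(pivot, read)` ask different questions, so swapping the arguments while keeping the same branches sends the search into the wrong half.

```scala
// Hypothetical stand-in for both reads and targets: a half-open interval [start, end).
final case class Region(start: Long, end: Long)

object FlippedConditionalSketch {
  // True when `a` lies entirely before `b`, i.e. a < b. Not symmetric in its arguments.
  def lt(a: Region, b: Region): Boolean = a.end <= b.start

  def overlaps(a: Region, b: Region): Boolean = a.start < b.end && b.start < a.end

  // Binary search over sorted, non-overlapping targets.
  // `goLeft(pivot, read)` decides whether to keep the lower half, where `pivot`
  // is the first target of the upper half (the "head of the tail").
  @annotation.tailrec
  def find(read: Region,
           targets: Vector[Region],
           goLeft: (Region, Region) => Boolean): Option[Region] =
    if (targets.size <= 1) targets.headOption.filter(overlaps(_, read))
    else {
      val (head, tail) = targets.splitAt(targets.size / 2)
      find(read, if (goLeft(tail.head, read)) head else tail, goLeft)
    }

  // Intended test: read < pivot, so any match must be in the lower half.
  val intended: (Region, Region) => Boolean = (pivot, read) => lt(read, pivot)

  // Flipped test: this asks pivot < read, yet still keeps the lower half.
  val flipped: (Region, Region) => Boolean = (pivot, read) => lt(pivot, read)

  def main(args: Array[String]): Unit = {
    val targets = Vector(Region(10, 20), Region(30, 40), Region(50, 60), Region(70, 80))
    val read = Region(52, 58)
    println(find(read, targets, intended))
    println(find(read, targets, flipped))
  }
}
```

With these four targets and a read spanning 52 to 58, the `intended` predicate finds the 50 to 60 target, while the `flipped` predicate ends up in the wrong half and returns `None`. Under this toy ordering the flipped search misses every overlapping read, which is the kind of breakage the issue expected to see; the real ordering is more forgiving in ways this sketch does not model.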

fnothaft · 2017-02-25T22:07:06Z

And yeah, I think the method scaladoc is wrong.

fnothaft · 2017-02-25T23:30:18Z

Lo and behold, the TargetOrdering.lt line is missed, as is the tree split.

Resolves #1402. Includes fixes to consensus generator and reference scorer.

Improve INDEL realigner performance:

* Exit early when realigning will not yield a better score.
* Eliminate substring call in sweep over reference.
* Change datastructures to be immutable wherever possible.
* Add bound checking and other error checking.
* Rewrite target association code to use array instead of set, and improve load balancing.
* Delete high coverage targets with reduceByKey.

Additionally:

* Improve telemetry/logging to sort out load balance issue.
* Support using reference file in INDEL realignment.
* Log reads with negative alignment sizes.
* Improved test coverage for insertion realignment.
* Fix CIGARs on reads that partially overlap INDEL.
* Soft clip reads that partially align to an insertion.
* Eliminate non-determinism.
* Fixed reference file.
* Serialization fixes and debug.
* Fix bad score.
* Clean up clipping code?
* Unclip clipped reads.

fnothaft added the bug label Feb 25, 2017

fnothaft self-assigned this Feb 25, 2017

fnothaft mentioned this issue Feb 25, 2017

Add coveralls #1403

Closed

fnothaft added a commit to fnothaft/adam that referenced this issue Feb 26, 2017

[ADAM-1402] Fix INDEL realigner bad binary search.

89ce9df

Resolves bigdatagenomics#1402.x

fnothaft mentioned this issue Feb 27, 2017

Add regression test suite #1407

Closed

9 tasks

fnothaft added a commit to fnothaft/adam that referenced this issue Feb 28, 2017

[ADAM-1402] Fix INDEL realigner bad binary search.

98bc349

Resolves bigdatagenomics#1402.x

fnothaft added a commit to fnothaft/adam that referenced this issue Mar 2, 2017

[ADAM-1402] Fix INDEL realigner bad binary search.

b57c01e

Resolves bigdatagenomics#1402.x

fnothaft added a commit to fnothaft/adam that referenced this issue Mar 2, 2017

[ADAM-1402] Fix INDEL realigner bad binary search.

0f160d6

Resolves bigdatagenomics#1402.

fnothaft mentioned this issue Mar 2, 2017

INDEL realigner cleanup #1412

Merged

fnothaft added this to the 0.22.0 milestone Mar 3, 2017

fnothaft added a commit to fnothaft/adam that referenced this issue Mar 15, 2017

[ADAM-1402] Fix INDEL realigner bad binary search.

ac9fcf4

Resolves bigdatagenomics#1402.

fnothaft added a commit to fnothaft/adam that referenced this issue Mar 23, 2017

[ADAM-1402] Fix INDEL realigner bad binary search.

f7c6578

Resolves bigdatagenomics#1402.

heuermh closed this as completed in #1412 Mar 31, 2017
