-
-
Notifications
You must be signed in to change notification settings - Fork 686
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Fix mattes metric with sse instructions #1762
BUG: Fix mattes metric with sse instructions #1762
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good, some in-line remarks.
Modules/Registration/Metricsv4/include/itkMattesMutualInformationImageToImageMetricv4.hxx
Show resolved
Hide resolved
Modules/Registration/Metricsv4/include/itkMattesMutualInformationImageToImageMetricv4.hxx
Outdated
Show resolved
Hide resolved
When the pointer to the local variable persists outside of the local function. When the local variable goes out of scope the mutable class variable should be set to nullptr.
Pre-compute commmon logFixedImagePDFValue outside of inner loop.
Multiplication is more efficient than division, so use multiply by reciprical to make 1 division.
Replace 'if(! condition) {continue;}' with 'if( condition ) { large block of code }' to remove early loop termination.
Use alias variable names that more clearly document what the intent of the workspace ivars that are re-used memory this stage of the processing. The Per-thread arrays of array accumulators have been reduced into thread 0 at this point, and will be normalized into PDF values by dividing by the total mass of the joint accumulator PDF. Also move exception checking for pre-existing states to the top of the function call ( there is no need to do computations before checking for pre-state exception cases).
Simplify computations avoiding duplicate work to compute same value for totalMassOfPDF that is already available from this->m_JointPDFSum. This also allows for simplifying loops and the use of std::for_each transforms Avoid unnecessary indexing by looping over vectors with range-based loops.
A dummy value is used in the case where the log computation is not used and would otherwise be invalid so that the log() can be precomputed outside the looping structure.
f9347a4
to
fc63ed3
Compare
NOTE: ANTS and BRAINSTools (both use this filter extensively) pass all regression tests. ➜ ANTs-RelWithDebInfo-build ctest 100% tests passed, 0 tests failed out of 130 Total Test time (real) = 344.88 sec 100% tests passed, 0 tests failed out of 87 |
When either the fixedMarginalPDF or movingMarginalPDF (which are sums of rows or columns of the 2D joint histogram) then the intersection point must also be zero. Both the contribution to the final values of such a point is no change, so one can avoid much testing and subsequent inner conditional processing in such cases.
This test started failing due to very small numerical accumulation changes. Other packages (i.e. ANTs & BRAINSTools) registration tools that are more complete versions of this example all pass their more extensive testing.
6294dca
to
aa53465
Compare
Preferably wait until Monday noon to give other people time to chime in. |
This may be a compiler bug with clang and using SSE instructions with floating point exceptions, but cleaning up the code avoids the bug altogether.
Inspection of the assembly code indicated that the loop was being unrolled, and SSE instructions used to do division. The second division (which was never used due to a "continue" statement in the loop when the second denominator was zero) caused a floating point exception to be thrown on the otherwise unused value.
The code review subsequently identified several places that could be simplified, increase the likelihood of using parallelization or vectorization, and removed redundant code.
PR Checklist