Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed difference between binaries and python wrappers #703

Closed
benoitberanger opened this issue Sep 3, 2024 · 14 comments · Fixed by #705
Closed

Speed difference between binaries and python wrappers #703

benoitberanger opened this issue Sep 3, 2024 · 14 comments · Fixed by #705

Comments

@benoitberanger
Copy link

Describe the bug
There is a massive difference of computation time between Terminal CLI and the Python wrappers for N4BiasFieldCorrection and DenoiseImage

From the terminal :
N4BiasFieldCorrection : 22s
DenoiseImage : 76s

From Python:
N4BiasFieldCorrection : 33s
DenoiseImage : 602s

To reproduce
I use a classic 3DT2 in 0.8mm iso:

************************************************
Image name:          "in.nii"
************************************************
  Dimensions:        256 x 320 x 320
  Voxel size:        0.8 x 0.8 x 0.8
  Data strides:      [ 1 2 3 ]
  Format:            NIfTI-1.1
  Data type:         signed 16 bit integer (little endian)
  Intensity scaling: offset = 0, multiplier = 1
  Transform:                    1           0           0      -106.6
                                0           1           0       -93.8
                                0           0           1        -136
  comments:          TE=4.2e+02;Time=164035.610;phase=1

From the terminal :

cd /tmp/test_antspy
N4BiasFieldCorrection -i in.nii -o n4_in.nii -v 1
DenoiseImage -i n4_in.nii -o dn_n4_in.nii -v 1 -n Rician

From Python :

import ants

img = ants.image_read('/tmp/test_antspy/in.nii')
img = ants.n4_bias_field_correction(img, verbose=True)
img = ants.denoise_image(img, v=1)
ants.image_write(img, '/tmp/test_antspy/out.nii')

Expected behavior
Since I built from source ANTs and ANTsPy, I would expect roughly the same computation.
x1.5 N4BiasFieldCorrection is unexpected but ok, however x8 for DenoiseImage is very weird.

ANTsPy installation (please complete the following information):

  • Hardware [ PC ] i9-11900K is 8 cores (x2 threads) CPU
  • OS: [ Linux 4.15.0-20-generic x86_64 // Linux Mint 19 ]
  • System details [ None ]
  • Sub-system: [ Built from source // commit 751fec7 ]
  • ANTsPy version: [ 0.5.4 ]
  • Installation type: [ git clone + built from source with python -m pip install . ]

Additional context
When running both tests, I can see in htop that all 16 CPUs are running at 100%, with both Terminal CLI and Python wrappers. So it's not an obvious multi-threading problem.

@cookpa
Copy link
Member

cookpa commented Sep 3, 2024

For DenoiseImage, it's a difference in defaults, the search radius is 2 in the CLI program and 3 in ANTsPy. If I call ants.denoise_image(img, r=2, v=1), the difference in performance is similar to that for N4.

@ntustison @stnava shall we harmonize defaults, which one to adopt? I'll go with faster (2) unless you have a preference to make antspy the standard.

@ntustison
Copy link
Member

Yeah, I'm all for harmonizing the defaults and I'd go with what's in the original ANTs DenoiseImage.

@gdevenyi
Copy link

gdevenyi commented Sep 3, 2024

For denoiseimage in particular, I would hope you could normalize with the OG implementation minc_anlm from minc-toolkit-v2

@cookpa
Copy link
Member

cookpa commented Sep 3, 2024

Is there a usage with defaults you could paste here?

@ntustison
Copy link
Member

DenoiseImage was ported from Jose's original Matlab code. Minc code was not referenced at all.

@gdevenyi
Copy link

gdevenyi commented Sep 3, 2024

Well minc_anlm was written by/with Jose, given the citation has L Collins of the MNI as one of the senior authors 👍🏻

$ minc_anlm
This program implements adaptative non-local denoising algorithm published in
Jose V. Manjon, Pierrick Coupe, Luis Marti-Bonmati, D. Louis Collins, Montserrat Robles "Adaptive non-local means denoising of MR images with spatially varying noise levels" Journal of Magnetic Resonance Imaging Volume 31, Issue 1, pages 192–203, January 2010
 DOI: 10.1002/jmri.22003

@cookpa
Copy link
Member

cookpa commented Sep 3, 2024

I profiled n4_bias_correction with the line_profiler / kernprof, the library function execution accounts for 99.4% of the execution time. So there's not a lot of work happening at the wrapper level.

@ntustison
Copy link
Member

Yes, I realize that. But you were referring to a specific implementation in the context of defaults and that's why I clarified that it was Jose's original Matlab code.

@ntustison
Copy link
Member

Just a bit more historical context---I happened to be invited by an MNI-adjacent friend for a get-together during MICCAI 2013 in Nagoya, Japan. Fortunately, I sat right across the table from Jose and, after discussing common interests (such our enjoyment of Luis Miguel), he realized I was "one of the ANTs guys" and he asked me if I would like to put his denoising algorithm in ANTs. I said sure and he pointed me to his Matlab code which I eventually ported to ITK-style. After the FreeSurfer folk began using the implementation in their pipeline a couple years ago, I asked Jose about the possibility of making it an ITK module and he was all for it.

@cookpa
Copy link
Member

cookpa commented Sep 4, 2024

I think there might also be optimization differences in the ITK / ANTs compilation.

It seems CMAKE_BUILD_TYPE=Release is only set on Mac?

if [[ `uname` -eq Darwin ]] ; then
CMAKE_BUILD_TYPE=Release
fi
if [[ $TRAVIS -eq true ]] ; then
CMAKE_BUILD_TYPE=Release
JTHREADS=2
fi

Also the ITK compilation has -O2 where an ANTs superbuild has -O3

-DCMAKE_C_FLAGS="${CMAKE_C_FLAGS} -Wno-c++11-long-long -fPIC -O2 -DNDEBUG "\
-DCMAKE_CXX_FLAGS="${CMAKE_CXX_FLAGS} -Wno-c++11-long-long -fPIC -O2 -DNDEBUG "\

@benoitberanger
Copy link
Author

For DenoiseImage, it's a difference in defaults, the search radius is 2 in the CLI program and 3 in ANTsPy. If I call ants.denoise_image(img, r=2, v=1), the difference in performance is similar to that for N4.

Correct, I did not notice the difference with r default parameter.

Here is what I have now :

DenoiseImage r=2 r=3
CLI 76s 179s
Python 248s 602s

@cookpa
Copy link
Member

cookpa commented Sep 4, 2024

Thanks for testing, @benoitberanger

Would you mind trying out #705 ? If you have the Github CLI, you can do

gh pr checkout 705

It appears to close the gap on my Mac.

@benoitberanger
Copy link
Author

Using 550400a in #705 here is the new computation times :

DenoiseImage r=2
CLI 76s
Python 73s

So it went from 248s to 73s !

@cookpa
Copy link
Member

cookpa commented Sep 4, 2024

Wow! Thanks for reporting this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants