
Introduce segstats.py script for fast partial volume computation #239

Merged · 4 commits merged into Deep-MI:dev from feature/segstats on Jan 23, 2023

Conversation

dkuegler (Member)

The FreeSurfer mri_segstats program computes volume statistics from a combination of a segmentation and a bias-field corrected image. Fundamentally, it assumes local regional intensity consistency.

In this PR, we introduce three alternatives to this program:

  1. a python port of the mri_segstats algorithm (called "legacy"), in two variants:
    a. a pure python/numpy implementation using multiple processes for parallel execution (this is extremely inefficient and should be avoided, but it sidesteps the python global interpreter lock)
    b. a python+numba implementation (the python code gets properly compiled by LLVM) using multiple threads and numba-based parallelization (this is about as efficient as the FreeSurfer mri_segstats code and avoids the global interpreter lock); a sketch contrasting this with alternative 2 follows after this list
  2. an updated partial volume algorithm implemented in python and numpy using multiple threads for parallel execution (this is significantly more efficient, but does not avoid the global interpreter lock, as it does not require numba and compilation)
  3. an updated partial volume algorithm implemented in python and torch with almost no multi-threading but with calculation on the GPU (this is also significantly faster than the other versions, optimized with sparse tensors and for SIMD processors)
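
To make the contrast between 1b and 2 concrete, here is a minimal sketch of computing per-label voxel counts and mean intensities in both styles. This is my own illustration, not code from segstats.py; the function names, signatures, and the reduced set of statistics are assumptions.

```python
# Minimal illustration of alternatives 1b (numba) and 2 (vectorized numpy).
# Not taken from segstats.py; names, signatures, and the reduced statistics
# (voxel count and mean intensity per label) are assumptions for this sketch.
import numpy as np
from numba import njit, prange


@njit(parallel=True)
def stats_numba(seg, norm, labels):
    """Legacy-style loop over the flat volume, compiled by LLVM, parallel over labels."""
    counts = np.zeros(labels.shape[0], dtype=np.int64)
    sums = np.zeros(labels.shape[0], dtype=np.float64)
    for li in prange(labels.shape[0]):
        lab = labels[li]
        for v in range(seg.size):
            if seg[v] == lab:
                counts[li] += 1
                sums[li] += norm[v]
    return counts, sums / np.maximum(counts, 1)


def stats_numpy(seg, norm, labels):
    """Vectorized reduction: one boolean mask per label, no explicit voxel loop.
    Real code would process the volume in patches to keep memory in check."""
    masks = seg[None, :] == labels[:, None]              # (n_labels, n_voxels)
    counts = masks.sum(axis=1)
    means = (masks * norm[None, :]).sum(axis=1) / np.maximum(counts, 1)
    return counts, means
```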

I am currently running some benchmarks to document the runtime of the respective versions.

@m-reuter there is also the question of whether we want to include and distribute the "legacy algorithms" at all. Including them in one commit and removing them again afterwards might be an option, but I do not expect we want to add numba as a dependency (which would really be required for acceptable run times).

@dkuegler changed the title from "introduce segstats.py script for fast partial volume computation" to "Introduce segstats.py script for fast partial volume computation" on Jan 11, 2023
add code for gpu/torch implementation (currently seems to require too much gpu memory)
dkuegler (Member, Author) commented Jan 13, 2023

Benchmarking on landau (6 cores, 12 threads):
Intel(R) Xeon(R) CPU E5-2643 v3 @ 3.40GHz

FreeSurfer 7.3.2 mri_segstats

python -m timeit "import subprocess; subprocess.run('mri_segstats --seed 1234 --seg $TSUB/mri/wmparc.DKTatlas.mapped.mgz --sum $TSUB/stats/wmparc.DKTatlas.mapped.fsst
ats --pv $TSUB/mri/norm.mgz --excludeid 0 --brainmask $TSUB/mri/brainmask.mgz --in $TSUB/mri/norm.mgz --in-intensity-name norm --in-intensity-units MR --subject OAS1_0061_MR1_modified --surf-wm-vol --ctab $FREESURFER/WMParcStatsLUT.txt'.split(' '))"

1 loop, best of 5: 251 sec per loop

251 sec = 4 min 11 sec

Basically, it only fully loads a single core/thread -- theoretically, it should be parallelized via OpenMP, but I cannot find the associated flags.

If we assume perfect parallel execution over all 12 threads, the best case would be around 21 sec (251 sec / 12).

FastSurfer 2.0.1* segstats.py

on CPU

python -m timeit "from FastSurferCNN.segstats import main, make_arguments; main(make_arguments().parse_args('-norm $TSUB/mri/norm.mgz -i $TSUB/mri/wmparc.DKTatlas.mapped.mgz -o $TSUB/stats/wmparc.DKTatlas.mapped.pyvstats --lut $FREESURFER/WMParcStatsLUT.txt --device cpu'.split(' ')))"

1 loop, best of 5: 4.06 sec per loop

Loads about 11 of the 12 threads (the remainder is overhead from other processes).

Theoretically, there might be a very small advantage here for the python code, because I am not firing up a new process.

On GPU, I am still facing memory constraints, so it remains unclear whether that is feasible at all. Also, the potential speed advantage seems to be much smaller because of the sparsity of PV-affected voxels. The solution would probably be to "patchify" the code similar to the cpu code, but then even more of the potential gpu speed advantage goes away because of synchronization and overhead.
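
To illustrate the sparsity point, here is a purely hypothetical sketch (the torch path is not part of segstats.py, and none of these names come from the PR) that marks the border voxels which would actually need partial-volume handling; all other voxels simply contribute full voxel volumes:

```python
# Hypothetical illustration of why PV-affected voxels are sparse: mark voxels
# whose face-neighbors carry a different label. Only these would need the
# expensive partial-volume treatment.
import torch


def pv_border_mask(seg: torch.Tensor) -> torch.Tensor:
    """Boolean mask of voxels with at least one face-neighbor of another label."""
    mask = torch.zeros_like(seg, dtype=torch.bool)
    for dim in range(seg.ndim):
        length = seg.size(dim)
        diff = seg.narrow(dim, 0, length - 1) != seg.narrow(dim, 1, length - 1)
        lo = mask.narrow(dim, 0, length - 1)
        hi = mask.narrow(dim, 1, length - 1)
        lo |= diff  # in-place; narrow() returns a view into mask
        hi |= diff
    return mask


# e.g. pv_border_mask(seg).float().mean() gives the fraction of voxels that
# actually need PV handling.
```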

@dkuegler marked this pull request as ready for review on January 13, 2023 14:56
Note: numpy might grab additional cores to accelerate the workload no matter the threads flag.
dkuegler (Member, Author)

I removed all non-core code so we do not have to maintain multiple versions of the code. Specifically, even the numba-accelerated legacy algorithm was too slow, and torch requires very high memory allocation; in fact, the torch overhead alone already made it slower than the cpu implementation. Therefore, for now there will not be a dedicated torch implementation.

I also added a "--threads" flag to specify the number of threads. This seems to somewhat reliably limit the load to one cpu thread. However, numpy has inherent parallelism through MKL and OpenMP:

https://stackoverflow.com/questions/30791550/limit-number-of-threads-in-numpy
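
For reference, these are the usual ways from that thread to limit numpy's internal pools; whether segstats.py does exactly this is my assumption, not something stated here:

```python
# Option 1: cap the BLAS/OpenMP pools via environment variables.
# These must be set before numpy is imported to take effect reliably.
import os
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
import numpy as np

# Option 2: limit them at runtime for a specific block with threadpoolctl.
from threadpoolctl import threadpool_limits

with threadpool_limits(limits=1):
    ...  # numpy/BLAS-heavy code in this block runs single-threaded
```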

Performance

Even on a single thread, the script is still pretty fast. With the default settings (effectively all 12 threads):

~/projects/FastSurfer$ python -m timeit "from FastSurferCNN.segstats import main, make_arguments; main(make_arguments().parse_args('-norm $TSUB/mri/norm.mgz -i $TSUB/mri/wmparc.DKTatlas.mapped.mgz -o $TSUB/stats/wmparc.DKTatlas.mapped.pyvstats --lut $FREESURFER/WMParcStatsLUT.txt'.split(' ')))"
1 loop, best of 5: 4.03 sec per loop

And restricted to a single thread:

~/projects/FastSurfer$ python -m timeit "from FastSurferCNN.segstats import main, make_arguments; main(make_arguments().parse_args('-norm $TSUB/mri/norm.mgz -i $TSUB/mri/wmparc.DKTatlas.mapped.mgz -o $TSUB/stats/wmparc.DKTatlas.mapped.pyvstats --lut $FREESURFER/WMParcStatsLUT.txt --threads 1'.split(' ')))"
1 loop, best of 5: 11.6 sec per loop

@m-reuter merged commit 2665f91 into Deep-MI:dev on Jan 23, 2023
@dkuegler deleted the feature/segstats branch on January 23, 2023 13:07