
Create simulation volumes on the GPU #243

Merged
merged 10 commits into IMSY-DKFZ:main on Sep 21, 2023

Conversation

lkeegan
Contributor

@lkeegan lkeegan commented Sep 20, 2023

  • use gpu in create_simulation_volume() if available
  • construct structures as needed instead of all at once
  • empty cache between each structure to reduce GPU ram usage
  • use float32 type throughout when constructing arrays
  • increase allowed tolerance for so2 in test from 1e-8 to 1e-7 due to gpu float32 precision
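A minimal sketch of the approach these bullet points describe: pick the GPU device if available, construct each structure only when needed in float32, and empty the CUDA cache between structures. The `create_volume` function and the `Structure.render()` interface are hypothetical stand-ins for illustration, not SIMPA's actual API.

```python
import torch

def create_volume(structures, shape):
    """Build a simulation volume structure-by-structure, on the GPU if available.

    Illustrative sketch only: `structures` is assumed to be an iterable of
    objects with a `render(shape, device)` method returning a float32 tensor.
    """
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    volume = torch.zeros(shape, dtype=torch.float32, device=device)
    for structure in structures:
        # construct each structure only when needed, in float32, on `device`
        volume += structure.render(shape, device)
        if device.type == "cuda":
            # free cached GPU memory between structures to reduce peak RAM usage
            torch.cuda.empty_cache()
    # move the result back to the CPU as a numpy array
    return volume.cpu().numpy()
```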

Please check the following before creating the pull request (PR):

  • Did you run automatic tests?
  • Did you run manual tests?
  • Is the code provided in the PR still backwards compatible to previous SIMPA versions?

Runtime of `create_simulation_volume()` in the test script is reduced from 30.5s to 7.2s.

Collaborator

@kdreher kdreher left a comment


Concerning the way we handle float32 and float64, I guess we have these options:

  1. Doing all the calculations in torch and numpy in float32
  2. Doing the calculations in torch in float32 but then cast them to float64 for the rest of the pipeline.

What do you think?

torch.cuda.empty_cache()
# convert volumes back to CPU
for key in volumes.keys():
    volumes[key] = volumes[key].cpu().numpy()
Collaborator


I think this might cause problems later in the pipeline, as the arrays returned here are of type np.float32, and later they will be used in conjunction with other np arrays that are natively generated as np.float64. For me, the acoustic simulation crashes because of that, but if we put np.float64(...) here, then it runs about as fast.
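As a sketch of the fix described here (the `volumes` contents are stand-in data, not SIMPA's actual code), the CPU conversion step could cast back to float64 like this; `.astype(np.float64)` does the same job as wrapping the array in `np.float64(...)`:

```python
import numpy as np
import torch

# stand-in for volumes computed on the GPU in float32
volumes = {"mua": torch.rand(4, 4, dtype=torch.float32)}

# convert volumes back to CPU, casting to float64 so downstream numpy code
# (which defaults to float64) mixes cleanly with these arrays
for key in volumes.keys():
    volumes[key] = volumes[key].cpu().numpy().astype(np.float64)
```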

Contributor Author


Are you running the generate_in_silico_data script you sent me earlier? I'd like to reproduce the crash (I agree with your comment, but when I run your script the acoustic part seems to work as before)

Collaborator


Yes, that script breaks in MATLAB when I don't cast the volumes to float64, with this error message:

                        < M A T L A B (R) >
              Copyright 1984-2023 The MathWorks, Inc.
         R2023a Update 5 (9.14.0.2337262) 64-bit (glnxa64)
                           July 24, 2023

Warning: Unrecognized command line option: automation.
Warning: Unrecognized command line option: wait.

To get started, type doc.
For product information, visit www.mathworks.com.

639

750

0.3823

Error using kWaveGrid/set.t_array
t_array must be evenly spaced.

Error in simulate_2D (line 94)
kgrid.t_array = makeTime(kgrid, medium.sound_speed, 0.3);
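For what it's worth, the "t_array must be evenly spaced" error is exactly the kind of failure float32 precision can cause: the spacing of k*dt accumulated in float32 varies relatively by orders of magnitude more than in float64. A quick illustration (dt and step count chosen to resemble the k-Wave output in this thread, not taken from SIMPA itself):

```python
import numpy as np

dt = 1.8473e-8  # time step similar to the one k-Wave reports
n = 3600        # roughly the number of time steps in the log

# t_array = k * dt in both precisions
t32 = np.arange(n, dtype=np.float32) * np.float32(dt)
t64 = np.arange(n, dtype=np.float64) * dt

# relative spread of consecutive step sizes: float32 spacing is measurably
# uneven, while float64 spacing is uniform to well below any sensible tolerance
spread32 = np.ptp(np.diff(t32)) / dt
spread64 = np.ptp(np.diff(t64)) / dt
print(spread32, spread64)
```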

Contributor Author


Strange, I can't reproduce it:

2023-09-21 14:55:03,421 - INFO - ['/usr/local/bin/matlab', '-nodisplay', '-nosplash', '-automation', '-wait', '-r', "addpath('/export/home/lkeegan/simpa/simpa/core/simulation_modules/acoustic_forward_module');simulate_2D('/tmp/tmp_simpa/Forearm_10000/Forearm_10000_Wavelength_800.hdf5.mat');exit;"]

                        < M A T L A B (R) >
              Copyright 1984-2020 The MathWorks, Inc.
           R2020a (9.8.0.1323502) 64-bit (glnxa64)
                      February 25, 2020

Warning: Unrecognized command line option: automation. 
Warning: Unrecognized command line option: wait. 
 
To get started, type doc.
For product information, visit www.mathworks.com.
 
   639

   750

    0.3823

Making time!
Running k-Wave simulation...
  start time: 21-Sep-2023 14:55:07
  reference sound speed: 1624m/s
Warning: Support for ver('distcomp') will be removed in a future release.  Use ver('parallel') instead. 
> In ver>locGetSingleToolboxInfo (line 283)
  In ver (line 56)
  In verLessThan (line 39)
  In kspaceFirstOrder_inputChecking (line 1306)
  In kspaceFirstOrder2D (line 537)
  In kspaceFirstOrder3DC (line 532)
  In kspaceFirstOrder2DG (line 76)
  In simulate_2D (line 157) 
  WARNING: visualisation plot scale may not be optimal for given source.
  dt: 18.4729ns, t_end: 66.4655us, time steps: 3599
  input grid size: 639 by 750 grid points (63.9 by 75mm)
  maximum supported frequency: 7.3999MHz by 7.4115MHz
  expanding computational grid...
  computational grid size: 675 by 800 grid points
  precomputation completed in 0.69861s
  saving input files to disk...
  completed in 0.78956s
┌───────────────────────────────────────────────────────────────┐
│                  kspaceFirstOrder-CUDA v1.3                   │
├───────────────────────────────────────────────────────────────┤
│ Reading simulation configuration:                        Done │
│ Selected GPU device id:                                     0 │
│ GPU device name:                        NVIDIA A100-PCIE-40GB │
│ Number of CPU threads:                                     64 │
│ Processor name: AMD EPYC 7452 32-Core Processor               │
├───────────────────────────────────────────────────────────────┤
│                      Simulation details                       │
├───────────────────────────────────────────────────────────────┤
│ Domain dimensions:                                  675 x 800 │
│ Medium type:                                               2D │
│ Simulation time steps:                                   3599 │
├───────────────────────────────────────────────────────────────┤
│                        Initialization                         │
├───────────────────────────────────────────────────────────────┤
│ Memory allocation:                                       Done │
│ Data loading:                                            Done │
│ Elapsed time:                                           0.02s │
├───────────────────────────────────────────────────────────────┤
│ FFT plans creation:                                      Done │
│ Pre-processing phase:                                    Done │
│ Elapsed time:                                           0.43s │
├───────────────────────────────────────────────────────────────┤
│                    Computational resources                    │
├───────────────────────────────────────────────────────────────┤
│ Current host memory in use:                             329MB │
│ Current device memory in use:                          2236MB │
│ Expected output file size:                              178MB │
├───────────────────────────────────────────────────────────────┤
│                          Simulation                           │
├──────────┬────────────────┬──────────────┬────────────────────┤
│ Progress │  Elapsed time  │  Time to go  │  Est. finish time  │
├──────────┼────────────────┼──────────────┼────────────────────┤
│     0%   │        0.001s  │      1.803s  │  21/09/23 14:55:11 │
│     5%   │        0.112s  │      2.110s  │  21/09/23 14:55:12 │
│    10%   │        0.226s  │      2.026s  │  21/09/23 14:55:12 │
│    15%   │        0.327s  │      1.846s  │  21/09/23 14:55:11 │
│    20%   │        0.423s  │      1.687s  │  21/09/23 14:55:11 │
│    25%   │        0.519s  │      1.553s  │  21/09/23 14:55:11 │
│    30%   │        0.614s  │      1.431s  │  21/09/23 14:55:11 │
│    35%   │        0.711s  │      1.318s  │  21/09/23 14:55:12 │
│    40%   │        0.807s  │      1.208s  │  21/09/23 14:55:12 │
│    45%   │        0.903s  │      1.101s  │  21/09/23 14:55:12 │
│    50%   │        0.998s  │      0.997s  │  21/09/23 14:55:11 │
│    55%   │        1.094s  │      0.894s  │  21/09/23 14:55:11 │
│    60%   │        1.190s  │      0.792s  │  21/09/23 14:55:11 │
│    65%   │        1.286s  │      0.691s  │  21/09/23 14:55:11 │
│    70%   │        1.382s  │      0.591s  │  21/09/23 14:55:11 │
│    75%   │        1.477s  │      0.491s  │  21/09/23 14:55:11 │
│    80%   │        1.573s  │      0.392s  │  21/09/23 14:55:11 │
│    85%   │        1.670s  │      0.293s  │  21/09/23 14:55:12 │
│    90%   │        1.766s  │      0.195s  │  21/09/23 14:55:12 │
│    95%   │        1.861s  │      0.097s  │  21/09/23 14:55:12 │
├──────────┴────────────────┴──────────────┴────────────────────┤
│ Elapsed time:                                           1.96s │
├───────────────────────────────────────────────────────────────┤
│ Sampled data post-processing:                            Done │
│ Elapsed time:                                           0.00s │
├───────────────────────────────────────────────────────────────┤
│                            Summary                            │
├───────────────────────────────────────────────────────────────┤
│ Peak host memory in use:                                329MB │
│ Peak device memory in use:                             2238MB │
├───────────────────────────────────────────────────────────────┤
│ Total execution time:                                   3.48s │
├───────────────────────────────────────────────────────────────┤
│                       End of computation                      │
└───────────────────────────────────────────────────────────────┘

time_step =

   1.8473e-08


number_time_steps =

        3599

2023-09-21 14:55:18,334 - INFO - Simulating the acoustic forward process...[Done]

Collaborator


I guess the other structures should also be adjusted, as I mentioned in PR #242

Contributor Author


Yes, good point, I'll add them to this PR

@lkeegan
Contributor Author

lkeegan commented Sep 21, 2023

Concerning the way we handle float32 and float64, I guess we have these options:

  1. Doing all the calculations in torch and numpy in float32
  2. Doing the calculations in torch in float32 but then cast them to float64 for the rest of the pipeline.

What do you think?

I think it depends on what benefit you get from using float64:

  • if all the pipeline steps use float32 (which at this point it looks like they mostly do?) then using float32 everywhere probably makes sense
  • but if you have (or may in the future have) pipeline steps that use float64 (and where you notice the loss of precision if you use float32), then option 2 is probably better
  • as you said there's not much difference in terms of run-time, but using float64 does double the disk space you need to store them (and increases the time taken to read/write them to disk)

If you do choose option 1, I guess that changing the format you write data to disk is probably a breaking change(?), which should be done in a separate PR; so either way, this PR should be fixed to cast the volumes back to float64 as you suggested.
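The disk-space point in the bullets above is easy to quantify; for a 2D slice at the grid size appearing in the k-Wave log (639 x 750 points, chosen here just for illustration), float64 takes exactly twice the bytes of float32:

```python
import numpy as np

# one 2D slice at the grid size from the k-Wave log (639 x 750 points)
vol32 = np.zeros((639, 750), dtype=np.float32)
vol64 = vol32.astype(np.float64)

print(vol32.nbytes, vol64.nbytes)  # float64 uses twice the memory/disk
```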

@kdreher
Collaborator

kdreher commented Sep 21, 2023

I don't think we actually need the precision of float64 anywhere, so I don't see a reason why we shouldn't have everything in float32.

Collaborator

@kdreher kdreher left a comment


looks good

@kdreher kdreher merged commit ed02703 into IMSY-DKFZ:main Sep 21, 2023
12 checks passed