Create simulation volumes on the GPU #243
- use GPU in `create_simulation_volume()` if available
- construct structures as needed instead of all at once
- empty the cache between each structure to reduce GPU RAM usage
- use float32 type throughout when constructing arrays
- increase allowed tolerance for so2 in test from 1e-8 to 1e-7 due to GPU float32 precision
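The steps in the list above can be sketched roughly as follows. This is a minimal, hypothetical illustration assuming PyTorch. The function `create_volumes_sketch`, the `(mask_fn, properties)` structure format and the "later structure overwrites earlier one" merge rule are assumptions made for illustration. This is not simpa's actual `create_simulation_volume()` implementation.

```python
import numpy as np
import torch


def create_volumes_sketch(structures, shape, device=None):
    """Hypothetical sketch: build per-property float32 volumes one structure at a time.

    Assumes `shape` is 3D and `structures` is a list of (mask_fn, properties) pairs,
    where mask_fn(x, y, z) returns a boolean tensor and properties maps a property
    name to a scalar value. This is NOT the simpa API.
    """
    if device is None:
        # use the GPU if available, otherwise fall back to the CPU
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # coordinate grids are created once, in float32, on the chosen device
    axes = [torch.arange(n, dtype=torch.float32, device=device) for n in shape]
    x, y, z = torch.meshgrid(*axes, indexing="ij")
    volumes = {}
    for mask_fn, properties in structures:
        # construct only this structure's mask instead of all structures at once
        mask = mask_fn(x, y, z)
        for name, value in properties.items():
            if name not in volumes:
                volumes[name] = torch.zeros(shape, dtype=torch.float32, device=device)
            # simplification (assumption): later structures overwrite earlier ones
            volumes[name][mask] = value
        del mask
        if device.type == "cuda":
            # return unused cached blocks between structures to lower peak GPU memory
            torch.cuda.empty_cache()
    # convert volumes back to CPU NumPy arrays (still float32 here)
    return {name: vol.cpu().numpy() for name, vol in volumes.items()}
```

Building one mask at a time keeps only a single structure's temporary tensors alive. `torch.cuda.empty_cache()` then releases the freed blocks held by PyTorch's caching allocator. This lowers peak reserved GPU memory but does not shrink the volumes themselves.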
Concerning the way we handle float32 and float64, I guess we have the following options:
- Doing all the calculations in torch and numpy in float32
- Doing the calculations in torch in float32 but then cast them to float64 for the rest of the pipeline.
What do you think?
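One consideration for option 1 is NumPy's type promotion. A float32 volume combined with an array created with NumPy's default float64 dtype silently becomes float64. Keeping everything in float32 would therefore require creating every array with an explicit dtype. A minimal illustration (the arrays are made up for the example):

```python
import numpy as np

vol32 = np.ones((2, 3), dtype=np.float32)  # e.g. a volume returned from the GPU as float32
grid64 = np.linspace(0.0, 1.0, 3)          # NumPy creates float64 arrays by default

mixed = vol32 * grid64
print(mixed.dtype)  # float64: NumPy promotes the result to the wider type

# float32 halves the memory footprint compared to float64
print(vol32.nbytes, vol32.astype(np.float64).nbytes)  # 24 48
```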
torch.cuda.empty_cache()
# convert volumes back to CPU
for key in volumes.keys():
    volumes[key] = volumes[key].cpu().numpy()
I think this might cause problems later in the pipeline, because the arrays returned here are of type np.float32, and later they are used together with other NumPy arrays that are created as np.float64 by default. For me, the acoustic simulation crashes because of that, but if we wrap the result in np.float64(...) here, it runs about as fast.
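A minimal sketch of the cast-back suggested here, assuming `volumes` is a dict of torch tensors as in the snippet above. The helper name `volumes_to_float64_numpy` is hypothetical. Casting after `.cpu()` keeps the GPU work in float32 and only widens the final NumPy arrays. It has the same effect as wrapping the result in `np.float64(...)`, but states the intent explicitly.

```python
import numpy as np
import torch


def volumes_to_float64_numpy(volumes):
    """Move each tensor to the CPU and return it as a float64 NumPy array (hypothetical helper)."""
    converted = {}
    for key, tensor in volumes.items():
        # .cpu() copies from the GPU if needed, .numpy() views the CPU tensor,
        # and .astype(np.float64) creates a new float64 array
        converted[key] = tensor.cpu().numpy().astype(np.float64)
    return converted


volumes = {"mua": torch.full((2, 3), 0.1, dtype=torch.float32)}
volumes = volumes_to_float64_numpy(volumes)
print(volumes["mua"].dtype)  # float64
```

Widening does not recover precision lost in float32. The float32 value of 0.1 becomes 0.10000000149011612 after the cast. The cast only keeps downstream code and file formats in float64.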
Are you running the generate_in_silico_data script you sent me earlier? I'd like to reproduce the crash. (I agree with your comment, but when I run your script, the acoustic part seems to work as before.)
Yes, that script breaks in MATLAB with the following error message when I don't cast the volumes to float64:
< M A T L A B (R) >
Copyright 1984-2023 The MathWorks, Inc.
R2023a Update 5 (9.14.0.2337262) 64-bit (glnxa64)
July 24, 2023
Warning: Unrecognized command line option: automation.
Warning: Unrecognized command line option: wait.
To get started, type doc.
For product information, visit www.mathworks.com.
639
750
0.3823
Error using kWaveGrid/set.t_array
t_array must be evenly spaced.
Error in simulate_2D (line 94)
kgrid.t_array = makeTime(kgrid, medium.sound_speed, 0.3);
Strange, I can't reproduce it:
2023-09-21 14:55:03,421 - INFO - ['/usr/local/bin/matlab', '-nodisplay', '-nosplash', '-automation', '-wait', '-r', "addpath('/export/home/lkeegan/simpa/simpa/core/simulation_modules/acoustic_forward_module');simulate_2D('/tmp/tmp_simpa/Forearm_10000/Forearm_10000_Wavelength_800.hdf5.mat');exit;"]
< M A T L A B (R) >
Copyright 1984-2020 The MathWorks, Inc.
R2020a (9.8.0.1323502) 64-bit (glnxa64)
February 25, 2020
Warning: Unrecognized command line option: automation.
Warning: Unrecognized command line option: wait.
To get started, type doc.
For product information, visit www.mathworks.com.
639
750
0.3823
Making time!
Running k-Wave simulation...
start time: 21-Sep-2023 14:55:07
reference sound speed: 1624m/s
Warning: Support for ver('distcomp') will be removed in a future release. Use ver('parallel') instead.
> In ver>locGetSingleToolboxInfo (line 283)
In ver (line 56)
In verLessThan (line 39)
In kspaceFirstOrder_inputChecking (line 1306)
In kspaceFirstOrder2D (line 537)
In kspaceFirstOrder3DC (line 532)
In kspaceFirstOrder2DG (line 76)
In simulate_2D (line 157)
WARNING: visualisation plot scale may not be optimal for given source.
dt: 18.4729ns, t_end: 66.4655us, time steps: 3599
input grid size: 639 by 750 grid points (63.9 by 75mm)
maximum supported frequency: 7.3999MHz by 7.4115MHz
expanding computational grid...
computational grid size: 675 by 800 grid points
precomputation completed in 0.69861s
saving input files to disk...
completed in 0.78956s
┌───────────────────────────────────────────────────────────────┐
│ kspaceFirstOrder-CUDA v1.3 │
├───────────────────────────────────────────────────────────────┤
│ Reading simulation configuration: Done │
│ Selected GPU device id: 0 │
│ GPU device name: NVIDIA A100-PCIE-40GB │
│ Number of CPU threads: 64 │
│ Processor name: AMD EPYC 7452 32-Core Processor │
├───────────────────────────────────────────────────────────────┤
│ Simulation details │
├───────────────────────────────────────────────────────────────┤
│ Domain dimensions: 675 x 800 │
│ Medium type: 2D │
│ Simulation time steps: 3599 │
├───────────────────────────────────────────────────────────────┤
│ Initialization │
├───────────────────────────────────────────────────────────────┤
│ Memory allocation: Done │
│ Data loading: Done │
│ Elapsed time: 0.02s │
├───────────────────────────────────────────────────────────────┤
│ FFT plans creation: Done │
│ Pre-processing phase: Done │
│ Elapsed time: 0.43s │
├───────────────────────────────────────────────────────────────┤
│ Computational resources │
├───────────────────────────────────────────────────────────────┤
│ Current host memory in use: 329MB │
│ Current device memory in use: 2236MB │
│ Expected output file size: 178MB │
├───────────────────────────────────────────────────────────────┤
│ Simulation │
├──────────┬────────────────┬──────────────┬────────────────────┤
│ Progress │ Elapsed time │ Time to go │ Est. finish time │
├──────────┼────────────────┼──────────────┼────────────────────┤
│ 0% │ 0.001s │ 1.803s │ 21/09/23 14:55:11 │
│ 5% │ 0.112s │ 2.110s │ 21/09/23 14:55:12 │
│ 10% │ 0.226s │ 2.026s │ 21/09/23 14:55:12 │
│ 15% │ 0.327s │ 1.846s │ 21/09/23 14:55:11 │
│ 20% │ 0.423s │ 1.687s │ 21/09/23 14:55:11 │
│ 25% │ 0.519s │ 1.553s │ 21/09/23 14:55:11 │
│ 30% │ 0.614s │ 1.431s │ 21/09/23 14:55:11 │
│ 35% │ 0.711s │ 1.318s │ 21/09/23 14:55:12 │
│ 40% │ 0.807s │ 1.208s │ 21/09/23 14:55:12 │
│ 45% │ 0.903s │ 1.101s │ 21/09/23 14:55:12 │
│ 50% │ 0.998s │ 0.997s │ 21/09/23 14:55:11 │
│ 55% │ 1.094s │ 0.894s │ 21/09/23 14:55:11 │
│ 60% │ 1.190s │ 0.792s │ 21/09/23 14:55:11 │
│ 65% │ 1.286s │ 0.691s │ 21/09/23 14:55:11 │
│ 70% │ 1.382s │ 0.591s │ 21/09/23 14:55:11 │
│ 75% │ 1.477s │ 0.491s │ 21/09/23 14:55:11 │
│ 80% │ 1.573s │ 0.392s │ 21/09/23 14:55:11 │
│ 85% │ 1.670s │ 0.293s │ 21/09/23 14:55:12 │
│ 90% │ 1.766s │ 0.195s │ 21/09/23 14:55:12 │
│ 95% │ 1.861s │ 0.097s │ 21/09/23 14:55:12 │
├──────────┴────────────────┴──────────────┴────────────────────┤
│ Elapsed time: 1.96s │
├───────────────────────────────────────────────────────────────┤
│ Sampled data post-processing: Done │
│ Elapsed time: 0.00s │
├───────────────────────────────────────────────────────────────┤
│ Summary │
├───────────────────────────────────────────────────────────────┤
│ Peak host memory in use: 329MB │
│ Peak device memory in use: 2238MB │
├───────────────────────────────────────────────────────────────┤
│ Total execution time: 3.48s │
├───────────────────────────────────────────────────────────────┤
│ End of computation │
└───────────────────────────────────────────────────────────────┘
time_step =
1.8473e-08
number_time_steps =
3599
2023-09-21 14:55:18,334 - INFO - Simulating the acoustic forward process...[Done]
I guess the other structures will also be adjusted, as I mentioned in PR #242.
Yes, good point. I'll add them to this PR.
I think it depends on what benefit you get from using float64.
If you do choose option 1, changing the format in which data is written to disk is probably a breaking change (?), which should be done in a separate PR. So either way, this PR should be fixed to cast the volumes back to float64, as you suggested.
I don't think we actually need the precision of float64 anywhere, so I don't see a reason why we shouldn't use float32 everywhere.
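For a sense of what float32 can resolve, the machine epsilon of each type can be checked directly. Near 1.0, float32 cannot represent a difference as small as 1e-8. This is consistent with the PR loosening the so2 test tolerance from 1e-8 to 1e-7. The specific values below are illustrative only.

```python
import numpy as np

eps32 = np.finfo(np.float32).eps
eps64 = np.finfo(np.float64).eps
print(eps32)  # 1.1920929e-07
print(eps64)  # 2.220446049250313e-16

# an offset of 5e-8 is below float32's resolution at 1.0 and is rounded away
print(np.float32(1.0) + np.float32(5e-8) == np.float32(1.0))  # True
# in float64 the same offset is preserved
print(np.float64(1.0) + 5e-8 == 1.0)  # False
```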
remove print statements
Looks good.
Runtime of `create_simulation_volume()` in the test script reduces from 30.5s -> 7.2s.
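That is roughly a 4.2x speedup (30.5 / 7.2 ≈ 4.24). CUDA kernels run asynchronously, so a GPU timing is only meaningful if the queued work has finished before the timer stops. In `create_simulation_volume()` the final `.cpu()` copy already forces this. For functions that return GPU tensors, an explicit synchronization is needed. The following is a minimal sketch of such a timing helper. The name `time_call` is hypothetical and is not part of simpa.

```python
import time

import torch


def time_call(fn, *args, repeats=3, **kwargs):
    """Return the best wall-clock time in seconds of fn(*args, **kwargs) over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # do not count GPU work queued before this call
        start = time.perf_counter()
        fn(*args, **kwargs)
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # wait for kernels launched by fn to finish
        best = min(best, time.perf_counter() - start)
    return best
```

Taking the best of several runs reduces the influence of one-off costs such as CUDA context creation on the first call.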