OpenACC/CUDA performance on juwels-booster #59
-
Looks good! If only PGI were as effective as Intel for the rest of the code :) @dsidoren Maybe we could add some artificial extra tracers for testing, imitating a BGC model, to better show the benefit of using GPUs?
-
How many GPUs are used in the OpenACC and CUDA runs?
-
I have tested the STORM test case on juwels-booster, a machine featuring two 24-core AMD EPYC Rome processors and four NVIDIA A100 GPUs per node. I used 288 MPI tasks (6 nodes) and the ParaStation MPI implementation. The results are
I have tuned both the CUDA and OpenACC versions to use 64 threads per block, to fit the 47 vertical levels.
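For readers wondering where 64 comes from, here is a minimal sketch of the arithmetic. It assumes one GPU thread per vertical level and that the block size was picked as the smallest multiple of the NVIDIA warp size (32 threads) that covers all levels; the post does not state this reasoning, so treat it as a plausible reading rather than the author's method. The helper name `threads_per_block` is hypothetical. The second half just restates the node layout from the post, assuming MPI tasks are spread evenly over all available GPUs.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def threads_per_block(levels, warp_size=WARP_SIZE):
    """Smallest multiple of warp_size that is >= levels (hypothetical helper)."""
    return -(-levels // warp_size) * warp_size  # ceiling division, then scale


# Block size for the 47 vertical levels mentioned in the post.
levels = 47
block = threads_per_block(levels)
idle_threads = block - levels  # threads in each block with no level to work on

# Node layout from the post: 288 MPI tasks on 6 nodes, 4 GPUs per node.
nodes, gpus_per_node, mpi_tasks = 6, 4, 288
tasks_per_node = mpi_tasks // nodes
gpus_available = nodes * gpus_per_node
# Assumption: every GPU is used and tasks are shared evenly among them.
tasks_per_gpu = mpi_tasks // gpus_available

print(f"threads per block: {block}, idle threads per block: {idle_threads}")
print(f"MPI tasks per node: {tasks_per_node}, GPUs available: {gpus_available}")
print(f"MPI tasks per GPU (if all GPUs are used): {tasks_per_gpu}")
```

Under these assumptions, 17 of the 64 threads in each block have no level to process, and the 48 tasks per node match the 2 x 24 CPU cores of one node.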