In this exercise we examine some important performance aspects of MPI programs that use multiple GPUs. As an example, we use the three-dimensional heat equation solver.
- Build the basic version with the provided Makefile. Remember to load the correct modules before building for GPU.
  Try to run the code with different numbers of GPUs, e.g. 1, 2, 4, 8, and 16 (two nodes). Use the same number of MPI tasks per node as GPUs per node.
  Does the program scale? How does it perform in comparison to the CPU version in the "Scalability" exercise? A sketch of a possible batch script is shown below.
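  As a starting point, a batch script along the following lines could be used. This is only a sketch: the job name, account, and partition are placeholders that must be adapted to your project and system, and the binary name `./heat_offload` is the one used in the later steps of this exercise.

  ```bash
  #!/bin/bash
  #SBATCH --job-name=heat_multigpu      # placeholder name
  #SBATCH --account=<project>           # placeholder: your project/account
  #SBATCH --partition=<gpu-partition>   # placeholder: a GPU partition
  #SBATCH --nodes=1                     # use 2 nodes for the 16-GPU run
  #SBATCH --ntasks-per-node=8           # one MPI task per GPU
  #SBATCH --gpus-per-node=8             # vary together with the task count: 1, 2, 4, or 8
  #SBATCH --time=00:15:00

  srun ./heat_offload
  ```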
- In the basic version of the program there is no multi-GPU awareness. Because of that, all the MPI tasks within a node use the same GPU (id=0) and performance suffers.
  It is possible to overcome this with a wrapper script that makes only a particular GPU visible to each MPI task. Modify the batch job script as follows and rerun the experiment from step 1:

  ```bash
  #SBATCH ...

  cat << EOF > select_gpu
  #!/bin/bash
  export ROCR_VISIBLE_DEVICES=\$SLURM_LOCALID
  exec \$*
  EOF

  chmod u+x select_gpu

  srun ./select_gpu ./heat_offload

  rm -rf ./select_gpu
  ```
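  If you want to verify that the wrapper does what it should, one quick sanity check (not required for the exercise) is to print the visible device of each task through the wrapper; every task within a node should report a different id:

  ```bash
  # each task within a node should print a different device id (0, 1, 2, ...)
  srun ./select_gpu printenv ROCR_VISIBLE_DEVICES
  ```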
- Enable multi-GPU awareness in the program by uncommenting the `SETDEVICE` variable in the Makefile and rebuilding. Now, do not use the `select_gpu` wrapper (using it would not affect performance, but the number of GPUs reported by the application would not be correct).
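  With device selection done in the program itself, the launch line in the batch script can, for example, simply be:

  ```bash
  # no wrapper: the application built with SETDEVICE selects its own GPU
  srun ./heat_offload
  ```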
- On LUMI, it is possible to perform MPI communication directly from the GPUs. However, the basic version of the program performs all MPI communication from the CPUs, which can have a significant performance impact.
  Build a GPU-aware MPI version by uncommenting the `GPU_MPI` variable in the Makefile. In addition, at runtime one needs to set the environment variable `MPICH_GPU_SUPPORT_ENABLED`, so add `export MPICH_GPU_SUPPORT_ENABLED=1` to the batch job script (see the excerpt below).
  How are the performance and scalability now? Compare the performance again to the CPU version of the "Scalability" exercise.
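  For example, the relevant part of the batch job script could then look like this:

  ```bash
  # enable GPU-aware MPI in the MPI library at runtime
  export MPICH_GPU_SUPPORT_ENABLED=1

  srun ./heat_offload
  ```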