The purpose here is to write a simple Kokkos program that evaluates a loop from i = 0 to i = 5. The loop calculates c[i] = a[i] * b[i] for all i on a GPU. Initialize the arrays a and b, e.g., as follows:
for (unsigned i = 0; i < n; i++) {
  a[i] = i;
  b[i] = 1;
}
At the end, the program should run a verification loop (without Kokkos) that prints each c[i] so that you can check that the results are correct.
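As a reference for what the verification loop should print, here is a minimal host-only sketch with no Kokkos and no GPU. It assumes `std::vector<int>` storage, and the helper names `reference_product` and `verify` are hypothetical, chosen only for illustration. In the actual exercise the multiplication runs inside `Kokkos::parallel_for`.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical host-only helper (not part of Kokkos):
// fills a[i] = i and b[i] = 1, then returns c with c[i] = a[i] * b[i].
std::vector<int> reference_product(unsigned n) {
  std::vector<int> a(n), b(n), c(n);
  for (unsigned i = 0; i < n; i++) {
    a[i] = static_cast<int>(i);
    b[i] = 1;
  }
  for (unsigned i = 0; i < n; i++) {
    c[i] = a[i] * b[i];
  }
  return c;
}

// Verification loop: prints each c[i] and checks it against i,
// because b[i] = 1 implies c[i] = a[i] = i.
void verify(const std::vector<int>& c) {
  for (unsigned i = 0; i < c.size(); i++) {
    std::printf("c[%u] = %d\n", i, c[i]);
    assert(c[i] == static_cast<int>(i));
  }
}
```

Calling `verify(reference_product(6))` from `main` covers i = 0 to i = 5 (assuming n = 6) and prints `c[0] = 0` through `c[5] = 5`. The GPU version should print the same values.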
In addition to Kokkos::initialize() and Kokkos::finalize(), you will need Kokkos::parallel_for and Kokkos::fence in this exercise. Furthermore, two different memory management strategies are investigated here:
- The first uses Kokkos::kokkos_malloc and Kokkos::kokkos_free.
- The second uses Kokkos::View and Kokkos::deep_copy.
- If not already done, enter the directory /path/higher-level-gpu-programming/exercises/kokkos/ and clone Kokkos with git clone https://github.com/kokkos/kokkos.git. The Kokkos repository should now be located in /path/higher-level-gpu-programming/exercises/kokkos/kokkos/ (you can use a different location, but this location is hardcoded in the solution Makefiles).
- Then create a source file and a Makefile and type make. If you encounter compilation errors, make sure the backend compiler for the desired architecture is available, i.e., nvcc for Mahti (use module load cuda) or hipcc for Lumi (use module load rocm). Hint: you can use the Makefile from the solution folder as a reference and just change the Kokkos path and the file name.
- Run on Lumi or Mahti with srun ./executable (add the required flags according to the underlying system and user, e.g., --account=XXX, --partition=YYY, etc.).
Example Kokkos implementations (.cpp) based on ENCCS material (CC-BY-4.0 license) are given in the solution folders. However, the intention is to try solving the exercise first without looking into these.