VTuneConnector
The Kokkos Tools VTuneConnector inserts instrumentation for Intel VTune. Kernels are marked through VTune's domain/frame interface. That is, each kernel is identified with a specific domain, and each individual call to the kernel is a frame of that domain. If the developer provides a string label for the parallel region, it is used as the domain identifier. Otherwise, the C++ type name of the functor or lambda is used.
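The domain/frame mapping described above can be sketched without VTune at all. The following is a minimal mock, not the connector's actual source: it uses the Kokkos Tools callback names kokkosp_begin_parallel_for and kokkosp_end_parallel_for, but the DomainStats struct and the g_domains, g_active, and g_next_id variables are hypothetical stand-ins for the bookkeeping that VTune would do. One map entry plays the role of a domain, and each begin/end pair plays the role of a frame.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical bookkeeping standing in for a VTune domain and its frames.
struct DomainStats {
  std::uint64_t frames_begun = 0;  // kernel launches seen for this name
  std::uint64_t frames_ended = 0;  // kernel launches that have completed
};

// Assumed globals for this sketch only (not part of any real tool).
static std::map<std::string, DomainStats> g_domains;   // kernel name -> domain
static std::map<std::uint64_t, std::string> g_active;  // kernel id -> kernel name
static std::uint64_t g_next_id = 0;

// Called when a parallel_for starts. `name` is the user-provided label, or
// the functor/lambda type name when no label was given.
extern "C" void kokkosp_begin_parallel_for(const char* name,
                                           const std::uint32_t /*devID*/,
                                           std::uint64_t* kID) {
  *kID = g_next_id++;              // hand the caller an id for the matching end call
  g_active[*kID] = name;           // remember which domain this launch belongs to
  ++g_domains[name].frames_begun;  // creates the domain on first use
}

// Called when the parallel_for identified by `kID` finishes.
extern "C" void kokkosp_end_parallel_for(const std::uint64_t kID) {
  auto it = g_active.find(kID);
  assert(it != g_active.end());    // every end must match an earlier begin
  ++g_domains[it->second].frames_ended;
  g_active.erase(it);
}
```

In this mock, two launches of a kernel labeled "AXPB" leave a single entry in g_domains with two frames, which mirrors one domain with two frames in VTune.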
The tool is located at: https://github.com/kokkos/kokkos-tools/tree/develop/profiling/vtune-connector
The Makefile needs to know where VTune's home directory is. Other than that, simply type make inside the source directory. When compiling for specific platforms, modify the simple Makefile to use the correct compiler and link flags. Alternatively, you can use CMake to build the VTuneConnector along with the other connectors by creating a new folder and then typing cmake .. inside it.
This is a standard tool which does not yet support tool chaining. Modify your VTune run environment to include:
KOKKOS_PROFILE_LIBRARY={PATH_TO_TOOL_DIRECTORY}/kp_vtune_connector.so
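If no kernel domains show up in VTune, a common first check is whether the environment variable was set and points at an existing file. The helper below is a small sketch of such a check. ToolLibStatus and check_tool_library are hypothetical names introduced for illustration; they are not part of Kokkos or Kokkos Tools.

```cpp
#include <cassert>
#include <cstdlib>
#include <fstream>

// Hypothetical result type for this sketch.
enum class ToolLibStatus { NotSet, FileMissing, Ok };

// Reports whether the named environment variable is set and whether the
// path it holds can be opened for reading.
ToolLibStatus check_tool_library(const char* var = "KOKKOS_PROFILE_LIBRARY") {
  assert(var != nullptr);
  const char* path = std::getenv(var);
  if (path == nullptr || *path == '\0') {
    return ToolLibStatus::NotSet;  // variable absent or empty
  }
  std::ifstream file(path, std::ios::binary);
  return file.good() ? ToolLibStatus::Ok : ToolLibStatus::FileMissing;
}
```

Calling check_tool_library() at the start of a debug run separates "variable not exported into the VTune run environment" from "path is wrong". It does not verify that the file is a loadable shared library.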
This tool's additional memory footprint is dwarfed by the memory usage of VTune during profiling.
Switch to the domain/frame based view inside VTune to analyze your application with a focus on kernels.
Consider the following code:
#include <Kokkos_Core.hpp>
#include <cstdio>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    int N = 100000000;
    Kokkos::View<double*> a("A", N);
    Kokkos::View<double*> b("B", N);
    Kokkos::View<double*> c("C", N);

    // Unlabeled kernel: its domain is named after the lambda's type.
    Kokkos::parallel_for(N, KOKKOS_LAMBDA (const int& i) {
      a(i) = 1.0*i;
      b(i) = 1.5*i;
      c(i) = 0.0;
    });

    double result = 0.0;
    for(int k = 0; k<50; k++) {
      // Labeled kernel: each of the 50 calls is one frame in domain "AXPB".
      Kokkos::parallel_for("AXPB", N, KOKKOS_LAMBDA (const int& i) {
        c(i) = 1.0*k*a(i) + b(i);
      });

      // Labeled reduction: each call is one frame in domain "Dot".
      double dot;
      Kokkos::parallel_reduce("Dot", N, KOKKOS_LAMBDA (const int& i, double& lsum) {
        lsum += c(i)*c(i);
      }, dot);
      result += dot;
    }
    printf("Result: %lf\n", result);
  }
  Kokkos::finalize();
}
And here is a screenshot in VTune of the Bottom-up Frame/Domain view. The kernel names are used for the domains, and individual calls with the same name are frames in that domain. Note how the unlabeled lambda was assigned a compiler-generated type name (Z4mainEUlRKiE_). Demangling can translate this into "main::{lambda(int const&)#1}". These lambda names are compiler-dependent.
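The demangling step can be done programmatically. The sketch below assumes a GCC or Clang toolchain, where the header cxxabi.h provides abi::__cxa_demangle; the demangle wrapper itself is a hypothetical helper written for this example. The exact spelling of the demangled lambda name depends on the C++ runtime in use, so treat the quoted form above as what one implementation prints.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>  // GCC/Clang specific; not available with MSVC
#include <string>

// Returns a human-readable form of a mangled type name, or the input
// unchanged if the runtime cannot demangle it.
std::string demangle(const char* mangled) {
  assert(mangled != nullptr);
  int status = 0;
  // With null buffer arguments, __cxa_demangle allocates the result with malloc.
  char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  if (status != 0 || out == nullptr) {
    return mangled;  // fall back to the raw name
  }
  std::string result(out);
  std::free(out);    // the caller owns the malloc'ed buffer
  return result;
}
```

Passing the type name from the text, demangle("Z4mainEUlRKiE_"), should yield a string that identifies a lambda taking a const int reference inside main.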
SAND2017-3786