Vivek Kale edited this page Jul 18, 2024 · 51 revisions

Kokkos Tools

Kokkos Tools provides a lightweight set of profiling and debugging tools and utilities to enhance application programmer productivity in developing performance-portable parallel programs with Kokkos. Kokkos Tools provides a tools interface with instrumentation hooks built directly into the Kokkos runtime. Compared to third-party tools, Kokkos Tools provides much cleaner and more context-specific information: in particular, the tools allow kernel-centric analysis and they use the labels provided to Kokkos objects, e.g., Kokkos kernel launches and Kokkos Views.

Under most circumstances, the profiling hooks are compiled into Kokkos executables by default. That means the set of tools works for your existing Kokkos application programs, assuming that the version for the profiling hooks is compatible with the tools version. No recompilation or changes to your build procedures are required.

Note: Kokkos_ENABLE_LIBDL must be on to load profiling hooks dynamically. It should be on by default, however.
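To make the idea of instrumentation hooks concrete, here is a minimal sketch of what a tool library might look like from the tool's side. It assumes the kokkosp_* callback names and signatures of the Kokkos profiling interface (shipped tools may declare the last argument of kokkosp_init_library with a more specific pointer type). The bookkeeping names launches_per_label and next_kernel_id are hypothetical and not taken from any shipped tool.

```cpp
// Sketch of a tool library. Assumption: Kokkos looks these symbols up by
// name after loading the shared object and calls them around each kernel.
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical bookkeeping: number of launches seen per kernel label.
static std::map<std::string, uint64_t> launches_per_label;
static uint64_t next_kernel_id = 0;

extern "C" void kokkosp_init_library(const int load_seq,
                                     const uint64_t interface_version,
                                     const uint32_t device_info_count,
                                     void* device_info) {
  (void)load_seq;
  (void)device_info_count;
  (void)device_info;
  std::printf("tool sketch loaded, interface version %llu\n",
              static_cast<unsigned long long>(interface_version));
}

extern "C" void kokkosp_begin_parallel_for(const char* name,
                                           const uint32_t device_id,
                                           uint64_t* kernel_id) {
  (void)device_id;
  assert(name != nullptr && kernel_id != nullptr);
  *kernel_id = next_kernel_id++;  // handed back to the matching end hook
  ++launches_per_label[name];     // the label is what the user gave the kernel
}

extern "C" void kokkosp_end_parallel_for(const uint64_t kernel_id) {
  (void)kernel_id;  // a timing tool would stop its timer for this id here
}

extern "C" void kokkosp_finalize_library() {
  for (const auto& entry : launches_per_label) {
    std::printf("%s: %llu launches\n", entry.first.c_str(),
                static_cast<unsigned long long>(entry.second));
  }
}
```

Built as a shared library (for example with g++ -shared -fPIC), a file like this is the kind of object that the dynamic loading described above would pick up. The sketch only handles parallel_for; a real tool would typically cover the other kernel types as well.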

General Use

To use one of the tools, you first compile it, which generates a dynamic library. Then, just before executing the Kokkos application, you set the environment variable KOKKOS_TOOLS_LIBS to point to that dynamic library. As an example, suppose you are in the directory of your application and want a reproducible run that records memory events for a sample of half of the invocations of each Kokkos kernel. You first need to build both the sampler utility and the memory events tool in Kokkos Tools.

To build the libraries with cmake, create a build directory inside the source directory of Kokkos Tools, say, YOUR_KTO_SRC_DIR, change into it, and run cmake .. with the location where you want the libraries to be installed, say, YOUR_KTO_INSTALL, to configure the build and install of the libraries.

mkdir -p ${YOUR_KTO_SRC_DIR}/mybuild; cd ${YOUR_KTO_SRC_DIR}/mybuild; cmake .. -DCMAKE_INSTALL_PREFIX=${YOUR_KTO_INSTALL}

If the above is successful, you then type make; make install to actually build and install Kokkos Tools on your machine.

Then, go back to your application code directory, and set KOKKOS_TOOLS_LIBS to the sampler utility library followed by the memory events library, separated by a semicolon.

cd -; export KOKKOS_TOOLS_LIBS="${YOUR_KTO_INSTALL}/libkp_kokkos_sampler.so;${YOUR_KTO_INSTALL}/libkp_memory_events.so"

Next, set any options for the tool and utility, such as a sampling skip rate of 2 (every other Kokkos kernel invocation) and global fencing to capture the state of memory events:

export KOKKOS_TOOLS_SAMPLER_SKIP=2;  export KOKKOS_TOOLS_GLOBALFENCES=1; 

and then run your application normally, e.g., in Bash:

myKokkosApp.exe;

Using a Kokkos Tool

To use a particular Kokkos library that is at YOUR_KOKKOS_INSTALL for building your Kokkos Tools library, add the flag -DKokkos_ROOT=${YOUR_KOKKOS_INSTALL} to the cmake .. command shown above. This is particularly important when you want to use third-party Kokkos Tools connector libraries like nvtx-connector.

Specifying Kokkos Tools as a Flag

As an alternative to the environment variable, you can pass the flag --kokkos-tools-library, with the library path as its value, to a Kokkos program executable, e.g., in Bash:

myApp.exe --kokkos-tools-library="${YOUR_KTO_SRC_DIR}/kokkos-tools/src/tools/memory-events/kp_memory_event.so"

Many of the tools in Kokkos Tools produce an output file whose name includes the hostname and the process ID.
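As an illustration of that naming scheme, the following sketch builds a filename from the hostname and process ID on a POSIX system. The helper make_output_filename and the exact pattern hostname-pid.extension are assumptions made for this example; each tool chooses its own pattern and extension.

```cpp
#include <cassert>
#include <string>
#include <unistd.h>  // gethostname, getpid (POSIX)

// Hypothetical helper: "<hostname>-<pid>.<extension>".
// The pattern is an assumption for illustration, not a documented format.
std::string make_output_filename(const std::string& extension) {
  assert(!extension.empty());
  char hostname[256] = {0};
  // Pass one byte less than the buffer so the result stays null-terminated.
  if (gethostname(hostname, sizeof(hostname) - 1) != 0) {
    return "unknown-host-" + std::to_string(getpid()) + "." + extension;
  }
  return std::string(hostname) + "-" + std::to_string(getpid()) + "." +
         extension;
}
```

Including the process ID keeps the output of separate processes (for example, MPI ranks on the same node) from overwriting each other.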

Using Makefiles

Though using cmake is recommended to build the dynamic libraries for tool utilities and tools, you can also use the Makefiles if you wish. Instead of using the cmake commands mentioned above, you would do the following:

cd ${YOUR_KTO_SRC_DIR}; cd profiling/memory-events; make; cd ../../common/kokkos-sampler/; make;

Then, you specify the resulting dynamic library files in the source directory in KOKKOS_TOOLS_LIBS, in a fashion similar to that shown above.

Explicit Instrumentation

One can explicitly add instrumentation to a library or an application. Currently, the only hooks intended for explicit programmer use are the region-related and section-related hooks. These use a push/pop model to mark coarser regions in your code.

void foo() {
   Kokkos::Profiling::pushRegion("foo");
   bar();
   stool();
   Kokkos::Profiling::popRegion();
}
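On the tool side, the push/pop model maps naturally onto a stack. The sketch below shows how a tool might time nested regions; it assumes the kokkosp_push_profile_region and kokkosp_pop_profile_region callbacks of the Kokkos profiling interface, while OpenRegion and region_stack are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical record of a region that has been pushed but not yet popped.
struct OpenRegion {
  std::string label;
  std::chrono::steady_clock::time_point start;
};

static std::vector<OpenRegion> region_stack;

// Called when the application runs Kokkos::Profiling::pushRegion("...").
extern "C" void kokkosp_push_profile_region(const char* name) {
  assert(name != nullptr);
  region_stack.push_back(OpenRegion{name, std::chrono::steady_clock::now()});
}

// Called on Kokkos::Profiling::popRegion(); closes the innermost region.
extern "C" void kokkosp_pop_profile_region() {
  if (region_stack.empty()) return;  // unmatched pop: nothing to close
  const OpenRegion region = region_stack.back();
  region_stack.pop_back();
  const double seconds =
      std::chrono::duration<double>(std::chrono::steady_clock::now() -
                                    region.start)
          .count();
  // Indent by nesting depth so inner regions appear under their parents.
  const int indent = static_cast<int>(2 * region_stack.size());
  std::printf("%*s%s: %.6f s\n", indent, "", region.label.c_str(), seconds);
}
```

Because regions nest, every push should be matched by exactly one pop on the same code path, including early returns.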

Tools

Utilities

  • KernelFilter

    A tool utility which is used in conjunction with analysis tools, to restrict the tooling to a subset of the application's Kokkos library functions.

  • KernelSampler

    A tool utility which is used in conjunction with analysis tools, to restrict the tooling to samples of Kokkos kernel invocations.
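The sampling idea can be sketched in a few lines. The example below assumes a skip rate N means "forward every N-th kernel invocation to the analysis tool"; which invocation in each group the shipped KernelSampler actually picks is not specified here, and the names read_skip_rate and InvocationSampler are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Reads the skip rate from KOKKOS_TOOLS_SAMPLER_SKIP.
// Assumption: an unset, unparsable, or zero value means "sample everything".
uint64_t read_skip_rate() {
  const char* text = std::getenv("KOKKOS_TOOLS_SAMPLER_SKIP");
  if (text == nullptr) return 1;
  const unsigned long long value = std::strtoull(text, nullptr, 10);
  return value == 0 ? 1 : static_cast<uint64_t>(value);
}

// Hypothetical sampler: decides per invocation whether to forward it.
class InvocationSampler {
 public:
  explicit InvocationSampler(uint64_t skip_rate)
      : skip_(skip_rate == 0 ? 1 : skip_rate), seen_(0) {
    assert(skip_ >= 1);
  }

  // Returns true on every skip_-th call (the 2nd, 4th, ... for skip_ == 2).
  bool should_sample() { return (++seen_ % skip_) == 0; }

 private:
  uint64_t skip_;
  uint64_t seen_;
};
```

A utility built this way would sit between Kokkos and the analysis tool, calling the tool's hooks only when should_sample() returns true.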

Memory Analysis

  • MemoryHighWater

    Outputs high water mark of memory usage of the application.

  • MemoryUsage

    Generates a per Memory Space timeline of memory utilization.

  • MemoryEvents

    Tool to track memory events such as allocation and deallocation. It also provides the information of the MemoryUsage tool.
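To show how allocation events can feed this kind of analysis, here is a sketch of a tool that tracks current and peak allocated bytes. It assumes the kokkosp_allocate_data and kokkosp_deallocate_data callbacks of the Kokkos profiling interface and a space handle that carries a 64-character name. The counters are hypothetical, and the shipped tools may measure memory differently (for example, per memory space or via the operating system).

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Assumption: mirrors the handle Kokkos passes to identify a memory space.
struct SpaceHandle {
  char name[64];
};

// Hypothetical counters, summed over all memory spaces for simplicity.
static uint64_t current_bytes = 0;
static uint64_t peak_bytes = 0;

extern "C" void kokkosp_allocate_data(const SpaceHandle space,
                                      const char* label,
                                      const void* const ptr,
                                      const uint64_t size) {
  assert(label != nullptr);
  current_bytes += size;
  if (current_bytes > peak_bytes) peak_bytes = current_bytes;
  std::printf("alloc   %s %s %p %llu\n", space.name, label,
              const_cast<void*>(ptr), static_cast<unsigned long long>(size));
}

extern "C" void kokkosp_deallocate_data(const SpaceHandle space,
                                        const char* label,
                                        const void* const ptr,
                                        const uint64_t size) {
  assert(label != nullptr);
  // Guard against underflow if an allocation was made before the tool loaded.
  current_bytes = (size <= current_bytes) ? current_bytes - size : 0;
  std::printf("dealloc %s %s %p %llu\n", space.name, label,
              const_cast<void*>(ptr), static_cast<unsigned long long>(size));
}
```

The label argument is the name given to the Kokkos View, which is what lets a memory tool report allocations in application terms rather than as raw addresses.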

Kernel Inspection

3rd Party Profiling Tool Hooks

Vendor-provided

  • nvtxConnector

    Provides Kokkos Kernel Names to nvtx so that analysis can be performed on a per-kernel basis. This was previously called the nvprof-connector.

  • nvtxFocusedConnector

    Like nvtxConnector but it turns profiling off for those kernels filtered out. It should be used in conjunction with the KernelFilter utility. This was previously called the nvprof-focused-connector.

  • roctxConnector

    Provides Kokkos Kernel Names to Roctx so that analysis can be performed on a per-kernel basis.

  • VTuneConnector

    Provides Kokkos Kernel Names to VTune so that analysis can be performed on a per-kernel basis.

  • VTuneFocusedConnector

    Like VTuneConnector but it turns profiling off for those kernels filtered out. It should be used in conjunction with the KernelFilter utility.

Open-source

  • Timemory

    Modular connector for accumulating timing, memory usage, hardware counters, and other various metrics. Supports controlling VTune, CUDA profilers, and TAU + kernel name forwarding to VTune, NVTX, TAU, Caliper, and LIKWID.

    If you need to write your own plug-in, this provides a straightforward API for writing it.

    Defining a timemory component will enable your plug-in to output to stdout, text, and JSON, accumulate statistics, and utilize various portable function calls for common needs w.r.t. timers, resource usage, etc.

Automated Tuning

  • Apex

    This is a tool for automated tuning of a large variety of programming models and languages. It is available as a submodule in the Kokkos Tools git repository. It has a hook for Kokkos and sophisticated mechanisms to tune the parameters of Kokkos functions, e.g., the team size in a Kokkos parallel_for. See Apex for more information.

  • Apollo

    This is a tool that complements Apex. Apollo provides a framework with ML-guided auto-tuning capabilities to tune arbitrary performance parameters. The Apollo Kokkos Tools connector makes this capability available to Kokkos programs.

Automated Analysis

  • Caliper

    This is a library for performance instrumentation and profiling of a large variety of programming models and languages. It is available as a git submodule in the Kokkos Tools git repository. It has a hook for Kokkos. See Caliper for more information.

Performance Monitoring

  • Sampler for LDMS

    LDMS, which stands for Lightweight Distributed Metric Service, is software for performance monitoring of HPC systems. The LDMS Kokkos Tools connector invokes functions of LDMS to extract profiling data samples from a Kokkos application program.

Contributing

The success of Kokkos Tools comes from having a collection of libraries built in-house and by the broader Kokkos community. The Kokkos Tools developers from the Kokkos team welcome contributions from other developers and from users of Kokkos alike, particularly in the form of:

  1. Developing and improving on existing tools in the set of Kokkos Tools (each tool is a connector)
  2. Creating and contributing a new tool (connector) to add to the set of tools
  3. Improving documentation for Kokkos Tools
  4. Sharing experiences from your Kokkos program use cases when using Kokkos Tools, including success stories and/or failures with respect to Kokkos Tools.

The Kokkos Tools developers have (1) general guidelines and overview of development in Kokkos Tools and (2) specific guidelines and tips for contributing via each of the four ways listed. The wiki page for the general guidelines is here and the wiki pages for specific guidelines are referenced in the paragraphs below.

For items 1 and 2, one contributes by first identifying the problem in a GitHub Issue in the Kokkos Tools repo at github.com/kokkos/kokkos-tools/issues/ and then providing a suggested solution to that Issue via a PR against the develop branch of the Kokkos Tools repo.

For items 3 and 4, please email vlkale@sandia.gov and crtrott@sandia.gov or, if you can, provide suggestions on the Kokkos Team Slack channel. For item 3, you can also submit a PR for the documentation files in the Kokkos Tools repo, e.g., README.md, Build.md. For item 4, if appropriate and with your permission, the Kokkos Tools developers will showcase your experiences in Kokkos Tools tutorials and other presentations.

Tutorials

If you would like to learn about Kokkos Tools in a more formal setting, tutorials are available. You can find a good overview of all tooling support for Kokkos (including Kokkos Tools) in the slides and recording of the NERSC 2024 Advanced Kokkos Tutorial: https://www.nersc.gov/users/training/past-training-events/2024/portability-series-kokkos-apr2024/

For slides giving an overview of Kokkos Tools, see https://github.com/kokkos/kokkos-tutorials/blob/main/Profiling/Kokkos-Profiling.pdf. A more in-depth discussion is in the YouTube video here: https://www.youtube.com/watch?v=MH6zFYGw0HU. See the latest tutorials at https://github.com/kokkos/kokkos-tutorials.

Related Projects on Tools for Kokkos

Kokkos Tools provides the gateway to analysis and improvement of Kokkos Programs. Each of the libraries of Kokkos Tools hooks directly into the Kokkos runtime library. There are other projects that also support tooling for Kokkos. These are laid out here.

  1. HPCToolkit for Kokkos - POC: John Mellor-Crummey (johnmc@rice.edu)
  2. TAU support for Kokkos - POC: Sameer Shende (sameer@cs.uoregon.edu)
  3. Automated Testing of Kokkos Programs - POC: Vivek Kale (vlkale@sandia.gov)

Additional Resources

  1. Examples: You can find examples of Kokkos Tools being used at https://github.com/DavidPoliakoff/kokkos-tools-examples.
  2. Kokkos.org has a top-level view of the Kokkos project, with a page on Kokkos Tools.
  3. Issue on using nsys and ncu for TensorRT: https://github.com/NVIDIA/TensorRT-LLM/issues/183