This program provides benchmarking tools for data movement on heterogeneous architectures, including:
- Inter-CPU data movement
- Inter-GPU data movement
- CUDA memcpy operations
- Injection bandwidth limitations
This codebase uses CMake. To compile the code:
mkdir build
cd build
cmake ..
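After configuring, build with the generated build system. On most Linux systems CMake produces Makefiles by default, so the following (an assumption about your generator) completes the build:
make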
If your system does not have CUDA-Aware MPI, you can benchmark and model communication routed through the CPU. To compile the code without CUDA-Aware MPI:
mkdir build
cd build
cmake -DCUDA_AWARE=OFF ..
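If you are unsure whether your MPI installation is CUDA-aware and it is built on Open MPI (as Spectrum MPI is), one way to check is the following; this assumes ompi_info is available, and other MPI stacks report this differently:
ompi_info --parsable --all | grep mpi_built_with_cuda_support:value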
Example runscripts for each benchmark on Summit are available in the folder 'benchmarks/summit'. These runscripts all use Spectrum MPI. Figures for benchmarks on Summit are available in the folder 'figures/summit'. A subset of these figures is published in [Modeling Data Movement Performance on Heterogeneous Architectures](https://arxiv.org/pdf/2010.10378.pdf).
Example runscripts for each benchmark on Lassen are available in the folder 'benchmarks/lassen'. Spectrum MPI results are in the subfolder 'spectrum', while benchmarks with MVAPICH2-GDR are in the subfolder 'mvapich'. Corresponding figures are in the folder 'figures/lassen'. A subset of these figures is published in [Modeling Data Movement Performance on Heterogeneous Architectures](https://arxiv.org/pdf/2010.10378.pdf).
Each of the existing benchmarks is explained below. For each benchmark, run the corresponding program from the 'examples' folder; you can then plot measurements and models with the scripts in the 'plots' folder.
The memcpy benchmark measures the cost of the cudaMemcpyAsync operation. This benchmark compares the cost of the following transfers:
- host to device
- device to host
- device to device

All data remains within a single NUMA node: for host-to-device and device-to-host copies, the host buffer and the device are on the same NUMA node, and for device-to-device transfers, both devices, along with the calling CPU core, are located on a single NUMA node.
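For reference, the following is a minimal sketch of how a single host-to-device cudaMemcpyAsync can be timed with CUDA events. It is not the code in 'examples/time_memcpy'; the buffer size, iteration count, and stream handling are illustrative assumptions.

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t n_bytes = 1 << 24;  // 16 MiB per copy (illustrative size)
    const int n_iter = 100;          // number of timed copies (illustrative)

    // Pinned host buffer and device buffer
    char *h_buf, *d_buf;
    cudaMallocHost(&h_buf, n_bytes);
    cudaMalloc(&d_buf, n_bytes);

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up copy before timing
    cudaMemcpyAsync(d_buf, h_buf, n_bytes, cudaMemcpyHostToDevice, stream);
    cudaStreamSynchronize(stream);

    // Time n_iter host-to-device copies with CUDA events
    cudaEventRecord(start, stream);
    for (int i = 0; i < n_iter; i++)
        cudaMemcpyAsync(d_buf, h_buf, n_bytes, cudaMemcpyHostToDevice, stream);
    cudaEventRecord(stop, stream);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Host to device: %f ms per %zu-byte copy\n", ms / n_iter, n_bytes);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaStreamDestroy(stream);
    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return 0;
}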
Create a folder within 'benchmarks' named after the computer on which you are running this benchmark, and create a runscript within this folder if necessary. All output from the benchmark should be saved in a file titled 'memcpy.<job_id>.out', where 'job_id' is a unique identifier for the individual run. Run the file 'examples/time_memcpy' on a single node, with one CPU core available per GPU. For best performance, the CPU core controlling each GPU should be located on the same NUMA node as that GPU. For example, this benchmark can be run on Lassen with the following:
jsrun -a4 -c4 -g4 -r1 -n1 -M "-gpu" --latency_priority=gpu-cpu --launch_distribution=packed ./time_memcpy
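The mapping of GPUs and CPU cores to NUMA domains varies by system; one way to inspect it on nodes with NVIDIA GPUs is the following, which prints the GPU/CPU affinity matrix:
nvidia-smi topo -m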
The memcpy benchmark can be plotted using the scripts within the 'plots' folder. To run these scripts, make sure the 'plots' folder, which contains the benchpress module, is added to your PYTHONPATH. For each of the plots, you can pass display_plot=True to display the plot rather than saving it to a file. The memcpy benchmarks can be plotted with the following:
from benchpress.memcpy import memcpy_plots
# Plot Host to Device and Device To Host Copies
memcpy_plots.plot_memcpy()
# Plot Device to Device Copies
memcpy_plots.plot_memcpy_d2d()
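As noted above, each plotting function accepts display_plot; for example:

# Display the host/device copy plot instead of saving it to a file
memcpy_plots.plot_memcpy(display_plot=True)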
# License
This code is distributed under the BSD 2-Clause license: http://opensource.org/licenses/BSD-2-Clause
Please see `LICENSE.txt` for more information.