DPBench - Numba/Native Benchmarks

This repository contains a set of benchmarks for evaluating the performance of Numba's JIT compilation functionality for Intel GPUs. The repository is structured as follows:

numba : Contains Numba implementations of the benchmarks. Each benchmark directory contains two sub-directories, CPU and GPU, which hold the CPU and GPU Numba implementations of the benchmark, respectively.
native : Contains C++/OpenMP implementations of the benchmarks. The CPU implementations (in the CPU sub-directory) use OpenMP parallel semantics; the GPU implementations use OpenMP offload.
native_dpcpp : Contains DPC++ implementations of the benchmarks.
dpnp : Contains dpnp implementations for a subset of the benchmarks.

In addition to the implementations, this repository contains a set of Python scripts that build and run the benchmark programs. The scripts can also plot bar graphs showing the performance throughput of the benchmarks for the executed implementations.

The primary interface for running the benchmarks is the automate_run.py script. It accepts the following options:

-r, --run : "execute" the benchmark(s) or "plot" performance data to generate graphs (Default: "all" if unspecified, which does both)
-ws : name(s) of the benchmark(s) to execute, or "all" to execute every benchmark (Default: "all" if unspecified)
-i, --impl : execute the "native", "numba", "dpnp", or "native_dpcpp" implementation (Default: "all" if unspecified, which runs every implementation)
-k : execute the dppy.kernel implementation, if available. This option can be used only if "-i" is set to "numba"
-p, --platform : execute the "cpu" or "gpu" implementation (Default: "all" if unspecified, which runs both)
-a, --analysis : selects the type of execution. Four analysis options are currently supported: "test" runs the benchmark with the smallest input and is suitable for checking that the benchmark works; "perf" runs the benchmark on varying inputs and generates performance data; "vtune" and "advisor" run the benchmark under the Intel VTune and Intel Advisor profiling tools, respectively.

Run python automate_run.py -h to list all options and the arguments each one accepts.
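For readers scripting around these options, the following is a hypothetical sketch of how the documented flags could be declared with Python's standard argparse module. It is not the repository's own parser and may differ from it; the choices and defaults are taken from the option list above, and the default for -a is left unset because it is not documented here.

```python
import argparse

# Hypothetical sketch: mirrors the options documented for automate_run.py,
# but it is NOT the repository's actual argument parser.
IMPLS = ["native", "numba", "dpnp", "native_dpcpp", "all"]


def build_parser():
    parser = argparse.ArgumentParser(description="Run or plot DPBench benchmarks")
    parser.add_argument("-r", "--run", choices=["execute", "plot", "all"],
                        default="all", help='"execute", "plot", or "all" (both)')
    # -ws takes one or more benchmark names; argparse derives dest="ws".
    parser.add_argument("-ws", nargs="+", default=["all"],
                        help='benchmark name(s), or "all"')
    parser.add_argument("-i", "--impl", choices=IMPLS, default="all")
    parser.add_argument("-k", action="store_true",
                        help="use the dppy.kernel implementation (numba only)")
    parser.add_argument("-p", "--platform", choices=["cpu", "gpu", "all"],
                        default="all")
    # The README does not document a default for -a, so none is set here.
    parser.add_argument("-a", "--analysis",
                        choices=["test", "perf", "vtune", "advisor"])
    return parser


def parse_args(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    # The README states that -k is valid only together with -i numba.
    if args.k and args.impl != "numba":
        parser.error("-k can only be used when -i is set to numba")
    return args
```

argparse adds the -h flag automatically, which is consistent with the help behavior described above.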

Note: To obtain a GPU roofline graph using Intel Advisor, set the dev.i915.perf_stream_paranoid sysctl option to 0 using sudo sysctl -w dev.i915.perf_stream_paranoid=0. This change is temporary and is lost on reboot, so the command must be run again after every reboot. More details on obtaining a GPU roofline using Intel Advisor can be found at this link.
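Because the setting does not survive a reboot, it can help to check it before starting an Advisor run. A minimal sketch, assuming the standard Linux mapping of the sysctl key to the file /proc/sys/dev/i915/perf_stream_paranoid; the function name gpu_roofline_ready is hypothetical and not part of the repository:

```python
from pathlib import Path

# The sysctl key dev.i915.perf_stream_paranoid is exposed under /proc/sys,
# with the dots in the key becoming path separators.
PARANOID_PATH = Path("/proc/sys/dev/i915/perf_stream_paranoid")


def gpu_roofline_ready(path=PARANOID_PATH):
    """Return True if perf_stream_paranoid is 0, False if it holds another value.

    Returns None when the file does not exist (e.g. the i915 driver is not loaded).
    """
    try:
        value = Path(path).read_text().strip()
    except FileNotFoundError:
        return None
    return value == "0"


if __name__ == "__main__":
    status = gpu_roofline_ready()
    if status is None:
        print("dev.i915.perf_stream_paranoid not found; is the i915 driver loaded?")
    elif not status:
        print("Run: sudo sysctl -w dev.i915.perf_stream_paranoid=0")
    else:
        print("perf_stream_paranoid is 0; GPU roofline collection should work.")
```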

Examples of running the benchmarks

To generate performance data, plot graphs, and produce VTune profiles and Advisor roofline graphs for the CPU and GPU implementations of all benchmarks:
```
 $ python automate_run.py
```
To generate performance data for the Numba implementations only (CPU and GPU):
```
 $ python automate_run.py -r execute -i numba -a perf
```
To generate Advisor roofline graphs for the native GPU implementations of all benchmarks:
```
 $ python automate_run.py -a advisor -i native -p gpu
```
The roofline graph for each benchmark can be found at <path/to/native/benchmark/directory>/GPU/roofline/roofline.html.
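To collect the reports for all benchmarks at once, a small sketch, assuming it is run from the repository root and that each native benchmark has its own directory directly under native/ following the path pattern above; find_roofline_reports is a hypothetical helper, not part of the repository:

```python
from pathlib import Path


def find_roofline_reports(native_dir="native"):
    """Return sorted paths matching <native_dir>/<benchmark>/GPU/roofline/roofline.html.

    Assumes each benchmark lives in its own directory directly under native_dir.
    """
    return sorted(Path(native_dir).glob("*/GPU/roofline/roofline.html"))


if __name__ == "__main__":
    for report in find_roofline_reports():
        # The benchmark name is the directory that contains GPU/.
        print(report.parts[-4], report)
```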

Note: Before collecting a GPU roofline with Intel Advisor, ensure that the dev.i915.perf_stream_paranoid sysctl option is set to 0. If it is not, set it with sudo sysctl -w dev.i915.perf_stream_paranoid=0.

To generate VTune profiles for the Numba CPU implementations of the kmeans and pairwise_distance benchmarks:
```
 $ python automate_run.py -a vtune -i numba -p cpu -ws kmeans pairwise_distance
```

To run the "test" version of the l2_distance Numba GPU benchmark:
```
 $ python automate_run.py -ws l2_distance -a test -p gpu -i numba
```

To plot graphs from all existing performance data (no execution):
```
 $ python automate_run.py -r plot
```
To plot a graph comparing a specific benchmark's Numba performance (CPU vs. GPU):
```
 $ python automate_run.py -r plot -ws kmeans -i numba
```

To plot a graph comparing the GPU performance of a set of benchmarks (Numba vs. native):
```
 $ python automate_run.py -r plot -ws kmeans blackscholes l2_distance -p gpu
```
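When several of the runs above need to be chained in a script, a hypothetical helper such as build_command can assemble the argument list from the documented options. It is a sketch, not part of the repository, and only emits flags listed in the option table; options left as None are omitted so automate_run.py falls back to its own defaults.

```python
import sys


def build_command(run=None, workloads=None, impl=None, kernel=False,
                  platform=None, analysis=None, script="automate_run.py"):
    """Build an automate_run.py argument list from the documented options.

    Hypothetical helper: options left as None (or False) are omitted.
    """
    # The README restricts -k to the Numba implementation.
    if kernel and impl != "numba":
        raise ValueError("-k requires impl='numba'")
    # sys.executable runs the script under the current Python interpreter.
    cmd = [sys.executable, script]
    if run:
        cmd += ["-r", run]
    if workloads:
        cmd += ["-ws", *workloads]
    if impl:
        cmd += ["-i", impl]
    if kernel:
        cmd.append("-k")
    if platform:
        cmd += ["-p", platform]
    if analysis:
        cmd += ["-a", analysis]
    return cmd
```

For example, subprocess.run(build_command(run="plot", workloads=["kmeans", "blackscholes", "l2_distance"], platform="gpu"), check=True), run from the repository root, corresponds to the last command above. Using sys.executable rather than a bare python keeps the benchmark script on the same interpreter and environment as the calling script.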
