DeepLink-org/DLOP-Bench

Introduction

DLOP-Bench is an open-source benchmark suite for deep learning operators. It has the following three major features:

  • Operators at the deep learning framework level

We focus on operators at the deep learning framework level (such as torch.convolution) and do not dive into the implementation details of each operator (for example, implicit GEMM or Winograd implementations and the related algorithm selection). Operators can be benchmarked on a given AI accelerator as soon as that accelerator has been adapted to a deep learning framework.

  • Basic operators and domain-specific long-tail operators

Besides basic operators like convolution, pooling, and normalization, we also collect many representative domain-specific operators, mainly from object detection, instance segmentation, and other computer vision directions in OpenMMLab. These operators have no dedicated implementation on deep learning accelerators and have to resort to the Python interpreter. As a result, they are broken down into large numbers of basic operators, which incurs many function calls as well as data transfer and context switching costs. We name them long-tail operators.

  • Benchmarking deep learning accelerators, frameworks, and compilers

At the operator level, this benchmark suite provides a fine-grained assessment along several dimensions, including accelerator hardware specifications, deep learning frameworks, and deep learning compilers.
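To make the notion of a long-tail operator concrete, the sketch below writes a box-encoding operator in plain NumPy. It is a simplified illustration modeled on the usual bbox2delta formula from object detection, not the implementation shipped in this repository; the function body and the (x1, y1, x2, y2) box layout are assumptions. Note how one domain-specific operator expands into many slicing, arithmetic, and log operations, each dispatched separately from the Python interpreter.

```python
import numpy as np


def bbox2delta(proposals, gt):
    """Encode ground-truth boxes relative to proposals.

    Both inputs are float arrays of shape (N, 4) in (x1, y1, x2, y2) layout
    (an assumed layout for this sketch).
    """
    # Proposal centers and sizes: slices, adds, multiplies, subtracts.
    px = (proposals[:, 0] + proposals[:, 2]) * 0.5
    py = (proposals[:, 1] + proposals[:, 3]) * 0.5
    pw = proposals[:, 2] - proposals[:, 0]
    ph = proposals[:, 3] - proposals[:, 1]

    # Ground-truth centers and sizes: the same basic operators again.
    gx = (gt[:, 0] + gt[:, 2]) * 0.5
    gy = (gt[:, 1] + gt[:, 3]) * 0.5
    gw = gt[:, 2] - gt[:, 0]
    gh = gt[:, 3] - gt[:, 1]

    # Normalized offsets and log-scale ratios: divides and logs.
    dx = (gx - px) / pw
    dy = (gy - py) / ph
    dw = np.log(gw / pw)
    dh = np.log(gh / ph)

    # One final stack to assemble the (N, 4) result.
    return np.stack([dx, dy, dw, dh], axis=-1)


proposals = np.array([[0.0, 0.0, 10.0, 10.0]])
gt = np.array([[5.0, 5.0, 15.0, 15.0]])
deltas = bbox2delta(proposals, gt)
print(deltas)
```

In eager mode, every line above launches at least one separate basic operator, which is the overhead the long-tail samples are meant to expose.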

Highlights

  • Execution framework. The main body is an execution engine, compatible with different deep learning frameworks (PyTorch, TensorFlow, JAX, and so on) with different execution modes, such as eager and graph mode.
  • 200+ basic operators. We collected the operators from models in OpenMMLab. The input information of each operator consists of two parts: the input tensor shapes and the attribute information. We run the models, record the input configurations of each operator, and save them in CSV format for evaluation.
  • 100+ long-tail samples. We collected 100+ long-tail samples with representative syntax features from different deep learning models, mainly from OpenMMLab. See samples for more detail.
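As an illustration of what a recorded input configuration (tensor shape plus attributes) could look like, here is a minimal sketch that parses such records from CSV text and generates random inputs. The column names `input_shape` and `attrs`, the JSON encoding of each cell, and the helper names are assumptions made for this example; the actual file layout used by the benchmark may differ.

```python
import csv
import io
import json

import numpy as np

# Hypothetical CSV content: one row per recorded input configuration.
CSV_TEXT = '''input_shape,attrs
"[2, 3, 8, 8]","{""kernel_size"": 3, ""stride"": 1}"
"[1, 3, 16, 16]","{""kernel_size"": 2, ""stride"": 2}"
'''


def load_configs(text):
    """Parse rows into (shape tuple, attribute dict) pairs."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        (tuple(json.loads(row["input_shape"])), json.loads(row["attrs"]))
        for row in rows
    ]


def gen_input(shape, seed=0):
    """Create a reproducible float32 input tensor of the given shape."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(shape).astype(np.float32)


configs = load_configs(CSV_TEXT)
for shape, attrs in configs:
    x = gen_input(shape)
    print(x.shape, attrs)
```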

Getting Started

First, download the latest source code:

git clone https://github.com/OpenComputeLab/DLOP-Bench.git

To show the structure of the source code, use the following command:

cd DLOP-Bench
tree -d -L 1 ./bench

The implementations of basic and long-tail operators are located in ./bench/samples/.

Dependencies

The code is tested under Python 3 with different deep learning frameworks (PyTorch, TensorFlow, JAX, and so on). Select a framework version that matches your CUDA/cuDNN version. For more details, please refer to the frameworks' official websites.

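Some samples depend on OpenCV (the cv2 module). Install one of the following packages; opencv-python-headless is the variant without GUI support, intended for server environments: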

pip install opencv-python
pip install opencv-python-headless

Basic Operators

Here is a command demo that illustrates how you can use DLOP-Bench to test basic operators.

# add bench to PYTHONPATH
cd DLOP-Bench
export PYTHONPATH=./bench:$PYTHONPATH
# To test sample performance using the torch backend, follow the demo below.
# Prepare the PyTorch environment first (Python 3 with torch 1.10 or 1.12 recommended).
...
# run the operator abs using torch backend; for more profiling results, refer to profiler_reulsts, reulsts, and time_reulsts
FRAMEWORK=torch python ./bench/api/api.py -c abs -st 1
# run the operator abs and absBP using torch backend
FRAMEWORK=torch python ./bench/api/api.py -c abs,absBP -st 1
# get more usage information
FRAMEWORK=torch python ./bench/api/api.py --help

Long-tail Operators

For long-tail operators, this benchmark suite provides the following stages to test their performance:

  • stage 1: eager mode.
  • stage 2: graph mode with JIT.

This benchmark suite supports the execution of all long-tail operators in stage 1, while some operators fail to run in stage 2 because they are unsupported by the given deep learning compiler. Here is a command demo to test long-tail operators.

# run the operator bbox2delta using torch backend in eager mode
FRAMEWORK=torch python ./bench/api/api.py -c bbox2delta -st 1
# run the operator bbox2delta using torch backend in both eager mode and graph mode
FRAMEWORK=torch python ./bench/api/api.py -c bbox2delta -st 1,2
# run the operator bbox2delta and l2_loss using torch backend in both eager mode and graph mode
FRAMEWORK=torch python ./bench/api/api.py -c bbox2delta,l2_loss -st 1,2
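The two-stage idea can be sketched independently of any framework: time the operator as written (stage 1), then try to compile it and time the compiled version (stage 2), recording a failure instead of aborting when the compiler does not support the operator. The helper names `time_call` and `run_stages` and the `compile_fn` parameter are hypothetical and are not part of the benchmark's API; with PyTorch, `compile_fn` could be something like `torch.jit.script`.

```python
import time


def time_call(fn, args, repeat=10):
    """Return the mean wall-clock seconds per call over `repeat` runs."""
    fn(*args)  # warm-up call, also triggers any lazy compilation
    start = time.perf_counter()
    for _ in range(repeat):
        fn(*args)
    return (time.perf_counter() - start) / repeat


def run_stages(fn, args, compile_fn, stages=(1, 2)):
    """Time `fn` in eager mode (stage 1) and compiled mode (stage 2)."""
    results = {}
    if 1 in stages:
        results[1] = time_call(fn, args)
    if 2 in stages:
        try:
            compiled = compile_fn(fn)
            results[2] = time_call(compiled, args)
        except Exception as exc:
            # The compiler rejected the operator: record it and move on.
            results[2] = f"unsupported: {exc}"
    return results


def add(a, b):
    return a + b


def failing_compiler(fn):
    # Simulates a compiler that cannot handle this operator.
    raise NotImplementedError("op not supported")


print(run_stages(add, (1, 2), compile_fn=lambda f: f))
print(run_stages(add, (1, 2), compile_fn=failing_compiler))
```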

The same commands also work with the torch, tensorflow, or xla backends; just set the FRAMEWORK environment variable accordingly. While all operators can be tested with the torch backend, some operators may raise an AssertionError in other backends if their implementations have not been added yet. You can wait for our update or add the code yourself.

To test sample performance with the TensorFlow or XLA backend, see the following demo:

# prepare tensorflow environment
...
# run the operator bbox2offset using tf backend in eager mode
FRAMEWORK=tf TF_XLA_FLAGS=--tf_xla_auto_jit=2 XLA_FLAGS=--xla_gpu_cuda_data_dir=.../cuda-10.1 python ./bench/api/api.py -c bbox2offset -st 1
# run the operator bbox2offset using tf backend in both eager mode and graph mode
FRAMEWORK=tf TF_XLA_FLAGS=--tf_xla_auto_jit=2 XLA_FLAGS=--xla_gpu_cuda_data_dir=.../cuda-10.1 python ./bench/api/api.py -c bbox2offset -st 1,2
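When sweeping over many operators or backends, it can be convenient to assemble these commands from a script. The sketch below only builds the argument list and environment, using the `-c` and `-st` flags and the FRAMEWORK variable shown in the demos above; the helper name `build_command` is hypothetical, and the script assumes it is run from the repository root with ./bench on PYTHONPATH.

```python
import os
import sys


def build_command(operators, stages=(1,), framework="torch",
                  script="./bench/api/api.py"):
    """Build the argv list and environment for one benchmark run."""
    cmd = [
        sys.executable, script,
        "-c", ",".join(operators),
        "-st", ",".join(str(s) for s in stages),
    ]
    # Copy the current environment and select the backend.
    env = dict(os.environ, FRAMEWORK=framework)
    return cmd, env


cmd, env = build_command(["bbox2delta", "l2_loss"], stages=(1, 2))
print(" ".join(cmd[1:]))  # ./bench/api/api.py -c bbox2delta,l2_loss -st 1,2

# To launch the run from the repository root:
# import subprocess
# subprocess.run(cmd, env=env, check=True)
```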

How to add a new operator

  • Create a folder named after the operator in the ./bench/samples/basic directory.
  • Copy the JSON file containing the operator's parameter information table, generated by the operator acquisition module, into the folder.
  • Create __init__.py and torch_impl.py. To test the operator in other frameworks, use torch_impl.py as a reference. In __init__.py, implement two functions, get_sample_config and gen_np_args, and register them using register_sample. In torch_impl.py, implement the function args_adaptor, which performs data preparation, and define the operator you are adding. Finally, implement an executor_creator function to register these two into the benchmark.
