OLive, meaning ONNX Runtime (ORT) Go Live, is a Python package that automates the process of accelerating models with ONNX Runtime (ORT). It contains two parts: model conversion to ONNX with correctness checking, and auto performance tuning with ORT. Users can run these two together through a single pipeline or run them independently as needed.
OLive simplifies the conversion of models from multiple frameworks to ONNX by integrating existing ONNX conversion tools into a single package and validating the converted models' correctness. Currently supported frameworks are PyTorch and TensorFlow.
- TensorFlow: OLive supports conversion of TensorFlow models in saved model, frozen graph, and checkpoint format. The user needs to provide input names and output names for frozen graph and checkpoint conversion.
- PyTorch: The user needs to provide input names and shapes to convert a PyTorch model. In addition, output names and shapes are required to convert a TorchScript PyTorch model (see the sketch below).
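For context, this requirement comes from how ONNX export works underneath: the model is traced with a concrete dummy tensor, so input names and shapes must be known up front. A minimal sketch using torch.onnx.export directly (the toy model, file name, and tensor names are illustrative assumptions, not OLive's own API):

```python
import torch

# Illustrative model; any torch.nn.Module can be exported the same way.
model = torch.nn.Sequential(
    torch.nn.Linear(4, 8),
    torch.nn.ReLU(),
    torch.nn.Linear(8, 2),
).eval()

# Export traces the model with a concrete dummy tensor, which is why
# the input shape must be supplied.
dummy_input = torch.randn(1, 4)

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=13,
)
```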
ONNX Runtime (ORT) is a high-performance inference engine for running ONNX models. It exposes many advanced tuning knobs that let users further optimize inference performance. OLive heuristically explores the optimization search space in ORT to select the best ORT settings for a specific model on specific hardware. It outputs the option combinations with the best performance for latency or for throughput.
Optimization fields:
- Execution Providers:
  - MLAS (default CPU EP), Intel DNNL, and OpenVINO for CPU
  - NVIDIA CUDA and TensorRT for GPU
- Environment Variables:
  - OMP_WAIT_POLICY
  - OMP_NUM_THREADS
  - KMP_AFFINITY
  - OMP_MAX_ACTIVE_LEVELS
- Session Options:
  - inter_op_num_threads
  - intra_op_num_threads
  - execution_mode
  - graph_optimization_level
- INT8 Quantization
- Transformer Model Optimization
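To make the tuning idea concrete, the sketch below shows the kind of search OLive automates: try a few combinations of session options, measure average latency, and keep the fastest. This is an illustration only, not OLive's implementation; the model path, input name, input shape, and candidate values are assumptions.

```python
import itertools
import time

import numpy as np
import onnxruntime as ort

model_path = "model.onnx"  # assumed model file
feed = {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}  # assumed input name/shape

best = None
for intra, mode in itertools.product(
    [1, 2, 4],
    [ort.ExecutionMode.ORT_SEQUENTIAL, ort.ExecutionMode.ORT_PARALLEL],
):
    sess_options = ort.SessionOptions()
    sess_options.intra_op_num_threads = intra
    sess_options.execution_mode = mode
    sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
    sess = ort.InferenceSession(model_path, sess_options, providers=["CPUExecutionProvider"])

    # Warm up once, then average latency over a few runs.
    sess.run(None, feed)
    start = time.perf_counter()
    for _ in range(10):
        sess.run(None, feed)
    latency = (time.perf_counter() - start) / 10

    if best is None or latency < best[0]:
        best = (latency, intra, mode)

print(f"Best: {best[0] * 1000:.2f} ms with intra_op_num_threads={best[1]}, execution_mode={best[2]}")
```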
The OLive package can be downloaded here and installed with the command `pip install onnxruntime_olive-0.4.0-py3-none-any.whl`.
Supported Python versions: 3.7, 3.8, 3.9
The user needs to install CUDA and cuDNN dependencies for performance tuning with OLive on GPU. The table below shows the ORT version and the required CUDA and cuDNN versions in the latest OLive.
| ONNX Runtime | CUDA | cuDNN |
|---|---|---|
| 1.11.0 | 11.4 | 8.2 |
There are three ways to use OLive:
- Use With Command Line: Run OLive from the command line using Python.
- Use With Jupyter Notebook: Quickstart tutorial for OLive using Jupyter Notebook.
- Use With OLive Server: Set up a local OLive server for model conversion, optimization, and visualization services.
- Get the best tuning result with best_test_name, which includes inference session settings, environment variable settings, and the latency result.
- Set the related environment variables in your environment (see the sketch after this list):
  - OMP_WAIT_POLICY
  - OMP_NUM_THREADS
  - KMP_AFFINITY
  - OMP_MAX_ACTIVE_LEVELS
  - ORT_TENSORRT_FP16_ENABLE
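For example, assuming the tuning result reported the values below (the specific values are placeholders, not recommendations), they can be set from Python before onnxruntime is imported, or exported in the shell:

```python
import os

# Placeholder values -- substitute the ones reported in best_test_name.
# Set these before importing onnxruntime so the OpenMP runtime picks them up.
os.environ["OMP_WAIT_POLICY"] = "ACTIVE"
os.environ["OMP_NUM_THREADS"] = "4"
os.environ["KMP_AFFINITY"] = "granularity=fine,compact,1,0"
os.environ["OMP_MAX_ACTIVE_LEVELS"] = "1"
os.environ["ORT_TENSORRT_FP16_ENABLE"] = "1"  # only relevant when using the TensorRT EP
```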
- Create an onnxruntime inference session with the related settings:
  - inter_op_num_threads
  - intra_op_num_threads
  - execution_mode
  - graph_optimization_level
  - execution_provider
```python
import onnxruntime as ort

sess_options = ort.SessionOptions()
sess_options.inter_op_num_threads = inter_op_num_threads
sess_options.intra_op_num_threads = intra_op_num_threads
sess_options.execution_mode = execution_mode
sess_options.graph_optimization_level = ort.GraphOptimizationLevel(graph_optimization_level)
onnx_session = ort.InferenceSession(model_path, sess_options, providers=[execution_provider])
```
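Once the session is created, inference runs as usual. A short usage sketch; the input shape below is an assumption for illustration:

```python
import numpy as np

# Query the model's input name and feed a dummy tensor of the matching shape
# (the shape here is assumed for illustration).
input_name = onnx_session.get_inputs()[0].name
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = onnx_session.run(None, {input_name: dummy_input})
```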
10/28/2021
Updated OLive from Docker-container-based usage to Python-package-based usage for more flexibility.
Enabled more optimization options for performance tuning with ORT, including INT8 quantization, mixed precision in ORT-TensorRT, and transformer model optimization.
We’d love to have your contributions to OLive. Please refer to CONTRIBUTING.md.
Copyright (c) Microsoft Corporation. All rights reserved.
Licensed under the MIT License.