
[Performance] Dynamic model input prediction is slow #12955

Open

MgArcher opened this issue Sep 14, 2022 · 2 comments
Labels
core runtime (issues related to core runtime)

Comments

@MgArcher

Describe the issue

Dynamic model input prediction is slow.
With an image recognition model that takes dynamic input shapes, the first prediction for each new image size takes more than 1 s, while repeating the same image takes about 0.04 s afterwards. Every input with a previously unseen shape causes a large latency spike.
How can switching between images of different sizes be made faster?

To reproduce

onnxruntime-gpu==1.9.0


import numpy as np
import onnxruntime as ort
import time

# Three inputs that differ only in the last (dynamic) dimension.
randArray1 = np.random.random_sample(size=(6, 3, 48, 375)).astype(np.float32)
randArray2 = np.random.random_sample(size=(6, 3, 48, 1044)).astype(np.float32)
randArray3 = np.random.random_sample(size=(6, 3, 48, 1537)).astype(np.float32)

model_file_path = "cls_onnx.onnx"
providers = ["CUDAExecutionProvider"]
sess = ort.InferenceSession(model_file_path, providers=providers)

# First run with a new shape: > 1 s.
input_dict = {"x": randArray3}
s = time.time()
outputs = sess.run(None, input_dict)
print(time.time() - s)

# Repeat run with the same shape: ~0.04 s.
input_dict = {"x": randArray3}
s = time.time()
outputs = sess.run(None, input_dict)
print(time.time() - s)

# Switch to a different shape: slow again.
input_dict = {"x": randArray1}
s = time.time()
outputs = sess.run(None, input_dict)
print(time.time() - s)

# Repeat with that shape: fast again.
input_dict = {"x": randArray1}
s = time.time()
outputs = sess.run(None, input_dict)
print(time.time() - s)

Urgency

No response

Platform

Windows

OS Version

Windows 10

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.9.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

No response

Model File

cls_onnx.zip

Is this a quantized model?

Unknown

@wangyems added the core runtime label on Sep 14, 2022
@wangyems
Contributor

similar to this: #6978
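If the extra latency comes from cuDNN re-running its exhaustive convolution algorithm search for every new input shape, the CUDA execution provider's cudnn_conv_algo_search option can be set to HEURISTIC or DEFAULT instead. A minimal sketch, untested against this particular model (the option name is from the ONNX Runtime CUDA EP documentation; the model path is taken from the report above):

import onnxruntime as ort

# cuDNN's algorithm search defaults to EXHAUSTIVE, which benchmarks
# convolution algorithms again whenever a new input shape appears.
# HEURISTIC (or DEFAULT) picks an algorithm without that per-shape search.
cuda_options = {"cudnn_conv_algo_search": "HEURISTIC"}
sess = ort.InferenceSession(
    "cls_onnx.onnx",
    providers=[("CUDAExecutionProvider", cuda_options)],
)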

@EmreOzkose

In my experiments, I found that ONNX Runtime optimizes the model for each input shape. I work with speech data, where inputs always have different shapes (e.g. (500, 80), (300, 80), etc.), and CPU was faster than GPU for me. Warming up the model made it faster, but I had to do the warmup for every possible input shape.
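For reference, a minimal sketch of that warmup approach, reusing the model path, input name "x", and shapes from the reproduction script above (the warmup set has to cover every shape you expect at inference time):

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("cls_onnx.onnx",
                            providers=["CUDAExecutionProvider"])

# Run one dummy inference per expected input width so the
# shape-dependent setup cost is paid before real traffic arrives.
for width in (375, 1044, 1537):
    dummy = np.zeros((6, 3, 48, width), dtype=np.float32)
    sess.run(None, {"x": dummy})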
