
[Performance] Dynamic model input prediction is slow #12955

Open

MgArcher opened this issue Sep 14, 2022 · 2 comments
Labels
core runtime (issues related to core runtime)

Comments

@MgArcher

Describe the issue

Dynamic model input prediction is slow.
With an image recognition model that takes dynamic input shapes, the first prediction for each new image size takes more than 1 s, while repeating the same image takes about 0.04 s afterwards. Every input with a previously unseen shape causes a large latency spike.
How can switching between images of different sizes be made faster?

To reproduce

onnxruntime-gpu==1.9.0


import numpy as np
import onnxruntime as ort
import time

# Three inputs that differ only in the last (dynamic) dimension.
randArray1 = np.random.random_sample(size=(6, 3, 48, 375)).astype(np.float32)
randArray2 = np.random.random_sample(size=(6, 3, 48, 1044)).astype(np.float32)
randArray3 = np.random.random_sample(size=(6, 3, 48, 1537)).astype(np.float32)

model_file_path = "cls_onnx.onnx"
providers = ["CUDAExecutionProvider"]
sess = ort.InferenceSession(model_file_path, providers=providers)

# First run with a new shape: > 1 s.
input_dict = {"x": randArray3}
s = time.time()
outputs = sess.run(None, input_dict)
print(time.time() - s)

# Repeat run with the same shape: ~0.04 s.
input_dict = {"x": randArray3}
s = time.time()
outputs = sess.run(None, input_dict)
print(time.time() - s)

# Switch to a different shape: slow again.
input_dict = {"x": randArray1}
s = time.time()
outputs = sess.run(None, input_dict)
print(time.time() - s)

# Repeat with that shape: fast again.
input_dict = {"x": randArray1}
s = time.time()
outputs = sess.run(None, input_dict)
print(time.time() - s)

Urgency

No response

Platform

Windows

OS Version

Windows 10

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.9.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

No response

Model File

cls_onnx.zip

Is this a quantized model?

Unknown

@wangyems added the core runtime label on Sep 14, 2022
@wangyems
Contributor

similar to this: #6978
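If the extra latency comes from cuDNN re-running its exhaustive convolution algorithm search for every new input shape, the CUDA execution provider's cudnn_conv_algo_search option can be set to HEURISTIC or DEFAULT instead. A minimal sketch, untested against this particular model (the option name is from the ONNX Runtime CUDA EP documentation; the model path is taken from the report above):

import onnxruntime as ort

# cuDNN's algorithm search defaults to EXHAUSTIVE, which benchmarks
# convolution algorithms again whenever a new input shape appears.
# HEURISTIC (or DEFAULT) picks an algorithm without that per-shape search.
cuda_options = {"cudnn_conv_algo_search": "HEURISTIC"}
sess = ort.InferenceSession(
    "cls_onnx.onnx",
    providers=[("CUDAExecutionProvider", cuda_options)],
)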

@EmreOzkose

In my experiments, I found that ONNX Runtime optimizes the model for each input shape. I work with speech data, where inputs always have different shapes (e.g. (500, 80), (300, 80), etc.), and CPU was faster than GPU for me. Warming up the model made it faster, but I had to do the warmup for every possible input shape.
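For reference, a minimal sketch of that warmup approach, reusing the model path, input name "x", and shapes from the reproduction script above (the warmup set has to cover every shape you expect at inference time):

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("cls_onnx.onnx",
                            providers=["CUDAExecutionProvider"])

# Run one dummy inference per expected input width so the
# shape-dependent setup cost is paid before real traffic arrives.
for width in (375, 1044, 1537):
    dummy = np.zeros((6, 3, 48, width), dtype=np.float32)
    sess.run(None, {"x": dummy})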
