In my experiments, I found that ONNX Runtime optimizes the model according to the input shape. I work with speech data, so my inputs always have different shapes (e.g. (500, 80), (300, 80), etc.). CPU was faster than GPU. Doing a warmup made the ONNX model faster, but I had to warm up for every possible input shape.
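The warmup described above can be sketched as a small helper. The input name `"x"` and the shape list in the usage note are taken from this issue's repro; the helper itself is a generic sketch, not part of the ONNX Runtime API:

```python
import numpy as np

def warmup(sess, shapes, input_name="x"):
    """Run one dummy inference per expected input shape so the
    shape-specific optimization cost is paid once, up front."""
    for shape in shapes:
        dummy = np.random.random_sample(shape).astype(np.float32)
        sess.run(None, {input_name: dummy})

# usage (assuming an ort.InferenceSession named sess):
# warmup(sess, [(6, 3, 48, 375), (6, 3, 48, 1044), (6, 3, 48, 1537)])
```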
Describe the issue
Dynamic model input prediction is slow.
An image recognition model is loaded, and the first prediction for each new image size takes more than 1 s. If the same image is predicted several times, the first run takes about 1 s and the following runs take about 0.04 s.
Every time an input matrix of a different size is passed in, the time consumption increases significantly.
How can switching between pictures of different sizes take less time?
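Since the extra cost is paid once per distinct input shape, one common mitigation (a sketch, not something confirmed by this thread) is to zero-pad the dynamic axis up to a small set of fixed "buckets", so the session only ever sees a few shapes and each one needs warming up only once. The bucket widths below are hypothetical:

```python
import numpy as np

BUCKETS = (512, 1024, 2048)  # hypothetical width buckets for the dynamic axis

def pad_to_bucket(x, buckets=BUCKETS):
    """Zero-pad the last (dynamic) axis up to the smallest bucket that
    fits, limiting how many distinct shapes the session encounters."""
    w = x.shape[-1]
    target = min(b for b in buckets if b >= w)
    pad = [(0, 0)] * (x.ndim - 1) + [(0, target - w)]
    return np.pad(x, pad)
```

Whether padding changes the model's output is model-dependent, so results should be checked against the unpadded input.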
To reproduce
onnxruntime-gpu==1.9.0
import numpy as np
import onnxruntime as ort
import time
randArray1 = np.random.random_sample(size=(6, 3, 48, 375)).astype(np.float32)
randArray2 = np.random.random_sample(size=(6, 3, 48, 1044)).astype(np.float32)
randArray3 = np.random.random_sample(size=(6, 3, 48, 1537)).astype(np.float32)
model_file_path = "cls_onnx.onnx"
providers = ["CUDAExecutionProvider"]
sess = ort.InferenceSession(model_file_path, providers=providers)
# Run each shape twice: the first run for a given shape is slow (~1 s),
# repeat runs with the same shape are fast (~0.04 s).
for randArray in (randArray3, randArray3, randArray1, randArray1):
    s = time.time()
    outputs = sess.run(None, {"x": randArray})
    print(time.time() - s)
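With the CUDA execution provider, the first run for each new shape typically includes an exhaustive cuDNN convolution-algorithm benchmark. In recent onnxruntime-gpu versions the CUDA EP exposes a `cudnn_conv_algo_search` option to switch to a cheaper heuristic search; whether onnxruntime-gpu 1.9.0 accepts per-provider options in this form is an assumption worth verifying:

```python
# Hypothetical mitigation: ask the CUDA EP to use cuDNN's heuristic
# algorithm search instead of the default exhaustive per-shape benchmark.
providers = [
    ("CUDAExecutionProvider", {"cudnn_conv_algo_search": "HEURISTIC"}),
    "CPUExecutionProvider",
]
# sess = ort.InferenceSession("cls_onnx.onnx", providers=providers)
```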
Urgency
No response
Platform
Windows
OS Version
Windows 10
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.9.0
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
No response
Model File
cls_onnx.zip
Is this a quantized model?
Unknown