Describe the issue
My model runs very slowly on the first run with CUDAExecutionProvider; subsequent runs are normal. If I then run with a different input shape, the first run with that new shape is also very slow. CPUExecutionProvider does not have this issue. My model contains a Loop node, which may be related.
To reproduce
import time

import numpy as np
import onnxruntime

input1 = np.random.rand(30, 1, 118, 504).astype(np.float32)
input2 = np.random.rand(30, 1, 160, 504).astype(np.float32)

bm_onnx_path = 'test.onnx'
onnx_session = onnxruntime.InferenceSession(
    bm_onnx_path,
    providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
print(onnx_session.get_providers())

input_dict1 = {'images': input1}
input_dict2 = {'images': input2}

# First run with shape (30, 1, 118, 504): very slow on CUDA.
t1 = time.time()
onnx_pred1 = onnx_session.run(["word_probs", "word_ids"], input_dict1)[1]
st = time.time()
print("first run time ", (st - t1))

# Repeated runs with the same shape: normal speed.
for i in range(10):
    onnx_pred1 = onnx_session.run(["word_probs", "word_ids"], input_dict1)[1]
et = time.time()
print("average", (et - st) / 10)

# First run with a new shape (30, 1, 160, 504): slow again on CUDA.
onnx_pred2 = onnx_session.run(["word_probs", "word_ids"], input_dict2)[1]
st = time.time()
print("input2 first run time ", (st - et))

# Repeated runs with the new shape: normal speed.
for i in range(10):
    onnx_pred2 = onnx_session.run(["word_probs", "word_ids"], input_dict2)[1]
et = time.time()
print("average", (et - st) / 10)
It's normal and expected. On the first run with a given combination of input shapes, the allocations required to execute the model are traced. On the second run, a single block of memory large enough to serve all of those allocations is created, and offsets into that block are used during model execution. This avoids repeated allocation/free calls.
This happens once for each distinct set of input shapes, because the allocations required to execute the model depend on those shapes. For example, if the batch size is 5 in one call and 10 in another, the allocations for the second call will be twice as large.
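For reference, the memory-pattern planning described above can be toggled through SessionOptions. This is a minimal sketch, assuming the test.onnx model from the repro; note that disabling the pattern also gives up the steady-state benefit described above, and whether it affects the CUDA path may depend on the ONNX Runtime version:

import onnxruntime as ort

so = ort.SessionOptions()
# Memory-pattern planning (the trace/replay behavior described above)
# is on by default; setting it to False skips it entirely.
so.enable_mem_pattern = False

sess = ort.InferenceSession(
    'test.onnx', sess_options=so,
    providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])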
Both CPU and CUDA do the same thing on the first run, but allocation and free are slower on CUDA, so the effect is more noticeable there.
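A practical consequence: if the set of input shapes is known ahead of time, the slow first runs can be paid once at startup instead of on the first real request. A minimal warm-up sketch, reusing the session and the input/output names from the repro above:

import numpy as np

# Run each expected shape once so the allocation tracing happens at
# startup rather than during serving.
for shape in [(30, 1, 118, 504), (30, 1, 160, 504)]:
    dummy = np.zeros(shape, dtype=np.float32)
    onnx_session.run(["word_probs", "word_ids"], {"images": dummy})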
As for why the first run with a new input shape is slow, please take a look at the following for an explanation and options to mitigate it. Use those options only if you expect the model's input shapes to keep changing across runs; if the shapes are expected to stay fixed, the default settings are the best fit.
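The referenced page isn't quoted in this thread, but mitigations of this kind are typically passed as CUDA execution provider options. A hedged sketch using two documented CUDA EP options; verify against the page above for the exact settings it recommends:

import onnxruntime as ort

cuda_options = {
    # HEURISTIC avoids the exhaustive cuDNN convolution-algorithm
    # search that otherwise reruns for every new input shape.
    'cudnn_conv_algo_search': 'HEURISTIC',
    # Grow the GPU memory arena by exactly the requested amount
    # instead of rounding up to the next power of two.
    'arena_extend_strategy': 'kSameAsRequested',
}
sess = ort.InferenceSession(
    'test.onnx',
    providers=[('CUDAExecutionProvider', cuda_options),
               'CPUExecutionProvider'])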
Urgency
No response
Platform
Linux
OS Version
Ubuntu 18.04.6 LTS
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
onnxruntime-gpu 1.15.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU, CUDA
Execution Provider Library Version
CUDA 11.7
Model File
https://drive.google.com/drive/folders/10FhnkqPc6FLVCF6wbvQEZkpg6ptjUyJc?usp=sharing
Is this a quantized model?
No