
Two GPUs are slower than one #156

Open · OleksandrKorovii opened this issue Dec 7, 2022 · 0 comments

Comments

@OleksandrKorovii

Hi, I run the Triton web server on two NVIDIA RTX 3090 Ti GPUs with --shm-size 20g. When I do inference, I get a latency of about 1.56 s. But if I start the server with only one GPU (--gpus '"device=0"'), I get about 860 ms. The input sequence length was 256 tokens. I optimized GPT2-medium with your script:

convert_model -m gpt2-medium \
    --backend tensorrt onnx \
    --seq-len 32 512 512 \
    --task text-generation --atol=2
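
For context, the two server launches being compared presumably look something like the sketch below. This is only an assumption: the Triton image tag, port mappings, and model repository path are placeholders, not details taken from this issue.

    # Both GPUs visible to the container (the slower case, ~1.56 s):
    docker run --rm --gpus all --shm-size 20g \
        -p 8000:8000 -p 8001:8001 -p 8002:8002 \
        -v $PWD/triton_models:/models \
        nvcr.io/nvidia/tritonserver:22.11-py3 \
        tritonserver --model-repository=/models

    # One GPU visible (the faster case, ~860 ms):
    docker run --rm --gpus '"device=0"' --shm-size 20g \
        -p 8000:8000 -p 8001:8001 -p 8002:8002 \
        -v $PWD/triton_models:/models \
        nvcr.io/nvidia/tritonserver:22.11-py3 \
        tritonserver --model-repository=/models

One thing worth noting: unless the model's config.pbtxt says otherwise, Triton creates one execution instance of each model on every visible GPU, so the two-GPU run is serving two instances. For a single stream of sequential requests that adds no throughput, and the extra instance and scheduling overhead could plausibly show up as added latency.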