Hi, I'm running Triton Inference Server on two NVIDIA RTX 3090 Ti GPUs with `--shm-size 20g`. Inference takes about 1.56 s per request. But if I start the server on only one GPU by setting `--gpus '"device=0"'`, the same request takes about 860 ms.
The input sequence length was 256 tokens, and I optimized GPT2-medium with your script.
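For reference, a minimal sketch of the two launch commands I'm comparing — the image tag, port mapping, and model repository path here are placeholders, not necessarily the exact ones I used:

```shell
# Two GPUs (all devices visible): inference takes ~1.56 s
docker run --rm --gpus all --shm-size 20g \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /path/to/model_repository:/models \
  nvcr.io/nvidia/tritonserver:22.07-py3 \
  tritonserver --model-repository=/models

# Single GPU (device 0 only): inference takes ~860 ms
docker run --rm --gpus '"device=0"' --shm-size 20g \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /path/to/model_repository:/models \
  nvcr.io/nvidia/tritonserver:22.07-py3 \
  tritonserver --model-repository=/models
```

Everything else (model, request, sequence length) is identical between the two runs; only the `--gpus` flag differs.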