How to control request size within a batch? #2030
Unanswered
cringelord000222 asked this question in Q&A
Replies: 0 comments
Hi all,

I've tried the `--max-batch-size` flag, but it doesn't work as I expected: I thought it was supposed to limit `tgi_batch_current_size`. I'd like to control the queue size and how many requests go into each inference batch. Can someone please clarify this?

These are my launch commands and version:

My situation:
`--max-batch-size 10`: `tgi_batch_current_size` goes over 10 after a short while.

Expected situation:
`--max-batch-size 10`: `tgi_batch_current_size` stays at 10, and the other 30 requests stay in `tgi_queue_size` until the running batch is done.

TLDR: `tgi_batch_current_size` doesn't align with `--max-batch-size`.

Thanks in advance.
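For anyone trying to reproduce this, here is a minimal sketch of how the two gauges could be compared against the cap. It parses Prometheus text-format output and pulls out `tgi_batch_current_size` and `tgi_queue_size`. The helper names (`parse_metrics`, `exceeds_cap`, `fetch_metrics`) and the URL in the usage comment are illustrative assumptions, not part of TGI; adjust the host and port to your own deployment.

```python
import urllib.request

# The two gauges named in the question above.
WATCHED = ("tgi_batch_current_size", "tgi_queue_size")


def parse_metrics(text, names=WATCHED):
    """Extract the watched samples from Prometheus text-format output."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        name = line.split("{", 1)[0].split()[0]
        if name not in names:
            continue
        # The value follows the label block if present, else the name.
        rest = line.rsplit("}", 1)[1] if "}" in line else line[len(name):]
        fields = rest.split()
        if not fields:
            continue
        try:
            values[name] = float(fields[0])
        except ValueError:
            continue
    return values


def exceeds_cap(values, cap):
    """True when the reported batch size is above the configured cap."""
    return values.get("tgi_batch_current_size", 0.0) > cap


def fetch_metrics(url):
    """Download the raw metrics text (hypothetical helper; URL is an assumption)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


# Example usage (assumed address, uncomment and adapt):
# values = parse_metrics(fetch_metrics("http://localhost:8080/metrics"))
# print(values, "over cap:", exceeds_cap(values, cap=10))
```

Polling this every second or so while sending the 40 requests would show whether the batch gauge really climbs past 10 and how the queue gauge moves at the same time.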