/metrics only starts returning content after the container's FIRST inference call? #2570
cringelord000222 asked this question in Q&A
Hi there,
I am running TGI with Docker Compose and have tried three different versions: 2.0.4, 2.2.0 and 2.3.0. When I call the /metrics endpoint from Chrome or Postman, the response is completely blank, not even metrics with zero values.
The /metrics content only starts showing after /generate has been called once.

My pipeline handles each incoming request by querying the metrics first (in particular tgi_batch_current_size and tgi_queue_size, to check the queue) and only then sending the request. This means the first incoming request fails, because /metrics returns nothing.

Right now I have to include a hidden "first inference call" in my deployment script just to make /metrics return something (I don't mind if the values are zeros).
Am I doing things wrong?
Suggestion:
Could TGI publish all metrics with a value of 0 once the server has initialized, instead of only after the first inference call has been made?