/metrics only start returning contents after container's FIRST inference was called? #2570

cringelord000222 · 2024-09-26T03:10:57Z

Hi there,

I'm running TGI with Docker Compose and have tried three versions: 2.0.4, 2.2.0, and 2.3.0.

I've called the /metrics endpoint from both Chrome and Postman, and it returns a completely blank response, not even metrics with zero values.
/metrics only starts returning content after /generate has been called once.

My pipeline handles each incoming request by querying the metrics first (in particular tgi_batch_current_size and tgi_queue_size, to check the queue) and only then sending the request. This means the first incoming request gets an error, because the metrics come back blank.
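As a client-side stopgap, the metrics check could treat a missing metric as zero instead of failing. This is only a rough sketch, assuming /metrics returns the Prometheus text format; `read_gauge` is a hypothetical helper name, not part of TGI, and it does not handle label values that contain `}`.

```python
def read_gauge(metrics_text: str, name: str, default: float = 0.0) -> float:
    """Return the first sample value for `name`, or `default` if it is absent.

    Hypothetical helper: parses Prometheus text exposition output by hand.
    """
    for raw_line in metrics_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if "{" in line and "}" in line:
            # Labelled sample: name{label="x"} value [timestamp]
            sample_name = line.split("{", 1)[0]
            rest = line.rsplit("}", 1)[1].split()
        else:
            # Unlabelled sample: name value [timestamp]
            parts = line.split()
            sample_name, rest = parts[0], parts[1:]
        if sample_name == name and rest:
            return float(rest[0])
    # Metric not published yet (e.g. blank response before the first inference).
    return default
```

With this, a blank response reads as an empty queue: `read_gauge("", "tgi_queue_size")` gives `0.0`, so the first request would go through instead of erroring.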

Right now I have to include a hidden "first inference call" in my deployment script to trigger the metrics to return something (I don't mind if they return zeros).
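For reference, the warm-up call looks roughly like the sketch below. The function names and the base URL passed in are placeholders for illustration, and the prompt text is arbitrary; the request body uses the `inputs` and `parameters.max_new_tokens` fields of TGI's /generate endpoint.

```python
import json
import urllib.request


def build_warmup_request(base_url: str) -> urllib.request.Request:
    """Build a minimal POST to /generate that asks for a single new token."""
    payload = {"inputs": "warm-up", "parameters": {"max_new_tokens": 1}}
    return urllib.request.Request(
        base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def warm_up(base_url: str, timeout: float = 60.0) -> int:
    """Send the warm-up request and return the HTTP status code."""
    with urllib.request.urlopen(build_warmup_request(base_url), timeout=timeout) as resp:
        return resp.status
```

The deployment script calls something like `warm_up("http://localhost:8080")` once after the container reports healthy; the port here is just an example and depends on the compose file.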

Am I doing things wrong?

Suggestion:
Could TGI publish all metrics with a value of 0 once the server has initialized, instead of only after the first inference call has been made?


Replies: 0 comments
