What is the meaning of total_time exposed as a performance metrics #476
Unanswered
rahul-sentieo
asked this question in
Q&A
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
We are using the rerank functionality for reranking query, documents pair. I was checking the logs and seems like the queue time + tokenisation time + inference time is not adding up to the total time exposed by the metrics. Infact the total high is very high. Here is an example:
inference time ~ 286ms
tokenization time ~ 139ms
queue time ~ 338ms
Total time should be 286ms + 139ms + 338ms =763ms
but according to the log, total time ~ 3.03 sec
Beta Was this translation helpful? Give feedback.
All reactions