
[Question] Handling vast amount of time series as a result of model_name label #426

GolanLevy opened this issue Sep 6, 2023 · 0 comments

Following the summary in kserve/modelmesh#60 provided by @njhill, I understand that the ModelMesh project is well aware that there can be tens of thousands of models, quickly swapping across many predictor instances, and sometimes used only once before being evicted to make room for other models.

I'm glad to see that adding model_name as a label to metrics is configurable (and it will be off for our use case, of course).
However, this is not the case for KServe transformers' metrics, which we use for pre/post-processing; see kserve/kserve#2589.

I wonder how you deal with the following issues:

  1. Did you find a way to omit the model_name label in KServe transformers?
  2. Is there a way to set a TTL on time series? The accumulation of all possible label combinations for a metric (for example, the tuples of pod and model_name) makes the metrics report so large that it noticeably affects our CPU usage. Most of the time series will not be updated for at least a few hours, and we would be happy to get rid of them.
  3. We frequently see scenarios in which a predictor or a transformer receives a request for a specific model only once.
    In these cases, the time series is created and written exactly once. Any Grafana query that applies a rate function (rate/increase/delta/etc.) over the range vector of that metric is then useless, since there is only one data point.
    The Prometheus maintainers are aware of this issue and have recently started designing a solution.
    The community is also aware of the problem and has proposed workarounds, which usually require computationally heavy queries
    or more sophisticated Prometheus clients (note that this client also addresses the previous bullet).
    Did you find a way to handle these scenarios?
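Since there is no built-in client-side TTL for time series, one workaround for point 2 is to track when each label set was last updated and periodically drop stale ones before scraping. Below is a minimal pure-Python sketch of that idea; the names `TTLCounter`, `inc`, and `expire` are hypothetical and made up for illustration (this is not the prometheus_client API, although with the official Python client the `remove()` method on a labeled metric could serve a similar purpose):

```python
import threading
import time

class TTLCounter:
    """Hypothetical in-process counter that drops label sets which
    have not been updated within `ttl_seconds` (a sketch of the
    client-side TTL idea, not a real Prometheus client)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._values = {}      # label tuple -> counter value
        self._last_seen = {}   # label tuple -> timestamp of last update
        self._lock = threading.Lock()

    def inc(self, *labels, amount=1, now=None):
        # `now` is injectable for testing; defaults to a monotonic clock.
        now = time.monotonic() if now is None else now
        with self._lock:
            self._values[labels] = self._values.get(labels, 0) + amount
            self._last_seen[labels] = now

    def expire(self, now=None):
        """Remove series idle longer than the TTL; call periodically
        (e.g. from a background thread). Returns the number dropped."""
        now = time.monotonic() if now is None else now
        with self._lock:
            stale = [k for k, t in self._last_seen.items() if now - t > self.ttl]
            for k in stale:
                del self._values[k]
                del self._last_seen[k]
        return len(stale)

    def collect(self):
        """Snapshot of live series, e.g. to render a scrape response."""
        with self._lock:
            return dict(self._values)
```

For example, with a one-hour TTL, a (pod, model_name) series touched once and then idle is dropped on the next `expire()` pass, so it no longer inflates the scrape payload:

```python
c = TTLCounter(ttl_seconds=3600)
c.inc("pod-a", "model-1", now=0)
c.inc("pod-a", "model-2", now=5000)
c.expire(now=5000)   # drops the idle ("pod-a", "model-1") series
```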

I feel the issues mentioned here are relevant specifically to ModelMesh (and not KServe in general), since KServe was not designed to manage a huge number of models.

Thanks!
