
Additional exported metrics for the metrics apiserver #4460

Closed · BojanZelic opened this issue Apr 17, 2023 · 14 comments · Fixed by #4982
Labels: feature, needs-discussion

@BojanZelic (Contributor) commented Apr 17, 2023:

Proposal

Provide exported metrics similar to those the Kubernetes metrics-server has, which are useful for tracking SLOs for KEDA (e.g. latency or the percentage of successful requests).

For example:

apiserver_request_total{group="external.metrics.k8s.io"}
apiserver_request_slo_duration_seconds{group="external.metrics.k8s.io"}

Use-Case

I would like to get the number of requests to the KEDA API server that have failed; currently this isn't possible. With metrics-server, I can do something like:

(
  sum(rate(apiserver_request_total{group="metrics.k8s.io", code=~"(5..|429)"}[5m])) +
  sum(rate(apiserver_request_terminations_total{group="metrics.k8s.io"}[5m]))
)

This is because apiserver_request_total{group="metrics.k8s.io"} is exposed by the metrics-server, and metrics-server uses the apiserver package from Kubernetes.

KEDA could expose similar metrics, either by using the apiserver package from Kubernetes or by implementing something similar.
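
If KEDA went the hand-rolled route, a minimal sketch could look like the following. This is illustrative only: the metric names (keda_metrics_apiserver_request_total, keda_metrics_apiserver_request_duration_seconds) and the Instrument helper are hypothetical, not existing KEDA code.

package apiservermetrics

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	// Hypothetical metric names; using k8s.io/apiserver instead would expose
	// apiserver_request_total and apiserver_request_sli_duration_seconds.
	requestTotal = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "keda_metrics_apiserver_request_total",
			Help: "Total requests served by the KEDA metrics API server.",
		},
		[]string{"code", "method"},
	)
	requestDuration = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "keda_metrics_apiserver_request_duration_seconds",
			Help:    "Request latency of the KEDA metrics API server.",
			Buckets: prometheus.DefBuckets,
		},
		[]string{"code", "method"},
	)
)

// Instrument wraps the external.metrics.k8s.io HTTP handler so every request
// is counted and timed, labelled by HTTP status code and method.
func Instrument(reg prometheus.Registerer, next http.Handler) http.Handler {
	reg.MustRegister(requestTotal, requestDuration)
	return promhttp.InstrumentHandlerDuration(requestDuration,
		promhttp.InstrumentHandlerCounter(requestTotal, next))
}

With something like that registered in whichever registry gets scraped, the failure-rate query above could be reproduced by filtering on the code label.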

Is this a feature you are interested in implementing yourself?

Yes

Anything else?

The aggregation API in Kubernetes tracks failed requests to external.metrics.k8s.io via apiserver_request_terminations_total, but it does not track the total number of requests. There's currently no way to get the total number of requests to external.metrics.k8s.io, since the KEDA metrics apiserver doesn't expose this information.

@BojanZelic added the feature-request and needs-discussion labels on Apr 17, 2023
@JorTurFer (Member):

Hi @BojanZelic,
It totally makes sense; I've assigned the issue to you (based on your response).
One question about this: will you implement something similar in the operator too?

@BojanZelic (Contributor, Author):

I was going to implement it for the API server only, because the operator already exposes Prometheus metrics for reconciliation time, errors, the number of reconciliations, etc.:

controller_runtime_reconcile_time_seconds_sum
controller_runtime_reconcile_time_seconds_count
controller_runtime_reconcile_time_seconds_bucket
controller_runtime_reconcile_total
workqueue_work_duration_seconds_bucket

But the API server lacks observability for ensuring correct responses and response times.

@JorTurFer (Member):

The metrics server should also have the same metrics, because it's built on top of the operator framework.

@BojanZelic (Contributor, Author):

It looks like the operator-framework-provided metrics for the KEDA API server exist under port 8080, while the keda_metrics_adapter_* metrics are under port 9022.

That's great for figuring out whether the MetricsScaledObjectReconciler is encountering errors.

But what I want to know is whether requests are succeeding or failing from the perspective of the HPA controller, e.g.:

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/mynamespace/mymetric"

and figure out what percentage of those requests fail.

So the concept doesn't apply to the operator, unless you mean the gRPC communication between the metrics server and the operator; in that case, yes, I could add those same metrics there.

@JorTurFer (Member):

Okay... it's complicated...
We deprecated the metrics server's Prometheus server on port 9022 in KEDA v2.9, so in theory we should remove it now (as part of the next release). It's true that the metrics server exposes two Prometheus servers, but all new metrics should be added to the runtime server (port 8080), because the other one will be removed (the keda_metrics_adapter_* metrics are deprecated).
So you can add new metrics to the metrics server, but on the runtime server; that's where you can add the new metrics you proposed.

From the operator's POV, I guess we could expose similar metrics based on the gRPC requests. WDYT?

@JorTurFer (Member):

I plan to remove them this week, so if you wait until then, the metrics server will have only one Prometheus server, making things easier xD

@BojanZelic (Contributor, Author):

Makes sense! Thanks for the explanation. The new metrics can be added to :8080.

As for gRPC metrics, it seems like we're using https://github.com/grpc-ecosystem/go-grpc-middleware, so having it use the Prometheus middleware should be straightforward and would give us gRPC Prometheus metrics: https://github.com/grpc-ecosystem/go-grpc-middleware/tree/main/providers/prometheus
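
A rough sketch of what that wiring might look like, assuming the v2 providers/prometheus package and a plain grpc.NewServer setup (NewInstrumentedServer and reg are illustrative names, not existing KEDA code):

package grpcserver

import (
	grpcprom "github.com/grpc-ecosystem/go-grpc-middleware/providers/prometheus"
	"github.com/prometheus/client_golang/prometheus"
	"google.golang.org/grpc"
)

// NewInstrumentedServer returns a gRPC server whose requests are counted and
// timed by the go-grpc-middleware Prometheus provider; the metrics are
// registered into reg (e.g. the controller-runtime registry on the operator side).
func NewInstrumentedServer(reg prometheus.Registerer) *grpc.Server {
	srvMetrics := grpcprom.NewServerMetrics(
		grpcprom.WithServerHandlingTimeHistogram(), // also emit latency histograms
	)
	reg.MustRegister(srvMetrics)

	return grpc.NewServer(
		grpc.ChainUnaryInterceptor(srvMetrics.UnaryServerInterceptor()),
		grpc.ChainStreamInterceptor(srvMetrics.StreamServerInterceptor()),
	)
}

That would expose the standard grpc_server_handled_total / grpc_server_handling_seconds family of metrics through whichever registry is passed in.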

@JorTurFer (Member):

#3972

@tomkerkhove added the feature label and removed the feature-request label on Apr 26, 2023
@zroubalik (Member):

@BojanZelic FYI, the deprecated Prometheus server has been removed from the Metrics Server.

I don't follow the part about gRPC, what kind of stuff do you want to expose there? Thanks!

@JorTurFer (Member):

We have removed the second Prometheus server, but the proposed metrics can still be added to the controller-runtime Prometheus registry (exposed on port 8080 in the metrics server).

@BojanZelic (Contributor, Author):

So, an update:

After some digging I discovered that there's actually another, undocumented metrics endpoint that already exposes these metrics without a code change. It needs to be accessed with an authorized role; I updated the docs here: kedacore/keda-docs#1222

I ended up not needing the gRPC metrics, as they weren't useful and would essentially report the same latency information as apiserver_request_sli_duration_seconds_bucket.

I tried to combine the metrics so that they're exposed via the controller-runtime port (8080) and endpoint, but was unsuccessful: there's no easy way to combine the two, since both alter global variables, use internal packages that make it difficult to change implementation details, and register the same collectors (e.g. collectors.NewProcessCollector and collectors.NewGoCollector()).

@JorTurFer (Member):

Nice research!!!
I thought we weren't using k8s.io/apiserver, but it looks like we are. I'm not sure if we can handle it from KEDA directly, but maybe we can do something like the custom-metrics-apiserver does: https://github.com/kubernetes-sigs/custom-metrics-apiserver/blob/master/pkg/apiserver/metrics/metrics.go#L37-L40

Exposing two metrics servers doesn't make sense IMHO, and we should choose one. If there isn't any way to merge them, maybe we have to open issues upstream to allow this somehow. @zroubalik?

@BojanZelic (Contributor, Author) commented Sep 17, 2023:

That makes sense about exposing only one port. I took another stab at it, and I was able to get it working when exposing via the 6443 port, but not the controller-runtime port (8080); the abstractions that the k8s.io/apiserver library puts in place with its Prometheus wrappers make it difficult.

Because the Prometheus registries conflict, we can unregister the go and process collectors from controller-runtime and expose both registries on the 6443 port.

Something like this:
https://github.com/kubernetes-sigs/metrics-server/blob/master/pkg/server/config.go#L83

and then metrics-server just calls ServeHTTP on both handlers:
https://github.com/kubernetes-sigs/metrics-server/blob/master/pkg/server/config.go#L117-L118
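
For reference, a sketch of one way to do the merge, using a combined prometheus.Gatherers rather than chaining ServeHTTP calls. It assumes the controller-runtime registry from sigs.k8s.io/controller-runtime/pkg/metrics plus the apiserver's legacyregistry, and is not necessarily what the eventual PR does:

package metricsendpoint

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/collectors"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	"k8s.io/component-base/metrics/legacyregistry"
	ctrlmetrics "sigs.k8s.io/controller-runtime/pkg/metrics"
)

// Handler merges the controller-runtime registry with the apiserver's legacy
// registry (apiserver_request_total, apiserver_request_sli_duration_seconds, ...)
// so both metric sets are served from a single /metrics endpoint.
func Handler() http.Handler {
	// Both registries register go/process collectors; drop one copy so the
	// merged gather doesn't report duplicate metric families.
	ctrlmetrics.Registry.Unregister(collectors.NewGoCollector())
	ctrlmetrics.Registry.Unregister(collectors.NewProcessCollector(collectors.ProcessCollectorOpts{}))

	gatherers := prometheus.Gatherers{
		ctrlmetrics.Registry,
		legacyregistry.DefaultGatherer,
	}
	return promhttp.HandlerFor(gatherers, promhttp.HandlerOpts{})
}

The resulting handler could then be mounted on the existing :8080 server instead of 6443.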

The only caveat here is that this endpoint has authentication and needs an RBAC rule for access; see kedacore/keda-docs#1222 (comment).

Update: I got it to work under the existing :8080 port.

What do you think?

@BojanZelic (Contributor, Author):

@JorTurFer I created a PR to consolidate the metrics to one port; PTAL #4982
