Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clean up metrics for scheduler, descheduler and estimator #2899

Merged
merged 1 commit into from
Dec 15, 2022

Conversation

Garrybest
Copy link
Member

@Garrybest Garrybest commented Dec 2, 2022

Signed-off-by: Garrybest garrybest@foxmail.com

What type of PR is this?
/kind cleanup
/kind feature

What this PR does / why we need it:

Note that scheduler, descheduler and estimator does not use controller-runtime, so we'd better use metrics pkg of k8s.io/component-base/metrics instead of primitive prometheus pkg. In kubernetes, all workqueue metrics is registered in k8s.io/component-base/metrics, so that we could get extra workqueue metrics in our metric http handler.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

@karmada-bot karmada-bot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/feature Categorizes issue or PR as related to a new feature. labels Dec 2, 2022
@karmada-bot karmada-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Dec 2, 2022
@Garrybest Garrybest force-pushed the pr_metrics branch 2 times, most recently from a303261 to a184892 Compare December 2, 2022 16:11
@Garrybest
Copy link
Member Author

There is something wrong with the e2e environment again.

@RainbowMango
Copy link
Member

There is something wrong with the e2e environment again.

Yeah. Just re-run it.

echo from https://github.com/karmada-io/karmada/actions/runs/3603267239/jobs/6071407672:

failed to delete nodes: command "docker rm -f -v member-e2e-4hc7h-control-plane" failed with error: exit status 1

@RainbowMango
Copy link
Member

/assign

@RainbowMango
Copy link
Member

All Kubernetes metrics are emitted from Metrics Stability Framework, and each metric will be assigned a stable status, like ALPHA/BETA/STABLE. Following is the example grabbed from karmada-scheduler:

# HELP apiserver_audit_event_total [ALPHA] Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0

so we'd better use metrics pkg of k8s.io/component-base/metrics instead of primitive prometheus pkg. In kubernetes, all workqueue metrics is registered in k8s.io/component-base/metrics, so that we could get extra workqueue metrics in our metric http handler.

Do you want to get more metrics from the k8s.io/component-base/metrics package or want to adopt the Metrics Stability Framework?

@Garrybest
Copy link
Member Author

Garrybest commented Dec 5, 2022

Do you want to get more metrics from the k8s.io/component-base/metrics package or want to adopt the Metrics Stability Framework?

Yes, now we cannot get the metrics of workqueue in scheduler. This is important, I think. The pkg k8s.io/component-base/metrics could help us report these internal metrics automatically.

@RainbowMango
Copy link
Member

Using k8s.io/component-base/metrics means adopting the Metrics Stability Framework, I'm not sure if the framework ready for use by other projects yet. I'll figure it out.

@RainbowMango
Copy link
Member

Just quickly went through the Kubernetes code, unfortunately, the Metrics Stability Framework can not be imported by other projects.

The main reason is the framework needs to know the current version to figure out if specific metrics should be hidden or not.
In k/k the version is get from version.Get(), like:
https://github.com/kubernetes/kubernetes/blob/17bf864c1fc20e4badc9a6e0659c5190310462e1/staging/src/k8s.io/component-base/metrics/gauge.go#L246

But, in Karmada, we can't get the version because the version is set only during the k/k building phase.
Without the version, even though we can define metrics' stable labels, we can't really deprecate them according to the stability framework.

@Garrybest
Copy link
Member Author

Thanks for reminding, I believe I should find another way to report the workqueue metrics.

@Garrybest
Copy link
Member Author

I'm trying to set all metrics with registry in controller-runtime. Now we could report the internal workqueue metrics. This is like what components using controller-runtime does, i.e., karmada-agent and karmada-controller-runtime.

@codecov-commenter
Copy link

Codecov Report

Merging #2899 (cf17626) into master (db4e2ab) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #2899   +/-   ##
=======================================
  Coverage   38.51%   38.52%           
=======================================
  Files         205      205           
  Lines       18814    18814           
=======================================
+ Hits         7247     7248    +1     
  Misses      11137    11137           
+ Partials      430      429    -1     
Flag Coverage Δ
unittests 38.52% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/scheduler/scheduler.go 18.51% <100.00%> (ø)
pkg/search/proxy/store/util.go 92.89% <0.00%> (-0.95%) ⬇️
pkg/util/worker.go 71.42% <0.00%> (+4.76%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@Garrybest
Copy link
Member Author

This lint should be ignored since we must insure sigs.k8s.io/controller-runtime/pkg/metrics is imported before k8s.io/apiserver/pkg/server.

@RainbowMango
Copy link
Member

cc @Poor12 for help.

@Poor12
Copy link
Member

Poor12 commented Dec 12, 2022

It works well at my side. It mainly solves this problem by uniformly registering indicators to the controller-runtime registry.
If lint should be ignored, we can add a comment to skip it.

@Garrybest Garrybest force-pushed the pr_metrics branch 2 times, most recently from c0f341b to 23d59ab Compare December 12, 2022 08:19
Signed-off-by: Garrybest <garrybest@foxmail.com>
@Garrybest
Copy link
Member Author

cc @RainbowMango

@RainbowMango
Copy link
Member

Hmm, I'm ok with that disable the checks temporarily.
Can we make a list of the changes that should be made against the upstream(controller-runtime, k/k) to get rid of the exceptions?

@Poor12
Copy link
Member

Poor12 commented Dec 14, 2022

I have went through the kubernetes code in component-base/metrics. I'm afraid that it's not easy to make changes against upstream.
In component-base/metrics, we might export the BuildVersion Func to manual set the current version in package metrics. However, package legacyregistry encapsulates the methods provided to third parties based on defaultRegistry which used version.Get. It means that even if we set a custom registry, we cannot use any func in legacyregistry. So I suggest using controller-runtime.
It works well. The only problem is we import k8s.io/apiserver/pkg meantime, which will init workqueue metrics too.

@Garrybest
Copy link
Member Author

That's right. So there are 2 things we need to do:

  1. Register all metrics to ctrlmetrics.Registry instead of default registry in component-base/metrics or promhttp. We only need one registry.
  2. Initialize pkg sigs.k8s.io/controller-runtime/pkg/metrics before k8s.io/apiserver/pkg/server because workqueue metric could only be registered to one registry which is initialized first.

I think this PR fix these two steps.

Copy link
Member

@RainbowMango RainbowMango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Dec 15, 2022
@karmada-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 15, 2022
@karmada-bot karmada-bot merged commit 6111ad0 into karmada-io:master Dec 15, 2022
@Garrybest Garrybest deleted the pr_metrics branch December 15, 2022 08:56
@RainbowMango RainbowMango added this to the v1.5 milestone Dec 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants