Add metrics for config file changes #1916

Merged: 1 commit merged into kubernetes:master on Dec 7, 2022

@mrueg (Member) commented Nov 29, 2022

What this PR does / why we need it:
Provides metrics for config file updates when viper picks up a new config.

This reuses code from prometheus/alertmanager (https://github.com/prometheus/alertmanager/blob/main/config/coordinator.go#LL56C26-L56C26), licensed under Apache-2.0.

kube_state_metrics_config_hash 4.0061079457904e+13
kube_state_metrics_config_last_reload_success_timestamp_seconds 1.6697483049487052e+09
kube_state_metrics_config_last_reload_successful 1
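For context on the hash value above: as far as we can tell from its source, the alertmanager coordinator this PR borrows from computes its config hash by taking an MD5 digest of the config and keeping only the first 48 bits, so the number is exactly representable in a float64 (which has a 53-bit mantissa). That would also explain why the example values stay below 2^48, roughly 2.8e14. The stdlib-only sketch below illustrates that approach; `md5HashAsMetricValue` borrows alertmanager's helper name, but this is a sketch, not the code merged in this PR, and the config contents are hypothetical.

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
)

// md5HashAsMetricValue maps arbitrary config bytes to a float64 gauge value.
// Only the first 6 bytes (48 bits) of the MD5 digest are kept, because a
// float64 represents every integer up to 2^53 exactly.
func md5HashAsMetricValue(data []byte) float64 {
	sum := md5.Sum(data)
	// Copy the 6 leading digest bytes into an 8-byte buffer; the two
	// high-order bytes stay zero.
	buf := make([]byte, 8)
	copy(buf, sum[:6])
	return float64(binary.LittleEndian.Uint64(buf))
}

func main() {
	cfg := []byte("example: true\n") // hypothetical config file contents
	fmt.Printf("kube_state_metrics_config_hash %g\n", md5HashAsMetricValue(cfg))
}
```

Any deterministic digest would work for detecting config changes; truncating to 48 bits trades a slightly higher collision chance for a value Prometheus can store without rounding.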

How does this change affect the cardinality of KSM: (increases, decreases or does not change cardinality)
Increases slightly: adds three metric series to the KSM self-metrics exporter, not to the Kubernetes object exporter.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1893

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 29, 2022
@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 29, 2022
pkg/options/options.go (outdated, resolved)
@rexagod (Member) left a comment

These metrics will be very useful, thanks Manuel!

pkg/app/server.go (3 resolved threads)
configSuccess := promauto.With(ksmMetricsRegistry).NewGauge(
	prometheus.GaugeOpts{
		Name: "kube_state_metrics_config_last_reload_successful",
		Help: "Whether the last configuration reload attempt was successful.",
	},
)
Member

I'd prefer kube_state_metrics_last_config_reload_successful

Member Author

I tend to agree, though in the future we might have other config files as well; that's why I added the infix.
Another config we might want metrics for, once hot reload is in place for it:
kube_state_metrics_customresourcestateconfig_last_reload_successful

Alternatively, we could drop the infix from the metric name and use a label:

kube_state_metrics_last_reload_successful{type="config"}
kube_state_metrics_last_reload_successful{type="customresourcestateconfig"}

Contributor

I'd vote for using labels: it reduces verbosity, and users can monitor all config reloads through a single metric.
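A rough sketch of the label-based layout discussed in this thread, assuming a stdlib-only renderer of the Prometheus text exposition format rather than the client_golang `GaugeVec` that kube-state-metrics would more likely use. `reloadState` and `renderLastReloadSuccessful` are hypothetical names. Every config type becomes one series of a single metric family, so one PromQL query such as `kube_state_metrics_last_reload_successful == 0` would catch a failed reload of any type.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// reloadState is a hypothetical per-config-type record of the last reload.
type reloadState struct {
	successful bool
}

// renderLastReloadSuccessful writes one metric family in the Prometheus text
// exposition format, with one series per config type distinguished by a
// "type" label instead of a per-type metric name.
func renderLastReloadSuccessful(states map[string]reloadState) string {
	types := make([]string, 0, len(states))
	for t := range states {
		types = append(types, t)
	}
	sort.Strings(types) // deterministic output order

	var b strings.Builder
	b.WriteString("# TYPE kube_state_metrics_last_reload_successful gauge\n")
	for _, t := range types {
		v := 0
		if states[t].successful {
			v = 1
		}
		// %q quoting matches Prometheus label escaping for plain ASCII values.
		fmt.Fprintf(&b, "kube_state_metrics_last_reload_successful{type=%q} %d\n", t, v)
	}
	return b.String()
}

func main() {
	fmt.Print(renderLastReloadSuccessful(map[string]reloadState{
		"config":                    {successful: true},
		"customresourcestateconfig": {successful: false},
	}))
}
```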

pkg/app/server.go (3 resolved threads)
configSuccessTime := promauto.With(ksmMetricsRegistry).NewGauge(
	prometheus.GaugeOpts{
		Name: "kube_state_metrics_config_last_reload_success_timestamp_seconds",
		Help: "Timestamp of the last successful configuration reload.",
	},
)
@CatherineF-dev (Contributor) commented Nov 30, 2022

How about kube_state_metrics_last_config_uptime?
That would make it easy to tell whether the config changed recently.

It equals current_timestamp - kube_state_metrics_config_last_reload_success_timestamp_seconds.

@mrueg (Member Author) commented Nov 30, 2022

I prefer to keep it as is, since a timestamp has a couple of advantages over a counter:

@mrueg (Member Author) commented Dec 1, 2022

# HELP kube_state_metrics_config_hash Hash of the currently loaded configuration.
# TYPE kube_state_metrics_config_hash gauge
kube_state_metrics_config_hash{filename="config.yml",type="config"} 2.76423981094048e+14
# HELP kube_state_metrics_last_config_reload_success_timestamp_seconds Timestamp of the last successful configuration reload.
# TYPE kube_state_metrics_last_config_reload_success_timestamp_seconds gauge
kube_state_metrics_last_config_reload_success_timestamp_seconds{filename="config.yml",type="config"} 1.669900259628547e+09
# HELP kube_state_metrics_last_config_reload_successful Whether the last configuration reload attempt was successful.
# TYPE kube_state_metrics_last_config_reload_successful gauge
kube_state_metrics_last_config_reload_successful{filename="config.yml",type="config"} 1

This is how the metrics currently look. Any thoughts on the config hash? We could make the value static (=1) and add the hash as a label instead.
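To make the semantics in the HELP strings above concrete, here is a dependency-free sketch of a reload step in the spirit of the alertmanager coordinator: a successful reload sets the hash, the timestamp and the success flag, while a failed attempt only clears the flag, so the hash and timestamp keep describing the last config that actually loaded. `configMetrics`, `reload`, `rejectEmpty` and `errEmptyConfig` are hypothetical; plain float64 fields stand in for Prometheus gauges, and a 32-bit FNV-1a hash is swapped in for the MD5-based value for brevity.

```go
package main

import (
	"errors"
	"fmt"
	"hash/fnv"
	"time"
)

// configMetrics holds the three gauge values for one (type, filename) pair.
// Plain float64 fields stand in for Prometheus gauges in this sketch.
type configMetrics struct {
	hash                 float64
	lastSuccessTimestamp float64
	lastReloadSuccessful float64
}

var errEmptyConfig = errors.New("empty config")

// rejectEmpty is a hypothetical validator that only refuses empty input.
func rejectEmpty(data []byte) error {
	if len(data) == 0 {
		return errEmptyConfig
	}
	return nil
}

// reload validates data with parse and updates the metrics accordingly.
func (m *configMetrics) reload(data []byte, parse func([]byte) error, nowSeconds float64) error {
	if err := parse(data); err != nil {
		// Failed attempt: only the success flag changes; hash and timestamp
		// still describe the last config that loaded.
		m.lastReloadSuccessful = 0
		return err
	}
	h := fnv.New32a() // stand-in for the MD5-based hash
	h.Write(data)
	m.hash = float64(h.Sum32())
	m.lastSuccessTimestamp = nowSeconds
	m.lastReloadSuccessful = 1
	return nil
}

func main() {
	var m configMetrics
	now := float64(time.Now().UnixNano()) / 1e9
	_ = m.reload([]byte("a: 1\n"), rejectEmpty, now) // hypothetical config contents
	fmt.Println(m.lastReloadSuccessful)              // 1
	_ = m.reload(nil, rejectEmpty, now)
	fmt.Println(m.lastReloadSuccessful) // 0
}
```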

@rexagod (Member) commented Dec 1, 2022

Seems fine to me to keep it as the value.

@logicalhan (Member)

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 1, 2022
@fpetkovski (Contributor)

/lgtm
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 2, 2022
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 2, 2022
pkg/app/server.go (outdated, resolved)
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 2, 2022
This uses code pieces from prometheus/alertmanager in https://github.com/prometheus/alertmanager/blob/main/config/coordinator.go#LL56C26-L56C26
licensed under Apache-2.0.

kube_state_metrics_config_hash{type="config", filename="config.yml"} 4.0061079457904e+13
kube_state_metrics_config_last_reload_success_timestamp_seconds{type="config", filename="config.yml"} 1.6697483049487052e+09
kube_state_metrics_config_last_reload_successful{type="config", filename="config.yml"} 1

Signed-off-by: Manuel Rüger <manuel@rueg.eu>
@rexagod (Member) commented Dec 2, 2022

/lgtm
/hold
/cc @dgrisonnet @logicalhan @fpetkovski

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 2, 2022
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fpetkovski, mrueg, rexagod


Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mrueg (Member Author) commented Dec 7, 2022

/hold cancel

Let's proceed here

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 7, 2022
@k8s-ci-robot k8s-ci-robot merged commit 143f94d into kubernetes:master Dec 7, 2022
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Development

Successfully merging this pull request may close these issues.

Add support for meta-metrics
6 participants