Check stable metrics aren't changed #1836
Conversation
cc @dgrisonnet
I generally like the idea behind this PR, but it is quite challenging to bring to kube-state-metrics since the metrics are generated at runtime, so you might not get all the labels by default.
One solution could be to pass a runtime object to the generator that satisfies all the conditions needed to generate metrics with all the labels, but that means we would have to make sure the object is updated every time a new condition is added, so it isn't very safe.
Another solution, which would be safer, is to statically visit the generators to build a metrics object. From the generator itself we could get all the metadata of the metrics, and we could get the full list of labels by walking through the generator function and finding assignments to LabelKeys. That would be a fair amount of work, but it could be reused both to generate and to test the stable metrics, so that would be my suggestion (maybe also to generate some docs in the future).
--
With regards to the actual content of the PR, would you mind first opening a PR to introduce the stability framework and the help text annotation? That would make it easier to review. As I commented in the PR, I am generally more in favor of using the Kubernetes framework compared to the very basic one we have in KSM, so feel free to go with the component-base one.
@@ -0,0 +1,108 @@
/*
Copyright 2016 The Kubernetes Authors All rights reserved.
update to 2022
pkg/metric_generator/generator.go
Outdated
const (
	// For metrics that haven't migrated to NewFamilyGeneratorWithStability, the
	// level is UNKNOWN. You can find a metric's level in the documentation.
	UNKNOWN StabilityLevel = "UNKNOWN"
Instead of adding an unknown stability level, I would be in favor of defaulting to experimental in a similar way as what we are doing in component-base.
pkg/metric_generator/generator.go
Outdated
// NewFamilyGenerator creates new FamilyGenerator instances.
func NewFamilyGenerator(name string, help string, metricType metric.Type, deprecatedVersion string, generateFunc func(obj interface{}) *metric.Family) *FamilyGenerator {

// StabilityLevel represents the API guarantees for a given defined metric.
type StabilityLevel string
To align with Kubernetes metrics stability framework we could directly reuse the upstream stability type: https://github.com/kubernetes/kubernetes/blob/ea0764452222146c47ec826977f49d7001b0ea8c/staging/src/k8s.io/component-base/metrics/opts.go#L64-L74
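To illustrate the two suggestions together (reuse the upstream type, and default to experimental rather than adding UNKNOWN), here is a minimal sketch. The `setDefaults` helper and its placement are hypothetical; only the ALPHA/STABLE values mirror the component-base type.

```go
package main

import "fmt"

// StabilityLevel mirrors the type used by k8s.io/component-base/metrics.
type StabilityLevel string

const (
	ALPHA  StabilityLevel = "ALPHA"
	STABLE StabilityLevel = "STABLE"
)

// setDefaults treats an unset level as ALPHA (experimental), in the same
// spirit as component-base, instead of introducing an UNKNOWN level.
// This helper is a sketch, not the actual KSM or component-base API.
func (s *StabilityLevel) setDefaults() {
	if *s == "" {
		*s = ALPHA
	}
}

func main() {
	var level StabilityLevel // a metric registered without an explicit level
	level.setDefaults()
	fmt.Println(level) // ALPHA
}
```

With this approach, un-migrated metrics automatically carry the weakest guarantee instead of an extra sentinel value.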
Sure! Thanks! Agree on visiting the generators to build a metrics object.
It's not easy to do static analysis for KSM metrics, so the plan is to use e2e tests to validate stable metrics (suggested by han). We will also support SKIP_METRICS_LIST, since future stable metrics might only be exposed under certain conditions. For metrics that aren't exposed in the e2e tests, we can use unit tests. These metrics aren't exposed in the e2e tests:
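A rough sketch of how a SKIP_METRICS_LIST-style filter could sit in front of the comparison; the function and metric names here are illustrative, not the actual test code:

```go
package main

import "fmt"

// filterSkipped drops metric names found in the skip set before the
// collected e2e metrics are compared against the expected stable set.
// This mirrors the SKIP_METRICS_LIST idea described above; names are
// invented for the example.
func filterSkipped(collected []string, skip map[string]bool) []string {
	kept := make([]string, 0, len(collected))
	for _, name := range collected {
		if !skip[name] {
			kept = append(kept, name)
		}
	}
	return kept
}

func main() {
	collected := []string{"kube_pod_info", "kube_node_info", "kube_pod_status_reason"}
	// Hypothetical: a metric only exposed when certain conditions happen,
	// so it is validated by a unit test instead of the e2e run.
	skip := map[string]bool{"kube_pod_status_reason": true}
	fmt.Println(filterSkipped(collected, skip))
}
```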
Add presubmit to check stable metrics are stable
This reverts commit 68c8bda.
cc @rexagod @dgrisonnet @logicalhan this PR, which adds a presubmit check for stable metrics, is ready for review.
tests/stable/check_test.go
Outdated
t.Fatalf("Can't read stable metrics from file. err = %v", err)
}

err = compare(collectedStableMetrics, *expectedStableMetrics, skipStableMetrics)
Honestly, I would just sort the metrics, output a YAML string, and do a simple string comparison.
This is not only simpler, but when you generate the YAML string you'll end up with deterministic ordering, which will make the diff more pleasant.
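As a sketch of that suggestion (names are hypothetical, and plain sorted "name: labels" lines stand in for the YAML serialization to keep the example stdlib-only):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// renderMetrics serializes metrics as sorted "name: label1,label2" lines so
// that the expected and collected sets can be compared as plain strings.
// Sorting both the label lists and the lines makes the output deterministic,
// which keeps the diff readable. A sketch, not the actual KSM test code.
func renderMetrics(metrics map[string][]string) string {
	lines := make([]string, 0, len(metrics))
	for name, labels := range metrics {
		sorted := append([]string(nil), labels...)
		sort.Strings(sorted)
		lines = append(lines, fmt.Sprintf("%s: %s", name, strings.Join(sorted, ",")))
	}
	sort.Strings(lines)
	return strings.Join(lines, "\n")
}

func main() {
	fmt.Println(renderMetrics(map[string][]string{
		"kube_pod_info":  {"namespace", "pod"},
		"kube_node_info": {"node"},
	}))
}
```

Comparing `renderMetrics(collected)` against the checked-in expected string then reduces the whole test to one string equality plus a line diff on failure.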
Ping~
@@ -0,0 +1,23 @@
These stable metrics aren't covered in stablemetrics.yaml, because they aren't
Can you add more documentation on how to run this stuff?
/lgtm
/approve
/hold Updating the documentation.
New changes are detected. LGTM label has been removed.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: CatherineF-dev, logicalhan. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Hi Damien, could you help review as well? Thanks! cc @dgrisonnet
@@ -0,0 +1,23 @@
These stable metrics aren't covered in stablemetrics.yaml, because they aren't
This is a bit problematic, since ideally we would want a system that verifies that none of the stable metrics change.
Another problem we might encounter is that some labels will not be verified because they are not present in the e2e tests. I don't have any example of stable metrics like that in mind, but we have some experimental ones: https://github.com/kubernetes/kube-state-metrics/blob/main/internal/store/endpointslice.go#L77-L148.
If we were to verify that metric in the e2e tests today, we would have to make sure that the endpointslice object is complete, which is very error-prone.
I recall we discussed implementing a static analysis solution to make sure that we go over all the metrics and labels; was that not possible in the end?
a static analysis solution
It's hard to use static analysis on the current kube-state-metrics repo, because the labels are too dynamic.
For example:
- kube_endpointslice_endpoints uses labelKeys = append(labelKeys, "ready").
- Labels come from another function.
One possible way to make static analysis easier is to require defining all possible labels first:
const labels []string = {"label1", "label2", ...} // const, so it can't be appended to later
- There are two things that are difficult to handle in that example:
1.1 Having an intermediate labelKeys variable means that we can't simply have the tool look for the LabelKeys field assignment; it might also need to build a constant version of the variable assigned to the field. An idea could be to have the tool first look for the LabelKeys field assignment and, in case the RValue is a variable and not a constant, iterate over the code again to build a const version of that variable.
1.2 append should be the only CallExpr we will ever see, so having a case in the tool just for it would make sense to me. The idea would be to replace append(labelKeys, "ready") by building a temporary slice and putting the "ready" string inside it, converting the variable slice into a constant one for our static analysis.
- We won't have to handle the second scenario because it only affects the _labels and _annotations metrics, which have variable labels by design and as such shouldn't have any stability guarantees.
One possible way to make static analysis easier is to require defining all possible labels first.
This could make sense, but I am afraid it might make label value assignment error-prone, since we would have to use indexes instead of just appending the new labels.
The only option I see today for catching all the labels of all the metrics is static analysis, and if we don't want it to be intrusive for developers (to avoid errors), we will need to make the tool intelligent.
For the static analysis, I am thinking that the following could be viable:
- Create an AST of each store and iterate over every generator's code
- Find the LabelKeys field assignment
- If the RValue isn't constant, reiterate over the generators to build a constant version of the variable. That should be possible because even though the slices are dynamic, their values are all constant.
Writing this, I noticed another scenario that we would need to handle: default labels.
descEndpointSliceLabelsDefaultLabels = []string{"endpointslice"}
These should always be defined at the top level of each store as a constant slice, so we should be able to easily add them to each metric.
This is becoming quite complex, so if you have any other suggestions I would gladly hear them, but I don't think we can avoid static analysis if we want to build consistent guarantees.
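The AST-walking step above can be sketched with Go's standard go/ast package. A minimal illustration, not the actual KSM tool: it parses a toy store snippet and collects the string constants assigned to a LabelKeys field. The src snippet and function name are made up, and the harder "build a constant version of a non-literal RValue" step is deliberately omitted.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// A toy stand-in for a store file; real generators would be parsed from disk.
const src = `package store

var m = Metric{LabelKeys: []string{"ready", "serving"}}
`

// labelKeysIn parses Go source and returns the string literals assigned to
// any LabelKeys field in a composite literal.
func labelKeysIn(source string) []string {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "store.go", source, 0)
	if err != nil {
		panic(err)
	}
	var keys []string
	ast.Inspect(file, func(n ast.Node) bool {
		kv, ok := n.(*ast.KeyValueExpr)
		if !ok {
			return true
		}
		if ident, ok := kv.Key.(*ast.Ident); !ok || ident.Name != "LabelKeys" {
			return true
		}
		// Only the constant-slice case is handled here; a variable RValue
		// would need the extra resolution pass described above.
		if lit, ok := kv.Value.(*ast.CompositeLit); ok {
			for _, elt := range lit.Elts {
				if s, ok := elt.(*ast.BasicLit); ok && s.Kind == token.STRING {
					keys = append(keys, strings.Trim(s.Value, `"`))
				}
			}
		}
		return true
	})
	return keys
}

func main() {
	fmt.Println(labelKeysIn(src))
}
```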
I don't think that we can avoid static analysis if we want to build consistent guarantees.
Agree
This could make sense but I am afraid that this might make label values assignation error-prone since we would have to use the indexes instead of just appending the new labels.
Could you provide an example for this case?
Could you provide an example for this case?
I think my reasoning was based on examples such as https://github.com/kubernetes/kube-state-metrics/blob/main/internal/store/endpointslice.go#L77-L148. In these cases the order is super important, since we build the list of label values one label at a time. I thought that if we were to define label keys upfront, future additions of labels might mess with the existing ordering.
But thinking about it again, we could transform the code into something like:
labelKeys := []string{"ready", "serving", ...} // defined upfront, treated as constant
ready := ""
if ep.Conditions.Ready != nil {
	ready = strconv.FormatBool(*ep.Conditions.Ready)
}
serving := ""
if ep.Conditions.Serving != nil {
	serving = strconv.FormatBool(*ep.Conditions.Serving)
}
m = append(m, &metric.Metric{
	LabelKeys:   labelKeys,
	LabelValues: []string{ready, serving},
	Value:       1,
})
The original reasoning for how it is coded in the example I shared above was to reduce the network footprint by removing empty labels. But it could be argued that this doesn't have much impact, and we could do that processing later, when we build the HTTP response.
So I think your idea of defining const label keys everywhere makes sense and would make static analysis much easier.
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close
@k8s-triage-robot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Check stable metrics are stable
What this PR does / why we need it: Guarantees that metric names and labels aren't changed for stable metrics.
How does this change affect the cardinality of KSM: (increases, decreases or does not change cardinality)
does not change cardinality
Which issue(s) this PR fixes:
Fixes #1833