Add mechanism to allow/deny high cardinality metrics #1102

dgrisonnet · 2020-03-23T11:22:17Z

Is this a BUG REPORT or FEATURE REQUEST?:

/kind feature

Context

More and more users want to expose high cardinality metrics; however, it comes with great costs for the default configuration of kube-state-metrics.

#941
#1047

Proposal

In order to reduce the number of metrics exposed by the default configuration, we could deny some metrics by default (e.g. labels and annotations) and allow users to expose them via the arguments.

Predefined deny list example:

metricsDenyList := []string {
    "^*_labels$",
    "^*_annotations$",
}

The straight forward approach would be to append metrics denied by default to the existing deny list and remove the allowed one from it; however, in the example below, we can't really do that.

allowList: "kube_job_labels"
denyList: "^*_labels$"

The current AllowDenyList implementation can't really be used to solve this problem since it is only using one list to allow/deny metrics and as seen in the above example we can't always merge the allowList and the denyList into a single list.

Thus, I was thinking about answering the problem in the following way:

Introduce a new metric-list option taking over the behavior of the current metric-allowlist.
Change the behavior of metric-allowlist option to allow denied by default metrics.
Instead of having metric-allowlist and metric-denylist mutually exclusive they would be complementary and mutually exclusive to metric-list.

note: another solution would be to introduce a new option for allowing metrics but I felt that the existing metric-allowlist was more appropriate.

Rewrite the current AllowDenyList to allow and deny.
Create a new type implementing AllowDenyLister who will only allow certain metrics.
Create a deny list containing exact metric names and/or regex patterns.

The text was updated successfully, but these errors were encountered:

lilic · 2020-03-23T13:04:13Z

cc @brancz @tariq1890 ^

tariq1890 · 2020-03-23T17:33:30Z

Thanks for starting this conversation @dgrisonnet . The issue where we can't merge an allow and deny directive is definitely a flaw that needs fixing.

However, I lean towards some simplicity. This is my personal opinion; having a metric-list and then a dichotomous allowList and denyList does not lend itself to so much simplicity.

What I've had in mind was having a simple interface for users to provide this configuration and some rules of precedence. For eg:- A specific metric name would take higher precedence over a regex pattern. In your case, when we allow kube_job_labels and deny ^*_labels$, we would have logic in ksm which would pre-empt the labels regex from denying kube_job_labels.

Open to hearing the thoughts of the other maintainers on this. I am definitely open to being convinced otherwise as it's definitely possible that my proposal has some holes in it :)

brancz · 2020-03-24T08:59:34Z

I think we made a mistake by ever trying to use the allow/deny lists for what are essentially feature gates. I think we should have explicit feature gates for metrics we want to be off by default. Then having allow/deny list being mutually exclusive is not a problem anymore.

That said, I think label and annotation metrics are yet another case, where each label that we expose needs to be allow/deny listed (maybe even only allow-listed), instead of the metric itself. And that for each resource.

jw-s · 2020-03-24T11:09:25Z

Hi guys!

After reading the OP's proposal, I definitely think this is needed in the project however I do think it could be ugly to use when it comes to using regex and precedence logic.

I like @brancz's proposal of having feature flags/gates for each metric and then an allow/deny list IMO is simple and doesn't complicate things.

dgrisonnet · 2020-03-24T14:05:43Z

Thank you for all the comments. I also really like @brancz's proposal of having feature gates to handle denied by default metrics.

As for labels, do you think we can also apply feature gates to them? Having metrics as key and a list of labels defaulting to an empty list as value maybe?

brancz · 2020-03-24T16:43:58Z

As for labels, do you think we can also apply feature gates to them? Having metrics as key and a list of labels defaulting to an empty list as value maybe?

Yeah I think only allow-listing the labels would be the same here, unless I'm misunderstanding.

dgrisonnet · 2020-03-24T17:52:43Z

Yes, if done this way it would be the same.

I am actually more in favor of the allow-listing of labels approach rather than the allow/deny listing one. As mentioned before, we should apply labels listing to metrics like labels and annotations which I understood as applying it to metrics having an unbound number of labels. In my opinion, if we do allow/deny listing, it would only amount to reducing something that is still unbound in the end. Whilst on the other hand, by only allow-listing labels we could bound the number of labels of these metrics.

brancz · 2020-03-26T14:32:00Z

Precisely my thought as well.

lilic · 2020-03-26T15:24:20Z

Should we update this issue to reflect your comment, as now they are different from what I can tell.

tariq1890 · 2020-03-27T19:02:12Z

I'd prefer creating a new to reflect the new decisions and closing this in favour of that.

lilic · 2020-03-30T12:25:47Z

That sounds good as well! 👍 And link to this one for clarity.

fejta-bot · 2020-06-28T13:20:25Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

brancz · 2020-06-29T05:49:00Z

/remove-lifecycle stale

fejta-bot · 2020-09-27T06:40:27Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot · 2020-10-27T07:24:22Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

dgrisonnet · 2020-10-27T09:01:05Z

Closing as #1125 added the feature.

k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Mar 23, 2020

dgrisonnet mentioned this issue Mar 23, 2020

add annotation metrics and ability to exclude. #1041

Closed

brancz mentioned this issue Apr 7, 2020

Unexpected label is not unique error #1115

Closed

dgrisonnet mentioned this issue Apr 16, 2020

Bound the number of labels of all metrics #1124

Closed

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 28, 2020

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 29, 2020

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 27, 2020

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 27, 2020

dgrisonnet closed this as completed Oct 27, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add mechanism to allow/deny high cardinality metrics #1102

Add mechanism to allow/deny high cardinality metrics #1102

dgrisonnet commented Mar 23, 2020

lilic commented Mar 23, 2020

tariq1890 commented Mar 23, 2020 •

edited

Loading

brancz commented Mar 24, 2020 •

edited

Loading

jw-s commented Mar 24, 2020

dgrisonnet commented Mar 24, 2020

brancz commented Mar 24, 2020

dgrisonnet commented Mar 24, 2020

brancz commented Mar 26, 2020 •

edited

Loading

lilic commented Mar 26, 2020

tariq1890 commented Mar 27, 2020

lilic commented Mar 30, 2020

fejta-bot commented Jun 28, 2020

brancz commented Jun 29, 2020

fejta-bot commented Sep 27, 2020

fejta-bot commented Oct 27, 2020

dgrisonnet commented Oct 27, 2020

Add mechanism to allow/deny high cardinality metrics #1102

Add mechanism to allow/deny high cardinality metrics #1102

Comments

dgrisonnet commented Mar 23, 2020

lilic commented Mar 23, 2020

tariq1890 commented Mar 23, 2020 • edited Loading

brancz commented Mar 24, 2020 • edited Loading

jw-s commented Mar 24, 2020

dgrisonnet commented Mar 24, 2020

brancz commented Mar 24, 2020

dgrisonnet commented Mar 24, 2020

brancz commented Mar 26, 2020 • edited Loading

lilic commented Mar 26, 2020

tariq1890 commented Mar 27, 2020

lilic commented Mar 30, 2020

fejta-bot commented Jun 28, 2020

brancz commented Jun 29, 2020

fejta-bot commented Sep 27, 2020

fejta-bot commented Oct 27, 2020

dgrisonnet commented Oct 27, 2020

tariq1890 commented Mar 23, 2020 •

edited

Loading

brancz commented Mar 24, 2020 •

edited

Loading

brancz commented Mar 26, 2020 •

edited

Loading