Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mechanism to allow/deny high cardinality metrics #1102

Closed
dgrisonnet opened this issue Mar 23, 2020 · 16 comments
Closed

Add mechanism to allow/deny high cardinality metrics #1102

dgrisonnet opened this issue Mar 23, 2020 · 16 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@dgrisonnet
Copy link
Member

Is this a BUG REPORT or FEATURE REQUEST?:

/kind feature

Context

More and more users want to expose high cardinality metrics; however, it comes with great costs for the default configuration of kube-state-metrics.

#941
#1047

Proposal

In order to reduce the number of metrics exposed by the default configuration, we could deny some metrics by default (e.g. labels and annotations) and allow users to expose them via the arguments.

Predefined deny list example:

metricsDenyList := []string {
    "^*_labels$",
    "^*_annotations$",
}

The straight forward approach would be to append metrics denied by default to the existing deny list and remove the allowed one from it; however, in the example below, we can't really do that.

allowList: "kube_job_labels"
denyList: "^*_labels$"

The current AllowDenyList implementation can't really be used to solve this problem since it is only using one list to allow/deny metrics and as seen in the above example we can't always merge the allowList and the denyList into a single list.

Thus, I was thinking about answering the problem in the following way:

  • Introduce a new metric-list option taking over the behavior of the current metric-allowlist.
  • Change the behavior of metric-allowlist option to allow denied by default metrics.
  • Instead of having metric-allowlist and metric-denylist mutually exclusive they would be complementary and mutually exclusive to metric-list.

note: another solution would be to introduce a new option for allowing metrics but I felt that the existing metric-allowlist was more appropriate.

  • Rewrite the current AllowDenyList to allow and deny.
  • Create a new type implementing AllowDenyLister who will only allow certain metrics.
  • Create a deny list containing exact metric names and/or regex patterns.
@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Mar 23, 2020
@lilic
Copy link
Member

lilic commented Mar 23, 2020

cc @brancz @tariq1890 ^

@tariq1890
Copy link
Contributor

tariq1890 commented Mar 23, 2020

Thanks for starting this conversation @dgrisonnet . The issue where we can't merge an allow and deny directive is definitely a flaw that needs fixing.

However, I lean towards some simplicity. This is my personal opinion; having a metric-list and then a dichotomous allowList and denyList does not lend itself to so much simplicity.

What I've had in mind was having a simple interface for users to provide this configuration and some rules of precedence. For eg:- A specific metric name would take higher precedence over a regex pattern. In your case, when we allow kube_job_labels and deny ^*_labels$, we would have logic in ksm which would pre-empt the labels regex from denying kube_job_labels.

Open to hearing the thoughts of the other maintainers on this. I am definitely open to being convinced otherwise as it's definitely possible that my proposal has some holes in it :)

@brancz
Copy link
Member

brancz commented Mar 24, 2020

I think we made a mistake by ever trying to use the allow/deny lists for what are essentially feature gates. I think we should have explicit feature gates for metrics we want to be off by default. Then having allow/deny list being mutually exclusive is not a problem anymore.

That said, I think label and annotation metrics are yet another case, where each label that we expose needs to be allow/deny listed (maybe even only allow-listed), instead of the metric itself. And that for each resource.

@jw-s
Copy link
Contributor

jw-s commented Mar 24, 2020

Hi guys!

After reading the OP's proposal, I definitely think this is needed in the project however I do think it could be ugly to use when it comes to using regex and precedence logic.

I like @brancz's proposal of having feature flags/gates for each metric and then an allow/deny list IMO is simple and doesn't complicate things.

@dgrisonnet
Copy link
Member Author

Thank you for all the comments. I also really like @brancz's proposal of having feature gates to handle denied by default metrics.

As for labels, do you think we can also apply feature gates to them? Having metrics as key and a list of labels defaulting to an empty list as value maybe?

@brancz
Copy link
Member

brancz commented Mar 24, 2020

As for labels, do you think we can also apply feature gates to them? Having metrics as key and a list of labels defaulting to an empty list as value maybe?

Yeah I think only allow-listing the labels would be the same here, unless I'm misunderstanding.

@dgrisonnet
Copy link
Member Author

Yes, if done this way it would be the same.

I am actually more in favor of the allow-listing of labels approach rather than the allow/deny listing one. As mentioned before, we should apply labels listing to metrics like labels and annotations which I understood as applying it to metrics having an unbound number of labels. In my opinion, if we do allow/deny listing, it would only amount to reducing something that is still unbound in the end. Whilst on the other hand, by only allow-listing labels we could bound the number of labels of these metrics.

@brancz
Copy link
Member

brancz commented Mar 26, 2020

Precisely my thought as well.

@lilic
Copy link
Member

lilic commented Mar 26, 2020

Should we update this issue to reflect your comment, as now they are different from what I can tell.

@tariq1890
Copy link
Contributor

I'd prefer creating a new to reflect the new decisions and closing this in favour of that.

@lilic
Copy link
Member

lilic commented Mar 30, 2020

That sounds good as well! 👍 And link to this one for clarity.

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 28, 2020
@brancz
Copy link
Member

brancz commented Jun 29, 2020

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 29, 2020
@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 27, 2020
@fejta-bot
Copy link

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 27, 2020
@dgrisonnet
Copy link
Member Author

Closing as #1125 added the feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.
Projects
None yet
Development

No branches or pull requests

7 participants