Add a Custom Resource metrics configuration file to the release artifacts #7229

Open
chrischdi opened this issue Sep 16, 2022 · 14 comments
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@chrischdi
Member

chrischdi commented Sep 16, 2022

User Story

As an operator, I would like to have metrics for my running Cluster API version for day-2 operations.

Detailed Description

We recently added a first configuration for metrics for the core CAPI resources via

It would be great to add the configuration as an artifact to future releases, so that an operator can easily find the matching configuration when updating Cluster API.

Before doing so, we might want to ensure that the configuration is generated from code, as outlined in

/kind feature

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 16, 2022
@sbueringer
Member

/triage accepted

Sounds good. Thx for working on this!

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 16, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 15, 2022
@fabriziopandini
Member

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 18, 2022
@cwrau
Contributor

cwrau commented Aug 25, 2023

Ah, nice, would this also be rolled out via clusterctl init / clusterctl upgrade?

That way we can just consume it without having to do anything major manually.

@sbueringer
Member

sbueringer commented Aug 25, 2023

The plan was just to publish it as a release attachment. In the past we didn't want to extend clusterctl to become a general purpose kubectl apply or something similar

@cwrau
Contributor

cwrau commented Aug 25, 2023

> The plan was just to publish it as a release attachment. In the past we didn't want to extend clusterctl to become a general purpose kubectl apply or something similar

Mh, but that doesn't solve the problem, or does it? 🤔

We as end users still have to download (aka copy) that file, convert it to a configmap, apply it somehow to our clusters, and configure a kube-state-metrics setup to use it.

Whereas the current, development, solution is so 👌 close to being perfect. The only current problem is that it doesn't build 😅
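The manual "convert it to a configmap" step described above can be sketched roughly as follows. This is only an illustration, not how Cluster API ships or generates its configuration: the ConfigMap name, namespace, and data key are hypothetical, and the config text is a placeholder standing in for the real file.

```python
import json


def build_configmap(name: str, namespace: str, key: str, config_text: str) -> dict:
    """Wrap the text of a metrics configuration file in a ConfigMap manifest."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        "data": {key: config_text},
    }


# Placeholder content; a real file would be the one downloaded from a release.
example_config = "kind: CustomResourceStateMetrics\nspec:\n  resources: []\n"

manifest = build_configmap(
    name="capi-crd-metrics-config",  # hypothetical name
    namespace="observability",       # hypothetical namespace
    key="crd-metrics-config.yaml",   # hypothetical data key
    config_text=example_config,
)

# The JSON output can be applied with `kubectl apply -f -`.
print(json.dumps(manifest, indent=2))
```

An existing kube-state-metrics deployment would still need the ConfigMap mounted as a file and referenced, for example through its `--custom-resource-state-config-file` flag, which is the part that stays manual in this approach.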

@sbueringer
Member

sbueringer commented Aug 25, 2023

Depends on what the problem is. For me one of the biggest problems we have is that we don't have enough maintainers, so maintainability is a huge concern and I'm not looking forward to maintaining a monitoring stack (or a part of that) :)

@sbueringer
Member

What would be the change that we have to make to get rid of your current problem?

@sbueringer
Member

sbueringer commented Aug 25, 2023

Some more context. It feels like everything in our repo that is somehow consumable will eventually be consumed by users. It doesn't matter if we document that we don't provide any guarantees, etc. People start relying on it, people ask for help if it doesn't work or if they have any questions about it, and to a certain degree some expect support.

The only protection we sometimes have is to try to set clear expectations on what guarantees we provide and otherwise make things "un-consumable" (which we afaik didn't do in the past, just to be clear).

@cwrau
Contributor

cwrau commented Aug 28, 2023

> Depends on what the problem is. For me one of the biggest problems we have is that we don't have enough maintainers, so maintainability is a huge concern and I'm not looking forward to maintaining a monitoring stack (or a part of that) :)

> Some more context. It feels like everything in our repo that is somehow consumable will eventually be consumed by users. It doesn't matter if we document that we don't provide any guarantees, etc. People start relying on it, people ask for help if it doesn't work or if they have any questions about it, and to a certain degree some expect support.

> The only protection we sometimes have is to try to set clear expectations on what guarantees we provide and otherwise make things "un-consumable" (which we afaik didn't do in the past, just to be clear).

I definitely understand that, but monitoring, especially for a fleet of clusters (maybe even for various customers), is essential.

I wish kube-state-metrics had an easier way to integrate these metrics, maybe like Grafana does with its dashboards: configmaps that are auto-(re)loaded. (See kubernetes/kube-state-metrics#2169) That would make your work/maintenance as easy as just somehow publishing the configmap, possibly even with just a release artifact and a little twist.

But as long as we need a manual configuration, we need either the current setup with a custom kube-state-metrics just for this, or a configmap I can just load into my existing kube-state-metrics. Both should be deployable via gitops without copying something.

> What would be the change that we have to make to get rid of your current problem?

Currently the only thing preventing this from working is what I've described in #9312. Right now we can't use flux, as kustomize build fails because the namespace.yaml is outside the folder.
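To make the failure mode concrete: by default, kustomize refuses to load files that live outside the directory containing the kustomization. The sketch below is a simplified model of that kind of check, not kustomize's actual implementation, and the directory layout is hypothetical.

```python
import posixpath


def is_outside_root(kustomization_dir: str, resource: str) -> bool:
    """Return True if a file reference resolves outside the kustomization directory."""
    target = posixpath.join(kustomization_dir, resource)
    # relpath normalizes both paths, so "../" segments are resolved first.
    rel = posixpath.relpath(target, kustomization_dir)
    return rel == ".." or rel.startswith("../")


# Hypothetical layout, used only for illustration.
root_dir = "observability/kube-state-metrics"
for ref in ["crd-metrics-config.yaml", "../namespace.yaml"]:
    print(ref, "outside root:", is_outside_root(root_dir, ref))
```

A reference such as `../namespace.yaml` is flagged because it resolves to the parent of the kustomization directory, which matches the situation described above.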

@sbueringer
Member

> Currently the only thing preventing this from working is what I've described in #9312. Right now we can't use flux, as kustomize build fails because the namespace.yaml is outside the folder.

I'm aware. I meant, what do we have to change to make this work?

@cwrau
Contributor

cwrau commented Aug 28, 2023

> > Currently the only thing preventing this from working is what I've described in #9312. Right now we can't use flux, as kustomize build fails because the namespace.yaml is outside the folder.

> I'm aware. I meant, what do we have to change to make this work?

Ah, oh, sorry 😅

Thinking about it I also found out that flux's kustomization doesn't support these Kustomize HelmChart thingies. 😅

So even just copying the namespace.yaml into the folder wouldn't really solve our problem.

But a working solution would be to have the configuration in a configmap instead of in a plain file with a configmap generator; we could then use a flux kustomization to include only that configmap, and patch our kube-state-metrics to include that file.

If that is not automation-friendly enough, we could also move the plain file into its own folder with its own kustomization and configmap generator, and we could include that.

Thinking about it I think the latter is the better solution.

If you're open to that I can open a PR for that. 😁
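The second option above (a dedicated folder with its own kustomization and configmap generator) could look roughly like the file rendered by this sketch. The ConfigMap name and file name are hypothetical, and this is not the layout that was eventually merged; `configMapGenerator` and `generatorOptions.disableNameSuffixHash` are standard kustomize fields.

```python
def render_kustomization(configmap_name: str, config_file: str) -> str:
    """Render a minimal kustomization.yaml that wraps one file in a ConfigMap."""
    lines = [
        "apiVersion: kustomize.config.k8s.io/v1beta1",
        "kind: Kustomization",
        "configMapGenerator:",
        f"  - name: {configmap_name}",
        "    files:",
        f"      - {config_file}",
        "generatorOptions:",
        "  # Keep a stable name so an existing Deployment can mount the ConfigMap.",
        "  disableNameSuffixHash: true",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical names; the config file must sit next to the kustomization.yaml.
kustomization_yaml = render_kustomization(
    "capi-crd-metrics-config", "crd-metrics-config.yaml"
)
print(kustomization_yaml)
```

Because the referenced file lives in the same folder as the kustomization, a tool such as flux could point at that folder alone without pulling in the rest of the development setup.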

@sbueringer
Member

sbueringer commented Sep 13, 2023

We mostly implemented something else, but please check if #9390 solves your problem as well.

@fabriziopandini
Member

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Apr 12, 2024
@fabriziopandini fabriziopandini removed the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Apr 16, 2024
@fabriziopandini fabriziopandini added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label May 3, 2024

6 participants