Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Monitoring Plugin #3078

Closed
sradco opened this issue Nov 16, 2022 · 23 comments
Closed

Add Monitoring Plugin #3078

sradco opened this issue Nov 16, 2022 · 23 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@sradco
Copy link

sradco commented Nov 16, 2022

What do you want to happen?

We would like to add a new monitoring plugin that will help operators developer with setting up Prometheus based monitoring for their operator, will provide them with best practices and tooling to shorten their time to get up to speed with monitoring requirements and help with standardizing the way monitoring is implemented in operators.

The proposed structure and content:

operator main/
  monitoring/
    READM.md
    metrics/
      metrics.go (script to register the metrics and additional info)
      example_metrics.go  (script for creating the custom metrics and additional info)
      util/
        util.go  (Help functions and other utils for handling the metrics)
    alerts/
      alerts.go  (script for creating alerts and additional info)
      util/
        util.go  (Help functions and other utils for handling metrics)
    dashboards/ (Here the Grafana dashboard for controller-runtime metrics can be added if the users will add an additional command)
    tools/
      metricsdocs.go
    prom-rule-ci/
      prom-rules-tests.yaml (script for unit tests based on Prometheus tools)
  docs/
    monitoring/
      metrics.md
      runbooks/
       Runbook.md (Runbook template for helping with understanding and mitigating alerts)

PR operator-framework/operator-sdk#5975 is also related and can be referenced to provide additional best practices. It is correctly on hold since we would like to replace the examples with the examples here.

Extra Labels

No response

@sradco sradco added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 16, 2022
@Kavinjsir
Copy link
Contributor

The idea sounds great 👍🏼 .
Looking forward to have this feature.
(I'd be happy to help with if possible 😃 )

avlitman added a commit to avlitman/kubebuilder that referenced this issue Nov 20, 2022
GitHub-issue: kubernetes-sigs#3078
Signed-off-by: Aviv Litman <alitman@redhat.com>
avlitman added a commit to avlitman/kubebuilder that referenced this issue Nov 23, 2022
GitHub-issue: kubernetes-sigs#3078
Signed-off-by: Aviv Litman <alitman@redhat.com>
avlitman added a commit to avlitman/kubebuilder that referenced this issue Nov 23, 2022
GitHub-issue: kubernetes-sigs#3078
Signed-off-by: Aviv Litman <alitman@redhat.com>
avlitman added a commit to avlitman/kubebuilder that referenced this issue Nov 24, 2022
GitHub-issue: kubernetes-sigs#3078
Signed-off-by: Aviv Litman <alitman@redhat.com>
@camilamacedo86
Copy link
Member

camilamacedo86 commented Nov 27, 2022

HI @sradco,

Interesting. Have we an example of what would you like to scaffold within?
I can see the PR: avlitman#1 would be possible to generate the testdata with its changes so we are able to have a better idea of what you are looking for? (run make generate gen the testdata)

@umangachapagain
Copy link

@sradco IMO a monitoring plugin that will help operators developer with setting up Prometheus based monitoring sounds much bigger in scope than the implementation.

When I first read the issue, I was expecting a scaffold that will help deploy Prometheus stack (creating Prometheus resources). But, this is more about generating metrics (which may or may not be consumed by Prometheus). So, we should probably consider renaming the issue and plugin name to be more specific to metrics.

And since we're talking about best practices, will there be a recommended set of metrics that all operators should implement?

I like the overall idea.

avlitman added a commit to avlitman/kubebuilder that referenced this issue Nov 29, 2022
GitHub-issue: kubernetes-sigs#3078
Signed-off-by: Aviv Litman <alitman@redhat.com>
avlitman added a commit to avlitman/kubebuilder that referenced this issue Nov 29, 2022
GitHub-issue: kubernetes-sigs#3078
Signed-off-by: Aviv Litman <alitman@redhat.com>
avlitman added a commit to avlitman/kubebuilder that referenced this issue Nov 29, 2022
GitHub-issue: kubernetes-sigs#3078
Signed-off-by: Aviv Litman <alitman@redhat.com>
@sradco
Copy link
Author

sradco commented Nov 29, 2022

@camilamacedo86 @umangachapagain I updated the description of the issue. The plugin is meant to help the operators developers with the on boarding process of adding monitoring (metrics, alerts, recording rules and more) to their operator and to offer the best practices to do that correctly.

For example, having the metrics code logic in /monitoring/metrics and not inside the core operator code. In the core operator code there should only be a call to update the metric.

avlitman added a commit to avlitman/kubebuilder that referenced this issue Nov 30, 2022
GitHub-issue: kubernetes-sigs#3078
Signed-off-by: Aviv Litman <alitman@redhat.com>

Uncomment monitoring go files

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Generate working monitoring scaffolding

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Allow kubebuilder init with monitoring bundle

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Add metrics register to main file

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Update testdata

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Improve comments

Signed-off-by: Aviv Litman <alitman@redhat.com>
avlitman added a commit to avlitman/kubebuilder that referenced this issue Nov 30, 2022
GitHub-issue: kubernetes-sigs#3078
Signed-off-by: Aviv Litman <alitman@redhat.com>

Uncomment monitoring go files

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Generate working monitoring scaffolding

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Allow kubebuilder init with monitoring bundle

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Add metrics register to main file

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Update testdata

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Improve comments

Signed-off-by: Aviv Litman <alitman@redhat.com>
avlitman added a commit to avlitman/kubebuilder that referenced this issue Dec 1, 2022
GitHub-issue: kubernetes-sigs#3078
Signed-off-by: Aviv Litman <alitman@redhat.com>

Uncomment monitoring go files

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Generate working monitoring scaffolding

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Allow kubebuilder init with monitoring bundle

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Add metrics register to main file

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Update testdata

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Improve comments

Signed-off-by: Aviv Litman <alitman@redhat.com>
avlitman added a commit to avlitman/kubebuilder that referenced this issue Dec 1, 2022
GitHub-issue: kubernetes-sigs#3078
Signed-off-by: Aviv Litman <alitman@redhat.com>

Uncomment monitoring go files

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Generate working monitoring scaffolding

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Allow kubebuilder init with monitoring bundle

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Add metrics register to main file

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Update testdata

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Improve comments

Signed-off-by: Aviv Litman <alitman@redhat.com>
avlitman added a commit to avlitman/kubebuilder that referenced this issue Dec 1, 2022
GitHub-issue: kubernetes-sigs#3078
Signed-off-by: Aviv Litman <alitman@redhat.com>

Uncomment monitoring go files

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Generate working monitoring scaffolding

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Allow kubebuilder init with monitoring bundle

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Add metrics register to main file

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Update testdata

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Improve comments

Signed-off-by: Aviv Litman <alitman@redhat.com>
avlitman added a commit to avlitman/kubebuilder that referenced this issue Dec 1, 2022
GitHub-issue: kubernetes-sigs#3078
Signed-off-by: Aviv Litman <alitman@redhat.com>

Uncomment monitoring go files

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Generate working monitoring scaffolding

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Allow kubebuilder init with monitoring bundle

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Add metrics register to main file

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Update testdata

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Improve comments

Signed-off-by: Aviv Litman <alitman@redhat.com>
avlitman added a commit to avlitman/kubebuilder that referenced this issue Dec 1, 2022
GitHub-issue: kubernetes-sigs#3078
Signed-off-by: Aviv Litman <alitman@redhat.com>

Uncomment monitoring go files

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Generate working monitoring scaffolding

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Allow kubebuilder init with monitoring bundle

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Add metrics register to main file

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Update testdata

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Improve comments

Signed-off-by: Aviv Litman <alitman@redhat.com>
avlitman added a commit to avlitman/kubebuilder that referenced this issue Dec 1, 2022
GitHub-issue: kubernetes-sigs#3078
Signed-off-by: Aviv Litman <alitman@redhat.com>

Uncomment monitoring go files

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Generate working monitoring scaffolding

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Allow kubebuilder init with monitoring bundle

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Add metrics register to main file

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Update testdata

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Improve comments

Signed-off-by: Aviv Litman <alitman@redhat.com>
avlitman added a commit to avlitman/kubebuilder that referenced this issue Dec 1, 2022
GitHub-issue: kubernetes-sigs#3078
Signed-off-by: Aviv Litman <alitman@redhat.com>

Uncomment monitoring go files

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Generate working monitoring scaffolding

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Allow kubebuilder init with monitoring bundle

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Add metrics register to main file

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Update testdata

Signed-off-by: João Vilaça <jvilaca@redhat.com>

Improve comments

Signed-off-by: Aviv Litman <alitman@redhat.com>
@Kavinjsir
Copy link
Contributor

I have the same sense as @umangachapagain pointed out. As an operator author, I would assume to have a whole monitoring stacks solution for the first glance on the name monitoring plugins.

When going over for details, it feels more like prometheus plugin where it mainly provides approaches to define: metrics, recording rules, alert rules that are all "prometheus scoped".

Talking about monitoring, I personally am thinking of metrics, logging, tracing data, that is beyond what this issue want to bring about.

However, the idea overall looks good. Since populating customized metrics can be a common case, having a general approach to simplify metrics/rules implementation do save developers' time.

In my own opinion, I would expect this approach to be flexibly imported in my project. Which means, my project might not necessarily to be a scaffold by Kubebulider. From there, would it be possible if we change it to be a library or a separate tools like controller-gen?

@camilamacedo86 camilamacedo86 added triage/needs-information Indicates an issue needs more information in order to work on it. and removed triage/needs-information Indicates an issue needs more information in order to work on it. labels Dec 1, 2022
@varshaprasad96
Copy link
Member

varshaprasad96 commented Dec 1, 2022

@sradco We had a discussion regarding the monitoring plugin during the community meeting and a few insights were brought up by @camilamacedo86 and @Kavinjsir. I'm summarizing the list of concerns below:

  1. By @camilamacedo86 @Kavinjsir - Since the functions and methods are generic, does it make sense to have this as a library rather than a plugin? With it being a library, even the operators not scaffolded by KB could use it.
  2. By @Kavinjsir - The name "monitoring plugin" seems misleading, since monitoring just does not involve prometheus. It includes tracing, logging and other methods. Whereas this plugin just talks only about prometheus stack.
  3. By @Kavinjsir - This seems to be a golang specific plugin. The best practices mentioned (for adding alerts, declaring a custom metric etc) here are relevant only for go based operators, and not for say a java based operator project. Considering this, does this fit more in a prometheus project than in kubebuilder?

I would like to add like to add some of my thoughts inline, but I also defer to @sradco and team for their point of view.

  1. By @camilamacedo86 @Kavinjsir - Since the scaffolded helpers are generic, does it make sense to have this as a library rather than a plugin?

In my point of view, I consider the best practices doc generation to be useful from an operator author's persona in the following way:

  • As an operator author I introduce some custom metrics in my Kubebuilder scaffolded project.
  • To add documentation for those metrics and enable the customers of my project to easily start using it, I need some kind of user facing docs.
    The plugin scaffolds helps in generating user facing best practices documentation for the custom metrics which I have.

But at the same time, if these best practices are going to be common and the same piece of code (like the following: https://github.com/kubernetes-sigs/kubebuilder/pull/3106/files#diff-3036867e08a7c0e5bfdcccb3b380a20f01d1665d23187eb1a13fdc8926d2c8b2) is being generated for all projects, then they should specifically go into a library.

What in the plugin would be helpful to users is: https://github.com/kubernetes-sigs/kubebuilder/blob/28fe31ee240090c2009e5c0a6611d3f0b85407f1/testdata/project-v3-declarative-v1/monitoring/tools/metricsdocs.go - wherein I mention the custom metrics and have a standard documentation generated for it. Whether this warrants a separate plugin or just an additional script in a discussion.

On the other hand the contents of (testdata/project-v3-declarative-v1/monitoring/metrics/metrics.go) in the PR could possibly be moved to a library.

  1. By @Kavinjsir - With "monitoring" seems misleading, since monitoring just does not involve prometheus. It includes tracing, logging and other methods. Whereas this plugin just talks about only prometheus.

With monitoring may seem misleading if we decide to move ahead with the plugin but I wouldn't mind a name with prometheus on it.

  1. By @Kavinjsir - This seems to be a golang specific plugin. The best practices mentioned (for adding alters, declaring a custom metric etc) here are relevant only for go based operators, and not for a java based operator project. Considering this, does this fit more in a prometheus project than in kubebuilder?

Kubebuilder is a generic scaffolding tool. But at the same time, plugins can be for specific use. I could scaffold a project with kubebuilder, and chain with a plugin to do any form of customization I want. For example, the "deploy-image" plugin scaffolds out the best practices for go operators, it cannot be used on top of Helm plugin unless we add additional customizations. I would not worry about the lack of language agnostic feature to this plugin, because eventually when this is moved to an external plugin it can just be considered as an extension for having best practices for go operator.

Overall, I feel that we should break down the PR to understand its use case better. The common methods could go into a library, but having a custom doc generation mechanism which reads the custom metrics specified by the user and scaffolds out the best practices for the same does seem to be a good idea for a plugin (this may need changes to the current implementation).

@Kavinjsir
Copy link
Contributor

@varshaprasad96 Big thanks for bringing what we've discussed in weekly with details! 👍🏼

@sradco I would also like to add my thoughts to supplement the points above:

For Q1

What in the plugin would be helpful to users is: https://github.com/kubernetes-sigs/kubebuilder/blob/28fe31ee240090c2009e5c0a6611d3f0b85407f1/testdata/project-v3-declarative-v1/monitoring/tools/metricsdocs.go - wherein I mention the custom metrics and have a standard documentation generated for it. Whether this warrants a separate plugin or just an additional script in a discussion.

I'm also happy to see a documentation with introduction of custom metrics if existing.
Just like how we have docs of default metrics that helps us to identify the metrics for our needs.

For Q2
Since what this issue and the corresponding PR are discussing is closely related to Prometheus, maybe we can consider if a better naming can be possible?

For Q3

...because eventually when this is moved to an external plugin it can just be considered as an extension for having best practices for go operator.

I'm good if such feature becomes Golang specific finally.
Well, one point I'm personally thinking is: except metrics, other prometheus manifests may be not necessarily go-specific.
Like the implementation and usage of alerting rules and recording rules.


In fact, I would imagine that the default metrics provided by controller-runtime may cover many use cases when monitoring on API reconciliation.
So wondering if it is possible that "implementing custom metrics for CRD" may become a comparatively advanced topic?
And before that, the users might be more interested in utilizing the existing metrics to create "recording rules" / "alerting rules"?

Actually, many projects provides mixins to give these prometheus resources "off the shelf", for instance:

So as an operator user, I might also be happy if the plugin/lib can bring me similar manifests. (Just one thought around monitoring...)

@camilamacedo86
Copy link
Member

camilamacedo86 commented Dec 2, 2022

Hi @sradco,

The utils proposed seems to fit as a lib. Therefore, since it is very prometheus specific and not "K8s" context I think that it might be accepted under github.com/prometheus/. Did you tried that? Then, if we remove the utils it mainly only scaffolds a README. Yes the readme is nice but have a plugin to do only that does not seems to bring too much value (imo)

(Suggestion, it seems fit in the SDK samples) Operator-sdk has a best practice section (https://sdk.operatorframework.io/docs/best-practices/) and scaffold samples within code which is used in the tutorials. Also, you made a sample with utils and related "good practices for metrics" as part of your previous initiatives (https://github.com/operator-framework/operator-sdk/tree/master/testdata/go/v3/monitoring/memcached-operator). So, why not add this code there?

PS.: Note that the deploy-image plugin has the goal of follow the good practices but it scaffolds the whole code, tests and etc to deploy and manage an image which is a common need. It has been requested by the community in SDK and KB not only to know how to do but also to speed up, see the design doc: https://github.com/kubernetes-sigs/kubebuilder/blob/master/designs/code-generate-image-plugin.md and old proposals operator-framework/operator-sdk#2158.. Then, imo we cannot compare both proposals/ideas.

Conclusion

IMO we should not accepted it in Kubebuilder (it does seem address value for the project and seems to be too prometheus specific which is out of the scope).

The goal of Kubebuilder is not accepted and/or have a lot of plugins because it would be hard to kept maintained. Therefore, Kubebuilder has been investing efforts in the plugin phase 2 to provide an API to allow the community created its own plugins that would not fit in its space and use as external plugins (see #2600). We really would like to provide only few plugins shipped with the CLI and invest efforts in the API.

Also, I understand that the others thoughts go along with.
So, are we you ok with we close this issue and the open pr as not accepted?

@Kavinjsir @varshaprasad96 @umangachapagain

@machadovilaca
Copy link
Contributor

Related to having util.go file in an external library, in my opinion, it is best to keep the util file within the project rather than moving it to an external library. This allows developers to easily modify and maintain the code according to their needs. It also makes it easier for them to understand how the code works and how it fits into the overall project. Currently, the util file only contains basic operations for creating and registering metrics. However, advanced developers may have additional requirements, such as the ability to pass additional parameters to the metrics opts or the use of metric vectors.

@umangachapagain
Copy link

The goal of Kubebuilder is not accepted and/or have a lot of plugins because it would be hard to kept maintained. Therefore, Kubebuilder has been investing efforts in the plugin phase 2 to provide an API to allow the community created its own plugins that would not fit in its space and use as external plugins (see #2600). We really would like to provide only few plugins shipped with the CLI and invest efforts in the API.

Also, I understand that the others thoughts go along with. So, are we you ok with we close this issue and the open pr as not accepted?

@Kavinjsir @varshaprasad96 @umangachapagain

I'd still like to see this as a plugin. External plugin system seems to be the right fit. Is there a centralized repo for community contributed plugins?

@camilamacedo86 Operator SDK already scaffolds some resources specific to Prometheus. Do you know if SDK does it on it's own or gets it via KB? As you suggested, SDK seems like a right fit to integrate such a plugin.

I envision this as an external KB plugin which is leveraged by SDK as a default. SDK already brings in ServiceMonitor. This plugin can add metrics and PrometheusRule to it and document best practices.

@camilamacedo86
Copy link
Member

camilamacedo86 commented Dec 3, 2022

Hi @umangachapagain,

I'd still like to see this as a plugin. External plugin system seems to be the right fit. Is there a centralized repo for community contributed plugins?

Why? What advantages a plugin that mainly only scaffolds a README can bring to you vs the cost to keep it maintained?
Do you need a plugin for that? Should not a doc and a sample/example enough for you achieve the goal?

@camilamacedo86 Operator SDK already scaffolds some resources specific to Prometheus. Do you know if SDK does it on it's own or gets it via KB? As you suggested, SDK seems like a right fit to integrate such a plugin.

The only specific prometheus thing scaffold in the projects are the rule to allow enable the default metrics because by default they are protected.

No utils code in any form are scaffolded in the projects done by SDK or Kubebuilder because that fits to a lib or into a third-party tool that can be consumed by the projects usually.

My suggestion was check if this code cannot be added into the sample/example https://github.com/operator-framework/operator-sdk/tree/master/testdata/go/v3/monitoring/memcached-operator (not default scaffolds or plugin) since there has other "good practices" regards this topic and it can be used into the SDK docs.

@sradco
Copy link
Author

sradco commented Dec 4, 2022

Hi @umangachapagain,

I'd still like to see this as a plugin. External plugin system seems to be the right fit. Is there a centralized repo for community contributed plugins?

Why? What advantages a plugin that mainly only scaffolds a README can bring to you vs the cost to keep it maintained? Do you need a plugin for that? Should not a doc and a sample/example enough for you achieve the goal?

I disagree. As I mentioned before, this plugin is meant for an easier onboarding to Prometheus monitoring. This PR is the initial work that includes the metrics part. It doesn't only include README but includes the example files that the developers will add their specific metrics to.

This is a very important part that will greatly help operator developers, with adding new metrics.
Also this integrates an important best practice which is to separate the monitoring code from the core operator code.
This is emphesised both in the example files and the README. It adds tooling that help them to document the operator specific metrics automatically. This is only the beginning.

For alerts we will also add a second PR that adds the required files for adding the alert rules, recording rules, Prometheus unit tests, runbooks.

We plan to complement this plugin with a best practices doc which includes metrics naming conventions, alerts labels, which are currently not documented in a central place and its like the wild west which makes it hard for end users to make good use of the monitoring stack.

I agree that this maybe specific to Prometheus stack, but the Grafana plugin is specific to Grafana and its in Kubebuilder. I don't mind making this a Prometheus stack plugin.

Prometheus is a highly used stack and it deserves the attention. The Grafana plugin can become part of it.

@camilamacedo86 Operator SDK already scaffolds some resources specific to Prometheus. Do you know if SDK does it on it's own or gets it via KB? As you suggested, SDK seems like a right fit to integrate such a plugin.

The only specific prometheus thing scaffold in the projects are the rule to allow enable the default metrics because by default they are protected.

What we added to this operator will save a lot of development time, frustration and prevent pitfalls that will later cost in more time to fix.

No utils code in any form are scaffolded in the projects done by SDK or Kubebuilder because that fits to a lib or into a third-party tool that can be consumed by the projects usually.

We are not providing a lib. The files are the base for the operator developers to build on.

My suggestion was check if this code cannot be added into the sample/example https://github.com/operator-framework/operator-sdk/tree/master/testdata/go/v3/monitoring/memcached-operator (not default scaffolds or plugin) since there has other "good practices" regards this topic and it can be used into the SDK docs.

The example dashboard will not have the same affect. We want to give the developers that starting point that will lead them for easier and better implementation of the Prometheus stack monitoring.

The memcached operator will be built on top of this plugin and add the actual metrics, alerts and so on. We still want to add that.

@camilamacedo86
Copy link
Member

camilamacedo86 commented Dec 4, 2022

HI @sradco,

Thank you for your input. Could you attend the next community meeting?
It would be great if we were able to discuss this topic altogether and by voice. WDYT?

Let's first ensure that we are all on the same page here and that the utils/code generated by the proposed plugin should not be scaffolded in the project itself. Then, after moving this implementation to a lib the plugin will scaffold? (That is what we need to discuss here, right?)

It doesn't only include README but includes the example files that the developers will add their specific metrics to.

Codes examples in Kubebuilder are only addressed via docs/tutorials.
All plugins scaffold useful/valid code, NOT examples.

So, if the plugin's main goal is to scaffold examples, then we cannot accept this one._

Therefore, following good practices and conventions shows a pre-requirement for any code done by the tool.

In SDK, you have been working on adding content regards this initiative. (here) Why not ship these examples there?

We plan to complement this plugin with a best practices doc that about metrics naming conventions, alerts labels, which are correctly not documented in a central place and it's the wild west which makes it hard for end users to make good use of the monitoring stack.
I agree that this maybe specific to Prometheus stack, but the Grafana plugin is specific to Grafana and its in Kubebuilder. I don't mind making this a Prometheus stack plugin.

The grafana plugin

  • does not scaffold utils to work with grafana. (such as the proposed idea so far)
  • does not scaffold examples. Instead, it provides a valuable/useful implementation for what is done by default
  • and also allow you to input data from custom metrics and generate valuable/useful code based on that

So, what could be scaffolded by the monitoring plugin that is a useful and valid code for what is done by default? What valuable, useful code could be generated by it?(what it can do that after to do the scaffold I can go there and use the result without the need to add any extra code on top)

  • Could we try to re-think and see if we could propose something based on that?
  • Could we have alerts for the default metrics? Have any metric scaffold by default in controller-runtime that does not follow this convention, and we could improve the implementation?

That is something that IMO could be very nice to have and would have +1 vote here.

@sradco
Copy link
Author

sradco commented Dec 4, 2022

I will attend the meeting.

  • Perhaps the word example wasn't the exact word for what we mean and what is implemented.

    The files do not include metrics/alerts examples. They include the actual code needed for creating and exposing operator metrics and alerts.

It is the same like https://github.com/kubernetes-sigs/kubebuilder/blob/master/testdata/project-v4-multigroup/apis/foo/v1/bar_types.go
And will be extremely helpful for the developers!

The developers only need to add their custom metrics/alerts/recording rules and so on to them.

This plugin doesn't aim to add specific metrics and alerts and other specific resources.
Adding specific metrics, alerts, dashboards should use this plugin and add them on top of it, while keeping the best practices

@camilamacedo86
Copy link
Member

camilamacedo86 commented Dec 4, 2022

Hi @sradco,

In the Proposed PR, the code generated is lib/bin features. not CLI ones and should not be inside the "Operator" itself.

The problem is not the directory being named "utils" but what it does. The code inside the utils should be centralized in lib and consumed by the projects. The binary also should not be built for the Operator itself. The bin should be provided outside and consumed by the projects.

Could you please let me know:

  • a) In SDK, you have been working on adding content regards this initiative. (here) Why not ship these examples there?
  • b) Have alerts and etc, that could be added to what is scaffolded by default? If so, I think would be something that could end up as a plugin-related feature.

The examples raised by you are not the same scenario. I tried to clarify in the following comments.

It is the same like https://github.com/kubernetes-sigs/kubebuilder/blob/master/testdata/project-v4-multigroup/apis/foo/v1/bar_types.go

That results from the code generated when we create APIs (feature). Provide comments in the code to let users know how to use a feature or if its result shows completed acceptably. But creating a feature that provides only mainly comments and examples seems not.

In Memcached operator we use the code and add the Memcached specific metrics/alerts and all other the examples.
It is a sample provided in the SDK repo not Kubebuilder

It is a sample in SDK. Kubebuilder has no Memcached operator or plugin that generates a specific Memcached operator.

The sample is generated automatically via a stack available in SDK for maintainability/development purposes and is not part of any CLI feature. Therefore, ihmo what you are trying to add here could fit well in the sample but not in a plugin/cli feature.

The Grafana plugin does include example file for adding custom metrics https://github.com/kubernetes-sigs/kubebuilder/blob/master/testdata/project-v4-with-metrics/grafana/custom-metrics/config.yaml.
And its used here https://github.com/kubernetes-sigs/kubebuilder/blob/8069fa5e3a64a1671c7b80ae5dc588567c9aa295/pkg/plugins/optional/grafana/v1alpha/scaffolds/internal/templates/custom.go to generate the template.

The Grafana plugin provides a feature where you can input the info about your custom metrics and let the tool (feature) generate the Grafana Dashboards for the input provided. The examples are NOT the feature. The comment with an example is to let the user knows how to use the feature.

We believe this is NOT the proper way to add custom metrics. This plugin doesn't aim to add specific metrics and alerts and other specific resources.

Grafana plugins should not add metrics. Its purpose is generate the dashboards for what is exported and exist by default. On top of that, it has a feature that allows you to say, "here are my custom metrics" please generate "Garafana dashboards for them".

Adding specific metrics, alerts, dashboards should use this plugin and add them on top of it, while keeping the best practices

Sorry, but I could not follow this one. However, if you disagree with the grafana plugin or would like to change it, then that seems that the best way to address that is open an issue so we discuss what should be changed, etc on it specifically. Could you please open this one if that is the case?

@varshaprasad96
Copy link
Member

varshaprasad96 commented Dec 5, 2022

@camilamacedo86 I understand your point of view, and to summarize adding two important points here that are creating questions on whether this issue should be accepted or not. I'll add my answers to them inline:

Question 1: Are the helpers scaffolded by the plugin valuable? What are the benefits of scaffolding this inside an operator project, can these be just helpers in a library.
Question 2: Do we accept this as a plugin in KB? If we accept this as an plugin in KB, how do we manage the engineering effort to maintain it?

Answers inline:

Question 1:

By definition, plugins are extensions to KB cli, which address "specific" needs of the user. These needs can either be addressed by scaffolding or generating code that is helpful to the user, like the grafana plugin does, or by enforcing a best practice (like the deploy image plugin does). Either ways until a plugin "addresses the needs of the user" it is valuable. The monitoring plugin here, is trying to enforce best practices by directing the user on where or how to add metrics in their project following best practice. After digging up, I understand that "util.go" is not a library, but is a helper which an operator author should eventually have "inside their project". Let's take an example here:

The method MustRegister (https://github.com/prometheus/client_golang/blob/254e5468413f19fb75cdad45f5ddc0b8c975188c/prometheus/registry.go#L177) in very simple terms registers a collection of metrics (in promethteus' terms "collector"). This is an API provided by the user to register their custom metrics. Hence prometheus provides this as a library. An operator author has to use this method in their respective project to register their custom metrics. What usually folks do is add this in their operator code (usually in reconciler) which does work without any problems but is unnecessary and "NOT" a good practice. What the monitoring plugin is trying to do is scaffold this piece of code in a separate folder and enforce the users to organize the monitoring pieces of their project in the right place. Having the wrappers like this - https://github.com/kubernetes-sigs/kubebuilder/pull/3106/files#diff-75224dd6164f9903e8acf72b976de2f46f97c451a3b784fcd1d20347a90ffcbaR29 in a library is adding no value, the user will still continue to make the same mistake of having them at the wrong location.

This is similar to the deploy-image plugin - without this everything works as expected, but the operator project does not follow a best practice. A user can have an unfiltered cache, their project would still work, but having cache selectors will enable the operator to be more efficient - which is one of the things that deploy-image does.

Imo this should be considered as a plugin. Having the helpers in a library would add no value, its like creating wrapper around wrappers (prometheus APIs) which is of no use. The motive of this plugin is to enforce a best practice and having these in a library would defeat that purpose, since users would not even use the library which is created from this.

Question 2: Do we accept this is a plugin in KB?

I agree with @camilamacedo86's point of view in this aspect wherein the eventual effort of maintaining the plugin inside KB can be painful. Having a plugin as a part of KB has a long list of problems (which I wouldn't repeat here), but this is why Phase 2 was introduced. Having out-of-tree plugins solves question 2, where as KB community we need not be worried of maintaining the code base. It would be external, can be used with KB, but will have its own set of maintainers and release cycle. This issue was brought up in community meeting, however, the idea was to start this as a phase 1.5 and eventually move it to phase 2 plugin, since as KB maintainers we haven't done a good job of creating a step-by-step documentation on building phase 2 plugins. However, if @sradco and team are comfortable on making this phase 2, that would definitely be the best solution.

Tl;dr; the first step is to answer whether we can consider this to be a plugin or not. (Imo we should for which I have provided a detailed explanation above). The next is if this should be phase 1.5 or a phase 2 plugin. If the authors are comfortable with making this external using phase 2, then definitely +1 for it!

cc: @camilamacedo86 @sradco @machadovilaca @umangachapagain @Kavinjsir

PS: Modifying Grafana plugin and/or integrating all the aspects of a monitoring stack into one are separate follow up issues for future that need separate discussion and we need not bring them up here in the first step (imho).

@Kavinjsir
Copy link
Contributor

I'm would also vote for external plugin:

  1. The proposed "best practices" looks great especially in case the author continuously defines number of metrics. If there are just a few metrics, will this practice be heavy?
  2. Instead of "guide users to generate metrics for operator", does the code scaffolded under monitoring/ look more like "guide users to generate metrics in go project"?

I'm wondering if there is some specific metric practices that is operator scoped, because Kubebuilder, as I suppose, is the tool to help user create a good operator project (versus "golang project").

@camilamacedo86
Copy link
Member

camilamacedo86 commented Dec 6, 2022

HI @varshaprasad96,

Thank you for all your input. Following some comments inline

Regards the question: Question 1: Are the helpers scaffolded by the plugin valuable? What are the benefits of scaffolding this inside an operator project, can these be just helpers in a library.

See that the utils also have a binary to build the docs for the metrics. In the same way, and for reasons that we do not scaffold a code to build envtest bin inside of the project, I do not agree that is something that should be scaffolded by default.

Therefore, I was also wondering why it is required to implement and build a binary. If that is a good practice, does it not exist for projects that already provide the docs for the metrics? Are not the metrics implement in go so godoc will not generate the docs for it?

Conclusion

I think we can all convey that we reached a consensus that this proposal does fit well for KB purposes and views and, unfortunately, cannot be accepted to be shipped within.

(ihmo) I do not think nomenclature, examples, or conventions are good purposes as end goals for plugins and seem better addressed via docs and examples. Ensure that the code resulting from the CLI/feature follows good practices, etc seems to me a pre-requirement and does not fit as "that is the reason this plugin exists".

However, the idea with plugins phase 2 is for Kubebuilder to provide the libs (which are implemented already. It is only missing docs and e2e test to ensure the implementation), and the community can use them as the API to do what they wish.

@sradco, please let us know if we can agree to close this one as deferred asap or if you want to wait until the next community meeting.

@Kavinjsir @umangachapagain @everettraven

@varshaprasad96
Copy link
Member

varshaprasad96 commented Dec 6, 2022

@camilamacedo86 Just adding my follow up thoughts here.

Therefore, I was also wondering why it is required to implement and build a binary. If that is a good practice, does it not exist for projects that already provide the docs for the metrics? Are not the metrics implement in go so godoc will not generate the docs for it?

The docs are not the only files scaffolded. The plugin brings in the code required to setup metrics as well (https://github.com/kubernetes-sigs/kubebuilder/pull/3106/files#diff-75224dd6164f9903e8acf72b976de2f46f97c451a3b784fcd1d20347a90ffcbaR29). If an operator author wants to add metrics using prometheus, then its upto them to use this plugin and scaffold the helpers and its documentation (its not by default). Envtest imo is a different case, where we use the binary (as is) to spin up the test cluster. The users of the operator author project need not know what binaries are downloaded by envtest, or what is the use of "setup-envtest", or even the use of each binary. But the users of these metrics need to know how it is registered as well as what each one of them do.

In my mind, the documentation can be of two kinds:

  1. Docs which help operator author to generate metrics - This should definitely be in a common place as KB docs (like in KB book). => (i.e its like KB is the product, and customer are operator authors)
  2. Docs which help the users of operator - If these are operator specific docs then having it in the project does add value to its users => (i.e operator author is delivering the operator image as the product, its users are the customers).

I agree that docs need not be a part of the operator image (or the binary built for it), but can definitely be a part of the project, for which this plugin is helpful. Just the helpers used to register metrics can be in pkg/.

We could divide the PR into three portions, where the first two that have operator specific metric helpers are a part of plugin, and the generic helpers which are common to all are a part of library:

  1. The methods to register metrics and use them in a plugin : https://github.com/kubernetes-sigs/kubebuilder/pull/3106/files#diff-83c253335297c31f887e2bbe0a07b4225ef159c38fd5c382141b03d5a179b08dR1
  2. The scaffolded documentation to be a part of the plugin : https://github.com/kubernetes-sigs/kubebuilder/pull/3106/files#diff-02ed1564d3e8e371c7a8cc78c6547eca9362b8911b6329f53efd924e25deccc0R2-R10
  3. The generic helpers that list, or compare metrics in a library : https://github.com/kubernetes-sigs/kubebuilder/pull/3106/files#diff-455f7928933e227aa87e92a8cc46759fc8fd4ea3ca27cbd08bb80af22a858156R38-R68

I think we can all convey that we reached a consensus that this proposal does fit well for KB purposes and views and, unfortunately, cannot be accepted to be shipped within.

(please correct me if I'm wrong) Do we agree that this could be a plugin? just that it cannot be a phase 1.5 plugin and shipped along with KB.

I'm more concerned on opinions on why this shouldn't be a plugin than whether it should be in KB (as phase 1.5) or an external one. If the latter is true (this can be plugin but external), then we can surely close this issue and encourage the authors to create issues specifically with respect to phase 2 as they start building on it.

@camilamacedo86
Copy link
Member

camilamacedo86 commented Dec 9, 2022

Hi @varshaprasad96,

IHMO:

Regards the specific propose of this plugin

After we remove the testdata/project-v3-declarative-v1/monitoring/tools/metricsdocs.go which we all agree that does not seems a good fit to be scaffolded in the project (Not sure why the binary is indeed required, are not godocs doing that already? Have not projects that can do it? Could not it be addressed in another project ? )

The goal of the plugin still:

  • scaffold a README to doc the metrics
  • scaffold a wrapper for Prometheus client and provide an example over how/where according to them would the metrics be better organized

Then, let's think about the value:

  • Custom metrics is not a common case need at all. I cannot think in a good example where it would be required. (seems to me address very specific need targeting < 20% at least ))
  • By using the plugin it will only try to enforce a "convention/standard" that they would like to promote and not delivery something that we can check working and get an immediately result.
  • It seems that the definition that is the best away to achieve the goal can still be doubled. (I am not very skilled on this area, but I would recommend check if has not an specific k8s working group for this subject and see wdyt)
  • Nomenclature, conventions are usually better described by docs and examples.
  • Have the plugin will not remove the need of all be described properly in a doc which ought to provide examples as well.

So, in POV: The work required to maintain the proposed plugin does not pay off the value that it can returns.

Now, about what plugins Kubebuilder could accept or not to be shipped with and maintained by the project?

The historic context here, is that it was discussed in the past and we agreed that unfortunately we have not the effort required to accept all possible plugins and address all possible needs. Therefore, we need to keep in mind only accept plugins that can bring value for the common scenarios and what is done by default. We need target things that can be useful and helpful for 80% of use cases instead of very specific needs.

Because all points described above, I do not think that it fits well in Kubebuilder (my vote is -1). Looking on how can we still helping anyone from the community that has appetizer to maintain solutions like that we have been working in the external plugins ( plugins phase 2 ).

Therefore, if @sradco has the effort required and think that is valuable address this need via a plugin instead docs and examples then, I'd say that it shows a great fit for the external plugins. And we encourage everybody use Kubebuilder as lib to create their own plugins for specific use cases and needs. The code done so far is mainly the same, it will mainly only not live in the Kubuilder repo.

PS.: I indeed would like to suggest we review the declarative plugin. It seems for me that this one could be in the declarative repo instead of kubebuilder.

@camilamacedo86
Copy link
Member

Hi @varshaprasad96 @sradco @Kavinjsir @everettraven,

It seems that the recommendation will be addressed via examples and docs in operator-sdk side. If so, do we agree in close this one as the PR: #3106?

@everettraven
Copy link
Contributor

@camilamacedo86 I think it is fine to close #3106 as per this comment

I think closing this issue is fine unless we feel there needs to be more discussion about making it an external plugin.

@sradco
Copy link
Author

sradco commented Jan 24, 2023

@camilamacedo86 I think we can close this for now.
I still believe that a plugin for monitoring would be very valuable, but since it will not be accepted as part of the kubebuilder code, we are considering the amount of effort and time this would require from our team.

Having the documentation and examples is a good starting point. It can also be expended to include additional information about logs, tracing, events etc and we plan to help in adding additional information as we go.

From my POV, adding the plugin would allow automations that would save development time and on boarding time, will allow code reusability, improve the code quality and would ultimately result in better user experience.

@sradco sradco closed this as completed Jan 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature.
Projects
None yet
7 participants