Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: Monitoring code refactor #219

Conversation

machadovilaca
Copy link
Member

This design doc is proposing a code refactor for the monitoring logic in all
KubeVirt components.

@kubevirt-bot kubevirt-bot added dco-signoff: yes Indicates the PR's author has DCO signed all their commits. size/M labels May 2, 2023
@machadovilaca
Copy link
Member Author

/cc @enp0s3 @fossedihelm @xpivarc

@machadovilaca
Copy link
Member Author

/cc @sradco

@kubevirt-bot kubevirt-bot requested a review from sradco May 2, 2023 14:47
@EdDev
Copy link
Member

EdDev commented May 18, 2023

/cc

@kubevirt-bot kubevirt-bot requested a review from EdDev May 18, 2023 12:47
Copy link
Member

@EdDev EdDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice initiative, thanks!

@iholder101
Copy link
Contributor

Thanks @machadovilaca! Separating monitoring logic from business logic is very important! great to see this being pushed forward. 👍


I feel like further explanation is needed here. Specifically, I'd like to understand:
What is the difference between the current situation and what you're proposing and how does the new state helps to reach the goals you mentioned, specifically to help separate monitoring from business logic?

IOW, is this change trying to solely achieve a uniformity between the different repositories, or does it also actually help separate monitoring from business logic? I'm asking since even now we have a dedicated /monitoring directory, which sounds pretty similar to what you suggest here.


In addition, in the proposal I've created once, I've thought of this problem at the opposite direction. Instead aiming for every repo to align with a certain structure, I think it's reasonable for every repo to adopt some kind of a declarative configuration that would describe which fields need to be monitored and how. I'm not sure how feasible this direction really is, but my point is to remove the burden from the developers to automation as much as possible, especially because I think that a lot of repositories use monitoring pretty similarly (e.g. track X and Y fields, alert when X is bigger than 123, etc).

I'm not experienced enough with the monitoring field to know if that's feasible and how, but I would like to hear your perspective regarding that direction.

@iholder101
Copy link
Contributor

/cc

@machadovilaca
Copy link
Member Author

@iholder101 First of all thank you for your review.

And I agree with you. I think a declarative configuration for metrics and rules is a solution I would like to have one day. Unfortunately, for now, I think that goal would be way down the road since, in my opinion, we have increasingly more technical debt and it is becoming more difficult to update and 'evolve' monitoring features.

In this stage, my goal is to, as you said, align all repositories with a certain structure, have a package that all repositories import and use (reducing code duplication and making it easier to add new features for monitoring in the future), and cleanup business logic code.

This way, in the future, I think it would be easier to update the library package to parse and registry metrics and rules from declarative configurations, move the collection logic of external resources to external repositories, and even have an external controller/operator if needed/wanted.


Related to

IOW, is this change trying to solely achieve uniformity between the different repositories, or does it also actually help separate monitoring from business logic? I'm asking since even now we have a dedicated /monitoring directory, which sounds pretty similar to what you suggest here.

Yes, KubeVirt already has a monitoring directory, some other components like HCO, also have it. But I still think there is still a lot of work to be done. Besides the components creating and registering metrics in very different ways, we see that not all metrics are registered there (for example kubevirt_virt_controller_leading),. We see that if we want to follow Kubernetes Metrics Stability framework in kubevirt/kubevirt alone it would take a lot of manual repetitive work. We see that the metric documentation generator misses a lot of metrics.

I think this is hard work, but necessary to allow us to continuously improve our monitoring systems, and important if we ever want to reach some of the ideas you also had.

@enp0s3
Copy link
Contributor

enp0s3 commented May 24, 2023

/cc

@enp0s3
Copy link
Contributor

enp0s3 commented May 24, 2023

@machadovilaca @iholder101 Following the idea of a generic approach, I saw this initiative from Meta incubator.

@machadovilaca
Copy link
Member Author

@enp0s3 Didn't test it, but since we are working with operators and controller-runtime we have one limitation. controller-runtime creates a new Registry which usually does not work with these packages, the only workaround I was able to do was override the controller-runtime registry, but we would lose all default exported metrics. And go-belt is managing its own repository.

At the time, I found this issue investigating a library that, personally, I think was even more interesting than go-belt, Kubernetes component base for metrics, which is used by most of the Kubernetes components, but unfortunately has these compatibility issues with sigs.k8s.io/controller-runtime/pkg/metrics. They already have support for all metric types, stability levels, and also tracing and logs.

@iholder101
Copy link
Contributor

@iholder101 First of all thank you for your review.

And I agree with you. I think a declarative configuration for metrics and rules is a solution I would like to have one day. Unfortunately, for now, I think that goal would be way down the road since, in my opinion, we have increasingly more technical debt and it is becoming more difficult to update and 'evolve' monitoring features.

In this stage, my goal is to, as you said, align all repositories with a certain structure, have a package that all repositories import and use (reducing code duplication and making it easier to add new features for monitoring in the future), and cleanup business logic code.

This way, in the future, I think it would be easier to update the library package to parse and registry metrics and rules from declarative configurations, move the collection logic of external resources to external repositories, and even have an external controller/operator if needed/wanted.

Hey @machadovilaca!

Let's back down a bit and discuss what is the problem we really aim to solve here.

The basic problem here, as I see it, is that every change regarding monitoring needs to happen at many different places. For example, if another label needs to be added to a certain metric, there's a need to perform changes at many different repositories. This is both hard, takes a lot of time, requires going through code-reviews, costs CI resources and makes it mentally harder to decide to change anything.

The reality is that it's very hard maintaining monitoring, that's supposed to be aligned and uniform, between many different repositories. A side-effect of this approach is both that repositories diverge from one another over time, making it even harder to maintain as time goes by.

As I see it, this proposal suggests to invest a lot of effort in trying to solve the side-effect problem, by making conventions and linters that would try to reduce the divergence of repos from one another. My humble opinion is that we should invest, even if it would take longer, in a solution that solves the root cause of evil here, which is performing changes in many repos manually.

What concerns me the most in this approach:

  1. This effort is not negligible at all. It demands once again to perform changes at many different places, which obviously includes code reviews, CI resources, etc.
  2. I don't see how this can be enforced. Even with linters and conventions, it's likely that some of the repositories would diverge over time.
  3. And this is the most important bit, we don't solve the root issue. All of this effort is invested just to keep the problem of making PR to many different repos for every change, but make it a bit less severe.

I think that a better approach going forward is to introduce some kind of a framework / library that repositories can import and have under /vendor. This way, if something needs to change, all needs to be done is to make the changes in one repo, then only bump the version for many repos. Even if it's harder to accomplish and would take more time, it's at least directed at solving the root problem. This would make a huge difference, would ensure uniformity between all repos, save time (for example w.r.t. code-reviews), CI resources, and would make it mentally easier for you to decide about future changes.

WDYT?

@enp0s3
Copy link
Contributor

enp0s3 commented Jun 1, 2023

@machadovilaca @iholder101 More or less what I was thinking of:

  • Encapsulate the monitoring best-practice and the common patterns into a framework/library and put in in /vendor
  • De-couple the monitroing bits from the main flow and put them into a consolidated place in each project, put OWNERS file under the directory thus making the monitoring team the maintainers of the that piece of code.
  • Enforce stability by automation as much as possible, leaving the non-trivial parts for human review, thus implementing linters.
  • Document how to use the monitoring framework for adding new metrics and rules.
  • Ensure proper use of the framework by manual reviews

avlitman added a commit to avlitman/ssp-operator that referenced this pull request Feb 14, 2024
Following the work started in kubevirt/kubevirt#10044 , and according to
the kubevirt/community#219 proposal, this PR refactors monitoring
recording rules and alerts

Signed-off-by: avlitman <alitman@redhat.com>
assafad added a commit to assafad/kubevirt that referenced this pull request Mar 6, 2024
Apply monitoringlinter, which is designed to enforce
kubevirt/community#219 by ensuring that monitoring-related practices are
implemented within the pkg/monitoring directory using
operator-observability methods.

Signed-off-by: assafad <aadmi@redhat.com>
assafad added a commit to assafad/kubevirt that referenced this pull request Mar 6, 2024
Apply monitoringlinter, which is designed to enforce
kubevirt/community#219 by ensuring that monitoring-related practices are
implemented within the pkg/monitoring directory using
operator-observability methods.

Signed-off-by: assafad <aadmi@redhat.com>
assafad added a commit to assafad/kubevirt that referenced this pull request Mar 7, 2024
Apply monitoringlinter, which is designed to enforce
kubevirt/community#219 by ensuring that monitoring-related practices are
implemented within the pkg/monitoring directory using
operator-observability methods.

Signed-off-by: assafad <aadmi@redhat.com>
assafad added a commit to assafad/kubevirt that referenced this pull request Mar 7, 2024
Apply monitoringlinter, which is designed to enforce
kubevirt/community#219 by ensuring that monitoring-related practices are
implemented within the pkg/monitoring directory using
operator-observability methods.

Signed-off-by: assafad <aadmi@redhat.com>
assafad added a commit to assafad/kubevirt that referenced this pull request Mar 7, 2024
Apply monitoringlinter, which is designed to enforce
kubevirt/community#219 by ensuring that monitoring-related practices are
implemented within the pkg/monitoring directory using
operator-observability methods.

Signed-off-by: assafad <aadmi@redhat.com>
assafad added a commit to assafad/kubevirt that referenced this pull request Mar 7, 2024
Apply monitoringlinter, which is designed to enforce
kubevirt/community#219 by ensuring that monitoring-related practices are
implemented within the pkg/monitoring directory using
operator-observability methods.

Signed-off-by: assafad <aadmi@redhat.com>
assafad added a commit to assafad/kubevirt that referenced this pull request Mar 7, 2024
Apply monitoringlinter, which is designed to enforce
kubevirt/community#219 by ensuring that monitoring-related practices are
implemented within the pkg/monitoring directory using
operator-observability methods.

Signed-off-by: assafad <aadmi@redhat.com>
assafad added a commit to assafad/kubevirt that referenced this pull request Mar 7, 2024
Apply monitoringlinter, which is designed to enforce
kubevirt/community#219 by ensuring that monitoring-related practices are
implemented within the pkg/monitoring directory using
operator-observability methods.

Signed-off-by: assafad <aadmi@redhat.com>
assafad added a commit to assafad/kubevirt that referenced this pull request Mar 10, 2024
Apply monitoringlinter, which is designed to enforce
kubevirt/community#219 by ensuring that monitoring-related practices are
implemented within the pkg/monitoring directory using
operator-observability methods.

Signed-off-by: assafad <aadmi@redhat.com>
assafad added a commit to assafad/cluster-network-addons-operator that referenced this pull request Mar 17, 2024
Apply monitoringlinter, which is designed to enforce
kubevirt/community#219 by ensuring that monitoring-related practices are
implemented within the pkg/monitoring directory using
operator-observability methods.

Signed-off-by: assafad <aadmi@redhat.com>
assafad added a commit to assafad/ssp-operator that referenced this pull request Mar 19, 2024
Apply monitoringlinter, which is designed to enforce
kubevirt/community#219 by ensuring that monitoring-related practices are
implemented within the pkg/monitoring directory using
operator-observability methods.

Signed-off-by: assafad <aadmi@redhat.com>
assafad added a commit to assafad/ssp-operator that referenced this pull request Mar 19, 2024
Apply monitoringlinter, which is designed to enforce
kubevirt/community#219 by ensuring that monitoring-related practices are
implemented within the pkg/monitoring directory using
operator-observability methods.

Signed-off-by: assafad <aadmi@redhat.com>
assafad added a commit to assafad/ssp-operator that referenced this pull request Mar 19, 2024
Apply monitoringlinter, which is designed to enforce
kubevirt/community#219 by ensuring that monitoring-related practices are
implemented within the pkg/monitoring directory using
operator-observability methods.

Signed-off-by: assafad <aadmi@redhat.com>
assafad added a commit to assafad/ssp-operator that referenced this pull request Mar 19, 2024
Apply monitoringlinter, which is designed to enforce
kubevirt/community#219 by ensuring that monitoring-related practices are
implemented within the pkg/monitoring directory using
operator-observability methods.

Signed-off-by: assafad <aadmi@redhat.com>
assafad added a commit to assafad/cluster-network-addons-operator that referenced this pull request Mar 20, 2024
Apply monitoringlinter, which is designed to enforce
kubevirt/community#219 by ensuring that monitoring-related practices are
implemented within the pkg/monitoring directory using
operator-observability methods.

Signed-off-by: assafad <aadmi@redhat.com>
assafad added a commit to assafad/cluster-network-addons-operator that referenced this pull request Mar 20, 2024
Apply monitoringlinter, which is designed to enforce
kubevirt/community#219 by ensuring that monitoring-related practices are
implemented within the pkg/monitoring directory using
operator-observability methods.

Signed-off-by: assafad <aadmi@redhat.com>
assafad added a commit to assafad/hostpath-provisioner-operator that referenced this pull request Mar 20, 2024
Apply monitoringlinter, which is designed to enforce
kubevirt/community#219 by ensuring that monitoring-related practices are
implemented within the pkg/monitoring directory using
operator-observability methods.

Signed-off-by: assafad <aadmi@redhat.com>
kubevirt-bot pushed a commit to kubevirt/cluster-network-addons-operator that referenced this pull request Mar 25, 2024
Apply monitoringlinter, which is designed to enforce
kubevirt/community#219 by ensuring that monitoring-related practices are
implemented within the pkg/monitoring directory using
operator-observability methods.

Signed-off-by: assafad <aadmi@redhat.com>
assafad added a commit to assafad/hostpath-provisioner-operator that referenced this pull request Mar 25, 2024
Apply monitoringlinter, which is designed to enforce
kubevirt/community#219 by ensuring that monitoring-related practices are
implemented within the pkg/monitoring directory using
operator-observability methods.

Signed-off-by: assafad <aadmi@redhat.com>
kubevirt-bot pushed a commit to kubevirt/hostpath-provisioner-operator that referenced this pull request Mar 25, 2024
* Apply monitoringlinter

Apply monitoringlinter, which is designed to enforce
kubevirt/community#219 by ensuring that monitoring-related practices are
implemented within the pkg/monitoring directory using
operator-observability methods.

Signed-off-by: assafad <aadmi@redhat.com>

* Apply operator-observability rules linter

Apply operator-observability rules linter which ensures that kubevirt
alerts and recording rules definitions are following the monitoring best
practices.

Signed-off-by: assafad <aadmi@redhat.com>

---------

Signed-off-by: assafad <aadmi@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. size/M
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants