Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove metrics global registry #210

Open
lilic opened this issue Nov 13, 2018 · 10 comments
Open

Remove metrics global registry #210

lilic opened this issue Nov 13, 2018 · 10 comments
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/backlog Higher priority than priority/awaiting-more-evidence.
Milestone

Comments

@lilic
Copy link

lilic commented Nov 13, 2018

I would like to suggest to remove the global registry as currently the controller-runtime global registry does not capture metrics registered onto prometheus.DefaultRegisterer. As a step forward and in line with the prosed KEP, the controller-runtime should be able to have the registry injected. (for a start with prometheus.DefaultRegisterer, with the long term goal of removing the global registry in the first place, I am open for suggestions on that.)

As per the proposal kubernetes/community#2909:

Kubernetes also makes extensive use of a global metrics registry to register metrics to be exposed. Aside from general shortcomings of global variables, Kubernetes is seeing actual effects of this, causing a number of components to use sync.Once or other mechanisms to ensure to not panic, when registering metrics. Instead a metrics registry should be passed to each component in order to explicitly register metrics instead of through init methods or other global, non-obvious executions.

Lastly, this may be my own opinion, but introducing globals as a library hides its dependencies, therefore I think having the registry be injected makes the API more clear.

@DirectXMan12
Copy link
Contributor

Disclaimer: the below represents my current thoughts, but I'm generally open to being convinced otherwise :-)

The original goal was to make registering metrics "easy", while also not necessarily polluting the default Prometheus registry. IMO, this is a pattern that the default Prometheus docs get right, especially for smaller projects -- since the metrics registry is in a global, there's very little code or cognitive overhead in registering a new metric -- you just declare it at the top of your file, and start using it. Fundamentally, most programmers are lazy, and so the easier we can make it to add metrics, the more likely people are to do it. I don't want to give people excuses to put off writing metrics until later :-). Overhead leads to all sorts of excuses -- "oh, we can't put metrics here without refactoring to plumb through the registry", "oh we can't put metrics here because this is a free-standing function and not a method", etc. You might still get metrics, but you'll miss out on some that people didn't think were worth the effort.

With that motivation in mind, I think our current setup is a decent compromise -- it doesn't use the default registry -- it's actually separate, so you can choose to use it or not use it. However, you still get a similar feel: instead of just importing prometheus you have to import prometheus and metrics, but you get a similar feel -- declaring metrics.Registry.MustRegister at the top of your file, and then using the metrics as normal.

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 26, 2019
@fejta-bot
Copy link

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 27, 2019
@lilic
Copy link
Author

lilic commented May 27, 2019

/remove-lifecycle rotten

Personally, still think we should move away from it :)

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label May 27, 2019
@DirectXMan12
Copy link
Contributor

yeah, I think this still bears further discussion. Likely what'll happen is that we'll eagerly go the direction that k/k goes -- if it looks like k/k is switching to opencensus/opentelemetry, we'll head in that direction eagerly (I expect we'll need to wait for opentelemetry's reference Go client in september or so, unless the opencensus --> opentelemetry switch is easy). At any rate, I expect when we switch we'll want to go in the direction of something like opentelemetry to make plumbing thing through easier.

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 2, 2019
@DirectXMan12
Copy link
Contributor

/lifecycle frozen

still want to do this

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 3, 2019
DirectXMan12 pushed a commit that referenced this issue Jan 31, 2020
Update contributing guide to build successfully
@vincepri vincepri added this to the Next milestone Feb 20, 2020
@vincepri
Copy link
Member

Might be related / superseded by #305

/priority backlog

@k8s-ci-robot k8s-ci-robot added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Feb 20, 2020
@alvaroaleman
Copy link
Member

A backwards-compatible approach to capture metrics from both controller-runtimes registry and prometheus default registry would be to combine them via the prometheus Gatherers

@ash2k
Copy link
Member

ash2k commented Jul 24, 2020

It seems Kubernetes is removing global metrics, if I'm reading the KEP correctly https://github.com/kubernetes/enhancements/blob/master/keps/sig-instrumentation/20191028-metrics-stability-to-beta.md - first goal:

No Kubernetes binaries register metrics to prometheus registries directly.

As a user of any library that has global state, I can say that I always wish it didn't have it.

While having global "things" makes it "easy" to start using them (e.g. just import and register a metric!), medium and long term this approach from my personal experience doesn't actually make things simple, quite the opposite. Global loggers, registries, http muxes, default clients, etc all make libraries harder to compose, understand and test. It's very unexpected that things start happening when you just import some package to use a constant, function or type from it. There are a few examples of this being an issue in kubernetes alone - forked glog to remove global flag registration, now removing global metrics registrie. Also related - json-iterator/go#265 - global state makes things fragile (found while hacking on Kubernetes).

Just my $0.02, sorry for the rant :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/backlog Higher priority than priority/awaiting-more-evidence.
Projects
None yet
Development

No branches or pull requests

7 participants