Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a method to clear the registry #184

Closed

Conversation

matthieuprat
Copy link
Contributor

Clearing the registry is something we would typically do in development
and test environments.

For instance, in Rails apps, this method can be used when reloading
classes so that auto-loaded files that register Prometheus metrics can
be reloaded:

Rails.application.reloader.after_class_unload do
  Prometheus::Client.registry.clear
end

Not clearing the registry before classes are reloaded would result in
the registry raising an AlreadyRegisteredError when a file
registering a metric is reloaded.

Signed-off-by: Matthieu Prat matthieuprat@gocardless.com

Clearing the registry is something we would typically do in development
and test environments.

For instance, in Rails apps, this method can be used when reloading
classes so that auto-loaded files that register Prometheus metrics can
be reloaded:

```rb
Rails.application.reloader.after_class_unload do
  Prometheus::Client.registry.clear
end
```

Not clearing the registry before classes are reloaded would result in
the registry raising an `AlreadyRegisteredError` when a file
registering a metric is reloaded.

Signed-off-by: Matthieu Prat <matthieuprat@gocardless.com>
@coveralls
Copy link

Coverage Status

Coverage remained the same at 100.0% when pulling 7999d5d on matthieuprat:mprat/registry-clear into acd1466 on prometheus:master.

Copy link

@nickcampbell18 nickcampbell18 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is clear the right word here, or maybe reset?

Copy link
Collaborator

@dmagliola dmagliola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder where this belongs...
For example, if you look at client.rb...

    def self.registry
      @registry ||= Registry.new
    end

It feels to me that you want to init a new Registry, rather than take the old registry and clear it...
Because this is not a thing you'd normally do to an actual Registry, it's more of a "global config" thing...

To be honest, I thought there was also a self.registry= in there, that'd allow you to swap the "default registry". I'm not sure that's the answer either, as having that setter might be confusing as to what it's there for.

But I wonder whether you have a Prometheus::Client.reset_registry method that just @registry = Registry.new.

NOTE: A large part of the reason i'm hesitating about this is that I don't see a clear precedent in any of the other clients (Python and Go, specifically, which are what I consider the most "canonical" one), so i'm a bit unsure whether this is the right thing to do. I think what we're doing here is a very Ruby thing, though.

A naive question... On that after_unload code you showed... (I have no idea how the autoloader works)
When that runs, did all of the code get reloaded? Is there any chance some classes reloaded and some didn't (for example, if it only reloads changed files, which is what I thought Rails Server did)

It maybe isn't a problem, because the classes that didn't get reloaded are pointing to still valid instances of Metric, they're just not in the default registry / in a registry anymore, which you only care about when scraping, but it feels a bit weird to me to leave those kinda dangling. Again, maybe it doesn't matter because "it's just test".

Finally, could we add a nice comment in the code above the method (whichever we go with) explaining what the method is for, and a section in the README going "if you have this problem when testing, this snippet of code is how you could solve it".

@@ -83,6 +83,10 @@ def get(name)
def metrics
@metrics.values
end

def clear
@metrics.clear
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mutex.synchronize please :). (if we're going with this code, which I'm not sure we are, so maybe ignore me for now)

And come to think of it... Can we also please synchronize the 3 methods above? I'm not entirely sure how I didn't see that! 🙈

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up: #201

@matthieuprat
Copy link
Contributor Author

matthieuprat commented Apr 9, 2020

But I wonder whether you have a Prometheus::Client.reset_registry method that just @registry = Registry.new

I'd be fine with that!

A large part of the reason i'm hesitating about this is that I don't see a clear precedent in any of the other clients

That sounds like a valid concern to me. To be fair, we could work around the lack of a way to clear the registry by keeping track of all the metrics we've registered and un-registering them manually in userland. Feels a bit tedious though.

When that runs, did all of the code get reloaded? Is there any chance some classes reloaded and some didn't (for example, if it only reloads changed files, which is what I thought Rails Server did)

Yes, there's a chance.

To clarify, when a reload is needed, Rails merely unloads all autoloaded classes. (An important thing to note here is that Rails doesn't only unload classes that changed but all of them. See Constant Reloading in the Rails docs.)

Then, classes are autoloaded as they normally would — that is, when they are referenced.

it feels a bit weird to me to leave those kinda dangling

I had a similar concern and figured that the registry exposed a method to unregister a metric — so dangling metrics are already a thing, right?

Finally, could we add a nice comment in the code above the method (whichever we go with) explaining what the method is for, and a section in the README going "if you have this problem when testing, this snippet of code is how you could solve it".

Happy to once we've agreed on the approach 😃

@dmagliola
Copy link
Collaborator

we could work around the lack of a way to clear the registry by keeping track of all the metrics we've registered and un-registering them manually

Yeah, no, that sucks as an experience.

Let's do a Client.reset_registry method that just makes a new registry, document it with a code comment above it saying this is intended to be used for use of testing and not in production, and a nice section in README.md on how to test your code that has metrics by doing this.

And I hope i'm not missing any weird edge cases where this is a terrible idea 😅

@matthieuprat
Copy link
Contributor Author

Sounds good! I'll follow up with another PR.

@matthieuprat matthieuprat deleted the mprat/registry-clear branch April 15, 2020 08:59
@jcoconnor
Copy link

@matthieuprat - Did this ever get followed up - I could say asking for a friend, but think I'm hitting an issue with this in my test specs and would be really useful to have a reset_registry method.

@matthieuprat
Copy link
Contributor Author

matthieuprat commented Aug 8, 2020

@jcoconnor Not yet, but planning to action this in the next couple of weeks. (Unless you'd be keen to give it a go?)

@jcoconnor
Copy link

Thanks @matthieuprat - probably won't get it anytime soon at my end.

matthieuprat added a commit to matthieuprat/prometheus-ruby-client that referenced this pull request Aug 25, 2020
The registry is backed by a Hash, which is not guaranteed to be thread
safe on all interpreters. For piece of mind, this change synchronizes
all accesses to the metrics hash.

Another option would have been to use a [thread-safe Hash][1] instead of
a Hash but this would have meant adding Ruby Concurrent as a
dependency, which I'm assuming we don't want.

Ref: prometheus#184 (comment)

[1]: https://github.com/ruby-concurrency/concurrent-ruby/blob/v1.1.7/lib/concurrent-ruby/concurrent/hash.rb
matthieuprat added a commit to matthieuprat/prometheus-ruby-client that referenced this pull request Aug 25, 2020
The registry is backed by a Hash, which is not guaranteed to be thread
safe on all interpreters. For piece of mind, this change synchronizes
all accesses to the metrics hash.

Another option would have been to use a [thread-safe Hash][1] instead of
a Hash but this would have meant adding Ruby Concurrent as a
dependency, which I'm assuming we don't want.

Ref: prometheus#184 (comment)

[1]: https://github.com/ruby-concurrency/concurrent-ruby/blob/v1.1.7/lib/concurrent-ruby/concurrent/hash.rb

Signed-off-by: Matthieu Prat <matthieuprat@gocardless.com>
matthieuprat added a commit to matthieuprat/prometheus-ruby-client that referenced this pull request Aug 25, 2020
The registry is backed by a Hash, which is not guaranteed to be thread
safe on all interpreters. For peace of mind, this change synchronizes
all accesses to the metrics hash.

Another option would have been to use a [thread-safe Hash][1] instead of
a Hash but this would have meant adding Ruby Concurrent as a
dependency, which I'm assuming we don't want.

Ref: prometheus#184 (comment)

[1]: https://github.com/ruby-concurrency/concurrent-ruby/blob/v1.1.7/lib/concurrent-ruby/concurrent/hash.rb

Signed-off-by: Matthieu Prat <matthieuprat@gocardless.com>
@matthieuprat
Copy link
Contributor Author

Sorry for the late follow-up.

I've taken a closer look today and the registry of the Java client library actually has a clear method.

@dmagliola Should we stick with what we originally discussed (Client.reset_registry) or go with Registry#clear?

@dmagliola
Copy link
Collaborator

Interesting. The Java library also has unregister. I haven't found any docs with examples on when you should use those, so i'm not sure whether they're meant for testing or for regular use.

I also found this, which is slightly off-topic, but there's at least one user out there clearing their registry in a way that sounds frustrating: https://stackoverflow.com/questions/58688050/handling-stale-data-in-prometheus-gauges

Let's follow Java, and have the clear method, and document in the README that you should probably only be using that for tests.

I'm still not too sure about that after_class_unload code, it feels like a potential footgun, but I can see the "reload" issue. This happens in dev rather than test, right? For example when testing stuff in a Rails console?

@matthieuprat
Copy link
Contributor Author

This can happen anytime classes are reloaded. Here are some examples:

  • When running reload! in a Rails console.
  • When running a development server and changing an autoloaded file (e.g., a model).
  • When running tests with Spring and changing an autoloaded file.

To give you a concrete example, let's imagine you have a controller which registers a metric:

class PingController < ApplicationController
  PingCounter = Prometheus::Client.registry.counter(:pings_total)
end

If you have Spring enabled, run the PingController tests, change any file that was autoloaded, and run the tests again, you'll get the following error:

Prometheus::Client::Registry::AlreadyRegisteredError:
  pings_total has already been registered

This is because when you run the tests again, Rails detected that you changed a file and that it needs to reload all autoloaded classes (PingController being on of them). To do that, it unloads all autoloaded classes and loads them again.

The idea behind the after_class_unload "trick" is to clear metrics before classes are loaded again.

@matthieuprat
Copy link
Contributor Author

I found a similar use case here: https://github.com/tansengming/stripe-rails/pull/137/files

Although, I'm not saying the Prometheus client should necessarily do that. This PR is merely about allowing people to clear the registry if they want to, not clearing it for them automagically.

dmagliola pushed a commit that referenced this pull request Jun 19, 2021
The registry is backed by a Hash, which is not guaranteed to be thread
safe on all interpreters. For peace of mind, this change synchronizes
all accesses to the metrics hash.

Another option would have been to use a [thread-safe Hash][1] instead of
a Hash but this would have meant adding Ruby Concurrent as a
dependency, which I'm assuming we don't want.

Ref: #184 (comment)

[1]: https://github.com/ruby-concurrency/concurrent-ruby/blob/v1.1.7/lib/concurrent-ruby/concurrent/hash.rb

Signed-off-by: Matthieu Prat <matthieuprat@gocardless.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants