-
Notifications
You must be signed in to change notification settings - Fork 993
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prometheus ClassCastException while scraping (Micrometer 1.13 / SB 3.3) #5150
Comments
Thanks for the issue and thanks Andy for the reproducer! WorkaroundIt seems this is only an issue with the Prometheus 1.x Client, using the 0.x Client does not seem to have this issue. As a workaround you can downgrade to the 0.x client by using Investigation NotesHere's another reproducer also proving that the 0.x Client does not have this issue (commented out): io.micrometer.prometheusmetrics.PrometheusMeterRegistry registry = new io.micrometer.prometheusmetrics.PrometheusMeterRegistry(io.micrometer.prometheusmetrics.PrometheusConfig.DEFAULT);
// io.micrometer.prometheus.PrometheusMeterRegistry registry = new io.micrometer.prometheus.PrometheusMeterRegistry(io.micrometer.prometheus.PrometheusConfig.DEFAULT);
Timer.builder("test").tag("index", "1").publishPercentileHistogram(false).register(registry).record(Duration.ofMillis(100));
Timer.builder("test").tag("index", "2").publishPercentileHistogram( true).register(registry).record(Duration.ofMillis(100));
System.out.println(registry.scrape()); Doing such thing seems quite weird to me, it produces an output like this:
This means that Let me reach out to the Prometheus Client maintainer but right now I feel the output should not be considered as valid and this should be fix in your code by
|
Thanks for investigation; I've applied your mentioned workaround with adding histogram to all those cases. Example / Context / Reasoning behind this: We use |
If you are using a framework like Spring Boot, Quarkus, Micronaut, etc, those controllers are already instrumented so you don't need to re-instrument them with Btw, you can apply |
After talking to the maintainer of the Prometheus Client, there is one more workaround you can do right now: in Prometheus the histogram is valid if it has a single bucket: PrometheusMeterRegistry registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);
Timer.builder("test").tag("index", "1")
.serviceLevelObjectives(Duration.ofNanos(1)) // or you can use any meaningful value, like 100 ms
.register(registry)
.record(Duration.ofMillis(100));
Timer.builder("test").tag("index", "2")
.publishPercentileHistogram( )
.register(registry)
.record(Duration.ofMillis(100));
System.out.println(registry.scrape()); Doing this means that sometimes you will have all the buckets and sometimes you will have only two buckets (not sure if it is good enough in your case). Unfortunately I think I still not recommend doing the above since the buckets are different and because of that they are not aggregatable which kind of defeats the purpose of the histogram and I would still go with having summary and histogram with different names but since you are already doing this, you probably already aware of the consequences. With this, let me close this issue, please let us know if these don't help and we can reopen. |
Hello, the issue exists if the 'publishHistogram()' can not be used. The same ClassCastException occur if a histogram with defined ranges is used:
From my point of view this is a breaking change as Meter with different tags are a normal case to differentiate results in Grafana. Such workarounds does not help. A solution without ClassCastExceptions and at least error reporting of problematic meters is required. The current implementation does not report ANY metric back to Prometheus. Please recheck the decision to close the issue. |
Thanks for raising this issue. This is a limitation of the Prometheus and the OpenMetrics formats, I wrote this down in the Migration Guide. Micrometer using the old Prometheus Java Client (0.x) ( Since the new Prometheus Java Client (1.x) makes it impossible to bring back the old behavior which I'm not sure we should (it was invalid), I'm not sure what should we do on Micrometer side. Please let us know if you have any idea. Also, please notice that if any changes mentioned in the Migration Guide impacts you negatively and you don't want to migrate right now, you can still use the new Prometheus Java Client (0.x) with Micrometer, you "just" need to change the dependency from
Please see the Migration Guide for more details. |
Thanks for your quick response @jonatan-ivanov, I'm a colleague of @opendevrunner. The problem is not really the case where the client produces invalid output, more the case where types are mixed. One use case we have is basically a generic performance measurement library that's using the same name (but different tags) for the different measured operations. In addition, the different operations can be configured independently with for additional SLO or percentiles or no additional details at all, whatever is needed or makes sense for the specific operation. There is no need to merge histograms or so for different operations, they're completely independent and just share the same name. Operations are grouped and there is a generic board in Grafana showing the data for all operations with grouping and filtering. There are different panels for the optional SLOs etc. - some of them stay empty if not reported - but that's ok. Using different names for different types would break / complicate this approach. |
That's what I was referring as "invalid output": Micrometer mixes the
According to the Prometheus and the OpenMetrics specifications, this is invalid.
If they share the same name they are not independent, they belong to the same family so they must have the same type. If I understand correctly, this is what you want:
If so, please notice that:
If not, please let us know what you would like the Prometheus output to look like. I think quantiles are not allowed on histograms since if you have a histogram, you can calculate the quantiles on the backend. Except in your case, since you use the histogram for SLOs, so you only have a few buckets and this calculation does not make sense since it would be inaccurate (you need more buckets for this: |
Describe the bug
Multiple Methods annotated with
@Timed
(or creating a Timer by hand) with the samevalue
to group them together throws aClassCastException
oncehistogram
is enabled on some - but not on all.Environment
To Reproduce
Smallest example by Andy Wilkinson (wilkinsona) (don't wanna spam tag him)
Example with Spring involved: spring-projects/spring-boot#40907 (comment)
Expected behavior
No exception; using Timer with and without histogram should work.
Additional context
Originally reported here: spring-projects/spring-boot#40907
The text was updated successfully, but these errors were encountered: