Better control over Prometheus metrics #4016
I'm not sure about that. Also, please remember that what's important to think about is how it's configured in JSON, before even thinking of how it'll look in the Caddyfile. The Caddyfile is just an adapter to JSON config. One thought though is maybe an API for the Caddyfile like this:
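A hypothetical shape, mirroring the JSON handler below; the directive and option names here are illustrative, not an actual Caddy API:

```
metric {
	tags {
		host {http.request.host}
		content-type {http.response.header.Content-Type}
	}
}
```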
This would imply having a JSON handler like this:

```json
{
	"handler": "metric",
	"tags": {
		"host": "{http.request.host}",
		"content-type": "{http.response.header.Content-Type}"
	},
	"disable": false
}
```

And I figure this would have to be a handler that defers execution until the middleware chain is coming back downstream, if we plan to support reading from the response headers. /cc @hairyhenderson |
@francislavoie, it's OK, because Caddy already sends the response code to metrics. |
I haven't had time to think too deeply about this yet, but in general all labels should be set on all endpoints. That is to say, if a label is added for one endpoint, it should be present on every endpoint. Another way to say that is: no per-matcher/per-path metrics options; metric configuration should apply globally. The one potential exception is that we may want some way to ignore certain endpoints, but that doesn't need to be part of this design. I don't think it's necessary to have a new directive - we should be able to use the existing `metrics` directive. And I like the idea that @francislavoie implied of having custom labels that simply use a placeholder (or even a static value) as the value; I could see a directive like this:
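For concreteness, a hypothetical rendering of such a directive, showing one placeholder-valued label and one static-valued label (the syntax is invented for illustration):

```
metrics {
	label host {http.request.host}
	label version "v2"
}
```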
I'm not at all familiar with the placeholder code but I think this would be relatively straightforward to do. I'll see if I can find some time in the next few weeks to work on this, but I'd be very open to contributions for a design like this 😅 |
If that's the case, then the existing `metrics` directive seems like the right place for this. |
I spoke with @hairyhenderson and ultimately I don't think this'll be possible in any way that doesn't have drastic downsides.

The trouble is that metrics are configured at process startup, and can't be safely reconfigured, because the Prometheus metrics lib aims to remain consistent for the lifetime of the process. If a reconfiguration were to happen, the entire metrics storage would need to be thrown away and reinitialized. That's not OK, because certain counters need to remain since they track other global state. Caddy is designed to have reloadable config without restarting the process, so this is at odds with that idea.

Theoretically we could defer initialization of the metrics until after the config is initially loaded, but this would mean that any subsequent config change that attempts to change the metrics config would have to be rejected. Not ideal; it breaks certain assumptions made in Caddy's architecture.

So to avoid opening the possibility for reloads to fail, any configuration for metrics would have to be provided separately, specifically at process startup. This might mean having a separate config mechanism outside of the regular configs (e.g. environment variables, or new CLI args).

So... what do you think @mholt? I don't see a happy path here. |
I'm not at all familiar with the metrics lib used, but would it be possible to contribute a patch to the lib that allows re-configuration at runtime? |
I don't think it would be easy. It seems fundamental to how it's designed. The lib in question is https://github.com/prometheus/client_golang/tree/master/prometheus. I know that neither @hairyhenderson nor I are interested in working on that, so someone else would need to take it on if they're up for it. |
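To make that constraint concrete, a minimal sketch against client_golang's default registry (the metric name is invented for this example): registering the same name again with a different label set is rejected, which is why changing label config mid-process would mean discarding the whole registry.

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	// Register a plain counter with no labels.
	c := prometheus.NewCounter(prometheus.CounterOpts{
		Name: "caddy_example_requests_total", // hypothetical name
		Help: "Example request counter.",
	})
	prometheus.MustRegister(c)

	// Attempting to register the same name with a different label set
	// fails: the registry enforces a consistent shape for the life of
	// the process.
	vec := prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "caddy_example_requests_total",
		Help: "Example request counter.",
	}, []string{"host"})
	if err := prometheus.Register(vec); err != nil {
		fmt.Println("rejected:", err)
	}
}
```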
Just to add a bit more clarity on the issue - when exposing metrics with Prometheus, the expectation is that all metrics will remain consistent for the lifetime of the process, both in terms of value (a counter that starts at zero must only ever count up until the process exits) and in terms of shape (the set of labels on a series can't change).

Another thing to consider is that Caddy exposes metrics on the admin API, and some of the metrics served there are important for monitoring Caddy itself.

Bottom line: we can't move forward with any customization of the metrics series (i.e. labels) until we can find a suitable config mechanism that doesn't require resetting at config reload time. I hope @mholt has some ideas on that. |
That shouldn't be a problem; Caddy has a mechanism for state that persists across config reloads. |
@mholt I think you misunderstood, metrics are already global. The problem is that the metrics cannot ever change after the first load. |
@francislavoie What does their scope have to do with whether they change or not? |
Because the only way to change them involves resetting the metrics storage. As written above, that's a deal-breaker: we can't reset the counters during config reloads, because many of them need to stick around since they track global state. The way Prometheus metrics work is fundamentally incompatible with config reloads. |
That sounds like a pretty severe limitation in Prometheus, which is surprising. I don't know how I could be of help here, as this is a bug in Prometheus (which I'm not familiar with), not Caddy. |
It's not a "bug" though, it's an assumption that underlies the metrics tracking system. The question is how to reconcile metrics that must stay consistent for the life of the process with configs that can change at any time. |
It's a broken assumption then. Servers change their configs, it's as simple as that. Prometheus should know that by now. |
We can probably use https://github.com/VictoriaMetrics/metrics, which supports creating metrics on the fly. I believe what you want is its get-or-create API, which lets new series be registered at any time. It's also a nicer alternative to the official client lib, with fewer dependencies. |
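A minimal sketch of that get-or-create style, assuming the VictoriaMetrics/metrics API (`GetOrCreateCounter`, `WritePrometheus`); the metric and label names are invented:

```go
package main

import (
	"fmt"
	"net/http"

	"github.com/VictoriaMetrics/metrics"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Labels are encoded directly in the metric name; the series is
		// created on first use and reused afterwards. Note that a
		// user-controlled value like Host can explode cardinality.
		name := fmt.Sprintf(`caddy_requests_total{host=%q}`, r.Host)
		metrics.GetOrCreateCounter(name).Inc()
		fmt.Fprintln(w, "ok")
	})
	http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		metrics.WritePrometheus(w, true) // Prometheus text exposition format
	})
	http.ListenAndServe(":8080", nil)
}
```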
@mr-karan I'm not sure that's exactly what we need. What's lacking is not the ability to create metrics "on the fly" (which the Prometheus client can obviously also do), but instead the ability to modify metrics after being created. If that's possible with VictoriaMetrics, then that indicates a bug in that client, not a feature. Metrics need to be guaranteed to stay consistent through the life of the process. |
To demonstrate with an example:
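For instance, a hypothetical exposition line (metric and label values invented):

```
caddy_http_request_duration_seconds_count{code="200"} 42
```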
You want this to be modified to something else after the reload, like:
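Hypothetically, after the reload, with a new label added:

```
caddy_http_request_duration_seconds_count{code="200",host="example.com"} 42
```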
Is my understanding correct? |
Leaving this here in case it helps - if you use Telegraf to scrape metrics, you can easily rename the tag values: manojm321/ansible-synology@83ca673 |
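For context, a hedged sketch of what that can look like with Telegraf's regex processor (the tag name and pattern are invented; see the linked commit for the real configuration):

```toml
[[processors.regex]]
  [[processors.regex.tags]]
    key = "host"            # tag whose values to rewrite (assumed name)
    pattern = '^(.*):\d+$'  # e.g. strip a port suffix from the value
    replacement = "${1}"
```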
This issue doesn't seem to be getting traction. It would be nice to have the labels capability. |
On one hand, without labels, Prometheus metrics make barely more sense than measuring a nationwide average body temperature. On the other hand, too many labels can be a source of unbounded high cardinality, which causes issues when scraping with Prometheus over long periods of time and in larger deployments.
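To put rough numbers on that (figures invented for illustration): labels for 500 hosts, 40 content types, and 10 status codes multiply out to 500 × 40 × 10 = 200,000 series for a single metric, and a histogram multiplies that again by its bucket count.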
In the use case we had, we needed per-host metrics (for an obvious reason: different hosts often have very different performance) and also a per-content-type metric. Here's why:
Dynamically generated resources usually have a much bigger TTFB, and mixing them with static files obscures actual peaks in the data, because each dynamically generated HTML document also pulls in 10 to 100 static files. Content-Type is a good heuristic for dynamic content: HTML is usually dynamically generated, while everything else is static in most cases.
Whether the HTML is dynamic or static, its performance is more critical to overall page load time: it's the root document for the page, so each millisecond of its load time is added directly to the overall page load time, which is not always the case for page resources.
We intended to create two extra Grafana panels, to see the metric subsets for dynamic documents and for static resources separately.
Also, it might be useful to separate metrics by document cache status when Caddy's cache is enabled.
Based on the discussion in this ticket: #3784 and in this PR: #4015, we came up with the following proposal:

- We can't reuse the `metrics` directive, because it defines the virtual host that serves metrics when defined inside a specific host.
- Use `metrics_options` as the new directive name; does anyone have better ideas, maybe? (A hypothetical sketch of the directive follows.)
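For concreteness, a hypothetical sketch of the proposed directive (the option syntax is invented; only the `metrics_options` name comes from the proposal):

```
metrics_options {
	label host {http.request.host}
	label content_type {http.response.header.Content-Type}
}
```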