-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow caching regular expression matching in rules #518
Allow caching regular expression matching in rules #518
Conversation
46dc03f
to
427dcc8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a quick peek, this has a memory leak as you never remove old cache entries.
Having the setting apply to the whole config also means that a single string mbean that you want to work with would prevent you from being able to use this, per-rule would make more sense.
04db9ae
to
02c4605
Compare
Good catch, I've added some logic to only retain beans that have just been scraped in the cache. Regarding the per-rule caching, I think it makes sense to try to have per-rule caching. However I'm not able to find a decent way to support all of: per-rule caching, caching "unmatched rules" and allowing bean value in patterns. If we allow patterns with the bean value in them, caching a miss for a given value will lead to false negatives; on the other hand, not caching unmatched rules will limit the performance of the cache. I could try to "detect" rules that have a bean value in them and treat them differently, but that seems really prone to errors too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not caching unmatched rules will limit the performance of the cache.
This is what I'd suggest. If it matters to a user, the impact can be minimised by having those rules last.
19ce83f
to
73ea1a6
Compare
Are you suggesting having a catch-all rule at the end of the list of rules, that would effectively cache unmatched bean names (at the expense of actually exposing them)? If so, that sounds reasonable to me. If not, happy to hear more about the idea! In the meantime, I had switched to having a per-rule flag to cache regular expression matching. The code looks quite different from before (with the introduction of Commit message
|
I'm not, that'd be potentially incorrect. I'm proposing that the setting be per-rule, and if this is a concern for the user that they put the non-cached rules last so that they're not applied to everything that has already been matched. There's also already a way to halt processing of a given attribute. |
d4abe90
to
648c7b3
Compare
Gotcha. I've had another go at this, and removed the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep in mind that scrapes can be concurrent.
collector/src/main/java/io/prometheus/jmx/MatchedRulesCache.java
Outdated
Show resolved
Hide resolved
collector/src/main/java/io/prometheus/jmx/MatchedRulesCache.java
Outdated
Show resolved
Hide resolved
This improves performance for some use cases. The exporter is configured with a set of rules to match regular expressions against bean names. Regular expressions can be CPU-intensive, and matching the same bean names over the same patterns is time-consuming when there are tens of thousands of bean names (e.g: exposed by Kafka). Using a single `ConcurrentHashMap` to store the result of pattern matching here. The `MatchedRule` class was introduced here to "store" this result. Each rule in the configuration leads to a `MatchedRule` which helps build the set of metrics to expose to Prometheus. The mapping between a bean name + attributes to a `MatchedRule` is done once. Some (in Kafka, most) bean names may not match any rule, so caching of `MatchedRule.unmatched()` helps prevent re-computation of bean names to non-existent rules. The logic to map a bean name to a Prometheus MetricFamilySample is unchanged from the upstream, the `MatchedRule` was introduced only to be able to cache the results. Caching is disabled by default (behaviour unchanged), and can be enabled by setting `cacheRules: true` in the configuration. Signed-off-by: Flavien Raynaud <flavr@yelp.com>
Signed-off-by: Flavien Raynaud <flavr@yelp.com>
Instead of having a global `cacheRules` flag, have a per-rule `cache` flag (default: false). Rules that do not have the `cache` flag set will have the default behaviour. In order to allow caching of rules with values that change over time (e.g: gauges), a new per-rule `matchBeanValue` flag has also been added (default: true). Disabling the flag will not add the bean value to the expression when matching against the list of rules (slightly different behaviour than before, but toggleable per rule) and allow caching of just the bean name (no matter the value). Signed-off-by: Flavien Raynaud <flavr@yelp.com>
Always match on bean value, but cache names without the bean values. Signed-off-by: Flavien Raynaud <flavr@yelp.com>
It adds the result of the rule matching to the cache of matched rules (if enabled). Signed-off-by: Flavien Raynaud <flavr@yelp.com>
Signed-off-by: Flavien Raynaud <flavr@yelp.com>
Signed-off-by: Flavien Raynaud <flavr@yelp.com>
Signed-off-by: Flavien Raynaud <flavr@yelp.com>
Signed-off-by: Flavien Raynaud <flavr@yelp.com>
Record count of cached beans by relying on the StalenessTracker to avoid race conditions between scrapes. Signed-off-by: Flavien Raynaud <flavr@yelp.com>
Signed-off-by: Flavien Raynaud <flavr@yelp.com>
Signed-off-by: Flavien Raynaud <flavr@yelp.com>
b117b4e
to
52bfec5
Compare
collector/src/main/java/io/prometheus/jmx/MatchedRulesCache.java
Outdated
Show resolved
Hide resolved
collector/src/main/java/io/prometheus/jmx/MatchedRulesCache.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Flavien Raynaud <flavr@yelp.com>
… lock Also populate the MatchedRulesCache with the set of rules from the config. Signed-off-by: Flavien Raynaud <flavr@yelp.com>
To avoid potential race conditions where a scrape could start with a Config object and finish with another one. Signed-off-by: Flavien Raynaud <flavr@yelp.com>
Signed-off-by: Flavien Raynaud <flavr@yelp.com>
It is easier to understand and will not yield a big performance hit. Also make `maybeReloadConfig` return the new Config. Signed-off-by: Flavien Raynaud <flavr@yelp.com>
Signed-off-by: Flavien Raynaud <flavr@yelp.com>
Signed-off-by: Flavien Raynaud <flavr@yelp.com>
Thanks! |
Awesome! Thank you for all the constructive feedback! |
* Allow caching regular expression matching in rules This improves performance for some use cases. The exporter is configured with a set of rules to match regular expressions against bean names. Regular expressions can be CPU-intensive, and matching the same bean names over the same patterns is time-consuming when there are tens of thousands of bean names (e.g: exposed by Kafka). Using a single `ConcurrentHashMap` to store the result of pattern matching here. The `MatchedRule` class was introduced here to "store" this result. Each rule in the configuration leads to a `MatchedRule` which helps build the set of metrics to expose to Prometheus. The mapping between a bean name + attributes to a `MatchedRule` is done once. Some (in Kafka, most) bean names may not match any rule, so caching of `MatchedRule.unmatched()` helps prevent re-computation of bean names to non-existent rules. The logic to map a bean name to a Prometheus MetricFamilySample is unchanged from the upstream, the `MatchedRule` was introduced only to be able to cache the results. Caching is disabled by default (behaviour unchanged), and can be enabled by setting `cacheRules: true` in the configuration. Signed-off-by: Flavien Raynaud <flavr@yelp.com> * Cache regular expression matching per rule Instead of having a global `cacheRules` flag, have a per-rule `cache` flag (default: false). Rules that do not have the `cache` flag set will have the default behaviour. In order to allow caching of rules with values that change over time (e.g: gauges), a new per-rule `matchBeanValue` flag has also been added (default: true). Disabling the flag will not add the bean value to the expression when matching against the list of rules (slightly different behaviour than before, but toggleable per rule) and allow caching of just the bean name (no matter the value). Signed-off-by: Flavien Raynaud <flavr@yelp.com>
@qinghui-xu thanks for this improvement. Would it make sense to add an example of how to use cache functionality in https://github.com/prometheus/jmx_exporter/blob/master/example_configs/kafka-2_0_0.yml ? |
PRs welcome to update the existing example configs to use this. |
This improves performance for some use cases.
The exporter is configured with a set of rules to match regular
expressions against bean names. Regular expressions can be CPU-intensive,
and matching the same bean names over the same patterns is
time-consuming when there are tens of thousands of bean names (e.g:
exposed by Kafka). Using a single
ConcurrentHashMap
to store the resultof pattern matching here.
The
MatchedRule
class was introduced here to "store" this result.Each rule in the configuration leads to a
MatchedRule
which helpsbuild the set of metrics to expose to Prometheus.
The mapping between a bean name + attributes to a
MatchedRule
isdone once. Some (in Kafka, most) bean names may not match any rule,
so caching of
MatchedRule.unmatched()
helps prevent re-computationof bean names to non-existent rules.
The logic to map a bean name to a Prometheus MetricFamilySample is
unchanged from the upstream, the
MatchedRule
was introduced only tobe able to cache the results.
Caching is disabled by default (behaviour unchanged), and can be enabled
by setting
cacheRules: true
in the configuration.In our case, this lead to reducing collection time from ~70 sec to ~3 sec on
some of our applications (Kafka brokers). I opened a thread on the mailing list
a couple of weeks ago and haven't heard back, but definitely happy to discuss
and make changes if needed, if you think this is worth merging. 🙂