-
Notifications
You must be signed in to change notification settings - Fork 130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Tomcat metric definitions to aggregate multiple MBeans. #1366
Conversation
Do we really think that aggregating is the correct approach here? For counter instruments (like the sessions originally reported in #1360) maybe this makes sense, but not necessarily for all instruments, right? Isn't it possible that two different MBeans registered with the same name could be returning separate metrics with different units? Units aren't really a first-class thing in JMX, so it requires some out-of-band knowledge I think, and aggregating across different units seems problematic. |
Yeah, this is currently only going to be done for gauge instruments (
I think this comes down to the metric definition files. It's possible to have multiple MBeans grouped together that share the same attribute name that are unrelated and shouldn't be aggregated, but that would be a case of misconfiguration. Even without this change, the current behavior would still be wrong since it'd only report the value from one of the MBeans. |
Hey @jefchien . We talked this over in today's SIG meeting and it sounds like we'd prefer a slightly different approach. In the case where the user has deployed multiple apps to one tomcat instance (or however they get multiple contexts, each with its own MBean manager), rather than aggregating in code like in this PR, we'd be better served by doing multiple recordings with different attributes, and the context name should be an attribute. This allows the metric to be correctly aggregated later, and maintains the per-context data points. Make sense? |
Wouldn't this result in different metrics altogether since that would mean Does this mean that collecting multiple MBeans into a single aggregated metric is not supported? Should setting a metric definition with a wildcard without dimensions for that wildcard be disallowed? Like for example, the ActiveMQ disk metrics where the intention is to aggregate all brokers into a single metric. Should this aggregation always be handled at the collector? opentelemetry-java-contrib/jmx-metrics/src/main/resources/target-systems/activemq.groovy Lines 101 to 124 in 32c7e89
(setting aside that the definition is incorrect and |
(Sorry for hashing this out in your PR, should have done this originally in the issue) First, I agree that the functionality today is not ideal and users are getting surprising and misleading data when there are multiple MBeans that match (like in the tomcat example above). I believe that the problem is no limited to tomcat, and many other targets use wildcard beans, so the problem can exist there as well. I also think that we should work toward making this better. Today, in the cases where a jmx So maybe there are other options, but I see 3 ways the code could resolve this:
#2 above is what this PR is proposing. It assumes that the user always wants the values aggregated when multiple MBeans match the object name query. I have two primary concerns with this:
#3 above is what the SIG basically concluded would be a better approach. For each measurement from each bean, we would include a set of attributes that are different, thereby working around the "it's just using the first one" problem, but also maintaining per-bean granularity that could be externally aggregated if later desired. The simplest way of doing this, IMO, is to add an attribute called Ok, now to address the questions above:
Attributes are technically not part of metric identity, so the short answer is no. The attributes do create different "metric streams" within a single metric. The name/value of each attribute is a metric dimension. This is a decent resource that describes this.
It could. We do not yet have stability guarantees for this component, it has alpha in the version, and so we are allowed to make breaking changes. Of course, it's never great to introduce breaking changes, but keep in mind that it's definitely reporting the WRONG thing now already, so fixing that should be considered a net improvement.
The way I read this today before this PR, the answer is no. I didn't see anything that would aggregate across mbeans.
I don't think so. I think there's value in being able to use multiples...like in the tomcat example above, you could filter just the contexts you care about (eg. remove |
@jefchien I hate to be the solo reviewer and blocker on this, so I want to make a proposal to move this forward. Additional background context: We may see this thing get rewritten without groovy (native java) in the coming months, so expect some additional instability. To move this forward, I'd like to propose that this PR includes the following:
Thanks! |
What you're saying makes a lot of sense and #3 does seem like a better approach, but I think the details for what that looks like needs to be fleshed out a bit more. I appreciate the proposal to unblock this and will add the command line argument to make this opt-in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for sticking with this @jefchien ! Appreciate it. Looks like you just need to reformat with ./gradlew spotlessApply
and you should be good.
commit b84a73e Merge: 352202f 7df9862 Author: SylvainJuge <763082+SylvainJuge@users.noreply.github.com> Date: Tue Sep 17 11:13:11 2024 +0200 Merge pull request #2 from SylvainJuge/jmx-scraper-it basic JMX client implementation + tests commit 7df9862 Author: Sylvain Juge <763082+SylvainJuge@users.noreply.github.com> Date: Tue Sep 17 11:11:50 2024 +0200 fix bad merge again commit e4a8f83 Author: Sylvain Juge <763082+SylvainJuge@users.noreply.github.com> Date: Tue Sep 17 11:10:26 2024 +0200 fix bad merge again commit dad197b Author: Sylvain Juge <763082+SylvainJuge@users.noreply.github.com> Date: Tue Sep 17 11:09:18 2024 +0200 fix bad merge commit f8a73d7 Author: Sylvain Juge <763082+SylvainJuge@users.noreply.github.com> Date: Tue Sep 17 11:08:42 2024 +0200 spotless commit eab01e9 Author: Sylvain Juge <763082+SylvainJuge@users.noreply.github.com> Date: Tue Sep 17 11:01:47 2024 +0200 spotless commit f6667dd Merge: 5b3ef9d 352202f Author: Sylvain Juge <763082+SylvainJuge@users.noreply.github.com> Date: Tue Sep 17 10:55:34 2024 +0200 Merge branch 'jmx-scrapper' of github.com:SylvainJuge/opentelemetry-java-contrib into jmx-scraper-it commit 5b3ef9d Author: Sylvain Juge <763082+SylvainJuge@users.noreply.github.com> Date: Tue Sep 17 10:51:11 2024 +0200 add TODOs commit 352202f Merge: 5a82aac 40a2591 Author: SylvainJuge <763082+SylvainJuge@users.noreply.github.com> Date: Tue Sep 17 10:50:50 2024 +0200 Merge pull request #1 from robsunday/jmx-scrapper Initial commit of JmxScraper code. commit 40a2591 Author: jack-berg <34418638+jack-berg@users.noreply.github.com> Date: Mon Sep 16 16:05:50 2024 -0500 Add declarative config support for RuleBasedRoutingSampler (open-telemetry#1440) commit 6d39e93 Author: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Date: Mon Sep 16 10:40:16 2024 -0700 Update dependency com.uber.nullaway:nullaway to v0.11.3 (open-telemetry#1457) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> commit eb5dc6f Author: Bruno Baptista <brunobat@gmail.com> Date: Mon Sep 16 18:26:53 2024 +0100 Fix native mode error cause by static init of random (open-telemetry#862) commit 5ffa348 Author: jack-berg <34418638+jack-berg@users.noreply.github.com> Date: Mon Sep 16 12:23:41 2024 -0500 Add declarative config support for aws xray propagators (open-telemetry#1442) commit 151b744 Author: SylvainJuge <763082+SylvainJuge@users.noreply.github.com> Date: Mon Sep 16 19:23:19 2024 +0200 add span stacktrace config option (open-telemetry#1414) Co-authored-by: jackshirazi <jacks@fasterj.com> commit 9ee5fb1 Author: Yury Bubnov <ybubnov@gmail.com> Date: Mon Sep 16 10:18:51 2024 -0700 Issue-1034 Short XRay Trace (open-telemetry#1036) commit 11a2e1a Author: Jeffrey Chien <chienjef@amazon.com> Date: Mon Sep 16 13:18:10 2024 -0400 Fix Tomcat metric definitions to aggregate multiple MBeans. (open-telemetry#1366) commit 9fa44be Author: Pranav Sharma <sharmapranav@google.com> Date: Mon Sep 16 11:22:32 2024 -0400 Fix incorrect cloud.platform value for GCF (open-telemetry#1454) commit 11f2111 Author: Peter Findeisen <pfindeis@cisco.com> Date: Mon Sep 16 08:21:46 2024 -0700 Composite Samplers prototype (open-telemetry#1443) Co-authored-by: Otmar Ertl <otmar.ertl@gmail.com> commit b3386de Author: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Date: Mon Sep 16 08:20:44 2024 -0700 Update plugin com.squareup.wire to v5.1.0 (open-telemetry#1452) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> commit 27b631d Author: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Date: Mon Sep 16 08:20:21 2024 -0700 Update errorProneVersion to v2.32.0 (open-telemetry#1453) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> commit 791df4b Author: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Date: Mon Sep 16 09:50:03 2024 -0500 Update dependency io.opentelemetry.instrumentation:opentelemetry-instrumentation-bom-alpha to v2.8.0-alpha (open-telemetry#1456) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Co-authored-by: Jack Berg <jberg@newrelic.com> commit 19357e6 Author: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Date: Tue Sep 10 15:20:46 2024 -0700 Update dependency com.gradle.enterprise:com.gradle.enterprise.gradle.plugin to v3.18.1 (open-telemetry#1450) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> commit 460470b Author: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Date: Tue Sep 10 15:20:23 2024 -0700 Update plugin com.gradle.develocity to v3.18.1 (open-telemetry#1451) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> commit b45cdab Author: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Date: Tue Sep 10 12:32:56 2024 +0300 Update dependency com.linecorp.armeria:armeria-bom to v1.30.1 (open-telemetry#1449) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> commit d36106e Author: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Date: Tue Sep 10 09:11:26 2024 +0300 Update micrometer packages to v1.13.4 (open-telemetry#1448) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> commit fb25c79 Author: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Date: Mon Sep 9 15:44:53 2024 +0300 Update dependency gradle to v8.10.1 (open-telemetry#1447) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> commit f847561 Author: robsunday <rniedziela@splunk.com> Date: Tue Sep 17 09:22:21 2024 +0200 Code review changes: Argument parsing moved to the main class. Argument validation now throw exception. More tests added for config factory. Created unit tests for JmxScraper class. commit 03788ff Author: Sylvain Juge <763082+SylvainJuge@users.noreply.github.com> Date: Mon Sep 16 17:26:12 2024 +0200 add TODO for testing server SSL support commit a3fbeb5 Author: Sylvain Juge <763082+SylvainJuge@users.noreply.github.com> Date: Mon Sep 16 17:08:16 2024 +0200 add TODO to enable server-side SSL commit 1619595 Author: Sylvain Juge <763082+SylvainJuge@users.noreply.github.com> Date: Mon Sep 16 16:59:17 2024 +0200 wip commit 164679f Author: robsunday <rniedziela@splunk.com> Date: Wed Sep 11 12:34:12 2024 +0200 Cleanup of config. Refactoring of JmxScraperConfigFactory. Added first unit tests. commit 165fcb8 Author: robsunday <rniedziela@splunk.com> Date: Tue Sep 10 12:36:29 2024 +0200 Initial commit of JmxScraper code. Application compiles and runs as standalone and can read config file.
Description: Bug fix - The current Tomcat metric definitions use the
otel.mbean
constructor function, which expects the query to the MBean server to return a single MBean. If multiple mbeans are returned, it takes the first one and drops the rest, which results in inaccurate/incomplete metric values. The tomcat metrics are all defined with wildcards (*
), which are meant to match multiple MBeans.Changed the Tomcat metric definitions to use
otel.mbeans
instead, so metric values aren't being dropped. Added an aggregation fordoubleValueCallback/longValueCallback
since both of those observable instruments will only accept the first value per collection interval.Existing Issue(s): #1360
Testing: Added a unit test in the
InstrumenterHelperTest
to account validate the aggregation.Documentation: N/A
Outstanding items: Some activemq and jetty metric definitions need to be adjusted to
otel.mbeans
and tested.