Add new JVM runtime environment metrics #3352

roberttoyonaga · 2023-03-30T18:45:06Z

Changes

This PR adds the process.runtime.jvm.cpu.monitor.wait, process.runtime.jvm.cpu.monitor.blocked, process.runtime.jvm.network.io, and process.runtime.jvm.cpu.context_switch metrics to the runtime environment metrics.

Metric gathering implementations for these new metrics already exist in a basic form in https://github.com/open-telemetry/opentelemetry-java-instrumentation/tree/main/instrumentation/runtime-telemetry-jfr/library
Once the details around these new metrics are decided, the implementations can be updated.

JFR event streaming would be used to gather these metrics. Streaming has only been available since JDK 14, so these metrics would only be supported on JDK 17+ (the first LTS release that includes it).
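A minimal sketch of how an implementation might gate these metrics by JDK version. It assumes the check reads the standard `java.specification.version` system property (which reports `1.8` on Java 8 and `17` on JDK 17), and it avoids Java 9+ APIs so it can run on older JVMs. The class and method names are hypothetical.

```java
// Hypothetical helper: decides whether the JFR-streaming-based metrics should be enabled.
final class JfrMetricsSupport {
  private JfrMetricsSupport() {}

  // Parses the feature release from "java.specification.version":
  // "1.8" -> 8 (pre-Java 9 numbering scheme), "17" -> 17.
  static int featureVersion(String specVersion) {
    String v = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
    int dot = v.indexOf('.');
    return Integer.parseInt(dot >= 0 ? v.substring(0, dot) : v);
  }

  // JFR event streaming exists since JDK 14, but the proposal targets JDK 17+ only.
  static boolean isSupported(String specVersion) {
    return featureVersion(specVersion) >= 17;
  }

  static boolean isSupportedOnCurrentJvm() {
    return isSupported(System.getProperty("java.specification.version"));
  }
}
```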

Please see the original discussion in this PR and at the Java + Instrumentation SIG.

Related issue: open-telemetry/semantic-conventions#1222

specification/metrics/semantic_conventions/runtime-environment-metrics.md

jack-berg · 2023-04-05T13:55:00Z

specification/metrics/semantic_conventions/runtime-environment-metrics.md

+|                                                |                                                                                                 |          |                                           |                                                   |            | pool          | Name of pool [1]           |              | Required          |
+| process.runtime.jvm.memory.allocation          | Size of object allocated by thread                                                              | Bytes    | `By`                                      | Histogram                                         | Int64      |               |                            | JDK 17+      | Required          |
+|                                                |                                                                                                 |          |                                           |                                                   |            | thread        | thread ID                  |              | Opt-In            |
+|                                                |                                                                                                 |          |                                           |                                                   |            | class         | Fully qualified class name |              | Opt-In            |


The existing implementation includes an arena attribute instead of class. Class is accessible, but it's the class of the object allocated, not the class in which the allocation occurred, which isn't clear in the description. This could also be too high a cardinality, even for opt-in.

OK, I'm in favor of replacing the class attribute with arena. I think arena should be required.

With respect to process.runtime.jvm.cpu.monitor.blocked and process.runtime.jvm.cpu.monitor.wait the class attribute references the monitor class. Do you agree that class can remain as opt-in here?

jack-berg · 2023-04-05T13:59:56Z

specification/metrics/semantic_conventions/runtime-environment-metrics.md

+| process.runtime.jvm.cpu.monitor.wait           | Time thread time spend waiting at a monitor                                                     | Seconds  | `s`                                       | Histogram                                         | Int64      |               |                            | JDK 17+      | Required          |
+|                                                |                                                                                                 |          |                                           |                                                   |            | thread        | thread ID                  |              | Opt-In            |
+|                                                |                                                                                                 |          |                                           |                                                   |            | class         | Fully qualified class name |              | Opt-In            |
+| process.runtime.jvm.cpu.monitor.blocked        | Time thread spend blocked at a monitor                                                          | Seconds  | `s`                                       | Histogram                                         | Int64      |               |                            | JDK 17+      | Required          |


process.runtime.jvm.cpu.monitor.wait is actually in the implementation already, just under a different name. I've renamed it here because I think that could reduce confusion. process.runtime.jvm.cpu.monitor.blocked is not in the current implementation; I added it because I feel it could be useful. The JFR events jdk.JavaMonitorWait and jdk.JavaMonitorEnter would produce these metrics (wait and blocked, respectively). If others agree, I can add them to the implementation.
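A minimal JDK 17+ sketch of subscribing to the two JFR events named above with `jdk.jfr.consumer.RecordingStream`, converting each event's duration to seconds and reading its `monitorClass` field. The class name, the 10 ms threshold, and printing instead of recording into a histogram are illustrative assumptions, not the runtime-telemetry-jfr library's code.

```java
import java.time.Duration;
import jdk.jfr.consumer.RecordedClass;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingStream;

// Hypothetical example class; not part of any OpenTelemetry library.
final class MonitorEventsSketch {
  private MonitorEventsSketch() {}

  // Durations are reported in seconds to match the `s` unit.
  static double toSeconds(Duration d) {
    return d.toNanos() / 1_000_000_000.0;
  }

  static void handle(String label, RecordedEvent event) {
    RecordedClass monitorClass = event.getClass("monitorClass");
    String className = monitorClass == null ? "unknown" : monitorClass.getName();
    System.out.printf("%s %s %.6f s%n", label, className, toSeconds(event.getDuration()));
  }

  public static void main(String[] args) throws InterruptedException {
    try (RecordingStream rs = new RecordingStream()) {
      // Only waits/blocks longer than the threshold are recorded; 10 ms is arbitrary here.
      rs.enable("jdk.JavaMonitorEnter").withThreshold(Duration.ofMillis(10));
      rs.enable("jdk.JavaMonitorWait").withThreshold(Duration.ofMillis(10));
      rs.onEvent("jdk.JavaMonitorEnter", e -> handle("blocked", e));
      rs.onEvent("jdk.JavaMonitorWait", e -> handle("wait", e));
      rs.startAsync();
      Thread.sleep(5_000); // observe for a few seconds, then close the stream
    }
  }
}
```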


linux-foundation-easycla · 2023-04-20T21:26:37Z

The committers listed above are authorized under a signed CLA.

✅ login: roberttoyonaga / name: Robert Toyonaga (7e57099)

trask

can you add note sections and link to the Java APIs that would (typically) be used to collect these? (related to #3418)


trask · 2023-04-21T04:04:18Z

semantic_conventions/metrics/process-runtime-jvm-metrics.yaml

+  - id: metric.process.runtime.jvm.memory.allocation
+    type: metric
+    metric_name: process.runtime.jvm.memory.allocation
+    brief: "Size of object allocated by thread. Only available in JDK 17+."


I think(?) this could be implemented in Java 8 using https://docs.oracle.com/javase/8/docs/jre/api/management/extension/com/sun/management/ThreadMXBean.html#getThreadAllocatedBytes-long:A-

I think that's a little bit different. ThreadMXBean returns the cumulative allocation per thread, while the JFR event jdk.ObjectAllocationSample describes a single allocation instance (sampled to reduce overhead; sampling only happens on the TLAB slow path). But now that I think about it, it might be more useful to know the total allocation per thread than to have statistical data on allocation sizes per thread. Additionally, the statistical data would be skewed, because sampling is only done on the slow path, when a new TLAB is required or an allocation won't fit into a TLAB (this is because the event's purpose is to show where allocations are happening, not how big they are).
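For comparison, a minimal Java 8-compatible sketch of reading the cumulative per-thread allocation through the HotSpot-specific `com.sun.management.ThreadMXBean` linked above. Because the values are running totals, they would map more naturally onto a cumulative counter keyed by thread than onto a histogram of allocation sizes. The class name and the choice to skip terminated threads are assumptions.

```java
import java.lang.management.ManagementFactory;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of any OpenTelemetry library.
final class ThreadAllocationSketch {
  private ThreadAllocationSketch() {}

  // Returns the total bytes allocated so far by each live thread, keyed by thread ID.
  static Map<Long, Long> allocatedBytesByThread() {
    java.lang.management.ThreadMXBean base = ManagementFactory.getThreadMXBean();
    if (!(base instanceof com.sun.management.ThreadMXBean)) {
      return Collections.emptyMap(); // the extended HotSpot interface is unavailable
    }
    com.sun.management.ThreadMXBean mx = (com.sun.management.ThreadMXBean) base;
    if (!mx.isThreadAllocatedMemorySupported() || !mx.isThreadAllocatedMemoryEnabled()) {
      return Collections.emptyMap();
    }
    long[] ids = mx.getAllThreadIds();
    long[] bytes = mx.getThreadAllocatedBytes(ids);
    Map<Long, Long> result = new LinkedHashMap<>();
    for (int i = 0; i < ids.length; i++) {
      // -1 means the thread is no longer alive (or no longer exists).
      if (bytes[i] >= 0) {
        result.put(ids[i], bytes[i]);
      }
    }
    return result;
  }
}
```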

> I think(?) this could be implemented in Java 8 using https://docs.oracle.com/javase/8/docs/jre/api/management/extension/com/sun/management/ThreadMXBean.html#getThreadAllocatedBytes-long:A-

That would be cool.

> the JFR event ObjectAllocationSample describes a single allocation instance (sampled to reduce overhead. Sampling only happens on the TLAB slow path).

If we continue to report this in JFR, we'll want to somehow communicate to users that these allocations are sampled.

> this is because the event's purpose is to show where the allocations are happening, not how big they are

Presumably for building out a profile?

> Presumably for building out a profile?

Yup, you can generate flame graphs from the stack traces and other useful things like that.

> If we continue to report this in JFR

I think that we should not report allocations with JFR because the purpose of those events is actually a little different than what we want to use them for. Also, the current implementation (jdk.ObjectAllocationInNewTLAB and jdk.ObjectAllocationOutsideTLAB) would result in too high an overhead for people to use in production. Those events are turned off by default in both monitoring and profiling JFR configurations. This is because they aren't throttled like jdk.ObjectAllocationSample is.
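If the throttled event were used anyway, one way to communicate sampling would be to rely on the `weight` field of `jdk.ObjectAllocationSample` (JDK 16+), an approximate byte weight that each sample stands for, and to cap the event rate with its `throttle` setting. This is a minimal sketch; the class name, the `150/s` rate, and aggregating estimated bytes per allocated class are illustrative assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import jdk.jfr.consumer.RecordedClass;
import jdk.jfr.consumer.RecordingStream;

// Hypothetical sketch, not the runtime-telemetry-jfr implementation.
final class AllocationSampleSketch {
  private AllocationSampleSketch() {}

  // Estimated allocated bytes per class, summed from sample weights.
  static final Map<String, Long> estimatedBytesByClass = new ConcurrentHashMap<>();

  static void addSample(Map<String, Long> totals, String className, long weight) {
    totals.merge(className, weight, Long::sum);
  }

  public static void main(String[] args) throws InterruptedException {
    try (RecordingStream rs = new RecordingStream()) {
      // The throttle caps how many samples per second the JVM emits.
      rs.enable("jdk.ObjectAllocationSample").with("throttle", "150/s");
      rs.onEvent("jdk.ObjectAllocationSample", e -> {
        RecordedClass cls = e.getClass("objectClass");
        String name = cls == null ? "unknown" : cls.getName();
        addSample(estimatedBytesByClass, name, e.getLong("weight"));
      });
      rs.startAsync();
      Thread.sleep(5_000);
    }
    estimatedBytesByClass.forEach((c, b) -> System.out.println(c + " ~" + b + " bytes"));
  }
}
```

Summing weights yields an estimate of allocation pressure, not exact byte counts, which is exactly the property users would need to be told about.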


Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>

roberttoyonaga · 2023-04-21T13:26:45Z

> can you add note sections and link to the Java APIs that would (typically) be used to collect these? (related to #3418)

Hi @trask, do you mean in the note section of attributes? I'm having trouble figuring out how to get the notes to show for the metrics themselves.

Additionally, I wasn't sure of the best way to denote which metrics are available only in JDK 17+. Based on https://github.com/open-telemetry/build-tools/blob/v0.17.0/semantic-conventions/syntax.md, maybe `note`? (But notes only seem to be generated for attributes.)

jack-berg · 2023-04-21T13:45:33Z

semantic_conventions/metrics/process-runtime-jvm-metrics.yaml

+    extends: attributes.process.runtime.jvm.cpu.monitor
+    brief: "Time thread was waiting at a monitor. Only available in JDK 17+."
+    instrument: histogram
+    unit: "ms"


We'll want to use the `s` unit for all durations.

Should we add a bucket recommendation at the same time?

jack-berg · 2023-04-21T13:46:37Z

semantic_conventions/metrics/process-runtime-jvm-metrics.yaml

+    instrument: histogram
+    unit: "ms"
+
+  - id: metric.process.runtime.jvm.cpu.monitor.blocked


Is it ever useful to sum together the time a monitor was blocked and waiting? Trying to think about whether blocked vs waiting makes sense as an attribute rather than a separate metric.

seems similar to process.cpu.time which has attribute

> state, if specified, SHOULD be one of: system, user, wait

so maybe process.runtime.jvm.cpu.monitor.time with attribute state?

Yup I think that's a good idea

updated with suggestion applied
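A rough illustration of the process.runtime.jvm.cpu.monitor.time idea: both JFR monitor events feed one metric, and the event type becomes a `state` attribute. The state values `blocked` and `wait`, the class name, and the use of `DoubleSummaryStatistics` in place of a real OpenTelemetry histogram are all assumptions for illustration.

```java
import java.time.Duration;
import java.util.DoubleSummaryStatistics;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a histogram named process.runtime.jvm.cpu.monitor.time.
final class MonitorTimeSketch {
  private final Map<String, DoubleSummaryStatistics> byState = new HashMap<>();

  // Maps a JFR event type to the assumed `state` attribute value.
  static String stateFor(String eventName) {
    switch (eventName) {
      case "jdk.JavaMonitorEnter":
        return "blocked";
      case "jdk.JavaMonitorWait":
        return "wait";
      default:
        throw new IllegalArgumentException("unexpected event: " + eventName);
    }
  }

  // Records one duration, in seconds, under the state derived from the event type.
  synchronized void record(String eventName, Duration duration) {
    double seconds = duration.toNanos() / 1_000_000_000.0;
    byState.computeIfAbsent(stateFor(eventName), s -> new DoubleSummaryStatistics())
        .accept(seconds);
  }

  // Returns a copy so callers never observe an object that is still being updated.
  synchronized DoubleSummaryStatistics stats(String state) {
    DoubleSummaryStatistics copy = new DoubleSummaryStatistics();
    DoubleSummaryStatistics current = byState.get(state);
    if (current != null) {
      copy.combine(current);
    }
    return copy;
  }

  // Summing across all states is possible because they share one metric.
  synchronized double totalSeconds() {
    double total = 0;
    for (DoubleSummaryStatistics s : byState.values()) {
      total += s.getSum();
    }
    return total;
  }
}
```

With `state` as an attribute, a backend could show blocked and wait time separately or aggregate them, which bears on the question of whether summing the two is ever useful.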


trask · 2023-04-21T15:16:24Z

> Hi @trask, do you mean in the note section of attributes? I'm having trouble figuring out how to get the notes to show for the metrics themselves.

oh, you're right, let's just add a manual "Note" at the end of each metric section in the markdown for now, and I'll open an issue in build-tools about adding "note" to metrics in yaml


trask · 2023-04-28T15:53:54Z

semantic_conventions/metrics/process-runtime-jvm-metrics.yaml

+
+  - id: metric.process.runtime.jvm.cpu.context_switch
+    type: metric
+    metric_name: process.runtime.jvm.cpu.context_switch


can you check if there's a difference between this and process.context_switches metric?

Suggested change

-    metric_name: process.runtime.jvm.cpu.context_switch
+    metric_name: process.runtime.jvm.context_switches

Hi @trask, I checked the HotSpot code, and it seems to me that the JFR source of this metric does not account for virtual threads, only platform threads. However, process.runtime.jvm.context_switches does look a little different, because it reports a rate in Hz rather than a count like process.context_switches does.

Also, the description for process.context_switches says: "Number of times the process has been context switched." Does this mean it refers to process context switches rather than thread context switches? The metric derived from JFR refers to threads specifically.
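To make the rate-versus-count distinction concrete, a minimal JDK 17+ sketch that reads the periodic `jdk.ThreadContextSwitchRate` JFR event, whose `switchRate` field is in switches per second (Hz). Turning the rate into an approximate count by multiplying by the sampling period is an assumption made here for illustration, not something the event provides; the class name and the 10 s period are also hypothetical.

```java
import java.time.Duration;
import jdk.jfr.consumer.RecordingStream;

// Hypothetical sketch, not part of any OpenTelemetry library.
final class ContextSwitchSketch {
  private ContextSwitchSketch() {}

  // Approximate number of switches in one period, given a rate in Hz.
  static double approxSwitches(double rateHz, Duration period) {
    return rateHz * (period.toNanos() / 1_000_000_000.0);
  }

  public static void main(String[] args) throws InterruptedException {
    Duration period = Duration.ofSeconds(10);
    try (RecordingStream rs = new RecordingStream()) {
      // Periodic event: emitted once per period rather than per context switch.
      rs.enable("jdk.ThreadContextSwitchRate").withPeriod(period);
      rs.onEvent("jdk.ThreadContextSwitchRate", e -> {
        float hz = e.getFloat("switchRate");
        System.out.printf("rate=%.1f Hz, ~%.0f switches per period%n",
            hz, approxSwitches(hz, period));
      });
      rs.startAsync();
      Thread.sleep(35_000);
    }
  }
}
```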

jack-berg · 2023-04-28T16:59:18Z

semantic_conventions/metrics/process-runtime-jvm-metrics.yaml

+    attributes:
+      - ref: thread.id
+        requirement_level: opt_in
+      - id: class


Check out the code.namespace field as an alternative to defining a new attribute.

jack-berg · 2023-04-28T17:00:50Z

semantic_conventions/metrics/process-runtime-jvm-metrics.yaml

+    attributes:
+      - ref: thread.id
+        requirement_level: opt_in
+      - id: mode


Once #3431 lands, should change this to network.direction.

I think it's OK to change it proactively (that PR could take a while...)

OK, I changed it to network.direction.
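A sketch of how socket JFR events could feed process.runtime.jvm.network.io with a network.direction attribute. It assumes `jdk.SocketRead` and `jdk.SocketWrite` (with their `bytesRead` and `bytesWritten` fields) as the sources and `receive`/`transmit` as the attribute values; neither choice is confirmed in this thread. The class name and 10 ms threshold are hypothetical.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import jdk.jfr.consumer.RecordingStream;

// Hypothetical sketch, not part of any OpenTelemetry library.
final class NetworkIoSketch {
  private NetworkIoSketch() {}

  // Running byte totals keyed by the assumed network.direction value.
  static final Map<String, LongAdder> bytesByDirection = new ConcurrentHashMap<>();

  static void add(String direction, long bytes) {
    // Defensive: never let a negative value reduce the total.
    bytesByDirection.computeIfAbsent(direction, d -> new LongAdder()).add(Math.max(0, bytes));
  }

  static long total(String direction) {
    LongAdder adder = bytesByDirection.get(direction);
    return adder == null ? 0 : adder.sum();
  }

  public static void main(String[] args) throws InterruptedException {
    try (RecordingStream rs = new RecordingStream()) {
      // Only operations slower than the threshold produce events, so totals undercount.
      rs.enable("jdk.SocketRead").withThreshold(Duration.ofMillis(10));
      rs.enable("jdk.SocketWrite").withThreshold(Duration.ofMillis(10));
      rs.onEvent("jdk.SocketRead", e -> add("receive", e.getLong("bytesRead")));
      rs.onEvent("jdk.SocketWrite", e -> add("transmit", e.getLong("bytesWritten")));
      rs.startAsync();
      Thread.sleep(5_000);
    }
    System.out.println("receive=" + total("receive") + " transmit=" + total("transmit"));
  }
}
```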

jack-berg · 2023-04-28T17:01:12Z

semantic_conventions/metrics/process-runtime-jvm-metrics.yaml

@@ -70,7 +94,7 @@ groups:
    metric_name: process.runtime.jvm.gc.duration
    brief: "Duration of JVM garbage collection actions."
    instrument: histogram
-    unit: "ms"
+    unit: "s"


Covered in #3458.

Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>

github-actions · 2023-05-06T03:16:53Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.


reyang · 2023-05-09T04:38:20Z

@roberttoyonaga heads up - most likely this PR will be closed, and we'll ask you to resubmit the PR in a new repo, please refer to #3474 (comment).

jsuereth

Can you move this to https://github.com/open-telemetry/semantic-conventions ?

roberttoyonaga · 2023-05-19T18:14:33Z

I've copied this PR over to the new repo here: open-telemetry/semantic-conventions#44 @trask @jack-berg @mateuszrzeszutek

jack-berg · 2023-05-19T19:38:09Z

Thanks @roberttoyonaga. Closing this PR and picking up the convo over there!

roberttoyonaga mentioned this pull request Mar 30, 2023

New JVM runtime environment metrics open-telemetry/semantic-conventions#1222

Open

trask reviewed Apr 2, 2023

specification/metrics/semantic_conventions/runtime-environment-metrics.md (outdated, resolved)

specification/metrics/semantic_conventions/runtime-environment-metrics.md (outdated, resolved)

mateuszrzeszutek reviewed Apr 3, 2023

specification/metrics/semantic_conventions/runtime-environment-metrics.md (outdated, resolved)

trask mentioned this pull request Apr 4, 2023

Rename runtime-metrics to runtime-telemetry-jmx open-telemetry/opentelemetry-java-instrumentation#8165

Merged

roberttoyonaga marked this pull request as ready for review April 4, 2023 14:08

roberttoyonaga requested review from a team April 4, 2023 14:08

github-actions bot assigned yurishkuro Apr 4, 2023

jack-berg reviewed Apr 5, 2023

specification/metrics/semantic_conventions/runtime-environment-metrics.md (outdated, resolved)

reyang reviewed Apr 5, 2023

specification/metrics/semantic_conventions/runtime-environment-metrics.md (outdated, resolved)

jack-berg mentioned this pull request Apr 5, 2023

Mark "Instrumentation Units" and "Instrumentation Types" sections of the general metric semantic conventions as stable #3294

Merged

mateuszrzeszutek reviewed Apr 7, 2023

specification/metrics/semantic_conventions/runtime-environment-metrics.md (outdated, resolved)

roberttoyonaga requested review from a team April 20, 2023 21:26

add to semconv

7e57099

roberttoyonaga force-pushed the runtime-metrics-jfr branch from 82b88b4 to 7e57099 April 20, 2023 21:31

trask reviewed Apr 21, 2023


roberttoyonaga and others added 2 commits April 21, 2023 09:06

Update semantic_conventions/metrics/process-runtime-jvm-metrics.yaml

fc59fb3

Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>

Update semantic_conventions/metrics/process-runtime-jvm-metrics.yaml

d33ceaf

Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>

jack-berg reviewed Apr 21, 2023

semantic_conventions/metrics/process-runtime-jvm-metrics.yaml (outdated, resolved)

jack-berg reviewed Apr 21, 2023

semantic_conventions/metrics/process-runtime-jvm-metrics.yaml (outdated, resolved)

trask mentioned this pull request Apr 21, 2023

Add note for metrics yaml definitions open-telemetry/build-tools#167

Closed

roberttoyonaga added 2 commits April 21, 2023 11:29

resolve comments

bed6a9a

Merge branch 'runtime-metrics-jfr' of github.com:roberttoyonaga/opentelemetry-specification into runtime-metrics-jfr

31aa02b

roberttoyonaga mentioned this pull request Apr 27, 2023

Decide which JVM metrics should be included in initial stability #3419

Closed

trask reviewed Apr 28, 2023


jack-berg reviewed Apr 28, 2023


Update semantic_conventions/metrics/process-runtime-jvm-metrics.yaml

40219ea

Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>

github-actions bot added the Stale label May 6, 2023

trask removed the Stale label May 6, 2023

roberttoyonaga added 3 commits May 8, 2023 12:09

Merge branch 'main' of github.com:open-telemetry/opentelemetry-specification into runtime-metrics-jfr

71b5696

network.direction

694f538

update to process.runtime.jvm.cpu.monitor.time

126578b

reyang changed the title ~~Add new runtime environment metrics~~ Add new JVM runtime environment metrics May 8, 2023

reyang added the area:semantic-conventions Related to semantic conventions label May 9, 2023

jsuereth requested changes May 12, 2023


roberttoyonaga mentioned this pull request May 19, 2023

Add new JVM runtime environment metrics open-telemetry/semantic-conventions#44

Closed

jack-berg closed this May 19, 2023


Suggested change

-| process.runtime.jvm.cpu.monitor.blocked        | Time thread spend blocked at a monitor | Seconds | `s` | Histogram | Int64 | | | JDK 17+ | Required |
+| process.runtime.jvm.cpu.monitor.blocked        | Time thread was blocked at a monitor   | Seconds | `s` | Histogram | Int64 | | | JDK 17+ | Required |

