Initial addition of Kafka metrics. #2485

carlosalberto · 2022-04-11T16:13:10Z

Initial addition of Kafka metrics as reported by the JMX Gatherer (leaving out the consumer-specific ones for now).

The present PR has what the JMX Gatherer has today, but I'd like to get feedback regarding some potential changes:

Some of these are pre-cooked metrics that come directly from Kafka: some are counts in nanoseconds, others in milliseconds, and, more importantly, some are 50th/99th percentiles. Do we 'massage' them?
All the gauges reported here should maybe be converted from Gauge to UpDownCounter.
Name prefix for broker-specific metrics: some of the general Kafka metrics could have a broker component IMO, e.g. maybe kafka.message.count should become kafka.broker.message.count?
A few of the metrics use a count suffix, but the metric guidelines seem to suggest using plurals instead, e.g. rename kafka.messages.count to kafka.messages (or kafka.total-messages).
Is there any preference between - and _ in names?
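
For illustration only, one form the 'massaging' of time values could take is normalizing every duration to seconds before reporting it. This is a minimal sketch, not something this PR implements: the helper name `to_seconds` and the unit labels are assumptions made up for the example.

```python
# Divisors from each source unit to seconds. The unit labels are
# hypothetical; they are not taken from the JMX Gatherer or from Kafka.
_UNITS_PER_SECOND = {"ns": 1_000_000_000, "us": 1_000_000, "ms": 1_000, "s": 1}


def to_seconds(value, unit):
    """Convert a duration reported in `unit` to seconds."""
    try:
        return value / _UNITS_PER_SECOND[unit]
    except KeyError:
        raise ValueError(f"unsupported time unit: {unit!r}") from None
```

With a helper like this, a value reported in nanoseconds and one reported in milliseconds end up on the same scale, which leaves the percentile question as the remaining open point.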

cc @jmacd

specification/metrics/semantic_conventions/instrumentation/kafka.md

pyohannes · 2022-04-11T18:07:53Z

It would be worth collaborating with Kafka folks on this. There are ideas on the Kafka side to natively expose OTel Metrics. I'm not sure about the state of this effort, but there's a recent proposal here: https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability#KIP714:Clientmetricsandobservability-Metricsnamingandformat

The metric names proposed in the above document seem to follow a different convention: neither _ nor - is used.

carlosalberto · 2022-04-11T21:57:12Z

@pyohannes That looks interesting, thanks for posting that!

I think the plan is to keep the current JMX metrics around, but we should unify the metric conventions in use first (then, when Kafka goes OTLP, we can stop relying on JMX, etc). Will drop them a note ;)

specification/metrics/semantic_conventions/instrumentation/kafka.md

jmacd

Thanks @carlosalberto for the initial draft. I've put a lot of potentially debatable suggestions here to help the community make decisions. Generally I would like us to specify the instrument with the best semantic fit, even though the standard for Kafka is to reduce the cost of the instrument. An example is request timing: the tradition is to use a Summary with individual quantile series, whereas we'd prefer a Histogram, but really either is semantically meaningful for the same convention.

specification/metrics/semantic_conventions/instrumentation/kafka.md

jmacd · 2022-04-22T16:06:53Z

I understand now: the reason this PR and these conventions are problematic is that we have metrics data that is semantically produced from a Histogram or Summary instrument, but it has been precomputed by an external metrics system and is made available as individual Sum, Count, and quantile timeseries. As OpenTelemetry API users, we have no way to capture a precomputed summary into OTLP as a single stream via the API, so there is a temptation to bypass the semantic definition and perpetuate the use of pre-calculated metrics. To me, it is not a good outcome if we specify individual precomputed timeseries in situations where semantically a Histogram or Summary instrument would be used, unless we also specify semantic-naming conventions for translating between OTLP Summary data points and individual timeseries.

I'm imagining optional, dedicated OTLP Exporter support for recognizing when a set of metrics is named like Summary timeseries (e.g., X_sum, X_total, X_p50, X_p90 and so on), in which case the Exporter SHOULD recombine those series into individual Summary data points. That is, if we have both a set of naming conventions for precomputed Summary timeseries and a directive for Exporters to rewrite them as Summary points on the wire, then I think we have satisfied our mission: these semantic conventions will refer to Histogram semantics, receivers will pass through precomputed timeseries, and OTLP exporters will correct the problem.
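
To make the recombination idea concrete, here is a rough sketch of how an exporter-side step might group flat series into Summary-like points. It is only an illustration under stated assumptions: the suffixes (`_sum`, `_total`, `_pNN`) come from the example names above and are not an agreed convention, the function name `recombine_summaries` and the output shape are made up, and only two-digit percentiles are handled.

```python
import re

# Hypothetical naming convention based on the example above
# (X_sum, X_total, X_p50, X_p90). Only one- or two-digit percentiles match.
_QUANTILE = re.compile(r"^(?P<base>.+)_p(?P<pct>\d{1,2})$")


def recombine_summaries(series):
    """Group flat {name: value} samples into summary-like dicts.

    Returns (summaries, passthrough). A base name is treated as a summary
    only if at least one quantile series exists for it, so an unrelated
    counter that merely ends in "_total" is passed through unchanged.
    """
    bases = {m.group("base") for m in map(_QUANTILE.match, series) if m}
    summaries = {b: {"sum": None, "count": None, "quantiles": {}} for b in bases}
    passthrough = {}
    for name, value in series.items():
        m = _QUANTILE.match(name)
        if m:
            quantile = int(m.group("pct")) / 100
            summaries[m.group("base")]["quantiles"][quantile] = value
        elif name.endswith("_sum") and name[: -len("_sum")] in bases:
            summaries[name[: -len("_sum")]]["sum"] = value
        elif name.endswith("_total") and name[: -len("_total")] in bases:
            summaries[name[: -len("_total")]]["count"] = value
        else:
            passthrough[name] = value
    return summaries, passthrough
```

A real exporter would emit OTLP Summary data points rather than plain dicts; the sketch only shows the name-matching and grouping step.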

carlosalberto · 2022-05-04T21:54:37Z

@jmacd Updated the PR.

In summary, the metrics that can be reported as Summary/Histogram will be added in a follow-up PR, so I will be removing these for now.

carlosalberto · 2022-05-04T22:18:34Z

An additional note: all the counter metrics seem to be rates (e.g. kafka.requests.failed comes from FailedProduceRequestsPerSec, and kafka.request.count comes from TotalProduceRequestsPerSec).

Existing instrumentation (e.g. DataDog) seems to treat these values as rates rather than counters, and this IBM documentation hints that the granularity is 1 second: https://www.ibm.com/docs/en/obi/current?topic=technologies-monitoring-kafka

jmacd

On the topic of precomputed rate metrics, see #2485 (comment).

Suggest using -ratio for compression ratio, which is different than all the other rates in this PR. I've suggested changing some of the UpDownCounters to Gauge when the units are unitless (as -ratio) or {things}/s (as rates derived from counters).

There are three .rate suggestions here. I would support a generic semantic convention stating that a metric named X.rate is the rate derived from a Counter or UpDownCounter named X; this would allow us to specify the conventions as Counters even though current instrumentation reports the rate.
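
As a sketch of what "the rate derived from a Counter or UpDownCounter named X" could mean in practice, the following computes a per-second rate from successive cumulative samples. The function name, the `(timestamp_seconds, value)` sample shape, and the reset handling are assumptions for illustration, not part of any proposed convention.

```python
def rate_per_second(samples, monotonic=True):
    """Derive per-second rates from cumulative (timestamp_seconds, value) samples.

    Returns a list of (timestamp_seconds, rate) pairs, one per consecutive
    pair of input samples with a positive time difference.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order timestamps
        delta = v1 - v0
        if monotonic and delta < 0:
            # A monotonic Counter only goes down after a reset, so the
            # new value is taken as the increase since the reset.
            delta = v1
        rates.append((t1, delta / elapsed))
    return rates
```

For an UpDownCounter, pass `monotonic=False` so that negative deltas are kept as negative rates instead of being treated as resets.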

specification/metrics/semantic_conventions/instrumentation/kafka.md

carlosalberto · 2022-05-06T13:59:01Z

@jmacd @reyang Please review ;)

reyang

LGTM.

Initial addition of Kafka metrics.

031445e

carlosalberto requested review from a team April 11, 2022 16:13

github-actions bot assigned jmacd Apr 11, 2022

carlosalberto commented Apr 11, 2022

specification/metrics/semantic_conventions/instrumentation/kafka.md

Merge branch 'main' into kafka-metrics-addition

0dee47b

reyang reviewed Apr 12, 2022

specification/metrics/semantic_conventions/instrumentation/kafka.md

Update specification/metrics/semantic_conventions/instrumentation/kaf…

51683b4

…ka.md Co-authored-by: Reiley Yang <reyang@microsoft.com>

carlosalberto mentioned this pull request Apr 12, 2022

Update JMX reported metrics open-telemetry/opentelemetry-java-contrib#291

Closed

reyang added area:semantic-conventions Related to semantic conventions spec:metrics Related to the specification/metrics directory labels Apr 13, 2022

jmacd reviewed Apr 18, 2022

First pass of feedback.

7ed346d

jmacd mentioned this pull request Apr 25, 2022

Semantic conventions for Summary metrics #2511

Closed

Merge branch 'main' into kafka-metrics-addition

0729f57

Merge branch 'main' into kafka-metrics-addition

1d768ea

carlosalberto added 2 commits May 5, 2022 00:20

Remove metrics that may need Summary/Histogram points.

792222b

Merge branch 'main' into kafka-metrics-addition

26bfbc9

jmacd reviewed May 5, 2022

carlosalberto added 2 commits May 6, 2022 04:01

Apply jmacd's suggestions.

2b8ba12

A little naming tuning.

e8128b2

carlosalberto mentioned this pull request May 6, 2022

Initial update of JMX's Kafka metrics. open-telemetry/opentelemetry-java-contrib#326

Closed

carlosalberto added 2 commits May 6, 2022 15:23

Lint.

f6751c9

Merge branch 'main' into kafka-metrics-addition

d9600ce

jmacd reviewed May 6, 2022

specification/metrics/semantic_conventions/instrumentation/kafka.md

carlosalberto and others added 2 commits May 6, 2022 11:28

Update specification/metrics/semantic_conventions/instrumentation/kaf…

6277132

…ka.md Co-authored-by: Joshua MacDonald <jmacd@users.noreply.github.com>

More tuning.

ad547a2

jmacd reviewed May 6, 2022

specification/metrics/semantic_conventions/instrumentation/kafka.md

Improve names a little bit more.

ed2a097

jmacd reviewed May 6, 2022

specification/metrics/semantic_conventions/instrumentation/kafka.md

carlosalberto and others added 2 commits May 6, 2022 17:35

Update specification/metrics/semantic_conventions/instrumentation/kaf…

5c81afe

…ka.md Co-authored-by: Joshua MacDonald <jmacd@users.noreply.github.com>

Align table.

b4c817c

jmacd approved these changes May 6, 2022

reyang reviewed May 9, 2022

specification/metrics/semantic_conventions/instrumentation/kafka.md

reyang approved these changes May 9, 2022

reyang mentioned this pull request May 9, 2022

Add redis specification #2525

Closed

Merge branch 'main' into kafka-metrics-addition

c9997c9

reyang merged commit e6b292a into open-telemetry:main May 9, 2022

This was referenced May 9, 2022

Add prefix to Kafka metrics. #2528

Merged

Add remaining Kafka and Kafka consumer metrics. #2536

Merged

pyohannes mentioned this pull request Mar 13, 2023

Fix units in the Kafka metric semantic conventions #3300

Merged

carlosalberto added a commit to carlosalberto/opentelemetry-specification that referenced this pull request Oct 31, 2024

Initial addition of Kafka metrics. (open-telemetry#2485)

2789d7c
