-
Notifications
You must be signed in to change notification settings - Fork 889
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial addition of Kafka metrics. #2485
Initial addition of Kafka metrics. #2485
Conversation
specification/metrics/semantic_conventions/instrumentation/kafka.md
Outdated
Show resolved
Hide resolved
It would be worth collaborating with Kafka folks on this. There are ideas on Kafka side to natively expose OTel Metrics. I'm not sure about the state of this effort, but there's a recent proposal here: https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability#KIP714:Clientmetricsandobservability-Metricsnamingandformat The metric names proposed in the above document seem to follow a different convention, neither |
@pyohannes That looks interesting, thanks for posting that! I think the plan is to keep the current JMX metrics around, but we should unify the used metric conventions first (then, when Kafka goes OTLP we can stop relying on JMX, etc). Will drop them a note ;) |
specification/metrics/semantic_conventions/instrumentation/kafka.md
Outdated
Show resolved
Hide resolved
…ka.md Co-authored-by: Reiley Yang <reyang@microsoft.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @carlosalberto for the initial draft. I've put a lot of potentially debatable suggestions here to help the community make decisions. Generally I would like us to specify the instrument with the best semantic fit, even though the standard for Kafka is to reduce the cost of the instrument. An example is request timing: the tradition is to use a Summary with individual quantile series, we'd prefer a Histogram, but really either is semantically meaningful for the same convention.
specification/metrics/semantic_conventions/instrumentation/kafka.md
Outdated
Show resolved
Hide resolved
specification/metrics/semantic_conventions/instrumentation/kafka.md
Outdated
Show resolved
Hide resolved
specification/metrics/semantic_conventions/instrumentation/kafka.md
Outdated
Show resolved
Hide resolved
specification/metrics/semantic_conventions/instrumentation/kafka.md
Outdated
Show resolved
Hide resolved
specification/metrics/semantic_conventions/instrumentation/kafka.md
Outdated
Show resolved
Hide resolved
specification/metrics/semantic_conventions/instrumentation/kafka.md
Outdated
Show resolved
Hide resolved
specification/metrics/semantic_conventions/instrumentation/kafka.md
Outdated
Show resolved
Hide resolved
specification/metrics/semantic_conventions/instrumentation/kafka.md
Outdated
Show resolved
Hide resolved
specification/metrics/semantic_conventions/instrumentation/kafka.md
Outdated
Show resolved
Hide resolved
specification/metrics/semantic_conventions/instrumentation/kafka.md
Outdated
Show resolved
Hide resolved
I understand now, the reason this PR and these conventions are problematic is that we have metrics data that is semantically produced from a Histogram or Summary instrument, but it has been precomputed by an external metrics system and is made available as individual Sum, Count, and quantile timeseries. As an OpenTelemetry API user, we have no way to capture a precomputed summary into OTLP as a single stream via the API, so there is a temptation to bypass the semantic definition and perpetuate the use of pre-calculated metrics. To me, it is not a good outcome if we specify individual precomputed timeseries in situations where semantically a Histogram or Summary instrument would be used, unless we also specify semantic-naming conventions for translating between OTLP Summary data points and individual timeseries. I'm imagining optional, dedicated OTLP Exporter support for recognizing that a set of metrics are named like Summary timeseries (e.g., |
@jmacd Updated the PR. Summary is that the metrics that can be reported as Summary/Histogram will be added in a follow up PR, so I will be removing these for now. |
An additional note: all the counter metrics seem to be rates (e.g. Existing instrumentation (e.g. DataDog) seems to use these values as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the topic of precomputed rate metrics, see #2485 (comment).
Suggest using -ratio
for compression ratio, which is different than all the other rates in this PR. I've suggested changing some of the UpDownCounters to Gauge when the units are unitless (as -ratio) or {things}/s (as rates derived from counters).
You see there are three .rate
suggestions here. I would support a generic semantics convention stating that a metric named X.rate
is the rate derived from a Counter or UpDownCounter named X
; this would allow us to specify the conventions as Counters even though current instrumentation reports the rate.
specification/metrics/semantic_conventions/instrumentation/kafka.md
Outdated
Show resolved
Hide resolved
specification/metrics/semantic_conventions/instrumentation/kafka.md
Outdated
Show resolved
Hide resolved
specification/metrics/semantic_conventions/instrumentation/kafka.md
Outdated
Show resolved
Hide resolved
specification/metrics/semantic_conventions/instrumentation/kafka.md
Outdated
Show resolved
Hide resolved
specification/metrics/semantic_conventions/instrumentation/kafka.md
Outdated
Show resolved
Hide resolved
…ka.md Co-authored-by: Joshua MacDonald <jmacd@users.noreply.github.com>
specification/metrics/semantic_conventions/instrumentation/kafka.md
Outdated
Show resolved
Hide resolved
specification/metrics/semantic_conventions/instrumentation/kafka.md
Outdated
Show resolved
Hide resolved
…ka.md Co-authored-by: Joshua MacDonald <jmacd@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Initial addition of metrics Kafka metrics as reported by the JMX gatherer (leaving out the Consumer specific ones for now).
The present PR has what the JMX Gatherer has today, but I'd like to get feedback regarding some potential changes:
nanoseconds
, some of them inmilliseconds
instead, and more importantly 50th/99th percentiles. Do we 'massage' them?broker
component IMO, e.g.kafka.message.count
maybe should becomekafka.broker.message.count
?count
sufix. The metric guidelines seem to suggest use of plurals instead? e.g. renamekafka.messages.count
tokafka.messages
(orkafka.total-messages
).-
and_
in names?cc @jmacd