kafka: Add fetch plan and execute latency metric #13485
Conversation
Force-pushed from 54272f6 to 6f0d2a1
LGTM. Are we planning on removing or renaming the existing kafka_latency_fetch_latency_us metric, which records per-shard latency for each poll?

That is my long-term goal, yes.
Force-pushed from 6f0d2a1 to 37de022
ducktape was retried in job https://buildkite.com/redpanda/redpanda/builds/40383#018b9544-60c5-4e05-84b0-79caa19405da
ducktape was retried in job https://buildkite.com/redpanda/redpanda/builds/40383#018b9554-c2f8-4a44-b0ac-850f74666cdb
ducktape was retried in job https://buildkite.com/redpanda/redpanda/builds/40383#018b9554-c2f2-4968-a106-095df4014e20
Force-pushed from 37de022 to 97a0e9d
LGTM and thanks for the example numbers in the patch description: very useful!
Adds a histogram metric to measure the time it takes to create the fetch plan and execute it, i.e. a single fetch poll. It approximates the time it takes to process the data in a fetch request once it is available. The metric is split into two series: one tracking empty fetches and one tracking non-empty fetches.

Further, the count of the histogram can be used to calculate the ratio of fetch requests to polls like so:

```
sum(irate(vectorized_kafka_handler_requests_completed_total{..., handler="fetch"}[$__rate_interval])) by ($aggr_criteria)
/
sum(irate(vectorized_fetch_stats_plan_and_execute_latency_us_count{...}[$__rate_interval])) by ($aggr_criteria)
```

Looking at some scenarios we get the following values:

- 500MB/s, 4P/4C, 288P, ~110k batch, 1ms debounce: ~0.37
- 500MB/s, 4P/4C, 288P, ~110k batch, 10ms debounce: ~0.66
- 125MB/s, 8kP/8kC, 40k partitions, 1ms debounce: ~0.012
- 125MB/s, 8kP/8kC, 40k partitions, 10ms debounce: ~0.035
- 125MB/s, 8kP/8kC, 40k partitions, 100ms debounce: ~0.24
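As a minimal stand-alone sketch of what the PromQL above computes, the ratio divides the instantaneous rate of completed fetch requests by the rate of fetch polls (the histogram's `_count`). The counter samples below are made up for illustration and are not real Redpanda data; they are chosen to reproduce the ~0.37 value from the first scenario:

```python
def irate(samples):
    """PromQL irate-style rate: delta between the last two (timestamp, value) samples."""
    (t0, v0), (t1, v1) = samples[-2:]
    return (v1 - v0) / (t1 - t0)

# Hypothetical cumulative counter samples as (unix_seconds, value) pairs.
fetch_requests_completed = [(0, 0), (10, 370)]   # vectorized_kafka_handler_requests_completed_total
fetch_polls = [(0, 0), (10, 1000)]               # ..._plan_and_execute_latency_us_count

ratio = irate(fetch_requests_completed) / irate(fetch_polls)
print(ratio)  # 0.37
```

A ratio well below 1 means many polls complete without finishing a fetch request, which is why the ratio drops sharply in the 40k-partition scenarios.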
Force-pushed from 97a0e9d to 220b11e
Failure is: #14254
As always, amazing commit messages @StephanDollberg
/backport v23.2.x
/backport v23.1.x
Failed to create a backport PR to v23.1.x branch. I tried:

Failed to create a backport PR to v23.2.x branch. I tried:
Backports Required
Release Notes
Improvements