
kafka: Add fetch plan and execute latency metric #13485

Merged
1 commit merged into dev on Nov 8, 2023

Conversation

@StephanDollberg (Member)

Adds a histogram metric to measure the time it takes to create the fetch
plan and execute it, i.e. a single fetch poll.

It approximates the time it takes to process the data in a fetch request
once the data is available.

I have separated it into two series: one tracking empty fetches and one
tracking non-empty fetches.

Further, the count of the histogram can be used to calculate the ratio of
fetch requests to polls like so:

```
sum(irate(vectorized_kafka_handler_requests_completed_total{...,
handler="fetch"}[$__rate_interval])) by ($aggr_criteria) /
sum(irate(vectorized_fetch_stats_plan_and_execute_latency_us_count{...}[$__rate_interval])) by
($aggr_criteria)
```
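The same calculation the PromQL query performs can be sketched outside Prometheus. This is not Redpanda code; the sample values below are hypothetical, and `per_second_rate` mimics what `irate` does with the last two samples of a counter:

```python
# Sketch (hypothetical values, not from the PR): the ratio of fetch
# requests to polls, computed from two counter samples taken dt seconds
# apart, as the PromQL query does with irate().

def per_second_rate(prev: float, curr: float, dt: float) -> float:
    """irate-style instantaneous rate between two counter samples."""
    return (curr - prev) / dt

# Counter samples 10s apart: fetch-handler requests completed, and the
# _count series of the plan-and-execute latency histogram (one increment
# per poll).
requests_rate = per_second_rate(prev=10_000, curr=10_370, dt=10)  # 37 req/s
polls_rate = per_second_rate(prev=50_000, curr=51_000, dt=10)     # 100 polls/s

ratio = requests_rate / polls_rate
print(round(ratio, 2))  # 0.37 fetch requests completed per poll
```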

Looking at some scenarios, we get the following values:

  • 500MB/s, 4P/4C, 288P, ~110k batch, 1ms debounce: ~0.37
  • 500MB/s, 4P/4C, 288P, ~110k batch, 10ms debounce: ~0.66
  • 125MB/s, 8kP/8kC, 40k partitions, 1ms debounce: ~0.012
  • 125MB/s, 8kP/8kC, 40k partitions, 10ms debounce: ~0.035
  • 125MB/s, 8kP/8kC, 40k partitions, 100ms debounce: ~0.24
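One way to read these numbers (my interpretation, not stated in the PR): since the ratio is requests completed per poll, its reciprocal is roughly how many polls a single fetch request spans.

```python
# Sketch: interpreting the measured request/poll ratios from the PR
# description as polls per fetch request (the reciprocal).
scenarios = {
    "500MB/s, 288P, 1ms debounce": 0.37,
    "500MB/s, 288P, 10ms debounce": 0.66,
    "125MB/s, 40k partitions, 1ms debounce": 0.012,
    "125MB/s, 40k partitions, 100ms debounce": 0.24,
}
for name, ratio in scenarios.items():
    # e.g. ratio 0.37 -> ~2.7 polls per request; 0.012 -> ~83.3
    print(f"{name}: ~{1 / ratio:.1f} polls per fetch request")
```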

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.2.x
  • v23.1.x
  • v22.3.x

Release Notes

Improvements

  • Adds a metric to track fetch plan and execute latency

@ballard26 (Contributor) previously approved these changes Nov 2, 2023

LGTM. Are we planning on removing or renaming the existing kafka_latency_fetch_latency_us metric, which records per-shard latency for each poll?

@StephanDollberg (Member, Author)

> LGTM. Are we planning on removing or renaming the existing kafka_latency_fetch_latency_us metric, which records per-shard latency for each poll?

That is my long-term goal, yes.

@StephanDollberg force-pushed the stephan/fetch-plan-and-execute-latency branch from 37de022 to 97a0e9d on November 6, 2023
@travisdowns (Member) left a comment

LGTM and thanks for the example numbers in the patch description: very useful!

@StephanDollberg force-pushed the stephan/fetch-plan-and-execute-latency branch from 97a0e9d to 220b11e on November 7, 2023
@StephanDollberg (Member, Author)

The CI failure is #14254.

@dotnwat (Member) left a comment

as always, amazing commit messages @StephanDollberg

@piyushredpanda merged commit 3f361a0 into dev on Nov 8, 2023 (30 of 32 checks passed)
@piyushredpanda deleted the stephan/fetch-plan-and-execute-latency branch on November 8, 2023
@vbotbuildovich (Collaborator)

/backport v23.2.x

@vbotbuildovich (Collaborator)

/backport v23.1.x

@vbotbuildovich (Collaborator)

Failed to create a backport PR to the v23.1.x branch. I tried:

```
git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-13485-v23.1.x-543 remotes/upstream/v23.1.x
git cherry-pick -x 220b11ed449bc553cd4c69b830b976f8d01db646
```

Workflow run logs.

@vbotbuildovich (Collaborator)

Failed to create a backport PR to the v23.2.x branch. I tried:

```
git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-13485-v23.2.x-8 remotes/upstream/v23.2.x
git cherry-pick -x 220b11ed449bc553cd4c69b830b976f8d01db646
```

Workflow run logs.
