
[CORE-3081] kafka: add read_distribution histogram #18745

Merged: 8 commits, Jun 7, 2024

Conversation

@WillemKauf WillemKauf commented May 31, 2024

Adds a read_distribution histogram and probe to the kafka::server. This new internal metric tracks the delta between the timestamp of data read by Kafka fetches in fetch_ntps_in_parallel() and the current now() timestamp. The metric is aggregated across shards.

A new histogram type, log_hist_read_dist, is added. The timestamp delta is measured in minutes using 16 buckets, with bounds spanning from up to 3 minutes in the first bucket to older than 91 days in the last.
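From the bucket labels in the panel JSON below, the 16 finite bucket upper bounds appear to follow a simple doubling pattern, 2^(k+2) - 1 minutes. This reconstruction is inferred from the labels, not taken from the log_hist implementation itself:

```python
# Reconstruct the 16 finite bucket upper bounds (in minutes) implied by the
# bucket labels in the panel JSON below. The doubling pattern is an inference
# from the labels, not lifted from the actual log_hist C++ code.
bounds_minutes = [2 ** (k + 2) - 1 for k in range(16)]

print(bounds_minutes[0])   # 3      -> first bucket covers up to 3 minutes
print(bounds_minutes[-1])  # 131071 -> ~91 days; older reads land in the +Inf bucket
```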

There are no updates made to the Redpanda dashboard yet, so it is up to the user to configure a panel with this histogram if they are interested in seeing the statistics.

For Grafana visualization, I recommend:

  • Adding a new panel with vectorized_kafka_fetch_read_distribution_bucket as the selected metric.
  • In the "Options" drop-down, selecting {{le}} as the legend, with Heatmap as the format.
  • Setting the visualization type to Bar gauge.
  • Transforming the field names from minutes to something more human readable as the buckets increase in size.

The result should look like:

[screenshot: the configured bar gauge panel showing per-bucket read counts]

My configured panel JSON is:

{
  "id": 23763572047,
  "gridPos": {
    "x": 0,
    "y": 0,
    "w": 12,
    "h": 8
  },
  "type": "bargauge",
  "title": "Read Distribution",
  "targets": [
    {
      "datasource": {
        "uid": "fXEekgsSz",
        "type": "prometheus"
      },
      "refId": "A",
      "hide": false,
      "editorMode": "code",
      "expr": "vectorized_kafka_fetch_read_distribution_bucket",
      "legendFormat": "{{le}}",
      "range": true,
      "format": "heatmap"
    }
  ],
  "options": {
    "reduceOptions": {
      "values": false,
      "calcs": [
        "lastNotNull"
      ],
      "fields": ""
    },
    "orientation": "auto",
    "displayMode": "gradient",
    "showUnfilled": true,
    "minVizWidth": 0,
    "minVizHeight": 10
  },
  "fieldConfig": {
    "defaults": {
      "mappings": [],
      "thresholds": {
        "mode": "absolute",
        "steps": [
          {
            "value": null,
            "color": "green"
          },
          {
            "value": 80,
            "color": "red"
          }
        ]
      },
      "color": {
        "mode": "thresholds"
      }
    },
    "overrides": []
  },
  "datasource": {
    "uid": "fXEekgsSz",
    "type": "prometheus"
  },
  "pluginVersion": "9.2.10",
  "transformations": [
    {
      "id": "organize",
      "options": {
        "excludeByName": {
          "Time": false
        },
        "indexByName": {},
        "renameByName": {
          "3.000000": "3m",
          "7.000000": "7m",
          "15.000000": "15m",
          "31.000000": "31m",
          "63.000000": "1h",
          "127.000000": "2h",
          "255.000000": "4h",
          "511.000000": "8.5h",
          "1023.000000": "17h",
          "2047.000000": "34h",
          "4095.000000": "68h",
          "8191.000000": "6d",
          "16383.000000": "11d",
          "32767.000000": "22d",
          "65535.000000": "45d",
          "131071.000000": "91d",
          "+Inf": ">91d"
        }
      }
    }
  ]
}

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

Features

  • Adds a read distribution histogram to the internal metrics that can be visualized using Prometheus and Grafana:
    • vectorized_kafka_fetch_read_distribution_bucket
    • vectorized_kafka_fetch_read_distribution_count
    • vectorized_kafka_fetch_read_distribution_sum
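As a sketch of how these series can be queried (assuming standard Prometheus histogram conventions and the le label format shown in the panel JSON above):

```promql
# Average age (in minutes) of data read by fetches over the last 5 minutes:
sum(rate(vectorized_kafka_fetch_read_distribution_sum[5m]))
  /
sum(rate(vectorized_kafka_fetch_read_distribution_count[5m]))

# Fraction of reads older than ~1 hour (the 63-minute bucket bound):
1 - (
  sum(rate(vectorized_kafka_fetch_read_distribution_bucket{le="63.000000"}[5m]))
  /
  sum(rate(vectorized_kafka_fetch_read_distribution_count[5m]))
)
```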

@github-actions github-actions bot added area/redpanda area/wasm WASM Data Transforms labels May 31, 2024
@WillemKauf WillemKauf requested review from ballard26 and andrwng May 31, 2024 20:30
@WillemKauf WillemKauf marked this pull request as ready for review June 3, 2024 17:51
@WillemKauf WillemKauf changed the title metrics: add read_distribution histogram [CORE-3081] metrics: add read_distribution histogram Jun 3, 2024
@andrwng (Contributor) left a comment

Structurally looks pretty good, just a couple high level questions about the max timestamp.

Also it'd be great to include some tests (I think fixture tests might be able to directly access the partition probe? otherwise a single-node test in ducktape is fine)

src/v/kafka/server/handlers/fetch.cc (review comment, resolved)
src/v/cluster/partition_probe.h (review comment, resolved)
src/v/kafka/server/handlers/fetch.cc (review comment, resolved)
src/v/utils/log_hist.h (review comment, resolved)
@@ -199,6 +199,8 @@ class log_hist {
*/
seastar::metrics::histogram internal_histogram_logform() const;

seastar::metrics::histogram read_dist_histogram_logform() const;
I suspect the public and internal histograms were introduced to be generic off-the-shelf reusables that metric authors could use without worrying about things like metric cardinality blowing grafana up. I'm wondering if they're reusable enough for our purposes here, or whether this read_dist_histogram should be made generic enough to guide others who may introduce metrics of data timestamp deltas

@ballard26 any thoughts here?


log_hist is designed to be an approximation of a histogram. Take the type:

using log_hist_internal = log_hist<std::chrono::microseconds, 26, 8ul>;

There will be 26 buckets, each of which is an unsigned integer. Each bucket ends up as another metric series, so 26 in this case, or 18 in the public histogram case. For log_hist_internal the buckets are laid out like:

[0, 8us), [8us, 16us), [16us, 32us), [32us, 64us), ..., [~1min, +inf)

When a duration is recorded, the bucket whose range includes the duration is found, and the integer for that bucket is incremented.

From this we can see that the two existing reusables are great for recording latency, since there is a lot of granularity around the 0 to 1s range. However, for this metric it seems we care a bit more about broader ranges like minutes or hours. So the existing reusables won't work, as all those values would be recorded in the last bucket and the resulting histogram wouldn't be very useful.

So to that end, I think what @WillemKauf has done here is fine. It gives up the granularity we have in the latency histograms around 0-1s in exchange for having more buckets over a longer range of time.
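To make the bucket mapping described above concrete, here is a rough Python sketch of the indexing scheme (an illustration of the idea, not the actual log_hist C++ implementation):

```python
def bucket_index(value_us: int, first_bound_us: int = 8, num_buckets: int = 26) -> int:
    """Map a duration (in microseconds) to a log_hist-style bucket index.

    Buckets double in width: [0, 8us), [8us, 16us), [16us, 32us), ...
    with everything past the last finite bound landing in the final bucket.
    Defaults mirror log_hist_internal's 26 buckets with an 8us first bound.
    """
    # Values below the first bound fall into bucket 0; each doubling of the
    # bound moves the value one bucket further up.
    idx = (value_us // first_bound_us).bit_length()
    return min(idx, num_buckets - 1)

print(bucket_index(5))    # 0: falls in [0, 8us)
print(bucket_index(8))    # 1: falls in [8us, 16us)
print(bucket_index(40))   # 3: falls in [32us, 64us)
```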


We probably want to make the comments on the reusable types a bit more instructive, i.e., this is the histogram you want to use if you only care about accurately representing sub-second durations, minute/hour durations, day durations, etc.


TL;DR: the reusable types were only designed for sub-second latency durations. It makes sense to me to add more types that accurately represent other ranges of time folks may be interested in.

src/v/kafka/server/replicated_partition.cc (review comment, resolved)
@github-actions github-actions bot removed the area/wasm WASM Data Transforms label Jun 4, 2024
WillemKauf commented Jun 4, 2024

Structurally looks pretty good, just a couple high level questions about the max timestamp.

Also it'd be great to include some tests (I think fixture tests might be able to directly access the partition probe? otherwise a single-node test in ducktape is fine)

Added ducktape test test_read_distribution_metric, removed max_timestamp() related functions, and we now compare the Kafka fetch timestamp against model::timestamp::now() for a more real-time age of data reads.

@WillemKauf

/ci-repeat

@WillemKauf WillemKauf force-pushed the read_histogram branch 3 times, most recently from b91584f to e38be80 Compare June 5, 2024 17:41
@StephanDollberg (Member) left a comment

I am wondering whether we somehow want to weigh this by bytes?

Like this we don't differentiate between a 100 byte read and a 100KB read.

Maybe this makes no difference because both kinda have the same overhead when having to reach out to cloud and I guess the main thing why we are interested in this is for local retention reasons.

src/v/cluster/partition_probe.cc Outdated Show resolved Hide resolved

WillemKauf commented Jun 6, 2024

Thanks for the extensive feedback @StephanDollberg.

  • Reworked the probe into the kafka::server object
  • The metric is no longer associated with any NTP label or aggregation, but is still aggregated across all shards (by default).

@WillemKauf WillemKauf changed the title [CORE-3081] metrics: add read_distribution histogram [CORE-3081] kafka: add read_distribution histogram Jun 6, 2024
src/v/kafka/read_distribution_probe.h (review comment, resolved)
src/v/kafka/read_distribution_probe.h (review comment, resolved)
Add a new probe that will be used in the `kafka::server`.
Contains the `log_hist_read_dist` object for recording the
read distribution of data fetched, based on the timestamp delta
between the data read and the current `now()` timestamp.

Histogram metrics are aggregated across shards when the cluster config
`aggregate_metrics` is `true`.
@WillemKauf WillemKauf merged commit 18fa688 into redpanda-data:dev Jun 7, 2024
19 checks passed