
enforce max series for metrics queries #4525

Open · wants to merge 7 commits into base: main
Conversation

@ie-pham (Contributor) commented Jan 7, 2025:

What this PR does: Adds a config option to enforce a maximum number of time series returned in a metrics query. The limit is enforced in the combiner: as soon as the max series count is reached, the shouldQuit function returns true and the combiner returns everything it has combined so far, even if each series contains only a single data point at that point.

New config: max_response_series (default: 1000)
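
For illustration, a minimal Go sketch (not the PR's exact code) of the check the combiner applies, where maxSeries stands in for the configured max_response_series value:

package combiner

import "github.com/grafana/tempo/pkg/tempopb"

// maxSeriesReached reports whether a combined response has hit the configured
// limit; the combiner's quit hook returns true at that point and the partial
// result is returned as-is. Sketch only, not the PR's implementation.
func maxSeriesReached(resp *tempopb.QueryRangeResponse, maxSeries int) bool {
	// a limit of 0 means unlimited
	return maxSeries > 0 && resp != nil && len(resp.Series) >= maxSeries
}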

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@ie-pham (Contributor, Author) commented Jan 21, 2025:

The way this is implemented, Tempo truncates the final results at the frontend level. Alternatively, we could return as soon as 1000 series are reached, regardless of how many data points each series has, so the query exits early. Not sure which we prefer.

Review threads on changed files:
  • pkg/api/http.go (outdated)
  • pkg/tempopb/tempo.proto
  • CHANGELOG.md
  • docs/sources/tempo/configuration/_index.md (outdated)
  • modules/frontend/combiner/metrics_query_range.go (outdated)
@knylander-grafana (Contributor) left a comment:

Thank you for adding docs.

@@ -55,15 +61,18 @@ func NewQueryRange(req *tempopb.QueryRangeRequest, trackDiffs bool) (Combiner, e
sortResponse(resp)
return resp, nil
},
quit: func(resp *tempopb.QueryRangeResponse) bool {
@electron0zero (Member) commented Feb 4, 2025:

can we add a test for early exit from the combiner?


metrics:
  # Maximum number of time series returned for a metrics query.
  [max_response_series: <int> | default = 1000]
A reviewer (Member) commented:

This is an interesting choice. Normally we would communicate the max series through a query param from the frontend to the queriers. The downside of your approach is that we have to make sure the two settings stay aligned, or Tempo may appear subtly broken; the advantage is that we don't have to marshal something like series=1000 into every subquery.

Can you bring this up with the team and see if we have consensus either way?
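
For context, a rough sketch of the query-param alternative described above: the frontend would stamp the limit onto each subrequest before fanning it out. The parameter name and helper below are hypothetical, not Tempo's actual API:

package frontend

import (
	"net/http"
	"strconv"
)

// addMaxSeriesParam is an illustrative sketch only: the frontend copies the
// limit onto every subquery it fans out so downstream queriers can stop
// early. "maxSeries" is a hypothetical query parameter, not an existing one.
func addMaxSeriesParam(subReq *http.Request, maxSeries int) {
	q := subReq.URL.Query()
	q.Set("maxSeries", strconv.Itoa(maxSeries)) // re-encoded for every subquery
	subReq.URL.RawQuery = q.Encode()
}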

A reviewer (Member) commented:

This does make me wonder if we should have a shared section of the config for querying, like we do for storage. That feels like overkill for one setting, though.

query_backend_after: 0   # setting these both to 0 will force all range searches to hit the backend
query_ingesters_until: 0
metrics:
  max_response_series: 3
A reviewer (Member) commented:

for the test should we set it on the querier as well?

sendLoop:
for {
select {
case <-ticker.C:
A reviewer (Member) commented:

curious as to why you chose this loop structure. the goal seems to be loop 10 times and send data?
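
If the intent is simply "wait for the ticker and send ten batches", a plain counted loop might read more directly; a runnable sketch (with a placeholder body) for comparison:

package main

import (
	"fmt"
	"time"
)

func main() {
	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()

	// counted alternative to the labeled sendLoop: one send per tick, ten times
	for i := 0; i < 10; i++ {
		<-ticker.C
		fmt.Println("send batch", i) // placeholder for the real send
	}
}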

@@ -17,6 +17,7 @@
* [CHANGE] **BREAKING CHANGE** Enforce max attribute size at event, link, and instrumentation scope. Make config per-tenant.
Renamed max_span_attr_byte to max_attribute_bytes
[#4633](https://github.com/grafana/tempo/pull/4633) (@ie-pham)
* [CHANGE] Enforce max series in response for metrics queries [#4525](https://github.com/grafana/tempo/pull/4525) (@ie-pham)
A reviewer (Member) commented:

i'd mention the addition of the config param(s) to control behavior.

@@ -45,6 +46,13 @@ func NewQueryRange(req *tempopb.QueryRangeRequest) (Combiner, error) {
if resp == nil {
resp = &tempopb.QueryRangeResponse{}
}
if maxSeries > 0 && len(resp.Series) >= maxSeries {
A reviewer (Member) commented:

i believe we need a similar check in the diff function?

if err != nil {
return err
}

collector := pipeline.NewGRPCCollector(next, cfg.ResponseConsumers, c, func(qrr *tempopb.QueryRangeResponse) error {
// Translate each diff into the instant version and send it
resp := translateQueryRangeToInstant(*qrr)
// series already limited by the query range combiner just need to copy the status and message
resp.Status = qrr.Status
A reviewer (Member) commented:

why don't we do this in translateQueryRangeToInstant? seems like we need similar logic in the http handler

if err != nil {
return nil, err
}

mtx := sync.Mutex{} // combiner doesn't lock, so take the lock before calling Combine to make it safe
forEach := func(ctx context.Context, client tempopb.MetricsGeneratorClient) error {
if c.MaxSeriesReached() {
A reviewer (Member) commented:

should we not enforce this in the generators to prevent {} | rate() by (span:id) or whatever from overwhelming them?

have you tested such a query on this branch?

A reviewer (Member) commented:

also, don't we need similar code on the backend path? we also need to be thoughtful about the situation where the output series are different than the intermediate series.

for instance a quantile_over_time() calculation will pass up intermediate histograms that are then turned into quantiles in the frontend. so the queriers/generators may actually be handling more series than the output result. i'm wondering if we want a 2 tier limit where the queriers/generators that are doing the intermediate work have a higher limit than the frontend? can you push this discussion internally?
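
One possible shape for the two-tier limit being floated here, sketched as a Go struct; the field names are illustrative, not existing Tempo config options:

package config

// seriesLimits sketches the two-tier idea: queriers/generators doing the
// intermediate work (e.g. the per-bucket histogram series behind a
// quantile_over_time()) get a higher ceiling than the frontend's final output.
type seriesLimits struct {
	maxOutputSeries       int // cap on the final, combined result at the frontend
	maxIntermediateSeries int // higher cap on intermediate series at queriers/generators
}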

// used to track which series were updated since the previous diff
// todo: it may not be worth it to track the diffs per series. it would be simpler (and possibly nearly as effective) to just calculate a global
// max/min for all series
seriesUpdated map[string]tsRange
A reviewer (Member) commented:

i think this was from a previous feature that's been removed

return
}

// Here is where the job results are reentered into the pipeline
q.eval.ObserveSeries(resp.Series)

if q.maxSeries > 0 && len(q.eval.Results()) >= q.maxSeries {
A reviewer (Member) commented:

i'm guessing "Results()" can get quite expensive. it might be easier to count and limit input series? this would be similar in concept to limiting input streams in a loki or prometheus query. can you do some research here to determine the cost?
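
A sketch of the cheaper alternative being suggested: keep a running set of observed input series keys instead of calling Results() after every job. The type and key format below are assumptions for illustration:

package frontend

// countingLimiter counts distinct input series as they are observed, similar
// in spirit to limiting input streams in a Loki or Prometheus query. Sketch
// only; the key format (e.g. a prom-style label string per series) is an
// assumption.
type countingLimiter struct {
	maxSeries int
	seen      map[string]struct{}
}

// observe records one job's series keys and reports whether the limit is hit.
func (l *countingLimiter) observe(seriesKeys []string) bool {
	if l.seen == nil {
		l.seen = map[string]struct{}{}
	}
	for _, k := range seriesKeys {
		l.seen[k] = struct{}{}
	}
	return l.maxSeries > 0 && len(l.seen) >= l.maxSeries
}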
