QueryRange use protobuf internally instead of json to reduce latency #3745

mdisibio · 2024-06-03T16:17:07Z

What this PR does:
Similar to #3731, but for query_range. For high-cardinality metrics queries on larger clusters, JSON marshaling/unmarshaling was becoming the bottleneck in the query-frontend. Metrics queries have larger and more numerous jobs than autocomplete, so the improvement is even greater. I wondered why this is only being noticed now, since autocomplete and query_range have existed for some time and their request/response formats haven't changed. I believe it is due to the recently added asynchronous frontend pipeline, which changed two things: (1) it is more efficient at issuing jobs, so the unmarshaling bottleneck in the frontend is now readily apparent; and (2) unmarshaling was moved to a single goroutine. Multi-goroutine unmarshaling will be restored in #3713, but this PR is more impactful because it greatly reduces the total amount of work the frontend does (rather than spreading it across more CPU cores).

Here is another graph showing the improvement for the query { } | rate() by (resource.service.name) over a 3h time range: from ~80s to 20s, with all cluster and request parameters the same.

NOTE - I only did this for the rf1 read path.

Which issue(s) this PR fixes:
Fixes #

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]


joe-elliott · 2024-06-03T16:44:20Z

modules/frontend/metrics_query_range_sharder.go

@@ -438,7 +439,10 @@ func (s *queryRangeSharder) generatorRequest(searchReq tempopb.QueryRangeRequest

 	searchReq.QueryMode = querier.QueryModeRecent

-	return s.toUpstreamRequest(parent.Context(), searchReq, parent, tenantID)
+	req := s.toUpstreamRequest(parent.Context(), searchReq, parent, tenantID)
+	req.Header.Set(api.HeaderAccept, api.HeaderAcceptProtobuf)


should we add this proto header in prepareRequestForQueriers()?

this feels like a few changes away from just making this the default relationship between the queriers and frontend.

also, fantastic find! agree with the analysis this is largely cropping up due to the streaming frontend refactor.

Yes, when we make this universal, that is the ideal place to add the header. On the querier side, writeFormattedContentForRequest(w, r, resp) is also easy to incorporate. I'm on the fence about doing it in this PR, since splitting out the remaining work would be a nice opportunity to get others involved. But I'm happy to do it if that's not needed.

my .02: just do them all for consistency and simplicity

i don't disagree with your thoughts though if you decide to just merge


Update query_range to use protobuf between frontend->querier instead of json for improved performance

9e13c6c

mdisibio requested review from joe-elliott, annanay25, mapno, yvrhdn, zalegrala, electron0zero, ie-pham and stoewer as code owners June 3, 2024 16:17

changelog

f8c5edd

joe-elliott reviewed Jun 3, 2024


joe-elliott approved these changes Jun 3, 2024


mapno approved these changes Jun 4, 2024


electron0zero approved these changes Jun 4, 2024


mdisibio merged commit 3bbb45f into grafana:main Jun 4, 2024
14 checks passed

mapno pushed a commit that referenced this pull request Jun 6, 2024

QueryRange use protobuf internally instead of json to reduce latency (#3745)

9cefa83

* Update query_range to use protobuf between frontend->querier instead of json for improved performance
* changelog

mdisibio mentioned this pull request Aug 6, 2024

Swap querier /api/search and remaining endpoints to proto #3944

Merged

3 tasks
