Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QueryRange use protobuf internally instead of json to reduce latency #3745

Merged
merged 2 commits into from
Jun 4, 2024

Conversation

mdisibio
Copy link
Contributor

@mdisibio mdisibio commented Jun 3, 2024

What this PR does:
Similar to #3731 but for query_range. For high cardinality metrics queriers on larger clusters, the marshaling/unmarshaling of json was becoming the bottleneck in the query-frontend. Metrics queries have larger and more numerous jobs than autocomplete, so the improvement is even better. I was thinking about why this was just now being noticed, since autocomplete and query_range have existed for some time and the request/response formats haven't changed. I believe it is due to the recently added asynchronous frontend pipeline. Two changes (1) It is more efficient at issuing jobs, so the bottleneck in the frontend from unmarshaling is now readily apparent (2) the unmarshaling was moved to a single goroutine, which will be restored in #3713 however this PR is more impactful because it greatly reduces the overall amount of work done by the frontend (versus spreading it out among more cpu cores).

Here is another graph showing the improvement for the query { } | rate() by (resource.service.name) of 3h time range, from ~80s to 20s (all cluster and request parameters the same).

image

NOTE - I only did this for the rf1 read path.

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@@ -438,7 +439,10 @@ func (s *queryRangeSharder) generatorRequest(searchReq tempopb.QueryRangeRequest

searchReq.QueryMode = querier.QueryModeRecent

return s.toUpstreamRequest(parent.Context(), searchReq, parent, tenantID)
req := s.toUpstreamRequest(parent.Context(), searchReq, parent, tenantID)
req.Header.Set(api.HeaderAccept, api.HeaderAcceptProtobuf)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add this proto header in prepareRequestForQueriers()?

this feels like a few changes away from just making this the default relationship between the queriers and frontend.

also, fantastic find! agree with the analysis this is largely cropping up due to the streaming frontend refactor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, when we make this universal that is the ideal place to add the header. And on the querier side the writeFormattedContentForRequest(w, r, resp) is also easy to incorporate. I'm on the fence to do it in this PR, since splitting out the remaining work would be a nice opportunity to get others get involved. But happy to do it if that's not needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my .02: just do them all for consistency and simplicity

i don't disagree with your thoughts though if you decide to just merge

@mdisibio mdisibio merged commit 3bbb45f into grafana:main Jun 4, 2024
14 checks passed
mapno pushed a commit that referenced this pull request Jun 6, 2024
…3745)

* Update query_range to use protobuf between frontend->querier instead of json for improved performance

* changelog
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants