Merge hydrant runners flatly for realtime queries. #15757
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Prior to this patch, we have two layers of mergeRunners for realtime queries: one for each Sink (a logical segment) and one across all Sinks. This is to keep metrics and results grouped by Sink, given that each FireHydrant within a Sink has its own separate storage adapter.
However, it costs extra memory usage due to the extra layer of materialization. This is especially pronounced for groupBy queries, which only use their merge buffers at the top layer of merging. The lower layer of merging materializes ResultRows directly into the heap, which can cause heap exhaustion if there are enough ResultRows.
This patch changes to a single layer of merging when
bySegment: false
, just like Historicals. To accommodate that, segment metrics likequery/segment/time
are now per-FireHydrant instead of per-Sink.Two layers of merging are retained when
bySegment: true
. This isn't common, because it's typically only used when segment level caching is enabled on the Broker, which is off by default.