[Proposal] Support sub aggregation in filter rewrite optimization #12602

Open
bowenlan-amzn opened this issue Mar 11, 2024 · 1 comment
@bowenlan-amzn
Member

bowenlan-amzn commented Mar 11, 2024

Follow-up task of #9310

Currently, the filter rewrite optimization does not support sub aggregations; only a single date histogram is supported.
This makes the applicable scenarios very limited. It would be great if we could find a way to support sub aggregations while still applying the filter rewrite optimization.

I noticed one possible path when previously applying the optimization to the composite aggregation. There is an established pattern for deferring sub aggregation collection. The idea is to do the aggregation collection in 2 passes: the 1st pass gets the docIdSets per bucket, and the 2nd pass runs the sub aggregation collection on these docIdSets per bucket.

for (Entry entry : entries) {
    DocIdSetIterator docIdSetIterator = entry.docIdSet.iterator();
    if (docIdSetIterator == null) {
        continue;
    }
    final LeafBucketCollector subCollector = deferredCollectors.getLeafCollector(entry.context);
    final LeafBucketCollector collector = queue.getLeafCollector(entry.context, getSecondPassCollector(subCollector));
    DocIdSetIterator scorerIt = null;
    if (needsScores) {
        Scorer scorer = weight.scorer(entry.context);
        if (scorer != null) {
            scorerIt = scorer.iterator();
            subCollector.setScorer(scorer);
        }
    }
    int docID;
    while ((docID = docIdSetIterator.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
        if (needsScores) {
            assert scorerIt != null && scorerIt.docID() < docID;
            scorerIt.advance(docID);
            // aggregations should only be replayed on matching documents
            assert scorerIt.docID() == docID;
        }
        collector.collect(docID);
    }
}

Theoretically, the performance improvement still comes from using the index structure, instead of iteration, to get the matching docs to collect at the date histogram level. Sub aggregation collection on these matching docs is expected to run at the same speed as before. There would also be some memory cost from keeping the docIdSets around until the 2nd pass.
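To make the two-pass idea concrete, here is a toy sketch in plain Java. It is not OpenSearch or Lucene code: it assumes a timestamp-sorted array stands in for the index structure, a fixed interval stands in for the date histogram rounding, and a simple sum stands in for the sub aggregation. The names `TwoPassSketch`, `firstPass`, and `secondPass` are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of deferred (two-pass) sub aggregation collection. All names are hypothetical.
class TwoPassSketch {

    // Returns the first position in sortedTimestamps whose value is >= target (binary search).
    static int lowerBound(long[] sortedTimestamps, long target) {
        int lo = 0, hi = sortedTimestamps.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedTimestamps[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Pass 1: for each bucket [start, start + interval), look up the matching slice of the
    // sorted "index" and save its doc IDs. No per-document iteration over the whole segment.
    static Map<Long, int[]> firstPass(long[] sortedTimestamps, int[] docIdAtPosition,
                                      long min, long max, long interval) {
        Map<Long, int[]> docIdsPerBucket = new LinkedHashMap<>();
        for (long start = min; start <= max; start += interval) {
            int from = lowerBound(sortedTimestamps, start);
            int to = lowerBound(sortedTimestamps, start + interval);
            if (from == to) {
                continue; // empty bucket, nothing to replay
            }
            int[] docIds = Arrays.copyOfRange(docIdAtPosition, from, to);
            Arrays.sort(docIds); // replay in doc ID order, like a doc ID set iterator
            docIdsPerBucket.put(start, docIds);
        }
        return docIdsPerBucket;
    }

    // Pass 2: replay the sub aggregation (here a sum of a per-doc value) on each saved doc ID set.
    static Map<Long, Double> secondPass(Map<Long, int[]> docIdsPerBucket, double[] valueByDoc) {
        Map<Long, Double> sums = new LinkedHashMap<>();
        for (Map.Entry<Long, int[]> entry : docIdsPerBucket.entrySet()) {
            double sum = 0;
            for (int docId : entry.getValue()) {
                sum += valueByDoc[docId];
            }
            sums.put(entry.getKey(), sum);
        }
        return sums;
    }
}
```

In this sketch the map returned by `firstPass` is the memory cost mentioned above: it has to stay alive until `secondPass` has consumed it. The per-document work in `secondPass` is the same as in a single-pass collection, so any gain comes only from how `firstPass` finds the matching docs.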

In the end, we expect a performance improvement on these 2 operations from the big5 workload. Both operations have sub-aggregations.

Some other issues would also improve sub-aggregation performance, but they come from the indexing side: they compute special index structures to speed up sub-aggregations, whereas this approach focuses on query-time improvement.
#3734
#12498

@getsaurabh02 getsaurabh02 added the v2.15.0 Issues and PRs related to version 2.15.0 label May 28, 2024
@getsaurabh02 getsaurabh02 added v2.16.0 Issues and PRs related to version 2.16.0 and removed v2.15.0 Issues and PRs related to version 2.15.0 labels Jun 6, 2024
@finnegancarroll
Contributor

Picking up this issue.

@mch2 mch2 added v2.17.0 and removed v2.16.0 Issues and PRs related to version 2.16.0 labels Jul 22, 2024
@getsaurabh02 getsaurabh02 added the Roadmap:Search Project-wide roadmap label label Aug 12, 2024
@getsaurabh02 getsaurabh02 added the v2.18.0 Issues and PRs related to version 2.18.0 label Sep 6, 2024
@getsaurabh02 getsaurabh02 changed the title Support sub aggregation in filter rewrite optimization [Proposal] Support sub aggregation in filter rewrite optimization Sep 6, 2024
@bowenlan-amzn bowenlan-amzn removed v2.17.0 v2.18.0 Issues and PRs related to version 2.18.0 labels Oct 17, 2024