[Proposal] Support sub aggregation in filter rewrite optimization #12602

bowenlan-amzn · 2024-03-11T23:08:35Z

Follow up task of #9310

Currently, the filter rewrite optimization does not support sub-aggregations; only a single date histogram is supported.
This makes the applicable scenarios very limited. It would be great if we could find a way to support sub-aggregations while applying the filter rewrite optimization.

I noticed one possible path when previously applying the optimization to the composite aggregation. There's an established pattern for deferring sub-aggregation collection. The idea is to do the aggregation collection in 2 passes: the 1st pass gets the docIdSets per bucket, and the 2nd pass runs the sub-aggregation collection on these docIdSets per bucket.

OpenSearch/server/src/main/java/org/opensearch/search/aggregations/bucket/composite/CompositeAggregator.java

Lines 648 to 673 in 246557c

    
    for (Entry entry : entries) {
        DocIdSetIterator docIdSetIterator = entry.docIdSet.iterator();
        if (docIdSetIterator == null) {
            continue;
        }
        final LeafBucketCollector subCollector = deferredCollectors.getLeafCollector(entry.context);
        final LeafBucketCollector collector = queue.getLeafCollector(entry.context, getSecondPassCollector(subCollector));
        DocIdSetIterator scorerIt = null;
        if (needsScores) {
            Scorer scorer = weight.scorer(entry.context);
            if (scorer != null) {
                scorerIt = scorer.iterator();
                subCollector.setScorer(scorer);
            }
        }
        int docID;
        while ((docID = docIdSetIterator.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
            if (needsScores) {
                assert scorerIt != null && scorerIt.docID() < docID;
                scorerIt.advance(docID);
                // aggregations should only be replayed on matching documents
                assert scorerIt.docID() == docID;
            }
            collector.collect(docID);
        }
    }

Theoretically, the performance improvement still comes from using the index structure, instead of iteration, to get the matching docs to collect at the date histogram level. Sub-aggregation collection on these matching docs is expected to run at the same speed as it does today. There would also be some memory cost from holding the docIdSets until the 2nd pass runs.
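To make the two-pass idea concrete, here is a minimal sketch in plain Java with no Lucene or OpenSearch dependency. Everything in it is an assumption for illustration: the class name `TwoPassSubAggSketch`, the method names, and the use of a timestamp-sorted array with binary search as a stand-in for the real index structure. It is not how `CompositeAggregator` or the filter rewrite code is implemented. The 1st pass finds each bucket's doc IDs through range lookups rather than per-document bucketing, and the 2nd pass replays a simple sum sub-aggregation over the saved doc IDs.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

public class TwoPassSubAggSketch {

    // 1st pass: resolve the doc IDs of every date histogram bucket.
    // The timestamp-sorted array is a hypothetical stand-in for the index
    // structure; bucket boundaries are located with binary search.
    static TreeMap<Long, int[]> collectDocIdsPerBucket(long[] timestamps, long interval) {
        Integer[] byTime = new Integer[timestamps.length];
        for (int i = 0; i < byTime.length; i++) {
            byTime[i] = i;
        }
        Arrays.sort(byTime, Comparator.comparingLong(d -> timestamps[d]));

        TreeMap<Long, int[]> buckets = new TreeMap<>();
        int from = 0;
        while (from < byTime.length) {
            long key = Math.floorDiv(timestamps[byTime[from]], interval) * interval;
            int to = lowerBound(byTime, timestamps, key + interval, from);
            int[] docs = new int[to - from];
            for (int i = from; i < to; i++) {
                docs[i - from] = byTime[i];
            }
            Arrays.sort(docs); // replay happens in doc ID order
            buckets.put(key, docs); // this saved set is the memory cost
            from = to;
        }
        return buckets;
    }

    // First position at or after lo whose timestamp is >= bound.
    static int lowerBound(Integer[] byTime, long[] timestamps, long bound, int lo) {
        int hi = byTime.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (timestamps[byTime[mid]] < bound) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // 2nd pass: run the sub-aggregation (a sum here) on each bucket's doc IDs.
    static TreeMap<Long, Double> replaySum(TreeMap<Long, int[]> buckets, double[] values) {
        TreeMap<Long, Double> sums = new TreeMap<>();
        for (Map.Entry<Long, int[]> entry : buckets.entrySet()) {
            double sum = 0;
            for (int doc : entry.getValue()) {
                sum += values[doc]; // same per-doc work as normal collection
            }
            sums.put(entry.getKey(), sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        long[] timestamps = {5, 12, 7, 25, 18, 3}; // indexed by doc ID
        double[] values = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};
        TreeMap<Long, int[]> buckets = collectDocIdsPerBucket(timestamps, 10);
        System.out.println(replaySum(buckets, values));
    }
}
```

In this sketch the per-document work in the 2nd pass is unchanged, which matches the expectation that sub-aggregation collection itself does not get faster; the saving is limited to how the 1st pass finds the docs for each bucket.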

In the end, we expect a performance improvement on these 2 operations from the big5 workload. Both operations have sub-aggregations.

Some other issues will also improve sub-aggregation performance, but they work from the indexing side: they compute special index structures to speed up sub-aggregations, whereas this approach focuses on query-time improvement.
#3734
#12498

finnegancarroll · 2024-06-18T20:10:08Z

Picking up this issue.

github-actions bot added the untriaged label Mar 11, 2024

bowenlan-amzn added Search:Aggregations Search:Performance and removed untriaged labels Mar 11, 2024

getsaurabh02 added the v2.15.0 Issues and PRs related to version 2.15.0 label May 28, 2024

getsaurabh02 added v2.16.0 Issues and PRs related to version 2.16.0 and removed v2.15.0 Issues and PRs related to version 2.15.0 labels Jun 6, 2024

This was referenced Jun 18, 2024

Supporting fast bucket aggregation on numeric multi field aggregation #11740

Open

[Profiling deep dive] Default aggregation vs. optimization code path #14438

Open

bowenlan-amzn assigned finnegancarroll Jun 18, 2024

mch2 added v2.17.0 and removed v2.16.0 Issues and PRs related to version 2.16.0 labels Jul 22, 2024

This was referenced Aug 3, 2024

Aggregation filter rewrite optimization follow up #15078

Closed

Filter rewrite sub agg support bowenlan-amzn/OpenSearch#2

Open

getsaurabh02 added the Roadmap:Search Project-wide roadmap label label Aug 12, 2024

finnegancarroll mentioned this issue Aug 14, 2024

Support sub aggregations on filter rewrite optimization #15253

Closed

3 tasks

peterzhuamazon mentioned this issue Aug 20, 2024

[RFC] Building a GitHub Automation App for OpenSearch GitHub Org opensearch-project/opensearch-build#4958

Open

getsaurabh02 added the v2.18.0 Issues and PRs related to version 2.18.0 label Sep 6, 2024

getsaurabh02 changed the title ~~Support sub aggregation in filter rewrite optimization~~ [Proposal] Support sub aggregation in filter rewrite optimization Sep 6, 2024

bowenlan-amzn removed v2.17.0 v2.18.0 Issues and PRs related to version 2.18.0 labels Oct 17, 2024
