Refactor the filter rewrite optimization #14464

bowenlan-amzn · 2024-06-19T22:51:39Z

Description

As more code comes into the filter rewrite optimization, it is becoming harder to understand.
This not only makes code review slow and painful, it will also slow down new contributors to this area. Hence this refactoring work.

Idea

The refactoring shouldn't change any business logic.
After the refactor, a reader can easily find all the important information just by reading the class docs and checking the public methods of each class.

Add only declarative code to the Aggregator, and keep the optimization business logic in the new package.
- The declarative code is the Context object, which has several public methods to invoke the optimization workflow, combined with a Bridge object that gives the optimization any access it needs to the data in the Aggregator.
- The necessary data can be passed into the methods of Bridge. Compared to saving the field in the Bridge class, this is more readable because the method name tells you directly where the field is actually needed.
- Besides providing access, Bridge can also host/hide the optimization business logic.
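A minimal sketch of the Context/Bridge split described above, for illustration only. The class and method names here (`BridgeSketch`, `HalvingBridge`, `buildRanges`, `tryOptimize`) are hypothetical and are not taken from the PR's code; the point is only that the aggregator talks to a small Context, while the Bridge receives the data it needs as method arguments.

```java
import java.util.ArrayList;
import java.util.List;

public class BridgeSketch {
    // Hypothetical Bridge: gives the optimization access to aggregator data
    // and can host aggregation-specific logic.
    abstract static class AggregatorBridge {
        // Decides whether the optimization applies at all.
        abstract boolean canOptimize();

        // Data is passed in as arguments where it is needed, not stored as a field.
        abstract long[][] buildRanges(long min, long max);
    }

    // Hypothetical concrete Bridge: splits [min, max) into two halves.
    static final class HalvingBridge extends AggregatorBridge {
        @Override
        boolean canOptimize() {
            return true;
        }

        @Override
        long[][] buildRanges(long min, long max) {
            long mid = min + (max - min) / 2;
            return new long[][] { { min, mid }, { mid, max } };
        }
    }

    // Hypothetical Context: the only object the aggregator needs to call.
    static final class OptimizationContext {
        private final AggregatorBridge bridge;
        private final boolean optimizable;

        OptimizationContext(AggregatorBridge bridge) {
            this.bridge = bridge;
            this.optimizable = bridge.canOptimize();
        }

        // Returns true when the optimized path handled the segment,
        // false when the caller should fall back to normal collection.
        boolean tryOptimize(long segmentMin, long segmentMax, List<long[]> out) {
            if (!optimizable) {
                return false;
            }
            for (long[] range : bridge.buildRanges(segmentMin, segmentMax)) {
                out.add(range);
            }
            return true;
        }
    }

    public static void main(String[] args) {
        OptimizationContext context = new OptimizationContext(new HalvingBridge());
        List<long[]> ranges = new ArrayList<>();
        boolean optimized = context.tryOptimize(0, 100, ranges);
        System.out.println(optimized + " " + ranges.size());
    }
}
```

In this sketch the aggregator side stays declarative: it constructs the Context with a Bridge and calls `tryOptimize`, falling back to its normal path when that returns false.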

Refactoring

Split the old, huge Helper class into independent components.
Tighten the member access modifiers of the components, leaving only the important methods public.
Remove unnecessary references from the components. For example, instead of passing SearchContext into the OptimizationContext, use the functions in AggregatorBridge to provide it whenever needed.

Why the name — `filter rewrite optimization`?

Filter in the OpenSearch world has a similar meaning to query, but it indicates that no relevance score is calculated.
Rewrite in the OpenSearch world can mean transforming an OpenSearch query into a Lucene query, or transforming a query so that it performs better.

Generally speaking, the optimization rewrites the aggregation into certain filters to improve performance. Aggregation execution is a plain and simple iteration and collection over all matches, while filters can take advantage of the Lucene index to get the expected results in logarithmic or even constant time.
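To make the cost difference concrete, here is a small sketch that counts the values falling in a range in two ways. It is an analogy, not the PR's code: Lucene indexes numeric points in a BKD tree, whereas this sketch assumes a plain sorted array as a stand-in for the index, and all names in it are hypothetical.

```java
import java.util.Arrays;

public class RangeCountSketch {
    // Aggregation style: visit every value and test it against the range. O(n).
    static int countByIteration(long[] values, long lo, long hi) {
        int count = 0;
        for (long v : values) {
            if (v >= lo && v < hi) {
                count++;
            }
        }
        return count;
    }

    // Filter style: with a sorted index, two binary searches bound the range. O(log n).
    // Assumes lo <= hi and that the array is sorted ascending.
    static int countBySortedIndex(long[] sorted, long lo, long hi) {
        return lowerBound(sorted, hi) - lowerBound(sorted, lo);
    }

    // Returns the first index whose value is >= key (sorted.length if none).
    static int lowerBound(long[] sorted, long key) {
        int left = 0;
        int right = sorted.length;
        while (left < right) {
            int mid = (left + right) >>> 1;
            if (sorted[mid] < key) {
                left = mid + 1;
            } else {
                right = mid;
            }
        }
        return left;
    }

    public static void main(String[] args) {
        long[] values = { 42, 7, 19, 88, 19, 63, 5 };
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        // Both approaches count the values in [10, 70).
        System.out.println(countByIteration(values, 10, 70));
        System.out.println(countBySortedIndex(sorted, 10, 70));
    }
}
```

Both methods return the same count for the half-open range; only the amount of work differs as the number of values grows.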

Benchmark

Benchmarks are triggered from the PR with the new tool; see #14464 (comment).

Related Issues

Resolves #14435

Check List

~~[ ] Functionality includes testing.~~
~~[ ] API changes companion pull request created, if applicable.~~
~~[ ] Public documentation issue/PR created, if applicable.~~

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Split the single Helper classes and move the classes into a new package for any optimization we introduced for search path. Rename the class name to make it more straightforward and general Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>

refactor the canOptimize logic sort out the basic rule about how to provide data from aggregator, and where to put common logic Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>

refactor the data provider and try optimize logic Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>

github-actions · 2024-06-20T00:00:11Z

❌ Gradle check result for 1a067ba: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>

extract segment match all logic Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>

github-actions · 2024-06-20T19:51:24Z

✅ Gradle check result for 7c491b9: SUCCESS

codecov · 2024-06-20T19:54:33Z

Codecov Report

Attention: Patch coverage is 85.25799% with 60 lines in your changes missing coverage. Please review.

Project coverage is 71.15%. Comparing base (97c1bf0) to head (86cacab).
Report is 3 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| ...ilterrewrite/FilterRewriteOptimizationContext.java | 82.60% | 9 Missing and 3 partials ⚠️ |
| ...arch/aggregations/bucket/filterrewrite/Helper.java | 85.00% | 4 Missing and 8 partials ⚠️ |
| ...tions/bucket/filterrewrite/PointTreeTraversal.java | 88.15% | 5 Missing and 4 partials ⚠️ |
| ...t/filterrewrite/DateHistogramAggregatorBridge.java | 87.03% | 1 Missing and 6 partials ⚠️ |
| ...ns/bucket/filterrewrite/RangeAggregatorBridge.java | 80.00% | 2 Missing and 5 partials ⚠️ |
| ...arch/aggregations/bucket/filterrewrite/Ranges.java | 75.00% | 2 Missing and 3 partials ⚠️ |
| ...egations/bucket/composite/CompositeAggregator.java | 86.95% | 2 Missing and 1 partial ⚠️ |
| ...ucket/filterrewrite/CompositeAggregatorBridge.java | 77.77% | 0 Missing and 2 partials ⚠️ |
| .../bucket/histogram/AutoDateHistogramAggregator.java | 94.73% | 1 Missing ⚠️ |
| ...ions/bucket/histogram/DateHistogramAggregator.java | 90.90% | 1 Missing ⚠️ |

... and 1 more

Additional details and impacted files

@@             Coverage Diff              @@
##               main   #14464      +/-   ##
============================================
- Coverage     71.74%   71.15%   -0.59%     
+ Complexity    62904    62235     -669     
============================================
  Files          5178     5185       +7     
  Lines        295167   295146      -21     
  Branches      42679    42660      -19     
============================================
- Hits         211774   210020    -1754     
- Misses        66011    67800    +1789     
+ Partials      17382    17326      -56


Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>

inline class Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>

...rc/main/java/org/opensearch/search/aggregations/bucket/filterrewrite/PointTreeTraversal.java

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>

github-actions · 2024-08-07T18:37:08Z

✅ Gradle check result for 9040f6f: SUCCESS

...rg/opensearch/search/aggregations/bucket/filterrewrite/FilterRewriteOptimizationContext.java

.../src/main/java/org/opensearch/search/aggregations/bucket/filterrewrite/AggregatorBridge.java

...rg/opensearch/search/aggregations/bucket/filterrewrite/FilterRewriteOptimizationContext.java

...r/src/main/java/org/opensearch/search/aggregations/bucket/composite/CompositeAggregator.java

github-actions · 2024-08-07T19:19:22Z

✅ Gradle check result for e896927: SUCCESS

- remove map of segment ranges, pass in by calling getRanges when needed - use AtomicInteger for the debug info Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
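As an illustration of why an `AtomicInteger` suits debug counters when segments may be processed by several threads, here is a minimal sketch. The `DebugInfo` class and its `recordLeaf` method are hypothetical names, not the PR's code; the sketch only shows that increments from a thread pool are not lost.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class DebugCounterSketch {
    // Hypothetical holder for debug information shared across segment threads.
    static final class DebugInfo {
        private final AtomicInteger leafVisited = new AtomicInteger();

        void recordLeaf() {
            leafVisited.incrementAndGet(); // atomic, so concurrent updates are not lost
        }

        int leafVisited() {
            return leafVisited.get();
        }
    }

    // Simulates one task per segment, each recording a fixed number of visits.
    static int countAcrossSegments(int segments, int leavesPerSegment) {
        DebugInfo debug = new DebugInfo();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int s = 0; s < segments; s++) {
            pool.submit(() -> {
                for (int i = 0; i < leavesPerSegment; i++) {
                    debug.recordLeaf();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("segment tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return debug.leafVisited();
    }

    public static void main(String[] args) {
        System.out.println(countAcrossSegments(8, 1000));
    }
}
```

With a plain `int` field the same loop could drop increments under contention; the atomic counter keeps the total exact.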

github-actions · 2024-08-08T01:00:50Z

✅ Gradle check result for 8962ee3: SUCCESS

github-actions · 2024-08-08T01:03:20Z

❕ Gradle check result for 86cacab: UNSTABLE

TEST FAILURES:

      2 org.opensearch.common.util.concurrent.QueueResizableOpenSearchThreadPoolExecutorTests.classMethod
      1 org.opensearch.common.util.concurrent.QueueResizableOpenSearchThreadPoolExecutorTests.testResizeQueueDown

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

mch2

Thanks for these changes, @bowenlan-amzn. I think this is much easier to follow than the original helper class. I think we can keep going with some cleanup, but my major concern regarding concurrent search appears resolved.

.../src/main/java/org/opensearch/search/aggregations/bucket/filterrewrite/AggregatorBridge.java

* Refactor: Split the single Helper classes and move the classes into a new package for any optimization we introduced for search path. Rename the class name to make it more straightforward and general
* Refactor: refactor the canOptimize logic, sort out the basic rule about how to provide data from aggregator, and where to put common logic
* Refactor: refactor the data provider and try optimize logic
* Refactor
* Refactor: extract segment match all logic
* Refactor
* Refactor: inline class
* Fix a bug
* address comment
* prepareFromSegment now doesn't return Ranges
* how it looks like when introduce interfaces
* remove interface, clean up
* improve doc
* move multirangetraversal logic to helper
* improve the refactor
  - package name -> filterrewrite
  - move tree traversal logic to new class
  - add documentation for important abstract methods
  - add sub class for composite aggregation bridge
* Address Marc's comments
* Address concurrent segment search concern: To save the ranges per segment, now change to a map that save ranges for segments separately. The increment document function "incrementBucketDocCount" should already be thread safe, as it's the same method used by normal aggregation execution path
* remove circular dependency
* Address comment
  - remove map of segment ranges, pass in by calling getRanges when needed
  - use AtomicInteger for the debug info

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
(cherry picked from commit 170ea27)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

sandeshkr419 · 2024-08-15T19:44:08Z

@mch2 @bowenlan-amzn We shouldn't skip changelog for these changes.


bowenlan-amzn added 3 commits June 19, 2024 15:50

Refactor

3f43898

Split the single Helper classes and move the classes into a new package for any optimization we introduced for search path. Rename the class name to make it more straightforward and general Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>

Refactor

7d9d57e

refactor the canOptimize logic sort out the basic rule about how to provide data from aggregator, and where to put common logic Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>

Refactor

1a067ba

refactor the data provider and try optimize logic Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>

bowenlan-amzn changed the title ~~Refactor~~ Refactor the filter rewrite optimization Jun 19, 2024

github-actions bot added the Search:Aggregations and v2.16.0 labels Jun 19, 2024

bowenlan-amzn added the skip-changelog label Jun 19, 2024

opensearch-ci-bot mentioned this pull request Jun 19, 2024

[AUTOCUT] Gradle Check Flaky Test Report for MinDocCountIT #14313

Closed

bowenlan-amzn added 2 commits June 19, 2024 23:24

Refactor

e8e9ad3

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>

Refactor

7c491b9

extract segment match all logic Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>

bowenlan-amzn added 2 commits June 20, 2024 21:20

Refactor

a158f78

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>

Refactor

8f10faf

inline class Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>

bowenlan-amzn marked this pull request as ready for review June 21, 2024 04:38

bowenlan-amzn requested review from anasalkouz, andrross, Bukhtawar, CEHENKLE, dblock, dbwiddis, gbbafna, kotwanikunal, mch2, msfroh, nknize, owaiskazi19, reta and Rishikesh1159 as code owners June 21, 2024 04:38

mch2 reviewed Aug 7, 2024

...rc/main/java/org/opensearch/search/aggregations/bucket/filterrewrite/PointTreeTraversal.java

remove circular dependency

e896927

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>

bowenlan-amzn force-pushed the 14435-refactor-range-agg-optimization branch from 9040f6f to e896927 Compare August 7, 2024 18:30

mch2 reviewed Aug 7, 2024

bowenlan-amzn added 2 commits August 7, 2024 17:05

Address comment

8962ee3

- remove map of segment ranges, pass in by calling getRanges when needed - use AtomicInteger for the debug info Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>

Merge branch 'main' into 14435-refactor-range-agg-optimization

86cacab

mch2 approved these changes Aug 9, 2024

.../src/main/java/org/opensearch/search/aggregations/bucket/filterrewrite/AggregatorBridge.java

mch2 added the backport 2.x label Aug 9, 2024

mch2 merged commit 170ea27 into opensearch-project:main Aug 9, 2024
39 checks passed

opensearch-trigger-bot bot mentioned this pull request Aug 9, 2024

[Backport 2.x] Refactor the filter rewrite optimization #15179

Merged

finnegancarroll mentioned this pull request Aug 14, 2024

Support sub aggregations on filter rewrite optimization #15253

Closed

3 tasks

bowenlan-amzn removed the v2.16.0 label Aug 17, 2024

opensearch-ci-bot mentioned this pull request Sep 6, 2024

[AUTOCUT] Gradle Check Flaky Test Report for MasterServiceTests #15809

Open

opensearch-ci-bot mentioned this pull request Sep 6, 2024

[AUTOCUT] Gradle Check Flaky Test Report for IndexServiceTests #14407

Open

opensearch-ci-bot mentioned this pull request Jul 3, 2024

[AUTOCUT] Gradle Check Flaky Test Report for VerifyVersionConstantsIT #14585

Open
