Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Logical Plan Optimizer rework #1872

Open
5 tasks
Yury-Fridlyand opened this issue Jul 15, 2023 · 0 comments
Open
5 tasks

[RFC] Logical Plan Optimizer rework #1872

Yury-Fridlyand opened this issue Jul 15, 2023 · 0 comments
Labels
design documentation Improvements or additions to documentation enhancement New feature or request

Comments

@Yury-Fridlyand
Copy link
Collaborator

Yury-Fridlyand commented Jul 15, 2023

Please, track first design draft and discussion in #1752

TL;DR

Current behavior:

def Optimize:
    for node in PlanTree: # Traverse the Logical Plan Tree
        for rule in rules: # Enumerate rules
            tryApplyRule()

New behavior:

def Optimize:
    for rule in rules: # Enumerate rules
        for node in PlanTree: # Traverse the Logical Plan Tree
            tryApplyRule()

No new features, all tests pass, nothing changed for the end-user.

Background

Currently each storage engine adds its own logical operator as concrete implementation for TableScanOperator abstraction. Typically each data source needs to add 2 logical operators for table scan with without aggregation. Take OpenSearch for example, there are OpenSearchLogicalIndexScan and OpenSearchLogicalIndexAgg and a bunch of pushdown optimization rules for each accordingly.

class LogicalPlanOptimizer:
  /*
   * OpenSearch rules include:
   *   PUSH_DOWN_PAGE_SIZE
   *   PUSH_DOWN_FILTER
   *   PUSH_DOWN_AGGREGATION
   *   PUSH_DOWN_SORT
   *   PUSH_DOWN_HIGHLIGHT
   *   PUSH_DOWN_NESTED
   *   PUSH_DOWN_PROJECT
   *   PUSH_DOWN_LIMIT
   *
   * that return *OpenSearchLogicalIndexAgg*
   *  or *OpenSearchLogicalIndexScan* finally
   */
  val rules: List<Rule>

  def optimize(plan: LogicalPlan):
    for rule in rules: # Enumerate rules
      for node in plan: # Traverse the Logical Plan Tree
        tryApplyRule()

Optimization Protocol

There are optimizaion guidelines which should be strictly followed to ensure that search query built completely matches user request.

1.

Optimizer should apply rules in the strict order they are defined. For example, PUSH_DOWN_LIMIT should be applied last.
Violation of that causes bugs, for example #1764, #1774, #1788.

Sample queries
  1. Pagination with LIMIT
{
    "query": "SELECT * from `calcs` limit 10",
    "fetch_size": 3
}

TODO ❗ expected behavior once LIMIT supported in pagination.

stateDiagram-v2
  state "Before" as Before {
    state "LogicalPaginate" as Paginate
    state "LogicalProject" as ProjectB
    state "LogicalLimit" as LimitB
    state "LogicalRelation" as Relation

    Paginate --> ProjectB
    ProjectB --> LimitB
    LimitB --> Relation
  }

  state "After" as After {
    state "LogicalProject" as ProjectA
    state "LogicalLimit" as LimitA
    state "TableScanBuilder" as TableScanBuilder

    ProjectA --> LimitA
    LimitA --> TableScanBuilder
  }
Loading
  1. NESTED with LIMIT
SELECT nested(message.*) from nested-type limit 4

TODO ❗ expected behavior once #1764 fixed - need to reorder tree nodes

stateDiagram-v2
  state "Before" as Before {
    state "LogicalProject" as ProjectB
    state "LogicalNested" as NestedB
    state "LogicalLimit" as LimitB
    state "LogicalRelation" as Relation

    ProjectB --> NestedB
    NestedB --> LimitB
    LimitB --> Relation
  }

  state "After" as After {
    state "LogicalProject" as ProjectA
    state "LogicalLimit" as LimitA
    state "LogicalNested" as NestedA
    state "TableScanBuilder" as TableScanBuilder

    ProjectA --> LimitA
    LimitA --> NestedA
    NestedA --> TableScanBuilder
  }
Loading

2.

Optimizer should be able to apply a rule matching Something even when plan tree has something in between of LogicalSomething and TableScanBuilder, unless exception specified. TableScanBuilder could be wrapped by another tree node, for example in join implementation (see #1623).

Sample queries
  1. PPL with SORT then LIMIT
source=calcs | sort - int0 | head 10 | fields int0;
stateDiagram-v2
  state "Before" as Before {
    state "LogicalProject" as ProjectB
    state "LogicalLimit" as Limit
    state "LogicalSort" as Sort
    state "LogicalRelation" as Relation

    ProjectB --> Limit
    Limit --> Sort
    Sort --> Relation
  }

  state "After" as After {
    state "LogicalProject" as ProjectA
    state "TableScanBuilder" as TableScanBuilder

    ProjectA --> TableScanBuilder
  }
Loading
  1. PPL with LIMIT then SORT
source=calcs | head 10 | sort - int0 | fields int0;
stateDiagram-v2
  state "Before" as Before {
    state "LogicalProject" as ProjectB
    state "LogicalSort" as Sort
    state "LogicalLimit" as Limit
    state "LogicalRelation" as Relation

    ProjectB --> Sort
    Sort --> Limit
    Limit --> Relation
  }

  state "After" as After {
    state "LogicalProject" as ProjectA
    state "TableScanBuilder" as TableScanBuilder

    ProjectA --> TableScanBuilder
  }
Loading
  1. An SQL query; likely plan trees of all SQL queries are always in the same order
select * from calcs where int0 > 0 order by int2 limit 10;
stateDiagram-v2
  state "Before" as Before {
    state "LogicalProject" as ProjectB
    state "LogicalLimit" as Limit
    state "LogicalSort" as Sort
    state "LogicalFilter" as Filter
    state "LogicalRelation" as Relation

    ProjectB --> Limit
    Limit --> Sort
    Sort --> Filter
    Filter --> Relation
  }

  state "After" as After {
    state "LogicalProject" as ProjectA
    state "TableScanBuilder" as TableScanBuilder

    ProjectA --> TableScanBuilder
  }
Loading

3.

A query might have multiple highlights backed by LogicalHighlight (and filters and sorts) - all of them should be pushed down.
A corresponding rule should be attempted multiple times.

Sample queries
  1. Multiple highlights
SELECT highlight(Title), highlight(Body, pre_tags='<mark style="background-color: green;">', post_tags='</mark>') FROM 
   beer.stackexchange WHERE multi_match([Title, Body], 'IPA') ORDER BY Id LIMIT 1;
stateDiagram-v2
  state "Before" as Before {
    state "LogicalProject" as ProjectB
    state "LogicalHighlight" as Highlight1
    state "LogicalHighlight" as Highlight2
    state "LogicalLimit" as Limit
    state "LogicalSort" as Sort
    state "LogicalFilter" as Filter
    state "LogicalRelation" as Relation

    ProjectB --> Highlight1
    Highlight1 --> Highlight2
    Highlight2 --> Limit
    Limit --> Sort
    Sort --> Filter
    Filter --> Relation
  }

  state "After" as After {
    state "LogicalProject" as ProjectA
    state "TableScanBuilder" as TableScanBuilder

    ProjectA --> TableScanBuilder
  }
Loading
  1. Multiple filters
source=account | where age > 30 | where age < 35 | fields age;
stateDiagram-v2
  state "Before" as Before {
    state "LogicalProject" as ProjectB
    state "LogicalFilter" as Filter1
    state "LogicalFilter" as Filter2
    state "LogicalRelation" as Relation

    ProjectB --> Filter1
    Filter1 --> Filter2
    Filter2 --> Relation
  }

  state "After" as After {
    state "LogicalProject" as ProjectA
    state "TableScanBuilder" as TableScanBuilder

    ProjectA --> TableScanBuilder
  }
Loading
  1. Multiple sorts
source=account | sort age | sort lastname | head 20;
stateDiagram-v2
  state "Before" as Before {
    state "LogicalProject" as ProjectB
    state "LogicalLimit" as Limit
    state "LogicalSort" as Sort1
    state "LogicalSort" as Sort2
    state "LogicalRelation" as Relation

    ProjectB --> Limit
    Limit --> Sort1
    Sort1 --> Sort2
    Sort2 --> Relation
  }

  state "After" as After {
    state "LogicalProject" as ProjectA
    state "TableScanBuilder" as TableScanBuilder

    ProjectA --> TableScanBuilder
  }
Loading

4.

Dislike 3 most of the rules should be applied once only.

TODO ❗ optimization doesn't work correctly in that cases, #917

Sample queries
  1. PPL
source=account | fields firstname, lastname | head 10;
stateDiagram-v2
  state "Before" as Before {
    state "LogicalProject" as Project1B
    state "LogicalLimit" as LimitB
    state "LogicalProject" as Project2B
    state "LogicalRelation" as Relation

    Project1B --> LimitB
    LimitB --> Project2B
    Project2B --> Relation
  }

  state "After" as After {
    state "LogicalProject" as Project1A
    state "LogicalLimit" as LimitA
    state "LogicalProject" as Project2A
    state "TableScanBuilder" as TableScanBuilder

    Project1A --> LimitA
    LimitA --> Project2A
    Project2A --> TableScanBuilder
  }
Loading

5.

PUSH_DOWN_PROJECT should not happen if there is a LogicalEval between LogicalProject and TableScanBuilder.

Sample queries
source=bank | eval f = abs(age) | fields f;
stateDiagram-v2
  state "Before" as Before {
    state "LogicalProject" as ProjectB
    state "LogicalEval" as EvalB
    state "LogicalRelation" as Relation

    ProjectB --> EvalB
    EvalB --> Relation
  }

  state "After" as After {
    state "LogicalProject" as ProjectA
    state "LogicalEval" as EvalA
    state "TableScanBuilder" as TableScanBuilder

    ProjectA --> EvalA
    EvalA --> TableScanBuilder
  }
Loading

6.

Similar to 5, LogicalWindow in the plan tree between LogicalProject and TableScanBuilder should block PUSH_DOWN_PROJECT operation.

Sample queries
SELECT avg(date0) OVER(PARTITION BY datetime1) from calcs;
stateDiagram-v2
  state "Before" as Before {
    state "LogicalProject" as ProjectB
    state "LogicalWindow" as WindowB
    state "LogicalSort" as Sort
    state "LogicalRelation" as Relation

    ProjectB --> WindowB
    WindowB --> Sort
    Sort --> Relation
  }

  state "After" as After {
    state "LogicalProject" as ProjectA
    state "LogicalWindow" as WindowA
    state "TableScanBuilder" as TableScanBuilder

    ProjectA --> WindowA
    WindowA --> TableScanBuilder
  }
Loading

7.

Some push down operations could be rejected (e.g. pushDownWhatever returns false), so corresponding LogicalSomething node remains in the tree. Avoid infinite re-trying to apply a rule for that node.

Sample queries
  1. OpenSearchIndexScanAggregationBuilder rejects pushDownFilter
SELECT SUM(1) AS `cnt_` FROM calcs HAVING COUNT(2) > 1;
stateDiagram-v2
  state "Before" as Before {
    state "LogicalProject" as ProjectB
    state "LogicalFilter" as FilterB
    state "LogicalAggregation" as Aggregation
    state "LogicalRelation" as Relation

    ProjectB --> FilterB
    FilterB --> Aggregation
    Aggregation --> Relation
  }

  state "After" as After {
    state "LogicalProject" as ProjectA
    state "LogicalFilter" as FilterA
    state "TableScanBuilder" as TableScanBuilder

    ProjectA --> FilterA
    FilterA --> TableScanBuilder
  }
Loading
  1. OpenSearchIndexScanAggregationBuilder rejects pushDownSort and pushDownLimit
SELECT COUNT(*) FROM account GROUP BY age ORDER BY COUNT(*) LIMIT 5;
stateDiagram-v2
  state "Before" as Before {
    state "LogicalProject" as ProjectB
    state "LogicalLimit" as LimitB
    state "LogicalSort" as SortB
    state "LogicalAggregation" as Aggregation
    state "LogicalRelation" as Relation

    ProjectB --> LimitB
    LimitB --> SortB
    SortB --> Aggregation
    Aggregation --> Relation
  }

  state "After" as After {
    state "LogicalProject" as ProjectA
    state "LogicalLimit" as LimitA
    state "LogicalSort" as SortA
    state "TableScanBuilder" as TableScanBuilder

    ProjectA --> LimitA
    LimitA --> SortA
    SortA --> TableScanBuilder
  }
Loading

8.

PUSH_DOWN_LIMIT should be blocked if there is a LogicalSort or LogicalFilter between LogicalLimit and TableScanBuilder.

Sample queries
  1. PUSH_DOWN_SORT can't be performed due to implementation restrictions (#1471), so PUSH_DOWN_LIMIT shouldn't be performed too.
SELECT CAST(balance AS FLOAT) AS jdbc_float_alias FROM account ORDER BY jdbc_float_alias LIMIT 1;
stateDiagram-v2
  state "Before" as Before {
    state "LogicalProject" as ProjectB
    state "LogicalLimit" as LimitB
    state "LogicalSort" as SortB
    state "LogicalRelation" as Relation

    ProjectB --> LimitB
    LimitB --> SortB
    SortB --> Relation
  }

  state "After" as After {
    state "LogicalProject" as ProjectA
    state "LogicalLimit" as LimitA
    state "LogicalSort" as SortA
    state "TableScanBuilder" as TableScanBuilder

    ProjectA --> LimitA
    LimitA --> SortA
    SortA --> TableScanBuilder
  }
Loading
  1. OpenSearchIndexScanAggregationBuilder rejects pushDownFilter, so LogicalFilter remains in the tree and PUSH_DOWN_LIMIT shouldn't be performed.
SELECT gender from account GROUP BY gender HAVING count(*) > 5 LIMIT 1;
stateDiagram-v2
  state "Before" as Before {
    state "LogicalProject" as ProjectB
    state "LogicalLimit" as LimitB
    state "LogicalFilter" as FilterB
    state "LogicalAggregation" as Aggregation
    state "LogicalRelation" as Relation

    ProjectB --> LimitB
    LimitB --> FilterB
    FilterB --> Aggregation
    Aggregation --> Relation
  }

  state "After" as After {
    state "LogicalProject" as ProjectA
    state "LogicalLimit" as LimitA
    state "LogicalFilter" as FilterA
    state "TableScanBuilder" as TableScanBuilder

    ProjectA --> LimitA
    LimitA --> FilterA
    FilterA --> TableScanBuilder
  }
Loading

9.

PUSH_DOWN_SORT and PUSH_DOWN_FILTER should be after PUSH_DOWN_AGGREGATION if LogicalSort or LogicalFilter are on top of LogicalAggregation, see rules in OpenSearchIndexScanAggregationBuilder.

Sample queries
  1. SQL: LogicalFilter is on top of LogicalAggregation
SELECT gender from account GROUP BY gender HAVING count(*) > 500;
stateDiagram-v2
  state "Before" as Before {
    state "LogicalProject" as ProjectB
    state "LogicalFilter" as FilterB
    state "LogicalAggregation" as Aggregation
    state "LogicalRelation" as Relation

    ProjectB --> FilterB
    FilterB --> Aggregation
    Aggregation --> Relation
  }

  state "After" as After {
    state "LogicalProject" as ProjectA
    state "LogicalFilter" as FilterA
    state "TableScanBuilder" as TableScanBuilder

    ProjectA --> FilterA
    FilterA --> TableScanBuilder
  }
Loading
  1. PPL: LogicalFilter is on top of LogicalAggregation
source=account | stats sum(balance) as a by state | where a > 780000;
stateDiagram-v2
  state "Before" as Before {
    state "LogicalProject" as ProjectB
    state "LogicalFilter" as FilterB
    state "LogicalAggregation" as Aggregation
    state "LogicalRelation" as Relation

    ProjectB --> FilterB
    FilterB --> Aggregation
    Aggregation --> Relation
  }

  state "After" as After {
    state "LogicalProject" as ProjectA
    state "LogicalFilter" as FilterA
    state "TableScanBuilder" as TableScanBuilder

    ProjectA --> FilterA
    FilterA --> TableScanBuilder
  }
Loading

10.

PUSH_DOWN_FILTER should be before PUSH_DOWN_AGGREGATION if LogicalFilteris under of LogicalAggregation.

Sample queries
  1. LogicalFilter is under LogicalAggregation
SELECT gender from account WHERE age > 20 GROUP BY gender;
stateDiagram-v2
  state "Before" as Before {
    state "LogicalProject" as ProjectB
    state "LogicalAggregation" as Aggregation
    state "LogicalFilter" as Filter
    state "LogicalRelation" as Relation

    ProjectB --> Aggregation
    Aggregation --> Filter
    Filter --> Relation
  }

  state "After" as After {
    state "LogicalProject" as ProjectA
    state "TableScanBuilder" as TableScanBuilder

    ProjectA --> TableScanBuilder
  }
Loading

11.

As combination of 9 and 10, PUSH_DOWN_FILTER should be attempted before and after PUSH_DOWN_AGGREGATION.

Sample queries
  1. LogicalAggregation surrounded by two LogicalFilters
SELECT gender from account WHERE age > 20 GROUP BY gender HAVING count(*) > 80;
stateDiagram-v2
  state "Before" as Before {
    state "LogicalProject" as ProjectB
    state "LogicalFilter" as Filter1B
    state "LogicalAggregation" as Aggregation
    state "LogicalFilter" as Filter2B
    state "LogicalRelation" as Relation

    ProjectB --> Filter1B
    Filter1B --> Aggregation
    Aggregation --> Filter2B
    Filter2B --> Relation
  }

  state "After" as After {
    state "LogicalProject" as ProjectA
    state "LogicalFilter" as FilterA
    state "TableScanBuilder" as TableScanBuilder

    ProjectA --> FilterA
    FilterA --> TableScanBuilder
  }
Loading

12.

Subqueries: don’t PUSH_DOWN_AGGREGATION, PUSH_DOWN_LIMIT, PUSH_DOWN_FILTER, PUSH_DOWN_SORT for the outer query, only for inner one (under most bottom LogicalProject).

Sample queries
  1. Aggregation
SELECT COUNT(*) FILTER(WHERE age > 35) FROM (SELECT * FROM bank) as a;
stateDiagram-v2
  state "Before" as Before {
    state "LogicalProject" as Project1B
    state "LogicalAggregation" as AggergationB
    state "LogicalProject" as Project2B
    state "LogicalRelation" as Relation

    Project1B --> AggergationB
    AggergationB --> Project2B
    Project2B --> Relation
  }

  state "After" as After {
    state "LogicalProject" as Project1A
    state "LogicalAggregation" as AggergationA
    state "LogicalProject" as Project2A
    state "TableScanBuilder" as TableScanBuilder

    Project1A --> AggergationA
    AggergationA --> Project2A
    Project2A --> TableScanBuilder
  }
Loading
  1. Filter
SELECT origin FROM (SELECT Origin AS origin, AvgTicketPrice AS price FROM flights) AS f WHERE f.price > 100;
stateDiagram-v2
  state "Before" as Before {
    state "LogicalProject" as Project1B
    state "LogicalFilter" as FilterB
    state "LogicalProject" as Project2B
    state "LogicalRelation" as Relation

    Project1B --> FilterB
    FilterB --> Project2B
    Project2B --> Relation
  }

  state "After" as After {
    state "LogicalProject" as Project1A
    state "LogicalFilter" as FilterA
    state "LogicalProject" as Project2A
    state "TableScanBuilder" as TableScanBuilder

    Project1A --> FilterA
    FilterA --> Project2A
    Project2A --> TableScanBuilder
  }
Loading
  1. Sort
SELECT origin FROM (SELECT Origin AS origin, AvgTicketPrice AS price FROM flights) AS f ORDER BY f.price;
stateDiagram-v2
  state "Before" as Before {
    state "LogicalProject" as Project1B
    state "LogicalSort" as SortB
    state "LogicalProject" as Project2B
    state "LogicalRelation" as Relation

    Project1B --> SortB
    SortB --> Project2B
    Project2B --> Relation
  }

  state "After" as After {
    state "LogicalProject" as Project1A
    state "LogicalSort" as SortA
    state "LogicalProject" as Project2A
    state "TableScanBuilder" as TableScanBuilder

    Project1A --> SortA
    SortA --> Project2A
    Project2A --> TableScanBuilder
  }
Loading
  1. Sort and Aggregation
SELECT Origin, MIN(AvgTicketPrice) FROM (SELECT * FROM flights) AS flights GROUP BY Origin ORDER BY MAX(AvgTicketPrice)
stateDiagram-v2
  state "Before" as Before {
    state "LogicalProject" as Project1B
    state "LogicalSort" as SortB
    state "LogicalAggregation" as AggergationB
    state "LogicalProject" as Project2B
    state "LogicalRelation" as Relation

    Project1B --> SortB
    SortB --> AggergationB
    AggergationB --> Project2B
    Project2B --> Relation
  }

  state "After" as After {
    state "LogicalProject" as Project1A
    state "LogicalSort" as SortA
    state "LogicalAggregation" as AggergationA
    state "LogicalProject" as Project2A
    state "TableScanBuilder" as TableScanBuilder

    Project1A --> SortA
    SortA --> AggergationA
    AggergationA --> Project2A
    Project2A --> TableScanBuilder
  }
Loading
  1. Sort and Limit
SELECT price FROM (SELECT AvgTicketPrice AS price FROM flights LIMIT 10) AS flights ORDER BY price LIMIT 5, 5;
stateDiagram-v2
  state "Before" as Before {
    state "LogicalProject" as Project1B
    state "LogicalLimit" as Limit1B
    state "LogicalSort" as SortB
    state "LogicalProject" as Project2B
    state "LogicalLimit" as Limit2B
    state "LogicalRelation" as Relation

    Project1B --> Limit1B
    Limit1B --> SortB
    SortB --> Project2B
    Project2B --> Limit2B
    Limit2B --> Relation
  }

  state "After" as After {
    state "LogicalProject" as Project1A
    state "LogicalLimit" as LimitA
    state "LogicalSort" as SortA
    state "LogicalProject" as Project2A
    state "TableScanBuilder" as TableScanBuilder

    Project1A --> LimitA
    LimitA --> SortA
    SortA --> Project2A
    Project2A --> TableScanBuilder
  }
Loading

TODO add IT
TODO nested, pagination, highlight, window

13.

Push down absolutely identical tree nodes (LogicalSort, LogicalFilter).

Sample queries
  1. Sort
SELECT FlightDelayMin, AvgTicketPrice, STDDEV_SAMP(AvgTicketPrice) OVER (ORDER BY FlightDelayMin) AS num FROM flights ORDER BY FlightDelayMin;
stateDiagram-v2
  state "Before" as Before {
    state "LogicalProject" as ProjectB
    state "LogicalWindow" as WindowB
    state "LogicalSort" as Sort1
    state "LogicalSort" as Sort2
    state "LogicalRelation" as Relation

    ProjectB --> WindowB
    WindowB --> Sort1
    Sort1 --> Sort2
    Sort2 --> Relation
  }

  state "After" as After {
    state "LogicalProject" as ProjectA
    state "LogicalWindow" as WindowA
    state "TableScanBuilder" as TableScanBuilder

    ProjectA --> WindowA
    WindowA --> TableScanBuilder
  }
Loading
  1. Filter
source=account | where age > 38 | where age > 38 | fields firstname, age;
stateDiagram-v2
  state "Before" as Before {
    state "LogicalProject" as ProjectB
    state "LogicalFilter" as Filter1
    state "LogicalFilter" as Filter2
    state "LogicalRelation" as Relation

    ProjectB --> Filter1
    Filter1 --> Filter2
    Filter2 --> Relation
  }

  state "After" as After {
    state "LogicalProject" as ProjectA
    state "TableScanBuilder" as TableScanBuilder

    ProjectA --> TableScanBuilder
  }
Loading

TBD merge them (remove duplicates) or push down (current behavior - push down)?
TODO add ITs (includehighlight)

14.

Optimize subqueries for SQL and complex queries in PPL. ❗ TODO not implemented.

Sample queries
  1. Example from 4 - not optimized:
source=account | fields firstname, lastname | head 10;
stateDiagram-v2
  state "Before" as Before {
    state "LogicalProject" as Project1B
    state "LogicalLimit" as LimitB
    state "LogicalProject" as Project2B
    state "LogicalRelation" as Relation

    Project1B --> LimitB
    LimitB --> Project2B
    Project2B --> Relation
  }

  state "After" as After {
    state "LogicalProject" as Project1A
    state "LogicalLimit" as LimitA
    state "LogicalProject" as Project2A
    state "TableScanBuilder" as TableScanBuilder

    Project1A --> LimitA
    LimitA --> Project2A
    Project2A --> TableScanBuilder
  }
Loading
  1. Relevance search with subquery:
SELECT *, highlight(origin), _score
FROM (SELECT Origin AS origin, AvgTicketPrice AS price FROM flights WHERE AvgTicketPrice > 100) AS f
WHERE score(origin = match_query('Base'));

This query fails because LogicalFilter from outer query wasn’t pushed down, so V2 tried to do apply relevance search in memory.

UnsupportedOperationException: OpenSearch defined function [match_query] is only supported in WHERE and HAVING clause.
stateDiagram-v2
  state "Before" as Before {
    state "LogicalProject" as Project1B
    state "LogicalHighlight" as Highlight
    state "LogicalFilter" as Filter1B
    state "LogicalProject" as Project2B
    state "LogicalFilter" as Filter2B
    state "LogicalRelation" as Relation

    Project1B --> Highlight
    Highlight --> Filter1B
    Filter1B --> Project2B
    Project2B --> Filter2B
    Filter2B --> Relation
  }

  state "After" as After {
    state "LogicalProject" as Project1A
    state "LogicalFilter" as FilterA
    state "LogicalProject" as Project2A
    state "TableScanBuilder" as TableScanBuilder

    Project1A --> FilterA
    FilterA --> Project2A
    Project2A --> TableScanBuilder
  }
Loading
  1. Complex PPL query returns incorrect results
source=account | where age > 30 | head 1000 | sort +balance | where age < 40 | head 100 | sort -balance | where balance > 10000 | fields age;
stateDiagram-v2
  state "Before" as Before {
    state "LogicalProject" as ProjectB
    state "LogicalFilter" as Filter1
    state "LogicalSort" as Sort1
    state "LogicalLimit" as Limit1
    state "LogicalFilter" as Filter2
    state "LogicalSort" as Sort2
    state "LogicalLimit" as Limit2
    state "LogicalFilter" as Filter3
    state "LogicalRelation" as Relation

    ProjectB --> Filter1
    Filter1 --> Sort1
    Sort1 --> Limit1
    Limit1 --> Filter2
    Filter2 --> Sort2
    Sort2 --> Limit2
    Limit2 --> Filter3
    Filter3 --> Relation
  }

  state "After" as After {
    state "LogicalProject" as ProjectA
    state "LogicalLimit" as LimitA
    state "TableScanBuilder" as TableScanBuilder

    ProjectA --> LimitA
    LimitA --> TableScanBuilder
  }
Loading

Optimizer rule update

To satisfy requirement listed in 2, new format rule format was created. See example for PUSH_DOWN_FILTER below; ... matches any amount of Logical Plan Tree nodes of any types, except LogicalFilter.

stateDiagram-v2
  state "Old PUSH_DOWN_FILTER implementation" as OldFilter {
    state "LogicalFilter" as FilterOld
    state "ScanBuilder" as RelationOld

    FilterOld --> RelationOld
  }

  state "New PUSH_DOWN_FILTER implementation" as NewFilter {
    state "LogicalFilter" as FilterNew
    state "..." as dots
    state "ScanBuilder" as RelationNew

    FilterNew --> dots
    dots --> RelationNew
  }
Loading

This new format was applied to PUSH_DOWN_PAGE_SIZE, PUSH_DOWN_FILTER, PUSH_DOWN_AGGREGATION, PUSH_DOWN_FILTER, PUSH_DOWN_SORT, PUSH_DOWN_HIGHLIGHT, PUSH_DOWN_NESTED, PUSH_DOWN_PROJECT and PUSH_DOWN_LIMIT.
CreateTableScanBuilder, CreateTableWriteBuilder and Prometheum related rules are not changed.

stateDiagram-v2
  state "CreateTableScanBuilder" as Builder {
    state "LogicalRelation" as Scan
  }
Loading

PushDownRule class used to build PUSH_DOWN_* rules. The class architecture follows:

classDiagram
  class PushDownRule~T~ {
    -Class~T~ clazz
    -BiFunction pushDownFunction
    -List~Function~ exceptions
    -boolean canBeAppliedMultipleTimes*
    -getDefaultException() Function
    +PushDownRule(Class~T~, boolean, BiFunction)
    +PushDownRule(Class~T~, boolean, BiFunction, Function)
    +pattern()* Pattern~T~
    +apply(T, Captures)* LogicalPlan
    -findTableScanBuilder(LogicalPlan) Optional~TableScanBuilder~
  }
Loading

The following rule configurations are created:

Rule Tree Node Type Can be applied multiple times Push Down function Exception
PUSH_DOWN_FILTER LogicalFilter true pushDownFilter LogicalAggregation, LogicalProject
PUSH_DOWN_AGGREGATION LogicalAggregation false pushDownAggregation LogicalProject
PUSH_DOWN_SORT LogicalSort true pushDownSort LogicalProject
PUSH_DOWN_LIMIT LogicalLimit false pushDownLimit LogicalSort, LogicalFilter, LogicalProject
PUSH_DOWN_PROJECT LogicalProject false pushDownProject LogicalEval, LogicalWindow
PUSH_DOWN_HIGHLIGHT LogicalHighlight true pushDownHighlight none
PUSH_DOWN_NESTED LogicalNested false pushDownNested none
PUSH_DOWN_PAGE_SIZE LogicalPaginate false pushDownPageSize none

Note that default exception is applied to all rules.

Rules are defined, checked and applied in the following order:

PUSH_DOWN_PAGE_SIZE
PUSH_DOWN_FILTER
PUSH_DOWN_AGGREGATION
PUSH_DOWN_FILTER
PUSH_DOWN_SORT
PUSH_DOWN_HIGHLIGHT
PUSH_DOWN_NESTED
PUSH_DOWN_PROJECT
PUSH_DOWN_LIMIT

To satisfy requirement 11, PUSH_DOWN_FILTER is listed twice.

Code

Please, see code MVP PoC in Bit-Quill:dev-optimizer-rework branch: https://github.com/opensearch-project/sql/compare/main..Bit-Quill:opensearch-project-sql:dev-optimizer-rework?expand=1

This code passes all IT and contains only Optimizer rework. There're no new features, so things marked as not implemented are still not implemented.

References

Current optimizer doc: https://github.com/opensearch-project/sql/blob/main/docs/dev/query-optimizer-improvement.md
Last optimizer update was done in #1091

Purpose

Proposed changes will unblock other fixes and features:

@Yury-Fridlyand Yury-Fridlyand added documentation Improvements or additions to documentation enhancement New feature or request design labels Jul 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design documentation Improvements or additions to documentation enhancement New feature or request
Projects
Status: Planned work items
Development

No branches or pull requests

1 participant