planner/core: enhance the rule of group pruning #9431
acf5bc9 to 17ea4ef
Codecov Report
@@ Coverage Diff @@
## master #9431 +/- ##
===============================================
- Coverage 79.9286% 79.8707% -0.058%
===============================================
Files 460 460
Lines 102878 102582 -296
===============================================
- Hits 82229 81933 -296
- Misses 14688 14709 +21
+ Partials 5961 5940 -21
// tryToEliminateAggregation will eliminate aggregation grouped by unique key.
// nestedAggPattern stores nested LogicalAggregations, so they can be accessed easily
type nestedAggPattern struct {
	outter *LogicalAggregation
outer
Fixed. I checked for some other spelling typos as well, thanks.
@@ -31,11 +31,20 @@ type aggregationEliminator struct {
type aggregationEliminateChecker struct {
}

// tryToEliminateAggregation will eliminate aggregation grouped by unique key.
// nestedAggPattern stores nested LogicalAggregations, so they can be accessed easily
Add `.` at the end of sentences.
done
	outter *LogicalAggregation
	proj *LogicalProjection
	inner *LogicalAggregation
	// isTrivial indicates if there's constraints(like LogicalSelection/LogicalLimit) "between" them
Ditto.
done
}

// genColMaps generates exprMap(`col -> definition expr`) and aggMap(`col -> definition aggFunc`) from column definitions
func genColMaps(ctx sessionctx.Context, ptn *nestedAggPattern) (map[string]expression.Expression, map[string]*aggregation.AggFuncDesc) {
`ctx` is not used.
done
func genColMaps(ctx sessionctx.Context, ptn *nestedAggPattern) (map[string]expression.Expression, map[string]*aggregation.AggFuncDesc) {
	exprMap := make(map[string]expression.Expression, len(ptn.proj.Schema().Columns))
	for _, col := range ptn.proj.Schema().Columns {
		idx := ptn.proj.Schema().ColumnIndex(col)
Isn't this just the offset in the array?
refined
var newExprs []expression.Expression
replaced := false
for idx, expr := range exprs {
	if e, ok := exprMap[string(expr.HashCode(nil))]; ok {
Why pass nil? Will it cause a panic?
Fixed, I added a test case as well.
var newExprs []expression.Expression
replaced := false
for idx, expr := range exprs {
	if fun, ok := aggMap[string(expr.HashCode(nil))]; ok {
Ditto?
fixed
17ea4ef to 387ccd9
Hi @lamxTyler, addressed, thanks for the comments!
69659ab to ee61818
PTAL @eurekaka, thanks!
	outer *LogicalAggregation
	proj *LogicalProjection
	inner *LogicalAggregation
	// isTrivial indicates if there's constraints(like LogicalSelection/LogicalLimit) "between" them.
Suggested change:
- // isTrivial indicates if there's constraints(like LogicalSelection/LogicalLimit) "between" them.
+ // isTrivial indicates if there are operators(like LogicalSelection/LogicalLimit) between 2 aggs.
Addressed
@@ -136,6 +145,114 @@ func (a *aggregationEliminateChecker) wrapCastFunction(ctx sessionctx.Context, a
	return expression.BuildCastFunction(ctx, arg, targetTp)
}

// tryToEliminateAggregationByMapping tries to eliminate an aggregation from nested aggregations.
Actually we are eliminating the nested aggregation? Please make this comment clearer.
Addressed
// can be rewritten as
// `select a + b as p, count(d) as `max(dt)` from t group by a + b`
//
// 2. Nested aggregations in which outer group-by items are proper subset of the inner ones, for example:
Suggested change:
- // 2. Nested aggregations in which outer group-by items are proper subset of the inner ones, for example:
+ // 2. Group by items of outer aggregation are super set of nested aggregation, for example:
Addressed, thanks
// can be rewritten as
// `select a as at, max(b) as `max(bt)` from t group by a`
//
// In order to apply the optimization, we tries to map definition of outer aggregation to schema of inner aggregation and check if rules are met.
Suggested change:
- // In order to apply the optimization, we tries to map definition of outer aggregation to schema of inner aggregation and check if rules are met.
+ // We map definition of outer aggregation to schema of inner aggregation and check if the rules can be applied.
Addressed, thanks
// Map group-by items in outer aggregation to the schema of inner aggregation, so the group-by items can be compared.
_, items := exprSubstitute(la.ctx, exprMap, ptn.outer.GroupByItems)
_, items = aggSubstitute(la.ctx, aggMap, items, func(fun *aggregation.AggFuncDesc) bool {
	// Outer group-by items cannot be aggregated result of inner aggregation.
Do we need this check? Doesn't the "group-by subset check" already guarantee this?
The check here is to prevent the following case:
select bt, max(ct) from (select b as bt, count(c) as ct from t group by c, b) tt group by ct;
`ct` is defined as `count(c)`, and it is tricky to determine the semantics of the aggregations here.
newFuns := make([]*aggregation.AggFuncDesc, len(ptn.outer.AggFuncs))
for idx, fun := range ptn.outer.AggFuncs {
	_, exprs := exprSubstitute(la.ctx, exprMap, fun.Args)
	expr := exprs[0]
Add a check for single-param aggregation?
Addressed
	return false, exprs
}

func deepContains(ctx sessionctx.Context, exprs []expression.Expression, expr expression.Expression) bool {
How about renaming it to `EqualContains` and moving it next to `Contains` in expression/util.go?
Addressed
if subItem.Equal(la.ctx, item) {
	found = true
	break
}
Can we use `deepContains` here?
Addressed, thanks!
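For readers following this thread, here is a minimal sketch of the idea being discussed: a containment check that compares expressions with `Equal` instead of by identity, reused for the group-by subset check. The `Expr` interface, the `Column` type, and the `equalContains` and `isSubset` helpers are hypothetical stand-ins written for illustration only; they are not the TiDB `expression` package API and not the code from this PR.

```go
package main

import "fmt"

// Expr is a hypothetical stand-in for an expression type; only Equal is modeled.
type Expr interface {
	Equal(other Expr) bool
}

// Column is a hypothetical column reference identified by name.
type Column struct{ Name string }

// Equal reports whether other is a column with the same name.
func (c *Column) Equal(other Expr) bool {
	o, ok := other.(*Column)
	return ok && o.Name == c.Name
}

// equalContains reports whether exprs holds an element that is Equal to target.
func equalContains(exprs []Expr, target Expr) bool {
	for _, e := range exprs {
		if e.Equal(target) {
			return true
		}
	}
	return false
}

// isSubset reports whether every item of sub is contained in super.
func isSubset(sub, super []Expr) bool {
	for _, item := range sub {
		if !equalContains(super, item) {
			return false
		}
	}
	return true
}

func main() {
	inner := []Expr{&Column{Name: "a"}, &Column{Name: "b"}}
	outer := []Expr{&Column{Name: "a"}}
	fmt.Println(isSubset(outer, inner)) // outer group-by items are all found in inner
	fmt.Println(isSubset(inner, outer)) // b is missing from outer
}
```

Factoring the inner loop into one helper removes the `found`/`break` bookkeeping from the caller, which is presumably what the review comment was after.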
combined.Args[0] = expr

if innerIsGroup {
	if (inner.Name == ast.AggFuncFirstRow || inner.Name == ast.AggFuncMax || inner.Name == ast.AggFuncMin) &&
Could you explain the rationale behind this condition? From my understanding, if `innerIsGroup` is true, we can just keep the outer aggregation without change, no matter what kind of aggregation it is?
If the item being aggregated in the outer aggregation is a group-by item of the inner aggregation, we can only do the elimination under very few conditions.
For example, given
select sum(at) from (select count(a) as at, b as bt from t group by a, b) as t group by bt
the AggFunc `count(a)` is always 1 and `sum(at)` is the distinct number of `a`. We only consider the simplest case here because for other cases, there is no common rule to indicate the semantics.
// * true / substituted exprs if the substitution happens.
// * false / original exprs if substitution doesn't happen.
// * false / nil if predicate is not nil and it failed.
func aggSubstitute(
This function shares a lot of logic with `exprSubstitute`. Is it possible to merge them together, or to use `ColumnSubstitute` instead?
Done, thanks
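The three-way return contract in the comment above can be illustrated with a simplified sketch. Everything here is an assumption made for illustration: expressions are modeled as plain strings, `defs` maps a column name to its definition, and `substitute` is a hypothetical helper, not the `aggSubstitute`, `exprSubstitute`, or `ColumnSubstitute` functions from TiDB.

```go
package main

import "fmt"

// substitute replaces every expr that has a definition in defs.
// It returns:
//   - true and the substituted exprs if at least one substitution happened,
//   - false and the original exprs if nothing was substituted,
//   - false and nil if pred is not nil and rejects a definition.
func substitute(defs map[string]string, exprs []string, pred func(def string) bool) (bool, []string) {
	replaced := false
	out := make([]string, len(exprs))
	for i, e := range exprs {
		def, ok := defs[e]
		if !ok {
			out[i] = e
			continue
		}
		if pred != nil && !pred(def) {
			return false, nil
		}
		out[i] = def
		replaced = true
	}
	if !replaced {
		return false, exprs
	}
	return true, out
}

func main() {
	defs := map[string]string{"bt": "b", "ct": "count(c)"}

	ok, out := substitute(defs, []string{"bt", "x"}, nil)
	fmt.Println(ok, out) // bt is replaced by its definition, x is kept

	// A predicate that rejects any definition, modeling a rule such as
	// "outer group-by items cannot be an aggregated result".
	rejectAll := func(string) bool { return false }
	ok, out = substitute(defs, []string{"ct"}, rejectAll)
	fmt.Println(ok, out == nil)
}
```

Returning `nil` only on predicate failure lets a caller distinguish "nothing to do" from "this pattern must not be rewritten", which is the distinction the comment spells out.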
case inner.Name == ast.AggFuncBitOr && outer.Name == ast.AggFuncBitOr:
case inner.Name == ast.AggFuncBitXor && outer.Name == ast.AggFuncBitXor:
case inner.Name == ast.AggFuncGroupConcat && outer.Name == ast.AggFuncGroupConcat:
	if inner.HasDistinct {
`first_row / min / max / BitAnd / BitOr / BitXor` is applicable even if `inner.HasDistinct` is true? Also, `group_concat` cannot be simply merged because it may contain `ORDER BY` and `SEPARATOR` keywords inside the `group_concat`?
`first_row`, `min`, `max`, `bit_and`, `bit_or` accept `HasDistinct` because they're idempotent. But `bit_xor` is not. I've fixed it, thanks!
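The idempotency argument can be checked with a small, self-contained sketch. It folds the same values with and without de-duplication, which is what `DISTINCT` amounts to for these functions. The helper names (`fold`, `dedup`, `bitAnd`, `bitOr`, `bitXor`) are made up for this illustration and are not taken from the PR.

```go
package main

import "fmt"

// fold combines vals left to right with op, starting from init.
func fold(vals []uint64, init uint64, op func(acc, v uint64) uint64) uint64 {
	acc := init
	for _, v := range vals {
		acc = op(acc, v)
	}
	return acc
}

// dedup returns vals with duplicates removed, keeping first occurrences.
func dedup(vals []uint64) []uint64 {
	seen := make(map[uint64]struct{}, len(vals))
	out := make([]uint64, 0, len(vals))
	for _, v := range vals {
		if _, ok := seen[v]; !ok {
			seen[v] = struct{}{}
			out = append(out, v)
		}
	}
	return out
}

func bitAnd(acc, v uint64) uint64 { return acc & v }
func bitOr(acc, v uint64) uint64  { return acc | v }
func bitXor(acc, v uint64) uint64 { return acc ^ v }

func main() {
	vals := []uint64{6, 6, 3} // 6 appears twice
	allOnes := ^uint64(0)     // identity element for AND

	// AND and OR are idempotent (x&x == x, x|x == x), so duplicates do not matter.
	fmt.Println(fold(vals, allOnes, bitAnd) == fold(dedup(vals), allOnes, bitAnd))
	fmt.Println(fold(vals, 0, bitOr) == fold(dedup(vals), 0, bitOr))

	// XOR is not idempotent (x^x == 0), so removing duplicates changes the result.
	fmt.Println(fold(vals, 0, bitXor) == fold(dedup(vals), 0, bitXor))
}
```

The same reasoning covers `min` and `max`, since taking the minimum or maximum of a value with itself returns that value.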
@bb7133 friendly ping, any update?
Still working on this PR
Close this PR temporarily because of the PR quota.
3826745 to 60f6568
/run-all-tests
a51548b to 5522bfd
planner/core/logical_plan_test.go
@@ -1647,6 +1647,126 @@ func (s *testPlanSuite) TestAggPrune(c *C) {
	sql: "select a, count(distinct a, b) from t group by a",
	best: "DataScan(t)->Projection",
},
{
We'd better move these tests to a JSON file now.
Do you mean the explaintest?
Not the explain test; you can refer to #12091.
Addressed, PTAL
}
// Generalize isGbItem: aggregate function can also be treated as group by item.
// For example, select max(at), sum(bt) from (select a as at, count(b) as bt from t group by a) as tt,
// in such case column `at` is an alias of `max(a)`, and `max(a)` is equivalent of `a`, which is the group by
"column `at` is an alias of `max(a)`, and `max(a)` is equivalent of `a`"
Why is `at` an alias of `max(a)`?
This is a typo. I've fixed the comment as follows:
...
// For example, select max(at), sum(bt) from (select max(a) as at, count(b) as bt from t group by a) as tt,
// in such case column `at` is an alias of `max(a)`, and `max(a)` is equivalent of `a`(`firstrow(a)`), which is the group by
// item of inner aggregation.
Thanks.

// tryToCombineAggFunc checks the types of inner/outer aggregate function and check if they can be combined as one based on their semantics.
// for example, since max(max(PARTIAL)) can be combined as max(TOTAL), we can combine inner max() and outer max() as a final max()
// `innerIsGroup` indicates if the inner aggregate function is aggregating group-by items(values are de-duplicated).
`innerIsGroup` -> `innerIsGbItem`?
Addressed
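The `max(max(PARTIAL))` = `max(TOTAL)` property that the comment relies on can be sketched as follows. This is only an illustration on in-memory slices, assuming every group is non-empty; `maxOf`, `nestedMax`, and `flatMax` are hypothetical helpers and not part of the planner code.

```go
package main

import "fmt"

// maxOf returns the largest value in vals; vals must be non-empty.
func maxOf(vals []int) int {
	m := vals[0]
	for _, v := range vals[1:] {
		if v > m {
			m = v
		}
	}
	return m
}

// nestedMax computes max(max(PARTIAL)): the inner max runs once per group,
// and the outer max runs over the per-group results.
func nestedMax(groups [][]int) int {
	partial := make([]int, 0, len(groups))
	for _, g := range groups {
		partial = append(partial, maxOf(g))
	}
	return maxOf(partial)
}

// flatMax computes max(TOTAL): a single max over all rows, ignoring the grouping.
func flatMax(groups [][]int) int {
	var all []int
	for _, g := range groups {
		all = append(all, g...)
	}
	return maxOf(all)
}

func main() {
	groups := [][]int{{3, 9, 1}, {7}, {4, 4}}
	fmt.Println(nestedMax(groups), flatMax(groups)) // the two results agree
}
```

Because the two results agree for any partition of the rows, the inner and outer `max()` can be collapsed into one; the same argument applies to `min()`.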
// Generalize isGbItem: aggregate function can also be treated as group by item.
// For example, select max(at), sum(bt) from (select a as at, count(b) as bt from t group by a) as tt,
// in such case column `at` is an alias of `max(a)`, and `max(a)` is equivalent of `a`, which is the group by
// item of inner aggregation
Add `.` at the end of the comment.
Addressed
	return nil
}

newFuns := make([]*aggregation.AggFuncDesc, len(ptn.outer.AggFuncs))
`newFuncs` seems better?
Addressed
@lamxTyler, @eurekaka, @lzmhhh123, PTAL.
1. fix some spelling typo 2. fix expr.HashCode(nil) bug 3. refined some other code styles
6f2788c to e2ce80d
Hi @francis0407, thanks for the reviews, PTAL
What problem does this PR solve?
This PR aims to implement the rule described in #7700 in the current 'core' planner.
What is changed and how it works?
A new `tryToEliminateAggregationByMapping()` is added. It does the following things:
`LogicalAggregation` plan pattern from a `group by` query with another `group by` subquery. Let's call them outer and inner aggregations, respectively.
Tests
Side effects