Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

planner/core: enhance the rule of group pruning #9431

Closed
wants to merge 7 commits into from

Conversation

bb7133
Copy link
Member

@bb7133 bb7133 commented Feb 24, 2019

What problem does this PR solve?

This PR aims to implement the rule described in #7700, in current 'core' planner

What is changed and how it works?

A new tryToEliminateAggregationByMapping() is added, it does the following things:

  1. Find a nested LogicalAggregation plan pattern from group by query with another group by subquery. Let's call them outter and inner aggregations, respectively.
  2. Tries to map schema of outter aggregation to the schema of inner aggregation, so that group-by expressions/aggregated functions can be compared.
  3. Check the semantic of all aggregated functions to determine if one of the aggregations can be eliminated.

Tests

  • Integration test

Side effects

  • Possible performance regression
  • Increased code complexity

@bb7133 bb7133 force-pushed the bb7133/enhance_agg_elim branch from acf5bc9 to 17ea4ef Compare February 24, 2019 14:21
@morgo morgo added contribution This PR is from a community contributor. sig/planner SIG: Planner labels Feb 24, 2019
@codecov-io
Copy link

codecov-io commented Feb 24, 2019

Codecov Report

Merging #9431 into master will decrease coverage by 0.0579%.
The diff coverage is 82.0809%.

@@               Coverage Diff               @@
##             master      #9431       +/-   ##
===============================================
- Coverage   79.9286%   79.8707%   -0.058%     
===============================================
  Files           460        460               
  Lines        102878     102582      -296     
===============================================
- Hits          82229      81933      -296     
- Misses        14688      14709       +21     
+ Partials       5961       5940       -21

@zz-jason zz-jason requested review from eurekaka and alivxxx February 25, 2019 08:27
// tryToEliminateAggregation will eliminate aggregation grouped by unique key.
// nestedAggPattern stores nested LogicalAggregations, so they can be accessed easily
type nestedAggPattern struct {
outter *LogicalAggregation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outer

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, I checked some other spelling typos as well, thanks

@@ -31,11 +31,20 @@ type aggregationEliminator struct {
type aggregationEliminateChecker struct {
}

// tryToEliminateAggregation will eliminate aggregation grouped by unique key.
// nestedAggPattern stores nested LogicalAggregations, so they can be accessed easily
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add . at the end of sentences.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

outter *LogicalAggregation
proj *LogicalProjection
inner *LogicalAggregation
// isTrivial indicates if there's constraints(like LogicalSelection/LogicalLimit) "between" them
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

// genColMaps generates exprMap(`col -> definition expr`) and aggMap(`col -> definition aggFunc`) from column definitions
func genColMaps(ctx sessionctx.Context, ptn *nestedAggPattern) (map[string]expression.Expression, map[string]*aggregation.AggFuncDesc) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ctx is not used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

func genColMaps(ctx sessionctx.Context, ptn *nestedAggPattern) (map[string]expression.Expression, map[string]*aggregation.AggFuncDesc) {
exprMap := make(map[string]expression.Expression, len(ptn.proj.Schema().Columns))
for _, col := range ptn.proj.Schema().Columns {
idx := ptn.proj.Schema().ColumnIndex(col)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it not the offset in the array?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refined

var newExprs []expression.Expression
replaced := false
for idx, expr := range exprs {
if e, ok := exprMap[string(expr.HashCode(nil))]; ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why pass nil? Will it cause panic?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, I added a test case as well

var newExprs []expression.Expression
replaced := false
for idx, expr := range exprs {
if fun, ok := aggMap[string(expr.HashCode(nil))]; ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@bb7133 bb7133 force-pushed the bb7133/enhance_agg_elim branch from 17ea4ef to 387ccd9 Compare March 1, 2019 11:19
@bb7133
Copy link
Member Author

bb7133 commented Mar 1, 2019

hi @lamxTyler , addressed, thanks for the comments!

@bb7133 bb7133 force-pushed the bb7133/enhance_agg_elim branch from 69659ab to ee61818 Compare April 1, 2019 12:06
@bb7133
Copy link
Member Author

bb7133 commented Apr 1, 2019

PTAL @eurekaka , thanks!

@eurekaka eurekaka added the type/enhancement The issue or PR belongs to an enhancement. label Apr 3, 2019
outer *LogicalAggregation
proj *LogicalProjection
inner *LogicalAggregation
// isTrivial indicates if there's constraints(like LogicalSelection/LogicalLimit) "between" them.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// isTrivial indicates if there's constraints(like LogicalSelection/LogicalLimit) "between" them.
// isTrivial indicates if there are operators(like LogicalSelection/LogicalLimit) between 2 aggs.

Copy link
Member Author

@bb7133 bb7133 Aug 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed

@@ -136,6 +145,114 @@ func (a *aggregationEliminateChecker) wrapCastFunction(ctx sessionctx.Context, a
return expression.BuildCastFunction(ctx, arg, targetTp)
}

// tryToEliminateAggregationByMapping tries to eliminate an aggregation from nested aggregations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we are eliminating the nested aggregation? Please make this comment clearer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed

// can be rewritten as
// `select a + b as p, count(d) as `max(dt)` from t group by a + b`
//
// 2. Nested aggregations in which outer group-by items are proper subset of the inner ones, for example:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// 2. Nested aggregations in which outer group-by items are proper subset of the inner ones, for example:
// 2. Group by items of outer aggregation are super set of nested aggregation, for example:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed, thanks

// can be rewritten as
// `select a as at, max(b) as `max(bt)` from t group by a`
//
// In order to apply the optimization, we tries to map definition of outer aggregation to schema of inner aggregation and check if rules are met.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// In order to apply the optimization, we tries to map definition of outer aggregation to schema of inner aggregation and check if rules are met.
// We map definition of outer aggregation to schema of inner aggregation and check if the rules can be applied.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed, thanks

// Map group-by items in outer aggregation to the schema of inner aggregation, so the group-by items can be compared.
_, items := exprSubstitute(la.ctx, exprMap, ptn.outer.GroupByItems)
_, items = aggSubstitute(la.ctx, aggMap, items, func(fun *aggregation.AggFuncDesc) bool {
// Outer group-by items cannot be aggregated result of inner aggregation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this check? the "group-by subset check" has already guarantees this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check here is to prevent the following case:

select bt, max(ct) from (select b as bt, count(c) as ct from t group by c, b) tt group by ct;

ct is defined as count(c) and it is tricky to indicate the semantics of aggregations here.

newFuns := make([]*aggregation.AggFuncDesc, len(ptn.outer.AggFuncs))
for idx, fun := range ptn.outer.AggFuncs {
_, exprs := exprSubstitute(la.ctx, exprMap, fun.Args)
expr := exprs[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a check for single-param aggregation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed

return false, exprs
}

func deepContains(ctx sessionctx.Context, exprs []expression.Expression, expr expression.Expression) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about renaming it to EqualContains and move it to be with Contains in expression/util.go?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed

if subItem.Equal(la.ctx, item) {
found = true
break
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use deepContains here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed, thanks!

combined.Args[0] = expr

if innerIsGroup {
if (inner.Name == ast.AggFuncFirstRow || inner.Name == ast.AggFuncMax || inner.Name == ast.AggFuncMin) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain the rationale behind this condition? From my understanding, if innerIsGroup is true, we can just keep the outer aggregation without change, no matter what kind of aggregation it is?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the item being aggregated in outter aggregation is group-by item of inner aggregation. We can only do elimination on very few conditions.
For example, given

select sum(at) from (select count(a) as at, b as bt from t group by a, b) as t group by bt

AggFunc count(a) is always 1 and sum(at) is distinct number of a. We only consider the simplest case here because for other cases, there is no common rule to indicate the semantics.

// * true / substituted exprs if the substitution happens.
// * false / original exprs if substitution doesn't happen.
// * false / nil if predicate is not nil and it failed.
func aggSubstitute(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function shares a lot of logic with exprSubstitute, is it possible to merge them together? or using ColumnSubstitute instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks

case inner.Name == ast.AggFuncBitOr && outer.Name == ast.AggFuncBitOr:
case inner.Name == ast.AggFuncBitXor && outer.Name == ast.AggFuncBitXor:
case inner.Name == ast.AggFuncGroupConcat && outer.Name == ast.AggFuncGroupConcat:
if inner.HasDistinct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first_row / min / max / BitAnd / BitOr / BitXor is applicable even if inner.HasDistinct is true? Also, group_concat cannot be simply merged because it may contain ORDER BY and SEPARATOR keyword inside the group_concat?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first_row, min, max, bit_and, bit_or accept HasDistinct because they're idempotent. But bit_xor is not, I've fixed it, thanks!

@lzmhhh123 lzmhhh123 self-requested a review July 31, 2019 09:58
@zz-jason
Copy link
Member

zz-jason commented Aug 3, 2019

@bb7133 friendly ping, any update?

@bb7133
Copy link
Member Author

bb7133 commented Aug 5, 2019

I will update the code this week @zz-jason

Thanks for the reviews @eurekaka

@bb7133
Copy link
Member Author

bb7133 commented Aug 14, 2019

Still working on this PR

@bb7133
Copy link
Member Author

bb7133 commented Aug 23, 2019

Close this PR temporarily because of the PR quota.

@bb7133 bb7133 closed this Aug 23, 2019
@bb7133 bb7133 reopened this Aug 31, 2019
@bb7133 bb7133 force-pushed the bb7133/enhance_agg_elim branch 3 times, most recently from 3826745 to 60f6568 Compare September 4, 2019 09:40
@bb7133
Copy link
Member Author

bb7133 commented Sep 4, 2019

/run-all-tests

@francis0407 francis0407 self-requested a review September 9, 2019 04:58
@bb7133 bb7133 force-pushed the bb7133/enhance_agg_elim branch from a51548b to 5522bfd Compare September 12, 2019 14:15
@@ -1647,6 +1647,126 @@ func (s *testPlanSuite) TestAggPrune(c *C) {
sql: "select a, count(distinct a, b) from t group by a",
best: "DataScan(t)->Projection",
},
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd better move these tests to json file now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean the explaintest?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not explain test, you can refer to #12091

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed, PTAL

}
// Generalize isGbItem: aggregate function can also be treated as group by item.
// For example, select max(at), sum(bt) from (select a as at, count(b) as bt from t group by a) as tt,
// in such case column `at` is an alias of `max(a)`, and `max(a)` is equivalent of `a`, which is the group by
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

column at is an alias of max(a), and max(a) is equivalent of a

Why is at an alias of max(a) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a typo, I've fixed the comment as following:

...
// For example, select max(at), sum(bt) from (select max(a) as at, count(b) as bt from t group by a) as tt,
// in such case column `at` is an alias of `max(a)`, and `max(a)` is equivalent of `a`(`firstrow(a)`), which is the group by
// item of inner aggregation.

Thanks.


// tryToCombineAggFunc checks the types of inner/outer aggregate function and check if they can be combined as one based on their semantics.
// for example, since max(max(PARTIAL)) can be combined as max(TOTAL), we can combine inner max() and outer max() as a final max()
// `innerIsGroup` indicates if the inner aggregate function is aggregating group-by items(values are de-duplicated).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

innerIsGroup -> innerIsGbItem?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed

// Generalize isGbItem: aggregate function can also be treated as group by item.
// For example, select max(at), sum(bt) from (select a as at, count(b) as bt from t group by a) as tt,
// in such case column `at` is an alias of `max(a)`, and `max(a)` is equivalent of `a`, which is the group by
// item of inner aggregation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add . at the end of the comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed

return nil
}

newFuns := make([]*aggregation.AggFuncDesc, len(ptn.outer.AggFuncs))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newFuncs seems better?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed

@zz-jason zz-jason removed the contribution This PR is from a community contributor. label Sep 25, 2019
@sre-bot
Copy link
Contributor

sre-bot commented Sep 25, 2019

@lamxTyler, @eurekaka, @lzmhhh123, PTAL.

2 similar comments
@sre-bot
Copy link
Contributor

sre-bot commented Sep 27, 2019

@lamxTyler, @eurekaka, @lzmhhh123, PTAL.

@sre-bot
Copy link
Contributor

sre-bot commented Oct 1, 2019

@lamxTyler, @eurekaka, @lzmhhh123, PTAL.

@bb7133 bb7133 force-pushed the bb7133/enhance_agg_elim branch from 6f2788c to e2ce80d Compare October 3, 2019 08:19
@bb7133
Copy link
Member Author

bb7133 commented Oct 3, 2019

Hi @francis0407 , thanks for the reviews, PTAL

@sre-bot
Copy link
Contributor

sre-bot commented Oct 5, 2019

@lamxTyler, @eurekaka, @lzmhhh123, PTAL.

1 similar comment
@sre-bot
Copy link
Contributor

sre-bot commented Oct 7, 2019

@lamxTyler, @eurekaka, @lzmhhh123, PTAL.

@SunRunAway SunRunAway removed their request for review October 8, 2019 07:32
@XuHuaiyu XuHuaiyu removed their request for review October 8, 2019 08:41
@qw4990 qw4990 removed their request for review October 9, 2019 08:47
@Reminiscent Reminiscent removed their request for review October 9, 2019 09:01
@sre-bot
Copy link
Contributor

sre-bot commented Oct 10, 2019

@lamxTyler, @eurekaka, @lzmhhh123, PTAL.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/expression sig/planner SIG: Planner type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants