feat(planner): support `having` and `scalar expression in group by` for new planner #5131

xudong963 · 2022-05-02T06:53:48Z

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Support HAVING and scalar expressions in GROUP BY for the new planner.

mysql> set enable_planner_v2=1;
Query OK, 0 rows affected (0.03 sec)
Read 0 rows, 0 B in 0.002 sec., 0 rows/sec., 0 B/sec.

mysql> select sum(a) from t group by a having sum(a) = 1;
+------------+
| sum("a"_0) |
+------------+
|          1 |
+------------+
1 row in set (0.05 sec)
Read 3 rows, 12 B in 0.021 sec., 144.59 rows/sec., 578.35 B/sec.

mysql> select sum(a) from t1 group by a + 1, b;
+------------+
| sum("a"_0) |
+------------+
|          1 |
|          3 |
|          2 |
+------------+
3 rows in set (0.09 sec)
Read 3 rows, 24 B in 0.032 sec., 92.74 rows/sec., 741.91 B/sec.
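
For readers unfamiliar with the semantics being added, the two queries above can be modelled roughly as follows. This is a minimal sketch over in-memory data, not the planner's implementation: the function names and the input rows are hypothetical, and the contents of `t` and `t1` in the session above are not known.

```rust
use std::collections::BTreeMap;

/// Models `SELECT sum(a) FROM t GROUP BY a HAVING sum(a) = target`
/// over an in-memory column `a` (hypothetical data, not the PR's table).
fn sum_group_by_having(a: &[i64], target: i64) -> Vec<i64> {
    // GROUP BY a: accumulate sum(a) per distinct key.
    let mut groups: BTreeMap<i64, i64> = BTreeMap::new();
    for &v in a {
        *groups.entry(v).or_insert(0) += v;
    }
    // HAVING runs after aggregation and filters whole groups.
    groups.into_values().filter(|&s| s == target).collect()
}

/// Models `SELECT sum(a) FROM t1 GROUP BY a + 1, b`
/// where each row is a hypothetical `(a, b)` pair.
fn sum_group_by_scalar(rows: &[(i64, i64)]) -> Vec<i64> {
    // The group key is the evaluated scalar expression `a + 1` plus `b`.
    let mut groups: BTreeMap<(i64, i64), i64> = BTreeMap::new();
    for &(a, b) in rows {
        *groups.entry((a + 1, b)).or_insert(0) += a;
    }
    groups.into_values().collect()
}
```

The sketch returns groups in key order only because it uses a `BTreeMap`; SQL gives no ordering guarantee without ORDER BY, which is consistent with the unordered result shown above.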

Changelog

New Feature

Related Issues

Fixes #5120

vercel · 2022-05-02T06:53:53Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Updated
databend	✅ Ready (Inspect)	Visit Preview	May 4, 2022 at 3:04AM (UTC)

mergify · 2022-05-02T06:54:21Z

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

xudong963 · 2022-05-02T15:35:33Z

The latest commit supports scalar expressions in GROUP BY, such as a + 1 and b + 1 :)

query/src/sql/planner/plans/having.rs

query/src/sql/planner/binder/aggregate.rs

zhang2014 · 2022-05-03T00:42:47Z

We need some tests:

SELECT sum(a) FROM t GROUP BY a HAVING sum(b) = 1;
-- Does this calculate `a + 1` twice?
SELECT COUNT() FROM t GROUP BY a + 1, b + 2 HAVING a + 1 = 3;
SELECT COUNT() FROM t GROUP BY a + 1, b + 2 HAVING b + 2 = 3;
SELECT COUNT() FROM t GROUP BY a + 1, b + 2 HAVING c = 4; -- {expect error }
SELECT a + 1 AS a, COUNT() FROM t GROUP BY a, b + 2 HAVING a + 1 = 3;
SELECT a + 1 AS a, COUNT() FROM t GROUP BY a + 1, b + 2 HAVING a + 1 = 3;
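
One way a binder could handle these cases is to match each HAVING sub-expression structurally against the GROUP BY items, so that `a + 1` becomes a reference to the already-computed group key instead of being evaluated a second time, and a bare column such as `c` is rejected. The sketch below assumes a toy expression tree; `Expr`, `bind_having`, and the helper constructors are hypothetical names and are not Databend's types or its actual binding logic.

```rust
/// A tiny scalar expression tree (hypothetical; not Databend's actual types).
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
    Eq(Box<Expr>, Box<Expr>),
    /// An aggregate call such as `sum(b)`; its arguments may use any column.
    Aggregate(String, Vec<Expr>),
    /// Reference to the i-th GROUP BY key, already computed once per group.
    GroupKey(usize),
}

// Small constructors to keep call sites readable.
fn col(name: &str) -> Expr { Expr::Column(name.to_string()) }
fn lit(v: i64) -> Expr { Expr::Literal(v) }
fn add(l: Expr, r: Expr) -> Expr { Expr::Add(Box::new(l), Box::new(r)) }
fn eq(l: Expr, r: Expr) -> Expr { Expr::Eq(Box::new(l), Box::new(r)) }

/// Rewrites a HAVING expression so that any sub-expression structurally equal
/// to a GROUP BY item becomes a `GroupKey` reference. A bare column that is
/// still present afterwards is neither grouped nor aggregated: an error.
fn bind_having(expr: &Expr, group_by: &[Expr]) -> Result<Expr, String> {
    if let Some(i) = group_by.iter().position(|g| g == expr) {
        return Ok(Expr::GroupKey(i));
    }
    match expr {
        Expr::Column(name) => Err(format!(
            "column `{}` must appear in GROUP BY or inside an aggregate",
            name
        )),
        Expr::Literal(_) | Expr::GroupKey(_) | Expr::Aggregate(_, _) => Ok(expr.clone()),
        Expr::Add(l, r) => Ok(add(bind_having(l, group_by)?, bind_having(r, group_by)?)),
        Expr::Eq(l, r) => Ok(eq(bind_having(l, group_by)?, bind_having(r, group_by)?)),
    }
}
```

Under this sketch, `HAVING a + 1 = 3` with `GROUP BY a + 1, b + 2` binds to a comparison against group key 0, while `HAVING c = 4` returns an error, matching the expectation noted in the list above.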

BohuTANG · 2022-05-03T01:06:35Z

In some cases (like this PR), we should integrate the stateless tests for the new planner. Do we have a plan for that?

BohuTANG · 2022-05-03T04:21:11Z

> In some cases (like this PR), we should integrate the stateless tests for the new planner. Do we have a plan for that?

Oh, already working on it: #5133

xudong963 · 2022-05-03T07:08:23Z

> > In some cases (like this PR), we should integrate the stateless tests for the new planner. Do we have a plan for that?
>
> Oh, already working on it: #5133

Yes, @leiysky is preparing to integrate the stateless tests. Once he is ready, I'll open a new ticket to make the aggregate-related tests pass. cc @zhang2014

BohuTANG · 2022-05-03T07:16:22Z

@mergify update

mergify · 2022-05-03T07:16:55Z

update

✅ Branch has been successfully updated

query/src/sql/exec/mod.rs

leiysky · 2022-05-03T13:31:34Z

query/src/sql/planner/plans/expression.rs

+use crate::sql::ScalarExprRef;
+
+#[derive(Clone)]
+pub struct ExpressionPlan {


This is the same as the Project operator; you should use that instead.

By the way, I don't think this is required at all. The Aggregate plan should already produce the aggregate functions in its output, which the following HAVING and SELECT clauses can reference.

As leiysky suggested, will we do it in this PR or another? @xudong963

Sorry for the late reply, I took a day off yesterday.

I use ExpressionPlan to process scalar expressions in GROUP BY, such as group by a+1. Its role is to bridge the schemas of the scan plan and the aggregate plan. I didn't use the Project operator directly because it's a bit strange to have two Project operators in the whole plan tree, and there is no need to record a column index in ExpressionPlan.

> Sorry for the late reply, I took a day off yesterday.
>
> I use ExpressionPlan to process scalar expressions in GROUP BY, such as group by a+1. Its role is to bridge the schemas of the scan plan and the aggregate plan. I didn't use the Project operator directly because it's a bit strange to have two Project operators in the whole plan tree, and there is no need to record a column index in ExpressionPlan.

I got you.

The point is that every expression can be evaluated on the fly given a context, so we don't need an ExpressionPlan to materialize it in the context.

Projection is a special case. In a projection, an expression can be assigned to a variable through expr AS "var", which lets other expressions reference it. Canonically, every occurrence of the variable "var" is replaced directly by expr. But we don't like that; it's meaningless.

Our solution: for expressions that can be explicitly referenced by other expressions (aliased expressions in a projection, or aggregate functions produced by GROUP BY), we store them in the context as columns. In particular, we give the Scalar a unique identifier (i.e. column_index) and store it in BindContext as a ColumnBinding.

The remaining expressions are mostly not reusable, so we just store them inside the plans where they are needed.

In your case, there is no need to pre-evaluate expressions for GROUP BY, since they are already part of the AggregatePlan.

After all, there is no schema in the planning phase. It's all about the logical representation of relational algebra.

If you are confused by the schema, I'm betting you hit an execution issue. Try to fix it in PipelineBuilder.
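
The binding scheme described in this comment can be illustrated with a small sketch. The names `Scalar`, `ColumnBinding`, `BindContext`, and `column_index` come from the comment, but the fields and methods below (`add_binding`, `resolve`, the `scalar` field) are assumptions made for illustration; the real definitions in the codebase differ.

```rust
/// Simplified stand-in for a bound scalar expression (hypothetical shape).
#[derive(Clone, Debug, PartialEq)]
enum Scalar {
    /// Refers to a stored binding by its unique column_index.
    ColumnRef(usize),
    Literal(i64),
    Add(Box<Scalar>, Box<Scalar>),
    Sum(Box<Scalar>),
}

/// A referenceable column: either a base-table column or a stored expression.
#[derive(Clone, Debug, PartialEq)]
struct ColumnBinding {
    column_index: usize,
    name: String,
    /// `None` for a base-table column, `Some` for an aliased or aggregate expression.
    scalar: Option<Scalar>,
}

#[derive(Default)]
struct BindContext {
    columns: Vec<ColumnBinding>,
}

impl BindContext {
    /// Stores a referenceable expression and returns its unique index.
    fn add_binding(&mut self, name: &str, scalar: Option<Scalar>) -> usize {
        let column_index = self.columns.len();
        self.columns.push(ColumnBinding {
            column_index,
            name: name.to_string(),
            scalar,
        });
        column_index
    }

    /// Resolves a name to a column reference rather than inlining the
    /// underlying expression; the most recent binding wins.
    fn resolve(&self, name: &str) -> Option<Scalar> {
        self.columns
            .iter()
            .rev()
            .find(|c| c.name == name)
            .map(|c| Scalar::ColumnRef(c.column_index))
    }
}
```

In this model, binding `sum(a)` once during GROUP BY lets both HAVING and SELECT resolve to the same `ColumnRef`, which is why no separate plan node is needed to pre-evaluate it.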

BohuTANG · 2022-05-04T02:15:35Z

There is a panic:

}] panic.file="common/functions/src/rdoc/function_doc_asset.rs" panic.line=58 panic.column=65
2022-05-04T02:07:30.669735Z ERROR databend_query::servers::mysql::writers::query_result_writer: OnQuery Error: Code: 1068, displayText = Cannot join handle from context's runtime, cause: panic.

<disabled>
[ FAIL ] - result differs with:
--- /Users/runner/work/databend/databend/tests/suites/0_stateless/06_show/06_0005_show_functions.result	2022-05-04 02:02:59.000000000 +0000
+++ /Users/runner/work/databend/databend/tests/suites/0_stateless/06_show/06_0005_show_functions.stdout	2022-05-04 02:07:30.000000000 +0000
@@ -1,8 +1,2 @@
-today	1	0		Returns current date.
-todayofmonth	1	0		Converts a date or date with time to a UInt8 number containing the number of the day of the month (1-31).
-todayofweek	1	0		Converts a date or date with time to a UInt8 number containing the number of the day of the week (Monday is 1, and Sunday is 7).
-todayofyear	1	0		Converts a date or date with time to a UInt16 number containing the number of the day of the year (1-366).
-today	1	0		Returns current date.
-todayofmonth	1	0		Converts a date or date with time to a UInt8 number containing the number of the day of the month (1-31).
-todayofweek	1	0		Converts a date or date with time to a UInt8 number containing the number of the day of the week (Monday is 1, and Sunday is 7).
-todayofyear	1	0		Converts a date or date with time to a UInt16 number containing the number of the day of the year (1-366).
+ERROR 1105 (HY000) at line 1: Code: 1068, displayText = Cannot join handle from context's runtime, cause: panic.
+ERROR 1105 (HY000) at line 2: Code: 1068, displayText = Cannot join handle from context's runtime, cause: panic.

https://github.com/datafuselabs/databend/runs/6283239742?check_suite_focus=true#step:3:7814
https://github.com/datafuselabs/databend/runs/6283239615?check_suite_focus=true#step:3:473

sundy-li · 2022-05-04T03:02:57Z

@mergify update

mergify · 2022-05-04T03:03:30Z

update

✅ Branch has been successfully updated

BohuTANG · 2022-05-05T00:41:28Z

Oops, conflicts need to be resolved

xudong963 · 2022-05-05T00:51:26Z

I'll close this PR because the basic aggregator doesn't work on the main branch right now.

feat(planner): support having for new planner

ff799bd

xudong963 requested a review from BohuTANG as a code owner May 2, 2022 06:53

databend-bot added the need-review label May 2, 2022

xudong963 requested review from leiysky and sundy-li May 2, 2022 06:54

mergify bot added the pr-feature this PR introduces a new feature to the codebase label May 2, 2022

xudong963 mentioned this pull request May 2, 2022

Todos for aggregator in new planner: #5120

Closed

5 tasks

xudong963 added 3 commits May 2, 2022 15:13

delete useless code

9e7a60a

add safe_cast_to_scalar

ba60388

support scalar expression, such as a + 1, b + 1 in group by

e25c955

xudong963 changed the title ~~feat(planner): support having for new planner~~ feat(planner): support having and scalar expression in group by for new planner May 2, 2022

xudong963 requested a review from zhang2014 May 2, 2022 15:36

leiysky reviewed May 2, 2022

query/src/sql/planner/plans/having.rs (outdated, resolved)

zhang2014 reviewed May 3, 2022

query/src/sql/planner/binder/aggregate.rs (outdated, resolved)

xudong963 added 2 commits May 3, 2022 15:08

add address, delete having

4caadb6

minor: delete having

2af166d

Merge branch 'main' into having_support

f1a283b

vercel bot deployed to Preview May 3, 2022 07:18 View deployment

xudong963 mentioned this pull request May 3, 2022

Integrate the stateless test for the new planner's aggregation #5140

Closed

zhang2014 approved these changes May 3, 2022

zhang2014 reviewed May 3, 2022

query/src/sql/exec/mod.rs (resolved)

Group by clause cannot contain aggregate functions

3e4b37f

leiysky reviewed May 3, 2022

Merge branch 'main' into having_support

965d87f

vercel bot deployed to Preview May 4, 2022 03:04 View deployment

xudong963 closed this May 5, 2022

xudong963 mentioned this pull request May 6, 2022

feat(planner): support having and scalar expression in group by for new planner #5200

Merged

xudong963 deleted the having_support branch May 6, 2022 11:39
