feat(planner): support having and scalar expression in group by for new planner #5131

Closed
wants to merge 9 commits into from

Conversation

xudong963
Member

@xudong963 xudong963 commented May 2, 2022

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Support HAVING and scalar expressions in GROUP BY for the new planner.

mysql> set enable_planner_v2=1;
Query OK, 0 rows affected (0.03 sec)
Read 0 rows, 0 B in 0.002 sec., 0 rows/sec., 0 B/sec.

mysql> select sum(a) from t group by a having sum(a) = 1;
+------------+
| sum("a"_0) |
+------------+
|          1 |
+------------+
1 row in set (0.05 sec)
Read 3 rows, 12 B in 0.021 sec., 144.59 rows/sec., 578.35 B/sec.

mysql> select sum(a) from t1 group by a + 1, b;
+------------+
| sum("a"_0) |
+------------+
|          1 |
|          3 |
|          2 |
+------------+
3 rows in set (0.09 sec)
Read 3 rows, 24 B in 0.032 sec., 92.74 rows/sec., 741.91 B/sec.
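
To make the semantics of the second query concrete, here is a minimal sketch of what `GROUP BY a + 1, b` with `sum(a)` computes. It is a toy evaluator over in-memory rows, not Databend code: the function name and the `(a, b)` row layout are assumptions made for illustration, and the contents of `t1` are not shown in this PR.

```rust
use std::collections::BTreeMap;

/// Toy equivalent of `SELECT sum(a) FROM t1 GROUP BY a + 1, b`.
/// Each row is a hypothetical `(a, b)` pair; the result pairs each
/// group key `(a + 1, b)` with `sum(a)` for that group.
fn sum_a_by_a_plus_one_and_b(rows: &[(i64, i64)]) -> Vec<((i64, i64), i64)> {
    let mut groups: BTreeMap<(i64, i64), i64> = BTreeMap::new();
    for &(a, b) in rows {
        // The group key is computed per row while aggregating; nothing
        // materializes `a + 1` in a separate step beforehand.
        *groups.entry((a + 1, b)).or_insert(0) += a;
    }
    // BTreeMap iterates in key order, so the output order is deterministic.
    groups.into_iter().collect()
}
```

Rows that share both `a + 1` and `b` fall into the same group; the scalar expression only changes how the key is derived, not how the aggregate is accumulated.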

Changelog

  • New Feature

Related Issues

Fixes #5120


@xudong963 xudong963 requested review from leiysky and sundy-li May 2, 2022 06:54
@mergify
Contributor

mergify bot commented May 2, 2022

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

@mergify mergify bot added the pr-feature label (this PR introduces a new feature to the codebase) May 2, 2022
@xudong963 xudong963 changed the title from "feat(planner): support having for new planner" to "feat(planner): support having and scalar expression in group by for new planner" May 2, 2022
@xudong963
Member Author

The commit supports scalar expressions, such as a + 1 and b + 1, in GROUP BY :)

@xudong963 xudong963 requested a review from zhang2014 May 2, 2022 15:36
@zhang2014
Member

zhang2014 commented May 3, 2022

Needs some tests:

SELECT sum(a) FROM t GROUP BY a HAVING sum(b) = 1;
-- Does this calculate `a + 1` twice?
SELECT COUNT() FROM t GROUP BY a + 1, b + 2 HAVING a + 1 = 3;
SELECT COUNT() FROM t GROUP BY a + 1, b + 2 HAVING b + 2 = 3;
SELECT COUNT() FROM t GROUP BY a + 1, b + 2 HAVING c = 4; -- {expect error }
SELECT a + 1 AS a, COUNT() FROM t GROUP BY a, b + 2 HAVING a + 1 = 3;
SELECT a + 1 AS a, COUNT() FROM t GROUP BY a + 1, b + 2 HAVING a + 1 = 3;
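
One way to get the behavior these cases probe is to resolve HAVING against the GROUP BY items structurally: any sub-expression equal to a group item becomes a reference to the already-computed group key (so `a + 1` is not evaluated a second time), and any remaining bare column is rejected. The sketch below is a toy under those assumptions, not Databend's planner; `Expr`, `GroupKey`, `rewrite_having`, and the error message are all hypothetical names, and aggregate functions and SELECT aliases are deliberately left out.

```rust
// Toy expression tree; not Databend's actual scalar type.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
    Eq(Box<Expr>, Box<Expr>),
    // Reference to the i-th GROUP BY key produced by the aggregate.
    GroupKey(usize),
}

fn col(name: &str) -> Expr { Expr::Column(name.to_string()) }
fn lit(v: i64) -> Expr { Expr::Literal(v) }
fn add(l: Expr, r: Expr) -> Expr { Expr::Add(Box::new(l), Box::new(r)) }
fn eq(l: Expr, r: Expr) -> Expr { Expr::Eq(Box::new(l), Box::new(r)) }

/// Rewrite a HAVING predicate so it only refers to group keys.
fn rewrite_having(expr: &Expr, group_items: &[Expr]) -> Result<Expr, String> {
    // A sub-expression identical to a GROUP BY item reuses that key.
    if let Some(i) = group_items.iter().position(|g| g == expr) {
        return Ok(Expr::GroupKey(i));
    }
    match expr {
        // A bare column that is not itself a group item is an error.
        Expr::Column(name) => Err(format!("column `{}` must appear in GROUP BY", name)),
        Expr::Literal(_) | Expr::GroupKey(_) => Ok(expr.clone()),
        Expr::Add(l, r) => Ok(add(
            rewrite_having(l, group_items)?,
            rewrite_having(r, group_items)?,
        )),
        Expr::Eq(l, r) => Ok(eq(
            rewrite_having(l, group_items)?,
            rewrite_having(r, group_items)?,
        )),
    }
}
```

With group items `a + 1, b + 2`, the predicate `a + 1 = 3` becomes a comparison against group key 0, while `c = 4` fails because `c` is neither a group item nor part of one.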

@BohuTANG
Member

BohuTANG commented May 3, 2022

In some cases, we should integrate the stateless tests for the new planner (as with this PR). Do we have a plan?

@BohuTANG
Member

BohuTANG commented May 3, 2022

> In some cases, we should integrate the stateless tests for the new planner (as with this PR). Do we have a plan?

Oh, already working on it: #5133

@xudong963
Member Author

> > In some cases, we should integrate the stateless tests for the new planner (as with this PR). Do we have a plan?
>
> Oh, already working on it: #5133

Yes, @leiysky is preparing to integrate the stateless tests. Once that is ready, I'll open a new ticket to make the aggregate-related tests pass. cc @zhang2014

@BohuTANG
Member

BohuTANG commented May 3, 2022

@mergify update

@mergify
Contributor

mergify bot commented May 3, 2022

update

✅ Branch has been successfully updated

use crate::sql::ScalarExprRef;

#[derive(Clone)]
pub struct ExpressionPlan {
Contributor

This is the same as the Project operator; you should use that instead.

By the way, I don't think this is required. The aggregate plan should already produce the aggregate functions in its output, which the following HAVING and SELECT clauses can then reference.

Member

Will we do what leiysky suggested in this PR or in another one? @xudong963

Member Author

Sorry for the late reply, I took a day off yesterday.

I use ExpressionPlan to process scalar expressions in GROUP BY, such as GROUP BY a + 1. Its role is to bridge the schemas of the scan plan and the aggregate plan. I didn't use the Project operator directly because it's a bit strange to have two Project operators in the plan tree, and there is no need to record a column index in ExpressionPlan.

Contributor

> Sorry for the late reply, I took a day off yesterday.
>
> I use ExpressionPlan to process scalar expressions in GROUP BY, such as GROUP BY a + 1. Its role is to bridge the schemas of the scan plan and the aggregate plan. I didn't use the Project operator directly because it's a bit strange to have two Project operators in the plan tree, and there is no need to record a column index in ExpressionPlan.

I got you.

The point is that every expression can be evaluated on the fly with a context, so we don't need an ExpressionPlan to materialize it in the context.

Projection is a special case. In a projection, an expression can be assigned to a variable through expr AS "var", which allows other expressions to reference it. Canonically, every occurrence of the variable "var" is replaced directly by expr. But we don't like that; it serves no purpose.

Our solution: expressions that can be explicitly referenced by other expressions (aliased expressions in a projection, or aggregate functions produced by GROUP BY) are stored in the context as columns. In particular, we give the Scalar a unique identifier (i.e. column_index) and store it in BindContext as a ColumnBinding.

As for the remaining expressions, most of them are not reusable, so we just store them inside the plans where they are needed.

In your case, there is no need to pre-evaluate any expressions for GROUP BY, since they are already part of the AggregatePlan.

After all, there is no schema in the planning phase. It's all about the logical representation of relational algebra.

If you are confused by the schema, I'm betting you ran into an execution issue. Try to fix it in PipelineBuilder.
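
The binding scheme described above can be sketched roughly as follows. The names BindContext, ColumnBinding, and column_index come from this comment, but the fields, the `Scalar` variants, and the `add_column` / `resolve` methods are assumptions made for illustration and do not reflect Databend's actual definitions.

```rust
// Toy scalar type; the variants are illustrative only.
#[derive(Clone, Debug, PartialEq)]
enum Scalar {
    // Reference to a ColumnBinding by its column_index.
    ColumnRef(usize),
    Literal(i64),
    Add(Box<Scalar>, Box<Scalar>),
    // An aggregate function over an argument.
    Sum(Box<Scalar>),
}

#[derive(Clone, Debug)]
struct ColumnBinding {
    column_index: usize,
    name: String,
    // The expression this column stands for; None for a base table column.
    scalar: Option<Scalar>,
}

#[derive(Default)]
struct BindContext {
    columns: Vec<ColumnBinding>,
}

impl BindContext {
    /// Register a referencable expression and return its unique index.
    fn add_column(&mut self, name: &str, scalar: Option<Scalar>) -> usize {
        let column_index = self.columns.len();
        self.columns.push(ColumnBinding {
            column_index,
            name: name.to_string(),
            scalar,
        });
        column_index
    }

    /// Resolve a name to a column reference instead of inlining the
    /// expression behind it. The most recent binding wins.
    fn resolve(&self, name: &str) -> Option<Scalar> {
        self.columns
            .iter()
            .rev()
            .find(|c| c.name == name)
            .map(|c| Scalar::ColumnRef(c.column_index))
    }
}
```

In this sketch, a HAVING or SELECT clause that mentions `sum(a)` would resolve to a `ColumnRef` pointing at the aggregate's output, while a one-off expression such as `a + 1` would stay inline in whichever plan node needs it.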

@BohuTANG
Member

BohuTANG commented May 4, 2022

There is a panic:

}] panic.file="common/functions/src/rdoc/function_doc_asset.rs" panic.line=58 panic.column=65
2022-05-04T02:07:30.669735Z ERROR databend_query::servers::mysql::writers::query_result_writer: OnQuery Error: Code: 1068, displayText = Cannot join handle from context's runtime, cause: panic.

<disabled>
[ FAIL ] - result differs with:
--- /Users/runner/work/databend/databend/tests/suites/0_stateless/06_show/06_0005_show_functions.result	2022-05-04 02:02:59.000000000 +0000
+++ /Users/runner/work/databend/databend/tests/suites/0_stateless/06_show/06_0005_show_functions.stdout	2022-05-04 02:07:30.000000000 +0000
@@ -1,8 +1,2 @@
-today	1	0		Returns current date.
-todayofmonth	1	0		Converts a date or date with time to a UInt8 number containing the number of the day of the month (1-31).
-todayofweek	1	0		Converts a date or date with time to a UInt8 number containing the number of the day of the week (Monday is 1, and Sunday is 7).
-todayofyear	1	0		Converts a date or date with time to a UInt16 number containing the number of the day of the year (1-366).
-today	1	0		Returns current date.
-todayofmonth	1	0		Converts a date or date with time to a UInt8 number containing the number of the day of the month (1-31).
-todayofweek	1	0		Converts a date or date with time to a UInt8 number containing the number of the day of the week (Monday is 1, and Sunday is 7).
-todayofyear	1	0		Converts a date or date with time to a UInt16 number containing the number of the day of the year (1-366).
+ERROR 1105 (HY000) at line 1: Code: 1068, displayText = Cannot join handle from context's runtime, cause: panic.
+ERROR 1105 (HY000) at line 2: Code: 1068, displayText = Cannot join handle from context's runtime, cause: panic.

https://github.com/datafuselabs/databend/runs/6283239742?check_suite_focus=true#step:3:7814
https://github.com/datafuselabs/databend/runs/6283239615?check_suite_focus=true#step:3:473

@sundy-li
Member

sundy-li commented May 4, 2022

@mergify update

@mergify
Contributor

mergify bot commented May 4, 2022

update

✅ Branch has been successfully updated

@BohuTANG
Member

BohuTANG commented May 5, 2022

Oops, conflicts need to be resolved

@xudong963
Member Author

I'll close this PR because the basic aggregator currently doesn't work on the main branch.

Labels
need-review, pr-feature (this PR introduces a new feature to the codebase)
Development

Successfully merging this pull request may close these issues.

Todos for aggregator in new planner:
6 participants