-
Notifications
You must be signed in to change notification settings - Fork 752
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(planner): support having
and scalar expression in group by
for new planner
#5131
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
Thanks for the contribution! Please review the labels and make any necessary changes. |
having
and scalar expression in group by
for new planner
The commit supports scalar expression, such as a + 1, b + 1 in group by : ) |
Need some test: SELECT sum(a) FROM t GROUP BY a HAVING sum(b) = 1;
-- Does this calculate `a + 1` twice?
SELECT COUNT() FROM t GROUP BY a + 1, b + 2 HAVING a + 1 = 3;
SELECT COUNT() FROM t GROUP BY a + 1, b + 2 HAVING b + 2 = 3;
SELECT COUNT() FROM t GROUP BY a + 1, b + 2 HAVING c = 4; -- {expect error }
SELECT a + 1 AS a, COUNT() FROM t GROUP BY a, b + 2 HAVING a + 1 = 3;
SELECT a + 1 AS a, COUNT() FROM t GROUP BY a + 1, b + 2 HAVING a + 1 = 3; |
In some cases, we should integrate the stateless test for the new planner (like this PR). Do we have the plan? |
Oh, already working on it: #5133 |
Yes, @leiysky is preparing for integrating the stateless, after he is ready, I'll open a new ticket to make aggregate-related tests pass. cc @zhang2014 |
@mergify update |
✅ Branch has been successfully updated |
use crate::sql::ScalarExprRef; | ||
|
||
#[derive(Clone)] | ||
pub struct ExpressionPlan { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same as Project
operator, you should use that instead.
By the way, I don't think there is requirement for this. Aggregate plan should have produced the aggregate functions in its output, which can be referenced by the following HAVING
and SELECT
clauses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As leiysky suggested, will we do it in this PR or another? @xudong963
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late reply, I took a day off yesterday.
I use ExpressionPlan
to process scalar expression in group by, such as group by a+1. Its role is to bridge the schema of scan plan and aggregate plan. I didn't use the direct project operator because it's a bit strange to have two project operators in the whole plan tree, and there is no need to record column index in Expression Plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late reply, I took a day off yesterday.
I use
ExpressionPlan
to process scalar expression in group by, such as group by a+1. Its role is to bridge the schema of scan plan and aggregate plan. I didn't use the direct project operator because it's a bit strange to have two project operators in the whole plan tree, and there is no need to record column index in Expression Plan.
I got you.
The point is, every expression can be evaluated on the fly with a context, so we don't need a ExpressionPlan
to materialize it in the context.
A special case is projection. In projection, an expression can be assigned to a variable through expr AS "var"
, which makes it can be referenced by other expressions. Canonically, all the variable "var"
are directly replaced by expr
. But we don't like that, it's meaningless.
Our solution is, for the expressions(aliased expression in projection or aggregate function produced by GROUP BY
) that can be explicitly referenced by other expressions, we will store it in the context as a column. In particular, we give the Scalar
a unique identifer(i.e. column_index
) and store it in BindContext
as a ColumnBinding
.
For the rest expressions, since most of them are not reuseable, we just store them inside the Plan
s where they are needed.
In your case, there is no need to pre-evaluate some expressions for GROUP BY
, since they are already involved in the AggregatePlan
.
After all, there is no schema
in the planning phase. It's all about logical representation of relational algebra.
If you are confused by schema
, I'm betting you encountered a execution issue. Try to fix it in PipelineBuilder
.
There is a panic:
https://github.com/datafuselabs/databend/runs/6283239742?check_suite_focus=true#step:3:7814 |
@mergify update |
✅ Branch has been successfully updated |
Oops, conflicts need to be resolved |
I'll close the ticket because basic aggregator can't work now in the main branch. |
I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/
Summary
support having and scalar expression in group by for new planner
Changelog
Related Issues
Fixes #5120