feat(planner): support cross join #5715
Thanks for the contribution! Please review the labels and make any necessary changes.
pub struct LogicalJoin {
    pub left_conditions: Vec<Scalar>,
    pub right_conditions: Vec<Scalar>,
    pub join_type: JoinType,
It's better to introduce another struct, CrossJoin.
The predicates of inner joins and outer joins are different.
We can split the conditions into left_conditions and right_conditions for an inner join, but for an outer join we have to keep the other_conditions too.
And for CrossJoin, we will lift the predicates into a Filter for push down.
> We can split conditions into left_conditions and right_conditions for inner join, but for outer join, we have to keep the other_conditions too.
I don't quite understand the difference in predicates here. Why does the outer join predicate need to be put into other_conditions? My understanding is that only the non-equi-predicate needs to be put into other_conditions, and then we wrap a filter plan around the join.
I'm going to implement outer join like this: during the hash join probe, if the corresponding key is not found in the hash table, generate a null data block and merge it with the data block in build table.
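To make the probe-side idea concrete, here is a minimal sketch of a left outer hash join on a single key column. It is not the code in this PR: the `Row` alias, the function name `left_outer_hash_join`, and the choice of the probe side as the preserved side are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// A row of nullable integers; `None` stands for SQL NULL (assumption).
type Row = Vec<Option<i64>>;

/// Hypothetical helper: left outer hash join on one key column.
/// `probe` is treated as the preserved side and `build` is hashed.
fn left_outer_hash_join(
    probe: &[Row],
    build: &[Row],
    probe_key: usize,
    build_key: usize,
    build_width: usize,
) -> Vec<Row> {
    // Build phase: map each non-NULL key to the indices of its build rows.
    let mut table: HashMap<i64, Vec<usize>> = HashMap::new();
    for (i, row) in build.iter().enumerate() {
        if let Some(k) = row[build_key] {
            table.entry(k).or_default().push(i);
        }
    }

    // Probe phase: emit matches, or pad with NULLs when no key is found.
    let mut out = Vec::new();
    for p in probe {
        match p[probe_key].and_then(|k| table.get(&k)) {
            Some(idxs) => {
                for &i in idxs {
                    let mut row = p.clone();
                    row.extend(build[i].iter().cloned());
                    out.push(row);
                }
            }
            None => {
                // No match: keep the probe row and append `build_width` NULLs.
                let mut row = p.clone();
                row.resize(row.len() + build_width, None);
                out.push(row);
            }
        }
    }
    out
}
```

A real implementation would work on columnar data blocks rather than row vectors; the sketch only shows where the NULL padding happens.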
> And for CrossJoin, we will lift the predicates into a Filter for push down.
Isn't cross join supposed to have no predicate? 🤔
> I don't particularly understand the difference in predicates here, why the outer join predicate needs to be put into other_conditions, I understand that only the non-equi-predicate needs to be put into other_conditions, and then wrap a filter plan
Consider the case:
CREATE TABLE t (a INT, b INT);
INSERT INTO t VALUES(1, 2);
SELECT * FROM t LEFT JOIN t t1 ON t.a > t1.a;
You cannot literally move the other_conditions out of the join operator, because of the outer join semantics.
This is a tricky part of outer join; we could just reject outer joins like this for now. But we still need to take care of the difference between inner join and outer join.
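The difference can be checked with a small model of the query above. This is only an illustrative sketch: the `T` struct and both helper functions are hypothetical and use naive nested loops, not the planner's operators.

```rust
/// One row of table `t (a INT, b INT)` from the example above.
#[derive(Clone, Copy, Debug, PartialEq)]
struct T {
    a: i64,
    b: i64,
}

/// LEFT JOIN with the condition evaluated inside the join operator:
/// a left row with no matching right row is kept and padded with NULL (`None`).
fn left_join_on(left: &[T], right: &[T], cond: impl Fn(&T, &T) -> bool) -> Vec<(T, Option<T>)> {
    let mut out = Vec::new();
    for l in left {
        let mut matched = false;
        for r in right {
            if cond(l, r) {
                out.push((*l, Some(*r)));
                matched = true;
            }
        }
        if !matched {
            out.push((*l, None));
        }
    }
    out
}

/// The incorrect rewrite: join without the condition, then apply it as a
/// filter on top. Rows that fail the condition are dropped, not NULL-padded.
fn join_then_filter(left: &[T], right: &[T], cond: impl Fn(&T, &T) -> bool) -> Vec<(T, Option<T>)> {
    let mut out = Vec::new();
    for l in left {
        for r in right {
            if cond(l, r) {
                out.push((*l, Some(*r)));
            }
        }
    }
    out
}
```

With `t` holding the single row `(1, 2)` and the condition `t.a > t1.a`, `left_join_on` returns one NULL-padded row, while `join_then_filter` returns nothing. That is why the condition has to stay inside the outer join.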
> And for CrossJoin, we will lift the predicates into a Filter for push down.
What I mean here is, if we distinguish between InnerJoin and CrossJoin at the type level, then it's possible to transform CrossJoin into InnerJoin in a clearer way.
A struct CrossJoin is not necessary; you can just treat CrossJoin as an InnerJoin, as the current implementation does, and distinguish between them with JoinType. But we should keep the InnerJoin struct, just as I explained above.
> You cannot literally move the other_conditions out of the join operator, because of the outer join semantic.
I got you.
Let's align the results of our discussion:
- Keep the struct LogicalInnerJoin, only add JoinType to distinguish inner join, cross join, semi join, anti join.
- Add a new struct LogicalOuterJoin to process outer join; the struct will add other_conditions in it to process the case you mentioned above (we can't wrap a filter plan to process other_conditions for outer join).
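As a rough sketch of what the two points above could look like in code (not the final definitions in this PR): the `Scalar` placeholder, the `OuterJoinType` enum, and the `is_cross` helper are assumptions added for illustration; only the struct names, `JoinType`, and the condition fields come from the discussion.

```rust
/// Placeholder for the planner's scalar expression type (assumption).
#[derive(Debug, Clone, PartialEq)]
struct Scalar(String);

/// Join kinds that share the inner-join representation.
#[derive(Debug, Clone, Copy, PartialEq)]
enum JoinType {
    Inner,
    Cross,
    Semi,
    Anti,
}

/// Hypothetical enum for outer join kinds; not named in the discussion.
#[derive(Debug, Clone, Copy, PartialEq)]
enum OuterJoinType {
    Left,
    Right,
    Full,
}

/// Inner-style joins: non-equi predicates can live in a Filter above the join.
#[derive(Debug, Clone)]
struct LogicalInnerJoin {
    left_conditions: Vec<Scalar>,
    right_conditions: Vec<Scalar>,
    join_type: JoinType,
}

/// Outer joins: non-equi predicates must stay inside the operator.
#[derive(Debug, Clone)]
struct LogicalOuterJoin {
    left_conditions: Vec<Scalar>,
    right_conditions: Vec<Scalar>,
    other_conditions: Vec<Scalar>,
    join_type: OuterJoinType,
}

impl LogicalInnerJoin {
    /// Hypothetical helper: a cross join is an inner-style join tagged `Cross`.
    fn is_cross(&self) -> bool {
        self.join_type == JoinType::Cross
    }
}
```

Keeping other_conditions only on the outer join struct makes it a type-level fact that those predicates cannot be lifted into a separate filter plan.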
You can implement this as |
In fact, there isn't much cross-join-related logic, so I'd prefer not to convert cross join into inner join. If we do this conversion, we need to construct If in the binder phase, we need to construct If in the pipeline build phase, we need to construct Expressions for 1, getting Then in hash join we need to construct build key and probe key, where there may be a potential performance loss. But if we handle cross join directly in hash join based on |
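For comparison, handling a cross join directly needs no build key or probe key at all. The following is a minimal sketch under that assumption; the `Row` alias and the `cross_join` function are hypothetical and are not the hash join code discussed in this PR.

```rust
/// A row of nullable integers; `None` stands for SQL NULL (assumption).
type Row = Vec<Option<i64>>;

/// Hypothetical helper: cross join as a plain Cartesian product.
/// No keys are constructed or hashed; every probe row pairs with every build row.
fn cross_join(probe: &[Row], build: &[Row]) -> Vec<Row> {
    let mut out = Vec::with_capacity(probe.len() * build.len());
    for p in probe {
        for b in build {
            let mut row = p.clone();
            row.extend(b.iter().cloned());
            out.push(row);
        }
    }
    out
}
```

The output size is always the product of the two input sizes, so skipping the key construction only saves per-row overhead, not the product itself.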
LGTM
@mergify update
✅ Branch has been successfully updated
I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/
Summary
Changelog
Related Issues
Fixes #5499