Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(planner): support cross join #5715

Merged
merged 8 commits into from
Jun 1, 2022
Merged

Conversation

xudong963
Copy link
Member

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

  1. Minor refactor for hash join
  2. support cross join

Changelog

  • New Feature

Related Issues

Fixes #5499

@vercel
Copy link

vercel bot commented Jun 1, 2022

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Updated
databend ⬜️ Ignored (Inspect) Jun 1, 2022 at 1:26PM (UTC)

@mergify
Copy link
Contributor

mergify bot commented Jun 1, 2022

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

@mergify mergify bot added the pr-feature this PR introduces a new feature to the codebase label Jun 1, 2022
Comment on lines 38 to 41
pub struct LogicalJoin {
pub left_conditions: Vec<Scalar>,
pub right_conditions: Vec<Scalar>,
pub join_type: JoinType,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to introduce another struct CrossJoin.

The predicate of inner join and outer join are different.

We can split conditions into left_conditions and right_conditions for inner join, but for outer join, we have to keep the other_conditions too.

And for CrossJoin, we will lift the predicates into a Filter for push down.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can split conditions into left_conditions and right_conditions for inner join, but for outer join, we have to keep the other_conditions too.

I don't particularly understand the difference in predicates here, why the outer join predicate needs to be put into other_conditions, I understand that only the non-equi-predicate needs to be put into
other_conditions, and then wrap a filter plan

I'm going to implement outer join like this: when hash join probe, if the corresponding key is not found in hash table, generate null data block and merge with data block in build table.

And for CrossJoin, we will lift the predicates into a Filter for push down.

Isn't cross join supposed to have no predicate? 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't particularly understand the difference in predicates here, why the outer join predicate needs to be put into other_conditions, I understand that only the non-equi-predicate needs to be put into
other_conditions, and then wrap a filter plan

Consider the case:

CREATE TABLE t (a INT, b INT);
INSERT INTO t VALUES(1, 2);
SELECT * FROM t LEFT JOIN t t1 ON t.a > t1.a;

You cannot literally move the other_conditions out of the join operator, because of the outer join semantic.

This is a tricky part of outer join, we can just reject the outer join like this. But we still need to take care of the difference between inner join and outer join.

And for CrossJoin, we will lift the predicates into a Filter for push down.

What I mean here is, if we distinguish between InnerJoin and CrossJoin in type-level, then it's possible to transform CrossJoin into InnerJoin in a more clear way.

A struct CrossJoin is not necessary, you can just treat CrossJoin as a InnerJoin as current implementation and distinguish between them with JoinType. But we should keep the InnerJoin struct just as I explained above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cannot literally move the other_conditions out of the join operator, because of the outer join semantic.

I got you.

Let's align the results of our discussion:

  1. Keep the struct LogicalInnerJoin, only add JoinType to distinguish inner join, cross join, semi join, anti join.
  2. Add a new struct LogicalOuterJoin to process outer join, the struct will add other_conditions in it to process the case you mentioned above(we can't wrap a filter plan to process other_conditions for outer join).

@leiysky
Copy link
Contributor

leiysky commented Jun 1, 2022

You can implement this as select * from t inner join t1 on 1 = 1, so you don't need to write cross join related logic.

@xudong963 xudong963 requested a review from leiysky June 1, 2022 07:01
@xudong963
Copy link
Member Author

You can implement this as select * from t inner join t1 on 1 = 1, so you don't need to write cross join related logic.

In fact, cross join related logic isn't much, I prefer to keep the logic not to do a conversion for cross join to inner join.

If we do this conversion, we need to construct Scalar or Expression for 1=1 in a suitable place.

If in the binder phase, we need to construct ConstantExpr for 1 to get left_conditions and right_conditions.

If in the pipeline build phase, we need to construct Expressions for 1, getting build_expressions and probe_expressions.

Then in hash join we need to construct build key and probe key, where there may be a potential performance loss.

But if we handle cross join directly in hash join based on JoinType, we will not have the above process and will directly merge the data blocks.

Copy link
Contributor

@leiysky leiysky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@BohuTANG
Copy link
Member

BohuTANG commented Jun 1, 2022

@mergify update

@mergify
Copy link
Contributor

mergify bot commented Jun 1, 2022

update

✅ Branch has been successfully updated

@mergify mergify bot merged commit 887117c into databendlabs:main Jun 1, 2022
@xudong963 xudong963 deleted the cross_join branch June 1, 2022 14:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need-review pr-feature this PR introduces a new feature to the codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature: support cross join
5 participants