-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add single match join #623
Conversation
eaa055c
to
10970ef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking this task on. I've got a first round of questions up.
@@ -725,6 +726,18 @@ message NestedLoopJoinRel { | |||
substrait.extensions.AdvancedExtension advanced_extension = 10; | |||
} | |||
|
|||
// A single join enforces that each row is matched at once most counterpart. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// A single join enforces that each row is matched at once most counterpart. | |
// A single join enforces that each row is matched with at most one counterpart. |
Rel right = 3; | ||
// optional, defaults to true (a cartesian join) | ||
Expression expression = 4; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No post_join_filter
?
|
||
## Single Join Operator | ||
|
||
The single match join works by enforcing that each row is involved in only a single match. If this violated, an error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The single match join works by enforcing that each row is involved in only a single match. If this violated, an error | |
The single match join works by enforcing that each row is involved in only a single match. If this is violated, an error |
|
||
## Single Join Operator | ||
|
||
The single match join works by enforcing that each row is involved in only a single match. If this violated, an error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you say "only a single match" In the protobuf you say "at most one match". Which is it? At most one? Or exactly one?
If it "at most one" then what happens if there is no match? Is this an inner join (no row is emitted) or a left join (a row is emitted with the values from the left side and nulls on the right side)?
is raised. Typical use would be to have one side's duplicates eliminated and then be pushed into a column scan on the | ||
other side. This operator is useful when converting subqueries into joins. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm probably dense but I don't understand how this is helpful. This doesn't seem like it would be useful in eliminating duplicates (an error would be raised). Are you saying this is somehow used on the deduplicated values? How so? What does "pushed into a column scan on the other side" mean?
@@ -444,6 +444,7 @@ message Rel { | |||
HashJoinRel hash_join = 13; | |||
MergeJoinRel merge_join = 14; | |||
NestedLoopJoinRel nested_loop_join = 18; | |||
SingleJoinRel single_join = 22; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this was previously attempted as a separate join type:
substrait/proto/substrait/algebra.proto
Line 175 in 7b79437
JOIN_TYPE_SINGLE = 7; |
We should either deprecate that join type or collapse this PR into better documenting that join type. I'm uncertain which approach I'd prefer and think it hinges on the "at most one" or "exactly one" question
As noted above we already have a single join, closing this one as unneeded. |
This establishes the physical single join relation. Similar to other joins it uses an expression to compare the left and right inputs. What sets it apart from other joins is that it requires that each row be involved in at most one match. This kind of join is commonly used in unnesting arbitrary subqueries.