-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-34708][SQL] Code-gen for left semi/anti broadcast nested loop join (build right side) #31874
Conversation
cc @cloud-fan and @maropu could you help take a look when you have time, thanks. |
retest this please |
Closed & reopened the PR to trigger test rerun. |
val arrayIndex = ctx.freshName("arrayIndex") | ||
|
||
s""" | ||
|boolean $findMatchedRow = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|boolean $findMatchedRow = false; | |
|boolean $foundMatch = false; |
Maybe we can be consistent with non-codegen naming in BroadcastNestedLoopJoinExec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good, updated.
override def supportCodegen: Boolean = { | ||
joinType.isInstanceOf[InnerLike] | ||
override def supportCodegen: Boolean = (joinType, buildSide) match { | ||
case (_: InnerLike, _) | (LeftSemi, BuildRight) | (LeftAnti, BuildRight) => true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: (LeftSemi | LeftAnti, BuildRight)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan - updated.
Just FYI - the unit test failure here is legit, and I found there's a bug around LIMIT codegen - https://issues.apache.org/jira/browse/SPARK-34796 . Will submit a PR to fix that first. |
if (buildRowArray.nonEmpty == exists) { | ||
// Return streamed side if join condition is empty and | ||
// 1. build side is non-empty for LeftSemi join | ||
// or | ||
// 2. build side is empty for LeftAnti join. | ||
s""" | ||
|$numOutput.add(1); | ||
|${consume(ctx, input)} | ||
""".stripMargin | ||
} else { | ||
// Return nothing if join condition is empty and | ||
// 1. build side is empty for LeftSemi join | ||
// or | ||
// 2. build side is non-empty for LeftAnti join. | ||
"" | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this case, we don't need to add a mutable state for buildRowArrayTerm
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's already added inside prepareBroadcast
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, my qestion is wrong. I meant, is Ah, nvm, it looks fine.buildRowArrayTerm
referenced in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed comments and the PR is ready to review again, thanks.
override def supportCodegen: Boolean = { | ||
joinType.isInstanceOf[InnerLike] | ||
override def supportCodegen: Boolean = (joinType, buildSide) match { | ||
case (_: InnerLike, _) | (LeftSemi, BuildRight) | (LeftAnti, BuildRight) => true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan - updated.
val arrayIndex = ctx.freshName("arrayIndex") | ||
|
||
s""" | ||
|boolean $findMatchedRow = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good, updated.
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #136300 has finished for PR 31874 at commit
|
Refer to this link for build results (access rights to CI server needed): |
thanks, merging to master! |
Thank you @cloud-fan, @maropu and @linzebing for review! |
What changes were proposed in this pull request?
This PR is to add code-gen support for left semi / left anti BroadcastNestedLoopJoin (build side is right side). The execution code path for build left side cannot fit into whole stage code-gen framework, so only add the code-gen for build right side here.
Reference: the iterator (non-code-gen) code path is
BroadcastNestedLoopJoinExec.leftExistenceJoin()
withBuildRight
.Why are the changes needed?
Improve query CPU performance.
Tested with a simple query:
Seeing 5x run time improvement:
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Changed existing unit test in
ExistenceJoinSuite.scala
to cover all code paths:Added unit test in
WholeStageCodegenSuite.scala
to make sure code-gen for broadcast nested loop join is taking effect, and test for multiple join case as well.Example query:
Example generated code (
bnlj_doConsume_0
method):This is for left semi join. The generated code for left anti join is mostly to be same as here, except L55 to be
if (bnlj_findMatchedRow_0 == false) {
.