Allow returning an EmptyHashedRelation when a broadcast result is empty [databricks] #4256
Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/execution/GpuBroadcastHelper.scala
Ok |
 * @param broadcastPlan - the SparkPlan to use to obtain the schema for the broadcast
 *                        batch
Passing an entire plan just to get the schema is very heavyweight. This should simply take a schema parameter.
This should be fixed now.
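To illustrate the review suggestion above, here is a minimal sketch of a helper that accepts only a schema rather than a whole plan. Every type and name in it (`Field`, `Schema`, `PlanNode`, `EmptyBatch`, `emptyBatchFor`) is a hypothetical stand-in invented for this example, not the plugin's or Spark's actual API.

```scala
// Hypothetical stand-ins; not the plugin's or Spark's real types.
final case class Field(name: String, dataType: String)
final case class Schema(fields: Seq[Field])

// Stand-in for a physical plan node: it carries far more than a schema.
final case class PlanNode(output: Schema, children: Seq[PlanNode])

// Stand-in for a columnar batch that has columns but no rows.
final case class EmptyBatch(schema: Schema) {
  val numRows: Int = 0
  def numCols: Int = schema.fields.length
}

object BroadcastHelperSketch {
  // Takes only what it needs: the schema of the batch to build.
  def emptyBatchFor(schema: Schema): EmptyBatch = EmptyBatch(schema)
}

object SchemaParamDemo {
  def run(): EmptyBatch = {
    val plan = PlanNode(
      Schema(Seq(Field("id", "bigint"), Field("name", "string"))),
      Nil)
    // The caller extracts the schema once and passes just that along.
    BroadcastHelperSketch.emptyBatchFor(plan.output)
  }
}
```

With this shape the helper has no dependency on the plan type at all, so it can be called from any site that already knows the schema.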
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/execution/GpuBroadcastHelper.scala
sql-plugin/src/main/31xdb/scala/com/nvidia/spark/rapids/shims/v2/GpuBroadcastHashJoinExec.scala
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/execution/GpuBroadcastExchangeExec.scala
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/execution/GpuBroadcastExchangeExec.scala
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/execution/GpuBroadcastExchangeExec.scala
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/execution/GpuBroadcastExchangeExec.scala
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/execution/GpuBroadcastHelper.scala
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/execution/GpuBroadcastHelper.scala
...rc/main/scala/org/apache/spark/sql/rapids/execution/GpuBroadcastNestedLoopJoinExecBase.scala
860e35c to 3f31a67
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
build
...in/src/main/311+-nondb/scala/com/nvidia/spark/rapids/shims/v2/GpuBroadcastHashJoinExec.scala
Note that: f62167c adds That said I am not entirely sure yet of the order of things, so I am not 100% there yet.
I've been reading more about this and I think it makes sense now. Yes Given the optimization, we are looking at the
build
build
build
This is broken in databricks:
I was expecting to be able to transform the I am looking into it further.
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/execution/GpuBroadcastExchangeExec.scala
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/execution/GpuBroadcastHelper.scala
build
Thanks @jlowe, I believe I have addressed the comments.
@revans2 this is ready for another look when you get a chance.
Yes, that is correct. I ran some tests to confirm this.
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/execution/GpuBroadcastHelper.scala
build
Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
Closes #4134.
The PR allows the broadcast exchange to produce an `EmptyHashedRelation`, or an empty array in the case of the identity broadcast, so that AQE's `EliminateJoinToEmptyRelation` rule can optimize the plan. This massively changes the way we run q16, but it lets us match what the CPU does.

In terms of performance, I ran this at 3TB and Q16 now takes ~12 seconds, which is slightly faster than the CPU (and 3.5x faster than what we had before on the GPU). I don't see regressions with other queries.
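The idea described above can be sketched as a toy model: the broadcast side reports an explicit "empty" marker, and a rewrite rule uses that marker to replace the join with an empty relation. This is only an illustration under simplifying assumptions. All types and names below are hypothetical, it covers only inner and left semi joins, and it is not the actual Spark rule or the plugin's code.

```scala
// Toy model only: these types are hypothetical, not Spark's or the plugin's.
sealed trait BroadcastResult
case object EmptyRelation extends BroadcastResult
final case class Rows(data: Vector[Map[String, Any]]) extends BroadcastResult

sealed trait JoinType
case object Inner extends JoinType
case object LeftSemi extends JoinType
case object LeftOuter extends JoinType

sealed trait Plan
case object EmptyLocalRelation extends Plan
final case class Scan(table: String) extends Plan
final case class BroadcastJoin(
    joinType: JoinType,
    stream: Plan,
    build: BroadcastResult) extends Plan

object EmptyBroadcastSketch {
  // The exchange returns an explicit marker instead of an empty data set,
  // so downstream rules can recognize emptiness without inspecting data.
  def toBroadcastResult(rows: Vector[Map[String, Any]]): BroadcastResult =
    if (rows.isEmpty) EmptyRelation else Rows(rows)

  // An inner or left semi join against an empty build side yields no rows,
  // so the whole join can be replaced by an empty relation.
  def eliminateJoin(plan: Plan): Plan = plan match {
    case BroadcastJoin(Inner | LeftSemi, _, EmptyRelation) => EmptyLocalRelation
    case other => other
  }
}
```

In this sketch a left outer join is left untouched, because its stream-side rows still appear in the output even when the build side is empty.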
This PR enables `isFoldableNonLitAllowed` for `UnaryExprMeta` so that expressions like `cast(null as bigint)` can be handled. These casts show up in the empty projection that results when the AQE rule removes the join. `ConstantFolding` does not re-execute as part of AQE, so the casts are left in the plan. Both tests I have added generate plans containing these casts under AQE.
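As a rough illustration of what "foldable but not a literal" means for an expression like `cast(null as bigint)`, here is a small toy expression tree. The types and helpers (`Expr`, `Literal`, `ColumnRef`, `CastToLong`, `isFoldableNonLit`, `fold`) are hypothetical and exist only for this sketch. They are not the plugin's metadata classes or Spark's Catalyst expressions.

```scala
// Toy expression tree; hypothetical types, not Catalyst's.
sealed trait Expr { def foldable: Boolean }

// None models a typed null literal.
final case class Literal(value: Option[Long]) extends Expr {
  val foldable: Boolean = true
}

final case class ColumnRef(name: String) extends Expr {
  val foldable: Boolean = false
}

// A cast is foldable whenever its input is.
final case class CastToLong(child: Expr) extends Expr {
  def foldable: Boolean = child.foldable
}

object FoldableSketch {
  // True for expressions that could be constant-folded but have not been.
  def isFoldableNonLit(e: Expr): Boolean = e match {
    case _: Literal => false
    case other => other.foldable
  }

  // What a constant-folding pass would do in this toy model:
  // a cast over a long (or null) literal collapses to that literal.
  def fold(e: Expr): Expr = e match {
    case CastToLong(l: Literal) => l
    case CastToLong(child) if child.foldable => fold(CastToLong(fold(child)))
    case other => other
  }
}
```

If the folding pass never runs again, as described above for AQE, the `CastToLong(Literal(None))` node stays in the plan, and whatever consumes the plan has to accept it as-is.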