-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-33848][SQL] Push the UnaryExpression into (if / case) branches #30853
Conversation
Kubernetes integration test starting |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Kubernetes integration test status success |
Test build #133060 has finished for PR 30853 at commit
|
Test build #133061 has finished for PR 30853 at commit
|
Test build #133075 has finished for PR 30853 at commit
|
Retest this please. |
Test build #133087 has finished for PR 30853 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status success |
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #133091 has finished for PR 30853 at commit
|
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
Outdated
Show resolved
Hide resolved
Looks fine if the tests pass. |
...t/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PushFoldableIntoBranchesSuite.scala
Outdated
Show resolved
Hide resolved
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #133153 has finished for PR 30853 at commit
|
Test build #133159 has finished for PR 30853 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Thank you, @wangyum and all.
Merged to master for Apache Spark 3.2.0.
Nice, late LGTM |
@@ -542,29 +542,42 @@ object PushFoldableIntoBranches extends Rule[LogicalPlan] with PredicateHelper { | |||
|
|||
def apply(plan: LogicalPlan): LogicalPlan = plan transform { | |||
case q: LogicalPlan => q transformExpressionsUp { | |||
case a: Alias => a // Skip an alias. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alias
is not the only exception, we can't apply this optimization for Generator
as well, as the logical plan Generate
requires explicit type Generator
.
It happened many times that an optimization rule introduces bugs because it uses denylist instead of allowlist. Let's avoid similar mistakes here, and explicitly list what expressions we should support.
An initial list from my mind:
- IsNull, IsNotNull
- UnaryMathExpression
- String2StringExpression
- Cast
- BinaryComparison
- BinaryArithmetic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think BinaryExpression
also has this issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, but it's safer to start with an allowlist. We can extend it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What changes were proposed in this pull request?
This pr push the
UnaryExpression
into (if / case) branches. The use case is:Before this pr:
After this pr:
This change can also improve this case:
spark/sql/core/src/test/resources/tpcds/q62.sql
Lines 5 to 22 in a78d6ce
Why are the changes needed?
Improve query performance.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Unit test.