[SPARK-17075][SQL][followup] Add Estimation of Constant Literal #17446
Ideally, our optimizer rule
Test build #75289 has finished for PR 17446 at commit
Test build #75291 has finished for PR 17446 at commit
The logic is straightforward. LGTM.
Thank you! @ron8hu
*/
def evaluateLiteral(literal: Literal): Option[Double] = {
  literal match {
    case Literal(null, _) => Some(0.0)
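The excerpt above shows only the first case of evaluateLiteral. Here is a minimal, self-contained sketch of how a literal-selectivity function of this shape could behave. It uses a hypothetical Lit case class in place of Catalyst's Literal. The FALSE and TRUE branches are inferred from the pull request description, not copied from the merged code.

```scala
// Hypothetical stand-in for Catalyst's Literal (the data type field is omitted).
// A value of null models a SQL NULL literal.
final case class Lit(value: Any)

object LiteralSelectivity {
  /** Fraction of rows a constant filter condition keeps, if it can be estimated. */
  def evaluateLiteral(literal: Lit): Option[Double] = literal match {
    // WHERE NULL keeps no rows: a filter passes a row only when the condition is TRUE.
    case Lit(null)  => Some(0.0)
    // WHERE FALSE keeps no rows.
    case Lit(false) => Some(0.0)
    // WHERE TRUE keeps every row.
    case Lit(true)  => Some(1.0)
    // Any other constant is not a boolean condition, so give no estimate.
    case _          => None
  }
}
```

For example, evaluateLiteral(Lit(42)) returns None. Returning None instead of guessing lets the caller fall back to its default behavior when no estimate is available.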
Handling null in filter estimation is not trivial. For example, null and false returns false, while null or true returns true. If we estimate cond && null, we will report 0 selectivity, which is wrong.
I think we should eliminate null literals in the optimizer when they are involved in a filter condition.
Yes, let me close it.
not NULL = NULL
NULL or false = NULL
NULL or true = true
NULL or NULL = NULL
NULL and false = false
NULL and true = NULL
NULL and NULL = NULL
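The matrix above is SQL's three-valued logic. The minimal Scala sketch below reproduces each row. It models NULL as None in an Option[Boolean]. That representation is only for illustration and is not how Spark represents null values.

```scala
// SQL three-valued logic: Some(true), Some(false), and None (standing for NULL).
object ThreeValuedLogic {
  type SqlBool = Option[Boolean]

  // NOT NULL stays NULL; otherwise negate the boolean.
  def not(a: SqlBool): SqlBool = a.map(!_)

  def and(a: SqlBool, b: SqlBool): SqlBool = (a, b) match {
    // FALSE dominates AND, even when the other side is NULL.
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  def or(a: SqlBool, b: SqlBool): SqlBool = (a, b) match {
    // TRUE dominates OR, even when the other side is NULL.
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None
  }
}
```

A filter keeps a row only when its condition evaluates to Some(true). Both Some(false) and None drop the row, which is why a bare NULL condition keeps no rows.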
Wait... It behaves correctly, right?
Sorry, I may be missing some context, but I prefer enhancing
@wzhfy
Since this PR does not correctly handle the cases like
LGTM
Test build #75348 has finished for PR 17446 at commit
Thanks! Merging to master.
What changes were proposed in this pull request?
FalseLiteral and TrueLiteral should have been eliminated by the optimizer rule BooleanSimplification, but null literals might be added by the optimizer rule NullPropagation. For safety, our filter estimation should handle all the eligible literal cases.
Our optimizer rule BooleanSimplification is unable to remove the null literal in many cases, for example, a < 0 or null. Thus, we need to handle null literals in filter estimation.
Not can be pushed down below And and Or. We could then see two consecutive Not operators, which need to be collapsed into one. Because filter estimation supports only a limited set of expressions, we only need to handle the case Not(null), to avoid incorrect estimates caused by boolean operations on null. For details, see the NULL truth table in the conversation above.
How was this patch tested?
Added the test cases.
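The null-literal and Not(null) handling described in this pull request can be illustrated with a small, self-contained model. Everything here (Cond, Pred, NullLit, AndC, OrC, NotC, estimate) is hypothetical and much simpler than Spark's filter estimation. The And/Or formulas assume independent sub-conditions, which is a common simplification in cost-based estimators. Predicate values themselves are assumed never to be NULL.

```scala
// Hypothetical, simplified filter-condition tree (not Catalyst's Expression).
sealed trait Cond
final case class Pred(selectivity: Double) extends Cond // e.g. a < 0 with a known estimate
case object NullLit extends Cond                        // a NULL literal in the condition
final case class AndC(l: Cond, r: Cond) extends Cond
final case class OrC(l: Cond, r: Cond) extends Cond
final case class NotC(c: Cond) extends Cond

object SelectivitySketch {
  def estimate(c: Cond): Option[Double] = c match {
    case Pred(s) => Some(s)
    // A NULL condition never evaluates to TRUE, so no row passes.
    case NullLit => Some(0.0)
    // Independence assumption: P(l and r) = P(l) * P(r).
    case AndC(l, r) => for (a <- estimate(l); b <- estimate(r)) yield a * b
    // Independence assumption: P(l or r) = P(l) + P(r) - P(l) * P(r).
    case OrC(l, r) => for (a <- estimate(l); b <- estimate(r)) yield a + b - a * b
    // Two consecutive NOTs cancel: NOT NOT x = x, also when x is NULL.
    case NotC(NotC(inner)) => estimate(inner)
    // NOT NULL is still NULL, so no row passes (a naive 1 - 0 would give 1.0).
    case NotC(NullLit) => Some(0.0)
    // Compound NOT is left unsupported in this sketch.
    case NotC(_: AndC | _: OrC) => None
    // Simple NOT over a non-null predicate.
    case NotC(inner) => estimate(inner).map(1.0 - _)
  }
}
```

With Pred(0.3) standing for a < 0, the condition a < 0 or null estimates to 0.3, because the null disjunct contributes no rows. NotC(NullLit) estimates to 0.0 instead of the 1.0 that a plain 1 - p rule would report.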