-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-49306][SQL] Create new SQL functions 'zeroifnull' and 'nullifzero' #47817
Conversation
cc @HyukjinKwon @MaxGekk can one of you guys maybe help with this one :) |
@@ -384,6 +385,7 @@ object FunctionRegistry { | |||
expression[Rand]("random", true, Some("3.0.0")), | |||
expression[Randn]("randn"), | |||
expression[Stack]("stack"), | |||
expression[ZeroIfNull]("zeroifnull"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should probably add them into functions.scala and functions.py. could be done in a separate PR tho.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I will add DataFrame support in a separate PR :)
@@ -285,6 +285,48 @@ class MiscFunctionsSuite extends QueryTest with SharedSparkSession { | |||
assert(df.selectExpr("random(1)").collect() != null) | |||
assert(df.select(random(lit(1))).collect() != null) | |||
} | |||
|
|||
test("SPARK-49306 nullifzero and zeroifnull functions") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test suite is for misc functions mostly. Could you move expression tests to ConditionalExpressionSuite
and end-to-end tests to conditional-functions.sql
, please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, it turns out all these were end to end tests so I just moved them all to conditional-functions.sql
.
respond to code review comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @MaxGekk @HyukjinKwon for your reviews, please take another look.
@@ -285,6 +285,48 @@ class MiscFunctionsSuite extends QueryTest with SharedSparkSession { | |||
assert(df.selectExpr("random(1)").collect() != null) | |||
assert(df.select(random(lit(1))).collect() != null) | |||
} | |||
|
|||
test("SPARK-49306 nullifzero and zeroifnull functions") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, it turns out all these were end to end tests so I just moved them all to conditional-functions.sql
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
+1, LGTM. Merging to master. |
…l' and 'nullifzero' ### What changes were proposed in this pull request? In #47817 we added new SQL functions `zeroifnull` and `nullifzero`. In this PR we add Scala and Python DataFrame API endpoints for them. For example, in Scala: ``` var df = Seq((0)).toDF("a") df.selectExpr("nullifzero(0)").collect() > null df.select(nullifzero(lit(0))).collect() > null df.selectExpr("nullifzero(a)").collect() > null df.select(nullifzero(lit(5))).collect() > 5 df = Seq[(Integer)]((null)).toDF("a") df.selectExpr("zeroifnull(null)").collect() > 5 df.select(nullifzero(lit(null))).collect() > 0 df.selectExpr("zeroifnull(a)").collect() > 0 df.select(zeroifnull(lit(5))) > 5 ``` ### Why are the changes needed? This improves DataFrame parity with the SQL API. ### Does this PR introduce _any_ user-facing change? Yes, see above. ### How was this patch tested? This PR adds unit test coverage. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #47851 from dtenedor/dataframe-zeroifnull. Authored-by: Daniel Tenedorio <daniel.tenedorio@databricks.com> Signed-off-by: Max Gekk <max.gekk@gmail.com>
…l' and 'nullifzero' ### What changes were proposed in this pull request? In apache/spark#47817 we added new SQL functions `zeroifnull` and `nullifzero`. In this PR we add Scala and Python DataFrame API endpoints for them. For example, in Scala: ``` var df = Seq((0)).toDF("a") df.selectExpr("nullifzero(0)").collect() > null df.select(nullifzero(lit(0))).collect() > null df.selectExpr("nullifzero(a)").collect() > null df.select(nullifzero(lit(5))).collect() > 5 df = Seq[(Integer)]((null)).toDF("a") df.selectExpr("zeroifnull(null)").collect() > 5 df.select(nullifzero(lit(null))).collect() > 0 df.selectExpr("zeroifnull(a)").collect() > 0 df.select(zeroifnull(lit(5))) > 5 ``` ### Why are the changes needed? This improves DataFrame parity with the SQL API. ### Does this PR introduce _any_ user-facing change? Yes, see above. ### How was this patch tested? This PR adds unit test coverage. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #47851 from dtenedor/dataframe-zeroifnull. Authored-by: Daniel Tenedorio <daniel.tenedorio@databricks.com> Signed-off-by: Max Gekk <max.gekk@gmail.com>
…ero' ### What changes were proposed in this pull request? This PR creates new SQL functions `zeroifnull` and `nullifzero`. These are equivalent to `nvl(input, 0)` and `if(input = 0, null, input)`, respectively. The analyzer performs these replacements inline in this PR to take advantage of existing query optimizations for `nvl` and `if` expressions. For example: ``` > select zeroifnull(null); 0 > select zeroifnull(2); 2 > select nullifzero(0); null > select nullifzero(2); 2 ``` ### Why are the changes needed? We are adding these functions for compatibility with other systems which also have these SQL functions. ### Does this PR introduce _any_ user-facing change? Yes, see above. ### How was this patch tested? This PR adds unit test coverage. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#47817 from dtenedor/zeroifnull. Authored-by: Daniel Tenedorio <daniel.tenedorio@databricks.com> Signed-off-by: Max Gekk <max.gekk@gmail.com>
…l' and 'nullifzero' ### What changes were proposed in this pull request? In apache#47817 we added new SQL functions `zeroifnull` and `nullifzero`. In this PR we add Scala and Python DataFrame API endpoints for them. For example, in Scala: ``` var df = Seq((0)).toDF("a") df.selectExpr("nullifzero(0)").collect() > null df.select(nullifzero(lit(0))).collect() > null df.selectExpr("nullifzero(a)").collect() > null df.select(nullifzero(lit(5))).collect() > 5 df = Seq[(Integer)]((null)).toDF("a") df.selectExpr("zeroifnull(null)").collect() > 5 df.select(nullifzero(lit(null))).collect() > 0 df.selectExpr("zeroifnull(a)").collect() > 0 df.select(zeroifnull(lit(5))) > 5 ``` ### Why are the changes needed? This improves DataFrame parity with the SQL API. ### Does this PR introduce _any_ user-facing change? Yes, see above. ### How was this patch tested? This PR adds unit test coverage. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#47851 from dtenedor/dataframe-zeroifnull. Authored-by: Daniel Tenedorio <daniel.tenedorio@databricks.com> Signed-off-by: Max Gekk <max.gekk@gmail.com>
…ero' ### What changes were proposed in this pull request? This PR creates new SQL functions `zeroifnull` and `nullifzero`. These are equivalent to `nvl(input, 0)` and `if(input = 0, null, input)`, respectively. The analyzer performs these replacements inline in this PR to take advantage of existing query optimizations for `nvl` and `if` expressions. For example: ``` > select zeroifnull(null); 0 > select zeroifnull(2); 2 > select nullifzero(0); null > select nullifzero(2); 2 ``` ### Why are the changes needed? We are adding these functions for compatibility with other systems which also have these SQL functions. ### Does this PR introduce _any_ user-facing change? Yes, see above. ### How was this patch tested? This PR adds unit test coverage. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#47817 from dtenedor/zeroifnull. Authored-by: Daniel Tenedorio <daniel.tenedorio@databricks.com> Signed-off-by: Max Gekk <max.gekk@gmail.com>
…l' and 'nullifzero' ### What changes were proposed in this pull request? In apache#47817 we added new SQL functions `zeroifnull` and `nullifzero`. In this PR we add Scala and Python DataFrame API endpoints for them. For example, in Scala: ``` var df = Seq((0)).toDF("a") df.selectExpr("nullifzero(0)").collect() > null df.select(nullifzero(lit(0))).collect() > null df.selectExpr("nullifzero(a)").collect() > null df.select(nullifzero(lit(5))).collect() > 5 df = Seq[(Integer)]((null)).toDF("a") df.selectExpr("zeroifnull(null)").collect() > 5 df.select(nullifzero(lit(null))).collect() > 0 df.selectExpr("zeroifnull(a)").collect() > 0 df.select(zeroifnull(lit(5))) > 5 ``` ### Why are the changes needed? This improves DataFrame parity with the SQL API. ### Does this PR introduce _any_ user-facing change? Yes, see above. ### How was this patch tested? This PR adds unit test coverage. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#47851 from dtenedor/dataframe-zeroifnull. Authored-by: Daniel Tenedorio <daniel.tenedorio@databricks.com> Signed-off-by: Max Gekk <max.gekk@gmail.com>
What changes were proposed in this pull request?
This PR creates new SQL functions
zeroifnull
andnullifzero
.These are equivalent to
nvl(input, 0)
andif(input = 0, null, input)
, respectively. The analyzer performs these replacements inline in this PR to take advantage of existing query optimizations fornvl
andif
expressions.For example:
Why are the changes needed?
We are adding these functions for compatibility with other systems which also have these SQL functions.
Does this PR introduce any user-facing change?
Yes, see above.
How was this patch tested?
This PR adds unit test coverage.
Was this patch authored or co-authored using generative AI tooling?
No.