Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-49306][SQL] Create new SQL functions 'zeroifnull' and 'nullifzero' #47817

Closed
wants to merge 3 commits into from

Conversation

dtenedor
Copy link
Contributor

@dtenedor dtenedor commented Aug 20, 2024

What changes were proposed in this pull request?

This PR creates new SQL functions zeroifnull and nullifzero.

These are equivalent to nvl(input, 0) and if(input = 0, null, input), respectively. The analyzer performs these replacements inline in this PR to take advantage of existing query optimizations for nvl and if expressions.

For example:

> select zeroifnull(null);
 0
> select zeroifnull(2);
 2

> select nullifzero(0);
 null
> select nullifzero(2);
 2

Why are the changes needed?

We are adding these functions for compatibility with other systems which also have these SQL functions.

Does this PR introduce any user-facing change?

Yes, see above.

How was this patch tested?

This PR adds unit test coverage.

Was this patch authored or co-authored using generative AI tooling?

No.

commit
@dtenedor
Copy link
Contributor Author

cc @HyukjinKwon @MaxGekk can one of you guys maybe help with this one :)

@github-actions github-actions bot added the SQL label Aug 20, 2024
@dtenedor dtenedor changed the title [SPARK-49306][SQL] Create SQL function aliases for 'zeroifnull' and 'nullifzero' [SPARK-49306][SQL] Create new SQL functions 'zeroifnull' and 'nullifzero' Aug 20, 2024
@@ -384,6 +385,7 @@ object FunctionRegistry {
expression[Rand]("random", true, Some("3.0.0")),
expression[Randn]("randn"),
expression[Stack]("stack"),
expression[ZeroIfNull]("zeroifnull"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should probably add them into functions.scala and functions.py. could be done in a separate PR tho.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I will add DataFrame support in a separate PR :)

@@ -285,6 +285,48 @@ class MiscFunctionsSuite extends QueryTest with SharedSparkSession {
assert(df.selectExpr("random(1)").collect() != null)
assert(df.select(random(lit(1))).collect() != null)
}

test("SPARK-49306 nullifzero and zeroifnull functions") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test suite is for misc functions mostly. Could you move expression tests to ConditionalExpressionSuite and end-to-end tests to conditional-functions.sql, please.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it turns out all these were end to end tests so I just moved them all to conditional-functions.sql.

respond to code review comments
Copy link
Contributor Author

@dtenedor dtenedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @MaxGekk @HyukjinKwon for your reviews, please take another look.

@@ -285,6 +285,48 @@ class MiscFunctionsSuite extends QueryTest with SharedSparkSession {
assert(df.selectExpr("random(1)").collect() != null)
assert(df.select(random(lit(1))).collect() != null)
}

test("SPARK-49306 nullifzero and zeroifnull functions") {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it turns out all these were end to end tests so I just moved them all to conditional-functions.sql.

Copy link
Contributor

@allisonwang-db allisonwang-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@dtenedor dtenedor requested a review from MaxGekk August 21, 2024 23:29
@MaxGekk
Copy link
Member

MaxGekk commented Aug 22, 2024

+1, LGTM. Merging to master.
Thank you, @dtenedor and @HyukjinKwon @allisonwang-db for review.

@MaxGekk MaxGekk closed this in ab3cd91 Aug 22, 2024
MaxGekk pushed a commit that referenced this pull request Aug 28, 2024
…l' and 'nullifzero'

### What changes were proposed in this pull request?

In #47817 we added new SQL functions `zeroifnull` and `nullifzero`.

In this PR we add Scala and Python DataFrame API endpoints for them.

For example, in Scala:

```
var df = Seq((0)).toDF("a")
df.selectExpr("nullifzero(0)").collect()
> null
df.select(nullifzero(lit(0))).collect()
> null

df.selectExpr("nullifzero(a)").collect()
> null
df.select(nullifzero(lit(5))).collect()
> 5

df = Seq[(Integer)]((null)).toDF("a")
df.selectExpr("zeroifnull(null)").collect()
> 5
df.select(nullifzero(lit(null))).collect()
> 0

df.selectExpr("zeroifnull(a)").collect()
> 0
df.select(zeroifnull(lit(5)))
> 5
```

### Why are the changes needed?

This improves DataFrame parity with the SQL API.

### Does this PR introduce _any_ user-facing change?

Yes, see above.

### How was this patch tested?

This PR adds unit test coverage.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47851 from dtenedor/dataframe-zeroifnull.

Authored-by: Daniel Tenedorio <daniel.tenedorio@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
a0x8o added a commit to a0x8o/spark that referenced this pull request Aug 28, 2024
…l' and 'nullifzero'

### What changes were proposed in this pull request?

In apache/spark#47817 we added new SQL functions `zeroifnull` and `nullifzero`.

In this PR we add Scala and Python DataFrame API endpoints for them.

For example, in Scala:

```
var df = Seq((0)).toDF("a")
df.selectExpr("nullifzero(0)").collect()
> null
df.select(nullifzero(lit(0))).collect()
> null

df.selectExpr("nullifzero(a)").collect()
> null
df.select(nullifzero(lit(5))).collect()
> 5

df = Seq[(Integer)]((null)).toDF("a")
df.selectExpr("zeroifnull(null)").collect()
> 5
df.select(nullifzero(lit(null))).collect()
> 0

df.selectExpr("zeroifnull(a)").collect()
> 0
df.select(zeroifnull(lit(5)))
> 5
```

### Why are the changes needed?

This improves DataFrame parity with the SQL API.

### Does this PR introduce _any_ user-facing change?

Yes, see above.

### How was this patch tested?

This PR adds unit test coverage.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47851 from dtenedor/dataframe-zeroifnull.

Authored-by: Daniel Tenedorio <daniel.tenedorio@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
IvanK-db pushed a commit to IvanK-db/spark that referenced this pull request Sep 20, 2024
…ero'

### What changes were proposed in this pull request?

This PR creates new SQL functions `zeroifnull` and `nullifzero`.

These are equivalent to `nvl(input, 0)` and `if(input = 0, null, input)`, respectively. The analyzer performs these replacements inline in this PR to take advantage of existing query optimizations for `nvl` and `if` expressions.

For example:

```
> select zeroifnull(null);
 0
> select zeroifnull(2);
 2

> select nullifzero(0);
 null
> select nullifzero(2);
 2
```

### Why are the changes needed?

We are adding these functions for compatibility with other systems which also have these SQL functions.

### Does this PR introduce _any_ user-facing change?

Yes, see above.

### How was this patch tested?

This PR adds unit test coverage.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47817 from dtenedor/zeroifnull.

Authored-by: Daniel Tenedorio <daniel.tenedorio@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
IvanK-db pushed a commit to IvanK-db/spark that referenced this pull request Sep 20, 2024
…l' and 'nullifzero'

### What changes were proposed in this pull request?

In apache#47817 we added new SQL functions `zeroifnull` and `nullifzero`.

In this PR we add Scala and Python DataFrame API endpoints for them.

For example, in Scala:

```
var df = Seq((0)).toDF("a")
df.selectExpr("nullifzero(0)").collect()
> null
df.select(nullifzero(lit(0))).collect()
> null

df.selectExpr("nullifzero(a)").collect()
> null
df.select(nullifzero(lit(5))).collect()
> 5

df = Seq[(Integer)]((null)).toDF("a")
df.selectExpr("zeroifnull(null)").collect()
> 5
df.select(nullifzero(lit(null))).collect()
> 0

df.selectExpr("zeroifnull(a)").collect()
> 0
df.select(zeroifnull(lit(5)))
> 5
```

### Why are the changes needed?

This improves DataFrame parity with the SQL API.

### Does this PR introduce _any_ user-facing change?

Yes, see above.

### How was this patch tested?

This PR adds unit test coverage.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47851 from dtenedor/dataframe-zeroifnull.

Authored-by: Daniel Tenedorio <daniel.tenedorio@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
…ero'

### What changes were proposed in this pull request?

This PR creates new SQL functions `zeroifnull` and `nullifzero`.

These are equivalent to `nvl(input, 0)` and `if(input = 0, null, input)`, respectively. The analyzer performs these replacements inline in this PR to take advantage of existing query optimizations for `nvl` and `if` expressions.

For example:

```
> select zeroifnull(null);
 0
> select zeroifnull(2);
 2

> select nullifzero(0);
 null
> select nullifzero(2);
 2
```

### Why are the changes needed?

We are adding these functions for compatibility with other systems which also have these SQL functions.

### Does this PR introduce _any_ user-facing change?

Yes, see above.

### How was this patch tested?

This PR adds unit test coverage.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47817 from dtenedor/zeroifnull.

Authored-by: Daniel Tenedorio <daniel.tenedorio@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
…l' and 'nullifzero'

### What changes were proposed in this pull request?

In apache#47817 we added new SQL functions `zeroifnull` and `nullifzero`.

In this PR we add Scala and Python DataFrame API endpoints for them.

For example, in Scala:

```
var df = Seq((0)).toDF("a")
df.selectExpr("nullifzero(0)").collect()
> null
df.select(nullifzero(lit(0))).collect()
> null

df.selectExpr("nullifzero(a)").collect()
> null
df.select(nullifzero(lit(5))).collect()
> 5

df = Seq[(Integer)]((null)).toDF("a")
df.selectExpr("zeroifnull(null)").collect()
> 5
df.select(nullifzero(lit(null))).collect()
> 0

df.selectExpr("zeroifnull(a)").collect()
> 0
df.select(zeroifnull(lit(5)))
> 5
```

### Why are the changes needed?

This improves DataFrame parity with the SQL API.

### Does this PR introduce _any_ user-facing change?

Yes, see above.

### How was this patch tested?

This PR adds unit test coverage.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47851 from dtenedor/dataframe-zeroifnull.

Authored-by: Daniel Tenedorio <daniel.tenedorio@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants