[SPARK-49306][PYTHON][SQL] Create DataFrame API support for new 'zeroifnull' and 'nullifzero' SQL functions #47851

Closed
wants to merge 9 commits into from

dtenedor
Contributor

@dtenedor dtenedor commented Aug 23, 2024

What changes were proposed in this pull request?

In #47817 we added new SQL functions zeroifnull and nullifzero.

In this PR we add Scala and Python DataFrame API endpoints for them.

For example, in Scala:

var df = Seq((0)).toDF("a")
df.selectExpr("nullifzero(0)").collect()
> null
df.select(nullifzero(lit(0))).collect()
> null

df.selectExpr("nullifzero(a)").collect()
> null
df.select(nullifzero(lit(5))).collect()
> 5

df = Seq[(Integer)]((null)).toDF("a")
df.selectExpr("zeroifnull(null)").collect()
> 0
df.select(zeroifnull(lit(null))).collect()
> 0

df.selectExpr("zeroifnull(a)").collect()
> 0
df.select(zeroifnull(lit(5))).collect()
> 5
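
The intended semantics of the two functions can be modeled in a few lines of plain Python. This is only a sketch for illustration, not the Spark implementation and not the PySpark API: it assumes SQL NULL is represented by Python's `None`, and the two helper functions below are hypothetical stand-ins that merely share the names of the SQL functions.

```python
from typing import Optional


def nullifzero(value: Optional[int]) -> Optional[int]:
    """Hypothetical model of SQL nullifzero: 0 becomes NULL (None here)."""
    # Any other input, including NULL itself, passes through unchanged.
    return None if value == 0 else value


def zeroifnull(value: Optional[int]) -> int:
    """Hypothetical model of SQL zeroifnull: NULL (None here) becomes 0."""
    # Any non-NULL input passes through unchanged.
    return 0 if value is None else value


# The same four cases as the Scala example above.
print(nullifzero(0))     # None
print(nullifzero(5))     # 5
print(zeroifnull(None))  # 0
print(zeroifnull(5))     # 5
```

Under this model the two functions are not inverses of each other: `zeroifnull(nullifzero(0))` gives back `0`, but `nullifzero(zeroifnull(None))` gives back `None`, so a round trip only preserves values that are already 0 or NULL.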

Why are the changes needed?

This improves DataFrame parity with the SQL API.

Does this PR introduce any user-facing change?

Yes, see above.

How was this patch tested?

This PR adds unit test coverage.

Was this patch authored or co-authored using generative AI tooling?

No.

@dtenedor
Contributor Author

cc @HyukjinKwon @MaxGekk

@HyukjinKwon HyukjinKwon changed the title [SPARK-49306][Python] Create SQL function aliases for 'zeroifnull' and 'nullifzero' [SPARK-49306][PYTHON][SQL] Create SQL function aliases for 'zeroifnull' and 'nullifzero' Aug 23, 2024
Contributor Author

@dtenedor dtenedor left a comment

Thanks @MaxGekk @HyukjinKwon @allisonwang-db for your reviews. I have responded to your comments; please take another look.

python/pyspark/sql/functions/builtin.py (outdated, resolved)
python/pyspark/sql/functions/builtin.py (outdated, resolved)
@github-actions github-actions bot added the CORE label Aug 28, 2024
@dtenedor dtenedor requested a review from MaxGekk August 28, 2024 01:53
+------+
|result|
+------+
| None|
Member

Why do you expect None if the function nullifzero() should return NULL for 0?

Contributor Author

Fixed this; it was indeed a typo and should say NULL instead of None.

@dtenedor dtenedor requested a review from MaxGekk August 28, 2024 16:36
@MaxGekk
Member

MaxGekk commented Aug 28, 2024

+1, LGTM. Merging to master.
Thank you, @dtenedor and @HyukjinKwon for review.

@MaxGekk MaxGekk closed this in a3cb064 Aug 28, 2024
@dtenedor dtenedor changed the title [SPARK-49306][PYTHON][SQL] Create SQL function aliases for 'zeroifnull' and 'nullifzero' [SPARK-49306][PYTHON][SQL] Create DataFrame API support for new 'zeroifnull' and 'nullifzero' SQL functions Sep 16, 2024
IvanK-db pushed a commit to IvanK-db/spark that referenced this pull request Sep 20, 2024
…l' and 'nullifzero'

### What changes were proposed in this pull request?

In apache#47817 we added new SQL functions `zeroifnull` and `nullifzero`.

In this PR we add Scala and Python DataFrame API endpoints for them.

For example, in Scala:

```
var df = Seq((0)).toDF("a")
df.selectExpr("nullifzero(0)").collect()
> null
df.select(nullifzero(lit(0))).collect()
> null

df.selectExpr("nullifzero(a)").collect()
> null
df.select(nullifzero(lit(5))).collect()
> 5

df = Seq[(Integer)]((null)).toDF("a")
df.selectExpr("zeroifnull(null)").collect()
> 0
df.select(zeroifnull(lit(null))).collect()
> 0

df.selectExpr("zeroifnull(a)").collect()
> 0
df.select(zeroifnull(lit(5))).collect()
> 5
```

### Why are the changes needed?

This improves DataFrame parity with the SQL API.

### Does this PR introduce _any_ user-facing change?

Yes, see above.

### How was this patch tested?

This PR adds unit test coverage.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47851 from dtenedor/dataframe-zeroifnull.

Authored-by: Daniel Tenedorio <daniel.tenedorio@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
…l' and 'nullifzero'
4 participants