[SPARK-44836][PYTHON] Refactor Arrow Python UDTF #42520

Closed

ueshin (Member) commented Aug 16, 2023

What changes were proposed in this pull request?

Refactors Arrow Python UDTF.

Why are the changes needed?

An Arrow Python UDTF does not need to be redefined when it is created. The Arrow-specific handling can be done in worker.py instead.
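
As a rough illustration of the idea only (this is not the code in worker.py): the user's UDTF class can stay exactly as written, and a worker-side helper can drive it over a batch of columns. The names `SquareNumbers` and `evaluate_batch` below are hypothetical, and plain Python lists stand in for Arrow batches.

```python
from typing import Any, Iterator, List, Tuple


class SquareNumbers:
    """A user-defined table function: one input row may yield many output rows."""

    def eval(self, start: int, end: int) -> Iterator[Tuple[int, int]]:
        for n in range(start, end + 1):
            yield (n, n * n)


def evaluate_batch(handler_cls: type, columns: List[List[Any]]) -> List[Tuple[Any, ...]]:
    """Hypothetical worker-side helper.

    Applies the untouched handler class row by row to a column-oriented
    batch and collects every output row, so the class itself never has
    to be redefined or wrapped at creation time.
    """
    handler = handler_cls()
    out: List[Tuple[Any, ...]] = []
    # zip(*columns) turns the column-oriented batch into per-row argument tuples.
    for args in zip(*columns):
        out.extend(handler.eval(*args))
    return out


# Two input rows: (start=1, end=2) and (start=5, end=6).
rows = evaluate_batch(SquareNumbers, [[1, 5], [2, 6]])
```

The point of the sketch is only where the batching logic lives: in the helper that runs on the worker, not in the class definition.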

Does this PR introduce any user-facing change?

No.

How was this patch tested?

The existing tests.

ueshin (Member, Author) commented Aug 16, 2023

cc @dtenedor @allisonwang-db

allisonwang-db (Contributor) left a comment

Very nice refactor! Left a comment.

python/pyspark/worker.py (2 outdated review threads)

ueshin (Member, Author) commented Aug 16, 2023

For 3.5, see #42522, as this changes the Spark Connect client implementation.

allisonwang-db (Contributor) left a comment

LGTM!

dtenedor (Contributor) left a comment

Thanks, this should indeed make it easier to work with.

ueshin added a commit that referenced this pull request Aug 17, 2023
### What changes were proposed in this pull request?

This is a backport of #42520.

Refactors Arrow Python UDTF.

### Why are the changes needed?

An Arrow Python UDTF does not need to be redefined when it is created. The Arrow-specific handling can be done in `worker.py` instead.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The existing tests.

Closes #42522 from ueshin/issues/SPARK-44836/3.5/refactor_arrow_udtf.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Takuya UESHIN <ueshin@databricks.com>

ueshin (Member, Author) commented Aug 17, 2023

Thanks! Merging to master.

@ueshin ueshin closed this in 06959e2 Aug 17, 2023
wangyum pushed a commit that referenced this pull request Aug 18, 2023
…lts of the udtf tests

### What changes were proposed in this pull request?

This is a follow up for #42517.
We need to re-generate the analyzer results for udtf tests after #42519 is merged. Also updated PythonUDTFSuite after #42520 is merged.

### Why are the changes needed?

To fix test failures

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Test only change

Closes #42543 from allisonwang-db/spark-44834-fix.

Authored-by: allisonwang-db <allison.wang@databricks.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
wangyum pushed a commit that referenced this pull request Aug 18, 2023
…lts of the udtf tests

### What changes were proposed in this pull request?

This is a follow up for #42517.
We need to re-generate the analyzer results for udtf tests after #42519 is merged. Also updated PythonUDTFSuite after #42520 is merged.

### Why are the changes needed?

To fix test failures

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Test only change

Closes #42543 from allisonwang-db/spark-44834-fix.

Authored-by: allisonwang-db <allison.wang@databricks.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
(cherry picked from commit bb41cd8)
Signed-off-by: Yuming Wang <yumwang@ebay.com>
valentinp17 pushed a commit to valentinp17/spark that referenced this pull request Aug 24, 2023
### What changes were proposed in this pull request?

Refactors Arrow Python UDTF.

### Why are the changes needed?

An Arrow Python UDTF does not need to be redefined when it is created. The Arrow-specific handling can be done in `worker.py` instead.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The existing tests.

Closes apache#42520 from ueshin/issues/SPARK-44836/refactor_arrow_udtf.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Takuya UESHIN <ueshin@databricks.com>
valentinp17 pushed a commit to valentinp17/spark that referenced this pull request Aug 24, 2023
…lts of the udtf tests

### What changes were proposed in this pull request?

This is a follow up for apache#42517.
We need to re-generate the analyzer results for udtf tests after apache#42519 is merged. Also updated PythonUDTFSuite after apache#42520 is merged.

### Why are the changes needed?

To fix test failures

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Test only change

Closes apache#42543 from allisonwang-db/spark-44834-fix.

Authored-by: allisonwang-db <allison.wang@databricks.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
ragnarok56 pushed a commit to ragnarok56/spark that referenced this pull request Mar 2, 2024
### What changes were proposed in this pull request?

Refactors Arrow Python UDTF.

### Why are the changes needed?

An Arrow Python UDTF does not need to be redefined when it is created. The Arrow-specific handling can be done in `worker.py` instead.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The existing tests.

Closes apache#42520 from ueshin/issues/SPARK-44836/refactor_arrow_udtf.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Takuya UESHIN <ueshin@databricks.com>
ragnarok56 pushed a commit to ragnarok56/spark that referenced this pull request Mar 2, 2024
…lts of the udtf tests

### What changes were proposed in this pull request?

This is a follow up for apache#42517.
We need to re-generate the analyzer results for udtf tests after apache#42519 is merged. Also updated PythonUDTFSuite after apache#42520 is merged.

### Why are the changes needed?

To fix test failures

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Test only change

Closes apache#42543 from allisonwang-db/spark-44834-fix.

Authored-by: allisonwang-db <allison.wang@databricks.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>