Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-46293][CONNECT][PYTHON] Use protobuf transitive dependency #44221

Closed
wants to merge 2 commits into from

Conversation

itholic
Copy link
Contributor

@itholic itholic commented Dec 7, 2023

What changes were proposed in this pull request?

This PR proposes to remove protobuf from required package.

Why are the changes needed?

protobuf is automatically installed when installing grpcio and grpcio-status, so we don't need to specify the specific version explicitly.

Does this PR introduce any user-facing change?

No API changes.

How was this patch tested?

The existing CI should pass

Was this patch authored or co-authored using generative AI tooling?

No.

@itholic itholic changed the title [SPARK-46293][CONNECT][DOCS][PYTHON] Add protobuf to required dependency for Spark Connect [SPARK-46293][CONNECT][DOCS][PYTHON] Add protobuf to required dependency for Spark Connect Dec 7, 2023
@@ -161,6 +161,7 @@ Package Supported version Note
`numpy` >=1.21 Required for pandas API on Spark and MLLib DataFrame-based API; Optional for Spark SQL
`grpcio` >=1.59.3 Required for Spark Connect
`grpcio-status` >=1.59.3 Required for Spark Connect
`protobuf` ==4.25.1 Required for Spark Connect
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The required version referred to

protobuf==4.25.1

@@ -161,6 +161,7 @@ Package Supported version Note
`numpy` >=1.21 Required for pandas API on Spark and MLLib DataFrame-based API; Optional for Spark SQL
`grpcio` >=1.59.3 Required for Spark Connect
`grpcio-status` >=1.59.3 Required for Spark Connect
`protobuf` ==4.25.1 Required for Spark Connect
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should fix https://github.com/apache/spark/blob/master/python/setup.py#L325C13-L331 too. in fact I won't add it for now because grpcio and grpcio-status requires a specific version of protobuf, and I think we won't have to duplicate the maintenance here. It is true that we directly use protobuf in few places but not really a lot of places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Then what about just removing the protobuf from dev/requirements.txt as well to avoid confusion?? As you mentioned seems like grpcio and grpcio-status automatically detect the specific version and install it when even it's not specified in dev/requirements.txt as below:

Screenshot 2023-12-07 at 9 32 06 AM

Copy link
Contributor Author

@itholic itholic Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise we can close this ticket, but just one thing I'm worry about is that if the new version of grpcio, grpcio-status requires a protobuf version other than 4.25.1, I think pinning the specific version could be a problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we check if we can remove protobuf dependency, and CI passes fine? if that works, let's remove.
If that doesn't work, we should fix the doc here and setup.py.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! Sounds reasonable to me.

@itholic itholic changed the title [SPARK-46293][CONNECT][DOCS][PYTHON] Add protobuf to required dependency for Spark Connect [WIP][SPARK-46293][CONNECT][DOCS][PYTHON] Add protobuf to required dependency for Spark Connect Dec 7, 2023
@itholic itholic marked this pull request as draft December 7, 2023 01:10
@itholic
Copy link
Contributor Author

itholic commented Dec 7, 2023

Let me mark this PR as draft until CI is done.

@itholic itholic changed the title [WIP][SPARK-46293][CONNECT][DOCS][PYTHON] Add protobuf to required dependency for Spark Connect [WIP][SPARK-46293][CONNECT][DOCS][PYTHON] Remove protobuf from required package. Dec 7, 2023
@itholic itholic marked this pull request as ready for review December 7, 2023 09:42
@itholic
Copy link
Contributor Author

itholic commented Dec 7, 2023

Okay, seems like we can just remove the dependency. Just updated the PR title & description.

@HyukjinKwon FYI

@itholic itholic changed the title [WIP][SPARK-46293][CONNECT][DOCS][PYTHON] Remove protobuf from required package. [SPARK-46293][CONNECT][DOCS][PYTHON] Remove protobuf from required package. Dec 7, 2023
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Ya, it seems to work fine with the transitive dependency.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-46293][CONNECT][DOCS][PYTHON] Remove protobuf from required package. [SPARK-46293][CONNECT][PYTHON] Remove protobuf from required package. Dec 7, 2023
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-46293][CONNECT][PYTHON] Remove protobuf from required package. [SPARK-46293][CONNECT][PYTHON] Use protobuf transitive dependency Dec 7, 2023
@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Dec 7, 2023

To @itholic and @HyukjinKwon , I revised the PR title because the original PR title is misleading. protobuf is still one of the transitive required package although we don't need to mention it in the requirements.txt file.

- [SPARK-46293][CONNECT][DOCS][PYTHON] Remove protobuf from required package.
+ [SPARK-46293][CONNECT][PYTHON] Use `protobuf` transitive dependency

@dongjoon-hyun
Copy link
Member

Merged to master for Apache Spark 4.

@itholic
Copy link
Contributor Author

itholic commented Dec 8, 2023

Makes sense to me. Thanks for revising!

dbatomic pushed a commit to dbatomic/spark that referenced this pull request Dec 11, 2023
### What changes were proposed in this pull request?

This PR proposes to remove `protobuf` from required package.

### Why are the changes needed?

`protobuf` is automatically installed when installing `grpcio` and `grpcio-status`, so we don't need to specify the specific version explicitly.

### Does this PR introduce _any_ user-facing change?

No API changes.

### How was this patch tested?

The existing CI should pass

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#44221 from itholic/protobuf_docs.

Authored-by: Haejoon Lee <haejoon.lee@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@itholic itholic deleted the protobuf_docs branch January 1, 2024 23:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants