[SPARK-32714][PYTHON] Initial pyspark-stubs port. #29591
Test build #128074 has finished for PR 29591 at commit
Let's use SPARK-32714 for this initial port, and use SPARK-32681 as an umbrella ticket for related follow-up work.
Test build #128098 has finished for PR 29591 at commit
Test build #128106 has finished for PR 29591 at commit
Test build #128109 has finished for PR 29591 at commit
Test build #128111 has finished for PR 29591 at commit
Test build #128137 has finished for PR 29591 at commit
Update: At the moment I'm working on re-syncing
Test build #128324 has finished for PR 29591 at commit
Test build #128326 has finished for PR 29591 at commit
Update: At the moment:
and F401 (unused import) excludes on a few
From the perspective of this PR, the ideal solution would be to update the test dependencies, but I am not sure that's realistic at the moment (I hate to ask, but do you have any thoughts on this, @shaneknapp?).
Test build #128325 has finished for PR 29591 at commit
Test build #128327 has finished for PR 29591 at commit
Test build #128328 has finished for PR 29591 at commit
examples/src/main/python/ml/estimator_transformer_param_example.py
@zero323, I usually prefer not to block changes on Jenkins environment issues, so that such issues can be handled without time pressure; IIRC, @shaneknapp is fairly busy at the moment. We could work around it for now and file a separate JIRA for him about the dependency upgrade.
Agreed. I thought it was worth raising the question, since it seems we'll need some changes to the environment anyway.
Test build #128338 has finished for PR 29591 at commit
9998d0a to b9ac4f8
Test build #129005 has finished for PR 29591 at commit
retest this please
Test build #129017 has finished for PR 29591 at commit
I'm going to merge this tomorrow if there are no more comments.
Test build #129033 has finished for PR 29591 at commit
LGTM, pending both GHA and Jenkins passing.
Test build #129047 has finished for PR 29591 at commit
Merged to master.
@zero323, would you mind working on the ones below?
I think these two are important follow-ups that should be done soon.
On it.
Thanks everyone!
Thank you @zero323 for leading type hint support in PySpark.
### What changes were proposed in this pull request?

This PR:

- removes annotations for modules which are not part of the public API.
- removes `__init__.pyi` files, if no annotations, beyond exports, are present.

### Why are the changes needed?

Primarily to reduce maintenance overhead, and as requested in the comments to #29591.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests and additional MyPy checks:

```
mypy --no-incremental --config python/mypy.ini python/pyspark
MYPYPATH=python/ mypy --no-incremental --config python/mypy.ini examples/src/main/python/ml examples/src/main/python/sql examples/src/main/python/sql/streaming
```

Closes #29879 from zero323/SPARK-33002.

Authored-by: zero323 <mszymkiewicz@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
```python
def take(self, num: int) -> List[Row]: ...
def tail(self, num: int) -> List[Row]: ...
def foreach(self, f: Callable[[Row], None]) -> None: ...
def foreachPartition(self, f: Callable[[Iterator[Row]], None]) -> None: ...
```
Shouldn't this be `Iterable[Row]` instead of `Iterator[Row]`, to match https://github.com/apache/spark/pull/29591/files#diff-6349afe05d41878cc15995c96a14b011d6aef04b779e136f711eab989b71da6cR215?
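The practical difference between the two annotations can be sketched in plain Python. This is a minimal sketch, assuming a hypothetical `Row` stand-in and a hypothetical `run_partition` helper (neither is PySpark API), and assuming the partition reaches the callback as a single-pass iterator, as it does in PySpark's RDD API. A callback declared with the wider `Iterable[Row]` still satisfies `Callable[[Iterator[Row]], None]` for a type checker, because callable parameters are contravariant; at runtime, however, a callback that loops twice only sees the rows on the first pass.

```python
from typing import Callable, Iterable, Iterator, List, NamedTuple


class Row(NamedTuple):
    # Hypothetical stand-in for pyspark.sql.Row so the sketch runs without Spark.
    id: int


def run_partition(rows: List[Row], f: Callable[[Iterator[Row]], None]) -> None:
    # Hypothetical helper mimicking foreachPartition: the callback receives a
    # single-pass iterator over the partition, not a re-iterable list.
    f(iter(rows))


collected: List[int] = []
pass_sizes: List[int] = []


def collect_ids(part: Iterable[Row]) -> None:
    # Declaring the wider Iterable[Row] still type-checks as a
    # Callable[[Iterator[Row]], None], because callable parameters are
    # contravariant.
    collected.extend(r.id for r in part)


def two_passes(part: Iterable[Row]) -> None:
    # A second loop is only safe on a re-iterable collection; on the iterator
    # passed by run_partition, the second pass yields nothing.
    pass_sizes.append(sum(1 for _ in part))
    pass_sizes.append(sum(1 for _ in part))


rows = [Row(1), Row(2), Row(3)]
run_partition(rows, collect_ids)
run_partition(rows, two_passes)
```

After both calls, `collected` holds `[1, 2, 3]` and `pass_sizes` holds `[3, 0]`. The sketch does not settle which annotation the stub should use; it only shows how each choice interacts with a user-supplied callback.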
Has anyone solved the problem of type-checking PySpark code without installing the 200+ MB pyspark package? That seems to be one massive downside of having
### What changes were proposed in this pull request?

This PR proposes migration of `pyspark-stubs` into the Spark codebase.

### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?

Yes. This PR adds type annotations directly to the Spark source.
This can affect how development tools behave for users who haven't used `pyspark-stubs`.

### How was this patch tested?

- MyPy tests of the PySpark source
- MyPy tests of Spark examples
- Existing Flake8 linter
- Existing unit tests

Tested against:

- mypy==0.790+dev.e959952d9001e9713d329a2f9b196705b028f894
- mypy==0.782