Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-32714][PYTHON] Initial pyspark-stubs port. #29591

Closed
wants to merge 44 commits into from

Conversation

zero323
Copy link
Member

@zero323 zero323 commented Aug 31, 2020

What changes were proposed in this pull request?

This PR proposes migration of pyspark-stubs into Spark codebase.

Why are the changes needed?

Does this PR introduce any user-facing change?

Yes. This PR adds type annotations directly to Spark source.

This can impact interaction with development tools for users, which haven't used pyspark-stubs.

How was this patch tested?

  • MyPy tests of the PySpark source

    mypy --no-incremental --config python/mypy.ini python/pyspark
    
  • MyPy tests of Spark examples

    MYPYPATH=python/ mypy --no-incremental --config python/mypy.ini examples/src/main/python/ml examples/src/main/python/sql examples/src/main/python/sql/streaming
    
  • Existing Flake8 linter

  • Existing unit tests

Tested against:

  • mypy==0.790+dev.e959952d9001e9713d329a2f9b196705b028f894
  • mypy==0.782

@SparkQA
Copy link

SparkQA commented Aug 31, 2020

Test build #128074 has finished for PR 29591 at commit 60b126e.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon HyukjinKwon changed the title [SPARK-32681][PYTHON] Initial pyspark-stubs port. [SPARK-32714][PYTHON] Initial pyspark-stubs port. Aug 31, 2020
@HyukjinKwon
Copy link
Member

Let's use SPARK-32714 for this initial port, and use SPARK-32681 as an umbrella ticket to add other related tickets for followup works.

@SparkQA
Copy link

SparkQA commented Aug 31, 2020

Test build #128098 has finished for PR 29591 at commit e59b562.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 31, 2020

Test build #128106 has finished for PR 29591 at commit 4b9000e.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class MultilayerPerceptronClassificationSummary(_ClassificationSummary): ...
  • class MultilayerPerceptronClassificationTrainingSummary(
  • class PythonException(CapturedException): ...
  • class InheritableThread(threading.Thread):

@SparkQA
Copy link

SparkQA commented Aug 31, 2020

Test build #128109 has finished for PR 29591 at commit a936e03.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 31, 2020

Test build #128111 has finished for PR 29591 at commit 0bc7183.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 1, 2020

Test build #128137 has finished for PR 29591 at commit df12013.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zero323
Copy link
Member Author

zero323 commented Sep 3, 2020

Update

At the moment I'm working on re-syncing pyspark-stubs to reflect changes introduced by SPARK-32719 and SPARK-32319.

@SparkQA
Copy link

SparkQA commented Sep 6, 2020

Test build #128324 has finished for PR 29591 at commit 93fb711.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 6, 2020

Test build #128326 has finished for PR 29591 at commit b8d4876.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zero323
Copy link
Member Author

zero323 commented Sep 6, 2020

Update:

At the moment:

  • both MyPy and Flake8 tests pass with
  • flake8==3.8.3
  • flake8-pyi==20.5.0

and F401 (unused import) excludes on a few pyi files.

  • With flake8==3.7.* all tests pass with excludes as added to tox fiile. Additionally to F401 violations, flake8 doesn't seem to understand specific type ignores.

  • With flake8==3.5.0 there are no file specific ignores, so we get multiple failures. This could be addressed, for the time being, by either excluding problematic files from mypy or flake8 tests, and adjusting inline ignores. However, it is rather brute-force solution, and would require some discussion about the priorities.

From the perspective of this PR, an ideal solution would be an update of test dependencies, but I am not sure if that's realistic at the moment (hate to ask, but do you have any thoughts about it @shaneknapp?).

@SparkQA
Copy link

SparkQA commented Sep 6, 2020

Test build #128325 has finished for PR 29591 at commit 27409f2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 7, 2020

Test build #128327 has finished for PR 29591 at commit 1aede7c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 7, 2020

Test build #128328 has finished for PR 29591 at commit 601a577.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class _empty_cell_value(object):
  • skeleton_class = types.new_class(
  • enum_class = metacls.__new__(metacls, name, bases, classdict)
  • class CloudPickler(Pickler):
  • is_anyclass = issubclass(t, type)
  • except TypeError: # t is not a class (old Boost; see SF #502085)

dev/tox.ini Outdated Show resolved Hide resolved
@HyukjinKwon
Copy link
Member

@zero323, I usually prefer to don't block something by the env issue in Jenkins so such issue can be handled with enough time - @shaneknapp is sort of busy at this moment IIRC. We could work around for now, and file a separate JIRA for him about the dependency upgade.

@zero323
Copy link
Member Author

zero323 commented Sep 7, 2020

@zero323, I usually prefer to don't block something by the env issue in Jenkins so such issue can be handled with enough time - @shaneknapp is sort of busy at this moment IIRC. We could work around for now, and file a separate JIRA for him about the dependency upgade.

Agreed. I thought it is worth raising the question, as it seems like we'll need some changes to the environment anyway.

@SparkQA
Copy link

SparkQA commented Sep 7, 2020

Test build #128338 has finished for PR 29591 at commit 8f9ef95.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 23, 2020

Test build #129005 has finished for PR 29591 at commit b9ac4f8.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Sep 23, 2020

Test build #129017 has finished for PR 29591 at commit b9ac4f8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

I'm going to merge if there's no more comment tomorrow.

@SparkQA
Copy link

SparkQA commented Sep 23, 2020

Test build #129033 has finished for PR 29591 at commit fab00f1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@holdenk
Copy link
Contributor

holdenk commented Sep 23, 2020

LGTM pending passing both GHA and Jenkins.

@SparkQA
Copy link

SparkQA commented Sep 23, 2020

Test build #129047 has finished for PR 29591 at commit fab00f1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Merged to master.

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Sep 24, 2020

@zero323 mind working on the below ones?

  • writing the guidelines in the doc
  • removing non-API type hints

I think these two are pretty important followups to be done soon ..

@zero323
Copy link
Member Author

zero323 commented Sep 24, 2020

@zero323 mind working on the below ones?

On it.

@zero323
Copy link
Member Author

zero323 commented Sep 24, 2020

Thanks everyone!

@zero323 zero323 deleted the SPARK-32681 branch September 24, 2020 06:07
@HyukjinKwon
Copy link
Member

Thank you @zero323 for leading type hint support in PySpark.

HyukjinKwon pushed a commit that referenced this pull request Oct 7, 2020
### What changes were proposed in this pull request?

This PR:

- removes annotations for modules which are not part of the public API.
- removes `__init__.pyi` files, if no annotations, beyond exports, are present.

### Why are the changes needed?

Primarily to reduce maintenance overhead and as requested in the comments to #29591

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests and additional MyPy checks:

```
mypy --no-incremental --config python/mypy.ini python/pyspark
MYPYPATH=python/ mypy --no-incremental --config python/mypy.ini examples/src/main/python/ml examples/src/main/python/sql examples/src/main/python/sql/streaming
```

Closes #29879 from zero323/SPARK-33002.

Authored-by: zero323 <mszymkiewicz@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
def take(self, num: int) -> List[Row]: ...
def tail(self, num: int) -> List[Row]: ...
def foreach(self, f: Callable[[Row], None]) -> None: ...
def foreachPartition(self, f: Callable[[Iterator[Row]], None]) -> None: ...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kilo59
Copy link

Kilo59 commented Feb 25, 2023

Has anyone solved the problem of trying to type-check pyspark code without installing the 200+MB pyspark package?

That seems to be one massive downside of having pyspark provide its own stubs as opposed to them being part of type-shed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants