[SPARK-35721][PYTHON] Path level discover for python unittests #32867

Closed
wants to merge 12 commits into from


Yikun
Member

@Yikun Yikun commented Jun 10, 2021

What changes were proposed in this pull request?

Add path-level discovery for Python unit tests.

Why are the changes needed?

Currently, every new Python test case has to be added to the module list by hand. If we forget to add it, the test case is never executed.

Such as:

  • pyspark-core pyspark.tests.test_pin_thread

Thus we need a way to auto-discover all test cases rather than specifying each one manually.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Add the code below to the end of dev/sparktestsupport/modules.py:

for m in sorted(all_modules):
    for g in sorted(m.python_test_goals):
        print(m.name, g)

Compare the result before and after:
https://www.diffchecker.com/iO3FvhKL
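The same before/after comparison could also be done locally instead of on a diff site. The sketch below assumes the printed lines have already been collected into lists of (module name, goal) pairs; the helper name and the sample data are illustrative only and are not the real output of modules.py.

```python
def diff_goals(before, after):
    """Return (removed, added) as sorted lists of (module, goal) pairs."""
    before_set, after_set = set(before), set(after)
    return sorted(before_set - after_set), sorted(after_set - before_set)


# Illustrative data only, not the real output of modules.py.
before = [
    ("pyspark-pandas-slow", "pyspark.pandas.tests.test_series"),
    ("pyspark-pandas-slow", "pyspark.pandas.tests.test_stats"),
]
after = before + [("pyspark-core", "pyspark.tests.test_pin_thread")]

removed, added = diff_goals(before, after)
print("removed:", removed)
print("added:", added)
```

An empty "removed" list is the property that matters here: discovery may add goals that were previously forgotten, but it should never drop one.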

@Yikun
Member Author

Yikun commented Jun 10, 2021

cc @HyukjinKwon @xinrong-databricks

It would be good if you could give some input on this.

@SparkQA

SparkQA commented Jun 10, 2021

Test build #139642 has finished for PR 32867 at commit be3877a.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • file2class = os.path.relpath(f, pyspark_path)[:-3].replace("/", ".")

@github-actions github-actions bot added the BUILD label Jun 10, 2021
@SparkQA

SparkQA commented Jun 10, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44170/

@SparkQA

SparkQA commented Jun 10, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44170/

@Yikun Yikun force-pushed the SPARK_DISCOVER_TEST branch from be3877a to 88b252d Compare June 11, 2021 01:55
@Yikun Yikun force-pushed the SPARK_DISCOVER_TEST branch from 88b252d to 5a7e6e2 Compare June 11, 2021 02:33
@SparkQA

SparkQA commented Jun 11, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44195/

@SparkQA

SparkQA commented Jun 11, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44195/

@SparkQA

SparkQA commented Jun 11, 2021

Test build #139670 has finished for PR 32867 at commit 5a7e6e2.

  • This patch fails PySpark pip packaging tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@Yikun Yikun changed the title [SPARK-XXXX][PYTHON] path level discover for python unittests [SPARK-35721][PYTHON] Path level discover for python unittests Jun 11, 2021
@Yikun Yikun marked this pull request as ready for review June 11, 2021 05:07
@SparkQA

SparkQA commented Jun 11, 2021

Test build #139667 has finished for PR 32867 at commit 88b252d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xinrong-meng
Member

CC @ueshin @itholic

@SparkQA

SparkQA commented Jun 15, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44317/

@SparkQA

SparkQA commented Jun 15, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44317/

@SparkQA

SparkQA commented Jun 15, 2021

Test build #139791 has finished for PR 32867 at commit 23f2c87.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 16, 2021

Test build #139854 has finished for PR 32867 at commit eb835c5.

  • This patch fails executing the dev/run-tests script.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • for _, _class in inspect.getmembers(_module, inspect.isclass):

@Yikun Yikun force-pushed the SPARK_DISCOVER_TEST branch from eb835c5 to 9f4388a Compare June 16, 2021 08:11
@SparkQA

SparkQA commented Jun 16, 2021

Test build #139862 has finished for PR 32867 at commit 9f4388a.

  • This patch fails executing the dev/run-tests script.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • for _, _class in inspect.getmembers(_module, inspect.isclass):

@Yikun Yikun force-pushed the SPARK_DISCOVER_TEST branch from 9f4388a to 22015a5 Compare June 16, 2021 08:41
@SparkQA

SparkQA commented Jun 16, 2021

Test build #139865 has finished for PR 32867 at commit 22015a5.

  • This patch fails executing the dev/run-tests script.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • for _, _class in inspect.getmembers(_module, inspect.isclass):

@SparkQA

SparkQA commented Jun 16, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44383/

@HyukjinKwon
Member

BTW, this change will likely affect many other vendors who maintain their forks, so I will take a close look a few more times. Thanks for bearing with me in advance .. ;-).

@Yikun
Member Author

Yikun commented Jun 28, 2021

BTW, this change will likely affect many other vendors who maintain their forks, so I will take a close look a few more times. Thanks for bearing with me in advance .. ;-).

Sure, thanks for your patient review!

Member

@viirya viirya left a comment

Looks okay.

@HyukjinKwon
Member

@Yikun sorry but mind resolving the conflicts please?

@HyukjinKwon
Member

Looks pretty good to me too.

@viirya
Member

viirya commented Jun 29, 2021

Just wondering: can we only verify it manually? If by any chance some tests are not found, can we easily know it?

@Yikun
Member Author

Yikun commented Jun 29, 2021

Just wondering: can we only verify it manually? If by any chance some tests are not found, can we easily know it?

It is the same as before: if we forget to add the path of some tests, those tests will not be run.

But compared to the original implementation, path-level discovery removes the redundant work of adding test file names one by one, and to some degree it also reduces the chance of forgetting to add a test.

@HyukjinKwon
Member

Merged to master, thanks @Yikun

@ueshin
Member

ueshin commented Jun 29, 2021

I'm sorry for the late review, but is this triggering unit tests? I don't see any test cases running.

========================================================================
Running PySpark tests
========================================================================
Running PySpark tests. Output is in /__w/spark/spark/python/unit-tests.log
Will test against the following Python executables: ['python3.6', 'python3.9', 'pypy3']
Will test the following Python modules: ['pyspark-pandas-slow']
python3.6 python_implementation is CPython
python3.6 version is: Python 3.6.13
python3.9 python_implementation is CPython
python3.9 version is: Python 3.9.5
pypy3 python_implementation is PyPy
pypy3 version is: Python 3.7.10 (7.3.5+dfsg-1~ppa1~ubuntu20.04, May 23 2021, 14:57:07)
[PyPy 7.3.5 with GCC 9.3.0]
Starting test(python3.6): pyspark.pandas.generic
Starting test(python3.6): pyspark.pandas.frame
Finished test(python3.6): pyspark.pandas.generic (127s)
Starting test(python3.6): pyspark.pandas.series
Finished test(python3.6): pyspark.pandas.series (197s)
Starting test(python3.9): pyspark.pandas.frame
Finished test(python3.6): pyspark.pandas.frame (465s)
Starting test(python3.9): pyspark.pandas.generic
Finished test(python3.9): pyspark.pandas.generic (120s)
Starting test(python3.9): pyspark.pandas.series
Finished test(python3.9): pyspark.pandas.frame (440s)
Finished test(python3.9): pyspark.pandas.series (179s)
Tests passed in 766 seconds
========================================================================
Running PySpark tests
========================================================================
Running PySpark tests. Output is in /__w/spark/spark/python/unit-tests.log
Will test against the following Python executables: ['python3.6', 'python3.9', 'pypy3']
Will test the following Python modules: ['pyspark-pandas-slow']
python3.6 python_implementation is CPython
python3.6 version is: Python 3.6.13
python3.9 python_implementation is CPython
python3.9 version is: Python 3.9.5
pypy3 python_implementation is PyPy
pypy3 version is: Python 3.7.10 (7.3.5+dfsg-1~ppa1~ubuntu20.04, May 23 2021, 14:57:07)
[PyPy 7.3.5 with GCC 9.3.0]
Starting test(python3.6): pyspark.pandas.tests.indexes.test_base
Starting test(python3.6): pyspark.pandas.tests.indexes.test_datetime
Finished test(python3.6): pyspark.pandas.tests.indexes.test_datetime (262s)
Starting test(python3.6): pyspark.pandas.tests.test_dataframe
Finished test(python3.6): pyspark.pandas.tests.indexes.test_base (301s)
Starting test(python3.6): pyspark.pandas.tests.test_groupby
Finished test(python3.6): pyspark.pandas.tests.test_dataframe (394s) ... 3 tests were skipped
Starting test(python3.6): pyspark.pandas.tests.test_indexing
Finished test(python3.6): pyspark.pandas.tests.test_indexing (180s) ... 5 tests were skipped
Starting test(python3.6): pyspark.pandas.tests.test_ops_on_diff_frames
Finished test(python3.6): pyspark.pandas.tests.test_groupby (559s)
Starting test(python3.6): pyspark.pandas.tests.test_ops_on_diff_frames_groupby
Finished test(python3.6): pyspark.pandas.tests.test_ops_on_diff_frames_groupby (202s)
Starting test(python3.6): pyspark.pandas.tests.test_series
Finished test(python3.6): pyspark.pandas.tests.test_ops_on_diff_frames (392s) ... 2 tests were skipped
Starting test(python3.6): pyspark.pandas.tests.test_stats
Finished test(python3.6): pyspark.pandas.tests.test_stats (102s)
Starting test(python3.9): pyspark.pandas.tests.indexes.test_base
Finished test(python3.6): pyspark.pandas.tests.test_series (301s) ... 2 tests were skipped
Starting test(python3.9): pyspark.pandas.tests.indexes.test_datetime
Finished test(python3.9): pyspark.pandas.tests.indexes.test_base (270s)
Starting test(python3.9): pyspark.pandas.tests.test_dataframe
Finished test(python3.9): pyspark.pandas.tests.indexes.test_datetime (243s)
Starting test(python3.9): pyspark.pandas.tests.test_groupby
Finished test(python3.9): pyspark.pandas.tests.test_dataframe (368s) ... 1 tests were skipped
Starting test(python3.9): pyspark.pandas.tests.test_indexing
Finished test(python3.9): pyspark.pandas.tests.test_groupby (514s)
Starting test(python3.9): pyspark.pandas.tests.test_ops_on_diff_frames
Finished test(python3.9): pyspark.pandas.tests.test_indexing (200s) ... 5 tests were skipped
Starting test(python3.9): pyspark.pandas.tests.test_ops_on_diff_frames_groupby
Finished test(python3.9): pyspark.pandas.tests.test_ops_on_diff_frames_groupby (193s)
Starting test(python3.9): pyspark.pandas.tests.test_series
Finished test(python3.9): pyspark.pandas.tests.test_ops_on_diff_frames (430s)
Starting test(python3.9): pyspark.pandas.tests.test_stats
Finished test(python3.9): pyspark.pandas.tests.test_series (291s) ... 1 tests were skipped
Starting test(python3.6): pyspark.pandas.frame
Finished test(python3.9): pyspark.pandas.tests.test_stats (104s)
Starting test(python3.6): pyspark.pandas.generic
Finished test(python3.6): pyspark.pandas.generic (113s)
Starting test(python3.6): pyspark.pandas.series
Finished test(python3.6): pyspark.pandas.series (162s)
Starting test(python3.9): pyspark.pandas.frame
Finished test(python3.6): pyspark.pandas.frame (386s)
Starting test(python3.9): pyspark.pandas.generic
Finished test(python3.9): pyspark.pandas.generic (102s)
Starting test(python3.9): pyspark.pandas.series
Finished test(python3.9): pyspark.pandas.series (152s)
Finished test(python3.9): pyspark.pandas.frame (364s)
Tests passed in 3301 seconds
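In the two logs above, the first run lists only pyspark.pandas.frame, pyspark.pandas.generic and pyspark.pandas.series, while the second also lists pyspark.pandas.tests.* goals. A check like this could be automated. The following is only a sketch: the regular expression is inferred from the "Starting test(...)" lines in these logs, and the helper names are hypothetical.

```python
import re

# Pattern inferred from the log lines above, e.g.
# "Starting test(python3.6): pyspark.pandas.tests.test_series"
START_RE = re.compile(r"^Starting test\((?P<py>[^)]+)\): (?P<goal>\S+)$")


def started_goals(log_text):
    """Collect the distinct test goals that a run-tests log reports as started."""
    goals = set()
    for line in log_text.splitlines():
        match = START_RE.match(line.strip())
        if match:
            goals.add(match.group("goal"))
    return goals


def has_unit_tests(goals):
    """Heuristic: unit-test goals live under a ".tests." package."""
    return any(".tests." in goal for goal in goals)


first_run = """\
Starting test(python3.6): pyspark.pandas.generic
Starting test(python3.6): pyspark.pandas.frame
Finished test(python3.6): pyspark.pandas.generic (127s)
"""
second_run = """\
Starting test(python3.6): pyspark.pandas.tests.indexes.test_base
Starting test(python3.6): pyspark.pandas.frame
"""

print(has_unit_tests(started_goals(first_run)))
print(has_unit_tests(started_goals(second_run)))
```

A run in which has_unit_tests returns False for a module that is known to contain unit tests would be a signal that discovery came back with an incomplete list.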

@viirya
Member

viirya commented Jun 29, 2021

Ur, this might be what I said before, how we make sure the tests are run...

@Yikun Can you check it again?

@ueshin
Member

ueshin commented Jun 29, 2021

Let me revert this for now.

@ueshin
Member

ueshin commented Jun 29, 2021

Reverted. 1f6e2f5
@Yikun Could you submit another PR to fix it? Thanks.

@HyukjinKwon
Member

Uhoh, thanks @ueshin for reverting this.

@Yikun
Member Author

Yikun commented Jun 30, 2021

Thanks for reverting it. I'm so sorry for the trouble; I will recheck and fix it soon.

Yikun added a commit to Yikun/spark that referenced this pull request Jun 30, 2021
### What changes were proposed in this pull request?
Add path level discover for python unittests.

### Why are the changes needed?
Now we need to specify the python test cases by manually when we add a new testcase. Sometime, we forgot to add the testcase to module list, the testcase would not be executed.

Such as:
- pyspark-core pyspark.tests.test_pin_thread

Thus we need some auto-discover way to find all testcase rather than specified every case by manually.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Add below code in end of `dev/sparktestsupport/modules.py`
```python
for m in sorted(all_modules):
    for g in sorted(m.python_test_goals):
        print(m.name, g)
```
Compare the result before and after:
https://www.diffchecker.com/iO3FvhKL

Closes apache#32867 from Yikun/SPARK_DISCOVER_TEST.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
@Yikun
Member Author

Yikun commented Jul 1, 2021

The root cause of the failed discovery is that the dependencies of the PySpark modules were not installed, so discovery produced the wrong list.

I re-proposed it in #33174. The main changes:

  • Add an error check to discovery to make sure it works, so that an error like the one in this PR raises an exception.
  • Move discovery from dev/run-tests.py to python/run-tests.py, so that other modules do not need to discover Python tests or install Python test dependencies.
  • Add doctests to make sure discovery works as expected.

Thank you for your patience. I will fight this test discovery to the end to make up for my mistake. : )
