[SPARK-45410][INFRA] Add Python GitHub Action Daily Job #43209
Conversation
Could you review this PR, @HyukjinKwon ?
.github/workflows/build_python.yml
Outdated
```yaml
      hadoop: hadoop3
      envs: >-
        {
          "PYTHON": "pypy3,python3.8",
```
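The `envs` value above is a JSON string passed through to the reusable build workflow. As context for readers, a minimal sketch of what a scheduled daily job wired this way might look like is below; the schedule, job name, and `with:` input names are assumptions for illustration, not the exact contents of the Spark file.

```yaml
# Hypothetical sketch of a daily Python-only workflow.
# The input names (java, branch, hadoop, envs) are assumed from the
# snippet above; check build_and_test.yml for the real workflow_call inputs.
name: "Build / Python-only (master, PyPy 3.8)"

on:
  schedule:
    - cron: '0 9 * * *'   # once a day; actual cron is an assumption

jobs:
  run-build:
    # Delegate to the shared reusable workflow in the same repository.
    uses: ./.github/workflows/build_and_test.yml
    with:
      java: 17
      branch: master
      hadoop: hadoop3
      envs: >-
        {
          "PYTHON_TO_TEST": "pypy3,python3.8"
        }
```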
Wonder if we should just include Python 3.10 instead of pypy3 that's being tested with coverage at https://github.com/apache/spark/actions/workflows/build_coverage.yml (with JDK 17).
Oh, actually I see. It will offload from https://github.com/apache/spark/actions/workflows/build_coverage.yml too. Okay, the current one seems fine to me too.
Thank you! Yes, the goal is offloading.
.github/workflows/build_python.yml
Outdated
```yaml
# under the License.
#

name: "Python (master, Hadoop 3, JDK 17, Scala 2.13)"
```
Should probably add a Python version here too so we can easily distinguish it in the GitHub Actions tab.
.github/workflows/build_and_test.yml
Outdated
```diff
@@ -447,7 +448,7 @@ jobs:
       export SKIP_PACKAGING=false
       echo "Python Packaging Tests Enabled!"
     fi
-    ./dev/run-tests --parallelism 1 --modules "$MODULES_TO_TEST"
+    ./dev/run-tests --parallelism 1 --modules "$MODULES_TO_TEST" --python-executables "$PYTHON"
```
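One wrinkle discussed later in this thread (and fixed in SPARK-45479) is that `dev/run-tests` in older branches does not accept the new flag. A hedged sketch of one way to guard the invocation in shell, so an empty variable falls back to the branch default; the function below only assembles and prints the command, and variable names are taken from this thread:

```shell
# Sketch: append --python-executables only when PYTHON_TO_TEST is non-empty,
# so branches whose dev/run-tests lacks the flag keep their defaults.
# build_cmd assembles and prints the command; it does not execute it.
build_cmd() {
  cmd="./dev/run-tests --parallelism 1 --modules $MODULES_TO_TEST"
  if [ -n "${PYTHON_TO_TEST:-}" ]; then
    cmd="$cmd --python-executables $PYTHON_TO_TEST"
  fi
  printf '%s\n' "$cmd"
}
```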
We have some places here referencing `python3.9` to show the installed packages, and we would need to fix https://github.com/apache/spark/blob/master/dev/infra/Dockerfile together so the corresponding dependencies are available in the Docker image too.
Yes, that's the reason why this PR doesn't add Python 3.10/3.11/3.12. Currently, this PR only reuses the existing infra Dockerfile.
.github/workflows/build_and_test.yml
Outdated
```diff
@@ -55,6 +55,7 @@ jobs:
     runs-on: ubuntu-22.04
     env:
       GITHUB_PREV_SHA: ${{ github.event.before }}
+      PYTHON: 'python3.9'
```
I would use a slightly different name in case it conflicts with other env names, e.g., `PYTHON_EXECUTABLE` or `SPARK_TESTING_PYTHON`.
Got it. I agree with you. Let me use `PYTHON_TO_TEST`, like `MODULES_TO_TEST`.

Oh, I missed that our Infra Docker image has an empty
Force-pushed from a7be34a to 64ca325
Force-pushed from 7af8b73 to 726c1e8
Force-pushed from 726c1e8 to f2d90fa
All python tests run with
### What changes were proposed in this pull request?

Python language becomes more and more important and we have **11** Python-related test pipelines. This PR aims to add a `Python GitHub Action Daily` job:
- To set up a framework to add other Python versions later.
- To offload PyPy3 from main PR and commit builders to this Daily job.

### Why are the changes needed?

This will improve our test coverage and will save lots of time in the main builders. Currently, we are running two Python executables at every commit.

```
========================================================================
Running PySpark tests
========================================================================
Running PySpark tests. Output is in /__w/spark/spark/python/unit-tests.log
Will test against the following Python executables: ['python3.9', 'pypy3']
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#43209 from dongjoon-hyun/SPARK-45410.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
…d builds in other branches

### What changes were proposed in this pull request?

This PR explicitly sets `PYTHON_TO_TEST` to an empty string for other branches so they use their default Python versions. #43209 added a new option to the testing script, but that option does not exist in other branches, so it fails.

### Why are the changes needed?

To recover the build (see https://github.com/apache/spark/actions/workflows/build_branch33.yml, https://github.com/apache/spark/actions/workflows/build_branch34.yml and https://github.com/apache/spark/actions/workflows/build_branch35.yml).

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

After merging this, I will monitor the build. I locally tested the changes too.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43306 from HyukjinKwon/SPARK-45479.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
… "Build / Python-only (master, PyPy 3.8)"

### What changes were proposed in this pull request?

This PR is a simple followup of #43209 that changes the build name from "Python - PyPy3.8 (master)" to "Build / Python-only (master, PyPy 3.8)".

### Why are the changes needed?

To move/group the build up when you click Actions (https://github.com/apache/spark/actions). For now, it looks as below:

![Screenshot 2023-10-18 at 4 16 23 PM](https://github.com/apache/spark/assets/6477701/617cb386-53b7-40e9-bfa2-3c4a72b0278f)

and you have to click and extend to see the Python build:

![Screenshot 2023-10-18 at 4 16 19 PM](https://github.com/apache/spark/assets/6477701/ec258fb6-2449-4c9f-bb2e-340fc1efe78e)

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

N/A

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43428 from HyukjinKwon/minor-rename-python.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>