
[SPARK-45410][INFRA] Add Python GitHub Action Daily Job #43209

Closed · wants to merge 5 commits

Conversation

dongjoon-hyun (Member) commented Oct 4, 2023

What changes were proposed in this pull request?

The Python language is becoming increasingly important, and we already have 11 Python-related test pipelines.

This PR aims to add a `Python GitHub Action Daily` job:

  • To set up a framework for adding other Python versions later.
  • To offload PyPy3 from the main PR and commit builders to this daily job.
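A daily job like this is typically driven by a `schedule` trigger that reuses the main build workflow. The sketch below is illustrative only: the workflow path, cron expression, and input keys are assumptions modeled on the snippets discussed later in this thread, not the exact file contents of this PR.

```yaml
# Hypothetical sketch of a daily Python-only workflow (names and paths assumed).
name: "Build / Python-only (master, PyPy 3.8)"

on:
  schedule:
    # Run once per day; the actual cron expression in the PR may differ.
    - cron: '0 9 * * *'

jobs:
  run-build:
    # Reuse the main build workflow, overriding the Python executables to test.
    uses: ./.github/workflows/build_and_test.yml
    with:
      hadoop: hadoop3
      envs: >-
        {
          "PYTHON_TO_TEST": "pypy3"
        }
```

Keeping the daily job as a thin wrapper over the shared workflow is what makes it cheap to add more Python versions later: each new version is just another scheduled caller with a different `envs` value.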

Why are the changes needed?

This will improve our test coverage and save a lot of time in the main builders.

Currently, we run two Python executables on every commit.

========================================================================
Running PySpark tests
========================================================================
Running PySpark tests. Output is in /__w/spark/spark/python/unit-tests.log
Will test against the following Python executables: ['python3.9', 'pypy3']

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

No.

github-actions bot added the INFRA label Oct 4, 2023
dongjoon-hyun (Member Author):

Could you review this PR, @HyukjinKwon ?

hadoop: hadoop3
envs: >-
{
"PYTHON": "pypy3,python3.8",
Member:
Wonder if we should just include Python 3.10 instead of pypy3 that's being tested with coverage at https://github.com/apache/spark/actions/workflows/build_coverage.yml (with JDK 17).

Member:
Oh, actually I see. It will offload from https://github.com/apache/spark/actions/workflows/build_coverage.yml too. Okay, the current one seems fine to me too.

Member Author:
Thank you! Yes, the goal is offloading.

# under the License.
#

name: "Python (master, Hadoop 3, JDK 17, Scala 2.13)"
Member:
Should probably add a Python version here too so we can easily distinguish it in the GitHub Actions tab.

dongjoon-hyun (Member Author) commented Oct 4, 2023:
Sure. Let me use `Python 3.8/PyPy (master)` for now because the existing title was already not visible.
(Screenshot: GitHub Actions tab, 2023-10-03)

We can revise this when we add Python 3.10/3.11/3.12 later.

@@ -447,7 +448,7 @@ jobs:
export SKIP_PACKAGING=false
echo "Python Packaging Tests Enabled!"
fi
./dev/run-tests --parallelism 1 --modules "$MODULES_TO_TEST"
./dev/run-tests --parallelism 1 --modules "$MODULES_TO_TEST" --python-executables "$PYTHON"
Member:
We have some places here referencing python3.9 to show the installed packages, and would need to fix https://github.com/apache/spark/blob/master/dev/infra/Dockerfile together so the corresponding dependencies are available in the docker image too.

Member Author:

Yes, that's the reason why this PR doesn't add Python 3.10/3.11/3.12. Currently, this PR only reuses the existing infra Dockerfile.

@@ -55,6 +55,7 @@ jobs:
runs-on: ubuntu-22.04
env:
GITHUB_PREV_SHA: ${{ github.event.before }}
PYTHON: 'python3.9'
Member:
I would use a slightly different name in case it conflicts with other env names ... e.g., PYTHON_EXECUTABLE or SPARK_TESTING_PYTHON

Member Author:
Got it. I agree with you. Let me use PYTHON_TO_TEST like MODULES_TO_TEST.
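Concretely, the rename would look roughly like the following in the workflow (a sketch only; surrounding keys are abbreviated and the exact layout in the PR may differ):

```yaml
# Sketch: PYTHON_TO_TEST mirrors the MODULES_TO_TEST naming convention,
# avoiding clashes with generic env names like PYTHON.
env:
  GITHUB_PREV_SHA: ${{ github.event.before }}
  PYTHON_TO_TEST: 'python3.9'

# ... later, in the PySpark test step, the value is forwarded to the script:
#   ./dev/run-tests --parallelism 1 --modules "$MODULES_TO_TEST" --python-executables "$PYTHON_TO_TEST"
```

Using a distinctive name also makes the variable easy to grep for across the workflow files when more Python versions are added.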

dongjoon-hyun (Member Author):

Oh, I missed that our infra Docker image has an empty Python 3.8 environment. Let me exclude CPython 3.8 from this PR for now.

dongjoon-hyun (Member Author):

All Python tests ran with python3.9 only and passed. Let me merge this~ Thank you so much.

========================================================================
Running PySpark tests
========================================================================
Running PySpark tests. Output is in /__w/spark/spark/python/unit-tests.log
Will test against the following Python executables: ['python3.9']

@dongjoon-hyun dongjoon-hyun deleted the SPARK-45410 branch October 4, 2023 21:32
LuciferYang pushed a commit to LuciferYang/spark that referenced this pull request Oct 7, 2023

Closes apache#43209 from dongjoon-hyun/SPARK-45410.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
HyukjinKwon added a commit that referenced this pull request Oct 10, 2023
…d builds in other branches

### What changes were proposed in this pull request?

This PR explicitly sets `PYTHON_TO_TEST` to an empty string for other branches so they use their default Python versions.
#43209 added a new option to the testing script, but that option does not exist in other branches, so the build fails there.
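A minimal sketch of that fallback, assuming the branch name is available as a workflow input (the exact expression used in the fix may differ):

```yaml
# Sketch only: set PYTHON_TO_TEST on master; older branches get an empty
# string so their run-tests script falls back to its default Python versions.
env:
  PYTHON_TO_TEST: ${{ inputs.branch == 'master' && 'python3.9' || '' }}
```

The `cond && a || b` pattern is the usual GitHub Actions expression idiom for a ternary; an empty string on the right keeps the older branches' invocation of `./dev/run-tests` unchanged.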

### Why are the changes needed?

To recover the build (see https://github.com/apache/spark/actions/workflows/build_branch33.yml, https://github.com/apache/spark/actions/workflows/build_branch34.yml and https://github.com/apache/spark/actions/workflows/build_branch35.yml)

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

After merging this, I will monitor the build. I locally tested the changes too.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43306 from HyukjinKwon/SPARK-45479.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
dongjoon-hyun pushed a commit that referenced this pull request Oct 18, 2023
… "Build / Python-only (master, PyPy 3.8)"

### What changes were proposed in this pull request?

This PR is a simple followup of #43209 that changes the build name from "Python - PyPy3.8 (master)" to "Build / Python-only (master, PyPy 3.8)"

### Why are the changes needed?

To move/group the build up when you click Actions (https://github.com/apache/spark/actions). For now, it looks as below:
![Screenshot 2023-10-18 at 4 16 23 PM](https://github.com/apache/spark/assets/6477701/617cb386-53b7-40e9-bfa2-3c4a72b0278f)

and you have to click and extend to see Python build:
![Screenshot 2023-10-18 at 4 16 19 PM](https://github.com/apache/spark/assets/6477701/ec258fb6-2449-4c9f-bb2e-340fc1efe78e)

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

N/A

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #43428 from HyukjinKwon/minor-rename-python.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>