Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-46692][BUILD] Load inputs.envs always in Python and Upload-related steps #44698

Closed
wants to merge 2 commits into from

Conversation

panbingkun
Copy link
Contributor

@panbingkun panbingkun commented Jan 12, 2024

What changes were proposed in this pull request?

The pr aims to fix potential environmental variable(eg: PYTHON_TO_TEST.) transfer issue in pyspark job GA

Why are the changes needed?

https://github.com/apache/spark/actions/workflows/build_python.yml
Let's take a successful build_ python Using as an example, https://github.com/apache/spark/actions/runs/7476792796/job/20348090667

1.python 3.10
Obviously, it is meaningless to enumerate the packages of Python 3.9 when building PySpark based on Python 3.10.
https://github.com/apache/spark/actions/runs/7476792796/job/20348079659
image

2.python 3.11
https://github.com/apache/spark/actions/runs/7476792796/job/20348091081
image

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass GA.

Was this patch authored or co-authored using generative AI tooling?

No.

@panbingkun panbingkun marked this pull request as ready for review January 12, 2024 02:35
@panbingkun
Copy link
Contributor Author

panbingkun commented Jan 12, 2024

@panbingkun panbingkun changed the title [SPARK-46692][BUILD][PYSPARK] Fix potential issues with environment variable transmission PYTHON_TO_TEST [SPARK-46692][BUILD] Fix potential issues with environment variable transmission PYTHON_TO_TEST Jan 12, 2024
@panbingkun
Copy link
Contributor Author

panbingkun commented Jan 12, 2024

For greater stability, we can fix this issue first, and then I will observe whether the environment variables PYTHON_TO_TEST for each step of build_python tomorrow are effective. If they are effective, we will bring https://github.com/apache/spark/pull/44662 back.

@panbingkun
Copy link
Contributor Author

panbingkun commented Jan 12, 2024

I found that in step PySpark Run test, it is used as follows:

env: ${{ fromJSON(inputs.envs) }}

@panbingkun
Copy link
Contributor Author

After this PR is merged, I will observe whether the value of the environment variable 'PYTHON_TO_TEST' in steps List Python packages (${{ env.PYTHON_TO_TEST }}), Upload test results to report and Upload unit tests log files in build_python is normal at tomorrow morning.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, this looks like the root causes.

  • Are these all instances?
  • Given that the change affects not only PYTHON_TO_TEST, but also all environment variables, please revise the PR title.

    Fix potential issues with environment variable transmission PYTHON_TO_TEST

@panbingkun panbingkun changed the title [SPARK-46692][BUILD] Fix potential issues with environment variable transmission PYTHON_TO_TEST [SPARK-46692][BUILD] Fix potential environmental variable transfer issue in pyspark job GA Jan 12, 2024
@panbingkun
Copy link
Contributor Author

Ya, this looks like the root causes.

  • Are these all instances?
  • Given that the change affects not only PYTHON_TO_TEST, but also all environment variables, please revise the PR title.

    Fix potential issues with environment variable transmission PYTHON_TO_TEST

@dongjoon-hyun
Yes, in ' pyspark job GA', I checked and found that currently only these 3 steps are affected by this issue.
After this PR is merged, I will continue to observe.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-46692][BUILD] Fix potential environmental variable transfer issue in pyspark job GA [SPARK-46692][BUILD] Load inputs.envs always in Python and Upload steps Jan 12, 2024
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-46692][BUILD] Load inputs.envs always in Python and Upload steps [SPARK-46692][BUILD] Load inputs.envs always in Python and Upload-related steps Jan 12, 2024
@panbingkun
Copy link
Contributor Author

+1, LGTM.

@dongjoon-hyun
Thank you very much for giving this PR a suitable title. 😄

If tomorrow's scheduled task meets expectations, I will provide a PR to bring https://github.com/apache/spark/pull/44662 back.

@dongjoon-hyun
Copy link
Member

Ya, +1 for the plan. Thank you.

@panbingkun
Copy link
Contributor Author

panbingkun commented Jan 12, 2024

@dongjoon-hyun
Has the scheduling started yet? I have observed some errors in the pypy3 environment, such as missing package: grpc. Is this expected?
https://github.com/apache/spark/actions/runs/7503864095/job/20429679584
image
I have observed that the environment variables PYTHON_TO_TEST are set correctly, but I am not sure about the error above

Perhaps we need to install the grpc package for Python 3.8 (pypy3)?

RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas<=2.1.4' scipy coverage matplotlib lxml

@HyukjinKwon
Copy link
Member

We should but I think that's not available. If grpc is not installed it should skip the tests properly.

@HyukjinKwon
Copy link
Member

#44715

@panbingkun
Copy link
Contributor Author

We should but I think that's not available. If grpc is not installed it should skip the tests properly.

Okay, let's continue to observe.

@09306677806
Copy link

0x161142F93e5Af29B51e7552AFbd84E1f563C8ee8

HyukjinKwon added a commit that referenced this pull request Jan 13, 2024
…logics out

### What changes were proposed in this pull request?

This PR factor Connect/non-Connect specific logics out into dedicated test classes. This PR is a followup of #40785

### Why are the changes needed?

In order to avoid test failure such as #44698 (comment) by missing dependencies

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

CI in this PR should verify it.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44715 from HyukjinKwon/SPARK-42960-followup.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
@panbingkun
Copy link
Contributor Author

@HyukjinKwon
I have observed the following errors in the environment pypy3.
https://github.com/apache/spark/actions/runs/7519959346/job/20474204069
image

Do we need to install pyarrow in the environment? Or do we ignore some UT when we lack pyarrow again?

@HyukjinKwon
Copy link
Member

pyarrow in PyPy isn't available up to my best knowledge.

@panbingkun
Copy link
Contributor Author

pyarrow in PyPy isn't available up to my best knowledge.

I tried to install 'pyarrow' in pypy3 over the weekend, but one UT failed.
https://github.com/panbingkun/spark/actions/runs/7519505235/job/20468699700
image

So should we ignore it?

@HyukjinKwon
Copy link
Member

For them, let's skip it for now, and file a JIRA for each tests.

@panbingkun
Copy link
Contributor Author

For them, let's skip it for now, and file a JIRA for each tests.

Okay.

HyukjinKwon pushed a commit that referenced this pull request Jan 17, 2024
### What changes were proposed in this pull request?
The pr aims to upgrade upload-artifact action from v3 to v4.
After PR #44698, our environment variable(`PYTHON_TO_TEST`) is correctly passed and assigned value.
We will bring back this PR: #44662

### Why are the changes needed?
- v4.0.0 release notes: https://github.com/actions/upload-artifact/releases/tag/v4.0.0
They have numerous performance and behavioral improvements.

- v3 VS v4: actions/upload-artifact@v3...v4.0.0

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #44728 from panbingkun/SPARK-46474_GO_AHEAD.

Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants