[SPARK-48710][PYTHON][3.5] Limit NumPy version to supported range (>=1.15,<2) #47175


codesorcery
Contributor

What changes were proposed in this pull request?

  • Add a constraint for numpy<2 to the PySpark package

Why are the changes needed?

PySpark references some NumPy APIs that were removed in NumPy 2.0. Thus, if numpy>=2 is installed, executing PySpark may fail.

#47083 updates the master branch to be compatible with NumPy 2. This PR adds a version bound for older releases, to which that change won't be applied.
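To illustrate what the supported range >=1.15,<2 means, here is a minimal sketch that checks a NumPy version string against it using only the standard library. The helper names (`release_tuple`, `numpy_version_supported`, `check_installed_numpy`) are hypothetical: this PR enforces the bound through packaging metadata, not through a runtime check like this one. The parser compares only the numeric release segment and ignores pre-release and local suffixes, so it is a simplification of full PEP 440 handling.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Supported range from this PR's title: >=1.15 (inclusive), <2 (exclusive).
MIN_NUMPY = (1, 15)
MAX_NUMPY_EXCLUSIVE = (2,)


def release_tuple(ver: str) -> tuple:
    """Parse the leading numeric release segment, e.g. '1.26.4' -> (1, 26, 4).

    Suffixes such as 'rc1' or '+local' are ignored (hypothetical helper,
    not a full PEP 440 parser).
    """
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def numpy_version_supported(ver: str) -> bool:
    """Return True if `ver` lies within >=1.15,<2 (tuple comparison)."""
    return MIN_NUMPY <= release_tuple(ver) < MAX_NUMPY_EXCLUSIVE


def check_installed_numpy() -> None:
    """Raise if an installed NumPy falls outside the supported range."""
    try:
        installed = version("numpy")
    except PackageNotFoundError:
        return  # In this sketch, a missing NumPy is not treated as an error.
    if not numpy_version_supported(installed):
        raise RuntimeError(
            f"NumPy {installed} is installed, but the supported range "
            "is numpy>=1.15,<2"
        )
```

Comparing integer tuples avoids the classic string-comparison bug where "1.9" would sort after "1.15", and `(2, 0, 0) < (2,)` is False, so every 2.x release is excluded.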

Does this PR introduce any user-facing change?

NumPy will be limited to numpy<2 when installing pyspark with the extras ml, mllib, sql, pandas_on_spark, or connect.
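As a rough illustration of how such a bound can be attached to extras, the sketch below builds a setuptools-style `extras_require` mapping for the extras named above. The variable names are hypothetical and this is not PySpark's actual setup.py; only the NumPy requirement is shown, and any other dependencies an extra may carry are omitted.

```python
# Hypothetical setuptools-style extras metadata (illustrative names only).
MINIMUM_NUMPY_VERSION = "1.15"
NUMPY_REQUIREMENT = f"numpy>={MINIMUM_NUMPY_VERSION},<2"

# Extras listed in this PR, each pulling in the bounded NumPy requirement.
EXTRAS_WITH_NUMPY = ("ml", "mllib", "sql", "pandas_on_spark", "connect")

extras_require = {extra: [NUMPY_REQUIREMENT] for extra in EXTRAS_WITH_NUMPY}

# A real setup.py would pass this mapping on, e.g.
# setuptools.setup(name=..., extras_require=extras_require)
```

With metadata like this, `pip install "pyspark[sql]"` asks pip's resolver to select a NumPy release in the >=1.15,<2 range; an install that requests none of these extras would not pick up the constraint from this mapping.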

How was this patch tested?

Via existing CI jobs.

Was this patch authored or co-authored using generative AI tooling?

No.

github-actions bot added the PYTHON label Jul 2, 2024
HyukjinKwon changed the title from [SPARK-48710][PYTHON] Limit NumPy version to supported range (>=1.15,<2) to [SPARK-48710][PYTHON][3.5] Limit NumPy version to supported range (>=1.15,<2) Jul 2, 2024
Member

dongjoon-hyun left a comment

Do we need this in branch-3.4 too?

Contributor

allisonwang-db left a comment

Good catch!

@HyukjinKwon
Member

> Do we need this in branch-3.4 too?

I will backport this to branch-3.4.

HyukjinKwon pushed a commit that referenced this pull request Jul 3, 2024
…1.15,<2)

### What changes were proposed in this pull request?
 * Add a constraint for `numpy<2` to the PySpark package

### Why are the changes needed?

PySpark references some code which was removed with NumPy 2.0. Thus, if `numpy>=2` is installed, executing PySpark may fail.

#47083 updates the `master` branch to be compatible with NumPy 2. This PR adds a version bound for older releases, where it won't be applied.

### Does this PR introduce _any_ user-facing change?
NumPy will be limited to `numpy<2` when installing `pyspark` with extras `ml`, `mllib`, `sql`, `pandas_on_spark` or `connect`.

### How was this patch tested?
Via existing CI jobs.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47175 from codesorcery/SPARK-48710-numpy-upper-bound.

Authored-by: Patrick Marx <6949483+codesorcery@users.noreply.github.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
HyukjinKwon pushed a commit that referenced this pull request Jul 3, 2024
(cherry picked from commit 44eba46)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
@HyukjinKwon
Member

Merged to branch-3.5 and branch-3.4.

HyukjinKwon closed this Jul 3, 2024
gaecoli pushed a commit to gaecoli/spark that referenced this pull request Jul 10, 2024
szehon-ho pushed a commit to szehon-ho/spark that referenced this pull request Aug 7, 2024
(cherry picked from commit 44eba46)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>