[SPARK-49792][PYTHON][BUILD] Upgrade to numpy 2 for building and testing Spark branches #48180

Closed
wants to merge 1 commit into from

xinrong-meng
Member

@xinrong-meng xinrong-meng commented Sep 20, 2024

What changes were proposed in this pull request?

Upgrade numpy to 2.1.0 for building and testing Spark branches.

Failed tests are categorized into the following groups:

Why are the changes needed?

Ensure compatibility with newer NumPy, which is utilized by Pandas (on Spark).

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the BUILD label Sep 20, 2024
@bjornjorgensen
Contributor

There is a new pandas version, https://pandas.pydata.org/pandas-docs/version/2.2.3/whatsnew/v2.2.3.html, that has support for numpy 2.1: pandas-dev/pandas#59444

@xinrong-meng xinrong-meng changed the title from "[WIP] Upgrade numpy to 2.1.0" to "[SPARK-49792][PS][BUILD] Upgrade numpy to 2.1.0" Sep 26, 2024
@xinrong-meng xinrong-meng changed the title from "[SPARK-49792][PS][BUILD] Upgrade numpy to 2.1.0" to "[SPARK-49792][PS][BUILD] Upgrade numpy to 2.1.0 for building and testing Spark branches" Sep 26, 2024
@xinrong-meng
Member Author

Thank you @bjornjorgensen!
I think we can separate the pandas upgrade from the numpy upgrade, as the current pandas version should be compatible with numpy 2.1.0 as well.

@xinrong-meng xinrong-meng marked this pull request as ready for review September 26, 2024 03:17
@xinrong-meng xinrong-meng changed the title from "[SPARK-49792][PS][BUILD] Upgrade numpy to 2.1.0 for building and testing Spark branches" to "[SPARK-49792][PS][BUILD] Upgrade numpy for building and testing Spark branches" Sep 26, 2024
@xinrong-meng xinrong-meng changed the title from "[SPARK-49792][PS][BUILD] Upgrade numpy for building and testing Spark branches" to "[SPARK-49792][PS][BUILD] Upgrade to numpy 2 for building and testing Spark branches" Sep 26, 2024
@@ -193,6 +195,10 @@ def predict(inputs):
batch_sizes = preds["preds"].to_numpy()
self.assertTrue(all(batch_sizes <= batch_size))

# TODO(SPARK-49793): enable the test below
Member Author

@WeichenXu123 may I get your input on that please?
More details can be found here https://issues.apache.org/jira/browse/SPARK-49793.

Contributor

Do you have an error message and stack trace for numpy 2 + caching?

Member Author

Would you please see https://issues.apache.org/jira/browse/SPARK-49793? There is no error but the results are unexpected.

Contributor

Got it. I need some time to investigate, but we can disable it as a workaround for now.

Member Author

Sounds good, thank you!

Contributor

also cc @leewyang as the test author
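
For illustration, one way to disable a test only when NumPy 2 is installed is a version-gated skip. This is a minimal sketch under stated assumptions, not the change made in this PR: the class name, test name, and both helper functions are hypothetical, and only the standard library is used to read the installed NumPy version.

```python
import unittest
from importlib import metadata


def numpy_major_version():
    """Return the installed NumPy major version, or None if NumPy is absent.

    Hypothetical helper: reads package metadata instead of importing numpy.
    """
    try:
        return int(metadata.version("numpy").split(".")[0])
    except metadata.PackageNotFoundError:
        return None


def is_numpy2_or_later(major):
    """True only when a major version is known and is at least 2."""
    return major is not None and major >= 2


class CachingPredictTests(unittest.TestCase):  # hypothetical test class
    @unittest.skipIf(
        is_numpy2_or_later(numpy_major_version()),
        "TODO(SPARK-49793): unexpected results with NumPy 2 and caching",
    )
    def test_caching(self):
        # Placeholder body; the real test lives in the Spark test suite.
        self.assertTrue(True)
```

Gating on the version keeps the test running on NumPy 1.x environments, so a regression there would still be caught while the NumPy 2 behavior is investigated.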

@codesorcery
Contributor

Maybe helpful here: the Ruff linter/formatter has some rules to check for NumPy 2 deprecations (https://docs.astral.sh/ruff/rules/numpy2-deprecation/).
I intended to create a pull request for adding those checks to the build pipeline after #47083 was merged, but unfortunately didn't find the time back then.
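
To give a rough idea of what such a lint check looks for, here is a minimal stdlib-only sketch that scans source text for a few aliases removed in NumPy 2.0. It is not Ruff's implementation: the alias table is a small, non-exhaustive sample, the function name `find_removed_aliases` is hypothetical, and the scan assumes NumPy is imported under the alias `np`.

```python
import ast

# Small, non-exhaustive sample of aliases removed in NumPy 2.0,
# mapped to their replacements. Assumption: chosen for illustration only.
REMOVED_ALIASES = {
    "float_": "float64",
    "complex_": "complex128",
    "NaN": "nan",
    "Inf": "inf",
    "product": "prod",
    "alltrue": "all",
    "sometrue": "any",
}


def find_removed_aliases(source, module_alias="np"):
    """Return sorted (line, old, new) tuples for `np.<removed alias>` accesses."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id == module_alias
            and node.attr in REMOVED_ALIASES
        ):
            findings.append((node.lineno, node.attr, REMOVED_ALIASES[node.attr]))
    return sorted(findings)


sample = (
    "import numpy as np\n"
    "x = np.float_(1.0)\n"
    "y = np.product([1, 2])\n"
    "z = np.nan\n"
)
for line, old, new in find_removed_aliases(sample):
    print(f"line {line}: np.{old} -> np.{new}")
```

In practice the Ruff rule linked above covers far more names and can apply fixes automatically, so a sketch like this is only useful for understanding the idea.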

@xinrong-meng xinrong-meng changed the title from "[SPARK-49792][PS][BUILD] Upgrade to numpy 2 for building and testing Spark branches" to "[WIP][SPARK-49792][PS][BUILD] Upgrade to numpy 2 for building and testing Spark branches" Oct 7, 2024
@xinrong-meng xinrong-meng marked this pull request as draft October 7, 2024 03:17
@xinrong-meng
Member Author

The test failures we are trying to fix here are almost all related to this issue. Thank you @codesorcery for sharing!

@github-actions github-actions bot added the MLLIB label Oct 8, 2024
@xinrong-meng xinrong-meng changed the title from "[WIP][SPARK-49792][PS][BUILD] Upgrade to numpy 2 for building and testing Spark branches" to "[SPARK-49792][PS][BUILD] Upgrade to numpy 2 for building and testing Spark branches" Oct 11, 2024
@xinrong-meng xinrong-meng marked this pull request as ready for review October 11, 2024 00:38
@xinrong-meng
Member Author

Retriggered the unrelated failing tests.

@xinrong-meng
Member Author

[info] - interrupt all - background queries, foreground interrupt *** FAILED *** (20 seconds, 50 milliseconds)
[info]   The code passed to eventually never returned normally. Attempted 30 times over 20.046569918 seconds. Last failure message: q2Interrupted was false. (SparkSessionE2ESuite.scala:71)
[info]   org.scalatest.exceptions.TestFailedDueToTimeoutException:

Retriggering

@xinrong-meng
Member Author

@HyukjinKwon @zhengruifeng @dongjoon-hyun would you please review?

@zhengruifeng
Contributor

In general LGTM, pending @WeichenXu123's feedback on the failed ML caching test.

@zhengruifeng zhengruifeng changed the title from "[SPARK-49792][PS][BUILD] Upgrade to numpy 2 for building and testing Spark branches" to "[SPARK-49792][PYTHON][BUILD] Upgrade to numpy 2 for building and testing Spark branches" Oct 15, 2024
Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you, @xinrong-meng and all.

@dongjoon-hyun
Member

Merged to master for Apache Spark 4.0.0 in February 2025.

@xinrong-meng
Member Author

Thank you @dongjoon-hyun !
