
Inconsistent Return Types Between numpy 1.26.4 and numpy 2.1.0 in pandas 2.2.2 #59838

Closed
xinrong-meng opened this issue Sep 19, 2024 · 2 comments
Labels
Closing Candidate (May be closeable, needs more eyeballs)

Comments

@xinrong-meng
Contributor

When using numpy 2.1.0, certain pandas methods (e.g. first_valid_index() and .at[] access) return numpy.int64 instead of the plain Python integers seen with numpy 1.26.4. This creates inconsistencies in behavior.

To reproduce

  1. Environment 1: Python 3.10, pandas 2.2.2, numpy 1.26.4

     conda create -n pd_np_1 python=3.10 pandas=2.2.2 numpy=1.26.4

  2. Environment 2: Python 3.10, pandas 2.2.2, numpy 2.1.0

     conda create -n pd_np_2 python=3.10 pandas=2.2.2 -c conda-forge
     conda activate pd_np_2
     pip install numpy==2.1.0

In both environments, run:

import pandas as pd
import numpy as np

pd.Series([None, None, 3], index=[1, 2, 3]).first_valid_index()
pd.DataFrame([[0, 1, 2]], index=['a'], columns=['A', 'B', 'C']).at['a', 'A']

Issue

Environment 1 (numpy 1.26.4)

>>> pd.Series([None, None, 3], index=[1, 2, 3]).first_valid_index()
3
>>> pd.DataFrame([[0, 1, 2]], index=['a'], columns=['A', 'B', 'C']).at['a', 'A']
0

Environment 2 (numpy 2.1.0)

>>> pd.Series([None, None, 3], index=[1, 2, 3]).first_valid_index()
np.int64(3)
>>> pd.DataFrame([[0, 1, 2]], index=['a'], columns=['A', 'B', 'C']).at['a', 'A']
np.int64(0)

Discussion

Is this intended behavior, or is it a compatibility issue between pandas 2.2.2 and numpy 2.1.0?

@asishm
Contributor

asishm commented Sep 19, 2024

Thanks for the report. The result in both cases is a numpy integer.

If you check type(pd.Series([None, None, 3], index=[1, 2, 3]).first_valid_index()) in both environments, you'll see that it is np.int64 either way.

numpy 2.0 has changed the representation of scalars: see https://numpy.org/devdocs/release/2.0.0-notes.html#representation-of-numpy-scalars-changed
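
For example, a quick check (a minimal sketch; .item() on a NumPy scalar returns the equivalent plain Python value, in case one is explicitly needed):

import pandas as pd

idx = pd.Series([None, None, 3], index=[1, 2, 3]).first_valid_index()

# Same type under NumPy 1.26.4 and 2.1.0; only the repr changed in NumPy 2.0.
print(type(idx))         # <class 'numpy.int64'>
print(idx == 3)          # True

# Convert explicitly if a plain Python int is required.
print(type(idx.item()))  # <class 'int'>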

@asishm added the Closing Candidate (May be closeable, needs more eyeballs) label on Sep 19, 2024
@xinrong-meng
Contributor Author

Thank you @asishm! That really makes sense. Closing the issue.

dongjoon-hyun pushed a commit to apache/spark that referenced this issue Oct 15, 2024
…ing Spark branches

### What changes were proposed in this pull request?
Upgrade numpy to 2.1.0 for building and testing Spark branches.

Failed tests are categorized into the following groups:
- Most of the test failures fixed are related to pandas-dev/pandas#59838 (comment).
- Replaced np.mat with np.asmatrix (see the sketch after this list).
- TODO: SPARK-49793
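
A minimal sketch of the np.mat → np.asmatrix rename (np.mat was removed from the NumPy 2.x main namespace):

import numpy as np

# Before (NumPy 1.x only):
#   m = np.mat([[1, 2], [3, 4]])
m = np.asmatrix([[1, 2], [3, 4]])   # works on both NumPy 1.x and 2.x
print(m * m)                        # matrix product: [[ 7 10] [15 22]]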

### Why are the changes needed?
Ensure compatibility with newer NumPy, which is utilized by Pandas (on Spark).

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48180 from xinrong-meng/np_upgrade.

Authored-by: Xinrong Meng <xinrong@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>