
Inconsistent Return Types Between numpy 1.26.4 and numpy 2.1.0 in pandas 2.2.2 #59838

Closed
xinrong-meng opened this issue Sep 19, 2024 · 2 comments
Labels
Closing Candidate (May be closeable, needs more eyeballs)

Comments

@xinrong-meng
Contributor

When using numpy 2.1.0, certain pandas methods (e.g. first_valid_index() and .at[] access) return numpy.int64 instead of the plain Python integers seen with numpy 1.26.4. This creates inconsistencies in behavior.

To reproduce

  1. Environment 1: Python 3.10, pandas 2.2.2, numpy 1.26.4

     conda create -n pd_np_1 python=3.10 pandas=2.2.2 numpy=1.26.4

  2. Environment 2: Python 3.10, pandas 2.2.2, numpy 2.1.0

     conda create -n pd_np_2 python=3.10 pandas=2.2.2 -c conda-forge
     conda activate pd_np_2
     pip install numpy==2.1.0

In both environments, run:

import pandas as pd
import numpy as np

pd.Series([None, None, 3], index=[1, 2, 3]).first_valid_index()
pd.DataFrame([[0, 1, 2]], index=['a'], columns=['A', 'B', 'C']).at['a', 'A']

Issue

Environment 1 (numpy 1.26.4)

>>> pd.Series([None, None, 3], index=[1, 2, 3]).first_valid_index()
3
>>> pd.DataFrame([[0, 1, 2]], index=['a'], columns=['A', 'B', 'C']).at['a', 'A']
0

Environment 2 (numpy 2.1.0)

>>> pd.Series([None, None, 3], index=[1, 2, 3]).first_valid_index()
np.int64(3)
>>> pd.DataFrame([[0, 1, 2]], index=['a'], columns=['A', 'B', 'C']).at['a', 'A']
np.int64(0)

Discussion

Is this intended behavior, or is it a compatibility issue between pandas 2.2.2 and numpy 2.1.0?

@asishm
Contributor

asishm commented Sep 19, 2024

Thanks for the report. The result in both cases is a numpy integer.

If you check type(pd.Series([None, None, 3], index=[1, 2, 3]).first_valid_index()) in both environments, you'll see that it is np.int64 either way.

numpy 2.0 has changed the representation of scalars: see https://numpy.org/devdocs/release/2.0.0-notes.html#representation-of-numpy-scalars-changed
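
For example, a quick check (a minimal sketch; .item() on a NumPy scalar returns the equivalent plain Python value, in case one is explicitly needed):

import pandas as pd

idx = pd.Series([None, None, 3], index=[1, 2, 3]).first_valid_index()

# Same type under NumPy 1.26.4 and 2.1.0; only the repr changed in NumPy 2.0.
print(type(idx))         # <class 'numpy.int64'>
print(idx == 3)          # True

# Convert explicitly if a plain Python int is required.
print(type(idx.item()))  # <class 'int'>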

@asishm added the Closing Candidate (May be closeable, needs more eyeballs) label on Sep 19, 2024
@xinrong-meng
Contributor Author

Thank you @asishm! That really makes sense. Closing the issue.

dongjoon-hyun pushed a commit to apache/spark that referenced this issue Oct 15, 2024
…ing Spark branches

### What changes were proposed in this pull request?
Upgrade numpy to 2.1.0 for building and testing Spark branches.

Failed tests are categorized into the following groups:
- Most of the test failures fixed are related to pandas-dev/pandas#59838 (comment).
- Replaced np.mat with np.asmatrix (see the sketch after this list).
- TODO: SPARK-49793
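
A minimal sketch of the np.mat → np.asmatrix rename (np.mat was removed from the NumPy 2.x main namespace):

import numpy as np

# Before (NumPy 1.x only):
#   m = np.mat([[1, 2], [3, 4]])
m = np.asmatrix([[1, 2], [3, 4]])   # works on both NumPy 1.x and 2.x
print(m * m)                        # matrix product: [[ 7 10] [15 22]]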

### Why are the changes needed?
Ensure compatibility with newer NumPy, which is utilized by Pandas (on Spark).

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48180 from xinrong-meng/np_upgrade.

Authored-by: Xinrong Meng <xinrong@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>