
clip fails on NA/nan values #255

Open
burnpanck opened this issue Nov 11, 2024 · 3 comments · Fixed by #263

Comments

@burnpanck
Contributor

burnpanck commented Nov 11, 2024

The following code fails on the last line with TypeError: boolean value of NA is ambiguous:

import numpy as np
import pandas as pd
import pint
import pint_pandas

print(f"{pd.__version__=}")
print(f"{pint.__version__=}")
print(f"{pint_pandas.__version__=}")

u = pint.get_application_registry()

a = np.r_[1,2,np.nan,4,10]
print(f"{a=}")
print(f"{np.clip(a,3,5)=}")

s = pd.Series(data=a)
print(f"{s=}")
print(f"{np.clip(s,3,5)=}")

qs = pd.Series(data=pint_pandas.PintArray.from_1darray_quantity(a*u.m))
print(f"{qs=}")
print(f"{np.clip(qs,3*u.m,5*u.m)=}")

Its output in my environment, before failing, is:

pd.__version__='2.2.3'
pint.__version__='0.24.4'
pint_pandas.__version__='0.6.2'
a=array([ 1.,  2., nan,  4., 10.])
np.clip(a,3,5)=array([ 3.,  3., nan,  4.,  5.])
s=0     1.0
1     2.0
2     NaN
3     4.0
4    10.0
dtype: float64
np.clip(s,3,5)=0    3.0
1    3.0
2    NaN
3    4.0
4    5.0
dtype: float64
qs=0     1.0
1     2.0
2     nan
3     4.0
4    10.0
dtype: pint[meter]

So, clearly, NumPy chose to pass NaN values through np.clip unchanged, and so does pandas.
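The error message is the one pandas raises when `pd.NA` is used in a boolean context. A plausible but unconfirmed explanation is that the clip path compares elements against the bounds and branches on the result: comparisons with float NaN return `False`, while comparisons with `pd.NA` return `pd.NA`. Where exactly pint_pandas ends up with a `pd.NA` here is not traced in this thread; the sketch below only shows the NaN/NA difference itself.

```python
import pandas as pd

# Float NaN compares as False, so code that branches on a comparison just moves on.
nan_result = float("nan") < 3
print(nan_result)  # False

# pd.NA propagates through comparisons: the result is pd.NA itself, not True/False.
na_result = pd.NA < 3
print(na_result)  # <NA>

# Branching on that result forces bool(), which pandas refuses to answer.
message = None
try:
    if na_result:
        pass
except TypeError as exc:
    message = str(exc)
print(message)  # boolean value of NA is ambiguous
```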

@andrewgsavage
Collaborator

andrewgsavage commented Nov 11, 2024 via email

@burnpanck
Contributor Author

I wasn't aware of "subdtypes" or that PR yet. However, I don't immediately see how it would help. In principle, I'm a big fan of proper nullable types ("not-a-number" is not the same as "not-available" to me). However, in the case of clip, I would argue NA most definitely should pass through rather than raising an exception, as NaN always did outside of pint_pandas. So while "subdtypes" should give me control over whether I'd like to use NaN or NA, I feel we're essentially seeing a bug here, for which no API change should be needed.
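A minimal sketch of the pass-through behavior argued for here, on plain unitless values: missing entries are masked out before any comparison, so they never reach a boolean context and come back unchanged. `clip_passthrough_na` is a hypothetical helper, not a pandas or pint_pandas API; a unit-aware version would presumably also need the bounds converted to the array's unit before clipping magnitudes, which is left out here.

```python
import numpy as np
import pandas as pd


def clip_passthrough_na(values, lower, upper):
    """Hypothetical helper: clip present entries, pass NaN/NA through unchanged."""
    arr = np.asarray(values, dtype=object)
    # pd.isna flags None, float NaN and pd.NA alike.
    missing = pd.isna(arr)
    out = arr.copy()
    present = ~missing
    # Only present entries are compared against the bounds, so no comparison
    # ever involves pd.NA and nothing has to be coerced to bool.
    out[present] = np.clip(arr[present].astype(float), lower, upper)
    return out


print(clip_passthrough_na([1, 2, pd.NA, 4, 10], 3, 5))
```

Masking first mirrors what NumPy already does implicitly for float NaN: the missing slot is simply never touched.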

@andrewgsavage
Copy link
Collaborator

I've added __array_function__, but it doesn't work with Series, so you'll need to use np.clip(qs.values, ...)
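For readers unfamiliar with the mechanism: NumPy's `__array_function__` protocol (NEP 18) lets an array-like object intercept calls such as `np.clip` when that object is passed in directly. The toy below is a minimal sketch of the protocol, not pint_pandas code; `ToyMaskedArray`, `implements` and `HANDLED_FUNCTIONS` are hypothetical names. It illustrates why the override is reached when the array itself (here, `qs.values`) is passed, whereas a Series wrapping it is a different object.

```python
import numpy as np

# Hypothetical registry mapping NumPy functions to our overrides.
HANDLED_FUNCTIONS = {}


def implements(np_function):
    """Register an override for a NumPy function."""
    def decorator(func):
        HANDLED_FUNCTIONS[np_function] = func
        return func
    return decorator


class ToyMaskedArray:
    """Hypothetical container: float values plus a boolean 'missing' mask."""

    def __init__(self, values, mask):
        self.values = np.asarray(values, dtype=float)
        self.mask = np.asarray(mask, dtype=bool)

    def __array_function__(self, func, types, args, kwargs):
        # Defer on functions we don't override and on foreign array types.
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented
        if not all(issubclass(t, ToyMaskedArray) for t in types):
            return NotImplemented
        return HANDLED_FUNCTIONS[func](*args, **kwargs)


@implements(np.clip)
def _clip(a, a_min, a_max):
    # Clip only the present entries; missing entries pass through unchanged.
    out = a.values.copy()
    present = ~a.mask
    out[present] = np.clip(a.values[present], a_min, a_max)
    return ToyMaskedArray(out, a.mask.copy())


toy = ToyMaskedArray([1.0, 2.0, np.nan, 4.0, 10.0], [False, False, True, False, False])
clipped = np.clip(toy, 3, 5)  # dispatched to _clip via __array_function__
print(clipped.values, clipped.mask)
```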

andrewgsavage reopened this Dec 28, 2024