Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: partial boolean indexing fails with dtype = int #2746

Closed
jreback opened this issue Jan 24, 2013 · 7 comments · Fixed by #2950
Closed

BUG: partial boolean indexing fails with dtype = int #2746

jreback opened this issue Jan 24, 2013 · 7 comments · Fixed by #2950
Labels
Milestone

Comments

@jreback
Copy link
Contributor

jreback commented Jan 24, 2013

this fails because we are putmasking on a dtype=int ndarray
need to cast if this is the case (in series.where)

import pandas
df = pandas.DataFrame(index=[1,2])
df['test'] = [1,2]
df['test'][[True, False]] = pandas.Series([0],index=[1])

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-7738933c68bb> in <module>()
----> 1 df['test'][[True,False]] = pandas.Series([0],index=[1])

/mnt/home/jreback/pandas/pandas/core/series.pyc in __setitem__(self, key, value)
    677         if _is_bool_indexer(key):
    678             key = self._check_bool_indexer(key)
--> 679             self.where(~key,value,inplace=True)
    680         else:
    681             self._set_with(key, value)

/mnt/home/jreback/pandas/pandas/core/series.pyc in where(self, cond, other, inplace)
    630             raise ValueError('Length of replacements must equal series length')
    631 
--> 632         np.putmask(ser, ~cond, other)
    633 
    634         return None if inplace else ser

TypeError: array cannot be safely cast to required type

@jreback
Copy link
Contributor Author

jreback commented Feb 15, 2013

@stephenwlin we didn't close this for some reason....any thoughts?

@stephenwlin
Copy link
Contributor

ok i can take a look

@stephenwlin
Copy link
Contributor

@jreback I thought about this and there's probably a one or two line fix that can handle this particular case...but I think the most general solution here might be to implement a function called _maybe_unify which takes a list (or tuple, maybe...maybe either or) of dtypes and returns the most general dtype that can hold all of them.

This would probably be useful in a lot of other places where similar unifications are required, like joins or sets (I think a lot of this logic is duplicated in different places right now and not complete either, mostly only handling ints and floats cases). For example, I'm fairly sure that right now, __setitem__ on DataFrame (either directly or through the ix property) either doesn't even allow assigning values that don't fit in the existing column types, or downcasts them implicit to the existing types, rather than upcasting appropriately to fit them which is probably the right thing to do instead: implementing such upcasts would be a lot easier with such a utility function.

(Possibly _maybe_promote could even be rewritten to be in terms of _maybe_unify, since most of what the former is doing is basically a restricted form of the latter...however, there's also some extra work in computing/modifying a fill_value so it might take some thought to keep it clean and while preserving that.)

The one thing I can think of that might be kind of tricky would be the case of datetime64[ns]: right now, they don't upcast correctly to objects, so you can't unify them with other types: instead, you want to allow other objects (like dates, string, etc.) to implicit cast to datetime64[ns] on assignment, if possible, as well as convert NaN to NaT. This is currently handled as a special case in __setitem__ I think...maybe it can be done with a more general solution that doesn't introduce local hacks.

What do you think? Overkill?

@stephenwlin
Copy link
Contributor

(On second thought, might not be worth it to handle the >2 dtypes case...I don't think it happens anywhere practically)

@jreback
Copy link
Contributor Author

jreback commented Feb 25, 2013

@jreback
Copy link
Contributor Author

jreback commented Feb 28, 2013

I tried to fix this, and my solution would work, but have to change the data and dtype of a series inplace is currently not possible, in np 1.7 would work though http://docs.scipy.org/doc/numpy/reference/generated/numpy.copyto.html#numpy.copyto

(once series is refactored to have a _data as its internals, then this is easy)

@jreback
Copy link
Contributor Author

jreback commented Feb 28, 2013

got it....in place updating

ser.dtype = result.dtype ser[:] = result

jreback added a commit that referenced this issue Mar 1, 2013
BUG: in-place conversion of integer series to float (on putmasking), GH #2746.
@jreback jreback closed this as completed Mar 1, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants