BUG: partial boolean indexing fails with dtype = int #2746

jreback · 2013-01-24T13:22:45Z

This fails because we are putmasking on a dtype=int ndarray.
We need to cast if this is the case (in series.where).

import pandas
df = pandas.DataFrame(index=[1,2])
df['test'] = [1,2]
df['test'][[True, False]] = pandas.Series([0],index=[1])

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-7738933c68bb> in <module>()
----> 1 df['test'][[True,False]] = pandas.Series([0],index=[1])

/mnt/home/jreback/pandas/pandas/core/series.pyc in __setitem__(self, key, value)
    677         if _is_bool_indexer(key):
    678             key = self._check_bool_indexer(key)
--> 679             self.where(~key,value,inplace=True)
    680         else:
    681             self._set_with(key, value)

/mnt/home/jreback/pandas/pandas/core/series.pyc in where(self, cond, other, inplace)
    630             raise ValueError('Length of replacements must equal series length')
    631 
--> 632         np.putmask(ser, ~cond, other)
    633 
    634         return None if inplace else ser

TypeError: array cannot be safely cast to required type
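
A minimal sketch of the cast described above, using plain NumPy rather than the pandas internals shown in the traceback. The helper name `putmask_with_upcast` is hypothetical. It assumes the replacement values arrive as floats (likely the case here, since aligning the one-element replacement Series to the two-row index would introduce NaN) and upcasts the integer array before calling `np.putmask`.

```python
import numpy as np


def putmask_with_upcast(arr, mask, values):
    """Return a copy of `arr` with `values` written where `mask` is True.

    Hypothetical helper, not pandas API: if `values` cannot be safely cast
    to `arr.dtype`, the array is first upcast to a common dtype.
    """
    arr = np.asarray(arr)
    values = np.asarray(values)
    mask = np.asarray(mask, dtype=bool)

    if np.can_cast(values.dtype, arr.dtype, casting="safe"):
        out = arr.copy()
    else:
        # e.g. int64 array + float64 values -> float64 result
        out = arr.astype(np.result_type(arr.dtype, values.dtype))

    np.putmask(out, mask, values)
    return out


ints = np.array([1, 2], dtype=np.int64)
replacement = np.array([0.0, np.nan])  # same length as `ints`
result = putmask_with_upcast(ints, [True, False], replacement)
```

The sketch returns a new array, so it sidesteps the harder part of the bug: doing the same upcast in place on the existing Series.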


jreback · 2013-02-15T00:32:29Z

@stephenwlin we didn't close this for some reason....any thoughts?

stephenwlin · 2013-02-15T23:37:40Z

ok i can take a look

stephenwlin · 2013-02-19T04:59:01Z

@jreback I thought about this and there's probably a one or two line fix that can handle this particular case...but I think the most general solution here might be to implement a function called _maybe_unify which takes a list (or tuple, maybe...maybe either or) of dtypes and returns the most general dtype that can hold all of them.

This would probably be useful in a lot of other places where similar unifications are required, like joins or sets (I think a lot of this logic is duplicated in different places right now and is not complete either, mostly handling only the int and float cases). For example, I'm fairly sure that right now, __setitem__ on DataFrame (either directly or through the ix property) either doesn't allow assigning values that don't fit in the existing column types at all, or implicitly downcasts them to the existing types, rather than upcasting appropriately to fit them, which is probably the right thing to do. Implementing such upcasts would be a lot easier with such a utility function.

(Possibly _maybe_promote could even be rewritten in terms of _maybe_unify, since most of what the former does is basically a restricted form of the latter...however, there's also some extra work in computing/modifying a fill_value, so it might take some thought to keep it clean while preserving that.)
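
One way the proposed unification could look, as a rough sketch only: the function below is hypothetical (no `_maybe_unify` exists in the thread beyond the idea) and leans on NumPy's own `np.promote_types`, falling back to `object` when NumPy reports no common dtype. It does not attempt the `fill_value` handling mentioned above.

```python
from functools import reduce

import numpy as np


def maybe_unify(dtypes):
    """Return the most general dtype able to hold every dtype in `dtypes`.

    Hypothetical sketch of the `_maybe_unify` idea, not an existing pandas
    function. Pairs that NumPy refuses to promote fall back to object.
    """
    dtypes = [np.dtype(d) for d in dtypes]
    if not dtypes:
        raise ValueError("need at least one dtype")

    def unify_pair(left, right):
        try:
            return np.promote_types(left, right)
        except TypeError:
            # No common NumPy dtype: object can hold anything.
            return np.dtype(object)

    return reduce(unify_pair, dtypes)


int_and_float = maybe_unify([np.int64, np.float64])
int_and_object = maybe_unify([np.int64, object])
```

Folding pairwise keeps the two-dtype case trivial while still accepting longer lists; whether pairwise promotion gives the desired answer for every mix of dtypes is an open question this sketch does not settle.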

The one thing I can think of that might be kind of tricky would be the case of datetime64[ns]: right now, they don't upcast correctly to objects, so you can't unify them with other types. Instead, you want to allow other objects (like dates, strings, etc.) to be implicitly cast to datetime64[ns] on assignment, if possible, as well as convert NaN to NaT. This is currently handled as a special case in __setitem__, I think...maybe it can be done with a more general solution that doesn't introduce local hacks.
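
To illustrate the datetime64[ns] special case, here is a small sketch in plain NumPy. The helper `coerce_to_datetime64_ns` is hypothetical and only covers values that `np.datetime64` itself can parse, plus None and NaN, which it maps to NaT; it is not meant to mirror what pandas does internally.

```python
import numpy as np


def coerce_to_datetime64_ns(values):
    """Cast `values` to datetime64[ns], mapping None and NaN to NaT.

    Hypothetical helper for illustration only, not pandas API.
    """
    out = np.empty(len(values), dtype="datetime64[ns]")
    for i, value in enumerate(values):
        if value is None or (isinstance(value, float) and np.isnan(value)):
            out[i] = np.datetime64("NaT")
        else:
            out[i] = np.datetime64(value, "ns")
    return out


stamps = coerce_to_datetime64_ns(["2013-02-19", np.nan, None])
```

Here the cast goes toward the column's dtype instead of toward a wider common dtype, which is why this case does not fit neatly into a generic unification function.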

What do you think? Overkill?

stephenwlin · 2013-02-19T05:20:06Z

(On second thought, might not be worth it to handle the >2 dtypes case...I don't think it happens anywhere practically)

jreback · 2013-02-25T23:22:19Z

I think this is related:

http://stackoverflow.com/questions/15077885/typeerror-when-changing-an-integer-data-series-in-pandas/15078445#15078445

jreback · 2013-02-28T19:21:43Z

I tried to fix this, and my solution would work, but changing the data and dtype of a series in place is currently not possible. In np 1.7 it would work, though: http://docs.scipy.org/doc/numpy/reference/generated/numpy.copyto.html#numpy.copyto

(once series is refactored to have a _data as its internals, then this is easy)

jreback · 2013-02-28T19:30:27Z

got it....in place updating

ser.dtype = result.dtype
ser[:] = result
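
A sketch of that trick on a plain NumPy array standing in for the Series (this assumes the Series of that era behaves like an ndarray here, which the remark about a future `_data` refactor suggests). Assigning to `.dtype` reinterprets the existing buffer, so it only works when both dtypes have the same item size, as int64 and float64 do; the slice assignment then overwrites the reinterpreted bytes with the real values. The NumPy documentation discourages assigning to `.dtype`, so read this as an illustration of the mechanism, not a recommendation.

```python
import warnings

import numpy as np

ser = np.array([1, 2], dtype=np.int64)  # stand-in for the integer Series
result = np.array([np.nan, 2.0])        # float result of the masked assignment

# Reinterpreting the buffer only makes sense for equal item sizes.
assert ser.dtype.itemsize == result.dtype.itemsize

with warnings.catch_warnings():
    # Newer NumPy releases may warn about assigning to `.dtype`.
    warnings.simplefilter("ignore", DeprecationWarning)
    ser.dtype = result.dtype  # same object, bytes now read as float64

ser[:] = result  # overwrite the reinterpreted bytes with the real values
```

Because `ser` is still the same object afterwards, every existing reference to it sees the float data, which is what an in-place `where` needs.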


jreback mentioned this issue Jan 24, 2013

regression in 0.10.1 with boolean indexing? #2745

Closed

jreback mentioned this issue Feb 4, 2013

ENH: should boolean indexing preserve input dtypes where possible? #2794

Closed

jreback added a commit to jreback/pandas that referenced this issue Feb 28, 2013

BUG: in-place conversion of integer series to float (on putmasking), GH

e2a07cb

pandas-dev#2746

jreback mentioned this issue Feb 28, 2013

BUG: in-place conversion of integer series to float (on putmasking), GH #2746. #2950

Merged

jreback added a commit that referenced this issue Mar 1, 2013

Merge pull request #2950 from jreback/intna_2746

325d119

BUG: in-place conversion of integer series to float (on putmasking), GH #2746.

jreback closed this as completed Mar 1, 2013
