BUG: DataFrame inplace where doesn't work for mixed datatype frames #2793

stephenwlin · 2013-02-03T06:25:18Z

Noticed the following: DataFrame.where with inplace=True only works when the DataFrame has a single dtype. It relies on calling np.putmask on DataFrame.values, which is only a view in the single-dtype case. With mixed dtypes, .values is a newly built copy, so the mask never reaches the frame.

In [1]: import pandas as p

In [2]: import numpy as np

In [3]: df=p.DataFrame({'a': [1.0, 2.0, 3.0, 4.0], 
   ...:                 'b': [4.0, 3.0, 2.0, 1.0]}) # single dtype

In [4]: df.where(df > 2, np.nan)
Out[4]: 
    a   b
0 NaN   4
1 NaN   3
2   3 NaN
3   4 NaN

In [5]: df.where(df > 2, np.nan, inplace=True)

In [6]: df
Out[6]: 
    a   b
0 NaN   4
1 NaN   3
2   3 NaN
3   4 NaN

In [7]: df=p.DataFrame({'a': [1, 2, 3, 4],
   ...:                 'b': [4.0, 3.0, 2.0, 1.0]}) # mixed dtypes

In [8]: df.where(df > 2, np.nan) # ok
Out[8]: 
    a   b
0 NaN   4
1 NaN   3
2   3 NaN
3   4 NaN

In [9]: df.where(df > 2, np.nan, inplace=True) # not ok

In [10]: df
Out[10]: 
   a  b
0  1  4
1  2  3
2  3  2
3  4  1
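A minimal NumPy-only sketch of that mechanism follows. It assumes that a single-dtype frame's values behave like a view of its one 2-D block, and that a mixed-dtype frame's values are assembled by upcasting and stacking its blocks into a new array. The variable names are illustrative, not pandas internals.

```python
import numpy as np

# Single-dtype case: one 2-D float64 block; "values" is a view of it.
block = np.array([[1.0, 4.0], [2.0, 3.0], [3.0, 2.0], [4.0, 1.0]])
values_view = block.view()  # stands in for DataFrame.values (a view)
np.putmask(values_view, values_view <= 2, np.nan)
# block itself changed, because values_view shares its memory.

# Mixed-dtype case: separate int and float blocks. Building "values"
# upcasts and stacks them, which allocates a brand-new float64 array.
int_block = np.array([1, 2, 3, 4])
float_block = np.array([4.0, 3.0, 2.0, 1.0])
values_copy = np.column_stack([int_block, float_block])
np.putmask(values_copy, values_copy <= 2, np.nan)
# int_block and float_block are untouched: only the copy was masked.
```

This is the same symptom as In [9]/In [10] above: the mask is applied, but only to a temporary array.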


jreback · 2013-02-03T18:51:13Z

fixed in #2708

a little bit non-trivial, as you can (and in your example you did) have an upcast from integer to float in an in-place operation. That upcast changes the underlying blocks, so the block manager itself has to be modified in place.
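A rough sketch of why the manager has to change follows. `ToyBlockManager` is a hypothetical class, not pandas' BlockManager. It keeps dtype-homogeneous 2-D blocks, and it assumes the replacement value is NaN, so only integer/bool blocks need an upcast.

```python
import numpy as np


class ToyBlockManager:
    """Hypothetical stand-in for a block manager (not pandas' real class):
    a list of dtype-homogeneous 2-D blocks, each tagged with its columns."""

    def __init__(self, blocks):
        # blocks: list of (column_names, ndarray of shape (n_rows, n_cols))
        self.blocks = list(blocks)

    def dtypes(self):
        return {c: values.dtype for cols, values in self.blocks for c in cols}

    def where_inplace(self, cond, other=np.nan):
        """Keep values where cond is True and replace the rest with `other`.

        Assumes `other` is NaN, so only int/uint/bool blocks need upcasting."""
        for i, (cols, values) in enumerate(self.blocks):
            # True marks the cells to overwrite
            replace = ~np.column_stack([cond[c] for c in cols])
            if replace.any() and values.dtype.kind in "iub":
                # An int/bool block cannot hold NaN: make a float64 copy and
                # swap it into the manager so the owner sees the new block.
                values = values.astype(np.float64)
                self.blocks[i] = (cols, values)
            # Blocks that can already hold NaN are written to directly,
            # i.e. a true in-place update of the existing array.
            np.putmask(values, replace, other)


a_block = np.array([[1], [2], [3], [4]], dtype=np.int32)
b_block = np.array([[4.0], [3.0], [2.0], [1.0]])
mgr = ToyBlockManager([(["a"], a_block), (["b"], b_block)])

mgr.where_inplace({"a": a_block[:, 0] > 2, "b": b_block[:, 0] > 2})
print(mgr.dtypes())  # 'a' is now float64; 'b' stays float64
```

The design point is the assignment to `self.blocks[i]`. Rebinding only the local `values` would leave the manager holding the old int32 block, which is the mixed-dtype failure reported in this issue.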

good news is that I wrote a comprehensive test suite for where that catches this (and a couple of other) issues

thanks!

In [11]: df = pd.DataFrame({
    ...:     'a': np.array([1, 2, 3, 4], dtype='int32'),
    ...:     'b': np.array([4.0, 3.0, 2.0, 1.0], dtype='float64')})

In [12]: df.dtypes
Out[12]: 
a      int32
b    float64
Dtype: object

In [13]: df.where(df>2,np.nan,inplace=True)

In [14]: df.dtypes
Out[14]: 
a    float64
b    float64
Dtype: object

In [9]: df
Out[9]: 
    a   b
0 NaN   4
1 NaN   3
2   3 NaN
3   4 NaN

In [15]: df = pd.DataFrame({
    ...:     'a': np.array([1, 2, 3, 4], dtype='int64'),
    ...:     'b': np.array([4.0, 3.0, 2.0, 1.0], dtype='float64')})

In [16]: df.dtypes
Out[16]: 
a      int64
b    float64
Dtype: object

In [17]: df.where(df>2,np.nan,inplace=True)

In [18]: df.dtypes
Out[18]: 
a    float64
b    float64
Dtype: object

In [9]: df
Out[9]: 
    a   b
0 NaN   4
1 NaN   3
2   3 NaN
3   4 NaN
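For comparison, the same upcast shows up in the non-inplace form, where `where` returns a new frame and leaves the original alone. This is a small sketch on the int32 frame from In [11]. It checks only that column `a` ends up with a float dtype, not the exact float width.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": np.array([1, 2, 3, 4], dtype="int32"),
    "b": np.array([4.0, 3.0, 2.0, 1.0], dtype="float64"),
})

# Non-inplace where: a new frame comes back and df keeps its dtypes.
result = df.where(df > 2, np.nan)

print(df.dtypes)      # a is still int32
print(result.dtypes)  # a is now a float dtype, since NaN cannot be stored as an int
print(result)
```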

stephenwlin · 2013-02-03T19:47:22Z

oh, ok, that's great! i'll close this then.

…ndas-dev#622)
construction of multi numeric dtypes with other types in a dict validated
get_numeric_data returns correct dtypes
added a blocks attribute (and as_blocks() method) to DataFrame that returns a dict of dtype -> homogeneous Frame
added keyword 'raise_on_error' to astype, which can be set to False to exclude non-numeric columns
fixed merging to correctly merge on multiple dtypes with blocks (e.g. float64 and float32 in other merger)
changed implementation of get_dtype_counts() to use .blocks
revised DataFrame.convert_objects to use blocks to be more efficient
added Dtype printing to show on default with a Series
added convert_dates='coerce' option to convert_objects, to force conversions to datetime64[ns]
where can upcast integer to float as needed (on inplace ops pandas-dev#2793)
added fully cythonized support for int8/int16
no support for float16 (it can exist, but there are no cython methods for it)

TST: fixed test in test_from_records_sequencelike (dict orders can be different on different arch!)
NOTE: using tuples will remove dtype info from the input stream (using a record array is ok though!)
test updates for merging (multi-dtypes)
added tests for replace (but skipped for now, algos not set for float32/16)
tests for astype and convert in internals
fixes for test_excel on 32-bit
fixed test_resample_median_bug_1688 (I believe)
separated out test_from_records_dictlike
testing of panel constructors (GH pandas-dev#797)
where ops now have a full test suite
allow slightly less sensitive decimal tests for less precise dtypes

BUG: fixed GH pandas-dev#2778, fillna on empty frame causes seg fault
fixed bug in groupby where types were not being cast to the original dtype
respect the dtype of non-natural numeric (Decimal)
don't upcast ints/bools to floats (if, say, you were aggregating on len, you can get an int)

DOC: added astype conversion examples to whatsnew and docs (dsintro)
updated RELEASE notes
whatsnew for 0.10.2
added upcasting gotchas docs

CLN: updated convert_objects to be more consistent across frame/series
moved most groupby functions out of algos.pyx to generated.pyx
fully support cython functions for pad/bfill/take/diff/groupby for float32
moved more block-like conversion loops (e.g. diff, fillna, where, shift, replace, interpolate, combining) from frame.py to internals.py as top-level methods in BlockManager (created apply method)

* jreback/dtypes: ENH: allow propagation and coexistence of numeric dtypes (closes GH #622); construction of multi numeric dtypes with other types in a dict validated; get_numeric_data returns correct dtypes; added a blocks attribute (and as_blocks() method) to DataFrame that returns a dict of dtype -> homogeneous Frame; added keyword 'raise_on_error' to astype, which can be set to False to exclude non-numeric columns; fixed merging to correctly merge on multiple dtypes with blocks (e.g. float64 and float32 in other merger); changed implementation of get_dtype_counts() to use .blocks; revised DataFrame.convert_objects to use blocks to be more efficient; added Dtype printing to show on default with a Series; added convert_dates='coerce' option to convert_objects, to force conversions to datetime64[ns]; where can upcast integer to float as needed (on inplace ops #2793); added fully cythonized support for int8/int16; no support for float16 (it can exist, but there are no cython methods for it)

jreback mentioned this issue Feb 3, 2013

ENH/BUG/DOC: allow propogation and coexistance of numeric dtypes #2708

Merged

stephenwlin closed this as completed Feb 3, 2013

MarcoGorelli mentioned this issue May 5, 2023

CLN: avoid upcasting in tests where unnecessary (PDEP-6 precursor) #53104

Merged
