BUG: DataFrame inplace where doesn't work for mixed datatype frames #2793

Closed
stephenwlin opened this issue Feb 3, 2013 · 2 comments

Comments

@stephenwlin
Contributor

Noticed the following: DataFrame.where with inplace=True only works when the DataFrame has a single dtype. It relies on calling np.putmask on DataFrame.values, which is a view of the underlying data only in the single-dtype case; for mixed dtypes, .values is a copy, so the frame itself is left unchanged.

In [1]: import pandas as p

In [2]: import numpy as np

In [3]: df=p.DataFrame({'a': [1.0, 2.0, 3.0, 4.0], 
   ...:                 'b': [4.0, 3.0, 2.0, 1.0]}) # single dtype

In [4]: df.where(df > 2, np.nan)
Out[4]: 
    a   b
0 NaN   4
1 NaN   3
2   3 NaN
3   4 NaN

In [5]: df.where(df > 2, np.nan, inplace=True)

In [6]: df
Out[6]: 
    a   b
0 NaN   4
1 NaN   3
2   3 NaN
3   4 NaN

In [7]: df=p.DataFrame({'a': [1, 2, 3, 4],
   ...:                 'b': [4.0, 3.0, 2.0, 1.0]}) # mixed dtypes

In [8]: df.where(df > 2, np.nan) # ok
Out[8]: 
    a   b
0 NaN   4
1 NaN   3
2   3 NaN
3   4 NaN

In [9]: df.where(df > 2, np.nan, inplace=True) # not ok

In [10]: df
Out[10]: 
   a  b
0  1  4
1  2  3
2  3  2
3  4  1
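
A minimal NumPy-only sketch of the view-versus-copy behaviour described above. It does not touch pandas internals: `base.view()` stands in for `.values` on a single-dtype frame, and `np.column_stack` is only an assumed stand-in for whatever builds the combined array of a mixed-dtype frame.

```python
import numpy as np

# Single-dtype case: the array we mask is a view, so it shares memory
# with the original buffer and np.putmask changes the original too.
base = np.array([[1.0, 4.0], [2.0, 3.0], [3.0, 2.0], [4.0, 1.0]])
view = base.view()
np.putmask(view, ~(view > 2), np.nan)
view_changed_base = bool(np.isnan(base).any())

# Mixed-dtype case: combining an int column and a float column into one
# 2-D array allocates a new float64 buffer, i.e. a copy.
a = np.array([1, 2, 3, 4])
b = np.array([4.0, 3.0, 2.0, 1.0])
combined = np.column_stack([a, b])
np.putmask(combined, ~(combined > 2), np.nan)
copy_changed_columns = bool(np.isnan(b).any())

print(view_changed_base)      # True: the mask reached the original data
print(copy_changed_columns)   # False: only the temporary copy was masked
print(a.tolist())             # [1, 2, 3, 4]
```

The second half mirrors the "not ok" case: the mask is applied correctly, but to an array nobody keeps.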
@jreback
Contributor

jreback commented Feb 3, 2013

fixed in #2708

This was a little non-trivial: an in-place operation can upcast from integer to float (as it did in your example), which changes the underlying blocks, so the block manager itself has to be modified in place.

The good news is that I wrote a comprehensive test suite for where, which catches this and a couple of other issues.

thanks!

In [11]: df = pd.DataFrame({
  'a': np.array([1, 2, 3, 4],dtype='int32'), 
  'b': np.array([4.0, 3.0, 2.0, 1.0], dtype = 'float64') })

In [12]: df.dtypes
Out[12]: 
a      int32
b    float64
Dtype: object

In [13]: df.where(df>2,np.nan,inplace=True)

In [14]: df.dtypes
Out[14]: 
a    float64
b    float64
Dtype: object

In [9]: df
Out[9]: 
    a   b
0 NaN   4
1 NaN   3
2   3 NaN
3   4 NaN

In [15]: df = pd.DataFrame({
  'a': np.array([1, 2, 3, 4],dtype='int64'), 
   'b': np.array([4.0, 3.0, 2.0, 1.0], dtype = 'float64') })

In [16]: df.dtypes
Out[16]: 
a      int64
b    float64
Dtype: object

In [17]: df.where(df>2,np.nan,inplace=True)

In [18]: df.dtypes
Out[18]: 
a    float64
b    float64
Dtype: object

In [9]: df
Out[9]: 
    a   b
0 NaN   4
1 NaN   3
2   3 NaN
3   4 NaN
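
As a rough illustration of the upcast mentioned above (not of the fix itself), the sketch below uses the non-inplace form of `where` and compares dtypes before and after. The variable names `result`, `dtypes_before` and `dtypes_after` are only illustrative, and the exact printed formatting depends on the pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": np.array([1, 2, 3, 4], dtype="int64"),
    "b": np.array([4.0, 3.0, 2.0, 1.0], dtype="float64"),
})

# Non-inplace where: entries failing the condition become NaN.
# An integer column cannot hold NaN, so column "a" comes back as float64.
result = df.where(df > 2)

dtypes_before = df.dtypes.astype(str).to_dict()
dtypes_after = result.dtypes.astype(str).to_dict()

print(dtypes_before)  # {'a': 'int64', 'b': 'float64'}
print(dtypes_after)   # {'a': 'float64', 'b': 'float64'}
```

Because the float64 column is a new array rather than the old integer one with different contents, an in-place version has to swap the stored data inside the frame, which is the block change the comment refers to. Reassigning with `df = df.where(df > 2)` avoids the question entirely.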

@stephenwlin
Contributor Author

oh, ok, that's great! i'll close this then.

jreback added a commit to jreback/pandas that referenced this issue Feb 8, 2013
…ndas-dev#622)

     construction of multi numeric dtypes with other types in a dict
     validated get_numeric_data returns correct dtypes
     added blocks attribute (and as_blocks()) method that returns a dict of dtype -> homogeneous Frame to DataFrame
     added keyword 'raise_on_error' to astype, which can be set to false to exclude non-numeric columns
     fixed merging to correctly merge on multiple dtypes with blocks (e.g. float64 and float32 in other merger)
     changed implementation of get_dtype_counts() to use .blocks
     revised DataFrame.convert_objects to use blocks to be more efficient
     added Dtype printing to show on default with a Series
     added convert_dates='coerce' option to convert_objects, to force conversions to datetime64[ns]
     where can upcast integer to float as needed (on inplace ops pandas-dev#2793)
     added fully cythonized support for int8/int16
     no support for float16 (it can exist, but no cython methods for it)

TST: fixed test in test_from_records_sequencelike (dict orders can be different on different arch!)
       NOTE: using tuples will remove dtype info from the input stream (using a record array is ok though!)
     test updates for merging (multi-dtypes)
     added tests for replace (but skipped for now, algos not set for float32/16)
     tests for astype and convert in internals
     fixes for test_excel on 32-bit
     fixed test_resample_median_bug_1688 I believe
     separated out test_from_records_dictlike
     testing of panel constructors (GH pandas-dev#797)
     where ops now have a full test suite
     allow slightly less sensitive decimal tests for less precise dtypes

BUG: fixed GH pandas-dev#2778, fillna on empty frame causes seg fault
     fixed bug in groupby where types were not being cast to original dtype
     respect the dtype of non-natural numeric (Decimal)
     don't upcast ints/bools to floats (if you are, say, aggregating on len, you can get an int)
DOC: added astype conversion examples to whatsnew and docs (dsintro)
     updated RELEASE notes
     whatsnew for 0.10.2
     added upcasting gotchas docs

CLN: updated convert_objects to be more consistent across frame/series
     moved most groupby functions out of algos.pyx to generated.pyx
     fully support cython functions for pad/bfill/take/diff/groupby for float32
     moved more block-like conversion loops from frame.py to internals.py (created apply method)
       (e.g. diff,fillna,where,shift,replace,interpolate,combining), to top-level methods in BlockManager
wesm added a commit that referenced this issue Feb 10, 2013
* jreback/dtypes:
  ENH: allow propagation and coexistence of numeric dtypes (closes GH #622)
     construction of multi numeric dtypes with other types in a dict
     validated get_numeric_data returns correct dtypes
     added blocks attribute (and as_blocks()) method that returns a dict of dtype -> homogeneous Frame to DataFrame
     added keyword 'raise_on_error' to astype, which can be set to false to exclude non-numeric columns
     fixed merging to correctly merge on multiple dtypes with blocks (e.g. float64 and float32 in other merger)
     changed implementation of get_dtype_counts() to use .blocks
     revised DataFrame.convert_objects to use blocks to be more efficient
     added Dtype printing to show on default with a Series
     added convert_dates='coerce' option to convert_objects, to force conversions to datetime64[ns]
     where can upcast integer to float as needed (on inplace ops #2793)
     added fully cythonized support for int8/int16
     no support for float16 (it can exist, but no cython methods for it)