Assigning dataframe by specifying both row/col doesn't handle nan correctly #3626

jianpan · 2013-05-16T19:17:21Z

If I run the following code under pandas 0.11, I get different results from the two identical assignment statements at the bottom:

import numpy as np
import pandas as pd

if __name__ == '__main__':
    df = pd.DataFrame({'FC':['a','b','a','b','a','b'],
                       'PF':[0,0,0,0,1,1],
                       'col1':np.random.rand(6),
                       'col2':np.random.rand(6)})
    df.ix[1,0]=np.nan
    mask=~df.FC.isnull()
    cols=['col1', 'col2']

    dft = df * 2
    dft.ix[3,3] = np.nan


    # bug?
    df.ix[mask, cols]= dft.ix[mask, cols]
    print df
    df.ix[mask, cols]= dft.ix[mask, cols]
    print df

Notice that in the second result, the NaN at [3,3] has disappeared and all the values below it have shifted up.

    FC  PF      col1      col2
0    a   0  0.679388  0.508501
1  NaN   0  0.168159  0.346730
2    a   0  1.802611  0.870135
3    b   0  0.577989       NaN
4    a   1  1.027070  0.530890
5    b   1  0.383662  1.855584

    FC  PF      col1      col2
0    a   0  0.679388  0.508501
1  NaN   0  0.168159  0.346730
2    a   0  1.802611  0.870135
3    b   0  0.577989  0.530890
4    a   1  1.027070  1.855584
5    b   1  0.383662  0.508501

The issue seems to be in pandas index.py line 143:
v = v.reindex(self.obj[item].reindex(v.index).dropna().index)
Notice that it drops the NA rows of the target (the left-hand side), not of the value being assigned.
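
To illustrate why that line loses a row, here is a minimal sketch that applies the same reindex/dropna chain to two standalone Series. The names `target` and `v` and their values are made up for illustration; this is not the pandas internals themselves, only the quoted expression run in isolation.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for self.obj[item]: the column being assigned to,
# which already holds a NaN at label 3 after the first assignment.
target = pd.Series([0.5, 0.3, 0.8, np.nan, 0.5, 1.8], name="col2")

# Hypothetical stand-in for v: the values to assign, indexed by the masked rows.
v = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], index=[0, 2, 3, 4, 5])

# Same chain as the quoted line: the NaN in the *target* removes label 3.
aligned = v.reindex(target.reindex(v.index).dropna().index)

print(list(v.index))        # labels requested by the assignment
print(list(aligned.index))  # labels that survive the dropna on the target
```

The aligned value ends up one row shorter than the set of rows selected by the mask, which is consistent with the shifted values in the second printout.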

I then tried to use df.ix[mask, cols]= dft.ix[mask, cols].values to bypass this, and that failed as well. The problem is in pandas index.py line 149:
if len(labels) != len(value):
Notice that it compares the number of columns to be assigned against the number of rows in the ndarray.
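
A small sketch of that mismatch, using plain NumPy and assuming `labels` holds the two column names and `value` is the 2-D array of masked rows (both names are taken from the quoted line; the data is made up):

```python
import numpy as np

labels = ["col1", "col2"]               # columns being assigned
value = np.arange(10.0).reshape(5, 2)   # 5 masked rows x 2 columns

# len() of a 2-D ndarray is its number of rows, not its number of columns.
print(len(labels))      # number of columns to assign
print(len(value))       # number of rows in the array
print(value.shape[1])   # number of columns in the array
```

Comparing `len(labels)` with `value.shape[1]` would be the like-for-like check in this sketch; whether the actual fix does exactly that is not stated in this thread.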


jreback · 2013-05-16T19:22:45Z

df = DataFrame({'FC':['a','b','a','b','a','b'],
                        'PF':[0,0,0,0,1,1],
                        'col1':range(6),
                        'col2':range(6,12)})

In [5]: df
Out[5]: 
    FC  PF  col1  col2
0    a   0     0     6
1  NaN   0     1     7
2    a   0     2     8
3    b   0     3     9
4    a   1     4    10
5    b   1     5    11

df2 = df.copy()

In [6]: mask
Out[6]: 
0     True
1    False
2     True
3     True
4     True
5     True
dtype: bool

In [7]: dft
Out[7]: 
    FC  PF  col1  col2
0   aa   0     0    12
1  NaN   0     2    14
2   aa   0     4    16
3   bb   0     6   NaN
4   aa   2     8    20
5   bb   2    10    22

In [8]: df2.ix[mask, cols]= dft.ix[mask, cols]

In [9]: df2
Out[9]: 
    FC  PF  col1  col2
0    a   0     0    12
1  NaN   0     1     7
2    a   0     4    16
3    b   0     6   NaN
4    a   1     8    20
5    b   1    10    22

jianpan · 2013-05-16T19:39:03Z

I'm using 0.11. Older versions don't support this kind of assignment.

Notice that in your test the mask is inverted, so the null-value problem doesn't show up.

jreback · 2013-05-16T19:40:43Z

ah..you are right...copy error.....I will take a look, could be a bug in this as I allowed this starting in 0.11 (in theory in certain cases it might have worked < 0.11, that's why I asked)

jreback · 2013-05-16T22:44:05Z

@jianpan I updated the results (I switched to a range of ints, easier to see alignment), this looks right to me....

(I didn't change any code)

jianpan · 2013-05-17T10:48:02Z

If you run df2.ix[mask, cols]= dft.ix[mask, cols] a second time, you will see a different result, because the bug only happens when the left-hand side of the assignment (df2 in your test) already has null values. I updated my first post to include the wrong results.

jreback · 2013-05-17T13:09:32Z

this should be closed by #3632
pls give a try

also, if you'd like to contribute a DOC PR, I had left this issue open: #3289. would like to have an example of this (since as of 0.11 we support this)
http://pandas.pydata.org/pandas-docs/dev/indexing.html#setting-values-in-mixed-type-dataframe
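
A sketch of what such an example could look like. It is written against a current pandas, where `.ix` no longer exists, so it uses `.loc` instead; the integer-like data follows the frames shown earlier in the thread, stored as floats so the NaN fits without a dtype change. Treat it as an illustration of the expected behaviour, not as the text that went into the docs.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"FC": ["a", np.nan, "a", "b", "a", "b"],
                   "PF": [0, 0, 0, 0, 1, 1],
                   "col1": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
                   "col2": [6.0, 7.0, 8.0, 9.0, 10.0, 11.0]})
mask = df["FC"].notna()          # rows where FC is present
cols = ["col1", "col2"]

dft = df[cols] * 2               # values to assign
dft.loc[3, "col2"] = np.nan      # the right-hand side carries a NaN

# First assignment: rows are aligned on the index, columns on their labels.
df.loc[mask, cols] = dft.loc[mask, cols]
first = df.copy()

# Second, identical assignment: the left-hand side now contains the NaN.
df.loc[mask, cols] = dft.loc[mask, cols]

print(df)
print(df.equals(first))          # the repeated assignment should be a no-op
```

Row 1 is excluded by the mask and keeps its original values, and the NaN at row 3 stays in place after both assignments.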

jianpan · 2013-05-17T13:25:10Z

Thanks. Can you also check this sample:

df.ix[mask, cols]= dft.ix[mask, cols].values

This used to fail because of pandas index.py line 149:
if len(labels) != len(value):
It compares the number of columns to be assigned against the number of rows in the ndarray.
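
For reference, a minimal sketch of the same ndarray assignment written with `.loc` for a current pandas (`.ix` has since been removed). The frame and values are made up for illustration; the point is only that the array has one row per masked row and one column per assigned column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"FC": ["a", np.nan, "a", "b", "a", "b"],
                   "col1": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
                   "col2": [6.0, 7.0, 8.0, 9.0, 10.0, 11.0]})
mask = df["FC"].notna()
cols = ["col1", "col2"]

dft = df[cols] * 2
dft.loc[3, "col2"] = np.nan

values = dft.loc[mask, cols].values   # plain ndarray, no index alignment
print(values.shape)                   # (masked rows, assigned columns)

# With an ndarray on the right, values are placed positionally.
df.loc[mask, cols] = values
print(df)
```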

jreback · 2013-05-17T14:15:24Z

updated the PR, that is correct, was another bug!

as you can see was not completely tested (or maybe tests were too naive)

this correctly works now ...pls give a try again

jianpan · 2013-05-17T16:19:08Z

Thanks for the quick fix!

jreback · 2013-05-17T16:42:39Z

np... still would love a PR for the docs for this (really just an example like you have above)

jreback mentioned this issue May 17, 2013

BUG: (GH3626) issue with alignment of a DataFrame setitem with a piece of another DataFrame #3632

Merged

jianpan closed this as completed May 17, 2013
