Assigning dataframe by specifying both row/col doesn't handle nan correctly #3626

Closed
jianpan opened this issue May 16, 2013 · 10 comments · Fixed by #3632

Comments

jianpan commented May 16, 2013

If I run the following code under pandas 0.11, the two identical assignment statements at the bottom give different results:

import numpy as np
import pandas as pd

if __name__ == '__main__':
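    df = pd.DataFrame({'FC': ['a', 'b', 'a', 'b', 'a', 'b'],
                       'PF': [0, 0, 0, 0, 1, 1],
                       'col1': np.random.rand(6),
                       'col2': np.random.rand(6)})
    df.ix[1, 0] = np.nan        # put a NaN in FC so the mask excludes row 1
    mask = ~df.FC.isnull()
    cols = ['col1', 'col2']

    dft = df * 2
    dft.ix[3, 3] = np.nan       # NaN in col2 of the source frame

    # Bug? The two identical assignments below print different results.
    df.ix[mask, cols] = dft.ix[mask, cols]
    print(df)
    df.ix[mask, cols] = dft.ix[mask, cols]
    print(df)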

Notice that in the second result the NaN at [3,3] has disappeared and the col2 values below it have shifted up, with the last row wrapping around to the value from row 0.

    FC  PF      col1      col2
0    a   0  0.679388  0.508501
1  NaN   0  0.168159  0.346730
2    a   0  1.802611  0.870135
3    b   0  0.577989       NaN
4    a   1  1.027070  0.530890
5    b   1  0.383662  1.855584

    FC  PF      col1      col2
0    a   0  0.679388  0.508501
1  NaN   0  0.168159  0.346730
2    a   0  1.802611  0.870135
3    b   0  0.577989  0.530890
4    a   1  1.027070  1.855584
5    b   1  0.383662  0.508501    

The issue seems to be in pandas index.py, line 143:
v = v.reindex(self.obj[item].reindex(v.index).dropna().index)
Note that it drops NaN entries from the target (the left-hand side) before aligning the value to it.
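
A minimal sketch of why that line loses a row, written against current pandas with made-up values. It only mimics the alignment step quoted above, not the rest of the pandas internals; the names `target` and `value` are hypothetical stand-ins for `self.obj[item]` and `v`.

```python
import numpy as np
import pandas as pd

# Left-hand column after the first assignment: it now holds a NaN at row 3.
target = pd.Series([12.0, 7.0, 16.0, np.nan, 20.0, 22.0])

# Right-hand values for the masked rows (row 1 is excluded by the mask).
value = pd.Series([12.0, 16.0, np.nan, 20.0, 22.0], index=[0, 2, 3, 4, 5])

# Same expression as the quoted line: NaN rows of the *target* are dropped,
# so label 3 vanishes from the index the value gets aligned to.
kept = target.reindex(value.index).dropna().index
aligned = value.reindex(kept)

print(list(kept))                 # labels that survive
print(len(value), len(aligned))   # five values requested, four delivered
```

With only four values left for five masked rows, the values can no longer line up with their original labels, which matches the shifted output shown above.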

I then tried df.ix[mask, cols] = dft.ix[mask, cols].values to bypass this, and it also failed. The problem is in pandas index.py, line 149:
if len(labels) != len(value):
Note that it compares the number of columns to be assigned against the number of rows in the ndarray.
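
To illustrate the mismatch: `len()` of a 2-D ndarray is its row count, so the quoted check compares columns against rows. This is a small sketch with assumed shapes (5 masked rows, 2 columns); `labels` and `value` here are illustrative, not the actual pandas variables.

```python
import numpy as np

labels = ["col1", "col2"]               # columns being assigned
value = np.arange(10.0).reshape(5, 2)   # 5 masked rows x 2 columns

# len() of a 2-D array counts rows, so this comparison is 2 vs 5.
print(len(labels), len(value))

# Comparing against the second axis is what matches the column labels.
print(value.shape[1] == len(labels))
```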

Contributor

jreback commented May 16, 2013

df = DataFrame({'FC':['a','b','a','b','a','b'],
                        'PF':[0,0,0,0,1,1],
                        'col1':range(6),
                        'col2':range(6,12)})

In [5]: df
Out[5]: 
    FC  PF  col1  col2
0    a   0     0     6
1  NaN   0     1     7
2    a   0     2     8
3    b   0     3     9
4    a   1     4    10
5    b   1     5    11

df2 = df.copy()

In [6]: mask
Out[6]: 
0     True
1    False
2     True
3     True
4     True
5     True
dtype: bool

In [7]: dft
Out[7]: 
    FC  PF  col1  col2
0   aa   0     0    12
1  NaN   0     2    14
2   aa   0     4    16
3   bb   0     6   NaN
4   aa   2     8    20
5   bb   2    10    22

In [8]: df2.ix[mask, cols]= dft.ix[mask, cols]

In [9]: df2
Out[9]: 
    FC  PF  col1  col2
0    a   0     0    12
1  NaN   0     1     7
2    a   0     4    16
3    b   0     6   NaN
4    a   1     8    20
5    b   1    10    22

Author

jianpan commented May 16, 2013

I'm using 0.11. Older versions don't support this kind of assignment.

Note that in your test the mask is inverted, so the null-value problem doesn't show up.

Contributor

jreback commented May 16, 2013

Ah, you are right, that was a copy error. I will take a look; this could be a bug, since I only started allowing this in 0.11 (in theory it might have worked in certain cases before 0.11, which is why I asked).

Contributor

jreback commented May 16, 2013

@jianpan I updated the results (I switched to a range of ints, which makes the alignment easier to see). This looks right to me.

(I didn't change any code.)

Author

jianpan commented May 17, 2013

If you run df2.ix[mask, cols] = dft.ix[mask, cols] a second time, you will see a different result, because the bug happens when the left-hand side of the assignment (df2 in your test) has null values. I updated my first post to include the wrong results.
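
For readers on current pandas, where `.ix` no longer exists (it was removed in pandas 1.0), the same "assign twice and compare" check can be sketched with `.loc`. This is an adaptation, not the code from this thread: the numeric columns are floats here so that assigning a NaN does not require a dtype change, and only the numeric columns are doubled.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "FC": ["a", np.nan, "a", "b", "a", "b"],
    "PF": [0, 0, 0, 0, 1, 1],
    "col1": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "col2": [6.0, 7.0, 8.0, 9.0, 10.0, 11.0],
})
mask = df["FC"].notna()
cols = ["col1", "col2"]

# Source frame: doubled values with a NaN at row 3 of col2.
dft = df.copy()
dft[cols] = df[cols] * 2
dft.loc[3, "col2"] = np.nan

# First assignment puts a NaN into the left-hand side.
df.loc[mask, cols] = dft.loc[mask, cols]
first = df.copy()

# Second, identical assignment should leave the frame unchanged.
df.loc[mask, cols] = dft.loc[mask, cols]
second = df.copy()

print(first.equals(second))
print(second)
```

The point of the check is that the assignment should be idempotent: the NaN at row 3 stays where it is and the rows below it keep their own values.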

Contributor

jreback commented May 17, 2013

This should be closed by #3632; please give it a try.

Also, if you would like to contribute a doc PR, I had left issue #3289 open and would like to have an example of this there (since we support this as of 0.11):
http://pandas.pydata.org/pandas-docs/dev/indexing.html#setting-values-in-mixed-type-dataframe

Author

jianpan commented May 17, 2013

Thanks. Can you also check this sample:

df.ix[mask, cols]= dft.ix[mask, cols].values

This used to fail because of pandas index.py, line 149:
if len(labels) != len(value):
It compares the number of columns to be assigned against the number of rows in the ndarray.
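
A sketch of the same ndarray assignment on current pandas, using `.loc` and `to_numpy()` in place of `.ix` and `.values`. The data is made up, and the numeric columns are floats by assumption so that the NaN fits without a dtype change. Because a raw ndarray carries no labels, the values are placed positionally into the masked rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "FC": ["a", np.nan, "a", "b", "a", "b"],
    "col1": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "col2": [6.0, 7.0, 8.0, 9.0, 10.0, 11.0],
})
mask = df["FC"].notna()
cols = ["col1", "col2"]

dft = df.copy()
dft[cols] = df[cols] * 2
dft.loc[3, "col2"] = np.nan

# The right-hand side is a plain (5, 2) array: 5 masked rows x 2 columns.
values = dft.loc[mask, cols].to_numpy()
print(values.shape)

df.loc[mask, cols] = values
print(df)
```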

Contributor

jreback commented May 17, 2013

Updated the PR. You are correct, that was another bug!

As you can see, this was not completely tested (or maybe the tests were too naive).

This works correctly now; please give it another try.

Author

jianpan commented May 17, 2013

Thanks for the quick fix!

@jianpan jianpan closed this as completed May 17, 2013
Contributor

jreback commented May 17, 2013

No problem. I would still love a PR for the docs for this (really just an example like the one you have above).
