NaN comparison in merge #1990

Closed
jseabold opened this issue Sep 29, 2012 · 2 comments
@jseabold
Contributor

Parking this here so I don't forget about it. I assume that the NaNs here are not comparing equal, so we get "duplicate" rows on a merge, which I think is a bug.

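import numpy as np
import pandas

# Panel data with a few missing observations in the "data" column.
data = [[1950, "A", 1.5],
        [1950, "B", 1.5],
        [1955, "B", 1.5],
        [1960, "B", np.nan],
        [1970, "B", 4.],
        [1950, "C", 4.],
        [1960, "C", np.nan],
        [1965, "C", 3.],
        [1970, "C", 4.],
        ]
frame = pandas.DataFrame(data, columns=["year", "panel", "data"])

# Rows for the (year, panel) pairs missing from frame; "data" is all NaN.
other_data = [[1960, "A", np.nan],
              [1970, "A", np.nan],
              [1955, "A", np.nan],
              [1965, "A", np.nan],
              [1965, "B", np.nan],
              [1955, "C", np.nan]]
other = pandas.DataFrame(other_data, columns=["year", "panel", "data"])

# No "on" argument, so the merge keys are all shared columns, including "data".
together = frame.merge(other, how="outer")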

Worked around this by filling in the NaNs before the merge, but then I have to convert them back to NaNs later.
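
For anyone stuck on an affected version, here is a rough sketch of that workaround. The helper name `merge_with_nan_sentinel` and the sentinel value `-999.0` are made up for illustration and are not part of pandas; the sentinel has to be a value that never occurs in the real data, and the small frames are only a cut-down stand-in for the ones above.

```python
import numpy as np
import pandas as pd

# Hypothetical placeholder value; it must not appear in the real data.
SENTINEL = -999.0


def merge_with_nan_sentinel(left, right, sentinel=SENTINEL, **merge_kwargs):
    """Replace NaN with a sentinel, merge, then turn the sentinel back into NaN."""
    merged = left.fillna(sentinel).merge(right.fillna(sentinel), **merge_kwargs)
    return merged.replace(sentinel, np.nan)


frame = pd.DataFrame(
    [[1950, "A", 1.5], [1960, "B", np.nan], [1960, "C", np.nan]],
    columns=["year", "panel", "data"],
)
other = pd.DataFrame(
    [[1960, "A", np.nan], [1960, "B", np.nan]],
    columns=["year", "panel", "data"],
)

# (1960, "B", NaN) appears in both frames and should show up only once.
together = merge_with_nan_sentinel(frame, other, how="outer")
print(together)
```

The obvious downside is the one mentioned above: the round trip through a sentinel is easy to get wrong if the sentinel collides with a real value.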

@wesm
Member

wesm commented Sep 29, 2012

Hm, I was fairly surprised by this. Having a look under the hood.

wesm closed this as completed in f82d931 on Sep 29, 2012
@wesm
Member

wesm commented Sep 29, 2012

Thanks for catching this one. Proper NA handling in merges really flew under the radar for a while (I had TODO notes sprinkled throughout pandas/tools/merge.py about it). Turned out not to be very difficult.
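
A quick way to check NA handling on a given pandas install is a small merge whose key column contains NaN on both sides. This is only an illustrative sketch, not the test added with the fix: the frames and the column names `key`, `lval`, and `rval` are made up. It assumes a pandas version where null keys are matched against each other, which is what current pandas documents for `merge`.

```python
import numpy as np
import pandas as pd

# Illustrative frames: each side has exactly one row whose key is NaN.
left = pd.DataFrame({"key": [1.0, np.nan], "lval": ["a", "b"]})
right = pd.DataFrame({"key": [np.nan, 2.0], "rval": ["x", "y"]})

merged = left.merge(right, on="key", how="outer")

# With NaN keys treated as equal, the NaN rows pair up ("b" with "x")
# rather than appearing as two separate unmatched rows.
nan_rows = merged[merged["key"].isna()]
print(merged)
```

If the NaN keys were not matched, the outer merge would come back with four rows instead of three.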
