DataFrame.drop_duplicates is unreliable in the presence of NA values #557

Closed
wesm opened this issue Dec 30, 2011 · 3 comments

Comments

@wesm
Member

wesm commented Dec 30, 2011

No description provided.

@tr11
Contributor

tr11 commented May 3, 2012

I'm not sure I implemented the correct fix, but I think my fork handles the problem, at least for the case I needed. All I did was replace the NaNs with a new object that compares equal to itself.

@wesm
Member Author

wesm commented May 12, 2012

@changhiskhan could you run a performance comparison on a large dataset, before and after the fix you put in, and post the results here?

@changhiskhan
Contributor

Yeah, I'll add a vbench in my branch.


3 participants