Enable DataFrame to be sorted by multiple columns #92

Closed
wesm opened this issue Aug 7, 2011 · 4 comments

@wesm
Member

wesm commented Aug 7, 2011

Code would look more or less like:

sorted_df = df.sort(['col1', 'col2'])
@wesm
Member Author

wesm commented Nov 13, 2011

Hey @changhiskhan, check out my changes above. I was able to get about a 4-5x improvement by switching to the fast ndarray zipper I wrote yesterday and calling argsort on an array of tuples. Here was my test dataset:


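from pandas import DataFrame
import numpy as np
import random

k = 5     # number of distinct key values per column
n = 1000  # repetitions of each key value

# Two integer key columns with many ties, shuffled independently
A = np.arange(k).repeat(n)
B = np.tile(np.arange(k), n)
random.shuffle(A)
random.shuffle(B)
frame = DataFrame({'A' : A, 'B' : B,
                   'C' : np.random.randn(k * n)})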

I made a slightly more complicated test dataset. I'd love to get a stable sort going, but I think it's going to be quicksort until someone writes mergesort or another stable sort for dtype=object.

Went from 16.4 ms to 3.5 ms, not too bad. The bottleneck is sorting the tuples, of course.
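
For readers who want to see the idea in isolation, here is a minimal sketch of "argsort on an array of tuples". It is not the pandas implementation: the helper name `argsort_by_tuples` is hypothetical, a plain Python loop stands in for the fast ndarray zipper mentioned above, and the seeded NumPy generator is just a convenient way to build shuffled keys.

```python
import numpy as np

def argsort_by_tuples(*columns):
    """Return row positions that order the rows by (col1, col2, ...)."""
    n = len(columns[0])
    # Pack each row's key values into one tuple inside an object array.
    # (A slow stand-in for the fast zipper described in the comment.)
    keys = np.empty(n, dtype=object)
    for i, row in enumerate(zip(*columns)):
        keys[i] = row
    # Tuples compare lexicographically, so argsort orders by the first
    # column and breaks ties with the later ones.
    return keys.argsort()

# Two shuffled key columns shaped like the test dataset in the comment.
rng = np.random.default_rng(0)
A = rng.permutation(np.arange(5).repeat(1000))
B = rng.permutation(np.tile(np.arange(5), 1000))

order = argsort_by_tuples(A, B)
sorted_pairs = list(zip(A[order].tolist(), B[order].tolist()))
```

Because the default sort here is not guaranteed to be stable, rows with identical `(A, B)` keys may come out in any relative order. NumPy's `np.lexsort` is documented as a stable indirect sort and may be a better fit when tie order matters.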

@mdgoldberg

Is sort along multiple columns stable now? Or still unstable?

@jreback
Contributor

jreback commented Mar 15, 2017

Multiple-column sorting has been stable for quite a long time. Single-column sorting can be controlled via the kind keyword; the default is quicksort, to match np.argsort, which is unstable (though you can pass kind='mergesort' if you need a stable sort).
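
A small sketch of what that looks like in practice, assuming a current pandas where `DataFrame.sort` has been replaced by `DataFrame.sort_values`. The frame and its column values are made up for illustration; column `C` only exists to make the original row order visible.

```python
import pandas as pd

frame = pd.DataFrame({
    "A": [2, 1, 2, 1, 2],
    "B": [0, 1, 0, 1, 0],
    "C": ["v", "w", "x", "y", "z"],  # marks the original row order
})

# Multi-column sort: rows are ordered by A, then by B within each A.
by_both = frame.sort_values(["A", "B"])

# Single-column sort with an explicitly stable algorithm: rows with
# equal A keep their original relative order.
by_a_stable = frame.sort_values("A", kind="mergesort")
```

With `kind="mergesort"`, ties in `A` keep the order they had in `frame`, which is the property the question above is asking about.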

@mdgoldberg

Thanks! It didn't seem clear from the documentation whether multiple column sorting was stable.
