Groupby select multiple columns #6524

Closed
hayd opened this issue Mar 3, 2014 · 9 comments

Comments

@hayd
Contributor

hayd commented Mar 3, 2014

Is this supported?

g[['X', 'Y']]  # do groupby stuff with just these columns

http://stackoverflow.com/q/22139053/1240268

@hayd hayd added this to the Someday milestone Mar 3, 2014
@hayd
Contributor Author

hayd commented Mar 3, 2014

This should probably raise NotImplementedError?

@naught101

@hayd, thanks for the follow up. For reference:

df = pandas.DataFrame({"Dummy":[1,2]*6, "X":[1,3,7]*4, 
                       "Y":[2,3,4]*4, "group":["A","B"]*6})
df[['X', 'Y']].head(1)
   X  Y
0  1  2
[1 rows x 2 columns]

df[:,['X', 'Y']].head(1)
TypeError: unhashable type: 'slice'

df.loc[:,['X', 'Y']].head(1)
   X  Y
0  1  2
[1 rows x 2 columns]

df.groupby('group')[['X', 'Y']].head(1)
         Dummy  X  Y group
group                     
A     0      1  1  2     A
B     1      2  3  3     B
[2 rows x 4 columns]

df.groupby('group').loc[:,['X', 'Y']].head(1)
AttributeError: Cannot access attribute 'loc' of 'DataFrameGroupBy' objects, try using the 'apply' method
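For comparison, here is a minimal sketch of list-based column selection on a groupby, using the same frame as above. It assumes a recent pandas release, where `g[['X', 'Y']]` restricts both aggregations and `head` to the selected columns; the 2014 output above shows `head` ignoring the selection. The variable names `sums` and `firsts` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"Dummy": [1, 2] * 6, "X": [1, 3, 7] * 4,
                   "Y": [2, 3, 4] * 4, "group": ["A", "B"] * 6})

g = df.groupby("group")

# Selecting with a list of names yields a groupby restricted to those columns.
sums = g[["X", "Y"]].sum()

# Assumption: on recent pandas, head() honours the column selection too
# and keeps the original row labels instead of adding a group level.
firsts = g[["X", "Y"]].head(1)

print(sums)
print(firsts)
```

There is still no `.loc` on the groupby object itself; selecting columns with `[...]` as above is the supported route.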

@jreback
Contributor

jreback commented Mar 3, 2014

This works, except in head/tail.
jreback@92e5c50
Current

In [1]: df = DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])

In [2]: df
Out[2]: 
   A  B
0  1  2
1  1  4
2  5  6

[3 rows x 2 columns]

In [4]: df.groupby('A',as_index=True).head(1)
Out[4]: 
     A  B
A        
1 0  1  2
5 2  5  6

[2 rows x 2 columns]

In [5]: df.groupby('A',as_index=False).head(1)
Out[5]: 
   A  B
0  1  2
2  5  6

[2 rows x 2 columns]

master (with my change)

In [1]: df = DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])

In [2]: df
Out[2]: 
   A  B
0  1  2
1  1  4
2  5  6

[3 rows x 2 columns]

In [3]: df.groupby('A',as_index=True).head(1)
Out[3]: 
     B
A     
1 0  2
5 2  6

[2 rows x 1 columns]

In [4]: df.groupby('A',as_index=False).head(1)
Out[4]: 
   B
0  2
2  6

[2 rows x 1 columns]

I think master is wrong here?
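As a point of reference, the sketch below runs the same two calls and checks what comes back. It assumes a recent pandas release, where `head` keeps the grouping column `A`, keeps the original row labels, and ignores `as_index`; neither of the two 2014 outputs above matches that exactly.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=["A", "B"])

with_index = df.groupby("A", as_index=True).head(1)
without_index = df.groupby("A", as_index=False).head(1)

# Assumption (recent pandas): both calls return the first row of each
# group, with column A retained and the original index labels 0 and 2.
print(with_index)
print(without_index)
```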

@TomAugspurger
Contributor

Same issue? #5264

@jreback
Contributor

jreback commented Mar 3, 2014

@TomAugspurger actually it might be the same... let's close this one and consolidate.

The fix is pretty trivial, but a couple of tests are 'wrong' (that's why I put them up), so this needs to be gone over carefully.

@jreback jreback closed this as completed Mar 3, 2014
@jreback
Contributor

jreback commented Mar 3, 2014

consolidating issue to #5264

@hayd
Contributor Author

hayd commented Mar 3, 2014

@jreback I really want to kill this index behaviour of head/tail; it should act like filter. There is an issue somewhere; maybe we should do it sooner rather than later: #5755

@hayd
Contributor Author

hayd commented Mar 3, 2014

@jreback master is wrong there!

@hayd
Contributor Author

hayd commented Mar 3, 2014

I'll put up a (breaking) change to make head/tail act like filter (regardless of as_index). The reason it doesn't already is historical IMO (from when it was .apply(head))... I don't think people ever want the current behaviour?
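To illustrate what "act like filter" could mean here, the following sketch expresses `head(n)` as a boolean row mask built from `cumcount`. This is one reading of the proposal and not the actual patch; it assumes a recent pandas release where `head` and `tail` already return rows under their original labels.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=["A", "B"])
g = df.groupby("A")

# cumcount() numbers the rows within each group starting at 0,
# so "position < n" selects the first n rows of every group.
mask = g.cumcount() < 1
via_mask = df[mask]

# Filter-like head/tail: a subset of the original rows, original index kept.
first_rows = g.head(1)
last_rows = g.tail(1)

print(via_mask)
print(first_rows)
print(last_rows)
```

Seen this way, the result is just a row subset of `df`, so `as_index` has nothing to act on.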
