ENH: allow index to be referenced by name #12404

Closed
wants to merge 5 commits into from

hsharrison

Still missing are groupby support (#5677) and .loc support.

Also, I wasn't sure if this deserves more than just a line in whatsnew, so I kept it small for now.

With a standard index:

idx = pd.Index(list('abc'), name='idx')

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=idx)

df['idx']
Out[4]: 
idx
a    a
b    b
c    c
Name: idx, dtype: object

df.idx
Out[5]: 
idx
a    a
b    b
c    c
Name: idx, dtype: object

df[['idx', 'B']]
Out[6]: 
    idx  B
idx       
a     a  4
b     b  5
c     c  6

and with a MultiIndex:

idx = pd.MultiIndex.from_product([list('abc'), list('fg')], names=['lev0', 'lev1'])

df = pd.DataFrame({'A': range(6), 'B': range(10, 16)}, index=idx)

df['lev0']
Out[9]: 
lev0  lev1
a     f       a
      g       a
b     f       b
      g       b
c     f       c
      g       c
Name: lev0, dtype: object

df.lev0
Out[10]: 
lev0  lev1
a     f       a
      g       a
b     f       b
      g       b
c     f       c
      g       c
Name: lev0, dtype: object

df[['A', 'lev1']]
Out[11]: 
           A lev1
lev0 lev1        
a    f     0    f
     g     1    g
b    f     2    f
     g     3    g
c    f     4    f
     g     5    g

@hsharrison
Author

Apologies for the noise in the issues threads, I made more than one git goof...

@jorisvandenbossche
Member

@hsharrison If you want to avoid noise in issues, you can use GH8162 instead of #8162, those will not make automatic links in the issue each time you commit or rebase.

@hsharrison
Author

@jorisvandenbossche thanks

Looks like I was a bit premature with my pull request. Anyone have a suggestion for how to skip Panels here? Perhaps hasattr(self, 'index')? This was my first foray into the internals so there may be a better way to separate the code paths.

Or, I guess, Panels should behave similarly, but with minor_axis instead of index. I haven't worked much with Panels so I'm not sure if this makes sense. (On second thought, it doesn't make sense; I don't think this can work for Panels.)

@hsharrison
Author

Also, what's the best way to make changes to the pull request without adding more commits? Am I really supposed to force push?

@jorisvandenbossche
Member

You can for now just add commits, and then when things shape up you can always squash in the end.

I personally wouldn't bother for Panels (they will probably be deprecated in some time).

@jreback
Contributor

jreback commented Feb 21, 2016

note that what you are doing is way complicated - pls simplify much more

@hsharrison
Author

OK, not surprised to hear. I assume you mean specifically the solution in _getitem_array?

Any suggestion for another method? It made sense to me to split the index and run each piece back through getitem. It's convoluted but any alternative I can come up with just seems even more convoluted. Would a reset_index solution be preferable? Or is there some piece of the internals that would be helpful here, that I'm not familiar with.

I may have been a bit too ambitious for my first contribution, so any guidance is appreciated.
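
For reference, the reset_index alternative mentioned above could look roughly like the sketch below. It is a standalone illustration and not code from this pull request: `select_with_levels` is a hypothetical name, and the sketch assumes every index level is named and that no level name collides with a column name.

```python
import pandas as pd


def select_with_levels(df, keys):
    """Hypothetical helper: select columns and/or index levels by name.

    Assumes all index levels are named and none collide with column names.
    """
    # reset_index turns every index level into an ordinary column.
    flat = df.reset_index()
    out = flat.loc[:, keys].copy()
    # Restore the original index so the result lines up with df.
    out.index = df.index
    return out


idx = pd.MultiIndex.from_product([list("abc"), list("fg")], names=["lev0", "lev1"])
df = pd.DataFrame({"A": range(6), "B": range(10, 16)}, index=idx)

result = select_with_levels(df, ["A", "lev1"])
```

The trade-off is that reset_index copies the whole frame even when only one level is requested, which is part of why it may not be the preferable route.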

@jreback
Contributor

jreback commented Feb 21, 2016

what I mean is you want to contain this change in a function that gets called when appropriate rather than inlining the code (eg u can call it in the except)

further it needs to ask the index (maybe a new method) to evaluate if it has a particular name (this provides compat between index and multiindex)
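
A rough sketch of that shape, written as a standalone function rather than as a change to the pandas internals: the normal column lookup is tried first, and only in the except branch is the index asked whether it has a level with that name. The name `get_column_or_level` is hypothetical, and the use of `index.names` as the check is an assumption, not necessarily the method the final implementation would use.

```python
import pandas as pd


def get_column_or_level(df, key):
    """Hypothetical helper: return df[key], falling back to an index level."""
    try:
        return df[key]
    except KeyError:
        # Fallback path: ask the index whether it has a level called `key`.
        # Index.names exists on both Index and MultiIndex.
        if key is not None and key in df.index.names:
            values = df.index.get_level_values(key).to_numpy()
            return pd.Series(values, index=df.index, name=key)
        raise


idx = pd.Index(list("abc"), name="idx")
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=idx)

from_index = get_column_or_level(df, "idx")
from_column = get_column_or_level(df, "A")
```

Keeping the fallback inside the except branch means ordinary column access takes the same path as before, and a key that matches neither a column nor a level still raises KeyError.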

@hsharrison
Author

Cool, thanks :)

@hsharrison
Author

Looks like there was a lot of compat there already that I wasn't aware of, specifically self.names and self.get_level_values working appropriately for a non-MultiIndex. This is a lot simpler.
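
A small illustration of the compat in question, using the public `Index` API rather than the code in this pull request: `names` and `get_level_values` can be called the same way on a plain Index and on a MultiIndex, so one code path can serve both.

```python
import pandas as pd

flat = pd.Index(list("abc"), name="idx")
multi = pd.MultiIndex.from_product([list("abc"), list("fg")], names=["lev0", "lev1"])

# `names` is list-like on both index types.
flat_names = list(flat.names)
multi_names = list(multi.names)

# `get_level_values` accepts a level name on both index types.
flat_values = flat.get_level_values("idx")
multi_values = multi.get_level_values("lev1")
```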

@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves API Design MultiIndex labels Feb 22, 2016
@@ -432,6 +432,7 @@ Other enhancements
- Added Google ``BigQuery`` service account authentication support, which enables authentication on remote servers. (:issue:`11881`). For further details see :ref:`here <io.bigquery_authentication>`
- ``HDFStore`` is now iterable: ``for k in store`` is equivalent to ``for k in store.keys()`` (:issue:`12221`).
- The entire codebase has been ``PEP``-ified (:issue:`12096`)
- Index (or index levels, with a MultiIndex) can now be referenced like column names (:issue:`8162`, :issue:`10816`).
Contributor

this won't be in 0.18.0. so remove for now

@jreback
Contributor

jreback commented May 7, 2016

@hsharrison if you'd like to update / revisit at some point. This would be best for 0.19.0

@jreback jreback closed this May 7, 2016
@hsharrison
Author

Yes, sorry for the silence. I got bogged down and wasn't able to devote any more time to it, but I just picked it up again the other day. I'll reopen when I have something. Thanks for your comments btw.

Labels
API Design Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex
Successfully merging this pull request may close these issues.

Allowing the index to be referenced by name, like a column
3 participants