-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Add Index.to_frame method #17815
Conversation
Codecov Report
@@ Coverage Diff @@
## master #17815 +/- ##
==========================================
- Coverage 91.26% 91.24% -0.02%
==========================================
Files 163 163
Lines 49978 49990 +12
==========================================
+ Hits 45611 45614 +3
- Misses 4367 4376 +9
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #17815 +/- ##
==========================================
- Coverage 91.26% 91.24% -0.02%
==========================================
Files 163 163
Lines 49980 49992 +12
==========================================
+ Hits 45613 45616 +3
- Misses 4367 4376 +9
Continue to review full report at Codecov.
|
pandas/core/indexes/base.py
Outdated
@@ -1005,6 +1005,29 @@ def to_series(self, **kwargs): | |||
index=self._shallow_copy(), | |||
name=self.name) | |||
|
|||
def to_frame(self, index=True): | |||
""" | |||
Create a DataFrame with the columns the levels of the Index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be reworded. Unless I'm missing something this should only return one column for the base Index
, which should only have a single level (MultiIndex
has it's own method/docstring). So maybe something like "Create a DataFrame with the Index values as a column".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah...bad English copied and pasted. Fixed.
pandas/core/indexes/base.py
Outdated
Parameters | ||
---------- | ||
index : boolean, default True | ||
return the Index as the index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a little confusing to me on first read. Maybe something along the lines of "use this Index as the index of the resulting DataFrame" would be a little more clear?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah...bad English copied and pasted. Fixed.
pandas/core/indexes/datetimes.py
Outdated
@@ -913,6 +913,45 @@ def to_series(self, keep_tz=False): | |||
index=self._shallow_copy(), | |||
name=self.name) | |||
|
|||
def to_frame(self, index=True, keep_tz=False): | |||
""" | |||
Create a DataFrame with the columns the levels of the Index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah...bad English copied and pasted. Fixed.
pandas/core/indexes/datetimes.py
Outdated
Parameters | ||
---------- | ||
index : boolean, default True | ||
return the DatetimeIndex as the index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah...bad English copied and pasted. Fixed.
pandas/core/indexes/datetimes.py
Outdated
|
||
Returns | ||
------- | ||
DataFrame : a DataFrame containing the original Index data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Index" -> "DatetimeIndex"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
pandas/tests/indexes/common.py
Outdated
@@ -51,6 +51,20 @@ def test_to_series(self): | |||
assert s.index is not idx | |||
assert s.name == idx.name | |||
|
|||
def test_to_frame(self): | |||
idx = self.create_index() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add the issue number
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
DataFrame : a DataFrame containing the original Index data. | ||
""" | ||
|
||
from pandas import DataFrame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this does not do the same thing as .to_series()
where the values and the index are the same.
is there a reason you are doing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where the values and the index are the same.
Not sure I get you here. The implementation I wrote tries to be consistent with what was done with MultiIndex
by constructing DataFrame
with data and setting the index if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok this is reasonable, side issue this is a little suspect:
In [4]: pd.MultiIndex.from_product([range(3),list('ab')], names=['foo', 'bar']).to_frame()
Out[4]:
bar foo
foo bar
0 a a 0
b b 0
1 a a 1
b b 1
2 a a 2
b b 2
In [5]: pd.MultiIndex.from_product([range(3),list('ab')], names=['foo', 'bar']).to_series()
Out[5]:
foo bar
0 a (0, a)
b (0, b)
1 a (1, a)
b (1, b)
2 a (2, a)
b (2, b)
dtype: object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is suspect about it?
The index seems toe same in both examples?
75f11cc
to
88d3eed
Compare
@jreback @jschendel : Addressed all comments, and everything is green. PTAL. |
pandas/core/indexes/multi.py
Outdated
@@ -1010,14 +1010,14 @@ def _to_safe_for_reshape(self): | |||
|
|||
def to_frame(self, index=True): | |||
""" | |||
Create a DataFrame with the columns the levels of the MultiIndex | |||
Create a DataFrame with the columns and levels of the MultiIndex. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem quite right. Maybe something like "Create a DataFrame with the levels of the MultiIndex as columns"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, done.
Minor comment, otherwise LGTM. |
88d3eed
to
554ecd0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments!
(I don't feel too enthusiastic about adding this method, but given the consistency between Index and MultiIndex I suppose this is good)
doc/source/whatsnew/v0.21.0.txt
Outdated
@@ -31,6 +31,7 @@ New features | |||
- Added ``skipna`` parameter to :func:`~pandas.api.types.infer_dtype` to | |||
support type inference in the presence of missing values (:issue:`17059`). | |||
- :class:`~pandas.Resampler.nearest` is added to support nearest-neighbor upsampling (:issue:`17496`). | |||
- :class:`~pandas.Index~` has added support for a ``to_frame`` method (:issue:`15230`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The second ~
should not be there I think
pandas/core/indexes/datetimes.py
Outdated
Set the index of the returned DataFrame as self. | ||
|
||
keep_tz : optional, defaults False. | ||
return the data keeping the timezone. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to have this keyword?
I suppose the reason is for consistency with to_series
, but that has this keyword for historical reasons (I think): it dates from before we could have tz aware data inside a series / frame.
Given that, I am not sure it is needed to copy that here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche : Given that you're not 100% sure with this logic, I'd rather stick with consistency. We can always remove (and deprecate) the argument in one sweep.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we only think of deprecating it, then it makes no sense to add it IMO.
(you can say the same with "we can always add it later if people want it" :-))
pandas/core/indexes/multi.py
Outdated
|
||
.. versionadded:: 0.20.0 | ||
|
||
Parameters | ||
---------- | ||
index : boolean, default True | ||
return this MultiIndex as the index | ||
Set the index of the returned DataFrame as self. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally would not use "self" in docstrings. For users that are not very familiar with writing classes themselves (and you don't need to do this for using pandas) this will sound very strange.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. Fixed.
def test_to_frame(self): | ||
# see gh-15230 | ||
idx = self.create_index() | ||
name = idx.name or 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is at least one in the create_index
indices that has a name? (or at least one that has no name)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there is.
554ecd0
to
587dc1d
Compare
587dc1d
to
1c0aaec
Compare
thanks. maybe make an issue about this #17815 (comment) if you would |
Title is self-explanatory. Closes #15230.