-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: Fix DataFrame.to_xarray doctests and allow the CI to run it. #22673
Conversation
Hello @Moisan! Thanks for updating the PR.
Comment last updated on September 14, 2018 at 20:39 Hours UTC |
pandas/core/generic.py
Outdated
Return the xarray equivalent of the pandas object. `xarray | ||
<http://xarray.pydata.org/en/stable/>`__ is a | ||
Python package that allows to handle N-dimensional data. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if we need to mention this here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to extend the previous short summary. Do you suggest to simply remove it or is there something else we should mention here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove it. We describe xarray
elsewhere in the docs anyhow. You don't always have to put an extension of the short summary.
Codecov Report
@@ Coverage Diff @@
## master #22673 +/- ##
=======================================
Coverage 92.17% 92.17%
=======================================
Files 169 169
Lines 50708 50708
=======================================
Hits 46740 46740
Misses 3968 3968
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pandas/core/generic.py
Outdated
* items (items) object 'A' 'B' 'C' 'D' | ||
* major_axis (major_axis) datetime64[ns] 2013-01-01 2013-01-02 2013-01-03 # noqa | ||
* minor_axis (minor_axis) object 'first' 'second' | ||
* first (first) object 'bar' 'baz' 'foo' 'qux' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be better to have a datetime index for 1 level
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes @Moisan
I added some comments about the original docstring, that if you implement them, I think the examples won't only pass the tests, but will be much clearer.
pandas/core/generic.py
Outdated
@@ -2498,11 +2498,15 @@ def to_xarray(self): | |||
a Dataset for a DataFrame | |||
a DataArray for higher dims |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind replacing this to the standard format? Only the type in the first line, and a description in the next. For example:
xarray.DataArray or xarray.Dataset
Data in the pandas structure converted to Dataset if the object is a DataFrame, or a DataArray if the object is a Series.
pandas/core/generic.py
Outdated
@@ -2498,11 +2498,15 @@ def to_xarray(self): | |||
a Dataset for a DataFrame | |||
a DataArray for higher dims | |||
|
|||
See also | |||
-------- | |||
DataFrame.to_csv : Write out to a csv file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See Also
is with a capital A
.
Personally I don't like "recommending" to_csv
here, as besides not being a great format, it does not support multidimensional data. I think to_parquet
and to_hdf
seem more appropriate to me.
pandas/core/generic.py
Outdated
).set_index(['B','A']) | ||
... 'B' : ['foo', 'bar', 'foo'], | ||
... 'C' : np.arange(4.,7)} | ||
... ).set_index(['B','A']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the need to repeat the previous DataFrame, we can just have df_multiindex = df.set_index(['B', 'A'])
pandas/core/generic.py
Outdated
'B' : ['foo', 'bar', 'foo'], | ||
'C' : np.arange(4.,7)}) | ||
... 'B' : ['foo', 'bar', 'foo'], | ||
... 'C' : np.arange(4.,7)}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using arange makes the code more complicated for no reason. Using [4., 5., 6.]
is simpler and clearer I think.
Also, it'd be good to have a meaningful example. The code in the example is difficult to follow as there is no way to know that the object column is B, more than looking at the example. If we get something few samples with animals with name
, num_legs
, speed
, it's obvious which is the float, which the int, and which the str/object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really great. Just a couple of comments that if I'm right, should simplify the examples.
pandas/core/generic.py
Outdated
... ('monkey', 'mammal', np.nan, 4)], | ||
... columns=['name', 'class', 'max_speed', | ||
... 'num_legs'], | ||
... index=[0, 2, 3, 1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why we don't use the default index (so we don't specify it), or we specify one sorted? May be I'm missing the point, but seems like this should have a meaning, but couldn't see with the rest of the example. If there is no reason (may be you just copied from an example where this was for something?), I'd just remove it, so we save some space and avoid distractions.
The indentation of the num_legs
seems wrong, I think it should be indented to the level of name
. When possible we'll start validating automatically PEP8 in the examples, so if we can get this fixed already, that would be great.
pandas/core/generic.py
Outdated
... names=['first', 'second']) | ||
|
||
>>> s = pd.Series(np.arange(8), index=index) | ||
>>> s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this too complicated for what we need to show. To have a Series with a multiindex with a datetime level, we can have something like:
import pandas as pd
df = pd.DataFrame({'date': pd.to_datetime(['2018-01-01', '2018-01-01', '2018-01-02', '2018-01-02']),
'animal': ['falcon', 'parrot', 'falcon', 'parrot'],
'speed': [350, 18, 361, 15]}).set_index(['date', 'animal'])
df['speed']
I haven't used much xarray myself, and not sure what makes sense to show here. May be:
- Series.to_xarray()
- DataFrame.to_xarray()
- DataFrame(with multiindex including datetime).to_xarray()
If that makes sense, I think with the first example, we can have df.to_xarray()
and df['max_speed'].to_xarray()
, and then a example like the one I wrote.
@jreback does this make sense?
Sorry for requesting the changes @Moisan, but my I find like the current version gives the idea that we're trying to show something more complex than what we are actually showing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem, I'm happy to make the examples more relevant :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! |
git diff upstream/master -u -- "*.py" | flake8 --diff
Fix the docstring for DataFrame.to_xarray as described in #22459 . I removed the pandas.panel example since it was deprecated and I replaced it with a MultiIndex.