-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: enable Series.info() #37320
ENH: enable Series.info() #37320
Conversation
Got some CI issue with building documentation (presumably because of warning related to numpy). In file included from /home/runner/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/numpy/core/include/numpy/ndarraytypes.h:1822:0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls fix the tests. I can barely tell what changed here.
pandas/tests/io/formats/test_info.py
Outdated
assert ( | ||
df_with_object_index.memory_usage(index=True, deep=True).sum() | ||
== df_with_object_index.memory_usage(index=True).sum() | ||
class TestDataFrameInfo: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this diff is super confusing. I would rather simply make 2 test files then jamming them in one. (you can also make a sub-module if that works better).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created a separate module tests/io/formats/tests_series_info.py
.
After this PR, we can probably move both test info modules to tests/*/methods/
.
we have FrameOrSeries to handle this or FrameOrUnion, otherwise you can make a type alias as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ivanovmg for the PR.
needs a release note in 1.2 and a versionadded tag in Series.info docstring.
pandas/core/series.py
Outdated
@@ -4564,6 +4565,96 @@ def replace( | |||
method=method, | |||
) | |||
|
|||
@Substitution( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the Substitution decorator should be necessary with the doc decorator. (and not seen them used together)
The doc decorator was created to supersede the Appender and Substitution decorators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that Substitution is still necessary if we use one generic docstring for DataFrame and Series info. I could not figure out how I can replace some keywords in the base docstring, to make it suitable for both frame and series.
Probably I do not know how to use doc decorator.
pandas/core/series.py
Outdated
Series.memory_usage: Memory usage of Series.""" | ||
), | ||
) | ||
@doc(SeriesInfo.to_buffer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SeriesInfo.to_buffer doesn't have a docstring. so this doesn't render.
>>> help(pd.Series.info)
Help on function info in module pandas.core.series:
info(self, verbose: Union[bool, NoneType] = None, buf: Union[IO[str], NoneType] = None, max_cols: Union[int, NoneType] = N
one, memory_usage: Union[bool, str, NoneType] = None, null_counts: Union[bool, NoneType] = None) -> None
>>>
and wouldn't have memory_usage, max_cols, and null_counts parameters anyway?
(as an aside there appears to be a few issues with DataFrame.info docstring on master, such as alignment of console output and rogue data parameter. Not sure if always like this or from recent refactors, so if you get time, it would be great if can you check that out)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding null_counts
- does it mean that we do not need series info without non-null counts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(as an aside there appears to be a few issues with DataFrame.info docstring on master, such as alignment of console output and rogue data parameter. Not sure if always like this or from recent refactors, so if you get time, it would be great if can you check that out)
I noticed not only here, but in couple of other places, that indentation gets bad, when using this kind of construct:
%(max_cols_sub)s
I never touched the docstring, so probably @MarcoGorelli can comment on the rendering issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please note that I just added dedent in some parameters docs, which make info
docstrings render better, without extra indentation.
pandas/core/series.py
Outdated
verbose: Optional[bool] = None, | ||
buf: Optional[IO[str]] = None, | ||
max_cols: Optional[int] = None, | ||
memory_usage: Optional[Union[bool, str]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the docstring for DataFrame.info is
memory_usage: bool, str, optional
I think this should be
memory_usage: bool or 'deep', optional
might be able to use Literal here (see #37137) and maybe create an alias in typing . follow-on OK too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to use Literal, but looks like that is available only starting from Python 3.8.
pandas/core/series.py
Outdated
"Argument `max_cols` can only be passed " | ||
"in DataFrame.info, not Series.info" | ||
) | ||
return SeriesInfo(self, memory_usage).to_buffer( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems odd imo to have parameters other than buf passed to to_buffer()
would it be better to pass verbose and show_counts to SeriesInfo constructor or rename to_buffer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting the params in the constructor is possible, but in this case in SeriesInfo
there will be two more attributes, which are used only in one method (smaller cohesion within the class).
I would prefer renaming the method. I will look into that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed to_buffer
-> render
.
However, I had to make the same function signature for DataFrameInfo
and SeriesInfo
to avoid typing errors.
Thus, I pass max_cols
into render
and raise ValueError there instead of pandas.core.series.info
.
How does it look?
I added versionadded tag. Or maybe it is better to just create two separate docstrings for DataFrame and Series, but with the duplication? |
CI/Checks complains just about that. |
@ivanovmg if you'd merge master will have a look |
f9edf9e
to
f7cb4f8
Compare
pandas/core/series.py
Outdated
buf: Optional[IO[str]] = None, | ||
max_cols: Optional[int] = None, | ||
memory_usage: Optional[Union[bool, str]] = None, | ||
null_counts: bool = True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but one thing at a time. Will wait for the public API update first.
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
Going to mark as a draft as this PR depends on #38062 |
would take this, if you can merge master and will look |
9460913
to
0d1c5d8
Compare
05cc445
to
816803e
Compare
This reverts commit 16ac96e.
@jreback, I merged master and made several updates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm (as a follow on see if we have the correct entry in api.rst)
thanks @ivanovmg |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
I took over #31796 from @MarcoGorelli.
In this PR I took the tests and the docstring from #31796, refactored tests (separated into dataframe and series-related only test classes).
Then on top of the recent changes (#36752) I implemented series info.
New classes:
SeriesInfo
(store data, which will be used in the outputs)SeriesInfoPrinter
(basically creator of the appropriate table builder)SeriesTableBuilder
(both Verbose and NonVerbose)TableBuilderVerboseMixin
(shared functionality for verbose info builders of both dataframe and series)It seems to me that tests are not sufficient enough.
In particular, it seems that empty series info should be covered.
Currently there is a special empty dataframe info, but for series info there is just a generic verbose info with zero items.
If there is a need for a dedicated empty series info, then I would need to add method
_fill_empty_info
intoSeriesTableBuilder
.Static typing makes code quite verbose. In some cases we have the very same methods/properties, but with different type annotations to satisfy type checking (methods are small, but anyway). If somebody can suggest me a better way to handle it, then that would be great.