ENH: enable Series.info() #37320

ivanovmg · 2020-10-21T19:26:28Z

closes API: add Series.info method #5167
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

I took over #31796 from @MarcoGorelli.
In this PR I took the tests and the docstring from #31796 and refactored the tests (separated into DataFrame-only and Series-only test classes).
Then on top of the recent changes (#36752) I implemented series info.

New classes:

SeriesInfo (store data, which will be used in the outputs)
SeriesInfoPrinter (basically creator of the appropriate table builder)
SeriesTableBuilder (both Verbose and NonVerbose)
TableBuilderVerboseMixin (shared functionality for verbose info builders of both dataframe and series)
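A rough sketch of how these pieces could fit together, using simplified stand-in classes rather than the actual pandas implementation. Every class body, attribute, and method name below (`SeriesInfoSketch`, `get_lines`, `count_line`, and so on) is hypothetical; only the split into an info object, a printer, and verbose/non-verbose table builders with a shared verbose mixin follows the list above.

```python
from typing import List, Sequence


class SeriesInfoSketch:
    """Holds the data the output is built from (stand-in for SeriesInfo)."""

    def __init__(self, values: Sequence[object], name: str) -> None:
        self.values = list(values)
        self.name = name

    @property
    def non_null_count(self) -> int:
        return sum(v is not None for v in self.values)

    @property
    def dtype_name(self) -> str:
        # Hypothetical: report the Python type of the first non-null value.
        for v in self.values:
            if v is not None:
                return type(v).__name__
        return "object"


class TableBuilderSketch:
    """Header lines shared by both builders."""

    def __init__(self, info: SeriesInfoSketch) -> None:
        self.info = info

    def header(self) -> List[str]:
        return [f"Series name: {self.info.name}", f"{len(self.info.values)} entries"]


class VerboseMixinSketch:
    """Verbose-only behaviour (stand-in for TableBuilderVerboseMixin)."""

    def count_line(self) -> str:
        return f"Non-Null Count: {self.info.non_null_count}"


class NonVerboseBuilder(TableBuilderSketch):
    def get_lines(self) -> List[str]:
        return self.header() + [f"dtype: {self.info.dtype_name}"]


class VerboseBuilder(VerboseMixinSketch, TableBuilderSketch):
    def get_lines(self) -> List[str]:
        return self.header() + [self.count_line(), f"dtype: {self.info.dtype_name}"]


class PrinterSketch:
    """Picks the table builder that matches the requested verbosity."""

    def __init__(self, info: SeriesInfoSketch, verbose: bool) -> None:
        self.info = info
        self.verbose = verbose

    def render(self) -> str:
        builder_cls = VerboseBuilder if self.verbose else NonVerboseBuilder
        return "\n".join(builder_cls(self.info).get_lines())


info = SeriesInfoSketch([1, None, 3], name="a")
verbose_text = PrinterSketch(info, verbose=True).render()
brief_text = PrinterSketch(info, verbose=False).render()
```

In this layout the info object stays free of formatting concerns, and the mixin is the only place where verbose-specific lines are produced.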

It seems to me that the tests are not sufficient.
In particular, empty series info should be covered.
Currently there is a special empty dataframe info, but for series info there is just a generic verbose info with zero items.
If there is a need for a dedicated empty series info, then I would need to add a method _fill_empty_info to SeriesTableBuilder.

Static typing makes the code quite verbose. In some cases we have the very same methods/properties, but with different type annotations to satisfy type checking (the methods are small, but still). If somebody can suggest a better way to handle this, that would be great.

ivanovmg · 2020-10-21T20:53:55Z

Got some CI issue with building documentation (presumably because of warning related to numpy).
I think this is not related to the changes. Can anyone restart?

In file included from /home/runner/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/numpy/core/include/numpy/ndarraytypes.h:1822:0,
from /home/runner/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
from /home/runner/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from /home/runner/.cache/ipython/cython/_cython_magic_1e384fc850b1a0be145d9b7384e71f98.c:630:
/home/runner/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
#warning "Using deprecated NumPy API, disable it with "
^~~~~~~
In file included from /home/runner/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/numpy/core/include/numpy/ndarraytypes.h:1822:0,
from /home/runner/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
from /home/runner/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from /home/runner/.cache/ipython/cython/_cython_magic_1f3f4faa63381d31bc6688d149dcf218.c:631:
/home/runner/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
#warning "Using deprecated NumPy API, disable it with "
^~~~~~~

jreback

pls fix the tests. I can barely tell what changed here.

jreback · 2020-10-23T00:03:24Z

pandas/tests/io/formats/test_info.py

-    assert (
-        df_with_object_index.memory_usage(index=True, deep=True).sum()
-        == df_with_object_index.memory_usage(index=True).sum()
+class TestDataFrameInfo:


so this diff is super confusing. I would rather simply make 2 test files than jam them into one. (you can also make a sub-module if that works better).

I created a separate module tests/io/formats/tests_series_info.py.
After this PR, we can probably move both test info modules to tests/*/methods/.

jreback · 2020-10-23T00:13:13Z

Static typing makes the code quite verbose. In some cases we have the very same methods/properties, but with different type annotations to satisfy type checking (the methods are small, but still). If somebody can suggest a better way to handle this, that would be great.

we have FrameOrSeries to handle this, or FrameOrSeriesUnion; otherwise you can make a type alias as well.
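To illustrate the alias suggestion, here is a minimal sketch with stand-in classes. `Frame`, `Series`, `FrameOrSeriesUnionSketch`, `FrameOrSeriesSketch`, `kind` and `passthrough` are hypothetical names made up for this example; they are not the pandas definitions.

```python
from typing import TypeVar, Union


class Frame:
    """Stand-in for a DataFrame-like class."""


class Series:
    """Stand-in for a Series-like class."""


# A plain alias: "either of the two", with no link between input and output types.
FrameOrSeriesUnionSketch = Union[Frame, Series]

# A constrained TypeVar: the checker knows the same type comes back out.
FrameOrSeriesSketch = TypeVar("FrameOrSeriesSketch", Frame, Series)


def kind(obj: FrameOrSeriesUnionSketch) -> str:
    # One annotated function serves both classes instead of two near-identical ones.
    return type(obj).__name__.lower()


def passthrough(obj: FrameOrSeriesSketch) -> FrameOrSeriesSketch:
    # With the TypeVar, passthrough(Series()) is typed as Series, not as the union.
    return obj
```

The union alias is enough when a function only reads from its argument; the TypeVar is the better fit when the return type should track the input type.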

simonjayhawkins

Thanks @ivanovmg for the PR.

needs a release note in 1.2 and a versionadded tag in Series.info docstring.

simonjayhawkins · 2020-10-23T08:02:01Z

pandas/core/series.py

@@ -4564,6 +4565,96 @@ def replace(
            method=method,
        )

+    @Substitution(


I don't think the Substitution decorator should be necessary with the doc decorator. (and not seen them used together)

The doc decorator was created to supersede the Appender and Substitution decorators.

I guess that Substitution is still necessary if we use one generic docstring for DataFrame and Series info. I could not figure out how I can replace some keywords in the base docstring, to make it suitable for both frame and series.
Probably I do not know how to use doc decorator.
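For illustration, the general idea of filling one shared docstring template through a single decorator can be sketched as follows. `fill_doc`, `INFO_TEMPLATE`, `frame_info` and `series_info` are hypothetical stand-ins written for this example; this is not the pandas doc decorator, only a sketch of keyword substitution done in one place.

```python
from typing import Callable, TypeVar

F = TypeVar("F", bound=Callable[..., object])

# One template shared by both functions; {klass} and {see_also} are the only
# parts that differ between the two.
INFO_TEMPLATE = (
    "Print a concise summary of a {klass}.\n"
    "\n"
    "See Also\n"
    "--------\n"
    "{see_also}\n"
)


def fill_doc(template: str, **params: str) -> Callable[[F], F]:
    """Hypothetical helper: attach template.format(**params) as the docstring."""

    def decorator(func: F) -> F:
        func.__doc__ = template.format(**params)
        return func

    return decorator


@fill_doc(INFO_TEMPLATE, klass="DataFrame", see_also="DataFrame.describe")
def frame_info() -> None:
    pass


@fill_doc(INFO_TEMPLATE, klass="Series", see_also="Series.describe")
def series_info() -> None:
    pass
```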

simonjayhawkins · 2020-10-23T08:06:19Z

pandas/core/series.py

+            Series.memory_usage: Memory usage of Series."""
+        ),
+    )
+    @doc(SeriesInfo.to_buffer)


SeriesInfo.to_buffer doesn't have a docstring. so this doesn't render.

>>> help(pd.Series.info)
Help on function info in module pandas.core.series:

info(self, verbose: Union[bool, NoneType] = None, buf: Union[IO[str], NoneType] = None, max_cols: Union[int, NoneType] = None, memory_usage: Union[bool, str, NoneType] = None, null_counts: Union[bool, NoneType] = None) -> None

>>>

and wouldn't have memory_usage, max_cols, and null_counts parameters anyway?

(as an aside, there appear to be a few issues with the DataFrame.info docstring on master, such as alignment of console output and a rogue data parameter. Not sure if it was always like this or comes from recent refactors, so if you get time, it would be great if you can check that out)

Regarding null_counts - does it mean that we do not need series info without non-null counts?

(as an aside, there appear to be a few issues with the DataFrame.info docstring on master, such as alignment of console output and a rogue data parameter. Not sure if it was always like this or comes from recent refactors, so if you get time, it would be great if you can check that out)

I noticed, not only here but in a couple of other places, that the indentation gets bad when using this kind of construct:

%(max_cols_sub)s

I never touched the docstring, so probably @MarcoGorelli can comment on the rendering issue.

Please note that I just added dedent to some parameter docs, which makes the info docstrings render better, without extra indentation.
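A small stand-alone sketch of the indentation problem and the dedent fix, using only the standard library. The template and the parameter text are illustrative and are not copied from the pandas docstring; only the `%(max_cols_sub)s` placeholder name comes from the comment above.

```python
from textwrap import dedent

template = (
    "Parameters\n"
    "----------\n"
    "%(max_cols_sub)s\n"
    "memory_usage : bool, str, optional\n"
)

# Written inline inside an indented call, the text picks up leading spaces.
max_cols_sub_raw = """\
    max_cols : int, optional
        Illustrative description of the parameter."""

# Substituting the raw text keeps the unwanted four-space indent.
indented = template % {"max_cols_sub": max_cols_sub_raw}

# dedent strips the common leading whitespace but keeps the relative indent.
fixed = template % {"max_cols_sub": dedent(max_cols_sub_raw)}
```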

simonjayhawkins · 2020-10-23T08:48:38Z

pandas/core/series.py

+        verbose: Optional[bool] = None,
+        buf: Optional[IO[str]] = None,
+        max_cols: Optional[int] = None,
+        memory_usage: Optional[Union[bool, str]] = None,


the docstring for DataFrame.info is

memory_usage: bool, str, optional

I think this should be

memory_usage: bool or 'deep', optional

might be able to use Literal here (see #37137) and maybe create an alias in typing. A follow-on is OK too.

I tried to use Literal, but looks like that is available only starting from Python 3.8.
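`typing.Literal` was indeed added in Python 3.8; on older interpreters the same name is provided by the `typing_extensions` backport. A minimal sketch of the suggested annotation, assuming that backport is installed where needed. The alias name `MemoryUsage` and the helper `normalize_memory_usage` are hypothetical and exist only for this example.

```python
import sys
from typing import Optional, Union

if sys.version_info >= (3, 8):
    from typing import Literal
else:  # assumes the typing_extensions backport is installed on older Pythons
    from typing_extensions import Literal

# Hypothetical alias name; it mirrors "bool or 'deep', optional".
MemoryUsage = Optional[Union[bool, Literal["deep"]]]


def normalize_memory_usage(memory_usage: MemoryUsage = None) -> str:
    """Map the accepted values onto a small set of labels (illustrative only)."""
    if memory_usage is None:
        return "default"
    if memory_usage == "deep":
        return "deep"
    if isinstance(memory_usage, bool):
        return "on" if memory_usage else "off"
    raise ValueError(f"memory_usage must be a bool or 'deep', got {memory_usage!r}")
```

With this annotation a type checker can flag a call such as `normalize_memory_usage("shallow")`, which a plain `str` annotation would accept.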

simonjayhawkins · 2020-10-23T08:50:43Z

pandas/core/series.py

+                "Argument `max_cols` can only be passed "
+                "in DataFrame.info, not Series.info"
+            )
+        return SeriesInfo(self, memory_usage).to_buffer(


it seems odd imo to have parameters other than buf passed to to_buffer()

would it be better to pass verbose and show_counts to SeriesInfo constructor or rename to_buffer?

Putting the params in the constructor is possible, but in that case SeriesInfo would get two more attributes that are used in only one method (lower cohesion within the class).
I would prefer renaming the method. I will look into that.

I renamed to_buffer -> render.

However, I had to make the same function signature for DataFrameInfo and SeriesInfo to avoid typing errors.
Thus, I pass max_cols into render and raise ValueError there instead of pandas.core.series.info.
How does it look?
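The trade-off described here can be sketched with stand-in classes: both subclasses share one `render` signature so that they satisfy the same abstract method, and the series variant rejects `max_cols` at call time. `BaseInfoSketch`, `FrameInfoSketch` and `SeriesInfoRenderSketch` are hypothetical names and bodies; only the error message is taken from the diff above.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class BaseInfoSketch(ABC):
    """Hypothetical base: both subclasses must accept the same arguments."""

    @abstractmethod
    def render(self, *, max_cols: Optional[int], verbose: Optional[bool]) -> str:
        ...


class FrameInfoSketch(BaseInfoSketch):
    def __init__(self, columns: List[str]) -> None:
        self.columns = columns

    def render(self, *, max_cols: Optional[int], verbose: Optional[bool]) -> str:
        # Show at most max_cols column names; None means "show all".
        limit = len(self.columns) if max_cols is None else max_cols
        return f"columns: {', '.join(self.columns[:limit])}"


class SeriesInfoRenderSketch(BaseInfoSketch):
    def __init__(self, name: str) -> None:
        self.name = name

    def render(self, *, max_cols: Optional[int], verbose: Optional[bool]) -> str:
        # The parameter exists only to keep the signatures identical.
        if max_cols is not None:
            raise ValueError(
                "Argument `max_cols` can only be passed "
                "in DataFrame.info, not Series.info"
            )
        return f"series: {self.name}"
```

The cost of this design is a parameter on the series class whose only valid value is `None`; the benefit is that code typed against the base class can call `render` without branching on the concrete type.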

ivanovmg · 2020-10-23T16:18:11Z

Thanks @ivanovmg for the PR.

needs a release note in 1.2 and a versionadded tag in Series.info docstring.

I added versionadded tag.
The problem is that it creates extra newline in DataFrame.info() docstring.
Any idea how to solve this? Like, if substitution string is empty, then do not create new line.

Or maybe it is better to just create two separate docstrings for DataFrame and Series, but with the duplication?
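The extra-newline effect can be reproduced with plain `%`-style substitution. The sketch below uses a made-up template and a hypothetical placeholder name `version_added_sub`; the version number follows the later commit titles in this PR. It shows one possible workaround, letting the substitution string carry its own newline, and is not necessarily what was merged.

```python
template_with_own_newline = (
    "Print a concise summary.\n"
    "%(version_added_sub)s\n"
    "Parameters\n"
)

# An empty substitution still leaves the template's own newline behind,
# so the DataFrame variant ends up with a blank line.
frame_doc_blank = template_with_own_newline % {"version_added_sub": ""}

# Workaround sketch: no newline after the placeholder in the template;
# the substitution string carries the newline when it is non-empty.
template = (
    "Print a concise summary.\n"
    "%(version_added_sub)s"
    "Parameters\n"
)
frame_doc = template % {"version_added_sub": ""}
series_doc = template % {"version_added_sub": ".. versionadded:: 1.4.0\n"}
```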

ivanovmg · 2020-10-23T16:37:43Z

I added versionadded tag.
The problem is that it creates extra newline in DataFrame.info() docstring.

CI/Checks complains just about that.

jreback · 2020-11-04T03:01:53Z

@ivanovmg if you'd merge master will have a look

hongshaoyang · 2020-11-24T14:58:48Z

pandas/core/series.py

+        buf: Optional[IO[str]] = None,
+        max_cols: Optional[int] = None,
+        memory_usage: Optional[Union[bool, str]] = None,
+        null_counts: bool = True,


per #36805 and #37999, this should be show_counts

Right, but one thing at a time. Will wait for the public API update first.

github-actions · 2020-12-25T00:21:37Z

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

mroeschke · 2021-03-10T05:34:12Z

Going to mark as a draft as this PR depends on #38062

jreback · 2021-10-04T00:11:27Z

would take this, if you can merge master and will look

pep8speaks · 2021-10-04T15:23:26Z

Hello @ivanovmg! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-11-30 02:01:11 UTC

ivanovmg · 2021-11-30T10:50:17Z

@jreback, I merged master and made several updates.

jreback

lgtm (as a follow on see if we have the correct entry in api.rst)

jreback · 2021-12-01T01:30:53Z

thanks @ivanovmg

ivanovmg added 8 commits October 8, 2020 00:44

TST: add series info tests

a903f32

TST: remove test that series has no info

e07d6e2

ENH: add method Series.info

0990d54

REF: split tests for frame and series

1814795

REF: param test on frame memory_usage_qualified

4c390a8

Merge branch 'master' into feature/series-info

81929e6

ENH: enable series info

824d8d6

CLN: remove extra parens

ce68e94

ivanovmg requested review from jreback, MarcoGorelli and simonjayhawkins October 21, 2020 20:54

jreback requested changes Oct 23, 2020

simonjayhawkins requested changes Oct 23, 2020

REF: split series-related tests

ede6dc4

ivanovmg requested a review from jreback October 23, 2020 14:41

ivanovmg added 6 commits October 23, 2020 21:46

DOC: add release note

789e03e

CLN: merge two lines

f41596d

DOC: unify series/frame docstrings, fix indent

40b71f8

REF: to_buffer -> render, unify func signature

3e71336

Merge branch 'master' into feature/series-info

739c62d

DOC: add versionadded tag

e9c5220

ivanovmg requested a review from simonjayhawkins October 25, 2020 17:27

jreback added the Output-Formatting __repr__ of pandas objects, to_string label Nov 4, 2020

Merge branch 'master' into feature/series-info

f7cb4f8

ivanovmg force-pushed the feature/series-info branch from f9edf9e to f7cb4f8 Compare November 4, 2020 04:47

hongshaoyang reviewed Nov 24, 2020

ivanovmg mentioned this pull request Nov 25, 2020

DOC: move info docs to DataFrameInfo #38062

Merged

github-actions bot added the Stale label Dec 25, 2020

mroeschke marked this pull request as draft March 10, 2021 05:34

mroeschke removed the Stale label Mar 10, 2021

Merge branch 'master' into feature/series-info

0d1c5d8

ivanovmg force-pushed the feature/series-info branch from 9460913 to 0d1c5d8 Compare October 4, 2021 15:36

Fix styling

816803e

ivanovmg force-pushed the feature/series-info branch from 05cc445 to 816803e Compare October 4, 2021 16:09

ivanovmg marked this pull request as ready for review October 5, 2021 15:39

ivanovmg added 11 commits November 29, 2021 22:36

Merge branch 'master' into feature/series-info

9e0198f

DOC: move whatsnew info to v1.4.0

1e2aaef

DOC: move docs on Series.info() to io/formats/info.py

688080b

FIX: newline

4e87b1a

FIX: change versionadded to 1.4.0

dc999fe

DOC: extract null_counts_sub for frames only

4bb4e40

DOC: avoid duplication of kwargs replacement

f114293

DOC: unify newlines/spacing with substitutions

16ac96e

Revert "DOC: unify newlines/spacing with substitutions"

aac2954

This reverts commit 16ac96e.

DOC: fix newlines substitutions

22303dc

DOC: another attempt to fix newline

9428a32

jreback approved these changes Dec 1, 2021

jreback added this to the 1.4 milestone Dec 1, 2021

jreback merged commit ef3237f into pandas-dev:master Dec 1, 2021
