Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shorter truncated Series/DataFrame repr: introduce min_rows #27095

Merged
merged 9 commits into from
Jul 3, 2019

Conversation

jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented Jun 28, 2019

Closes #27000

(didn't yet add tests, docs or whatsnew, but I think the implementation is already more or less complete)

@jorisvandenbossche jorisvandenbossche added the Output-Formatting __repr__ of pandas objects, to_string label Jun 28, 2019
@@ -665,8 +667,8 @@ def _repr_html_(self):
def to_string(self, buf=None, columns=None, col_space=None, header=True,
index=True, na_rep='NaN', formatters=None, float_format=None,
sparsify=None, index_names=True, justify=None,
max_rows=None, max_cols=None, show_dimensions=False,
decimal='.', line_width=None):
min_rows=None, max_rows=None, max_cols=None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add types on new args (and old ones if you can)

pandas/io/formats/format.py Show resolved Hide resolved
pandas/io/formats/format.py Outdated Show resolved Hide resolved
@TomAugspurger
Copy link
Contributor

@jorisvandenbossche think you'll be able to update in the next 24 hours?

@codecov
Copy link

codecov bot commented Jul 3, 2019

Codecov Report

Merging #27095 into master will decrease coverage by 49.9%.
The diff coverage is 53.33%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #27095       +/-   ##
===========================================
- Coverage   91.79%   41.89%   -49.91%     
===========================================
  Files         180      180               
  Lines       50934    50803      -131     
===========================================
- Hits        46757    21286    -25471     
- Misses       4177    29517    +25340
Flag Coverage Δ
#multiple ?
#single 41.89% <53.33%> (-0.22%) ⬇️
Impacted Files Coverage Δ
pandas/core/frame.py 34.99% <ø> (-61.94%) ⬇️
pandas/core/series.py 45.45% <ø> (-47.06%) ⬇️
pandas/core/config_init.py 80.68% <100%> (-15.39%) ⬇️
pandas/io/formats/format.py 50.57% <46.15%> (-47.35%) ⬇️
pandas/io/formats/latex.py 0% <0%> (-100%) ⬇️
pandas/io/gcs.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
pandas/core/groupby/categorical.py 0% <0%> (-100%) ⬇️
pandas/tseries/plotting.py 0% <0%> (-100%) ⬇️
pandas/tseries/converter.py 0% <0%> (-100%) ⬇️
... and 142 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7ab9ff5...98e7d43. Read the comment docs.

1 similar comment
@codecov
Copy link

codecov bot commented Jul 3, 2019

Codecov Report

Merging #27095 into master will decrease coverage by 49.9%.
The diff coverage is 53.33%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #27095       +/-   ##
===========================================
- Coverage   91.79%   41.89%   -49.91%     
===========================================
  Files         180      180               
  Lines       50934    50803      -131     
===========================================
- Hits        46757    21286    -25471     
- Misses       4177    29517    +25340
Flag Coverage Δ
#multiple ?
#single 41.89% <53.33%> (-0.22%) ⬇️
Impacted Files Coverage Δ
pandas/core/frame.py 34.99% <ø> (-61.94%) ⬇️
pandas/core/series.py 45.45% <ø> (-47.06%) ⬇️
pandas/core/config_init.py 80.68% <100%> (-15.39%) ⬇️
pandas/io/formats/format.py 50.57% <46.15%> (-47.35%) ⬇️
pandas/io/formats/latex.py 0% <0%> (-100%) ⬇️
pandas/io/gcs.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
pandas/core/groupby/categorical.py 0% <0%> (-100%) ⬇️
pandas/tseries/plotting.py 0% <0%> (-100%) ⬇️
pandas/tseries/converter.py 0% <0%> (-100%) ⬇️
... and 142 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7ab9ff5...98e7d43. Read the comment docs.

@jorisvandenbossche
Copy link
Member Author

I added some tests. Further to do:

  • Still need to add a whatsnew note / docs
  • We can decide to not add the min_rows argument to to_string, but only in the Series/DataFrameFormatter classes if we want (cc @simonjayhawkins ), by using those Formatter classes directly in __repr__ instead of calling to_string (to_string is only brief wrapper around the Formatter classes, so it would not add much code duplication)

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1 on adding keywords to public methods. see #27000 (comment)

max_cols = get_option("display.max_columns")
show_dimensions = get_option("display.show_dimensions")
if get_option("display.expand_frame_repr"):
width, _ = console.get_console_size()
else:
width = None
self.to_string(buf=buf, max_rows=max_rows, max_cols=max_cols,
line_width=width, show_dimensions=show_dimensions)
self.to_string(buf=buf, max_rows=max_rows, min_rows=min_rows,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can the logic be added before this call to .to_string (isn't it as simple as just set max_rows to min_rows if len(self.frame) > max_rows?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the simple cases, yes. And I agree that could be a nice option. But there are some corner cases that are only handled within the DataFrameFormatter class (eg if max_rows = 0, we check the terminal size)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although, looking at it more closely, that special case is only if max_rows is 0 or None, so that could be easily checked. So yes, this would in principle be possible (and probably even give easier code). But that means that we move some of the logic of the repr out of the Formatter class.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that means that we move some of the logic of the repr out of the Formatter class.

i agree. but this logic is related to display options. i'd prefer to see all the display option handling removed from the Formatter classes entirely and only appear in __repr__ and _repr_html_

it appears from the comments that adding arguments to to_string is not considered undesirable. so this topic can be addressed another day.

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good. A release note would be nice if you have a chance.

-1 on adding keywords to public methods. see #27000 (comment)

I don't have a strong opinion here, but I can see them being useful.

@jreback jreback added this to the 0.25.0 milestone Jul 3, 2019
max_rows = self.max_rows
# if min_rows is None, follow value of max_rows
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None or 0

pandas/io/formats/format.py Outdated Show resolved Hide resolved
pandas/io/formats/format.py Outdated Show resolved Hide resolved
@jorisvandenbossche
Copy link
Member Author

-1 on adding keywords to public methods

As I said above, it is easy to avoid that, if we want (with only a little bit of code duplication, just a call to the Formatter class. But it also gives some consistency since max_rows and max_cols are there as well.

@jreback
Copy link
Contributor

jreback commented Jul 3, 2019

-1 on adding keywords to public methods

As I said above, it is easy to avoid that, if we want (with only a little bit of code duplication, just a call to the Formatter class. But it also gives some consistency since max_rows and max_cols are there as well.

I like the consistency argument here, +1 on adding min_rows to to_string

@TomAugspurger
Copy link
Contributor

@jorisvandenbossche
Copy link
Member Author

Ah, this annoying check we have in our CI ;-)

@TomAugspurger TomAugspurger mentioned this pull request Jul 3, 2019
@@ -1522,6 +1525,9 @@ def to_string(self, buf=None, na_rep='NaN', float_format=None, header=True,
max_rows : int, optional
Maximum number of rows to show before truncating. If None, show
all.
min_rows : int, optional
Copy link
Contributor

@jreback jreback Jul 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in theory should add a versionaded here, but not worth repushing

@@ -79,6 +79,9 @@
* unset.
max_rows : int, optional
Maximum number of rows to display in the console.
min_rows : int, optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@jreback
Copy link
Contributor

jreback commented Jul 3, 2019

this is green. any objections?

@jreback jreback merged commit 96c7ab5 into pandas-dev:master Jul 3, 2019
@jreback
Copy link
Contributor

jreback commented Jul 3, 2019

very nice @jorisvandenbossche

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Output-Formatting __repr__ of pandas objects, to_string
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Proposal: Shorter default Series/DataFrame repr when truncated
4 participants