ENH (GH6568) Add option info_verbose #6890

Closed
wants to merge 3 commits into from

bjonen
Contributor

@bjonen bjonen commented Apr 16, 2014

This adds an info_verbose option to the options. There's a small section in the FAQ and the basic introduction. The entry in v0.14 is still missing.

Closes #6568

    if self._info_repr():
        self.info(buf=buf)
    info_verbose = get_option("display.info_verbose")
    if self._info_repr() and info_verbose:
Contributor

here and the next section: just pass verbose=info_verbose rather than having an if/then

@jorisvandenbossche
Member

@jreback I know you were in favor of making it a separate option instead of adding 'info_short' to display.large_repr, but looking at it now: if I see an option named info_verbose set to True or False, I expect it to set the default for the info method, while it actually only sets the behaviour of info when it is used in the large dataframe repr. So I find this a bit confusing.

  • what it actually means is large_repr_info_verbose, but that is a rather verbose name .. :-)
  • it could also set the default for df.info(verbose=..) itself? (But I don't know if this is wanted)
  • maybe nonetheless go with display.large_repr='info_short' or info_concise? Or do you think this is more confusing?

@bjonen @jreback What do you think?

@jreback
Contributor

jreback commented Apr 16, 2014

this sets the default for df.info(...)

I agree it's only when the info repr is triggered in the first place

@bjonen if we change this option to really be a subset of large_repr

iow have 3 options: False, True/verbose (verbose=True), concise (verbose=False)

that would work yes?

though I think the default should be concise (so maybe have to fiddle with this a bit to be backward compat)

@bjonen
Contributor Author

bjonen commented Apr 16, 2014

Thanks for your comments.

I think it's a good idea to directly control df.info(verbose=..). Then it makes sense to have a separate option info_verbose, as it is independent of display.large_repr.

@bjonen
Contributor Author

bjonen commented Apr 16, 2014

3 options for large_repr works too for me. Why do you prefer to leave the default for df.info unchanged?

@bjonen
Contributor Author

bjonen commented Apr 16, 2014

@jreback @jorisvandenbossche
Which solution should we go with?

@jorisvandenbossche
Member

So the options would be:

  • having a info_verbose (True/False) option which would set the default for df.info(verbose=..) itself
  • adding an option to large_repr ('truncate' (default), 'info', and newly 'info_short' or 'info_concise') that only sets the default for info when called in the repr of df

Personally I don't really have a preference (I won't use either option)
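
To make the second alternative concrete, here is a rough sketch of how the verbosity could be folded into the large_repr value itself. The mapping, the function name and the 'info_concise' value are hypothetical; only 'truncate' and 'info' are values mentioned in this thread as existing.

```python
# Hypothetical mapping for the second proposal: the verbosity is part of
# the large_repr value instead of being a separate option.
LARGE_REPR_TO_VERBOSE = {
    "truncate": None,        # the repr does not use the info view
    "info": True,            # info view with one line per column
    "info_concise": False,   # info view, column summary only
}


def repr_info_verbose(large_repr):
    """Return the verbose flag the repr would pass to info(), or None
    when the repr does not use the info view at all."""
    try:
        return LARGE_REPR_TO_VERBOSE[large_repr]
    except KeyError:
        raise ValueError("unknown large_repr value: %r" % (large_repr,))
```

With this shape, an explicit df.info() call is untouched; only the repr consults the mapping.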

@jreback
Contributor

jreback commented Apr 16, 2014

hmm I think I like the separate option

with a default of False (which is an API change)

@jorisvandenbossche
Member

Why a default of False? Is there a reason to change the behaviour of df.info()?

@bjonen
Contributor Author

bjonen commented Apr 17, 2014

How do you think we should change the default on df.info?

  • We could have a wrapper that sets verbose and passes it to info but then we lose the ability to do 'df.info?' in ipython to get the default arguments.
  • Alternatively we could have something like below and somehow trigger DataFrame.info = DataFrame.info_non_verbose when the option 'info_verbose' changes.

Any thoughts?

    def info_verbose(self, verbose=True, buf=None, max_cols=None):
        self._info(verbose=verbose, buf=buf, max_cols=max_cols)

    def info_non_verbose(self, verbose=False, buf=None, max_cols=None):
        self._info(verbose=verbose, buf=buf, max_cols=max_cols)

    def _info(self, verbose=True, buf=None, max_cols=None):
        ...  # the existing body of DataFrame.info would move here

@jreback
Contributor

jreback commented Apr 17, 2014

I am pretty sure the default was False in 0.12
I think got changed inadvertently in 0.13/0.13.1

@bjonen can u see if that was the case?

@jorisvandenbossche this doesn't need to be complicated
df.info() works as is and u can pass a parameter
all the option will do is to set the parameter

@bjonen
Contributor Author

bjonen commented Apr 17, 2014

Yes it did change in 0.13 but it seems it was intentional: #4886

"df.info() works as is and u can pass a parameter
all the option will do is to set the parameter"

If I understand correctly, this is the version currently implemented in the PR. That means the option value (True/False) will only have an impact when a df is printed and not when one calls df.info() because the default is hard coded.

@jreback
Contributor

jreback commented Apr 17, 2014

ok the default wasn't meant to change

change the signature to

df.info(verbose=None)

then handle a passed true/false as an override
if none then use the info_verbose option
which I think should default to False

@bjonen
Contributor Author

bjonen commented Apr 17, 2014

Ok sounds good!

@bjonen
Contributor Author

bjonen commented Apr 17, 2014

Ok so

I'll adapt the PR.

@jorisvandenbossche
Member

@jreback don't you have the max_info_columns option for that? To decide when the short and when the long summary is shown? Wouldn't that conflict with an option to set verbose to True or False?

max_info_columns is used in DataFrame.info method to decide if per column information will be printed.

But maybe we are just misinterpreting each other's words, as I think the default for verbose in df.info() has been True for a long time already, see e.g. the docs of 0.7: http://pandas.pydata.org/pandas-docs/version/0.7.0/generated/pandas.DataFrame.info.html

@jreback
Contributor

jreback commented Apr 17, 2014

hmm don't know

@bjonen can u investigate this?

@jreback jreback added this to the 0.14.0 milestone Apr 21, 2014
@@ -1666,3 +1666,35 @@ columns of DataFrame objects are shown by default. If ``max_columns`` is set to
0 (the default, in fact), the library will attempt to fit the DataFrame's
string representation into the current terminal width, and defaulting to the
summary view otherwise.

Contributor

can you add this same thing (you can copy-paste) to v0.14.0, as it's a bit of a change and we want to inform users. Create a new sub-section (e.g. use ---- under the heading), put it after the plotting sub-section, and include a pointer to the basics section (a ':ref:').

.. ipython:: python

   with option_context("display.large_repr", 'info'):
       print df_lrge
Contributor

use print(df_lrge) (for py3 compat in doc building) instead of print

@jreback
Contributor

jreback commented Apr 23, 2014

minor edit

can you post at the top of the issue what the 3 cases are (e.g. what the docs are going to show)
(an ipython picture (png) would be even better actually, and you could include this in-line)

@jreback
Contributor

jreback commented Apr 23, 2014

@jorisvandenbossche ?

@bjonen
Contributor Author

bjonen commented Apr 23, 2014

@jorisvandenbossche max_info_columns is currently doing something different. Maybe we can adjust it so that it can play the role of info_verbose. The current behavior doesn't save any space. Also, the default of 100 seems very high to me.

max_columns doesn't seem to have an effect at all under large_repr = 'info'.

In [12]: import pandas as pd

In [13]: df = pd.DataFrame(columns=['a','b','c'],index=pd.DatetimeIndex(start='19900101',end='20000101',freq='BM'))

In [15]: pd.options.display.large_repr = 'info'

In [16]: df
Out[16]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 120 entries, 1990-01-31 00:00:00 to 1999-12-31 00:00:00
Freq: BM
Data columns (total 3 columns):
a 0 non-null object
b 0 non-null object
c 0 non-null object
dtypes: object(3)

In [17]: pd.options.display.max_columns = 1

In [18]: df
Out[18]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 120 entries, 1990-01-31 00:00:00 to 1999-12-31 00:00:00
Freq: BM
Data columns (total 3 columns):
a 0 non-null object
b 0 non-null object
c 0 non-null object
dtypes: object(3)

In [19]: pd.options.display.max_info_columns = 1

In [20]: df
Out[20]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 120 entries, 1990-01-31 00:00:00 to 1999-12-31 00:00:00
Freq: BM
Data columns (total 3 columns):
a object
b object
c object
dtypes: object(3)

In [21]: df.info(False)
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 120 entries, 1990-01-31 00:00:00 to 1999-12-31 00:00:00
Freq: BM
Columns: 3 entries, a to c
dtypes: object(3)

@jreback Looking at previous commits in git:

    git grep 'def info\(self, verbose=True.*\):' $(git rev-list --all) 

it seems that the default has been True up to now.

@jreback
Contributor

jreback commented Apr 23, 2014

@bjonen ok on the default then (I think I personally set it to False, but that is fine then)

you need to use a frame with > max_info_columns (e.g. 101) to get an effect

@bjonen
Contributor Author

bjonen commented Apr 23, 2014

I reset the default below the number of columns (3) in the df (see previous post):

pd.options.display.max_info_columns = 1

@jreback
Contributor

jreback commented Apr 23, 2014

@bjonen hmm.. see if you can figure out from the tests what it is supposed to do. All of the options have complex interactions. If it's a 'bug', I would rather fix it than add a new option if we can (e.g. you are suggesting that if you have columns > max_info_columns then we basically switch info_verbose to False instead of actually having an option, right?)

@bjonen
Contributor Author

bjonen commented Apr 23, 2014

Yes, exactly. I'll look into it and let you guys know.

@jorisvandenbossche
Member

@bjonen That max_columns has no effect when large_repr='info' is normal, as this parameter defines how many columns are shown in the default display, not in the info display (that is what max_info_columns is for).

The strange behaviour of max_info_columns is a bug I think, a regression in current master. With 0.13 I get:

In [1]: pd.__version__
Out[1]: '0.13.0'

In [5]: df = pd.DataFrame(np.random.randn(5,5))
In [6]: df
Out[6]:
          0         1         2         3         4
0  0.708876 -0.179273  1.367976 -0.929688 -1.138946
1  1.047154  1.049302 -0.248178 -0.957677  1.879843
2 -0.523272 -2.013742  2.064032 -1.389822  1.394960
3  0.224508  1.032544 -1.312425  0.123956  0.144831
4 -1.691660  0.952837  1.380545 -1.279794  1.026131

In [7]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 5 columns):
0    5  non-null values
1    5  non-null values
2    5  non-null values
3    5  non-null values
4    5  non-null values
dtypes: float64(5)

In [8]: pd.options.display.max_info_columns = 4

In [9]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Columns: 5 entries, 0 to 4
dtypes: float64(5)
In [10]:

which is much more logical and in line with the explanation (max_info_columns is used in DataFrame.info method to decide if per column information will be printed.)
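
The 0.13.0 behaviour shown in that session can be summarised with a small decision function. This is only a sketch that is consistent with the output above, not the library's actual code; the function name is hypothetical, and the default of 100 is the value mentioned earlier in this thread.

```python
def prints_per_column_lines(n_columns, max_info_columns=100, verbose=True):
    """Return True when info() would print one line per column,
    False when it would print the 'Columns: N entries, ...' summary."""
    # Per-column lines need both verbose output and a frame that is
    # not wider than max_info_columns.
    return bool(verbose) and n_columns <= max_info_columns
```

For the 5-column frame above, `prints_per_column_lines(5)` is True, while `prints_per_column_lines(5, max_info_columns=4)` is False, which matches the switch to the summary line in the session.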

But that seems like a separate issue.

@bjonen
Contributor Author

bjonen commented Apr 23, 2014

It seems the behavior was introduced in 0.13.1
Sticking with your example:

In [22]: pd.__version__
Out[22]: '0.13.1'

In [27]: pd.options.display.max_info_columns = 100

In [28]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 5 columns):
0 5 non-null float64
1 5 non-null float64
2 5 non-null float64
3 5 non-null float64
4 5 non-null float64
dtypes: float64(5)
In [29]: pd.options.display.max_info_columns = 1

In [30]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 5 columns):
0 float64
1 float64
2 float64
3 float64
4 float64
dtypes: float64(5)

The explanation for max_info_rows reads:
"df.info() will usually show null-counts for each column. For large frames this can be quite slow. max_info_rows and max_info_cols limit this null check only to frames with smaller dimensions then specified."

So the max_info options as they are implemented right now are concerned with improving display performance and not so much display style.
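
Read that way, the 0.13.1 options gate only the null count, not the layout. A rough sketch of that reading follows; the function name is hypothetical, and whether each comparison is strict or inclusive is an assumption, since the quoted explanation does not say.

```python
def counts_nulls(n_rows, n_columns, max_info_rows, max_info_columns):
    """Return True when info() would compute and show non-null counts.

    When this is False, the per-column lines still appear but carry only
    the dtype, as in the max_info_columns = 1 output above.
    """
    # Assumption: the frame must be within both limits for the
    # (potentially slow) null check to run.
    return n_columns <= max_info_columns and n_rows < max_info_rows
```

For the 5x5 frame, `counts_nulls(5, 5, 1000000, 100)` is True and `counts_nulls(5, 5, 1000000, 1)` is False, matching the two outputs in the session (the row limit of 1000000 is just a large placeholder).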

@jreback
Contributor

jreback commented Apr 23, 2014

ahh...I do recall this a bit, @y-p put this in IIRC

to not do the non-null check if you have a very large frame that would be displayed in a summary anyhow

but maybe it introduced a bug (as @jorisvandenbossche describes)

can you simplify this (w/o creating more havoc!)

@jorisvandenbossche
Member

I just opened a separate issue (#6939), as I thought this was separate from this discussion. So can you maybe repeat that over there?
(But of course, if it is not a regression, it has implications for this issue)

@jorisvandenbossche
Member

it was indeed a pr of @y-p : #5974

@jreback
Contributor

jreback commented Apr 23, 2014

@bjonen ok....so pls change this PR to close #6939 in addition (it's the same fix)

@bjonen
Contributor Author

bjonen commented Apr 23, 2014

Ok will do.

@jreback
Contributor

jreback commented Apr 28, 2014

@bjonen any luck with this ?

@bjonen
Contributor Author

bjonen commented Apr 29, 2014

@jreback I'm currently working on #5603 (comment) .

@jreback
Contributor

jreback commented May 8, 2014

@bjonen coming along?

@bjonen
Contributor Author

bjonen commented May 8, 2014

I'll submit a PR tonight so you see where I am at.

@bjonen
Contributor Author

bjonen commented May 9, 2014

I pushed the current state to https://github.com/bjonen/pandas/commits/adj_trunc. The truncate representation is generally working. Feel free to check out the display of large dfs.

Still there are some tests (mainly in test_frame) not passing. Looking into it...

@jreback
Contributor

jreback commented May 14, 2014

closing in favor of #7130

@jreback jreback closed this May 14, 2014
Labels
API Design Output-Formatting __repr__ of pandas objects, to_string
Development

Successfully merging this pull request may close these issues.

Option to set large_repr to info(verbose=False) missing
3 participants