ENH (GH6568) Add option info_verbose #6890

bjonen · 2014-04-16T07:41:20Z

This adds an info_verbose option to the display options. There's a small section in the FAQ and the basic introduction. The entry in v0.14 is still missing.

Closes #6568

jreback · 2014-04-16T10:48:58Z

pandas/core/frame.py

-        if self._info_repr():
-            self.info(buf=buf)
+        info_verbose = get_option("display.info_verbose")
+        if self._info_repr() and info_verbose:


here and the next section: just pass verbose=info_verbose rather than having an if/then

jorisvandenbossche · 2014-04-16T11:40:37Z

@jreback I know you were in favor of making it a separate option instead of adding 'info_short' to display.large_repr, but looking at it now: if I see an option named info_verbose set to True or False, I expect it to set the default for the info method, while it actually only sets the behaviour of info when it is used in the large dataframe repr. So I find this a bit confusing.

what it actually means is large_repr_info_verbose, but that is a rather verbose name .. :-)
it could also set the default for df.info(verbose=..) itself? (But I don't know if this is wanted)
maybe go for display.large_repr='info_short' or info_concise nonetheless? Or do you think this is more confusing?

@bjonen @jreback What do you think?

jreback · 2014-04-16T11:52:23Z

this sets the default for df.info(...)

I agree it's only when the info repr is triggered in the first place

@bjonen if we change this option to really be a subset of large_repr

iow have 3 options: False, True/verbose (verbose=True), concise (verbose=False)

that would work yes?

though I think the default should be concise (so maybe have to fiddle with this a bit to be backward compat)

bjonen · 2014-04-16T12:08:40Z

Thanks for your comments.

I think it's a good idea to directly control df.info(verbose=..). Then it makes sense to have a separate option info_verbose, as it is independent of display.large_repr.

bjonen · 2014-04-16T12:20:10Z

3 options for large_repr works too for me. Why do you prefer to leave the default for df.info unchanged?

bjonen · 2014-04-16T21:47:35Z

@jreback @jorisvandenbossche
Which solution should we go with?

jorisvandenbossche · 2014-04-16T21:56:23Z

So the options would be:

having an info_verbose (True/False) option which would set the default for df.info(verbose=..) itself
adding an option to large_repr ('truncate' (default), 'info', and newly 'info_short' or 'info_concise') that only sets the default for info when called in the repr of df

Personally I don't really have a preference (I won't use either option)
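To make the second alternative concrete, here is a small sketch of how a three-valued large_repr could map to the info call. The function name `info_settings` is hypothetical, and 'info_short' is just one of the two spellings floated above; nothing here is existing pandas behaviour.

```python
def info_settings(large_repr):
    """Map a large_repr value to (use_info_view, verbose).

    verbose is None when the info view is not used at all.
    """
    table = {
        "truncate": (False, None),    # default: truncated frame, no info()
        "info": (True, True),         # info view with per-column lines
        "info_short": (True, False),  # proposed: info view, concise form
    }
    if large_repr not in table:
        raise ValueError("unknown large_repr value: %r" % (large_repr,))
    return table[large_repr]
```

Under this design df.info() itself keeps its own default; only the repr path consults the table.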

jreback · 2014-04-16T23:47:12Z

hmm I think I like the separate option

with a default of False (which is an API change)

jorisvandenbossche · 2014-04-17T06:58:32Z

Why a default of False? Is there a reason to change the behaviour of df.info()?

bjonen · 2014-04-17T08:22:15Z

How do you think we should change the default on df.info?

We could have a wrapper that sets verbose and passes it to info but then we lose the ability to do 'df.info?' in ipython to get the default arguments.
Alternatively we could have something like below and somehow trigger DataFrame.info = DataFrame.info_non_verbose when the option 'info_verbose' changes.

Any thoughts?

    def info_verbose(self, verbose=True, buf=None, max_cols=None):
        self._info(verbose=verbose, buf=buf, max_cols=max_cols)

    def info_non_verbose(self, verbose=False, buf=None, max_cols=None):
        self._info(verbose=verbose, buf=buf, max_cols=max_cols)

    def _info(self, verbose=True, buf=None, max_cols=None):
        ...  # body of the current DataFrame.info

jreback · 2014-04-17T11:49:29Z

I am pretty sure the default was False in 0.12
I think it got changed inadvertently in 0.13/0.13.1

@bjonen can u see if that was the case?

@jorisvandenbossche this doesn't need to be complicated
df.info() works as is and u can pass a parameter
all the option will do is to set the parameter

bjonen · 2014-04-17T12:51:52Z

Yes it did change in 0.13 but it seems it was intentional: #4886

"df.info() works as is and u can pass a parameter
all the option will do is to set the parameter"

If I understand correctly, this is the version currently implemented in the PR. That means the option value (True/False) will only have an impact when a df is printed and not when one calls df.info() because the default is hard coded.

jreback · 2014-04-17T13:14:08Z

ok the default wasn't meant to change

change the signature to

df.info(verbose=None)

then handle a passed true/false as an override
if none then use the info_verbose option
which I think should default to False

bjonen · 2014-04-17T13:18:16Z

Ok sounds good!

bjonen · 2014-04-17T13:26:23Z

Ok so

we leave the large_repr default at 'truncate'. Then we are not in conflict with #4886 (Rethink when HTML repr of DataFrame is displayed)
We change info_verbose to default to False so that when a summary is printed it is the most concise summary.

I'll adapt the PR.

jorisvandenbossche · 2014-04-17T13:32:24Z

@jreback don't you have the max_info_cols option for that? To decide when the short and when the long summary is shown? Wouldn't that conflict with an option to set verbose to True or False?

max_info_columns is used in DataFrame.info method to decide if
        per column information will be printed.

But maybe we are just misinterpreting each other's words, as I think the default for verbose in df.info() has been True for a long time, see e.g. the docs of 0.7: http://pandas.pydata.org/pandas-docs/version/0.7.0/generated/pandas.DataFrame.info.html

jreback · 2014-04-17T14:08:02Z

hmm don't know

@bjonen can u investigate this?

jreback · 2014-04-21T18:04:45Z

doc/source/basics.rst

@@ -1666,3 +1666,35 @@ columns of DataFrame objects are shown by default. If ``max_columns`` is set to
 0 (the default, in fact), the library will attempt to fit the DataFrame's
 string representation into the current terminal width, and defaulting to the
 summary view otherwise.
+


can you add this same thing (you can copy-paste) to v0.14.0? It's a bit of a change and we want to inform users. Create a new sub-section (e.g. use ---- under the heading), put it after the plotting sub-section, and include a pointer to the basics section with a ':ref:'

jreback · 2014-04-23T12:34:24Z

doc/source/basics.rst

+.. ipython:: python
+
+    with option_context("display.large_repr",'info'):
+        print df_lrge


use print(df_lrge) instead of the print statement (for py3 compat in doc building)

jreback · 2014-04-23T12:35:34Z

minor edit

can you post at the top of the issue what the 3 cases are (e.g. what the docs are going to show)?
(an ipython picture (png) would be even better, actually, and you could include it in-line)

jreback · 2014-04-23T12:46:46Z

@jorisvandenbossche ?

bjonen · 2014-04-23T12:53:36Z

@jorisvandenbossche max_info_columns is currently doing something different. Maybe we can adjust it so that it can play the role of info_verbose. The current behavior doesn't save any space. Also, the default of 100 seems very high to me.

max_columns doesn't seem to have an effect at all under large_repr = 'info'.

In [12]: import pandas as pd

In [13]: df = pd.DataFrame(columns=['a','b','c'],index=pd.DatetimeIndex(start='19900101',end='20000101',freq='BM'))

In [15]: pd.options.display.large_repr = 'info'

In [16]: df
Out[16]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 120 entries, 1990-01-31 00:00:00 to 1999-12-31 00:00:00
Freq: BM
Data columns (total 3 columns):
a 0 non-null object
b 0 non-null object
c 0 non-null object
dtypes: object(3)

In [17]: pd.options.display.max_columns = 1

In [18]: df
Out[18]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 120 entries, 1990-01-31 00:00:00 to 1999-12-31 00:00:00
Freq: BM
Data columns (total 3 columns):
a 0 non-null object
b 0 non-null object
c 0 non-null object
dtypes: object(3)

In [19]: pd.options.display.max_info_columns = 1

In [20]: df
Out[20]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 120 entries, 1990-01-31 00:00:00 to 1999-12-31 00:00:00
Freq: BM
Data columns (total 3 columns):
a object
b object
c object
dtypes: object(3)

In [21]: df.info(False)
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 120 entries, 1990-01-31 00:00:00 to 1999-12-31 00:00:00
Freq: BM
Columns: 3 entries, a to c
dtypes: object(3)

@jreback Looking at previous commits in git:

    git grep 'def info\(self, verbose=True.*\):' $(git rev-list --all)

it seems that the default has been True up to now.
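The grep covers the history; for a single installed version, the declared default can also be read off the signature. A small sketch using the standard library's `inspect` module. The `info` function below is only a stand-in carrying the signature matched by the grep pattern, not the pandas method, and `default_of` is a hypothetical helper name.

```python
import inspect


def default_of(func, param):
    """Return the declared default of `param` on `func`.

    Yields inspect.Parameter.empty if the parameter has no default.
    """
    return inspect.signature(func).parameters[param].default


# Stand-in with the signature matched by the grep pattern above.
def info(self, verbose=True, buf=None, max_cols=None):
    pass


verbose_default = default_of(info, "verbose")
```

Pointing `default_of` at the real `DataFrame.info` of an installed pandas would answer the same question for that one version.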

jreback · 2014-04-23T12:55:37Z

@bjonen ok on the default then (I think I personally set it to False, but that is fine then)

you need to use a frame with > max_info_columns (e.g. 101) to get an effect

bjonen · 2014-04-23T13:02:14Z

I reset the default below the number of columns (3) in the df (see previous post):

pd.options.display.max_info_columns = 1

jreback · 2014-04-23T13:04:35Z

@bjonen hmm.. see if you can figure out from the tests what it is supposed to do. All of the options have complex interactions. If it's a 'bug' I would rather fix it than add a new option if we can (e.g. you are suggesting that if you have columns > max_info_columns then we basically switch info_verbose to False instead of actually having an option, right?)

bjonen · 2014-04-23T13:06:21Z

Yes, exactly. I'll look into it and let you guys know.

jorisvandenbossche · 2014-04-23T13:26:15Z

@bjonen It is normal that max_columns has no effect when large_repr='info', as this parameter defines how many columns are shown in the default display, not in the info display (that is what max_info_columns is for).

The strange behaviour of max_info_cols is a bug I think, a regression in current master. As with 0.13 I get:

In [1]: pd.__version__
Out[1]: '0.13.0'

In [5]: df = pd.DataFrame(np.random.randn(5,5))
In [6]: df
Out[6]:
          0         1         2         3         4
0  0.708876 -0.179273  1.367976 -0.929688 -1.138946
1  1.047154  1.049302 -0.248178 -0.957677  1.879843
2 -0.523272 -2.013742  2.064032 -1.389822  1.394960
3  0.224508  1.032544 -1.312425  0.123956  0.144831
4 -1.691660  0.952837  1.380545 -1.279794  1.026131

In [7]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 5 columns):
0    5  non-null values
1    5  non-null values
2    5  non-null values
3    5  non-null values
4    5  non-null values
dtypes: float64(5)

In [8]: pd.options.display.max_info_columns = 4

In [9]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Columns: 5 entries, 0 to 4
dtypes: float64(5)
In [10]:

which is much more logical and in line with the explanation (max_info_columns is used in DataFrame.info method to decide if per column information will be printed.)
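The 0.13.0 behaviour shown in this session can be summarised as a simple threshold rule. A sketch, assuming the comparison is "at most max_info_columns"; the session only demonstrates 5 columns against limits of 100 and 4, so the boundary case is an assumption, and `use_per_column_listing` is a hypothetical name.

```python
def use_per_column_listing(n_columns, max_info_columns=100, verbose=True):
    """True when info() should print one line per column.

    Otherwise the short "Columns: N entries, first to last" form is used.
    The default of 100 is the option default mentioned in this thread.
    """
    return bool(verbose) and n_columns <= max_info_columns


# The two calls from the session above: 5 columns, limit 100 then 4.
long_form = use_per_column_listing(5)
short_form = use_per_column_listing(5, max_info_columns=4)
```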

But that seems like a separate issue.

bjonen · 2014-04-23T13:45:54Z

It seems the behavior was introduced in 0.13.1
Sticking with your example:

In [22]: pd.__version__
Out[22]: '0.13.1'

In [27]: pd.options.display.max_info_columns = 100

In [28]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 5 columns):
0 5 non-null float64
1 5 non-null float64
2 5 non-null float64
3 5 non-null float64
4 5 non-null float64
dtypes: float64(5)
In [29]: pd.options.display.max_info_columns = 1

In [30]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 5 columns):
0 float64
1 float64
2 float64
3 float64
4 float64
dtypes: float64(5)

The explanation for max_info_rows reads:
"df.info() will usually show null-counts for each column. For large frames this can be quite slow. max_info_rows and max_info_cols limit this null check only to frames with smaller dimensions then specified."

So the max_info options as they are implemented right now are concerned with improving display performance and not so much display style.
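Read that way, the 0.13.1 output above follows from a rule like the one sketched here: the limits gate only the null-count computation, and the per-column lines are printed either way. Both function names are hypothetical, the "at most" comparison is an assumption, and the row limit is passed explicitly because its default is not stated in this thread.

```python
def show_null_counts(n_rows, n_columns, max_info_rows, max_info_columns):
    """True when the frame is small enough to compute non-null counts."""
    return n_rows <= max_info_rows and n_columns <= max_info_columns


def column_line(name, non_null, dtype, with_counts):
    """Format one per-column line in the 0.13.1 style shown above."""
    if with_counts:
        return "%s %d non-null %s" % (name, non_null, dtype)
    return "%s %s" % (name, dtype)


# 5x5 frame from the session: column limit 100, then column limit 1.
# The row limit of 1000 is an arbitrary value for illustration.
with_counts = column_line("0", 5, "float64", show_null_counts(5, 5, 1000, 100))
without_counts = column_line("0", 5, "float64", show_null_counts(5, 5, 1000, 1))
```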

jreback · 2014-04-23T13:48:37Z

ahh...I do recall this a bit, @y-p put this in IIRC

to not do the non-null check if you have a very large frame that would be displayed in a summary anyhow

but maybe it introduced a bug (as @jorisvandenbossche describes)

can you simplify this (w/o creating more havoc!)

jorisvandenbossche · 2014-04-23T13:48:53Z

I just opened a separate issue (#6939), as I thought this was separate from this discussion? So can you maybe repeat that over there?
(But of course, if it is not a regression, it has implications for this issue)

jorisvandenbossche · 2014-04-23T13:53:20Z

it was indeed a pr of @y-p : #5974

jreback · 2014-04-23T14:27:45Z

@bjonen ok....so pls change this PR to close #6939 in addition (it's the same fix)

bjonen · 2014-04-23T14:59:42Z

Ok will do.

jreback · 2014-04-28T23:17:13Z

@bjonen any luck with this ?

bjonen · 2014-04-29T18:01:46Z

@jreback I'm currently working on #5603 (comment) .

jreback · 2014-05-08T16:04:13Z

@bjonen coming along?

bjonen · 2014-05-08T16:28:12Z

I'll submit a PR tonight so you see where I am at.

bjonen · 2014-05-09T00:26:17Z

I pushed the current state to https://github.com/bjonen/pandas/commits/adj_trunc. The truncate representation is generally working. Feel free to check out the displaying of large dfs.

Still there are some tests (mainly in test_frame) not passing. Looking into it...

jreback · 2014-05-14T22:25:15Z

closing in favor of #7130

jreback reviewed Apr 16, 2014

bjonen added 3 commits April 19, 2014 21:38

ENH (GH6568) Add option info_verbose

9d3ca80

changed default on frame.info

8bd466a

fixed basics.rst

f4a060e

jreback added API Design labels Apr 21, 2014

jreback added this to the 0.14.0 milestone Apr 21, 2014

jreback reviewed Apr 21, 2014

This was referenced Apr 23, 2014

Introduce 'tidy_repr' for DataFrames #6938

Closed

Truncate DataFrames centrally, rather than at one end #5603

Closed

jreback reviewed Apr 23, 2014

jreback mentioned this pull request Apr 23, 2014

BUG: regression in max_info_columns behaviour? #6939

Closed

2 tasks

jreback mentioned this pull request Apr 23, 2014

0.13.1: info(verbose=True) does not return non-null counts for large DataFrames #6940

Closed

jreback closed this May 14, 2014
