BUG: fix df repr troubles #3395

lodagro · 2013-04-18T20:37:48Z

My apologies for the mess i created on the dataframe repr. Not only did i get the non-interactive behavior wrong, i also introduced a performance issue in the interactive mode.

This PR should make my wrong, right again. In non-interactive mode i re-enabled concise formats and made sure that auto-terminal-size detection is not used (little test added), also performance issue reported in #3337 and #3373 is fixed (added a benchmark for it).

ghost · 2013-04-19T13:23:03Z

@jreback , @hayd ,@lodagro. Quick quiz.

open up qtconsole / ipnb. how many rows and cols fit on your default display?

for width:

print "1"*100

for height:

range(60)

Does one of those scroll off the screen for you? a sample size of 4 ought to be
statistically significant.

what about height=100, does that fit on your screen?

jreback · 2013-04-19T13:28:37Z

monitor is 24", range(75) fills height, "1"*300 fills width (courier fixed font), this is an xterm

ghost · 2013-04-19T13:29:53Z

24"? nice.

but on that size screen you usually have 2 windows paneled, right? so say half that
width, which means width 100 is fine, and height 100 is too much.

jreback · 2013-04-19T13:32:01Z

actually don't use 2 side by side,I overlap say about 3/4 width, but your assumption is prob about right

ghost · 2013-04-19T13:37:05Z

I usually work in something like w=120,h=70. so the current (and new defaults)
are wrong for me too.

jreback · 2013-04-19T13:38:39Z

isn't this the get_terminal_size feature? (or its not 'accurate'/reliable the issue)?

ghost · 2013-04-19T13:39:33Z

Exactly, we use hardwired values for qtconsole (and soon ipnb) because get_terminal_size()
fails for them , and I think 0.10.1 and updated values in this PR both get it wrong.

cpcloud · 2013-04-19T13:45:19Z

This issue lies with finding the length of the largest row repr which seems difficult to do without introducing a performance hit.

ghost · 2013-04-19T13:48:44Z

the great majority of cases where the frame is really wide and this becomes slow,
is due to many columns, not a lot of data in a small number of them. max_cols
catches that, and this PR now checks that first. so, reduced surface area at least.

lodagro · 2013-04-19T14:47:00Z

I am typically running ipython in a full screen terminal: width x heigth = 260 x 65, 20`` monitor 1600x1200.
And the frames i use often are more than 200 display columns wide. Defaults are/were not ok for me either. I use the auto detect horizontally, which was the default a long time ago, not any more now.

Using default options, get_terminal_size() is never used. Only when requested by user, doing so is only interesting in a real terminal as indicated in the method docstring and display option descriptions. This is what i do.
Hard wired values (out of user control) are only used in non-interactive mode and if user enabled auto-detection. For this case auto-detect values are overruled with hard-coded values (as requested in #1610). In all other scenarios user can set/control dimensions.

Current repr performance is in line/better compared to v0.10.1 (there are only two benchmarks but they cover a frame that is too wide and one too high (a new benchmark i added)).

So what do you think would make more sense to have as defaults for display.width, heigth, max_rows, max_columns? Auto detect as default can not be done, since this only works in a real terminal, other than than i don`t care what defaults are.

ghost · 2013-04-19T14:51:49Z

W=100, H=60 seems like it would would work beter for everyone so far ,where
the current default don't work for everyone so far.

aside, I'm pretty sure this PR rebreaks #2275.
if the user sets height and width to auto-detect in qtconsole, you get back the terminal
height and width in which qtconsole was launched.

lodagro · 2013-04-19T14:57:11Z

If no other proposals i will update defaults to suggestion of @y-p.

#2275 should still be ok, fix to that PR was to disable html repr in qtconsole, this is still the case, however i think this is no longer needed. I will test it later today.

cpcloud · 2013-04-19T15:14:01Z

@y-p not sure I understand. If you try making a frame with many rows you'll see that it's slow to repr because to_string is called on the frame and then that is split by newlines and then the max of the length of each resulting list is computed. This is horribly slow for a frame with, say, 1e6 rows. Proof:

ghost · 2013-04-19T15:39:24Z

Are you using master or this PR branch (yet to be merged?)

In [1]: df =DataFrame(randn(10000,1).T)

In [2]: df.shape
Out[2]: (1, 10000)

In [3]: timeit repr(df)
1000 loops, best of 3: 929 us per loop

In [4]: timeit repr(df.T)
1000 loops, best of 3: 1.05 ms per loop

In [5]:

ghost · 2013-04-19T15:49:24Z

If len(self.index) > reasonable_threshold:
    fail_fast
If len(self.columns) > reasonable_threshold:
    fail_fast
else:
   slow_op

Covers most real cases I should think, but I guess there will be
pathological cases, when cells contains much data. I seem to recall
... here it is:

In [6]: pd.describe_option('max_colwidth')
display.max_colwidth: [default: 50] [currently: 50]
: int
        The maximum width in characters of a column in the repr of
        a pandas data structure. When the column overflows, a "..."
        placeholder is embedded in the output.

Would be glad to learn of a corner case we've missed, if you find one.

jreback · 2013-04-19T15:53:29Z

if you hit the slow_op part you coudl do a sampling of the data (say 10%) random to see if they are too wide/long (so you can fail_fast)?

lodagro · 2013-04-19T18:36:24Z

Two screenshots with the qtconsole. For both i launched the qtconsole in a tiny terminal and both behave as expected, fact that qtconsole is launched in tiny terminal has no impact.

Current state of the code.

Adapt code to re-enabled the html repr.

ghost · 2013-04-20T06:52:15Z

That's because of the default values for width/height. set them to 0 and try again.
for the same reason, scripts with wide output now get their output wrapped using expand_repr,
which is a change from 0.10.1, if anyone relied on that behaviour.
And so in a way that's a regression of #1610 .

Note: The expand_repr output in script mode is actually very nice, just concerned
about yet another change in established behavior.

ghost · 2013-04-22T11:50:09Z

I'll be merging this today with some fixes.

ghost · 2013-04-22T14:38:00Z

fixed a Misleading __str__/__repr__ for DatatimeIndex objects with less than 3 items #1611 related regression, the (new) test was passing by accident as
the df dimensions were too small to trigger the fail with the defaults for height/width.
width/height auto-detect is now activated by None rather then 0.
(made it easier to handle the corner cases, and cleaner I think)
re-enabled HTML for qtconsole (I completely agree with @lodagro,
this should be user controlled)
adjusted defaults of height,width to 80,80. Ipython's notebook defaults to an unusually
large font, so the 80 (rather then 100 as I suggested) in the PR works better.
ipnb and ipq are now treated equally. this ia a long overdue oversight.
The hardcoded default values for dimensions now reuse the defaults
of disp.height/width.
ip qtconsole was special cased by mistake, when the original Issue was addressed.
optimized repr_fits_horizontal, beyond @lodagro's fix, the width
check now only generates the repr for the slice that's about to be printed.

Beyond the test suite, I tested in terminal ipython, qtconsole,ipnb and script
form with various options, cleared up whatever I found.

I'm uneasy about so much churn and subtlety going on at the RC phase,
would have preferred to have it all take place in 0.12. But enough is enough.

b9fa04a

BUG: fix df repr troubles

d9fcd2d

ghost mentioned this pull request Apr 20, 2013

Perf check on upcoming v0.11.0 vs 0.10.1 #3326

Closed

ghost closed this Apr 22, 2013

hayd mentioned this pull request May 7, 2013

max_rows seems to have max value of 60 #3541

Closed

This pull request was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: fix df repr troubles #3395

BUG: fix df repr troubles #3395

lodagro commented Apr 18, 2013

ghost commented Apr 19, 2013

jreback commented Apr 19, 2013

ghost commented Apr 19, 2013

jreback commented Apr 19, 2013

ghost commented Apr 19, 2013

jreback commented Apr 19, 2013

ghost commented Apr 19, 2013

cpcloud commented Apr 19, 2013

ghost commented Apr 19, 2013

lodagro commented Apr 19, 2013

ghost commented Apr 19, 2013

lodagro commented Apr 19, 2013

cpcloud commented Apr 19, 2013

ghost commented Apr 19, 2013

ghost commented Apr 19, 2013

jreback commented Apr 19, 2013

lodagro commented Apr 19, 2013

ghost commented Apr 20, 2013

ghost commented Apr 22, 2013

ghost commented Apr 22, 2013

BUG: fix df repr troubles #3395

BUG: fix df repr troubles #3395

Conversation

lodagro commented Apr 18, 2013

ghost commented Apr 19, 2013

jreback commented Apr 19, 2013

ghost commented Apr 19, 2013

jreback commented Apr 19, 2013

ghost commented Apr 19, 2013

jreback commented Apr 19, 2013

ghost commented Apr 19, 2013

cpcloud commented Apr 19, 2013

ghost commented Apr 19, 2013

lodagro commented Apr 19, 2013

ghost commented Apr 19, 2013

lodagro commented Apr 19, 2013

cpcloud commented Apr 19, 2013

ghost commented Apr 19, 2013

ghost commented Apr 19, 2013

jreback commented Apr 19, 2013

lodagro commented Apr 19, 2013

ghost commented Apr 20, 2013

ghost commented Apr 22, 2013

ghost commented Apr 22, 2013