BUG: fix df repr troubles #3395
@jreback, @hayd, @lodagro. Quick quiz: open up qtconsole / ipnb. How many rows and cols fit on your default display? for width:
for height:
Does one of those scroll off the screen for you? a sample size of 4 ought to be what about height=100, does that fit on your screen? |
monitor is 24", range(75) fills height, "1"*300 fills width (courier fixed font), this is an xterm |
24"? nice. but on that size screen you usually have 2 windows paneled, right? so say half that |
actually I don't use 2 side by side, I overlap by say about 3/4 width, but your assumption is prob about right |
I usually work in something like w=120,h=70. so the current (and new defaults) |
isn't this the get_terminal_size feature? (or is the issue that it's not 'accurate'/reliable?) |
Exactly, we use hardwired values for qtconsole (and soon ipnb) because get_terminal_size() |
This issue lies with finding the length of the largest row repr which seems difficult to do without introducing a performance hit. |
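For illustration only, the check being discussed (finding the widest line of an already-rendered repr) can be sketched in plain Python. The names `widest_line` and `fits_width` are hypothetical and are not pandas functions; the point is that the whole text has to be rendered and scanned before the answer is known, so the cost grows with the number of rows.

```python
def widest_line(rendered: str) -> int:
    """Return the length of the longest line in an already-rendered repr."""
    # Every line is visited once, so the cost is linear in the rendered size.
    return max((len(line) for line in rendered.split("\n")), default=0)


def fits_width(rendered: str, width: int) -> bool:
    """True if no line of the rendered text exceeds `width` characters."""
    return widest_line(rendered) <= width


# A stand-in for a rendered frame (hypothetical data): three short lines.
sample = "a  b\n1  2\n10  20000"
```

Here `fits_width(sample, 9)` holds while `fits_width(sample, 8)` does not, because the last line is nine characters long.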
the great majority of cases where the frame is really wide and this becomes slow, |
I am typically running ipython in a full screen terminal: width x height = 260 x 65, 20" monitor 1600x1200. Using default options, get_terminal_size() is never used. It is only used when requested by the user, and doing so is only interesting in a real terminal, as indicated in the method docstring and display option descriptions. This is what i do. Current repr performance is in line with or better than v0.10.1 (there are only two benchmarks, but they cover a frame that is too wide and one that is too high (a new benchmark i added)). So what do you think would make more sense to have as defaults for display.width, height, max_rows, max_columns? Auto-detect as default cannot be done, since this only works in a real terminal; other than that i don't care what the defaults are. |
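A minimal sketch of the "only auto-detect in a real terminal" idea, assuming Python 3.3+ and the standard library's `shutil.get_terminal_size` rather than the pandas helper discussed in this thread. The function name `detect_display_size` and the 100 x 60 defaults are illustrative assumptions, not what the PR implements.

```python
import shutil
import sys


def detect_display_size(default_width=100, default_height=60):
    """Return (width, height) in characters.

    The terminal is queried only when stdout is attached to a real terminal;
    otherwise (scripts, pipes, notebook-like frontends) the hardwired
    defaults are returned unchanged.
    """
    if sys.stdout.isatty():
        size = shutil.get_terminal_size(fallback=(default_width, default_height))
        return size.columns, size.lines
    return default_width, default_height


width, height = detect_display_size()
```

When output is redirected to a file or pipe, the call falls through to the defaults, which mirrors the point above that detection is only meaningful in a real terminal.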
W=100, H=60 seems like it would work better for everyone so far, where aside, I'm pretty sure this PR rebreaks #2275. |
@y-p not sure I understand. If you try making a frame with many rows you'll see that it's slow to repr because to_string is called on the frame and then that is split by newlines and then the max of the length of each resulting list is computed. This is horribly slow for a frame with, say, 1e6 rows. Proof: |
Are you using master or this PR branch (yet to be merged)?
In [1]: df = DataFrame(randn(10000, 1).T)
In [2]: df.shape
Out[2]: (1, 10000)
In [3]: timeit repr(df)
1000 loops, best of 3: 929 us per loop
In [4]: timeit repr(df.T)
1000 loops, best of 3: 1.05 ms per loop
In [5]: |
Covers most real cases I should think, but I guess there will be
Would be glad to learn of a corner case we've missed, if you find one. |
if you hit the slow_op part you could do a random sampling of the data (say 10%) to see if the rows are too wide/long (so you can fail_fast)? |
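The sampling suggestion could look roughly like the sketch below, written against plain strings rather than a DataFrame. `sample_too_wide` and its parameters are hypothetical names; in a real frame only the sampled rows would need to be rendered, which is where the saving would come from.

```python
import random


def sample_too_wide(row_strings, width, fraction=0.1, seed=None):
    """Fail fast: return True if any row in a random sample exceeds `width`.

    A False result is inconclusive, because rows outside the sample may
    still be too wide; the full check is still needed in that case.
    """
    if not row_strings:
        return False
    rng = random.Random(seed)
    # Sample at least one row, and about `fraction` of the rows otherwise.
    k = max(1, int(len(row_strings) * fraction))
    sampled = rng.sample(row_strings, k)
    return any(len(row) > width for row in sampled)
```

A `True` answer lets the caller skip the full-width computation entirely, while `False` only means the sample found nothing too wide.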
That's because of the default values for width/height. set them to 0 and try again. Note: The expand_repr output in script mode is actually very nice, just concerned |
I'll be merging this today with some fixes. |
Beyond the test suite, I tested in terminal ipython, qtconsole, ipnb and script I'm uneasy about so much churn and subtlety going on at the RC phase, |
My apologies for the mess i created on the dataframe repr. Not only did i get the non-interactive behavior wrong, i also introduced a performance issue in the interactive mode.
This PR should make my wrong right again. In non-interactive mode i re-enabled concise formats and made sure that auto-terminal-size detection is not used (little test added). The performance issue reported in #3337 and #3373 is also fixed (added a benchmark for it).