HTML (and text) reprs for large dataframes. #5550
I've fixed the wide display issue - it wasn't truncating the extra row added for the row index names. The empty dataframe repr is the same as what's displayed in my system installation of pandas - I agree that it's a bit odd, but I think that's a separate issue. I agree, the behaviour of plain text reprs should be similar. Do you want me to tackle that in this PR, or separately?
this is related to #1889 as well
That'd be great, I think it fits here ok. Not sure about putting this in 0.13, I've had bad luck with last minute changes to
I think this is fine with a couple of minor issues:
- needs a mention in v0.13.0 and the main docs
- the max rows / columns limits should come purely from the options rather than being hard-coded in the functions
I also mentioned it in the issue, but what do you think of showing
Makes sense to me, but can be done in a subsequent PR if needs be.
I have:
I've had a brief look at #1610 - I don't think this should cause a regression, because
I was very wrong with my initial objections, this is just great. It's impossible to set default values for for Regardless - tested this and liked it, +1 to merge. @jreback, any more issues to address before the green button?
is it possible to have an option to do the existing behavior, but default to the new? maybe; if it's easy I would add this to provide back compat, if not then ok (w/o going back to @y-p's self-admitted 'ugliest' code)
in the terminal, with
I think it should be easy enough to have an option to revert to the old behaviour (at least roughly - I'd rather not restore Truncation to terminal width is harder, because that would have to propagate down into the actual formatting code, and no doubt deal with various corner cases.
Added the option in the form I described in my last message.
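For readers following the option discussion: released pandas (0.13 onward) exposes this switch as `display.large_repr`, with `'truncate'` as the default and `'info'` restoring the older summary view. The following is only a minimal pure-Python sketch of that decision, not pandas' implementation; `choose_repr` and its return strings are hypothetical, and the 60-row / 20-column defaults are the ones quoted in this PR's description.

```python
# Hypothetical sketch of how a repr might pick between a full table,
# a truncated table and the condensed info() view. The helper name and
# the mode strings are illustrative, not pandas internals.

def choose_repr(n_rows: int, n_cols: int,
                max_rows: int = 60, max_cols: int = 20,
                large_repr: str = "truncate") -> str:
    """Return which kind of repr to show for a frame of the given shape."""
    if large_repr not in ("truncate", "info"):
        raise ValueError(f"unknown large_repr mode: {large_repr!r}")
    too_big = n_rows > max_rows or n_cols > max_cols
    if not too_big:
        return "table"            # small frames are always shown in full
    if large_repr == "info":
        return "info"             # old behaviour: switch to the summary view
    return "truncated table"      # new default: keep the table, elide the middle


print(choose_repr(10, 5))
print(choose_repr(100, 5))
print(choose_repr(100, 5, large_repr="info"))
```

Reading the limits from parameters (standing in for the display options) rather than constants inside the function mirrors the review request that the limits come purely from the options.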
When truncating, having a footer with total row count would eliminate the need
Edit: as a header is probably better, since in ipnb you may be forced to scroll down manually to
I played around with some different options: showing it below the table looked more natural, and I opted to show it whether or not the table is truncated. The format is "61 rows × 26 columns". In the terminal, it shows up in [square brackets] to highlight that it's not part of the table.
The failing test attempts to roundtrip a dataframe to and from the clipboard. It tests various ways of doing this, but one of them (passing Should we attempt to fix that, or simply remove the code path that writes
I've made the clipboard use However, now I appear to have a merge conflict. What's the preferred strategy for pandas: rebase, merge into my branch, or let whoever merges the PR handle it?
you need to clear merge conflicts via rebasing
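A minimal sketch of the footer format described above, assuming a hypothetical `dimensions_footer` helper (this is not pandas' code); it only reproduces the "61 rows × 26 columns" text and the square-bracket variant used in the terminal.

```python
def dimensions_footer(n_rows: int, n_cols: int, terminal: bool = False) -> str:
    """Build the 'N rows × M columns' summary shown under a frame's repr."""
    text = f"{n_rows} rows × {n_cols} columns"
    # In the terminal the footer is wrapped in [square brackets] so it is not
    # mistaken for part of the table; in this sketch the HTML side just gets
    # the bare text (any surrounding markup is left out).
    return f"[{text}]" if terminal else text


print(dimensions_footer(61, 26))
print(dimensions_footer(61, 26, terminal=True))
```

Since the footer is shown whether or not the table was truncated, the helper takes no truncation flag.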
see here: https://github.com/pydata/pandas/wiki/Using-Git (will need you to squash down before merging as well)
Rebased, squashing a couple of commits where I had undone some change.
Mercilessly squashing to 1 commit will make life easier imo... @jreback perhaps we should add that to the wiki?
sure feel free to update/expand wiki
I don't follow why squashing the whole PR to one commit would be useful. It seems to defeat the point of a DVCS.
OK, great. Here's a more prominent section in the release notes, including a little picture.
HTML reprs for large dataframes.
Merged. Thanks @takluyver.
:-) Thanks everyone for the review and improvements.
docs on the web are built at 5pm EST; pls review the changes and make sure they look right. thanks again
Nearly right - there should be an image here: http://pandas.pydata.org/pandas-docs/dev/whatsnew.html#dataframe-repr-changes . I realise now that I didn't check it in.
no that's right
Just check it in (there are a few other static images there). The folder is ignored because all the generated plots are stored there.
So, should we change the defaults for
The image is now PR #5594. I might consider bumping the default max_columns down a bit, because I think in most real examples, 20 columns is very wide. Then again, when I open a blank spreadsheet, I see 20 columns, and I think it's more annoying to hide columns than to hide rows, so I'm not sure that it should change.
Has anyone had some performance issues with this on large DataFrames in the IPython notebook? It doesn't take long at all in terminal, and I don't use the qtconsole. I don't mind, but I wanted people to be aware.
this should be ok on master (as it doesn't display all the rows), unless you have max_rows set to some big number
My That's why I was surprised it was taking longer on large frames.
I'm doing some timing right now to dig into it (I'll put up a notebook).
I guess it's a bit tricky to profile reprs. I'll come back to this later. I can say that it's a lot quicker just on a random frame. My example had a MultiIndex.
confirmed, we fixed that bug for the Index case, but I missed the MultiIndex equivalent. good catch.
Once again, the wisdom of not merging things right before a release (and vice versa) shines through.
Should be fixed; added vbenches.
... I kinda like this phase of the release cycle:
:)
As discussed in #4886, the HTML representation of DataFrames currently starts off as a table, but switches to the condensed info view if the table exceeds a certain size (by default, more than 60 rows or 20 columns). I've seen this confusing users, who think that they suddenly have a completely different kind of object, and don't understand why.
With these changes, the HTML repr always displays the table, but truncates it when it exceeds a certain size. It reuses the same options, `display.max_rows` and `display.max_columns`.
Before:
After:
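A rough pure-Python sketch of the truncation idea: keep the first and last rows and columns, and mark the elided middle with `...`. The helpers `truncate_indices` and `truncate_table` are hypothetical, and splitting each limit roughly in half between head and tail is an assumption; pandas' exact split and rendering may differ.

```python
from typing import List, Optional, Sequence


def truncate_indices(n: int, limit: Optional[int]) -> List[Optional[int]]:
    """Positions to display out of n; None marks where the '...' gap goes."""
    if limit is None or n <= limit:
        return list(range(n))  # fits within the limit: show everything
    if limit < 1:
        raise ValueError("limit must be at least 1")
    head = (limit + 1) // 2  # an odd limit gives the extra slot to the head
    tail = limit // 2
    return list(range(head)) + [None] + list(range(n - tail, n))


def truncate_table(rows: Sequence[Sequence[object]],
                   max_rows: Optional[int],
                   max_cols: Optional[int]) -> List[List[object]]:
    """Truncate a row-major table in both directions, like a large-frame repr."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if n_rows else 0
    row_pos = truncate_indices(n_rows, max_rows)
    col_pos = truncate_indices(n_cols, max_cols)
    out: List[List[object]] = []
    for r in row_pos:
        # A None in either position falls in an elided gap, so show '...'
        out.append(["..." if r is None or c is None else rows[r][c]
                    for c in col_pos])
    return out


table = [[r * 10 + c for c in range(5)] for r in range(10)]
for row in truncate_table(table, max_rows=4, max_cols=3):
    print(row)
```

With a 10×5 table and limits of 4 rows and 3 columns, this keeps rows 0, 1, 8, 9 and columns 0, 1, 4, with a row and a column of `...` marking the gaps, so the output stays a table rather than switching to a different kind of summary.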