New DataFrame display information? #6547

Closed
jseabold opened this issue Mar 5, 2014 · 29 comments · Fixed by #7108
Labels
Output-Formatting __repr__ of pandas objects, to_string
Comments

@jseabold
Contributor

jseabold commented Mar 5, 2014

We are using DataFrames to hold information in a lot of places now, e.g., ANOVA tables:

            df  sum_sq     mean_sq          F    PR(>F)
C(Fitness)   2     672  336.000000  16.961538  0.000041
Residual    21     416   19.809524        NaN       NaN

[2 rows x 5 columns] 

Where'd that bottom [] business come from? Is this new? Isn't this something that's better included in the info method?

@jreback
Contributor

jreback commented Mar 5, 2014

This started in 0.13. You can turn it off if you want; see http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#output-formatting-enhancements (I think you set show_dimensions to False).
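For reference, a minimal sketch of switching the footer off globally. It assumes the option's full name is `display.show_dimensions` (the name used later in this thread); the two-row frame is made up for illustration.

```python
import pandas as pd

# Hypothetical small frame standing in for a summary table.
df = pd.DataFrame(
    {"df": [2, 21], "sum_sq": [672.0, 416.0]},
    index=["C(Fitness)", "Residual"],
)

# Global switch: never append the "[N rows x M columns]" footer.
pd.set_option("display.show_dimensions", False)
text_without_footer = repr(df)

# Restore the library default so later code is unaffected.
pd.reset_option("display.show_dimensions")

print(text_without_footer)
```

Because the setting is global, it affects every frame printed in the session until it is reset.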

@jseabold
Contributor Author

jseabold commented Mar 5, 2014

I figured you could, but that's a global option that we can't control after we pass back to a user. We're trying more and more to take advantage of DataFrames as containers to display information, to_latex, to_csv, etc. and this is just noise here. It is descriptive information, which is why I think it should be in info just not in the __repr__. If this is going to stay, then I'm going to have to go in and change all of our use of DataFrames as container/display objects, because it's pretty ugly for tables of information IMO.

@jseabold
Contributor Author

jseabold commented Mar 5, 2014

Also, .shape is just a tab-complete away...

@jreback
Contributor

jreback commented Mar 5, 2014

this is only for interactive display, e.g. __repr__ so not sure what the issue is

It's actually quite helpful when the default is to display the actual data but the frame itself is truncated (e.g. you are not using info, which you always have the option to do, of course).

@jseabold
Contributor Author

jseabold commented Mar 5, 2014

Yes, I'm not saying it affects anything else. I'm saying that we like those features, but we also like a clean __repr__. In the case when we're using it as a table-like container, e.g., in ANOVA tables above, the clean __repr__ trumps everything for me. Trying quickly to scan a bunch of ANOVA tables, I'm interrupted with this noise about the rows and columns and it makes interactive use more difficult to the point where I'd rather fall back on our old, horrible to use summary tables. IIRC, we also have a PR to move our summary tables partially into DataFrames, which will also be affected.

It used to be that, by default, when the frame was truncated you got the info() view, which I preferred but can live without. Why not show this new information only if the DataFrame is, in fact, truncated?

But, again, as you pointed out, you always have info, so you can just call it. And shape is also just one tab-completion away. It just seems that there are so many other ways to get this information besides forcing it into the default __repr__ of everything.

I hate to nit-pick here, but this is a pretty jarring visual change from what I'm used to, and there's not much gained.
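The two alternatives mentioned above, `shape` and `info`, can be sketched as follows. The frame is made up for illustration, and `info` is redirected to a buffer only so the text can be inspected.

```python
import io

import pandas as pd

# Hypothetical frame, large enough that its repr would be truncated.
df = pd.DataFrame({"a": range(100), "b": range(100)})

# The dimensions are available directly as a (rows, columns) tuple.
n_rows, n_cols = df.shape

# info() writes a summary (index range, column dtypes, memory usage);
# here it goes to a string buffer instead of stdout.
buf = io.StringIO()
df.info(buf=buf)
summary = buf.getvalue()

print(n_rows, n_cols)
print(summary)
```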

@jreback
Contributor

jreback commented Mar 5, 2014

I think there was a bit of discussion on the linked issues. http://pandas.pydata.org/pandas-docs/dev/whatsnew.html#dataframe-repr-changes.

I personally like it, but I can see your point. Hard to have everyone like everything all the time!

@cancan101
Contributor

@jseabold Why not just return a subclass of DataFrame with the repr methods changed?

Maybe the show_dimensions = get_option("display.show_dimensions") call should be moved to an instance method, which would make changing the behavior in a subclass easier?
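A rough sketch of the subclass idea, under stated assumptions: `TableFrame` is a hypothetical name, and the override relies on `DataFrame.to_string` accepting a `show_dimensions` argument. Note that `to_string` prints the whole frame rather than a truncated view, so this only suits small tables.

```python
import pandas as pd


class TableFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass whose repr never prints the footer."""

    @property
    def _constructor(self):
        # Keep the subclass when pandas builds new frames from this one.
        return TableFrame

    def __repr__(self):
        # Pass show_dimensions explicitly so the global option is bypassed.
        return self.to_string(show_dimensions=False)


anova = TableFrame(
    {"df": [2, 21], "sum_sq": [672.0, 416.0]},
    index=["C(Fitness)", "Residual"],
)

# Even with the global option forced on, the subclass stays clean.
with pd.option_context("display.show_dimensions", True):
    table_text = repr(anova)                # subclass: no footer
    plain_text = repr(pd.DataFrame(anova))  # plain DataFrame: footer shown

print(table_text)
print(plain_text)
```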

@jseabold
Contributor Author

jseabold commented Mar 6, 2014

IMO the case for including that extra information has not been made well enough for me to start subclassing DataFrames to avoid seeing a piece of information I don't want or need. Are there any other data structures out there that do this? Matlab? data.frames? Numpy arrays? I think scipy.sparse matrices do, but are we really saying that a DataFrame that's truncated on output is like a sparse matrix? Originally, DFs showed the info view. But sometimes you want to peek at the data. So it grew a head and tail like data.frames. There's also shape. Is anyone really printing out the full __repr__ of a DataFrame just to find out about the rows and columns? This just seems like an addition for the sake of having an addition to me. No real benefit, potential negative side effects (more noise).

@cancan101
Contributor

@jseabold My opinion is that when no truncation of the dataframe occurs, printing the dimensions is in fact noise. When truncation does occur, I do believe that it makes sense to print the dimensions of the full frame.

@jseabold
Contributor Author

jseabold commented Mar 6, 2014

Yeah, I could live with that. I won't back up to argue whether there should even be this ... truncation vs just info, head, and tail :)

@cancan101
Contributor

@jreback What do you think about that?

@jreback
Contributor

jreback commented Mar 6, 2014

I think these discussions are personal preferences and have long been discussed
you can simply set options to get the prior behavior if u want

no one is ever going to be happy

so will wait for more consensus to potentially change this again

@jseabold
Contributor Author

jseabold commented Mar 6, 2014

Ok, then here's another gripe. Discussions on GitHub are not public enough IMO. There is a lot of GitHub noise from pandas. I try my best to keep up with development, but I don't see this until I install it. It seems to me, you're saying "I participated in the discussion, and it went on for a long time, so that's that."

This was discussed by 3 people on #4886 and #5550. @y-p made some strenuous initial objections (which are in line with my initial reaction) and then withdrew them. Particularly this bit

I strongly urge conducting a small usability study (have a few users adopt it for a week 
and report) before making potentially disruptive change like this to UX.

There are several other sensible comments to this effect.

It would've been great to see a ping on the mailing list about this. Something. Anything. Maybe I missed it and it's my fault. But, hey, let's think of this release as the usability study.

Coming back to the point, the issue of the footer was discussed only in passing on #5550. And here's what it says (my emphasis).

**When truncating**, having a footer with total row count would eliminate the need 
to use df.info in many cases and so reduce the impact of the change on existing users. 
(For example, after filtering a frame you're often interested in the size of the result).

@jreback
Contributor

jreback commented Mar 6, 2014

@jseabold all for more discussion

esp on UX

pls post an issue to the mailing list if u would then

@jreback jreback added this to the 0.14.0 milestone Mar 6, 2014
@dhirschfeld
Contributor

FWIW, I completely agree with @jseabold that GitHub discussions aren't really sufficient notification of any large or potentially breaking changes. A case in point: I knew nothing about these UX changes until I upgraded and am only here now because I saw Skipper's post to the mailing list.

I have no strong opinion about these changes, though Skipper's suggestion of only printing the shape when truncated sounds sensible to me.

One thing I've noticed is that the truncated view displays the first n rows and columns whereas I'd vastly prefer displaying the first and last n/2. Since I missed the discussion and the changes had already been made I didn't bother piping up.
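As an illustration of the "first and last n/2" preference, a minimal sketch using a hypothetical helper, `head_tail`, built from `head`, `tail`, and `pd.concat`. It is not how pandas implements its own truncated display.

```python
import pandas as pd


def head_tail(frame, n=10):
    """Return the first and last n // 2 rows of frame (hypothetical helper)."""
    if len(frame) <= n:
        # Nothing to cut: the whole frame fits in the preview.
        return frame
    half = n // 2
    return pd.concat([frame.head(half), frame.tail(half)])


df = pd.DataFrame({"x": range(100)})
preview = head_tail(df, n=6)
print(preview)
```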

@jreback
Contributor

jreback commented Mar 7, 2014

@dhirschfeld funny thing is I just referenced your comment on the ML...hahaha...

there is an issue to do exactly this type of 2-side truncation display, see here: #5603

@jreback
Contributor

jreback commented Apr 9, 2014

@jseabold @dhirschfeld what is the decision on this?

@jorisvandenbossche
Member

If I look at the discussion on the mailing list and here, I think there is a majority for the option "only show dimensions when truncated" (4 votes against 1 or two for "always show dimensions" on the mailing list, and also here two extra votes for "only show dimensions when truncated").

And I think I am also +1 on only showing the dimensions when the dataframe is truncated.

BTW, R does something vaguely similar. It also truncates the output of a very large dataframe (only at a much higher limit) and then you get the message [ reached getOption("max.print") -- omitted ..... rows ].

@takluyver You were the author of the display changes, what do you think?

@takluyver
Contributor

I quite like seeing the number of rows even when it's not truncated. When
you filter a dataframe, it gives you an easy view of how many things you've
got. The number of columns doesn't matter so much to me, but it's there for
consistency.

I'm not going to fight this if people want to change it, but I will note
that people complain more readily than they praise - people who like the
feature may only say so after it's gone.

@jreback
Contributor

jreback commented Apr 10, 2014

@takluyver would you mind repeating your comments to the mailing list issue?

@takluyver
Contributor

Can you point me to it? I'm not on the pandas mailing list.

@jreback
Contributor

jreback commented Apr 10, 2014

@jreback
Contributor

jreback commented Apr 21, 2014

@takluyver #5603 ?

@jreback
Contributor

jreback commented Apr 30, 2014

@jseabold @jorisvandenbossche

ok after reading the thread...seems consensus is to allow: display.show_dimensions

to be:

  • True show the dimensions always (currently the default)
  • False never show the dimensions
  • truncate show dimensions only if truncated

good?
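A small sketch of how the three proposed values would behave, assuming a pandas version in which `display.show_dimensions` accepts `"truncate"` alongside `True` and `False`. The frames and the `footer_shown` helper are made up for illustration, and `display.max_rows` is pinned to 10 so that the larger frame is definitely truncated.

```python
import pandas as pd

small = pd.DataFrame({"a": [1, 2], "b": [3, 4]})               # fits on screen
large = pd.DataFrame({"a": range(100), "b": range(100)})       # gets truncated


def footer_shown(frame, setting):
    """Report whether repr(frame) contains the dimensions footer."""
    with pd.option_context(
        "display.show_dimensions", setting,
        "display.max_rows", 10,
    ):
        return "rows x" in repr(frame)


# Map each setting to (footer on small frame, footer on large frame).
results = {
    setting: (footer_shown(small, setting), footer_shown(large, setting))
    for setting in (True, False, "truncate")
}
print(results)
```

Using `option_context` keeps the change scoped to the `with` block, so the session-wide default is left alone.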

@cancan101
Contributor

What will be the new default?

@jreback
Contributor

jreback commented Apr 30, 2014

I wouldn't change the default (of course that is the point of an option, users can change).

@jseabold points out that 0.13 made an API change where this did change, but that was a conscious decision.

@jorisvandenbossche
Member

Reaching consensus on allowing the behaviour to be changed via an option is not that difficult, I think :-)
But I don't really know if there was a clear consensus on what should be the default, and that is what the real discussion is about (eg in my comment above (#6547 (comment)) I state the opposite ...).

I also don't know if it was that conscious. OK, it was discussed and clearly chosen to show a truncated view, but not really when the dimensions should be shown (see eg comment of @jseabold above #6547 (comment)).

But of course, the options can already be implemented. Then changing the default if a clear consensus arises is trivial.

@jreback
Contributor

jreback commented Apr 30, 2014

are there any other options under consideration that I didn't point out?

@jorisvandenbossche
Member

No, I think those three options (always/never/truncated) are all relevant options.

6 participants