Custom column formatters for HTML in IPython, e.g. can show np 2D-array as image #9579

Closed
wants to merge 8 commits into from


@d1manson d1manson commented Mar 3, 2015

I've created a gist to explain in more detail, and with a screenshot.

@d1manson d1manson changed the title from "Custom column formatters for HTML in IPython, e.g. can show array as image" to "Custom column formatters for HTML in IPython, e.g. can show np 2D-array as image" Mar 3, 2015
@jreback
Contributor

jreback commented Mar 4, 2015

this would need to be implemented like:

DataFrame.to_html(......, formatters=a_callable_or_dict_of_callables)

so if a formatter exists for a particular column, or generally, then it would get called to generate the HTML.

this would be then fairly generic.
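A minimal sketch of what such a call could look like, assuming a pandas version whose `to_html` accepts `formatters` and `escape`. The frame contents and the `bold` helper are made up for illustration.

```python
import pandas as pd

# Illustrative data only; column names are assumptions.
df = pd.DataFrame({"country": ["br", "fr"], "population_m": [203, 68]})


def bold(value):
    """Hypothetical per-column formatter: wrap the cell text in <b> tags."""
    return f"<b>{value}</b>"


# The dict maps column name -> one-argument callable.
# escape=False turns off HTML escaping for the whole table, so the
# markup returned by the formatter is emitted as-is.
html_table = df.to_html(formatters={"country": bold}, escape=False)
print(html_table)
```

Columns without an entry in the dict keep the default formatting.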

@d1manson
Author

d1manson commented Mar 4, 2015

Yes, that makes sense, but you also need some way to attach the callable_or_dict_of_callables to the DataFrame "permanently" so that DataFrame._repr_html_ can pass it to DataFrame.to_html when it is called without any arguments. Note that these custom formatters will probably be specific to the output type (i.e. HTML in this case).
Is there a recommended way to do this such that it will automatically be copied across to views/copies of the DataFrame?

@jreback jreback added Output-Formatting __repr__ of pandas objects, to_string IO HTML read_html, to_html, Styler.apply, Styler.applymap labels Mar 4, 2015
@jreback
Contributor

jreback commented Mar 4, 2015

you cannot copy meta-data like this. It is in theory possible, but this introduces all kinds of issues. Much better to simply pass it to .to_html() no?

@jreback
Contributor

jreback commented Mar 4, 2015

this is related to #6488 and #3195

@d1manson
Author

d1manson commented Mar 4, 2015

If it's a lot more complicated to do it as meta-data then I'll settle for what you are suggesting.

In fact that would probably be ok for my use case, as I'm using a DataFrame as the backend of a custom DataSet class, i.e. I can explicitly call .to_html from inside my DataSet._repr_html_, and pass it whatever metadata I need to.

Since you seem to have a pretty good idea of what you'd like to see here, and it doesn't seem that far off what is already suggested, can I ask you to finish it off?

p.s. This is my first ever pull request to an open-source project, so I don't know what the etiquette is.

@jreback
Contributor

jreback commented Mar 4, 2015

@d1manson well, would love to have you contribute. why don't you change around a bit to use it in calling .to_html() and see how it goes...

@d1manson
Author

d1manson commented Mar 5, 2015

is that any better?

@jreback
Contributor

jreback commented Mar 5, 2015

looks closer. What would help this is a few tests for this kind of behavior. E.g. you assert that the generated html is correct. see test_formats.py (you don't have to exactly put the generated html, just that certain elements that you are testing are in there).
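A rough sketch of the kind of test meant here, checking only that the elements under test appear in the output rather than comparing the full HTML. The test name and data are hypothetical, and it assumes `to_html` accepts `formatters` and `escape`.

```python
import pandas as pd


def test_to_html_uses_column_formatter():
    # Small frame built programmatically; values are illustrative.
    df = pd.DataFrame({"name": ["a", "b"], "score": [1, 2]})

    html = df.to_html(
        formatters={"score": lambda v: f"<i>{v}</i>"},
        escape=False,  # keep the formatter's markup unescaped
    )

    # Only assert on the fragments this test cares about.
    assert "<i>1</i>" in html
    assert "<i>2</i>" in html
    # The column without a formatter is rendered as plain text.
    assert "<td>a</td>" in html
```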

@lewisacidic
Contributor

Any progress on this? I think this is a VERY useful feature, that will make IPython notebook a lot more powerful as a data presentation tool - I see a LOT of use cases:

  • row identification
    • e.g. if you have a data frame with rows corresponding to countries, could repr a flag column
  • data summarization
    • e.g. add a spark chart to a row easily
  • repr objects in dataframe
    • e.g. if you have a column of chemical objects, repr the structure

In my opinion, this should be quite a priority for pandas.

@jreback
Contributor

jreback commented Apr 17, 2015

@richlewis42 would love to have another contributor to this. There are several linked issues. This requires a bit of a comprehensive, though not too difficult feature set.

@lewisacidic
Contributor

I would love to contribute, although I am very new to open source, and to the internals of pandas. If there is anything specific that I could do to push this forward, please let me know.

@jreback
Contributor

jreback commented Apr 17, 2015

here's what would help

a code example which shows an api of what a user would do

eg construct a sample frame programmatically
then df.to_html(.....) but with all of the details

this is not a difficult issue to fix but needs a reasonably rich api spec

@lewisacidic
Contributor

I will give this a try when I get back from work, hopefully over the weekend. Thanks for the ideas.

@jreback
Contributor

jreback commented May 9, 2015

if you guys want to update for 0.17.0 that would be gr8!

@jreback jreback added this to the 0.17.0 milestone May 9, 2015
@lewisacidic
Contributor

Hi, sorry I've been off the case for ages, I've been really busy at work. I thought about what could be useful, and it seems to me that the simplest thing would be that if an object in a DataFrame offers an HTML representation (i.e. implements _repr_html_), then that should be used by default (instead of the output from prettyprint, which I think is used at the moment). This may be a slightly different use case to the original pull request, however. I put an example in a gist. I couldn't make the change in pandas.core.format easily, as the representation is generated and then escaped by default. There would need to be some handling to make sure HTML provided by the object isn't escaped, but strings generated from the object are. Any opinions on this?

@shoyer
Member

shoyer commented Jun 3, 2015

@richlewis42 that is a really nice idea, actually. For example, it would work well with geopandas, which holds shapely objects inside a column. These objects already have an _repr_html_ that shows a picture of the shape, so the dataframe HTML view could show those shapes as well.

@d1manson
Author

d1manson commented Jun 3, 2015

I would vote for both: if a custom formatter is provided, then use that, otherwise if a _repr_html_ exists, use that, otherwise use repr or str or whatever.
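A minimal sketch of that precedence order, written as a standalone helper rather than as pandas code. `render_cell` and `Shape` are hypothetical names used only for illustration.

```python
import html


class Shape:
    """Hypothetical object that provides its own HTML representation."""

    def __init__(self, name):
        self.name = name

    def _repr_html_(self):
        return f"<svg><title>{self.name}</title></svg>"


def render_cell(value, formatter=None):
    """Return the HTML for one cell, trying the three options in order."""
    # 1. An explicit custom formatter wins.
    if formatter is not None:
        return formatter(value)
    # 2. Otherwise use the object's own _repr_html_, left unescaped.
    repr_html = getattr(value, "_repr_html_", None)
    if callable(repr_html):
        return repr_html()
    # 3. Otherwise fall back to the plain string, escaped for HTML.
    return html.escape(str(value))


print(render_cell(Shape("circle")))
print(render_cell("a < b"))
print(render_cell(3, formatter=lambda v: f"<b>{v}</b>"))
```

Escaping only in the last branch also sidesteps the problem of escaping being applied after everything has already been converted to a string.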

The advantage of the custom formatter is that it gives you a lot of control:

  • if the elements are images, then you could resize them, or add a caption giving their dimensions etc.
  • if the elements can be treated as keys into a map, e.g. the string "br" could be treated as a key into a map of flag images, you could then render the flag with the caption "br"...I think this is what @richlewis42 meant in the first comment.
  • if the elements are large data structures (e.g. numpy arrays, trees, dictionaries etc.) there may not be a single obvious _repr_html_ for the given class, but there may be a sensible way of showing the specific data in your column.
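To make the flag case above concrete, here is a hedged sketch of such a formatter. The `FLAG_URLS` table, the file paths and the `flag_formatter` name are assumptions for illustration, not part of this PR.

```python
import html

# Hypothetical lookup table from country code to a flag image URL.
FLAG_URLS = {"br": "flags/br.png", "fr": "flags/fr.png"}


def flag_formatter(code):
    """Render a country code as a flag image with the code as caption."""
    safe_code = html.escape(str(code))
    url = FLAG_URLS.get(code)
    if url is None:
        # Unknown key: fall back to the escaped text.
        return safe_code
    return (
        f'<figure><img src="{html.escape(url)}" alt="{safe_code}">'
        f"<figcaption>{safe_code}</figcaption></figure>"
    )


print(flag_formatter("br"))
```

A function like this could then be supplied per column, e.g. `formatters={"country": flag_formatter}`. Depending on the pandas version, the `display.max_colwidth` option may truncate long cell strings, so that is worth checking for markup of this length.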

Of course one approach would be to wrap each of the objects in the column in a custom object which handles the _repr_html_ call, but that would be a pretty ugly solution... the custom formatters suggestion is not really much more complicated than finding and using the _repr_html_.

@shoyer
Member

shoyer commented Jun 3, 2015

I agree, these are not necessarily mutually exclusive. Recursive support for _repr_html_ should go in another PR, in any case.

@lewisacidic
Contributor

Agreed, I still like the custom formatters, and think they would be very helpful.

Could the recursive _repr_html_ be implemented by using a formatter by default for all columns, that would call each object's _repr_html_, and fall back to an escaped pretty printed str (thus killing two birds with one stone)?

Or shall I open another pull request for recursive _repr_html_? I had a look at the code, but couldn't see an easy way to modify it without rewriting quite a bit of it, due to the escaping being applied after the object is converted to string representation.

@shoyer
Member

shoyer commented Jun 10, 2015

This PR needs a bit of extra work to go in -- notably it needs some tests and slightly better docs. I'll add comments inline in the code.

@richlewis42 I haven't taken a look at the code carefully yet, so up to you on the best order to add this. This PR is also probably going to make it in shortly and might help a little bit (see the notebook argument): #10232

Edit: to clarify, I do think it's probably a good idea to add _repr_html_ later, unless @d1manson wants to add it to this PR. The main work here will probably be documentation, not the actual code changes.

names will be given defaults or ignored respectively. If list/tuple
the length should match the columns exactly.
Each callable can have an optional boolean `escape` attribute,
and an optional string `justify` attribute. See `_make_fixed_width`
Member


Could you please give a brief description instead of referring to the source code? Also -- maybe add an example to the HTML export docs? http://pandas-docs.github.io/pandas-docs-travis/io.html#io-html
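As a rough illustration of what the docstring under review seems to describe, the sketch below attaches `escape` and `justify` as plain function attributes and reads them back with defaults. This reflects the proposal in this PR only; the attribute semantics, the defaults and the `formatter_options` helper are assumptions, not an existing pandas API.

```python
def image_formatter(value):
    """Hypothetical formatter returning an <img> tag for a cell value."""
    return f'<img src="{value}">'


# Optional attributes as proposed in the docstring (assumed meaning):
# escape  -> whether the returned string should be HTML-escaped
# justify -> how the cell content should be aligned
image_formatter.escape = False
image_formatter.justify = "center"


def formatter_options(formatter, default_escape=True, default_justify=None):
    """Read the optional attributes, falling back to defaults if absent."""
    escape = getattr(formatter, "escape", default_escape)
    justify = getattr(formatter, "justify", default_justify)
    return escape, justify


print(formatter_options(image_formatter))
print(formatter_options(repr))
```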

@shoyer
Member

shoyer commented Jun 10, 2015

Mostly this just needs tests to verify the stated functionality.

@jreback jreback added this to the Next Major Release milestone Aug 15, 2015
@jreback jreback modified the milestones: 0.17.0, Next Major Release Aug 15, 2015
@jreback
Contributor

jreback commented Oct 13, 2015

anyone want to update (or present a new PR)?

@d1manson @richlewis42
@jd

@jreback
Contributor

jreback commented Nov 10, 2015

I think this could be done as part of #10250 with just a formatter method(s) (as you have several functionalities added), so closing, but welcome extensions as part of that PR (or after)

Labels
IO HTML read_html, to_html, Styler.apply, Styler.applymap Output-Formatting __repr__ of pandas objects, to_string

4 participants