Provide a template-engine based way of rendering pandas data objects #3190

Closed
ghost opened this issue Mar 27, 2013 · 26 comments · Fixed by #10250
Labels
Enhancement Ideas Long-Term Enhancement Discussions IO HTML read_html, to_html, Styler.apply, Styler.applymap Output-Formatting __repr__ of pandas objects, to_string
@ghost

ghost commented Mar 27, 2013

to_string(), to_html(), to_latex(), formatters, float_format, etc.

related #459

Template engines [jinja2, mako] are the accepted way of converting data into documents,
and the web dev community has provided great libraries to do it.
We should piggyback.

This would be good:

  • Give users the power to scratch their own itch, avoiding the need for pandas devs
    to do it, or to ignore "fringe" use cases.
  • Provides users with a readable baseline, which is easy to modify and adapt to their needs.
  • Having a standard way to do it (by customizing a template), users can more easily
    have their output blend into python, and they can contribute / manage / source-control
    document templates in a clean, self-contained way.
  • Escape the "%d" not "{:d}" silliness and format strings in general, which is a relic of
    olden 2.5 support days. No more parameter hell necessary to make the existing methods
    more expressive. I never ever want to format html tables using function arguments.
  • Use a better way to express formatting logic via a DSL, rather than piecing together
    python strings.
  • Unify the handling of textual representations, not different code, only different template files.
  • Can refactor this out of core (but not really, see below)
  • Can provide features such as conditional formatting, styling.

However,

  • Can't really remove existing methods.
  • Adding a template engine dependency to pandas is clearly feature-creep (IMO).

So,

  • Make it a standalone project out of core (or optional dependency?)
  • Translate the existing styles into template form.
  • Provide hooks in pandas to allow users to optionally install the
    library and use it for rendering
  • Let users know it's available for their use.
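
To make the idea concrete, here is a minimal sketch of template-based rendering, assuming jinja2 is installed. The template text, the `render_html` helper, and the two-decimal cell format are illustrative assumptions, not an existing pandas API.

```python
import pandas as pd
from jinja2 import Template

# Hypothetical HTML table template; users could edit this text instead of
# passing formatting arguments to a to_html()-style method.
TABLE_TEMPLATE = Template("""\
<table>
  <thead>
    <tr>{% for col in columns %}<th>{{ col }}</th>{% endfor %}</tr>
  </thead>
  <tbody>
  {%- for row in rows %}
    <tr>{% for cell in row %}<td>{{ "{:.2f}".format(cell) }}</td>{% endfor %}</tr>
  {%- endfor %}
  </tbody>
</table>
""")


def render_html(df):
    """Build a plain context from the frame and hand it to the template."""
    return TABLE_TEMPLATE.render(
        columns=list(df.columns),
        rows=df.values.tolist(),
    )


df = pd.DataFrame({"A": [1.0, 2.5], "B": [3.25, 4.0]})
html = render_html(df)
print(html)
```

The formatting decision (two decimals here) lives in the template, so changing it means editing the template file rather than adding another keyword argument.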
@cpcloud
Member

cpcloud commented Sep 21, 2013

@y-p What kind of API are you thinking about here?

For 1) existing to_* methods something like:

df.to_html(template='/path/to/template')

and 2) New to_* methods, maybe something like:

df.to_template(template='/path/to/template')

@cpcloud
Member

cpcloud commented Sep 21, 2013

Or for 1) just keeping the current API (probably better because of back compat) and "freezing" the keyword parameters (and using them in the template).

@ghost
Author

ghost commented Dec 21, 2013

# Proposed API sketch: pandas.export_templates and df.render do not exist in pandas.
import pandas.export_templates as et
et.register_template("/tmp/html_with_shocking_pink_headings.tmpl", "HoTPInK!")
df.render("HoTPInK!", output_file="/tmp/output.html")  # otherwise stdout
df.render("html", output_file="/tmp/output.html")  # bundled template
df.render("latex", output_file="/tmp/output.html")  # bundled template
# conditional formatting
et.Stylist(df).cmap("midnight_bluez").zebra([1, 1, 2], axis=0).topk(rows=[("Foo", "Bar")], cols=["ColA"], n=10, axis=0, style="heatmap").render("html")
# or the more general select/apply
(et.Stylist(df).select(lambda r, c: r > c).apply("some generalized style DSL here: bg_color=grey20")
   .select(lambda r, c: r % 2 == 0).apply(font_bold=True).render("latex"))

and so on...

The style DSL and all the bells and whistles you could put into this make me fear
feature creep if it's placed within pandas. The library I developed and shelved sort
of ballooned into a monstrosity that wasn't very appealing to use.

The idea was to use pandas objects as fundamental data containers, let the user
use some selector language (predicate over position and value/textual mini-dsl/perhaps
even ripping off css3 constructs) to express selections and then provide style "tags" to those cells.
The rendering stage compiles that data structure into a context object for the template engine
in a standardized way, and then the templates themselves use that context to generate
the specific output text that realizes that presentation in a given output format.
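
A rough sketch of that pipeline, under my own assumptions: the rule format (a predicate over row label, column label, and value, paired with a tag), the shape of the context dict, and the toy text "template" are all hypothetical stand-ins for the real design.

```python
import pandas as pd


def build_context(df, rules):
    """Compile a frame plus selection rules into a template context.

    rules is a list of (predicate, tag) pairs; predicate(row_label,
    col_label, value) returns True for cells that should carry the tag.
    """
    cells = []
    for row_label, row in df.iterrows():
        cells.append([
            {
                "value": value,
                "tags": [tag for predicate, tag in rules
                         if predicate(row_label, col_label, value)],
            }
            for col_label, value in row.items()
        ])
    return {"columns": list(df.columns), "index": list(df.index), "cells": cells}


def render_text(context):
    """Toy output format: cells tagged "bold" are wrapped in asterisks."""
    lines = ["\t".join(str(c) for c in context["columns"])]
    for row in context["cells"]:
        lines.append("\t".join(
            f"*{cell['value']}*" if "bold" in cell["tags"] else str(cell["value"])
            for cell in row
        ))
    return "\n".join(lines)


df = pd.DataFrame({"A": [1, 5], "B": [7, 2]})
context = build_context(df, [(lambda r, c, v: v > 4, "bold")])
text = render_text(context)
print(text)
```

The selection step knows nothing about HTML or LaTeX; only `render_text` would change per output format.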

I was excited about the prospects but my initial attempt to implement did not end in a system
that's lovely to use. Leaky abstractions and ugly templates full of logic were the fundamental
result. The output tables were really pretty though (ignore the failed selector mini-language):

[Three images of the rendered output tables: chairs3, chairs2, chairs1]

If you drop conditional formatting and just provide boilerplate templates for html and latex, which
users can modify in an editor (indent this, double border that) it can be simpler, but may not
be appealing to users if learning a new templating language is required.

That doesn't cover the value formatters problem ("%3.6f", translating factor levels into labels, etc.).
There's lots to solve there, which is why I think it makes sense to do it as a project on top of pandas.
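
For reference, a small sketch of what the existing `formatters` argument already covers: it takes a dict of one-argument callables per column, so a `str.format` bound method or a label lookup both work. The column names and the label mapping below are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"price": [1.23456, 20.5], "grade": [0, 1]})

# Hypothetical mapping from factor codes to display labels.
labels = {0: "low", 1: "high"}

text = df.to_string(
    formatters={
        "price": "{:.2f}".format,  # new-style format string, no "%" needed
        "grade": labels.get,       # translate codes into labels
    },
    index=False,
)
print(text)
```

This works per column, but anything that depends on neighbouring cells or on position is out of reach for a per-value callable.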

I'm not going to try again for the foreseeable future.

@ghost
Author

ghost commented Dec 21, 2013

I'm working on this again now. typical.

@ghost
Author

ghost commented Dec 21, 2013

@olgabot, can you recommend a few alternatives for colormaps to bake into this?
Specifically, heatmaps by value (top10 values in column, for example).

@ghost ghost self-assigned this Dec 21, 2013
@ghost
Author

ghost commented Dec 21, 2013

Nailed it. claiming this for 0.14.

@ghost
Author

ghost commented Dec 21, 2013

If there are any watchers out there who dabble in design and want to sling some css to style
the way tables look by default for thousands of pandas users out there, raise your hand.

@jreback
Contributor

jreback commented Dec 21, 2013

you might want to consider using df.eval/query parsing machinery (instead of lambdas)
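
A quick illustration of what that could look like, assuming the selection is row-wise: `DataFrame.eval` turns an expression string into a boolean mask, and `DataFrame.query` returns the matching rows. How such a mask would be wired into a styling API is left open here.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 5, 3], "B": [4, 2, 3]})

# Expression string instead of a lambda; evaluates to a boolean Series.
mask = df.eval("A > B")

# Same expression used to pull out the selected rows directly.
selected = df.query("A > B")

print(mask.tolist())
print(selected)
```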

@ghost
Author

ghost commented Dec 21, 2013

Reduced scope considerably, beautifully simple now and trivial to implement.

@olgabot

olgabot commented Dec 22, 2013

@y-p I'm always a fan of the simple sequential colormap for counts, like YlGnBu: http://bl.ocks.org/mbostock/5577023 (top, second from left)

@ghost
Author

ghost commented Dec 22, 2013

Thanks... just in time :)

@edoson

edoson commented Feb 13, 2014

I see you are planning conditional formatting, which is quite nice, but the colors here are quite ugly.. :) What about changes to the default way tables are rendered? See here: http://coding.smashingmagazine.com/2008/08/13/top-10-css-table-designs/

@jreback
Contributor

jreback commented Feb 13, 2014

@edoson this issue is not active ATM as @y-p doesn't really have the time to do this.....you are welcome to pick it up :)

@edoson

edoson commented Feb 13, 2014

Actually I'm not interested in conditional formatting of a dataframe; I'm more interested in changing the default rendering to something more stylish. Do you think someone would like something like that?

@jreback
Contributor

jreback commented Feb 13, 2014

@jreback
Contributor

jreback commented Mar 14, 2014

@cpcloud @TomAugspurger @jorisvandenbossche @hayd

anyone able to try this for 0.14? small/experimental is ok

@jreback jreback modified the milestones: 0.15.0, 0.14.0 Mar 28, 2014
@TomAugspurger
Contributor

I'm looking into this. I'd be using a big chunk of what y-p put together in #5763.

For the API, what are people's thoughts on something like how d3 does it? (I don't know much d3, but this is my understanding. Hopefully someone knows more)

We'd have a method like df.format() that expects a dict of {CSS property: function} where the function expects a few arguments

  • val (the value of a cell)
  • loc (the index or row label (depends on axis=0 or axis=1))
  • col (the column or row Series (depends on axis=0 or axis=1))

This would let you do something like

def color_max(val, loc, col):
    if val == col.max():
        return "#FF0000"

df = pd.DataFrame({'A': [1, 2]})
df.format(style={'background-color': color_max})

For those cells where color_max evaluates to True, the formatting is applied. Otherwise it's the default.

There are a lot of drawbacks to this approach. It's not the friendliest to functions that want to use locations. Or functions that want to apply an operation to the entire dataframe.

It's hard to say without being able to use it. I'll try to get something working.

@TomAugspurger
Contributor

An alternative is what y-p had. You define a function that selects part of a table based on CSS attributes:

def tag_col(n, c="grey10", with_headings=False):
    # Select every cell in column n; restrict to data cells unless headings are wanted.
    selector = "td.col%d" % n
    if not with_headings:
        selector += ".data"
    return [dict(selector=selector,
                 props=[("background-color", c)])]

I'm still learning all this stuff, so at this point I'm not sure which approach is more flexible.

@jreback
Contributor

jreback commented Apr 1, 2014

I would use a class based approach, e.g. you provide the base class then let users override it

e.g.

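class Formatter(object):

    def __init__(self, **kwargs):
        self.kwargs = kwargs

    def set_object(self, obj):
        self.obj = obj

    def format_header(self):
        # Placeholder hook for subclasses; renders the column labels.
        return " ".join(str(c) for c in self.obj.columns)

    def format_body(self):
        # Placeholder hook for subclasses; falls back to to_string().
        return self.obj.to_string(header=False)

    def render(self):
        # Return the text so that __str__ gets a string back.
        return "\n".join([self.format_header(), self.format_body()])

    def __str__(self):
        return self.render()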

then

df.format(........)

and then it can be overridden with a custom class, or otherwise you just pass the kwargs to the Formatter.

@psychemedia
Contributor

Has anybody looked at using ipythonblocks for visualising dataframes?

I'm not much of a python coder but I had a stab at getting started using ipython blocks to help visualise merge and reshape operations using pandas dataframes for a data/databases course I'm working on: http://nbviewer.ipython.org/gist/psychemedia/9795643

Related issue on ipythonblocks tracker: jiffyclub/ipythonblocks#29 discussing possible implementation issues

@jreback
Contributor

jreback commented Apr 23, 2014

nice idea.

however, I think this might be better done in ipythonblocks itself, simply patching methods onto the frames at import time.

e.g.

import pandas
import ipythonblocks
def to_iblocks(.....):
    .....

pandas.DataFrame.to_iblocks = to_iblocks

normally not in favor of doing this (e.g. messing with an imported package namespace),

but this seems that otherwise you would have to add keywords all over the place which makes the pandas API a bit odd.

alternatively you can create a class

class DataFrameIBlocks(object):

    def __init__(self, df):
        self.df = df

then provide methods to draw the blocks on that, and forward methods to the contained df

both a bit non-trivial though
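
A minimal sketch of the wrapper variant, with made-up names: `DataFrameBlocks` and `to_blocks` are hypothetical, and the text grid stands in for whatever ipythonblocks would actually draw. Forwarding is done with `__getattr__`, which Python calls only when normal attribute lookup fails.

```python
import pandas as pd


class DataFrameBlocks:
    """Hypothetical wrapper: adds a drawing method, forwards the rest."""

    def __init__(self, df):
        self._df = df

    def __getattr__(self, name):
        # Reached only for attributes not defined on the wrapper itself.
        return getattr(self._df, name)

    def to_blocks(self):
        # Stand-in for real drawing: "#" for a value, "." for a missing one.
        return "\n".join(
            "".join("." if missing else "#" for missing in row)
            for row in self._df.isna().values
        )


df = pd.DataFrame({"A": [1.0, None], "B": [2.0, 3.0]})
wrapped = DataFrameBlocks(df)

print(wrapped.shape)        # forwarded to the contained DataFrame
print(wrapped.to_blocks())
```

One caveat of this design: forwarded methods return plain DataFrames, so the result of something like `wrapped.dropna()` would need re-wrapping before drawing again.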

@psychemedia
Contributor

@jreback Thanks.. I'll try to explore the idea a little more in the context I'm currently working in (putting together IPython notebooks for a distance learning context) to see what seems tractable/useful and what major cases fall out.

@kynan
Contributor

kynan commented Nov 12, 2015

Looks like this hasn't progressed in more than a year now. Is the unanimous view still to only accept a formatter architecture that fits all use cases? Or would a step change be acceptable?

I think it would already be a huge improvement if the to_<format> methods would allow overriding their formatter as an optional argument. I mean not the column formatters (they can stay as is), but e.g. to_html accepts a formatter which is then used instead of the default HTMLFormatter so users can subclass HTMLFormatter and pass it in.
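
Roughly what I have in mind, as a sketch with made-up names: `TextFormatter`, `UpperHeaderFormatter`, and `to_text` are hypothetical and only mirror the shape of the idea; they are not the internal HTMLFormatter or any existing pandas method.

```python
import pandas as pd


class TextFormatter:
    """Hypothetical default formatter with overridable pieces."""

    def __init__(self, df):
        self.df = df

    def format_header(self):
        return " | ".join(str(c) for c in self.df.columns)

    def format_row(self, row):
        return " | ".join(str(v) for v in row)

    def render(self):
        lines = [self.format_header()]
        lines.extend(self.format_row(row)
                     for row in self.df.itertuples(index=False))
        return "\n".join(lines)


class UpperHeaderFormatter(TextFormatter):
    """User subclass that changes only the header."""

    def format_header(self):
        return super().format_header().upper()


def to_text(df, formatter=TextFormatter):
    """Stand-in for a to_<format> method that accepts a formatter class."""
    return formatter(df).render()


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
default_text = to_text(df)
custom_text = to_text(df, formatter=UpperHeaderFormatter)
print(custom_text)
```

The existing keyword arguments would keep working with the default class, so this would not break current callers.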

@jreback
Contributor

jreback commented Nov 12, 2015

@kynan this will be closed by #10250 shortly

@jankatins
Contributor

@jreback I'm not so sure if this issue has the same things in mind as the new style one. While the styling issue is in #10520, a proper template based rendering infrastructure for more than HTML would be really nice (e.g. to implement a to_markdown() method #11052).

@kynan
Contributor

kynan commented Nov 12, 2015

Great to hear this is nearing completion! Might be worth adding a reference to #10250 in the opening post for those digging through issues around output formatting (I don't have the required permissions).

@jreback jreback modified the milestones: 0.17.1, Next Major Release Nov 14, 2015
8 participants