Provide a template-engine based way of rendering pandas data objects #3190

ghost · 2013-03-27T15:44:02Z

to_string(), to_html(), to_latex(), formatters, float_format, etc.

related #459

The existing methods are not as flexible as users sometimes wish; see:

- Possible interactions with taldcroft/asciitable? #167
- Float format syntax #2502
- ENH: add escape parameter to to_html() #2919 (comment)
- Multilevel indexing with .to_latex() method merges two index columns #2942
- Multilevel indexing with .to_latex() method merges two index columns #2924
- HTMLFormatter.to_html() does not look at show_index_names #3195
- Update DataFrame.to_latex() to have more flexibility #3196
- ENH: update DataFrame to_latex for nicer typesetting #3264
- BUG: adjust to_latex column format when no index #3467
- Update DataFrame.to_latex() to have more flexibility #3197
- ENH: Remove the hardcoded border=1 in the to_html dataframe export. #4578
- API: Would it be useful for everyone if DataFrame.to_html (and to_string) had an argument to specify justify settings for each column? #4315
- ml1.
- BUG: formatters argument to DataFrame.to_latex() is broken #6052

Pandas' core competence is its data structures and data-ops DSL.
The output formats were added for convenience, but they don't see much
ongoing work (formatting tables isn't sexy).
Users roll their own using custom code; there's no uniform way to do it
that naturally extends pandas.

Template engines [jinja2, mako] are the accepted way of converting data into documents,
and the web dev community has provided great libraries to do it.
We should piggyback.

This would be good:

- Give users the power to scratch their own itch, avoiding the need for pandas devs
  to do it or to ignore "fringe" use cases.
- Provide users with a readable baseline, which is easy to modify and adapt to their needs.
- With a standard way to do it (by customizing a template), users can more easily
  have their output blend into python, and they can contribute / manage / source-control
  document templates in a clean, self-contained way.
- Escape the "%d" not "{:d}" silliness and format strings in general, which are a relic of
  the olden Python 2.5 support days. No more parameter hell necessary to make the existing methods
  more expressive. I never ever want to format HTML tables using function arguments.
- Use a better way to express formatting logic via a DSL, rather than piecing together
  Python strings.
- Unify the handling of textual representations: not different code, only different template files.
- Can refactor this out of core (but not really, see below).
- Can provide features such as conditional formatting and styling.
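
A minimal sketch of the "only different template files" idea, assuming Jinja2 as the engine (it must be installed): the DataFrame is turned into a plain context dict once, and each output format is just another template over that same context. The `frame_context` helper and both template strings are hypothetical illustrations, not part of pandas.

```python
import pandas as pd
from jinja2 import Template


def frame_context(df):
    """Engine-neutral context: column labels plus (index label, cell strings) rows."""
    rows = []
    for idx, row in zip(df.index, df.itertuples(index=False)):
        rows.append((str(idx), [str(v) for v in row]))
    return {"columns": [str(c) for c in df.columns], "rows": rows}


# Two illustrative templates that consume the very same context.
HTML = Template(
    "<table>\n"
    "<tr><th></th>{% for c in columns %}<th>{{ c }}</th>{% endfor %}</tr>\n"
    "{% for idx, vals in rows %}<tr><th>{{ idx }}</th>"
    "{% for v in vals %}<td>{{ v }}</td>{% endfor %}</tr>\n{% endfor %}"
    "</table>"
)
MARKDOWN = Template(
    "| |{% for c in columns %} {{ c }} |{% endfor %}\n"
    "|---|{% for c in columns %}---|{% endfor %}\n"
    "{% for idx, vals in rows %}| {{ idx }} |"
    "{% for v in vals %} {{ v }} |{% endfor %}\n{% endfor %}"
)

df = pd.DataFrame({"A": [1, 2], "B": [0.5, 1.5]}, index=["x", "y"])
ctx = frame_context(df)
html_out = HTML.render(**ctx)
md_out = MARKDOWN.render(**ctx)
print(html_out)
print(md_out)
```

Adding a LaTeX or plain-text output would mean adding one more template; the Python side stays unchanged.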

However,

- Can't really remove existing methods.
- Adding a template engine dependency to pandas is clearly feature-creep (IMO).

So,

- Make it a standalone project out of core (or an optional dependency?).
- Translate the existing styles into template form.
- Provide hooks in pandas to allow users to optionally install the
  library and use it for rendering.
- Let users know it's available for their use.


cpcloud · 2013-09-21T18:01:30Z

@y-p What kind of API are you thinking about here?

For 1) existing to_* methods something like:

df.to_html(template='/path/to/template')

and 2) New to_* methods, maybe something like:

df.to_template(template='/path/to/template')

cpcloud · 2013-09-21T18:04:38Z

Or for 1) just keeping the current API (probably better because of back compat) and "freezing" the keyword parameters (and using them in the template).

ghost · 2013-12-21T19:57:57Z

import pandas.export_templates as et  # hypothetical module proposed here, not part of pandas

et.register_template("/tmp/html_with_shocking_pink_headings.tmpl", "HoTPInK!")
df.render("HoTPInK!", output_file="/tmp/output.html")  # otherwise stdout
df.render("html", output_file="/tmp/output.html")  # bundled template
df.render("latex", output_file="/tmp/output.tex")  # bundled template
# conditional formatting
(et.Stylist(df).cmap("midnight_bluez").zebra([1, 1, 2], axis=0)
    .topk(rows=[("Foo", "Bar")], cols=["ColA"], n=10, axis=0, style="heatmap")
    .render("html"))
# or the more general select/apply
(et.Stylist(df).select(lambda r, c: r > c).apply("some generalized style DSL here: bg_color=grey20")
    .select(lambda r, c: r % 2 == 0).apply(font_bold=True).render("latex"))

and so on...

The style DSL and all the bells and whistles you could put into this make me fear
feature creep if it's placed within pandas. The library I developed and shelved sort
of ballooned into a monstrosity that wasn't very appealing to use.

The idea was to use pandas objects as fundamental data containers, let the user
use some selector language (predicate over position and value/textual mini-DSL/perhaps
even ripping off CSS3 constructs) to express selections, and then provide style "tags" to those cells.
The rendering stage compiles that data structure into a context object for the template engine
in a standardized way, and then the templates themselves use that context to generate
the specific output text that realizes that presentation in a given output format.
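
The pipeline described above (predicate selections, style "tags" per cell, then a standardized context for the template) could be sketched as follows. The predicate signature `(row_pos, col_pos, value)` and the `build_context` helper are assumptions for illustration, not an existing API.

```python
import pandas as pd


def build_context(df, rules):
    """Compile (predicate, tag) rules into a template-ready context.

    Each predicate is called as predicate(row_pos, col_pos, value); every rule
    whose predicate is true adds its tag to that cell's list of style tags.
    """
    rows = []
    for i, (label, row) in enumerate(df.iterrows()):
        cells = []
        for j, value in enumerate(row):
            tags = [tag for predicate, tag in rules if predicate(i, j, value)]
            cells.append({"value": value, "tags": tags})
        rows.append({"label": label, "cells": cells})
    return {"columns": list(df.columns), "rows": rows}


df = pd.DataFrame({"A": [1, 5], "B": [3, 2]})
rules = [
    (lambda r, c, v: r % 2 == 0, "zebra"),  # position-based: every other row
    (lambda r, c, v: v > 2, "highlight"),   # value-based selection
]
ctx = build_context(df, rules)
```

A format-specific template would then translate each cell's `tags` into its own markup (an HTML `class` attribute, LaTeX commands, and so on), which keeps the selection logic out of the templates.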

I was excited about the prospects, but my initial attempt at an implementation did not end in a system
that's lovely to use. Leaky abstractions and ugly templates full of logic were the fundamental
result. The output tables were really pretty though (ignore the failed selector mini-language):

If you drop conditional formatting and just provide boilerplate templates for HTML and LaTeX, which
users can modify in an editor (indent this, double border that), it can be simpler, but it may not
be appealing to users if learning a new templating language is required.

That doesn't cover the value formatters problem ("%3.6f", translating factor levels into labels, etc.).
There's lots to solve there, which is why I think it makes sense to do it as a project on top of pandas.
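
One way to keep value formatting out of the templates is a pre-rendering step that turns each value into its display string, column by column, before the context is built. A sketch assuming a dict of column name to one-argument callable; `format_values` and the example columns are hypothetical.

```python
import pandas as pd


def format_values(df, formatters):
    """Return a copy of df in which the listed columns are mapped to display strings."""
    out = df.copy()
    for col, func in formatters.items():
        out[col] = df[col].map(func)
    return out


df = pd.DataFrame({"x": [3.14159265, 2.5], "level": [0, 1]})
labels = {0: "low", 1: "high"}  # factor level -> label
shown = format_values(df, {
    "x": lambda v: "%3.6f" % v,  # printf-style, as in "%3.6f"
    "level": labels.get,         # translate factor levels into labels
})
```

pandas' existing `to_string`/`to_html` already accept a `formatters` argument of per-column functions; the point of the sketch is only that formatting happens before, and independently of, whichever template renders the result.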

I'm not going to try again for the foreseeable future.

ghost · 2013-12-21T21:42:19Z

I'm working on this again now. typical.

ghost · 2013-12-21T22:05:05Z

@olgabot, can you recommend a few alternatives for colormaps to bake into this?
Specifically, heatmaps by value (top10 values in column, for example).
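
Picking out the "top 10 values in a column" that such a heatmap would colour needs nothing beyond plain pandas. A sketch using `Series.nlargest`, with a hypothetical `topk_mask` helper that returns a boolean frame of cells to style (it assumes a unique index).

```python
import pandas as pd


def topk_mask(df, column, n=10):
    """Boolean DataFrame that is True only for the n largest values of `column`."""
    mask = pd.DataFrame(False, index=df.index, columns=df.columns)
    mask.loc[df[column].nlargest(n).index, column] = True
    return mask


df = pd.DataFrame({"ColA": [4, 9, 1, 7], "ColB": [0, 0, 0, 0]})
mask = topk_mask(df, "ColA", n=2)
```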

ghost · 2013-12-21T22:54:43Z

Nailed it. claiming this for 0.14.

ghost · 2013-12-21T23:00:04Z

If there are any watchers out there who dabble in design and want to sling some css to style
the way tables look by default for thousands of pandas users out there, raise your hand.

jreback · 2013-12-21T23:09:59Z

you might want to consider using df.eval/query parsing machinery (instead of lambdas)
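
For example, a selection written as a string can be evaluated with `DataFrame.eval` (or `DataFrame.query` for row filtering) instead of a lambda. A sketch of how a styling layer might turn such an expression into a row selection; the `rows_matching` helper is hypothetical.

```python
import pandas as pd


def rows_matching(df, expr):
    """Row labels for which the boolean expression `expr` holds, via DataFrame.eval."""
    mask = df.eval(expr)
    return df.index[mask.to_numpy()].tolist()


df = pd.DataFrame({"A": [1, 5, 3], "B": [2, 4, 3]})
selected = rows_matching(df, "A > B")
# The same rows selected through DataFrame.query
same = df.query("A > B").index.tolist()
```

Expressions as strings can also be stored alongside a template, which lambdas cannot.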

ghost · 2013-12-21T23:58:05Z

Reduced scope considerably, beautifully simple now and trivial to implement.

olgabot · 2013-12-22T00:16:43Z

@y-p I'm always a fan of the simple sequential colormap for counts, like YlGnBu: http://bl.ocks.org/mbostock/5577023 (top, second from left)
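
A sequential colormap for counts boils down to normalising each value into [0, 1] and interpolating between colours. A dependency-free sketch using linear interpolation between two endpoints; the RGB values are meant as the light and dark ends of YlGnBu but should be treated as illustrative, and `value_to_hex` is a hypothetical helper (a real palette would interpolate through all of its intermediate stops).

```python
import pandas as pd

# Illustrative endpoints: light yellow and dark blue, roughly the ends of YlGnBu.
LOW, HIGH = (255, 255, 217), (8, 29, 88)


def value_to_hex(value, vmin, vmax, low=LOW, high=HIGH):
    """Interpolate linearly between two RGB colours by value's position in [vmin, vmax]."""
    t = 0.0 if vmax == vmin else float((value - vmin) / (vmax - vmin))
    rgb = [int(round(a + (b - a) * t)) for a, b in zip(low, high)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


counts = pd.Series([0, 5, 10])
colors = counts.map(lambda v: value_to_hex(v, counts.min(), counts.max()))
```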

ghost · 2013-12-22T00:19:10Z

Thanks... just in time :)

edoson · 2014-02-13T19:49:06Z

I see you are planning conditional formatting; that is quite nice, but the colors here are quite ugly :) What about changes to the default way tables are rendered? See here: http://coding.smashingmagazine.com/2008/08/13/top-10-css-table-designs/

jreback · 2014-02-13T19:55:32Z

@edoson this issue is not active ATM as @y-p doesn't really have the time to do this.....you are welcome to pick it up :)

edoson · 2014-02-13T19:58:47Z

Actually I'm not interested in conditional formatting of a dataframe; I'm more interested in changing the default rendering to something more stylish. Do you think someone would like something like that?

jreback · 2014-02-13T20:05:38Z

sure....see also here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_html.html?highlight=to_html#pandas.DataFrame.to_html

jreback · 2014-03-14T12:35:10Z

@cpcloud @TomAugspurger @jorisvandenbossche @hayd

anyone able to try this for 0.14? small/experimental is ok

TomAugspurger · 2014-04-01T13:45:17Z

I'm looking into this. I'd be using a big chunk of what y-p put together in #5763.

For the API, what are people's thoughts on something like how d3 does it? (I don't know much d3, but this is my understanding. Hopefully someone knows more)

We'd have a method like df.format() that expects a dict of {css class: function}, where each function expects a few arguments:

val (the value of a cell)
loc (the index or row label (depends on axis=0 or axis=1))
col (the column or row Series (depends on axis=0 or axis=1))

This would let you do something like

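import pandas as pd

def color_max(val, loc, col):
    # a color for the column maximum, None (keep the default) otherwise
    if val == col.max():
        return "#FF0000"
    return None

df = pd.DataFrame({'A': [1, 2]})
df.format(style={'background-color': color_max})  # proposed method, not an existing pandas API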

For those cells where color_max returns a color, the formatting is applied. Otherwise it's the default.
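
To make the dispatch concrete without the proposed `df.format` method, here is a sketch of what it implies, assuming column-wise application (axis=0): each function is called as `func(val, loc, col)`, and any non-`None` result becomes a CSS declaration for that cell. `css_for` is a hypothetical name.

```python
import pandas as pd


def css_for(df, style):
    """Per-cell CSS strings from a {css property: func(val, loc, col)} dict (axis=0)."""
    out = pd.DataFrame("", index=df.index, columns=df.columns)
    for name, col in df.items():
        for loc, val in col.items():
            decls = []
            for prop, func in style.items():
                result = func(val, loc, col)
                if result is not None:
                    decls.append(f"{prop}: {result}")
            out.loc[loc, name] = "; ".join(decls)
    return out


def color_max(val, loc, col):
    if val == col.max():
        return "#FF0000"
    return None


df = pd.DataFrame({"A": [1, 2]})
css = css_for(df, {"background-color": color_max})
```

pandas' later `Styler.apply` follows a similar contract, where the function returns one CSS string per cell, though it passes whole columns or rows rather than individual values.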

There are a lot of drawbacks to this approach. It's not the friendliest to functions that want to use locations, or to functions that want to apply an operation to the entire dataframe.

It's hard to say without being able to use it. I'll try to get something working.

TomAugspurger · 2014-04-01T13:50:00Z

An alternative is what y-p had. You define a function that selects part of a table based on CSS attributes:

def tag_col(n,c="grey10", with_headings=False):
    selector="td.col%d" % n
    if not with_headings:
        selector+=".data"
    return [dict(selector=selector,
                props=[("background-color",c)])]

I'm still learning all this stuff, so at this point I'm not sure which approach is more flexible.
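
In the selector approach, the style entries are plain `{selector, props}` dicts, so rendering them is just string assembly into CSS rules for a `<style>` block. A sketch; `to_css` is hypothetical, and `tag_col` is repeated (lightly tidied) so the block runs on its own.

```python
def tag_col(n, c="grey10", with_headings=False):
    selector = "td.col%d" % n
    if not with_headings:
        selector += ".data"
    return [dict(selector=selector, props=[("background-color", c)])]


def to_css(styles):
    """Render a list of {selector, props} dicts as one CSS rule per line."""
    rules = []
    for style in styles:
        body = " ".join(f"{prop}: {value};" for prop, value in style["props"])
        rules.append(f"{style['selector']} {{ {body} }}")
    return "\n".join(rules)


css = to_css(tag_col(0, "lightgrey") + tag_col(2, "pink", with_headings=True))
```

pandas' `Styler.set_table_styles` later took a list of dicts with the same `selector`/`props` keys.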

jreback · 2014-04-01T13:50:57Z

I would use a class based approach, e.g. you provide the base class then let users override it

e.g.

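class Formatter(object):

    def __init__(self, **kwargs):
        self.kwargs = kwargs
        self.obj = None

    def set_object(self, obj):
        self.obj = obj

    def format_header(self):
        # override in subclasses
        return ""

    def format_body(self):
        # override in subclasses
        return ""

    def render(self):
        return self.format_header() + self.format_body()

    def __str__(self):
        return self.render()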

then

df.format(........)

which can then be overridden with a custom class, or you can just pass the kwargs to the Formatter.
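
A sketch of the override path, assuming `render` returns header plus body and that `format_header`/`format_body` are the overridable hooks: a hypothetical `MarkdownFormatter` subclass only implements those two methods. The base class is repeated so the block runs standalone.

```python
import pandas as pd


class Formatter(object):
    def __init__(self, **kwargs):
        self.kwargs = kwargs
        self.obj = None

    def set_object(self, obj):
        self.obj = obj

    def format_header(self):
        return ""

    def format_body(self):
        return ""

    def render(self):
        return self.format_header() + self.format_body()

    def __str__(self):
        return self.render()


class MarkdownFormatter(Formatter):
    """Hypothetical subclass: renders a DataFrame as a Markdown table."""

    def format_header(self):
        cols = [str(c) for c in self.obj.columns]
        return "| " + " | ".join(cols) + " |\n" + "|" + "---|" * len(cols) + "\n"

    def format_body(self):
        lines = ["| " + " | ".join(str(v) for v in row) + " |"
                 for row in self.obj.itertuples(index=False)]
        return "\n".join(lines) + "\n"


fmt = MarkdownFormatter()
fmt.set_object(pd.DataFrame({"A": [1, 2], "B": ["x", "y"]}))
text = str(fmt)
```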

psychemedia · 2014-04-23T14:36:55Z

Has anybody looked at using ipythonblocks for visualising dataframes?

I'm not much of a python coder but I had a stab at getting started using ipython blocks to help visualise merge and reshape operations using pandas dataframes for a data/databases course I'm working on: http://nbviewer.ipython.org/gist/psychemedia/9795643

Related issue on ipythonblocks tracker: jiffyclub/ipythonblocks#29 discussing possible implementation issues

jreback · 2014-04-23T15:26:22Z

nice idea.

however, I think this might be better done in ipythonblocks itself, simply patching methods onto the frames at import time.

e.g.

import pandas
import ipythonblocks

def to_iblocks(.....):
    .....

pandas.DataFrame.to_iblocks = to_iblocks

I'm normally not in favor of doing this (i.e. messing with an imported package's namespace),

but otherwise it seems you would have to add keywords all over the place, which makes the pandas API a bit odd.
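
Since pandas 0.23 there is also a supported alternative to patching methods onto `DataFrame`: registering a namespaced accessor with `pandas.api.extensions.register_dataframe_accessor`. A sketch, where the `blocks` accessor name and its `colors()` method (one RGB tuple per cell, grey for missing values) are hypothetical stand-ins for what an ipythonblocks integration might provide.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("blocks")
class BlocksAccessor:
    """Hypothetical accessor: df.blocks.colors() gives one RGB tuple per cell."""

    def __init__(self, pandas_obj):
        self._df = pandas_obj

    def colors(self, present=(0, 128, 0), missing=(200, 200, 200)):
        # green for present values, grey for missing ones, row by row
        return [[missing if pd.isna(v) else present for v in row]
                for row in self._df.itertuples(index=False)]


df = pd.DataFrame({"A": [1.0, None], "B": [2.0, 3.0]})
grid = df.blocks.colors()
```

The extra functionality lives in its own namespace (`df.blocks`), and pandas needs no keywords or knowledge of the add-on package.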

alternatively you can create a class

class DataFrameIBlocks(object):

    def __init__(self, df):
        self.df = df

then provide methods to draw the blocks on that, and forward methods to the contained df

both a bit non-trivial though
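
The wrapper-class route can forward everything it doesn't define to the contained frame via `__getattr__`. A sketch; `DataFrameIBlocks` follows the name above, and `draw` is a hypothetical placeholder that renders one character per cell instead of real blocks.

```python
import pandas as pd


class DataFrameIBlocks(object):
    def __init__(self, df):
        self.df = df

    def draw(self):
        # placeholder for block drawing: "#" for a present value, "." for a missing one
        return "\n".join("".join("." if pd.isna(v) else "#" for v in row)
                         for row in self.df.itertuples(index=False))

    def __getattr__(self, name):
        # only called when normal attribute lookup fails: forward to the DataFrame
        return getattr(self.df, name)


wrapped = DataFrameIBlocks(pd.DataFrame({"A": [1.0, None], "B": [2.0, 3.0]}))
picture = wrapped.draw()
shape = wrapped.shape  # forwarded to the wrapped DataFrame
```

One non-trivial part shows up quickly: forwarded methods return plain DataFrames, not wrappers, so chained calls drop the drawing methods.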

psychemedia · 2014-04-23T17:53:36Z

@jreback Thanks.. I'll try to explore the idea a little more in the context I'm currently working in (putting together IPython notebooks for a distance learning context) to see what seems tractable/useful and what major cases fall out.

kynan · 2015-11-12T08:42:29Z

Looks like this hasn't progressed in more than a year now. Is the unanimous view still to only accept a formatter architecture that fits all use cases? Or would a step change be acceptable?

I think it would already be a huge improvement if the to_<format> methods allowed overriding their formatter via an optional argument. I don't mean the column formatters (they can stay as is), but e.g. to_html would accept a formatter, which is then used instead of the default HTMLFormatter, so users can subclass HTMLFormatter and pass it in.

jreback · 2015-11-12T12:17:18Z

@kynan this will be closed by #10250 shortly

jankatins · 2015-11-12T15:52:04Z

@jreback I'm not so sure if this issue has the same things in mind as the new style one. While the styling issue is in #10520, a proper template based rendering infrastructure for more than HTML would be really nice (e.g. to implement a to_markdown() method #11052).

kynan · 2015-11-12T22:19:07Z

Great to hear this is nearing completion! Might be worth adding a reference to #10250 in the opening post for those digging through issues around output formatting (I don't have the required permissions).

jseabold mentioned this issue Mar 27, 2013: Summary re-write statsmodels/statsmodels#636 (Closed)

cpcloud mentioned this issue Jul 22, 2013: API: Would it be useful for everyone if DataFrame.to_html (and to_string) had an argument to specify justify settings for each column? #4315 (Closed)

jreback mentioned this issue Jul 25, 2013: DOC: World Bank data needs docs #4354 (Closed)

ghost mentioned this issue Sep 21, 2013: ENH: Remove the hardcoded border=1 in the to_html dataframe export. #4578 (Closed)

ghost mentioned this issue Nov 21, 2013: ENH: Add option to highlight NaN cells #5330 (Closed)

ghost self-assigned this Dec 21, 2013

ghost mentioned this issue Dec 22, 2013: WIP: df rendering using templates + conditional formatting for HTML #5763 (Closed)

ghost mentioned this issue Jan 20, 2014: ENH: View a data frame as a table with colored backgrounds #6009 (Closed)

ghost removed their assignment Feb 7, 2014

This was referenced Feb 17, 2014:
- DataFrame rendering stylists? #459 (Closed)
- pandas integration in IPython notebook #2829 (Closed)

jreback mentioned this issue Feb 26, 2014: ENH: Html table export: Add of other attributes #6488 (Closed)

jreback modified the milestones: 0.15.0, 0.14.0 Mar 28, 2014

This was referenced Jun 3, 2014:
- Possible interactions with taldcroft/asciitable? #167 (Closed)
- add class=„pandas-empty“ to NaN-cells’ HTML #7338 (Closed)

TomAugspurger mentioned this issue Sep 9, 2014: Template for variables in to_latex() #8213 (Closed)

jreback mentioned this issue Oct 7, 2014: Add table id option to pandas.DataFrame.to_html #8496 (Closed)

jreback modified the milestones: 0.16.0, 0.17.0 Jan 26, 2015

jreback mentioned this issue Jun 23, 2015: Add to_fwf support #10415 (Open)

jreback mentioned this issue Nov 12, 2015: ENH: Conditional HTML Formatting #10250 (Merged)

jreback modified the milestones: 0.17.1, Next Major Release Nov 14, 2015

TomAugspurger closed this as completed in #10250 Nov 16, 2015
