ENH: Styler.to_latex(): conditional styling with native latex format #40422

Merged: 136 commits, May 24, 2021


@attack68 attack68 commented Mar 13, 2021

Enhancing Styler to allow LaTeX Input

also indirectly (by providing an alternative to DataFrame.to_latex):

This PR leverages the conditional rendering mechanics and unit tests in Styler to create a pure LaTeX version, i.e. a Styler that instead of CSS (attribute, value) tuples has LaTeX (command, options) tuples and renders directly to LaTeX with nested cell

An extension to this is provided in another PR which converts an HTML-CSS Styler to a LaTeX Styler, and then renders in LaTeX.
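To make the (command, options) idea concrete, here is a minimal pure-Python sketch of nesting LaTeX commands around a cell value. The function name `render_cell` and the wrapping convention are assumptions for illustration only; this is not the code added in the PR, and the actual Styler output format may differ.

```python
def render_cell(display_value, styles):
    """Wrap a cell value in LaTeX commands, first style outermost.

    `styles` is a list of (command, options) tuples, mirroring the
    (attribute, value) tuples used for CSS. Hypothetical helper, not pandas API.
    """
    out = str(display_value)
    # Apply styles innermost-first so the first tuple ends up outermost.
    for command, options in reversed(styles):
        out = f"\\{command}{options}{{{out}}}"
    return out


cell = render_cell(1.5, [("textcolor", "{red}"), ("textbf", "")])
print(cell)
```

Each tuple contributes one level of nesting, so a cell with several conditional styles becomes a single nested LaTeX expression.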

How is this achieved?

Internally:

  • a new jinja2 template is created for generating LaTeX output.
  • a new Styler._render_latex() method is added to invoke the new template.
  • a Styler._translate_latex method is added, which restructures the usual render dict d so that it suits the LaTeX template.
  • parsing functions are added that perform simple tasks to support the jinja2 template's operation, including sparsifying MultiIndexes.
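As a rough illustration of what sparsifying a MultiIndex header level can mean for LaTeX output, the sketch below collapses runs of repeated labels into `\multicolumn` cells. The helper `sparsify_header` is hypothetical and only approximates the idea; the parsing functions in the PR are not reproduced here.

```python
from itertools import groupby


def sparsify_header(labels, align="r"):
    """Collapse consecutive repeated labels into \\multicolumn cells.

    Hypothetical helper for illustration; not part of pandas.
    """
    cells = []
    for label, run in groupby(labels):
        span = len(list(run))
        if span > 1:
            # A repeated label spans several columns.
            cells.append(f"\\multicolumn{{{span}}}{{{align}}}{{{label}}}")
        else:
            cells.append(str(label))
    return " & ".join(cells)


header = sparsify_header(["A", "A", "B"])
print(header)
```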

For Users:

  • a new Styler.to_latex() method is introduced to give the user control.

Comparison with DataFrame.to_latex()

Input Arguments

  • Replicating the io arguments: buf, encoding
  • Replicating the LaTeX arguments: position, caption, label, sparsify, column_format
  • Adding additional LaTeX arguments: position_float, hrules
  • Removing all of the formatting arguments: na_rep, formatters, float_format, escape, decimal
    • The pattern styler.format(...).to_latex(...) fully replicates the functionality, and can be intermediately viewed in a Notebook.
    • The pattern styler.format(..)...format(..).to_latex(...) allows more flexible formatting than a single to_latex(..) call with formatting options could replicate.
    • For code maintenance it is better to maintain formatting only in the .format method, with .to_latex downstream of it.
  • Removing the pseudo-formatting arguments: columns, index, index_names, header
    • The pattern styler.hide_index().hide_columns() can replicate the first three; the last seemed a bit unnecessary.
  • Modifying the multiindex args:
    • Now we have sparse_index {bool} and sparse_columns {bool} in place of sparsify, plus multicol_align {str} and multirow_align {str}.
    • Previously the options were sparsify {bool}, multicolumn {bool}, multirow {bool} and multicolumn_format {str}.
  • Additional functionality not coded (yet?): bold_rows (maybe bold_labels), and longtable.

Performance

Suppose a user will never want to create a LaTeX table larger than (200, 20), so we test that size:

| Method | Time |
| --- | --- |
| df.to_latex() | 151ms |
| df.style.to_latex() | 43ms |
| df.to_latex(float_format="{:.2f}".format) | 65ms |
| df.style.format("{:.2f}".format).to_latex() | 61ms |
| df.style.format("{:.2f}".format).to_latex() | 45ms |

(The last row shows the same call after it was improved by #41269.)

Styler is broadly unaffected by adding formatting methods, whereas DataFrame.to_latex is, oddly, much faster with float_format. The conclusion seems to be that for frames of this size and smaller, performance is broadly the same, if not a bit better in Styler.

Docs

Here are the rendered versions of the to_latex documentation:

docs_to_latex.zip


@rhshadrach rhshadrach left a comment


lgtm! Thanks for all the work here @attack68

@attack68 attack68 added this to the 1.3 milestone May 8, 2021
@attack68 attack68 requested a review from jreback May 11, 2021 06:03
@rhshadrach

cc @pandas-dev/pandas-core for any further reviews


@jreback jreback left a comment


reasonable. pls rebase. can you add some links from io.rst where we show to_latex now to styler / latex section (or add this to the styler section also). can be a followup (pls create an issue).

doc/source/whatsnew/v1.3.0.rst (review thread resolved)
pandas/io/formats/style.py (review thread resolved)
@attack68

Recent commits updated the sparsify keyword arg now that separate sparse_index and sparse_columns are available.

@jreback pinging on green

@jreback jreback merged commit ed9f60c into pandas-dev:master May 24, 2021

jreback commented May 24, 2021

thanks @attack68 very nice!

so basically the followons are

  • add / update docs
  • deprecate DataFrame.to_latex / point to styler.to_latex (do we have an issue for this?)

@ivanovmg

@jreback I would also add that the short caption feature has to be implemented here before deprecating DataFrame.to_latex (see #36267)

@attack68 attack68 deleted the latex_styler_mvp branch May 24, 2021 14:46

jreback commented May 24, 2021

sure

@attack68 can u add a follow up issue of things that we need to keep before deprecation


moi90 commented May 26, 2021

Thanks for the great work, @attack68! I can't wait to use it in my dissertation!
