ENH: Styler.to_latex() #21673

Closed
toobaz opened this issue Jun 29, 2018 · 67 comments · Fixed by #40422 or #45138
Labels
Enhancement IO LaTeX to_latex Styler conditional formatting using DataFrame.style
Milestone

Comments

@toobaz
Member

toobaz commented Jun 29, 2018

I have created a branch adding a to_latex method to Styler.

It is nowhere near readiness, and in particular:

  • supports only cell background color as of now - adding some attributes should be trivial, adding others might be less so
  • most importantly, it works on a copy of the dataframe cast with .astype(str), which is plain wrong because it discards features such as float formatting. For a good result, a better integration is needed with the code that currently does cell formatting in DataFrame.to_latex.

This said, if anyone feels like experimenting, it might be slightly better than starting from scratch.

One aspect which I think is useful, even aside from this branch, and might benefit from some discussion, is the pair of methods _latex_preserve and _latex_restore, which basically replace LaTeX commands so that they are not disturbed by escaping, and then restore them. There might be a better way to code this, but I really think this is something we need to implement, and to offer to users who happen to nest LaTeX code in their cells.
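
A minimal sketch of the preserve/restore idea, for illustration only. The helper names, the command regex, and the placeholder character below are assumptions and are not taken from the branch, whose _latex_preserve and _latex_restore may work differently.

```python
import re

# Hypothetical stand-ins for the branch's _latex_preserve/_latex_restore.
# Matches a command name followed by any number of simple {...} arguments.
COMMAND = re.compile(r"\\[A-Za-z]+(?:\{[^{}]*\})*")
SPECIAL = {"&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_"}
MARK = "\ue000"  # private-use character, assumed absent from the cell text


def latex_preserve(text):
    """Swap each LaTeX command for a numbered placeholder."""
    saved = []

    def stash(match):
        saved.append(match.group(0))
        return f"{MARK}{len(saved) - 1}{MARK}"

    return COMMAND.sub(stash, text), saved


def latex_restore(text, saved):
    """Put the stashed commands back in place of their placeholders."""
    return re.sub(f"{MARK}(\\d+){MARK}", lambda m: saved[int(m.group(1))], text)


def escape_keeping_commands(text):
    """Escape special characters everywhere except inside LaTeX commands."""
    protected, saved = latex_preserve(text)
    escaped = "".join(SPECIAL.get(ch, ch) for ch in protected)
    return latex_restore(escaped, saved)


print(escape_keeping_commands(r"\textbf{total} is 50% of a_b"))
```

The regex only handles commands with non-nested brace arguments, so a real implementation would need a more careful parser.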

@toobaz toobaz added Enhancement Code Style Code style, linting, code_checks Difficulty Intermediate IO LaTeX to_latex labels Jun 29, 2018
@MingweiSamuel

Would really appreciate this feature, thanks for making this issue

@srossi93

Any updates?

@jbrockmendel jbrockmendel removed Code Style Code style, linting, code_checks Difficulty Intermediate labels Oct 16, 2019
@soumitrakp

+1 for this feature

@jbrockmendel jbrockmendel added the Styler conditional formatting using DataFrame.style label Dec 11, 2019
@KaleabTessera

KaleabTessera commented Feb 17, 2020

+1

1 similar comment
@dorukhansergin

+1

@th0ger

th0ger commented Feb 29, 2020

Does this enhancement aim to solve issues like this:
<pandas.io.formats.style.Styler at 0x.......>?
E.g. pd.style.hide_index() combined with jupyter nbconvert --to pdf

@toobaz
Member Author

toobaz commented Mar 1, 2020

Does this enhancement aim to solve issues like this:

Not sure what happens when you convert a notebook to pdf... but yes, it might be that to_latex() gets somehow called on (non-styled) DataFrames, and in that case, yes, fixing this issue would solve that too.

@harakiricode

+1

2 similar comments
@yannpequignot

+1

@teyden

teyden commented Jan 2, 2021

+1

@MarcoGorelli
Member

Please, this isn't helpful - see here for how to contribute, else wait for someone else to do it, but carrying on commenting +1 is disruptive

@toobaz
Member Author

toobaz commented Jan 3, 2021

but carrying on commenting +1 is disruptive

(and no more effective than adding a simple "thumbs up" on the first comment)

@cauebs

cauebs commented Jan 20, 2021

That branch is currently 8352 commits behind. Maybe you should try to rebase and open a PR, to see if it catches the attention of the maintainers.

@toobaz
Member Author

toobaz commented Jan 20, 2021

Maybe you should try to rebase and open a PR, to see if it catches the attention of the maintainers.

Who's "you"? If it's me, well, I am a maintainer, and that branch already caught my attention, but as of now it hasn't resulted in me doing anything more :-)

Jokes apart, @cauebs feel free to try to rebase - not sure if it will help merging this eventually, but at least it will help making sure no changes in the last one and a half years broke my approach. In any case, as I wrote, that branch wasn't and wouldn't be ready for a PR.

@moi90
Contributor

moi90 commented Mar 5, 2021

I would really like to see this added to Pandas!

What are the minimum requirements to get this merged? What would a sensible test case look like?

I don't think that it needs to support all of the current functionality before being merged, colored background and bold cells would be enough, imho.

@toobaz
Member Author

toobaz commented Mar 5, 2021

What are the minimum requirements to get this merged?

I think that more than any set of features, it is just doing it "right", which means

  • no code duplication
  • no .astype(str) (which is the problem of my current approach)

I think a PR satisfying these two points could be perfectly mergeable even with a really minimal set of supported formatting features.

@moi90
Contributor

moi90 commented Mar 6, 2021

What is the problem with conversion to str? What happens in DataFrame.to_latex that needs to be duplicated?

(Sorry if this is obvious from the code, I didn't have a look yet. I'm just trying to get a feeling for the complexity of the problem.)

@toobaz
Member Author

toobaz commented Mar 6, 2021

What is the problem with conversion to str?

It's explained in my first comment ;-)

In addition, it's an obvious code duplication, since the process of converting cells content to strings is already implemented for DataFrame.to_latex().
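
To illustrate the point about .astype(str), here is a small plain-Python sketch (no pandas involved; format_cell is a hypothetical helper). Once a cell has been converted to text, a float format can no longer be applied to it, which is the kind of feature the blanket conversion discards.

```python
def format_cell(value, spec=",.3f"):
    """Apply a float format spec to a cell value."""
    return format(value, spec)


value = 5.824811

# Formatting the number itself keeps the requested precision.
print(format_cell(value))

# After a blanket conversion to str, the same spec can no longer be applied.
try:
    print(format_cell(str(value)))
except ValueError as err:
    print("cannot format a str as a float:", err)
```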

@moi90 moi90 mentioned this issue Mar 8, 2021
4 tasks
@moi90
Contributor

moi90 commented Mar 8, 2021

@toobaz Your branch also adds a Styler.to_html. Is this safe to remove for now?

@moi90
Contributor

moi90 commented Mar 8, 2021

If I understand correctly, this is what we currently have:

"Styler.to_latex" -> "NDFrame.to_latex" -> "DataFrameRenderer.to_latex" -> "LatexFormatter.to_string" -> "TableBuilderAbstract.get_result";

@toobaz
Member Author

toobaz commented Mar 8, 2021

@toobaz Your branch also adds a Styler.to_html. Is this safe to remove for now?

Yes, I think the techniques to be used for to_html are very similar to those for to_latex, but there is definitely no need to implement them together.

@moi90
Contributor

moi90 commented Mar 12, 2021

@attack68 OK, I understand your points.

I made a modification to my template and it now has a syntax for parsing the usual CSS and converting it to latex format

This looks really great!
However, I'm not certain that your approach of using custom LaTeX props is the best one. I would rather translate the CSS properties that are valid for HTML to proper LaTeX markup commands (like @toobaz already did), e.g. font-size: Huge => {\Huge <text>}. This way you wouldn't need different props for HTML and LaTeX. I'd be happy to contribute something like that. But I also acknowledge the flexibility in your approach to use any LaTeX command. Maybe we could give the user the opportunity to (re)define styles?

@attack68
Contributor

attack68 commented Mar 12, 2021

However, I'm not certain that your approach of using custom LaTeX props is the best one. I would rather translate the CSS properties that are valid for HTML to proper LaTeX markup commands (like @toobaz already did), e.g. font-size: Huge => {\Huge <text>}. This way you wouldn't need different props for HTML and LaTeX. I'd be happy to contribute something like that. But I also acknowledge the flexibility in your approach to use any LaTeX command. Maybe we could give the user the opportunity to (re)define styles?

So this is what I did at first actually just following the outline, but I quickly realised that:

  1. Depending upon which latex package you use there might be different commands, i.e. multiple translations from CSS to latex, e.g. font-style: italic or oblique has at least 3 variants in latex: \textit, \textsl, \emph, and I think this gets worse for colors.

  2. It was more restrictive since to get anything to work in latex it first had to have a defined CSS translation rule for it.

  3. It could be possible there is something in latex that does not translate directly from CSS, e.g. custom properties, or even if you define your own latex commands and want to insert the commands into cell formatting. (the equivalent of adding external CSS classes)

If you adopt the format I have provided it is easy to add in a patch that does what you want, i.e.:

def _parse_latex_cell_styles(styles: CSSList, display_value: str) -> str:
    styles = _parse_css_to_latex(styles)  #  <-- here is your patch function
    for style in styles[::-1]:  # in reverse for most recently applied style

where your function needs to convert say [('background-color', 'red')] to [('cellcolor', '{red}')], before the program continues.
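
A rough sketch of what such a patch function could look like, written as a standalone function. The name parse_css_to_latex, the set of supported attributes, and the choice of \bfseries and \itshape are assumptions made for illustration; only the background-color to cellcolor example comes from the comment above.

```python
# Hypothetical translation table for keyword-valued CSS properties.
# Which LaTeX command each maps to is an assumption, not a settled choice.
KEYWORD_RULES = {
    ("font-weight", "bold"): ("bfseries", ""),
    ("font-style", "italic"): ("itshape", ""),
}

# CSS properties whose value is passed through as a braced LaTeX argument.
COLOR_RULES = {"background-color": "cellcolor", "color": "color"}


def parse_css_to_latex(styles):
    """Convert (attribute, value) CSS pairs to (command, options) pairs."""
    latex = []
    for attr, value in styles:
        attr, value = attr.strip().lower(), value.strip()
        if attr in COLOR_RULES:
            latex.append((COLOR_RULES[attr], "{" + value + "}"))
        elif (attr, value.lower()) in KEYWORD_RULES:
            latex.append(KEYWORD_RULES[(attr, value.lower())])
        else:
            # Unknown pairs are assumed to be LaTeX already and pass through.
            latex.append((attr, value))
    return latex


print(parse_css_to_latex([("background-color", "red"), ("font-weight", "bold")]))
```

Passing unknown pairs through unchanged keeps the flexibility of writing raw LaTeX commands while still covering the common CSS cases.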

Edit: I would rather leave this out for a basic PR, and maybe add into the functionality afterwards as a separate component,

@moi90
Contributor

moi90 commented Mar 12, 2021

For the defaults (bold, italics, colored cell) it is pretty straightforward. (italics is textit; textsl is slanted, emph is semantic markup that's commonly rendered as italic, so no problem here). Moreover, some commands always have to be applied in a certain order. I'm all for the possibility of using custom LaTeX commands, but I don't think this is the right way.

Maybe @toobaz can give his opinion about this?

@moi90
Contributor

moi90 commented Mar 12, 2021

Edit: I would rather leave this out for a basic PR, and maybe add into the functionality afterwards as a separate component

This could be a good idea. However, if it is released this way, some solutions are infeasible in the future, if we don't want to break the API.

Would it be sensible to prefix latex-specific CSS attributes with a "vendor prefix", like -latex-Huge? This would also prevent name collisions between "real" CSS attributes and LaTeX ones.

Also, there has to be a better way to define how the markup is applied.
Different from your example, cellcolor is often used like this: \cellcolor[rgb]{1,0,1} value or {\cellcolor[rgb]{1,0,1} value} or even {\cellcolor[rgb]{1,0,1}} value, which is required for siunitx: https://tex.stackexchange.com/a/436148.

@toobaz
Member Author

toobaz commented Mar 12, 2021

Maybe @toobaz can give his opinion about this?

I would love to! But I'm confused. My understanding is that you are discussing whether the API should accept formatting in the format of css or of (arbitrary) LaTeX command (and I think I favor the former, and consider the latter a welcome extension). But then

I made a modification to my template and it now has a syntax for parsing the usual CSS and converting it to latex format, so:

... suggests that @attack68's current code already accepts both?

@attack68
Contributor

attack68 commented Mar 12, 2021

My code currently uses latex structured in Styler's CSS format, i.e. an ('attr', 'value') tuple, basically forming a ('command', 'options') latex pattern. It currently expects that all input is in latex format. However, if it were given in CSS, an optional converter could then convert this to the latex format for post-processing.

This could be a good idea. However, if it is released this way, some solutions are infeasible in the future, if we don't want to break the API.

This is a long way from release, but what solutions are infeasible? I believe that using a CSS map to Latex commands is far more restrictive and prohibitive, than what I have offered.

For example,

Different from your example, cellcolor is often used like this: \cellcolor[rgb]{1,0,1} value or {\cellcolor[rgb]{1,0,1} value} or even {\cellcolor[rgb]{1,0,1}} value, which is required for siunitx:

I have provided two variants:
('cellcolor', '[rgb]{1,0,1}') maps to '\cellcolor[rgb]{1,0,1}{display_value}'
('cellcolor', '[rgb]{1,0,1}-wrap-') maps to '{\cellcolor[rgb]{1,0,1} display_value}'

More variants are easy to add in but at the risk of creating a parsing language.
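
The two variants above can be sketched as a small rendering function. This is an illustrative reconstruction only: render_cell is a hypothetical name, and the -wrap- suffix is handled exactly as written in this comment rather than as in any released API.

```python
WRAP_FLAG = "-wrap-"  # suffix used in the comment above to request wrapping


def render_cell(command, options, display_value):
    """Render one ('command', 'options') pair around a cell's display value."""
    if options.endswith(WRAP_FLAG):
        options = options[: -len(WRAP_FLAG)]
        # Wrapped form: {\command<options> value}
        return f"{{\\{command}{options} {display_value}}}"
    # Default form: \command<options>{value}
    return f"\\{command}{options}{{{display_value}}}"


print(render_cell("cellcolor", "[rgb]{1,0,1}", "3.14"))
print(render_cell("cellcolor", "[rgb]{1,0,1}-wrap-", "3.14"))
```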

@moi90 how would you be accounting for these variants without introducing your own parsing language in a pure CSS transformation solution?

@attack68
Contributor

Previous discussion about this considered making two separate classes: html-Styler and a latex-Styler. In the case of the latter, you would not expect a latex-Styler to process CSS language, more so I would think you would expect the user to be inputting latex styling language.

This PR is essentially creating a latex-Styler from the pre-existing mechanics and unit tests for html-Styler.

To provide a translator between a html-Styler (in CSS) to a latex-Styler is a fairly easy extension, once the above is approved.

To allow a html-Styler (in CSS and latex) to a latex-Styler probably requires tagging those latex styles with -latex- as highlighted by @moi90, so that the above translator can add more functionality.

But this last step might be considered too esoteric and not worth inclusion by developers? maybe not.

@moi90
Contributor

moi90 commented Mar 26, 2021

I'm really not happy with the weird syntax to squeeze LaTeX commands into CSS... It gets worse if a user wants to use siunitx's S columns. (I believe this is a pretty standard setup.)

Here, I made a demonstration which steps are required to format tables correctly when using siunitx.
Basically, a cell has to be formatted the following way:

markup = protected + unprotected
protected = ["{\cellcolor{...}}"]
unprotected = ["\color{...}"] + ["\bfseries"] + ["\textit"]

It would be really annoying to get this using the proposed CSS syntax.
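
As a rough illustration of the protected + unprotected layout above, here is a hedged sketch. The function name and its parameters are hypothetical, italics are left out, and whether this exact output compiles in an S column depends on the siunitx setup, so treat it as a shape rather than verified LaTeX.

```python
def siunitx_cell(value, background=None, color=None, bold=False):
    """Build a cell as protected commands, then unprotected ones, then the value."""
    protected = []    # commands wrapped in their own braces
    unprotected = []  # switches that apply to the rest of the cell

    if background:
        protected.append(f"{{\\cellcolor{{{background}}}}}")
    if color:
        unprotected.append(f"\\color{{{color}}}")
    if bold:
        unprotected.append("\\bfseries")

    return " ".join(protected + unprotected + [str(value)])


print(siunitx_cell(1.5, background="yellow", color="red", bold=True))
```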

ExcelFormatter has a CSSToExcelConverter that uses a CSSResolver to parse the markup and generate the markup required by Excel.

Styler.to_latex should do the same.

While flexibility is great, I think it is more important to cover the common cases with ease (background-color, color, font-weight: bold, font-style: italic). (Also, it will not be very hard to extend the existing protected and unprotected formatting commands.)

@attack68
Contributor

@moi90 allowing Styler to operate as CSS ('attribute', 'value') pairs or as LaTeX ('command', 'options') creates a non-duplicated and maintainable codebase that is flexible in both formats. Another advantage is that the unit tests and formatting methods are available in both versions.

You should not eliminate all of this flexibility for the sake of covering the common cases, which I have said are already very easy to translate into the LaTeX format. See this unpublished PR (not included in my original PR because it is an extension which would just complicate things for the PR reviewers)

So far the parameters that you have raised that have been solved are:

  • user defined caption
  • user defined position
  • user defined label
  • automatic or user defined wrapping in {table}
  • multirow and multi column sparsification
  • automatic alignment of columns r for numeric and l for non-numeric (in .to_latex())
  • automatic alignment of columns r for numeric and l for non-numeric (in .render(latex=True) - this is harder to implement without code duplication)
  • user defined column_format
  • giving a simple option --wrap to change LaTeX \<command><ops>{<display>} to {\<command><ops> <display>}
  • other braces wrapping formats compatible with siunitx (probably fairly easy to incorporate)
  • converting 4 basic CSS attributes into LaTeX format

This discussion has largely followed the line of you challenging my development's capabilities. This has been valuable since it has driven the development of these options. However, without a suitable challenger model it is difficult for me to question how you plan to deal with some of the items you have raised, and others, if you still believe that working with a LatexFormatter is preferable (I don't).

  • How do you differentiate between bfseries and textbf, or textit and emph i.e. the cases where there is not a 1-1 direct CSS translation?
  • How do you plan to deal with the positioning of braces which may be different for siunitx or other packages?
  • How do you plan on translating font-size for which Large and Huge are not valid CSS values?

Here is an example of the extension module from the above PR:
[Screenshot, 2021-03-28 17:50:40]

@moi90
Contributor

moi90 commented Mar 29, 2021

@attack68 I greatly appreciate your skills as a developer and am impressed that you have always skillfully addressed my "challenges". 👍

I'm sorry I haven't come up with a challenger model. I tried to dig into the existing code from multiple angles but I currently lack the time to get to a solution that would be worth discussing. Also, you convinced me that using Jinja is not so bad after all.

@moi90 allowing Styler to operate as CSS ('attribute', 'value') pairs or as LaTeX ('command', 'options') creates a non-duplicated and maintainable codebase that is flexible in both formats. Another advantage is that the unit tests and formatting methods are available in both versions.

I don't see how a translation between CSS properties and LaTeX commands would lead to more code duplication or less maintainability.

automatic alignment of columns r for numeric and l for non-numeric (in .to_latex())

Maybe we can have an additional option for siunitx that uses the S column type for numeric columns?

automatic alignment of columns r for numeric and l for non-numeric (in .render(latex=True) - this is harder to implement without code duplication)

Maybe we can drop render(latex=True) altogether? What is the advantage over to_latex?

How do you differentiate between bfseries and textbf, or textit and emph i.e. the cases where there is not a 1-1 direct CSS translation?

  • Do not use textbf or textit (as it leads to problems with siunitx)
  • Do not use emph (as it is semantic markup, not a certain style)

How do you plan to deal with the positioning of braces which may be different for siunitx or other packages?

I answered here in detail. Basically: Always be compatible with siunitx; this does not harm other setups.

How do you plan on translating font-size for which Large and Huge are not valid CSS values?

I would not make this a problem. On the CSS side, there is large, x-large, xx-large, and xxx-large. On the LaTeX side, there is \large, \Large, \LARGE, \huge, \Huge. These can be matched nicely (when excluding \Huge). (Same for the small sizes.)
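
One possible shape for that matching, as a sketch. The four large sizes follow the pairing suggested above; the small-size entries and the fallback are assumptions, and the names are hypothetical.

```python
# CSS absolute-size keywords mapped to LaTeX size switches.
# Large sizes follow the pairing above (\Huge is left unused);
# the small sizes and "medium" are assumed by analogy.
FONT_SIZE_TO_LATEX = {
    "xx-small": r"\scriptsize",
    "x-small": r"\footnotesize",
    "small": r"\small",
    "medium": r"\normalsize",
    "large": r"\large",
    "x-large": r"\Large",
    "xx-large": r"\LARGE",
    "xxx-large": r"\huge",
}


def font_size_command(css_value, default=r"\normalsize"):
    """Return the LaTeX size switch for a CSS font-size keyword."""
    return FONT_SIZE_TO_LATEX.get(css_value.strip().lower(), default)


print(font_size_command("x-large"))
```

Numeric CSS sizes such as 12px or 1.5em are not covered here and simply fall back to the default.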

Here is an example of the extension module from the above PR

Cool! (Albeit not yet siunitx-compatible.)

@jreback
Contributor

jreback commented Apr 23, 2021

if anyone would like to review / try out / comment on the PR #40422 would be appreciated.

@jreback jreback added this to the 1.3 milestone May 21, 2021
@alevinetx

As an end user, when I have a Styler object that renders fine while viewing in a notebook, but when exported to PDF it becomes the class display (....Styler), do I need to explicitly call .to_latex() or will that be done behind the scenes?

Thank you for working on this issue!

@asapsmc

asapsmc commented Dec 8, 2021

Could you please provide some more advanced examples in documentation of usage of to_latex()?

@attack68
Contributor

attack68 commented Dec 8, 2021

what, specifically, would you like to see?

@asapsmc

asapsmc commented Dec 8, 2021

Building more complex tables, with multicols and multirows, other LaTeX elements such as \cmidrule and, if possible (I can't tell from the documentation whether it is), a way to include logic in the table building (e.g. include \midrule after a specific row, etc.). I have to publish lots of tables in LaTeX, but I'm unable to explore all possibilities with the current examples.

@attack68
Contributor

attack68 commented Dec 9, 2021

Building more complex tables, with multicols and multirows,

You need a MultiIndex and use the multirow_align, multicol_align, sparse_index, sparse_columns arguments. The rest is handled by default. Data values will never be multi-columned or multi-rowed, only indexes.

other Latex elements such as \cmidrule and if possible (can't understand from the documentation if that's possible or not) if there's way to include logic in the table building (e.g. include \midrule after a specific row, etc.).

No, you can't add conditional out-of-cell logic. The only custom commands you can add are as described in the docs page, akin to the example for \rowcolors{1}{pink}{red}

@asapsmc

asapsmc commented Dec 9, 2021

Thanks for your feedback. But for a novice user like me, it's been impossible to make it work from the examples (both in Styler.to_latex() and DataFrame.to_latex()).

@moi90
Contributor

moi90 commented Dec 10, 2021

include \midrule after a specific row

For that, I usually split the generated output into individual lines, insert the extra rules at pre-defined locations and concatenate the result. This breaks easily but it is better than nothing.
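
A minimal sketch of that post-processing approach, using only string operations on the generated LaTeX. The function name is hypothetical, and detecting groups by a leading \multirow is an assumption that fits sparsified MultiIndex output like the tables in this thread; it would need adjusting for other layouts.

```python
def add_rules_between_groups(latex, rule=r"\midrule"):
    """Insert a rule before every \\multirow group except the first one."""
    out = []
    seen_group = False
    for line in latex.splitlines():
        if line.lstrip().startswith(r"\multirow"):
            if seen_group:
                out.append(rule)
            seen_group = True
        out.append(line)
    return "\n".join(out)


# A shortened table body in the same shape as Styler.to_latex() output.
sample = "\n".join([
    r"\midrule",
    r"\multirow[c]{2}{*}{G} & Baseline & 5.825 & 5.804 \\",
    r" & Version2 & 5.825 & 5.804 \\",
    r"\multirow[c]{2}{*}{H} & Baseline & 4.677 & 4.571 \\",
    r" & Version2 & 4.802 & 4.660 \\",
    r"\bottomrule",
])
print(add_rules_between_groups(sample))
```

Passing rule=r"\cline{1-4}" instead would approximate the separators that DataFrame.to_latex() emits between groups.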

@asapsmc

asapsmc commented Dec 12, 2021

include \midrule after a specific row

For that, I usually split the generated output into individual lines, insert the extra rules at pre-defined locations and concatenate the result. This breaks easily but it is better than nothing.
@moi90:
How do you split the generated output into individual lines?

@asapsmc

asapsmc commented Dec 12, 2021

(Sorry to put this here, but I'm finding several questions on StackOverflow (e.g. this or this) unanswered, and my time is running out, so I bring this question into where I know there is knowledge to solve this. Please excuse me)

Why does Styler.to_latex() not produce the same output (namely the \cline) as DataFrame.to_latex()?

My original dataframe after aggregating results (dfg) is this:

                   F-1   F-2
dataset Model               
G       Baseline 5.825 5.804
        Version2 5.825 5.804
H       Baseline 4.677 4.571
        Version2 4.802 4.660
S       Baseline 2.406 1.921
        Version2 2.719 2.189
T       Baseline 5.284 4.949
        Version2 5.931 5.909 

Then I use the following code:

pd.options.display.float_format = '{:,.3f}'.format    
styler_latex = dfg.style.to_latex(position="H", hrules=True, multirow_align="c", multicol_align="r", sparse_index=True)
dfg_latex = dfg.to_latex(position='H', escape=False, sparsify=True, multirow=True, multicolumn=True)
print('styler:', styler_latex)
print('dfg_latex', dfg_latex)

Output from Styler.to_latex() (Latex Code and Image)

\begin{table}[H]
\centering
\begin{tabular}{llrr}
\toprule
{} & {} & {F-1} & {F-2} \\
{dataset} & {Model} & {} & {} \\
\midrule
\multirow[c]{2}{*}{G} & Baseline & 5.824811 & 5.804303 \\
 & Version2 & 5.824811 & 5.804303 \\
\multirow[c]{2}{*}{H} & Baseline & 4.677066 & 4.570626 \\
 & Version2 & 4.801857 & 4.660115 \\
\multirow[c]{2}{*}{S} & Baseline & 2.406244 & 1.921260 \\
 & Version2 & 2.719123 & 2.189293 \\
\multirow[c]{2}{*}{T} & Baseline & 5.284241 & 4.949087 \\
 & Version2 & 5.931376 & 5.909215 \\
\bottomrule
\end{tabular}
\end{table}


Output from DataFrame.to_latex() (Latex Code and Image)

\begin{table}[H]
    \centering
    \begin{tabular}{llrr}
    \toprule
      &          &   F-1 &   F-2 \\
    dataset & Model &       &       \\
    \midrule
    \multirow{2}{*}{G} & Baseline & 5.825 & 5.804 \\
      & Version2 & 5.825 & 5.804 \\
    \cline{1-4}
    \multirow{2}{*}{H} & Baseline & 4.677 & 4.571 \\
      & Version2 & 4.802 & 4.660 \\
    \cline{1-4}
    \multirow{2}{*}{S} & Baseline & 2.406 & 1.921 \\
      & Version2 & 2.719 & 2.189 \\
    \cline{1-4}
    \multirow{2}{*}{T} & Baseline & 5.284 & 4.949 \\
      & Version2 & 5.931 & 5.909 \\
    \bottomrule
    \end{tabular}
    \end{table}


Questions:

  1. Why doesn't Styler.to_latex() include \cline, unlike DataFrame.to_latex()? Is there any way to "force" this behaviour in Styler.to_latex()?

I tried to do

my_dfstyle = my_dfstyle.set_table_styles([
        {'selector': 'toprule', 'props': ':toprule;'},
        {'selector': 'midrule', 'props': ':midrule;'},
        {'selector': 'bottomrule', 'props': ':bottomrule;'},
    ], overwrite=False)

but I was unsuccessful. Is there any way to accomplish this type of control (e.g. force \midrule between multirows)?

  2. In a report I wouldn't like to see table headers with 2 lines (as in the above tables where one line would suffice). But to achieve that, I have to reset the index, and then I lose the ability to use multirows (e.g. in the dataset column). Is there any way to circumvent this? Is it possible to "merge" data cells?

@attack68
Contributor

pandas is a volunteer library. cline is not (yet) implemented in Styler.to_latex, no one has volunteered the time to develop it. toprule, midrule, and bottomrule will not help you here.

no, datacell merging is not possible. not sure what you mean by "two lines" but irrespective i am confident the features you are looking for have not been developed.

@attack68 attack68 reopened this Dec 12, 2021
@asapsmc

asapsmc commented Dec 12, 2021

@attack68: Just to be clear I know that pandas is a volunteer library and I'm extremely grateful for pandas!! I was just checking if there was any option to accomplish what I wanted.
About the 2 lines, I meant these 2 lines in the header:
image

@attack68
Contributor

those lines are the toprule and midrule. they are visible in both DF and Styler version.

In the Styler version you can take them both away with hrules=False. You can also take just the midrule away by adding table styles for toprule and bottomrule and not including the midrule (and keeping hrules=False)

@asapsmc

asapsmc commented Dec 12, 2021

@attack68: sorry, I misled you: I understand what you explained about the lines, but what I meant was "is there any way I can put the header in one row (instead of 2 rows) without losing the multiindex grouping under the "dataset" column?"
