ENH: Adding highlighting options to `to_latex` function #38328

yanaiela · 2020-12-06T11:44:51Z

Following #3196, it would be extremely useful to be able to highlight the best value in each row or column of a DataFrame when converting it into a LaTeX table (the `to_latex` function).

A convenient API for that would be an extra parameter on `to_latex()`, such as `highlight_rows=TYPE`, where TYPE can be bold|italics|..., and then each row's best value would be highlighted.

There could also be a scenario where the best value is the lowest one, so maybe there should be an additional parameter, `best_highlight: str`, taking high|low.


arw2019 · 2020-12-09T04:47:26Z

Looking through the discussion in #3196 it seems like there was interest in doing things like this but nobody found time to do the legwork.

Would you be interested in submitting a PR?

ivanovmg · 2020-12-10T07:59:34Z

@yanaiela, it would be great if you would post the expected output latex code here.
It would let us better understand the amount of work required.

From my perspective, highlighting the rows is quite doable.
However, how should we handle a multiindex? Or a multicolumn index?
In my opinion, this is the most complicated part.

You refer to the best value. But what would it be? Should there be a function passed to define it?

Therefore, please provide the latex code, or even better suggest some tests.

yanaiela · 2020-12-30T11:57:31Z

Hey @arw2019, @ivanovmg, sorry for the late reply.

I'd be happy to get this functionality first even without handling multiindex or multicolumn. Different people may have different functionalities in mind, but for me I'd say the multiindex should be highlighted based on the best value in each of these indices.

For the best value, as I mentioned, I think it can be passed as an argument, since it can be based on either small or large values (but I can imagine other scenarios). So maybe the best way would be to support the most common cases, min and max (from my perspective), but also allow passing a function that would be used instead?

For an example:

If a LaTeX table currently looks like:

\begin{tabular}{lr}
\toprule
        types &  vals \\
\midrule
     model1 & 0.75 \\
     model2 & 0.65 \\
     model3 & 0.68 \\
\bottomrule
\end{tabular}

to_latex(highlight_rows='bold', best_highlight='max')
would create the following:

\begin{tabular}{lr}
\toprule
        types &  vals \\
\midrule
     model1 & \textbf{0.75} \\
     model2 & 0.65 \\
     model3 & 0.68 \\
\bottomrule
\end{tabular}
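
The proposal above (bold the best value, where "best" is `min`, `max`, or a user-supplied function) can be approximated today with per-column formatters. The following is only a sketch: `best_value_formatters` and its parameters are hypothetical names made up for illustration and are not part of pandas.

```python
import pandas as pd


def best_value_formatters(df, best="max", num_decimals=2):
    """Hypothetical helper: build one formatter per column that bolds the column's best value."""
    # `best` is "min", "max", or any callable mapping a Series to a scalar.
    if callable(best):
        pick = best
    elif best in ("min", "max"):
        pick = getattr(pd.Series, best)
    else:
        raise ValueError("best must be 'min', 'max', or a callable")

    def make(target):
        def fmt(x):
            text = f"{x:.{num_decimals}f}"
            # Compare rounded values so cells that print identically are treated alike.
            if round(x, num_decimals) == round(target, num_decimals):
                return f"\\textbf{{{text}}}"
            return text
        return fmt

    return {col: make(pick(df[col])) for col in df.columns}


df = pd.DataFrame({"vals": [0.75, 0.65, 0.68]},
                  index=["model1", "model2", "model3"])
fmts = best_value_formatters(df, best="max")

# Apply the formatter by hand to see which cells would be bolded.
cells = [fmts["vals"](v) for v in df["vals"]]
```

The resulting dictionary is meant to be passed as `df.to_latex(formatters=fmts, escape=False)`, so that the `\textbf{...}` markup is written out unescaped.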

MaxSchambach · 2021-01-24T13:53:43Z

For anyone that came here like me looking for a temporary solution:
I came up with the following, which could be a little shorter, but I think it should get people started with further conditional formatting:

On the Python side:

from functools import partial

import pandas as pd
import numpy as np


def bold_formatter(x, value, num_decimals=2):
    """Format a number in bold when (almost) identical to a given value.
    
    Args:
        x: Input number.
        
        value: Value to compare x with.
        
        num_decimals: Number of decimals to use for output format.

    Returns:
        String converted output.

    """
    # Consider values equal, when rounded results are equal
    # otherwise, it may look surprising in the table where they seem identical
    if round(x, num_decimals) == round(value, num_decimals):
        return f"{{\\bfseries\\num{{{x:.{num_decimals}f}}}}}"
    else:
        return f"\\num{{{x:.{num_decimals}f}}}"


df = pd.DataFrame(np.array([[1.123456, 2.123456, 3.123456, 4.123456],
                            [11.123456, 22.123456, 33.123456, 44.123456],
                            [111.123456, 222.123456, 333.123456, 444.123456],]),
                   columns=['a', 'b', 'c', 'd'])

col_names = ['a in \\si{\\meter}',
             'b in \\si{\\volt}',
             'c in \\si{\\second}',
             'd']

# Columns to format with maximum condition and 2 floating decimals
max_columns_2f = ['a']

# Columns to format with minimum condition and 2 floating decimals
min_columns_2f = ['b', 'c']

# Columns to format with minimum condition and 4 floating decimals
min_columns_4f = ['d']

fmts_max_2f = {column: partial(bold_formatter, value=df[column].max(), num_decimals=2) for column in max_columns_2f}
fmts_min_2f = {column: partial(bold_formatter, value=df[column].min(), num_decimals=2) for column in min_columns_2f}
fmts_min_4f = {column: partial(bold_formatter, value=df[column].min(), num_decimals=4) for column in min_columns_4f}

fmts = dict(**fmts_max_2f, **fmts_min_2f, **fmts_min_4f)

with open("test_table.tex", "w") as fh:
    df.to_latex(buf=fh,
                index=False,
                header=col_names,
                formatters=fmts,
                escape=False)

Of course, this could be made a bit shorter, but I believe it is still quite readable this way, which makes it easier to adapt.

In your LaTeX code, use

\usepackage{booktabs}
\usepackage{siunitx}

\begin{table}
	\centering
	\caption{Test table.}
	\label{tab:test-table}
	\input{test_table.tex}
\end{table}

This should produce something like this:

Using the siunitx package and wrapping the table numbers in \num{} has the advantage that the number formatting can also be changed globally on the LaTeX side. In particular, the decimal separator can be changed from . to , (as used in many European countries) without having to change the Python part (of course, it could also be done there using the locale module).

matteoguarrera · 2021-04-22T05:00:41Z

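For the min and max, just use a function like this:

min_pandas = df.min(axis=1)  # minimum of each row


def f_tex(x):
    # Bold any value that equals one of the row minima
    if x in min_pandas.values:
        return '\\textbf{' + f'{x:0.2f}' + '}'
    else:
        return f'{x:0.2f}'


# `name` is the output path or buffer for the generated table
df.to_latex(buf=name, bold_rows=True, escape=False,
            formatters=[f_tex] * len(df.columns))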

Hope this could be helpful.
#40422

MaxSchambach · 2021-04-22T06:26:05Z

@matteoguarrera this does not really do what it's supposed to.
If you use my example above, and alter it a little bit to

df = pd.DataFrame(np.array([[1.123456, 2.123456, 1, 4.123456],
                            [11.123456, 22.123456, 11.123456, 11.123456],
                            [111.123456, 222.123456, 111.123456, 444.123456],]),
                   columns=['a', 'b', 'c', 'd'])

your formatter yields

so it's formatting rows and not columns.
Furthermore, comparing against a globally defined list of minima is neither very explicit nor suitable, since it leads to mix-ups between rows, as can be seen below. That is, if a value happens to be the minimum in another column, it will be formatted in bold even though it is not the minimum within the column under consideration. This is why I used `partial` evaluation in my example above. See for example this:

df = pd.DataFrame(np.array([[1, 2, 1, 2],
                            [3, 4, 11, 3],
                            [10, 5, 11, 4],]),
                   columns=['a', 'b', 'c', 'd'])

giving with your formatter

which does not make sense, neither row-, nor column-wise.
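
To make the mix-up concrete, here is a small sketch that only computes which cells each approach would bold, without writing any LaTeX. The variable names (`shared_mask`, `per_column_mask`, `spurious`) are illustrative and do not come from either formatter above.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 1, 2],
                   [3, 4, 11, 3],
                   [10, 5, 11, 4]],
                  columns=["a", "b", "c", "d"])

# Shared lookup: every cell is compared against the pooled row minima.
row_minima = df.min(axis=1).tolist()
shared_mask = df.isin(row_minima)

# Per-column lookup: each cell is compared only with its own column's minimum.
per_column_mask = df.eq(df.min(axis=0))

# Cells bolded by the shared lookup that are not column minima.
spurious = shared_mask & ~per_column_mask
print(spurious)
```

For instance, the 4 in column `b` matches the minimum of the last row, so the shared lookup bolds it, although it is neither the minimum of its own row (3) nor of its own column (2).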

attack68 · 2021-05-01T15:03:18Z

@MaxSchambach
With #40422 (hopefully in 1.3.0, at the end of May) it will be possible to achieve your output with:

df = pd.DataFrame(np.array([[1.123456, 2.123456, 1, 4.123456],
                            [11.123456, 22.123456, 11.123456, 11.123456],
                            [111.123456, 222.123456, 111.123456, 444.123456],]),
                   columns=['a', 'b', 'c', 'd'])
df.style.highlight_min(subset=['a'], props='textbf:--rwrap;')\
        .highlight_max(subset=['b','c','d'], props='textbf:--rwrap').to_latex(hrules=True)

\begin{tabular}{lrrrr}
\toprule
{} & {a} & {b} & {c} & {d} \\
\midrule
0 & \textbf{1.123456} & 2.123456 & 1.000000 & 4.123456 \\
1 & 11.123456 & 22.123456 & 11.123456 & 11.123456 \\
2 & 111.123456 & \textbf{222.123456} & \textbf{111.123456} & \textbf{444.123456} \\
\bottomrule
\end{tabular}

This is of course a specific example, but the entire functionality of Styler for CSS is being reproduced for use with LaTeX.

MaxSchambach · 2021-05-03T06:25:41Z

Yes, looks promising!

asapsmc · 2021-12-09T21:16:44Z

df.style.highlight_min(subset=['a'], props='textbf:--rwrap;')\
        .highlight_max(subset=['b','c','d'], props='textbf:--rwrap').to_latex(hrules=True)

Is this possible in the current version?
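
Since #40422 was merged for the 1.3 milestone, one way to answer this for a given installation is to compare its version string against 1.3. The helper below is a hypothetical sketch, not a pandas function, and it assumes the version string starts with numeric `major.minor` components.

```python
def styler_to_latex_available(version: str) -> bool:
    """Return True if `version` (e.g. the value of pd.__version__) is at least 1.3."""
    # Only the first two components matter; suffixes such as ".0rc1" are ignored.
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (1, 3)


print(styler_to_latex_available("1.2.5"))
print(styler_to_latex_available("1.3.0"))
```

In practice this would be called as `styler_to_latex_available(pd.__version__)` after `import pandas as pd`.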

yanaiela added the Enhancement and Needs Triage labels Dec 6, 2020

arw2019 added the IO LaTeX and API Design labels and removed the Needs Triage and API Design labels Dec 9, 2020

attack68 mentioned this issue May 9, 2021

ENH: Styler.to_latex(): conditional styling with native latex format #40422

Merged

15 tasks

jreback added this to the 1.3 milestone May 21, 2021

jreback closed this as completed in #40422 May 24, 2021
