ENH: Adding highlighting options to to_latex function #38328

Closed
yanaiela opened this issue Dec 6, 2020 · 9 comments · Fixed by #40422

@yanaiela

yanaiela commented Dec 6, 2020

Following #3196, it would be extremely useful to be able to highlight the best value in each row or column of a DataFrame when converting it into a LaTeX table (the to_latex function).

A convenient API for that would be to add a parameter to the to_latex() function, such as highlight_rows=TYPE, where TYPE can be bold|italics|..., and then each row's best value would be highlighted.

There could also be scenarios where the best value is the lowest one, so maybe there should be an additional parameter, best_highlight: str, taking one of high|low.
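A minimal sketch of what the proposed parameters might do, written as a standalone helper rather than as part of to_latex. The function name `highlight_rows`, its `style`, `best` and `fmt` arguments, and the `LATEX_COMMANDS` mapping are all hypothetical; they only mirror the highlight_rows and best_highlight parameters suggested above and are not existing pandas API.

```python
import pandas as pd

# Hypothetical mapping from the proposed TYPE values to LaTeX commands.
LATEX_COMMANDS = {"bold": "textbf", "italics": "textit"}


def highlight_rows(df, style="bold", best="high", fmt="{:.2f}"):
    """Return a string DataFrame with each row's best value wrapped in LaTeX."""
    command = LATEX_COMMANDS[style]
    # Best value per row: the maximum for "high", the minimum for "low".
    best_per_row = df.max(axis=1) if best == "high" else df.min(axis=1)
    # Boolean mask marking the cells equal to their row's best value.
    is_best = df.eq(best_per_row, axis=0)
    plain = df.apply(lambda col: col.map(fmt.format))
    wrapped = plain.apply(lambda col: "\\" + command + "{" + col + "}")
    # Keep the wrapped text where the mask is True, the plain text elsewhere.
    return wrapped.where(is_best, plain)


# Illustrative data only, not taken from the issue.
scores = pd.DataFrame(
    {"acc": [0.75, 0.65, 0.68], "f1": [0.70, 0.72, 0.66]},
    index=["model1", "model2", "model3"],
)
highlighted = highlight_rows(scores, style="bold", best="high")
print(highlighted)
```

The returned frame contains plain strings, so it could then be written out with to_latex(escape=False) to keep the backslashes intact.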

yanaiela added the Enhancement and Needs Triage labels Dec 6, 2020
arw2019 added the IO LaTeX and API Design labels and removed the Needs Triage and API Design labels Dec 9, 2020
@arw2019
Member

arw2019 commented Dec 9, 2020

Looking through the discussion in #3196 it seems like there was interest in doing things like this but nobody found time to do the legwork.

Would you be interested in submitting a PR?

@ivanovmg
Member

ivanovmg commented Dec 10, 2020

@yanaiela, it would be great if you would post the expected output latex code here.
It would let us better understand the amount of work required.

From my perspective, highlighting the rows is quite doable.
However, how do we need to handle multiindex? Or multicolumn index?
In my opinion, this is the most complicated part.

You refer to the best value. But what would it be? Should there be a function passed to define it?

Therefore, please provide the latex code, or even better suggest some tests.

@yanaiela
Author

Hey @arw2019, @ivanovmg, sorry for the late reply.

I'd be happy to get this functionality first even without handling multiindex or multicolumn. Different people may have different functionalities in mind, but for me I'd say the multiindex should be highlighted based on the best value in each of these indices.

For the best value, as I mentioned, I think it can be passed as an argument, since it can be based on either small or large values (but I can imagine other scenarios). So maybe the best way would be to support the more common usage of min and max (from my perspective) but also allow passing a function that will be used instead?
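One way to support both the common min/max cases and a user-supplied function, sketched under the assumption that the selector receives a column (a Series) and returns the value to highlight. The helper names `resolve_best` and `bold_best` are hypothetical and not part of pandas.

```python
import pandas as pd


def resolve_best(best):
    """Turn "min", "max", or a callable into a function Series -> best value."""
    if callable(best):
        return best
    if best in ("min", "max"):
        return lambda column: getattr(column, best)()
    raise ValueError(f"best must be 'min', 'max', or a callable, got {best!r}")


def bold_best(column, best="max", fmt="{:.2f}"):
    """Format one column as strings, wrapping its best value in \\textbf{}."""
    target = resolve_best(best)(column)
    return column.map(
        lambda x: "\\textbf{" + fmt.format(x) + "}" if x == target else fmt.format(x)
    )


vals = pd.Series([0.75, 0.65, 0.68], index=["model1", "model2", "model3"])
print(bold_best(vals, best="max").tolist())
print(bold_best(vals, best="min").tolist())
# A custom rule (assumption for illustration): highlight the value closest to 0.7.
print(bold_best(vals, best=lambda s: s.loc[(s - 0.7).abs().idxmin()]).tolist())
```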

For an example:

If currently a latex table would look like:

\begin{tabular}{lr}
\toprule
        types &  vals \\
\midrule
     model1 & 0.75 \\
     model2 & 0.65 \\
     model3 & 0.68 \\
\bottomrule
\end{tabular}

to_latex(highlight_rows='bold', best_highlight='max')
would create the following:

\begin{tabular}{lr}
\toprule
        types &  vals \\
\midrule
     model1 & \textbf{0.75} \\
     model2 & 0.65 \\
     model3 & 0.68 \\
\bottomrule
\end{tabular}

@MaxSchambach

For anyone that came here like me looking for a temporary solution:
I came up with the following, which could be a little shorter, but I think it should get people started with further conditional formatting:

On the Python side:

from functools import partial

import pandas as pd
import numpy as np


def bold_formatter(x, value, num_decimals=2):
    """Format a number in bold when (almost) identical to a given value.
    
    Args:
        x: Input number.
        
        value: Value to compare x with.
        
        num_decimals: Number of decimals to use for output format.

    Returns:
        String converted output.

    """
    # Consider values equal, when rounded results are equal
    # otherwise, it may look surprising in the table where they seem identical
    if round(x, num_decimals) == round(value, num_decimals):
        return f"{{\\bfseries\\num{{{x:.{num_decimals}f}}}}}"
    else:
        return f"\\num{{{x:.{num_decimals}f}}}"


df = pd.DataFrame(np.array([[1.123456, 2.123456, 3.123456, 4.123456],
                            [11.123456, 22.123456, 33.123456, 44.123456],
                            [111.123456, 222.123456, 333.123456, 444.123456],]),
                   columns=['a', 'b', 'c', 'd'])

col_names = ['a in \\si{\\meter}',
             'b in \\si{\\volt}',
             'c in \\si{\\second}',
             'd']

# Columns to format with maximum condition and 2 floating decimals
max_columns_2f = ['a']

# Columns to format with minimum condition and 2 floating decimals
min_columns_2f = ['b', 'c']

# Columns to format with minimum condition and 4 floating decimals
min_columns_4f = ['d']

fmts_max_2f = {column: partial(bold_formatter, value=df[column].max(), num_decimals=2) for column in max_columns_2f}
fmts_min_2f = {column: partial(bold_formatter, value=df[column].min(), num_decimals=2) for column in min_columns_2f}
fmts_min_4f = {column: partial(bold_formatter, value=df[column].min(), num_decimals=4) for column in min_columns_4f}

fmts = dict(**fmts_max_2f, **fmts_min_2f, **fmts_min_4f)

with open("test_table.tex", "w") as fh:
    df.to_latex(buf=fh,
                index=False,
                header=col_names,
                formatters=fmts,
                escape=False)

Of course, this could be made a bit shorter. However, I believe it is still pretty readable this way, which I think makes it easier to adapt.
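One way the setup above might be shortened, as a sketch only: a single dictionary maps each column to its condition and number of decimals, and the formatters are built in one comprehension. The `spec` dictionary is a hypothetical convenience introduced here, and `bold_formatter` is restated in condensed form so the snippet runs on its own.

```python
from functools import partial

import pandas as pd


def bold_formatter(x, value, num_decimals=2):
    """Wrap x in \\bfseries when it rounds to the same number as value."""
    text = f"\\num{{{x:.{num_decimals}f}}}"
    if round(x, num_decimals) == round(value, num_decimals):
        return f"{{\\bfseries{text}}}"
    return text


df = pd.DataFrame(
    [[1.123456, 2.123456, 3.123456, 4.123456],
     [11.123456, 22.123456, 33.123456, 44.123456],
     [111.123456, 222.123456, 333.123456, 444.123456]],
    columns=["a", "b", "c", "d"],
)

# Hypothetical spec: column -> (aggregation name, number of decimals).
spec = {"a": ("max", 2), "b": ("min", 2), "c": ("min", 2), "d": ("min", 4)}

# One formatter per column, each bound to that column's own extreme value.
fmts = {
    column: partial(bold_formatter, value=df[column].agg(how), num_decimals=n)
    for column, (how, n) in spec.items()
}

print(fmts["a"](111.123456))
print(fmts["d"](4.123456))
```

The resulting `fmts` dictionary can be passed as formatters=fmts in the same to_latex call as above.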

In your LaTeX code, use

\usepackage{booktabs}
\usepackage{siunitx}

\begin{table}
	\centering
	\caption{Test table.}
	\label{tab:test-table}
	\input{test_table.tex}
\end{table}

This should produce something like this:

[Image: test_table]

Using the siunitx package and wrapping the table numbers in \num{} has the advantage that the number formatting can also be changed globally on the LaTeX side. In particular, the decimal separator can be changed from . to , (as used in many European countries) without having to change the Python part. (Of course, it could also be done there using the locale module.)

@matteoguarrera

matteoguarrera commented Apr 22, 2021

For the min and max, just use a function like this:

min_pandas = df.min(axis=1)


def f_tex(x):
    if x in min_pandas.values:
        return f'\\textbf{{{x:0.2f}}}'
    return f'{x:0.2f}'


# `name` is the output path or buffer; it is not defined in this snippet
df.to_latex(buf=name, bold_rows=True, escape=False,
            formatters=[f_tex] * len(df.columns))

Hope this could be helpful.
#40422

@MaxSchambach

MaxSchambach commented Apr 22, 2021

@matteoguarrera this does not really do what it's supposed to.
If you use my example above, and alter it a little bit to

df = pd.DataFrame(np.array([[1.123456, 2.123456, 1, 4.123456],
                            [11.123456, 22.123456, 11.123456, 11.123456],
                            [111.123456, 222.123456, 111.123456, 444.123456],]),
                   columns=['a', 'b', 'c', 'd'])

your formatter yields
[Image: Screenshot from 2021-04-22 08-23-13_2]

so it's formatting rows and not columns.
Furthermore, comparing against a globally defined list of minima is neither very explicit nor suitable, since it leads to mix-ups between rows, as can be seen below. That is, if a value happens to be a minimum elsewhere in the table, it will be formatted in bold even though it is not a minimum within the column under consideration. This is why I used partial evaluation (functools.partial) in my example above. See for example this:

df = pd.DataFrame(np.array([[1, 2, 1, 2],
                            [3, 4, 11, 3],
                            [10, 5, 11, 4],]),
                   columns=['a', 'b', 'c', 'd'])

giving with your formatter
[Image: Screenshot from 2021-04-22 08-34-16]

which does not make sense, neither row-, nor column-wise.
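The mix-up can be reproduced without any LaTeX at all. The sketch below builds two boolean masks for the small integer frame above: one mimics the global lookup (is the value found anywhere among the row minima?), the other compares each cell only against the minimum of its own column. The variable names are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 1, 2], [3, 4, 11, 3], [10, 5, 11, 4]],
    columns=["a", "b", "c", "d"],
)

# Global lookup: a cell is marked if its value appears anywhere among the row minima.
# The minima are converted to a list so that isin() matches by value, not by index.
row_minima = df.min(axis=1).tolist()
global_mask = df.isin(row_minima)

# Per-column comparison: a cell is marked only if it is the minimum of its own column.
column_mask = df.eq(df.min(axis=0), axis=1)

print(global_mask)
print(column_mask)
```

In the first mask the 4 in row 1, column b is marked, although it is neither the minimum of its row (3) nor of its column (2); it only matches because 4 is the minimum of row 2.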

@attack68
Contributor

attack68 commented May 1, 2021

@MaxSchambach
With #40422 (hopefully for 1.3.0 end of May) it will be possible to achieve your output with:

df = pd.DataFrame(np.array([[1.123456, 2.123456, 1, 4.123456],
                            [11.123456, 22.123456, 11.123456, 11.123456],
                            [111.123456, 222.123456, 111.123456, 444.123456],]),
                   columns=['a', 'b', 'c', 'd'])
df.style.highlight_min(subset=['a'], props='textbf:--rwrap;')\
        .highlight_max(subset=['b','c','d'], props='textbf:--rwrap').to_latex(hrules=True)

\begin{tabular}{lrrrr}
\toprule
{} & {a} & {b} & {c} & {d} \\
\midrule
0 & \textbf{1.123456} & 2.123456 & 1.000000 & 4.123456 \\
1 & 11.123456 & 22.123456 & 11.123456 & 11.123456 \\
2 & 111.123456 & \textbf{222.123456} & \textbf{111.123456} & \textbf{444.123456} \\
\bottomrule
\end{tabular}

This is of course a specific example, but the entire functionality of Styler for CSS is being reproduced for use with LaTeX.

@MaxSchambach

Yes, looks promising!

@asapsmc

asapsmc commented Dec 9, 2021

df.style.highlight_min(subset=['a'], props='textbf:--rwrap;')\
        .highlight_max(subset=['b','c','d'], props='textbf:--rwrap').to_latex(hrules=True)

Is this possible in the current version?
