BUG: 1.3.0 styler.render() throws exception when using highlight_max/highlight_min (code worked with pandas 1.2.5) #42466

Closed
2 of 3 tasks
rendner opened this issue Jul 9, 2021 · 6 comments
Labels
Bug Needs Triage Issue that has not been reviewed by a pandas team member

Comments

@rendner
Contributor

rendner commented Jul 9, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample

These are minimal examples which throw an exception. The same problem exists when you use highlight_min instead of highlight_max.

import pandas as pd

# throws a TypeError("'>=' not supported between instances of 'int' and 'str'")
# pd.DataFrame({'col_0': [0, 'a']}).style.highlight_max().render()

# throws a TypeError("'>=' not supported between instances of 'Timestamp' and 'int'")
# pd.DataFrame({'col_0': [pd.to_datetime('1/1/2000'), 0]}).style.highlight_max().render()

# throws a ValueError('The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()')
pd.DataFrame({'col_0': [pd.period_range(start='2000-01-01', end='2000-01-02')]}).style.highlight_max().render()

Problem description

All examples worked in pandas 1.2.5 without throwing an exception.

Expected Output

The code doesn't throw an exception.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : f00ed8f
python : 3.8.10.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-77-generic
Version : #86-Ubuntu SMP Thu Jun 17 02:35:03 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.3.0
numpy : 1.21.0
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.2
setuptools : 57.0.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

@rendner rendner added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 9, 2021
@rendner rendner changed the title BUG: 1.3.0 styler.highlight_max/styler.highlight_min throw exceptions for code which worked with pandas 1.2.5 BUG: 1.3.0 styler.render() throws exception when using highlight_max/styler.highlight_min (code worked with pandas 1.2.5) Jul 9, 2021
@rendner rendner changed the title BUG: 1.3.0 styler.render() throws exception when using highlight_max/styler.highlight_min (code worked with pandas 1.2.5) BUG: 1.3.0 styler.render() throws exception when using highlight_max/highlight_min (code worked with pandas 1.2.5) Jul 9, 2021
@attack68
Contributor

attack68 commented Jul 9, 2021

Do you mean this:

# throws a ValueError('The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()')
pd.DataFrame({'col_0': [pd.period_range(start='2000-01-01', end='2000-01-02')]}).style.highlight_max().render()

or do you mean this (notice no list):

pd.DataFrame({'col_0': pd.period_range(start='2000-01-01', end='2000-01-02')}).style.highlight_max().render()

which works just fine?

@attack68
Contributor

attack68 commented Jul 9, 2021

Additionally, it is not Styler throwing the error; it is numpy.nanmax:

import numpy
import pandas as pd

df = pd.DataFrame({'col_0': [0, 'a']})
numpy.nanmax(df["col_0"])
# TypeError: '>=' not supported between instances of 'int' and 'str'

@rendner
Contributor Author

rendner commented Jul 10, 2021

Additionally it is not Styler throwing the error it is numpy.nanmax:

That's correct.

or do you mean this (notice no list):
pd.DataFrame({'col_0': pd.period_range(start='2000-01-01', end='2000-01-02')}).style.highlight_max().render()
which works just fine?

I tried to simplify the problem, my original was (which runs into the same error):

import pandas as pd

# throws a ValueError('The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()')
pd.DataFrame({'col_0': [pd.to_datetime('1/1/2000'), pd.to_timedelta(0, unit='d'), pd.period_range(start='2000-01-01', end='2000-01-02')]}).style.highlight_max().render()

I did some research. Version 1.2.5 also uses numpy.nanmax:
https://github.com/pandas-dev/pandas/blob/v1.2.5/pandas/io/formats/style.py#L1547
But there was an additional step before:
https://github.com/pandas-dev/pandas/blob/v1.2.5/pandas/io/formats/style.py#L1531

In 1.3.0 the maybe_numeric_slice call is no longer present:
https://github.com/pandas-dev/pandas/blob/v1.3.0/pandas/io/formats/style.py#L2277-L2286
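A minimal sketch of the kind of numeric-only preselection described above, assuming it amounts to a dtype filter that is applied only when no subset is passed. `numeric_only_subset` is a hypothetical name used for illustration; it is not the pandas helper itself, and the linked 1.2.5 source remains the reference for the real behavior.

```python
import numpy as np
import pandas as pd


def numeric_only_subset(df, subset=None):
    """Hypothetical stand-in for a numeric-only preselection step.

    Fills in a default selection only when the caller passed no subset.
    """
    if subset is None:
        numeric_cols = df.select_dtypes(include=[np.number]).columns
        subset = pd.IndexSlice[:, numeric_cols]
    return subset


df = pd.DataFrame({"col_0": [0, "a"], "col_1": [1, 2]})

# No subset given: the object-dtype column is left out of the selection,
# so a max comparison would never touch the mixed int/str values.
_, default_cols = numeric_only_subset(df)
print(list(default_cols))

# Explicit subset: passed through unchanged, mixed column included.
explicit = numeric_only_subset(df, subset=["col_0"])
print(explicit)
```

Under this assumption, a mixed-type column only escapes the comparison when the subset is left at its default; an explicitly requested column is still compared.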

@attack68
Contributor

I don't believe this is a bug, or that your expected output is warranted.

Your original code works in 1.2.5 simply because maybe_numeric_slice detects that the dtype is not in [np.number], so highlight_max is not applied to col_0, but it was never documented that this took place. You cannot highlight anything in col_0 because it makes no sense to compare the different object types.

Your request comes down to either:

  • pandas handling errors silently when a column with incompatible dtypes is present for a less-than comparison.
  • or pandas raising this error to show that the method fails and why.

I prefer the second for two reasons.

  1. There is a subset argument which allows the user to select the columns (or rows) upon which the function should work, reacting to an error message if necessary.
  2. In 1.2.5 the following did nothing:
df = pd.DataFrame([['a', 'b'], ['c', 'd']])
df.style.highlight_max().render()  # 1.2.5 pre-selects only numeric columns, so nothing is highlighted

In 1.3.0 there is now a correct comparison made and the maximum of strings is highlighted.

df.style.highlight_max().render()  # highlights 'c' and 'd'

I think it is more valuable for the method to work where it should than to avoid errors where it can't.
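One way to use the subset argument in practice is to work out which columns can be compared before styling. The sketch below assumes that a trial numpy.nanmax call per column is an acceptable check; `comparable_columns` and `highlight_comparable_max` are hypothetical helper names, not part of pandas.

```python
import numpy as np
import pandas as pd


def comparable_columns(df):
    """Return labels of columns whose values can be max-compared (hypothetical helper)."""
    ok = []
    for label in df.columns:
        try:
            np.nanmax(df[label].to_numpy())
        except (TypeError, ValueError):
            # Mixed, non-orderable values: leave this column out of the subset.
            continue
        ok.append(label)
    return ok


df = pd.DataFrame({"mixed": [0, "a"], "nums": [1, 2], "text": ["c", "d"]})
cols = comparable_columns(df)
print(cols)


def highlight_comparable_max(df):
    """Apply highlight_max only to the comparable columns (Styler needs jinja2)."""
    return df.style.highlight_max(subset=comparable_columns(df))
```

Unlike a numeric-only filter, this keeps the all-string column in the subset, so its maximum can still be highlighted while the mixed int/str column is skipped.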

@rendner
Contributor Author

rendner commented Jul 10, 2021

Thank you. I understand the reasons and can live with the changed, and now more consistent, behavior, since maybe_numeric_slice only adjusted the subset if it was None. I was just a bit irritated because I couldn't find any hint about the changed behavior in the release notes.

@rendner rendner closed this as completed Jul 10, 2021
@attack68
Contributor

I was just a bit irritated because I couldn't find any hint about the changed behavior in the release notes.

Fair point, was my bad, must have missed it when I made the changes.


2 participants