BUG: Pyarrow dependency warning starts with newline which makes it impossible to filter out by message with -W or PYTHONWARNINGS #57082

Closed
3 tasks done
lesteve opened this issue Jan 26, 2024 · 16 comments
Labels
Arrow pyarrow functionality Blocker Blocking issue or pull request for an upcoming release Bug Deprecate Functionality to remove in pandas Warnings Warnings that appear or should be added to pandas

@lesteve
Contributor

lesteve commented Jan 26, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Issue Description

I would expect that the Pyarrow warning is easily ignorable by message i.e. something like the following does not produce any warning:

python -W 'ignore:Pyarrow:DeprecationWarning' -c 'import pandas'

The issue is that because the warning starts with a newline, it cannot be targeted by message. This seems to be the case for any warning that starts with some kind of whitespace character.

I tried a variety of things like -W 'ignore:\nPyarrow:DeprecationWarning' but I could not make it work. I think the reason is the "ignoring any whitespace at the start or end of message" from the warnings documentation:

In -W and PYTHONWARNINGS, message is a literal string that the start of the warning message must contain (case-insensitively), ignoring any whitespace at the start or end of message.
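The behavior described in that excerpt can be reproduced in-process. The sketch below is a simplified mimic, not CPython's actual option parser: the helper name `is_silenced` and the shortened message strings are assumptions made for illustration. It strips the option's message, escapes it as a literal, and installs it with `warnings.filterwarnings`, which matches its pattern at the start of the warning text.

```python
import re
import warnings

# Assumption: shortened stand-ins for the pandas 2.2.0 warning text.
WITH_NEWLINE = "\nPyarrow will become a required dependency of pandas"
WITHOUT_NEWLINE = "Pyarrow will become a required dependency of pandas"


def is_silenced(option_message, warning_text):
    """Return True if a -W style literal message would silence warning_text.

    Hypothetical helper: it mimics -W by stripping surrounding whitespace
    from the option's message and escaping it so it is treated literally.
    """
    with warnings.catch_warnings(record=True) as caught:
        # Record everything that is not explicitly ignored.
        warnings.simplefilter("always")
        # Inserted in front of the "always" filter, so it is checked first.
        warnings.filterwarnings(
            "ignore",
            message=re.escape(option_message.strip()),
            category=DeprecationWarning,
        )
        warnings.warn(warning_text, DeprecationWarning)
    return len(caught) == 0


print(is_silenced("Pyarrow", WITHOUT_NEWLINE))
print(is_silenced("Pyarrow", WITH_NEWLINE))
print(is_silenced("\nPyarrow", WITH_NEWLINE))
```

Because the option's message is stripped before matching, adding a newline on the filter side does not help: the leading newline in the warning text itself is what prevents the prefix match.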

In scikit-learn tests, warnings are turned into errors with pytest -W, and we would like to use -W as well to ignore the Pyarrow dependency warning for now. A possible workaround is to use filterwarnings, which accepts a regex (see the Note in https://docs.pytest.org/en/stable/how-to/capture-warnings.html#controlling-warnings for more details), but being able to use -W or PYTHONWARNINGS seems desirable, e.g. to control warnings from the command line outside of pytest.

Expected Behavior

There is an easy way with -W or PYTHONWARNINGS to ignore the Pyarrow DeprecationWarning e.g.

python -W 'ignore:Pyarrow:DeprecationWarning' -c 'import pandas'

Installed Versions

INSTALLED VERSIONS
------------------
commit : f538741
python : 3.11.4.final.0
python-bits : 64
OS : Linux
OS-release : 6.7.0-arch3-1
Version : #1 SMP PREEMPT_DYNAMIC Sat, 13 Jan 2024 14:37:14 +0000
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.0
numpy : 1.26.3
pytz : 2023.3
dateutil : 2.8.2
setuptools : 68.0.0
pip : 23.2
Cython : 0.29.35
pytest : 7.4.0
hypothesis : None
sphinx : 6.0.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.18.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.0
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.12.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : 0.21.0
tzdata : 2023.3
qtpy : None
pyqt5 : None

@lesteve lesteve added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 26, 2024
@jorenham

This works:
(?s).*Pyarrow will become a required dependency of pandas:DeprecationWarning

@lesteve
Contributor Author

lesteve commented Jan 26, 2024

python -W and PYTHONWARNINGS do not handle regexes (see the link to the Python documentation above), so no, this does not work:

❯ python -W 'ignore:(?s).*Pyarrow:DeprecationWarning' -c 'import pandas' 
<string>:1: DeprecationWarning: 
Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),
(to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)
but was not found to be installed on your system.
If this would cause problems for you,
please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466

As the doc says, regexes are accepted by warnings.filterwarnings, and your regex likely works with warnings.filterwarnings.
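A quick way to check that claim is to pass the regex to `warnings.filterwarnings` directly. This is a minimal sketch: the helper name `captured_with` and the shortened message are assumptions for illustration, and pandas itself is not imported.

```python
import warnings

# Assumption: shortened stand-in for the pandas 2.2.0 warning text,
# including the leading newline discussed in this issue.
PYARROW_MESSAGE = (
    "\nPyarrow will become a required dependency of pandas "
    "in the next major release of pandas (pandas 3.0)"
)


def captured_with(pattern):
    """Emit the stand-in warning with an 'ignore' filter for `pattern` installed.

    Returns the list of warning messages that were NOT ignored.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # Here `message` is a regular expression, unlike with -W.
        warnings.filterwarnings("ignore", message=pattern, category=DeprecationWarning)
        warnings.warn(PYARROW_MESSAGE, DeprecationWarning)
    return [str(w.message) for w in caught]


# Regex from the comment above: (?s) lets "." also match the leading newline.
print(captured_with(r"(?s).*Pyarrow will become a required dependency of pandas"))
# A shorter alternative that only skips leading whitespace.
print(captured_with(r"\s*Pyarrow"))
# A plain prefix does not match because of the leading newline.
print(captured_with("Pyarrow"))
```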

@jorenham

Ah I didn't know that. I assumed it would, since it did work within my pytest config.

@lesteve
Contributor Author

lesteve commented Jan 26, 2024

Yeah, filterwarnings accepts a regex, contrary to -W. Quite a quirk, which I did not know about either until very recently. This is mentioned in the pytest doc: https://docs.pytest.org/en/stable/how-to/capture-warnings.html#controlling-warnings

@rhshadrach rhshadrach added Deprecate Functionality to remove in pandas Warnings Warnings that appear or should be added to pandas Arrow pyarrow functionality and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 26, 2024
@rhshadrach rhshadrach added this to the 2.2.1 milestone Jan 26, 2024
@rhshadrach
Member

Thanks for the report, a linked PR is already up to resolve this.

@lesteve
Contributor Author

lesteve commented Jan 26, 2024

Nice, thanks! This is #57003 indeed. I looked for an open PR but didn't find it somehow ...

@rhshadrach
Member

Just to be sure, I stated "already up" as more of a celebration and not an indication this shouldn't have been reported. I would only expect searching the issue tracker and not PRs.

@lesteve
Contributor Author

lesteve commented Jan 26, 2024

Yep that's the way I understood it! I am quite happy someone had a similar issue as me and took the time to open a PR in pandas with some additional explanations.

To me, it seems quite a weird Python quirk that it is not possible to filter out a message starting with whitespace using -W.

@rhshadrach rhshadrach added the Blocker Blocking issue or pull request for an upcoming release label Feb 7, 2024
@rhshadrach
Copy link
Member

@lithomas1 - I've marked this as a blocker for 2.2.1. Either we should remove the newline, or the entire deprecation, but either way I think this issue needs to be resolved for 2.2.1.

@lithomas1
Member

Yep, this is on my radar - was planning to bring it up at the meeting next week.

@mroeschke
Member

Preemptively +100 to remove the deprecation warning in 2.2.1

@mimizone

mimizone commented Feb 9, 2024

Could you also remove the entire GitHub link in the message or at least the https:?
The colon is a problem because the warnings module splits the filter specification on colons only. So it's actually impossible to use PYTHONWARNINGS anyway.
https://github.com/python/cpython/blob/5914a211ef5542edd1f792c2684e373a42647b04/Lib/warnings.py#L221

for example, you can see the problem here:
PYTHONWARNINGS=ignore:'message\nhttps:notacategory' python
Invalid -W option ignored: unknown warning category: 'notacategory'

It's easier to change this message than to ask CPython to fix it, I believe.
I will still make a bug report on CPython.
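The splitting problem can be illustrated with a small sketch. This is a simplified mimic of how a -W or PYTHONWARNINGS entry is broken into fields, not the code from the linked `warnings.py`; the function name `split_warning_option` and the example specs are assumptions for illustration.

```python
FIELD_NAMES = ["action", "message", "category", "module", "lineno"]


def split_warning_option(spec):
    """Split one filter spec on ':' into its (up to five) named fields.

    Hypothetical helper: there is no escaping mechanism, so any ':' inside
    the message starts a new field.
    """
    fields = [part.strip() for part in spec.split(":")]
    if len(fields) > len(FIELD_NAMES):
        raise ValueError(f"too many fields (max 5): {spec!r}")
    # Missing trailing fields are treated as empty.
    fields += [""] * (len(FIELD_NAMES) - len(fields))
    return dict(zip(FIELD_NAMES, fields))


# The example from the comment above: "https" ends the message field and
# "notacategory" lands in the category slot.
print(split_warning_option("ignore:message\nhttps:notacategory"))

# With a URL in the message, the rest of the URL becomes the category.
print(split_warning_option("ignore:feedback at https://example.invalid:DeprecationWarning"))
```

In both cases the text after the colon in "https:" is read as the category, which is why a message containing a URL cannot be written out in full in a filter spec.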

@lesteve
Contributor Author

lesteve commented Feb 9, 2024

@mimizone you probably know this already but you don't have to use the full message, it only needs to match at the beginning. Since "https://" is towards the end of the message, I don't think it matters.

For example if there was not the newline at the start you could ignore the message like this:

PYTHONWARNINGS='ignore:Pyarrow:DeprecationWarning'

or

PYTHONWARNINGS='ignore:Pyarrow will become a required dependency:DeprecationWarning'
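To check this prefix behavior end to end, the sketch below launches a child interpreter with PYTHONWARNINGS set and inspects its stderr. The helper name `stderr_for` and the shortened message are assumptions for illustration; the child emits its own DeprecationWarning instead of importing pandas.

```python
import os
import subprocess
import sys

# Child program: raise a DeprecationWarning whose text is taken from argv.
CHILD_CODE = "import sys, warnings; warnings.warn(sys.argv[1], DeprecationWarning)"

# Only the start of the message is given; the URL near the end is not needed.
FILTER = "ignore:Pyarrow will become a required dependency:DeprecationWarning"


def stderr_for(message, filter_spec):
    """Run a child interpreter with PYTHONWARNINGS set and return its stderr."""
    env = dict(os.environ, PYTHONWARNINGS=filter_spec)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, message],
        env=env,
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stderr


# Assumption: shortened stand-ins for the pandas warning text.
without_newline = stderr_for("Pyarrow will become a required dependency of pandas", FILTER)
with_newline = stderr_for("\nPyarrow will become a required dependency of pandas", FILTER)

print("silenced without leading newline:", "DeprecationWarning" not in without_newline)
print("silenced with leading newline:", "DeprecationWarning" not in with_newline)
```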

@mimizone

mimizone commented Feb 9, 2024

thanks @lesteve for clarifying this point

@mimizone

In the meantime, in case somebody is interested in filtering this in the context of Jupyter: I did it using the IPython config, which works well in my context.
https://discourse.jupyter.org/t/hide-deprecationwarning-via-the-server-config/23864/5?u=jhuylebroeck

@mroeschke
Member

The warning ended up being removed in #57556 so closing. The warning will not appear in 2.2.1

6 participants