Pandas DataFrames with a MultiIndex fail to be converted to a Polars DataFrame #18130

Closed
mwouts opened this issue Aug 10, 2024 · 4 comments · Fixed by #18133
Labels: A-interop-pandas (Area: interoperability with pandas), bug (Something isn't working), python (Related to Python Polars), regression (Issue introduced by a new release)

Comments

@mwouts

mwouts commented Aug 10, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import pandas as pd
import polars as pl

df = pd.DataFrame(
    range(4),
    columns=['A'],
    index=pd.MultiIndex.from_product((["C", "D"], [3, 4])),
)

pl.from_pandas(df)

Log output

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/marc/miniconda3/envs/itables/lib/python3.11/site-packages/polars/convert/general.py", line 570, in from_pandas
    pandas_to_pydf(
  File "/home/marc/miniconda3/envs/itables/lib/python3.11/site-packages/polars/_utils/construction/dataframe.py", line 1078, in pandas_to_pydf
    _check_pandas_columns(data)
  File "/home/marc/miniconda3/envs/itables/lib/python3.11/site-packages/polars/_utils/construction/dataframe.py", line 1059, in _check_pandas_columns
    raise ValueError(msg)
ValueError: Pandas dataframe contains non-unique indices and/or column names. Polars dataframes require unique string names for columns.
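
The message says Polars requires "unique string names", which suggests the check compares how many distinct names remain after converting each name to a string. The minimal pure-Python sketch below assumes exactly that; `has_duplicate_names` is a hypothetical helper, not Polars' internal code. It shows why two unnamed index levels trip such a check: `str(None)` is `"None"` for both levels, so they collapse into one entry.

```python
from collections.abc import Hashable, Sequence


def has_duplicate_names(names: Sequence[Hashable]) -> bool:
    """Hypothetical check: True if stringifying the names produces collisions."""
    # Distinct objects (e.g. None and None, or 1 and "1") can map to the same string.
    stringified = {str(name) for name in names}
    return len(stringified) < len(names)


# The reproducible example has one column, 'A', and two unnamed index levels.
column_names = ["A"]
index_names = [None, None]  # what df.index.names returns for this frame

print(has_duplicate_names(column_names))  # False
print(has_duplicate_names(index_names))   # True: both levels become "None"
```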

Issue description

The code above runs with polars==1.1.0 but fails with versions 1.2.0 through 1.4.1.

Expected behavior

I expect to be able to call pl.from_pandas(df) on a DataFrame with a MultiIndex.

Installed versions

>>> pl.show_versions()
--------Version info---------
Polars:               1.4.1
Index type:           UInt32
Platform:             Linux-6.5.0-35-generic-x86_64-with-glibc2.35
Python:               3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
great_tables:         <not installed>
hvplot:               <not installed>
matplotlib:           3.8.4
nest_asyncio:         1.6.0
numpy:                1.26.4
openpyxl:             <not installed>
pandas:               2.2.2
pyarrow:              14.0.2
pydantic:             <not installed>
pyiceberg:            <not installed>
sqlalchemy:           2.0.30
torch:                <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>
mwouts added the bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), and python (Related to Python Polars) labels Aug 10, 2024
mwouts added a commit to mwouts/itables that referenced this issue Aug 10, 2024
mwouts added a commit to mwouts/itables that referenced this issue Aug 10, 2024
MarcoGorelli added the regression (Issue introduced by a new release), bug (Something isn't working), and A-interop-pandas (Area: interoperability with pandas) labels and removed the needs triage (Awaiting prioritization by a maintainer) and bug (Something isn't working) labels Aug 10, 2024
@ritchie46
Member

@MarcoGorelli could you take a look at this one? Might be related to our own from_pandas logic?

@MarcoGorelli
Collaborator

Hey, it looks like this was caused by #17628.

@MarcoGorelli
Collaborator

The index in this case has a name of None for both levels:

In [4]: df.index.names
Out[4]: FrozenList([None, None])

So, if include_index=True had been passed, I'd agree with the error message.

But given that the default is include_index=False, I think this should pass, so this is a bug.
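
The reasoning above amounts to checking index level names only when the index will actually become columns. Below is a sketch of such a gated check, assuming names are compared after stringification. `check_pandas_names` is a hypothetical function, and this is not necessarily how #18133 implements the fix.

```python
from collections.abc import Hashable, Sequence


def check_pandas_names(
    columns: Sequence[Hashable],
    index_names: Sequence[Hashable],
    *,
    include_index: bool = False,
) -> None:
    """Raise ValueError if the names that would become Polars columns collide.

    Hypothetical sketch: index level names are only checked when
    include_index=True, because otherwise the index is dropped.
    """
    names = list(columns)
    if include_index:
        names.extend(index_names)
    # Compare distinct stringified names against the total count.
    if len({str(name) for name in names}) < len(names):
        raise ValueError("non-unique column names after stringification")


# Default include_index=False: the unnamed MultiIndex is ignored, so no error.
check_pandas_names(["A"], [None, None])

# include_index=True: both unnamed levels stringify to "None" and collide.
try:
    check_pandas_names(["A"], [None, None], include_index=True)
except ValueError as exc:
    print(exc)
```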

@mwouts
Author

mwouts commented Aug 11, 2024

Thank you all for the super fast resolution! Much appreciated!
