bug: ibis raises with Polars Array #10244

Open
1 task done
MarcoGorelli opened this issue Sep 27, 2024 · 3 comments · May be fixed by #10260
Labels
bug Incorrect behavior inside of ibis

Comments

MarcoGorelli commented Sep 27, 2024

What happened?

In [3]: import polars as pl

In [4]: import ibis

In [5]: ibis.memtable(pl.DataFrame({'a': [[1, 2], [3, 4]]}, schema={'a': pl.Array(pl.Int64, 2)}))
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[5], line 1
----> 1 ibis.memtable(pl.DataFrame({'a': [[1, 2], [3, 4]]}, schema={'a': pl.Array(pl.Int64, 2)}))

File ~/scratch/.venv/lib/python3.12/site-packages/ibis/expr/api.py:462, in memtable(data, columns, schema, name)
    457 if columns is not None and schema is not None:
    458     raise NotImplementedError(
    459         "passing `columns` and schema` is ambiguous; "
    460         "pass one or the other but not both"
    461     )
--> 462 return _memtable(data, name=name, schema=schema, columns=columns)

File ~/scratch/.venv/lib/python3.12/site-packages/ibis/common/dispatch.py:140, in lazy_singledispatch.<locals>.call(arg, *args, **kwargs)
    137 @functools.wraps(func)
    138 def call(arg, *args, **kwargs):
    139     impl = dispatcher.dispatch(type(arg))
--> 140     return impl(arg, *args, **kwargs)

File ~/scratch/.venv/lib/python3.12/site-packages/ibis/expr/api.py:557, in _memtable_from_polars_dataframe(data, name, schema, columns)
    553     assert schema is None, "if `columns` is not `None` then `schema` must be `None`"
    554     schema = sch.Schema(dict(zip(columns, sch.infer(data).values())))
    555 return ops.InMemoryTable(
    556     name=name if name is not None else util.gen_name("polars_memtable"),
--> 557     schema=sch.infer(data) if schema is None else schema,
    558     data=PolarsDataFrameProxy(data),
    559 ).to_expr()

File ~/scratch/.venv/lib/python3.12/site-packages/ibis/common/dispatch.py:140, in lazy_singledispatch.<locals>.call(arg, *args, **kwargs)
    137 @functools.wraps(func)
    138 def call(arg, *args, **kwargs):
    139     impl = dispatcher.dispatch(type(arg))
--> 140     return impl(arg, *args, **kwargs)

File ~/scratch/.venv/lib/python3.12/site-packages/ibis/expr/schema.py:373, in infer_polars_dataframe(df)
    368 @infer.register("polars.DataFrame")
    369 @infer.register("polars.LazyFrame")
    370 def infer_polars_dataframe(df):
    371     from ibis.formats.polars import PolarsSchema
--> 373     return PolarsSchema.to_ibis(df.collect_schema())

File ~/scratch/.venv/lib/python3.12/site-packages/ibis/formats/polars.py:128, in PolarsSchema.to_ibis(cls, schema)
    124 @classmethod
    125 def to_ibis(cls, schema: dict[str, pl.DataType]) -> Schema:
    126     """Convert a polars schema to a schema."""
    127     return Schema.from_tuples(
--> 128         [(name, PolarsType.to_ibis(typ)) for name, typ in schema.items()]
    129     )

File ~/scratch/.venv/lib/python3.12/site-packages/ibis/formats/polars.py:77, in PolarsType.to_ibis(cls, typ, nullable)
     72     return dt.Struct.from_tuples(
     73         [(field.name, cls.to_ibis(field.dtype)) for field in typ.fields],
     74         nullable=nullable,
     75     )
     76 else:
---> 77     return _from_polars_types[base_type](nullable=nullable)

KeyError: Array
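
The last frame suggests the failure mode: the dtype's base class is used as a key into a flat lookup table (`_from_polars_types`), and `Array` is not in that table. Below is a minimal stand-alone sketch of that pattern, using plain Python stand-ins rather than the real polars or ibis classes. Every name in it (`Int64`, `Array`, `_SCALAR_TYPES`, `to_ibis_flat`, `to_ibis_nested`) is hypothetical, the returned strings are only labels, and this is not the change made in #10260.

```python
from dataclasses import dataclass


# Toy stand-ins for illustration only: NOT the real polars/ibis classes.
@dataclass(frozen=True)
class Int64:
    """Stand-in for a scalar dtype such as pl.Int64."""


@dataclass(frozen=True)
class Array:
    """Stand-in for a fixed-size nested dtype such as pl.Array(inner, size)."""

    inner: object
    size: int


# Flat lookup table keyed by the dtype's class, similar in spirit to
# `_from_polars_types[base_type]` in the traceback. Nested types are absent.
_SCALAR_TYPES = {Int64: "int64"}


def to_ibis_flat(typ):
    """Mimic the failing path: any class missing from the table raises KeyError."""
    return _SCALAR_TYPES[type(typ)]


def to_ibis_nested(typ):
    """Handle the nested dtype in its own branch before the table lookup."""
    if isinstance(typ, Array):
        # Recurse into the element type; the fixed size is ignored in this toy.
        return f"array<{to_ibis_nested(typ.inner)}>"
    return _SCALAR_TYPES[type(typ)]
```

With these stand-ins, `to_ibis_flat(Array(Int64(), 2))` raises `KeyError`, while `to_ibis_nested(Array(Int64(), 2))` returns `"array<int64>"`.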

What version of ibis are you using?

9.5.0

What backend(s) are you using, if any?

duckdb (default)

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
MarcoGorelli added the bug label (Incorrect behavior inside of ibis) on Sep 27, 2024
Contributor

akanz1 commented Sep 30, 2024

I just came across this too, with a slightly different error, as I did not pass the schema explicitly.

Ibis 9.5.0
Polars 0.20.31
Python 3.10.14

In [1]: import polars as pl

In [2]: import ibis

In [3]: pl_df = df = pl.DataFrame({
   ...:     'A': [1, 2, 3],
   ...:     'B': ['a', 'b', 'c']
   ...: })

In [4]: pd_df = pl_df.to_pandas()

In [5]: con = ibis.duckdb.connect()

In [6]: ibis.memtable(pl_df)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[6], line 1
----> 1 ibis.memtable(pl_df)

File ~/code/.venv/lib/python3.10/site-packages/ibis/expr/api.py:462, in memtable(data, columns, schema, name)
    457 if columns is not None and schema is not None:
    458     raise NotImplementedError(
    459         "passing `columns` and schema` is ambiguous; "
    460         "pass one or the other but not both"
    461     )
--> 462 return _memtable(data, name=name, schema=schema, columns=columns)

File ~/code/.venv/lib/python3.10/site-packages/ibis/common/dispatch.py:140, in lazy_singledispatch.<locals>.call(arg, *args, **kwargs)
    137 @functools.wraps(func)
    138 def call(arg, *args, **kwargs):
    139     impl = dispatcher.dispatch(type(arg))
--> 140     return impl(arg, *args, **kwargs)

File ~/code/.venv/lib/python3.10/site-packages/ibis/expr/api.py:557, in _memtable_from_polars_dataframe(data, name, schema, columns)
    553     assert schema is None, "if `columns` is not `None` then `schema` must be `None`"
    554     schema = sch.Schema(dict(zip(columns, sch.infer(data).values())))
    555 return ops.InMemoryTable(
    556     name=name if name is not None else util.gen_name("polars_memtable"),
--> 557     schema=sch.infer(data) if schema is None else schema,
    558     data=PolarsDataFrameProxy(data),
    559 ).to_expr()

File ~/code/.venv/lib/python3.10/site-packages/ibis/common/dispatch.py:140, in lazy_singledispatch.<locals>.call(arg, *args, **kwargs)
    137 @functools.wraps(func)
    138 def call(arg, *args, **kwargs):
    139     impl = dispatcher.dispatch(type(arg))
--> 140     return impl(arg, *args, **kwargs)

File ~/code/.venv/lib/python3.10/site-packages/ibis/expr/schema.py:373, in infer_polars_dataframe(df)
    368 @infer.register("polars.DataFrame")
    369 @infer.register("polars.LazyFrame")
    370 def infer_polars_dataframe(df):
    371     from ibis.formats.polars import PolarsSchema
--> 373     return PolarsSchema.to_ibis(df.collect_schema())

AttributeError: 'DataFrame' object has no attribute 'collect_schema'

In [7]: ibis.memtable(pd_df)
Out[7]:
InMemoryTable
  data:
    PandasDataFrameProxy:
         A  B
      0  1  a
      1  2  b
      2  3  c

Member

cpcloud commented Oct 1, 2024

@akanz1 @MarcoGorelli These are two different errors for two different reasons. I'll open up another issue for the second one.

Member

cpcloud commented Oct 1, 2024

@akanz1 Actually, you're using a version of Polars that ibis doesn't support. Ibis only supports Polars version 1 or higher. Please try again with the latest version and open a new issue if you continue to encounter a problem.
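
Since the second traceback comes from an unsupported Polars release rather than from the Array dtype, a quick version check before calling `ibis.memtable` can help tell the two failures apart. This is a minimal sketch, assuming only that the supported floor is major version 1 as stated above. The helper names (`major_version`, `polars_is_supported`, `installed_polars_version`) are made up for illustration and are not part of ibis or Polars.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Return the leading numeric component of a dotted version string."""
    return int(version_string.split(".")[0])


def polars_is_supported(version_string: str, minimum_major: int = 1) -> bool:
    """Check a Polars version string against an assumed minimum major version."""
    return major_version(version_string) >= minimum_major


def installed_polars_version():
    """Return the installed Polars version string, or None if it is not installed."""
    try:
        return version("polars")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_polars_version()
    if found is None:
        print("polars is not installed")
    elif not polars_is_supported(found):
        print(f"polars {found} is older than the assumed minimum of 1.x")
    else:
        print(f"polars {found} meets the assumed minimum")
```

For the versions reported in this thread, `polars_is_supported("0.20.31")` evaluates to `False`, which matches the `collect_schema` AttributeError above coming from an old Polars release.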

cpcloud linked a pull request on Oct 1, 2024 that will close this issue
Projects
Status: backlog
3 participants