bug: flatten does not work with polars #9995

Closed
1 task done
GLeurquin opened this issue Sep 2, 2024 · 1 comment · Fixed by #9997
Labels
bug Incorrect behavior inside of ibis

@GLeurquin
What happened?

I'm using the example from the docs for the flatten operation, with one added line that sets the backend to polars:

import ibis
import ibis.selectors as s
import pyarrow as pa
from ibis import _

# Use polars
ibis.set_backend("polars")

ibis.options.interactive = True

schema = {
    "empty": "array<array<int>>",
    "happy": "array<array<string>>",
    "nulls_only": "array<array<struct<a: array<string>>>>",
    "mixed_nulls": "array<array<string>>",
}
data = {
    "empty": [[], [], []],
    "happy": [[["abc"]], [["bcd"]], [["def"]]],
    "nulls_only": [None, None, None],
    "mixed_nulls": [[], None, [None]],
}
t = ibis.memtable(
    pa.Table.from_pydict(
        data,
        schema=ibis.schema(schema).to_pyarrow(),
    )
)

t.happy.flatten()

I expected to see:

┏━━━━━━━━━━━━━━━━━━━━━┓
┃ ArrayFlatten(happy) ┃
┡━━━━━━━━━━━━━━━━━━━━━┩
│ array<string>       │
├─────────────────────┤
│ ['abc']             │
│ ['bcd']             │
│ ['def']             │
└─────────────────────┘

But instead I see the following error:
ArrowNotImplementedError: Unsupported cast from large_list<item: large_string> to utf8 using function cast_string

After some investigation, I think that the issue comes from the polars translator for the operation ArrayFlatten:

@translate.register(ops.ArrayFlatten)
def array_flatten(op, **kw):
    return pl.concat_list(translate(op.arg, **kw))

Changing the implementation to the following seems to solve the issue and produces the expected result:

@translate.register(ops.ArrayFlatten)
def array_flatten(op, **kw):
    return translate(op.arg, **kw).flatten()
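
As a backend-independent reference for what the expected table above implies, here is a minimal plain-Python sketch of removing one level of nesting within each row. The helper names `flatten_one_level` and `depth` are hypothetical and are not part of ibis or polars. Keeping a null row as null and skipping null inner arrays are assumptions of this sketch, not documented ibis behavior. The error message suggests the polars result still held lists where strings were expected, which would mean the nesting depth had not dropped by one.

```python
from itertools import chain


def flatten_one_level(row):
    """Remove one level of nesting from a single row's array value.

    Assumption of this sketch: a null row stays null, and null inner
    arrays contribute no elements.
    """
    if row is None:
        return None
    return list(chain.from_iterable(inner for inner in row if inner is not None))


def depth(value):
    """Count list nesting levels, following the first element only."""
    levels = 0
    while isinstance(value, list):
        levels += 1
        if not value:
            break
        value = value[0]
    return levels


# The "happy" column from the report: one array<array<string>> value per row.
happy = [[["abc"]], [["bcd"]], [["def"]]]
flattened = [flatten_one_level(row) for row in happy]

print(flattened)                          # [['abc'], ['bcd'], ['def']]
print([depth(row) for row in happy])      # [2, 2, 2]
print([depth(row) for row in flattened])  # [1, 1, 1]
```

The row count stays at three and each value loses exactly one level of nesting, which is the shape shown in the expected table.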

What version of ibis are you using?

9.3.0

What backend(s) are you using, if any?

polars 1.5.0

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@GLeurquin
Author

GLeurquin commented Sep 16, 2024

Thanks @gforsyth and @cpcloud for the quick fix above. However, I don't think it fully fixed the issue.

I am trying to turn an array<array<int64>> into an array<int64>.

Using Polars:

import ibis
ibis.set_backend("polars")
t = ibis.memtable([
    {
        "arr": [[1, 5, 7], [3,4]]
    },
])
t.arr.flatten()

┏━━━━━━━━━━━━━━━━━━━┓
┃ ArrayFlatten(arr) ┃
┡━━━━━━━━━━━━━━━━━━━┩
│ array<int64>      │
├───────────────────┤
│ [1, 5, ... +1]    │
│ [3, 4]            │
└───────────────────┘

Using pandas:

ibis.set_backend("pandas")
t = ibis.memtable([
    {
        "arr": [[1, 5, 7], [3,4]]
    },
])
t.arr.flatten()

┏━━━━━━━━━━━━━━━━━━━┓
┃ ArrayFlatten(arr) ┃
┡━━━━━━━━━━━━━━━━━━━┩
│ array<int64>      │
├───────────────────┤
│ [1, 5, ... +3]    │
└───────────────────┘

Notice that with pandas I get a single row containing one "flattened" array, while with polars I get two rows. So I think something is still missing, unless there is another way to achieve this in ibis or I misunderstand the flatten function. In any case, I believe the result should be the same with pandas and polars.

I expect the result to be [1, 5, 7, 3, 4] in both cases.
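
The difference between the two outputs above can be reproduced without either backend. The sketch below uses hypothetical helper names (`explode_rows`, `flatten_rows`) that are not ibis or polars APIs. The first turns every inner array into its own row, which matches the shape of the polars output shown. The second concatenates the inner arrays within each row, which matches the pandas output.

```python
def explode_rows(column):
    """Emit each inner array as its own row (the row count can change)."""
    return [inner for row in column for inner in row]


def flatten_rows(column):
    """Concatenate the inner arrays of each row (the row count is preserved)."""
    return [[item for inner in row for item in inner] for row in column]


# One row holding an array<array<int64>> value, as in the memtable above.
arr = [[[1, 5, 7], [3, 4]]]

print(explode_rows(arr))  # [[1, 5, 7], [3, 4]]  -> two rows
print(flatten_rows(arr))  # [[1, 5, 7, 3, 4]]    -> one row
```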

I created a new issue, #10135, to track this.
