bug(postgres): to_pyarrow fails with json type #8318

Closed
1 task done
turntable-justin opened this issue Feb 12, 2024 · 6 comments · Fixed by #8439
Labels: bug (Incorrect behavior inside of ibis), postgres (The PostgreSQL backend)

Comments

@turntable-justin

What happened?

Hi there -- I'm trying to convert a postgres table to pyarrow and getting this error:

ArrowNotImplementedError: extension

I went into the Ibis backend to check which types it infers, and this is the Arrow struct type it builds for the rows:

struct<id: string not null, created_at: timestamp[us, tz=UTC], response: extension<ibis.json<JSONType>>, email: string>

And here is the ibis expression:

r0 := DatabaseTable: waitlist
  id         !uuid
  created_at timestamp('UTC')
  response   json
  email      string
Limit[r0, n=10000]

What version of ibis are you using?

8.0.0

What backend(s) are you using, if any?

Postgres

Relevant log output

--------------------------------------------------------------------------
ArrowNotImplementedError                  Traceback (most recent call last)
Cell In[20], line 1
----> 1 w.execution_helper().to_pyarrow()

File ~/Documents/GitHub/spoonbill/ibis/expr/types/core.py:444, in Expr.to_pyarrow(self, params, limit, **kwargs)
    416 @experimental
    417 def to_pyarrow(
    418     self,
   (...)
    422     **kwargs: Any,
    423 ) -> pa.Table:
    424     """Execute expression and return results in as a pyarrow table.
    425
    426     This method is eager and will execute the associated expression
   (...)
    442         A pyarrow table holding the results of the executed expression.
    443     """
--> 444     return self._find_backend(use_default=True).to_pyarrow(
    445         self, params=params, limit=limit, **kwargs
    446     )

File ~/Documents/GitHub/spoonbill/ibis/backends/base/__init__.py:367, in _FileIOHandler.to_pyarrow(self, expr, params, limit, **kwargs)
    363 arrow_schema = schema.to_pyarrow()
    364 with self.to_pyarrow_batches(
    365     table_expr, params=params, limit=limit, **kwargs
    366 ) as reader:
--> 367     table = pa.Table.from_batches(reader, schema=arrow_schema)
    369 return expr.__pyarrow_result__(
    370     table.rename_columns(table_expr.columns).cast(arrow_schema)
    371 )

File ~/Library/Caches/pypoetry/virtualenvs/vinyl-cwr2Pa_2-py3.11/lib/python3.11/site-packages/pyarrow/table.pxi:4104, in pyarrow.lib.Table.from_batches()

File ~/Library/Caches/pypoetry/virtualenvs/vinyl-cwr2Pa_2-py3.11/lib/python3.11/site-packages/pyarrow/ipc.pxi:666, in pyarrow.lib.RecordBatchReader.__next__()

File ~/Library/Caches/pypoetry/virtualenvs/vinyl-cwr2Pa_2-py3.11/lib/python3.11/site-packages/pyarrow/ipc.pxi:700, in pyarrow.lib.RecordBatchReader.read_next_batch()

File ~/Library/Caches/pypoetry/virtualenvs/vinyl-cwr2Pa_2-py3.11/lib/python3.11/site-packages/pyarrow/types.pxi:88, in pyarrow.lib._datatype_to_pep3118()

File ~/Documents/GitHub/spoonbill/ibis/backends/base/sql/__init__.py:245, in <genexpr>(.0)
    242 array_type = schema.as_struct().to_pyarrow()
    243 print(array_type)
    244 arrays = (
--> 245     pa.array(map(tuple, batch), type=array_type)
    246     for batch in self._cursor_batches(
    247         expr, params=params, limit=limit, chunk_size=chunk_size
    248     )
    249 )
    250 batches = map(pa.RecordBatch.from_struct_array, arrays)
    252 return pa.ipc.RecordBatchReader.from_batches(schema.to_pyarrow(), batches)

File ~/Library/Caches/pypoetry/virtualenvs/vinyl-cwr2Pa_2-py3.11/lib/python3.11/site-packages/pyarrow/array.pxi:344, in pyarrow.lib.array()

File ~/Library/Caches/pypoetry/virtualenvs/vinyl-cwr2Pa_2-py3.11/lib/python3.11/site-packages/pyarrow/array.pxi:42, in pyarrow.lib._sequence_to_array()

File ~/Library/Caches/pypoetry/virtualenvs/vinyl-cwr2Pa_2-py3.11/lib/python3.11/site-packages/pyarrow/error.pxi:154, in pyarrow.lib.pyarrow_internal_check_status()

File ~/Library/Caches/pypoetry/virtualenvs/vinyl-cwr2Pa_2-py3.11/lib/python3.11/site-packages/pyarrow/error.pxi:91, in pyarrow.lib.check_status()

Code of Conduct

  • I agree to follow this project's Code of Conduct
@turntable-justin added the bug (Incorrect behavior inside of ibis) label Feb 12, 2024
@turntable-justin changed the title from "bug" to "Postgres to_pyarrow bug with json type" Feb 12, 2024
@cpcloud changed the title from "Postgres to_pyarrow bug with json type" to "bug(postgres): to_pyarrow fails with json type" Feb 14, 2024
@jcrist added the postgres (The PostgreSQL backend) label Feb 15, 2024
@turntable-justin
Author

@cpcloud -- confirmed that this still occurs with v9.0.0

@cpcloud
Member

cpcloud commented Feb 16, 2024

Yep, this is a problem in the postgres to_pyarrow implementation, which wasn't touched much in the latest big refactor.

@turntable-justin
Author

I think the underlying execute() function also uses to_pyarrow, so effectively the connector is blocked. Is there an alternative that will work? What is the ETA for fixing this?

@cpcloud
Member

cpcloud commented Feb 21, 2024

If you definitely need a PyArrow Table, you should be able to use something like

pa.Table.from_pandas(expr.to_pandas())
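For anyone landing here, a slightly fuller sketch of that workaround. It is not from the Ibis codebase: `json_to_text` and `to_arrow_via_pandas` are hypothetical helper names, and it assumes that `expr.to_pandas()` hands back JSON columns as decoded Python objects (dicts, lists), which may not hold for every driver or version. Serializing those values to text first keeps `pa.Table.from_pandas` from having to infer a struct type from heterogeneous dicts.

```python
import json


def json_to_text(value):
    """Serialize a decoded JSON value to text; leave nulls and strings untouched."""
    if value is None or isinstance(value, str):
        return value
    return json.dumps(value)


def to_arrow_via_pandas(expr, json_columns=()):
    """Hypothetical helper: round-trip an Ibis expression through pandas.

    `json_columns` names the columns whose values should be stored as JSON text.
    """
    # Imported lazily so json_to_text stays usable without pyarrow installed.
    import pyarrow as pa

    df = expr.to_pandas()
    for name in json_columns:
        df[name] = df[name].map(json_to_text)
    return pa.Table.from_pandas(df)
```

With the table from this issue, the call would look like `to_arrow_via_pandas(expr, json_columns=["response"])`; the `response` column then arrives as an Arrow string column holding JSON text.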

@cpcloud
Member

cpcloud commented Feb 21, 2024

Not 100% sure what the ETA is. It will probably be in the next release, but no promises :)

@turntable-justin
Author

turntable-justin commented Feb 22, 2024 via email

@cpcloud self-assigned this Feb 23, 2024
kszucs pushed a commit that referenced this issue Feb 27, 2024
Localize custom pyarrow json serialization to snowflake. This custom
type doesn't compose well (e.g., inside structs, which aren't directly
supported in snowflake anyway) and was causing problems for other
backends when trying to convert rows into a struct of the table schema
and then into a proper table.

Fixes #8318.
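
To illustrate the row-to-struct-to-table path the commit message describes, here is a rough sketch. It is not the actual patch: `rows_to_table` is a hypothetical function, and declaring JSON columns as plain `pa.string()` is an assumption made for the example rather than something the commit message spells out. The point is only that a struct of ordinary Arrow types converts cleanly from row tuples, whereas the traceback above shows the conversion failing when an extension type sits inside the struct.

```python
import json


def rows_to_table(rows, schema, json_columns):
    """Hypothetical sketch: row tuples -> struct array -> record batch -> table.

    `schema` is a pyarrow.Schema in which the JSON columns are declared as
    pa.string(); `json_columns` lists their names.
    """
    # Imported lazily so the module can be loaded without pyarrow installed.
    import pyarrow as pa

    positions = {schema.get_field_index(name) for name in json_columns}
    # Store JSON values as text so every struct field is an ordinary Arrow type.
    encoded = [
        tuple(
            json.dumps(value) if i in positions and value is not None else value
            for i, value in enumerate(row)
        )
        for row in rows
    ]
    struct_array = pa.array(encoded, type=pa.struct(list(schema)))
    batch = pa.RecordBatch.from_struct_array(struct_array)
    return pa.Table.from_batches([batch], schema=schema)
```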
@github-project-automation bot moved this from backlog to done in Ibis planning and roadmap Feb 27, 2024
3 participants