
feat(datafusion): use pyarrow for type conversion #9299

Merged

Conversation

cpcloud
Member

@cpcloud cpcloud commented Jun 3, 2024

Description of changes

This PR moves datafusion type system conversion to use PyArrow, which is much
more robust than the SQL hacking we've been doing.

I've added two new methods to support this. They are required to work around DataFusion inconsistencies in output types, and may also be useful for ClickHouse:

  • DataType._as_nullable: returns a nullable version of the type, recursively converting for nested types.
  • DataType._as_non_nullable: similar to _as_nullable but going the other direction, making everything non-nullable.

Two analogous methods have been added to Schema as well.
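
To illustrate the recursive conversion described above, here is a minimal, self-contained sketch. It deliberately uses a toy `ToyType` dataclass rather than ibis's real `DataType` and `Schema` classes, so every name in it (`ToyType`, `with_nullability`, `as_nullable`, `as_non_nullable`, `schema_as_nullable`, `all_nullability`) is hypothetical and only meant to show the shape of the idea, not the code in this PR.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyType:
    """Toy stand-in for a data type; NOT ibis's DataType."""

    name: str
    nullable: bool = True
    # Named child types, e.g. struct fields or an array's item type.
    children: tuple[tuple[str, ToyType], ...] = ()


def with_nullability(dtype: ToyType, nullable: bool) -> ToyType:
    """Return a copy of dtype with nullability set at every nesting level."""
    return replace(
        dtype,
        nullable=nullable,
        children=tuple(
            (name, with_nullability(child, nullable))
            for name, child in dtype.children
        ),
    )


def as_nullable(dtype: ToyType) -> ToyType:
    return with_nullability(dtype, True)


def as_non_nullable(dtype: ToyType) -> ToyType:
    return with_nullability(dtype, False)


def schema_as_nullable(schema: dict[str, ToyType]) -> dict[str, ToyType]:
    """Schema analogue: apply the conversion to every column type."""
    return {name: as_nullable(dtype) for name, dtype in schema.items()}


def all_nullability(dtype: ToyType, nullable: bool) -> bool:
    """Check that dtype and all nested children share the given nullability."""
    return dtype.nullable is nullable and all(
        all_nullability(child, nullable) for _, child in dtype.children
    )


# A struct with mixed nullability, including a nested array.
nested = ToyType(
    "struct",
    nullable=False,
    children=(
        ("a", ToyType("int64", nullable=False)),
        (
            "b",
            ToyType(
                "array",
                nullable=True,
                children=(("item", ToyType("string", nullable=False)),),
            ),
        ),
    ),
)

made_nullable = as_nullable(nested)
made_non_nullable = as_non_nullable(nested)
```

In this sketch the two conversions are idempotent, and they leave type names and nesting untouched, which is the kind of property a Hypothesis test can check across many generated types.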

Unit tests are included, along with a Hypothesis test to cover as many datatypes as possible.

This is now specific to DataFusion, to avoid an unnecessary maintenance burden for functionality that might never be used.

@cpcloud cpcloud force-pushed the datafusion-simpler-types-from-query branch 3 times, most recently from b5dc24b to 02f1bfc Compare June 3, 2024 19:44
@cpcloud cpcloud added this to the 9.1 milestone Jun 4, 2024
@cpcloud cpcloud added the datatypes Issues relating to ibis's datatypes (under `ibis.expr.datatypes`) label Jun 4, 2024
@cpcloud cpcloud force-pushed the datafusion-simpler-types-from-query branch from 02f1bfc to fc40cbd Compare June 4, 2024 11:07
@cpcloud
Member Author

cpcloud commented Jun 4, 2024

Going to make this apply only to DataFusion instead of a broader set of functionality. There's no reason to do it unless it's needed or asked for, and it's currently only needed in the DataFusion case.

@cpcloud cpcloud force-pushed the datafusion-simpler-types-from-query branch 2 times, most recently from c82a6b6 to c7b742a Compare June 4, 2024 12:53
@cpcloud cpcloud requested a review from jcrist June 4, 2024 14:13
@cpcloud cpcloud force-pushed the datafusion-simpler-types-from-query branch from c7b742a to 7c6d312 Compare June 4, 2024 14:14
@cpcloud cpcloud merged commit 5bef96a into ibis-project:main Jun 4, 2024
75 checks passed
@cpcloud cpcloud deleted the datafusion-simpler-types-from-query branch June 4, 2024 17:01
2 participants