Add `to_arrow` to get a `pyarrow.Table` from query results. #8609
An Arrow `Table` supports a richer set of types than a pandas `DataFrame`, and is the basis of many data analysis systems. It can be used in conjunction with pandas through the `Table.to_pandas()` method or the pandas extension types provided by the `fletcher` package.
None of the tests seem to deal with NULL interactions. Might be good to add something? From a brief read of the pyarrow docs, it looks like you don't have to deal with the Avro type unioning at schema time, but nulls in arrays vs. scalar values look like they are handled somewhat differently.
pyarrow.field("field05", pyarrow.float64()),
pyarrow.field("field06", pyarrow.float64()),
pyarrow.field("field07", module_under_test.pyarrow_numeric()),
pyarrow.field("field08", pyarrow.bool_()),
Does the underscore signify anything, or is it just an Arrow oddity in type representation?
Trailing underscore is the Python convention for avoiding name collisions with Python built-in functions such as `bool`.
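A minimal sketch of that naming convention, using only the standard library. The `bool_` function here is a hypothetical stand-in for a type factory and is not pyarrow's implementation; the point is only that the trailing underscore leaves the built-in `bool` untouched.

```python
import builtins


def bool_():
    """Hypothetical type factory; the trailing underscore avoids shadowing bool."""
    return "bool"


# The built-in is still reachable under its usual name.
print(bool is builtins.bool)
print(bool_())
```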
Approving based on in-person conversation. There's possible followup around null handling, but Arrow exhibits more nullable-by-default behaviors rather than needing explicit care. BigQuery doesn't allow null elements in an array in results/tables, so the difference in Arrow's handling isn't relevant here.
Towards #5204.