Add `to_arrow` to get a `pyarrow.Table` from query results. #8609

tswast · 2019-07-03T23:36:19Z

An Arrow Table supports a richer set of types than a pandas DataFrame,
and is the basis of many data analysis systems. It can be used in
conjunction with pandas through the Table.to_pandas() method or the
pandas extension types provided by the fletcher package.

Towards #5204.

shollyman

None of the tests seem to deal with NULL interactions. Might be good to add something? From a brief read of the pyarrow docs, it looks like you don't have to deal with the Avro type unioning at schema time, but null handling in arrays vs. scalar values looks like it has some differences.

shollyman · 2019-07-08T23:33:31Z

bigquery/tests/unit/test__pandas_helpers.py

+            pyarrow.field("field05", pyarrow.float64()),
+            pyarrow.field("field06", pyarrow.float64()),
+            pyarrow.field("field07", module_under_test.pyarrow_numeric()),
+            pyarrow.field("field08", pyarrow.bool_()),


Does the underscore signify anything, or just an arrow oddity in type representation?

tswast · Jul 9, 2019

A trailing underscore is the Python convention for avoiding name collisions with Python built-in functions such as `bool`.
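A small standard-library sketch of that convention follows. The `bool_` function here is a hypothetical stand-in, not pyarrow's implementation. It only shows why the trailing underscore keeps the built-in usable.

```python
import builtins
import keyword


def bool_():
    """Hypothetical type factory; the trailing underscore avoids shadowing bool."""
    return "boolean type marker"


# The built-in is untouched, so ordinary truthiness conversion still works.
assert bool is builtins.bool
assert bool(0) is False

# The same convention covers reserved words, which cannot be used as names at all.
assert keyword.iskeyword("class")
class_ = "a legal name standing in for 'class'"

print(bool_(), "|", class_)
```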

shollyman

Approving based on an in-person conversation. There's possible follow-up around null handling, but Arrow is nullable by default rather than needing explicit care. BigQuery doesn't allow null elements in an array in results or tables, so the difference in Arrow's handling isn't relevant here.

googlebot added the `cla: yes` label Jul 3, 2019

tswast mentioned this pull request Jul 3, 2019

BigQuery: Add support for BigQuery Storage API Arrow format in to_dataframe and to_arrow. #8551

Merged

tswast added 2 commits July 3, 2019 16:43

_response is unused.

e04be8e

Add unit tests for to_arrow.

3f542db

tswast marked this pull request as ready for review July 8, 2019 22:13

tswast requested review from a team, shollyman and plamut July 8, 2019 22:13

Add comment for excluding pyarrow 0.14.0

3a5fdda

shollyman reviewed Jul 8, 2019

shollyman approved these changes Jul 9, 2019

tswast added 5 commits July 9, 2019 09:40

Test for nullable data in to_arrow.

b9cb3ed

Correct docstring for bq_to_arrow_schema.

3a59793

Bad wheels have been removed from PyPI.

f8b96d5

Add to_arrow to EmptyRowIterator.

ec2afe2

Revert "Bad wheels have been removed from PyPI."

9ae3572

This reverts commit f8b96d5.

tswast merged commit d5f5d24 into googleapis:master Jul 10, 2019

tswast deleted the issue5204-bq-tabledata.list-to_arrow branch July 10, 2019 01:31

tswast mentioned this pull request Jul 12, 2019

BigQuery: to_arrow() method similar to to_dataframe() #5204

Closed
