feat: support RANGE in queries Part 2: Arrow #1868
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -142,6 +142,17 @@ def bq_to_arrow_struct_data_type(field): | |
return pyarrow.struct(arrow_fields) | ||
|
||
|
||
def bq_to_arrow_range_data_type(field): | ||
if field is None: | ||
raise ValueError( | ||
"Range element type cannot be None, must be one of " | ||
"DATE, DATETIME, or TIMESTAMP" | ||
) | ||
element_type = field.element_type.upper() | ||
arrow_element_type = _pyarrow_helpers.bq_to_arrow_scalars(element_type)() | ||
Review comment: Do we need to do validation here? None-check?
Reply: Great point, I will add a None-check here.
Reply: I added it, as well as the unit tests in |
||
return pyarrow.struct([("start", arrow_element_type), ("end", arrow_element_type)]) | ||
|
||
|
||
def bq_to_arrow_data_type(field): | ||
"""Return the Arrow data type, corresponding to a given BigQuery column. | ||
|
||
|
@@ -160,6 +171,9 @@ def bq_to_arrow_data_type(field): | |
if field_type_upper in schema._STRUCT_TYPES: | ||
return bq_to_arrow_struct_data_type(field) | ||
|
||
if field_type_upper == "RANGE": | ||
return bq_to_arrow_range_data_type(field.range_element_type) | ||
|
||
data_type_constructor = _pyarrow_helpers.bq_to_arrow_scalars(field_type_upper) | ||
if data_type_constructor is None: | ||
return None | ||
|
@@ -220,6 +234,9 @@ def default_types_mapper( | |
datetime_dtype: Union[Any, None] = None, | ||
time_dtype: Union[Any, None] = None, | ||
timestamp_dtype: Union[Any, None] = None, | ||
range_date_dtype: Union[Any, None] = None, | ||
range_datetime_dtype: Union[Any, None] = None, | ||
range_timestamp_dtype: Union[Any, None] = None, | ||
): | ||
"""Create a mapping from pyarrow types to pandas types. | ||
|
||
|
@@ -274,6 +291,22 @@ def types_mapper(arrow_data_type): | |
elif time_dtype is not None and pyarrow.types.is_time(arrow_data_type): | ||
return time_dtype | ||
|
||
elif pyarrow.types.is_struct(arrow_data_type): | ||
Review comment: Do we need to handle structs more generally here, or is that logic elsewhere?
Reply: Good question! Indeed, our types mapper function doesn't seem to do any conversion for STRUCT or ARRAY. This function is used as the parameter |
||
if range_datetime_dtype is not None and arrow_data_type.equals( | ||
range_datetime_dtype.pyarrow_dtype | ||
): | ||
return range_datetime_dtype | ||
|
||
elif range_date_dtype is not None and arrow_data_type.equals( | ||
range_date_dtype.pyarrow_dtype | ||
): | ||
return range_date_dtype | ||
|
||
elif range_timestamp_dtype is not None and arrow_data_type.equals( | ||
range_timestamp_dtype.pyarrow_dtype | ||
): | ||
return range_timestamp_dtype | ||
|
||
return types_mapper | ||
|
||
|
||
|
Review comment: nit: should we change this to indicate that the element type is unsupported, rather than the "field type"?
Reply: Indeed, I'll change it to be consistent with the name of the field.