Feast BigQuery offline store gets wrong argument types for event_timestamp
#2532
Comments
@KarolisKont could you also share your feature view definition and the schema of the offline store BigQuery table?
Of course I can, @achals, but there is one issue: this BigQuery table has 109 fields 😅. We don't use all of the fields in the feature view, so perhaps it will be enough to share just the fields that are used in the feature view? Also, I perhaps forgot to mention that when I dropped the …
That's very interesting. Yeah, I just need the fields used in the feature view. Can you also provide the output of the table schema, after getting the table using the Python API? See this code snippet for an example: https://googleapis.dev/python/bigquery/latest/usage/tables.html#getting-a-table
Sorry for the late response 😊, but here is the info.

The feature view definition:

```python
from datetime import timedelta

from feast import BigQuerySource, Feature, FeatureView, ValueType

from marketplace_sandbox.config import BQ_POC_RERANKER_TABLE_REFERENCE
from marketplace_sandbox.entities import item_user
from marketplace_sandbox.utils import generate_data_bq_query

FEATURE_VIEW_NAME = "item_user_stats_v1"

entities = [item_user]
features = [
    Feature(
        name="total_items_sold",
        dtype=ValueType.INT32,
        labels={"owner": "VMIP"},
    ),
    Feature(
        name="is_high_risk_seller",
        dtype=ValueType.INT32,
        labels={"owner": "VMIP"},
    ),
]

batch_source = BigQuerySource(
    name=f"{BQ_POC_RERANKER_TABLE_REFERENCE}.{FEATURE_VIEW_NAME}",
    query=generate_data_bq_query(
        features=features,
        entities=entities,
        tables=[BQ_POC_RERANKER_TABLE_REFERENCE],
    ),
    event_timestamp_column="event_time",
)

item_user_stats_v1_fv = FeatureView(
    name=FEATURE_VIEW_NAME,
    entities=[entity.name for entity in entities],
    ttl=timedelta(weeks=4),
    features=features,
    batch_source=batch_source,
    owner="VMIP",
)
```

The entity definition used in this feature view:

```python
from feast import Entity, ValueType

item_user = Entity(
    name="item_user",
    join_key="item_user_id",
    value_type=ValueType.INT64,
    owner="VMIP",
)
```

The helper function used to construct the query for this feature view:

```python
from typing import Iterable

from feast import Entity, Feature


def generate_data_bq_query(
    features: Iterable[Feature],
    entities: Iterable[Entity],
    tables: Iterable[str],
) -> str:
    """BigQuery SQL query to execute to generate data for the feature view.

    Args:
        features (Iterable[Feature]): all features used in the feature view.
        entities (Iterable[Entity]): all entities used in the feature view.
        tables (Iterable[str]): all BigQuery tables that contain the used
            features and entities.

    Returns:
        str: BigQuery SQL query.
    """
    return (
        f"SELECT event_time, {', '.join(f.name for f in features)}, "
        f"{', '.join(entity.join_key for entity in entities)} "
        f"FROM `{', '.join(table for table in tables)}`"
    )
```

This feature view's schema from the BigQuery UI:

Field name | Type | Mode
--- | --- | ---
item_user_id | INTEGER | NULLABLE
total_items_sold | INTEGER | NULLABLE
is_high_risk_seller | INTEGER | NULLABLE
event_time | TIMESTAMP | NULLABLE

The output of the table schema using the Python API:

```
Table schema: [SchemaField('item_user_id', 'INTEGER', 'NULLABLE', None, (), None), SchemaField('event_time', 'TIMESTAMP', 'NULLABLE', None, (), None), SchemaField('total_items_sold', 'INTEGER', 'NULLABLE', None, (), None), SchemaField('is_high_risk_seller', 'INTEGER', 'NULLABLE', None, (), None)]
```
OK, it seems that my proposed solution was valid for this particular issue. Thanks @achals 🙇
Expected Behavior

Retrieving training data from a BigQuery offline store using the `get_historical_features` method from the Python SDK should succeed.

Current Behavior
Getting this error when running `get_historical_features`. The error happens on a query where it compares event timestamps (last line):
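One plausible source of this kind of event-timestamp comparison failure (an assumption on my side, not confirmed from the traceback) is mixing timezone-aware and timezone-naive timestamp values, since google-cloud-bigquery 3.x changed how TIMESTAMP columns are surfaced to Python. A stdlib-only sketch of that failure mode:

```python
from datetime import datetime, timezone

# A TIMESTAMP value surfaced as timezone-aware...
aware = datetime(2022, 4, 13, 12, 0, tzinfo=timezone.utc)
# ...compared against a timezone-naive value for the "same" moment.
naive = datetime(2022, 4, 13, 12, 0)

try:
    aware < naive
except TypeError as exc:
    # Python rejects comparisons between aware and naive datetimes outright.
    print(f"TypeError: {exc}")
```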
Steps to reproduce

Use a Feast project with the GCP provider, where the offline store is BigQuery. After installing the `feast[gcp]` (v0.19.4) packages you should get the `google-cloud-bigquery==3.0.1` dependency version. I think it should also break with newer versions of `google-cloud-bigquery`, if any become available.

Specifications
Possible Solution

Quick solution: add an upper bound of `<3.0.0` here. A better solution: find out what is causing this event timestamp type mismatch and fix it 😊.