Infer min and max timestamps from entity_df to limit data read from BQ source #1665

Merged
merged 11 commits into from
Jul 8, 2021
20 changes: 17 additions & 3 deletions sdk/python/feast/infra/offline_stores/bigquery.py
@@ -131,12 +131,23 @@ def get_historical_features(
             feature_refs, feature_views, registry, project
         )
 
-        # TODO: Infer min_timestamp and max_timestamp from entity_df
+        # Infer min_timestamp and max_timestamp from entity_df
+        if isinstance(entity_df, pandas.DataFrame):
+            min_timestamp = datetime.fromisoformat(
+                min(entity_df[DEFAULT_ENTITY_DF_EVENT_TIMESTAMP_COL])
+            )
+            max_timestamp = datetime.fromisoformat(
+                max(entity_df[DEFAULT_ENTITY_DF_EVENT_TIMESTAMP_COL])
+            )
+        else:
+            min_timestamp = datetime.now() - timedelta(days=365)
+            max_timestamp = datetime.now() + timedelta(days=1)
+
         # Generate the BigQuery SQL query from the query context
         query = build_point_in_time_query(
             query_context,
-            min_timestamp=datetime.now() - timedelta(days=365),
-            max_timestamp=datetime.now() + timedelta(days=1),
+            min_timestamp=min_timestamp,
+            max_timestamp=max_timestamp,
             left_table_query_string=str(table.reference),
             entity_df_event_timestamp_col=entity_df_event_timestamp_col,
         )
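The inference step in this hunk relies on `datetime.fromisoformat`, which only accepts strings, so it would fail on an entity dataframe whose timestamp column already holds datetime values. The following is a minimal standalone sketch of the same idea, not the code merged in this PR. The helper names `_to_datetime` and `infer_timestamp_bounds` are hypothetical, and treating naive values as UTC is an assumption made here so that mixed inputs stay comparable.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional, Tuple, Union

TimestampLike = Union[datetime, str]


def _to_datetime(value: TimestampLike) -> datetime:
    """Accept a datetime (pandas.Timestamp is a datetime subclass) or an ISO 8601 string."""
    parsed = value if isinstance(value, datetime) else datetime.fromisoformat(value)
    # Assumption: naive values are interpreted as UTC.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def infer_timestamp_bounds(
    values: Optional[Iterable[TimestampLike]],
) -> Tuple[datetime, datetime]:
    """Return (min, max) of the given timestamps.

    Falls back to the wide default window used in the diff
    (one year back, one day ahead) when no values are available.
    """
    parsed = [_to_datetime(v) for v in values] if values is not None else []
    if not parsed:
        now = datetime.now(timezone.utc)
        return now - timedelta(days=365), now + timedelta(days=1)
    return min(parsed), max(parsed)
```

With a pandas entity dataframe, this could be called as `infer_timestamp_bounds(entity_df["event_timestamp"])`, where the column name is assumed to match `DEFAULT_ENTITY_DF_EVENT_TIMESTAMP_COL`. For large dataframes, the vectorized `Series.min()` and `Series.max()` would likely be the better choice.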
@@ -496,6 +507,9 @@ def _get_bigquery_client():
         {{ feature }} as {{ featureview.name }}__{{ feature }}{% if loop.last %}{% else %}, {% endif %}
     {% endfor %}
     FROM {{ featureview.table_subquery }}
+    WHERE {{ featureview.event_timestamp_column }} <= '{{ max_timestamp }}' {% if featureview.ttl == 0 %}{% else %}
+    AND {{ featureview.event_timestamp_column }} >= Timestamp_sub(TIMESTAMP '{{ min_timestamp }}', interval {{ featureview.ttl }} second)
+    {% endif %}
 ),
 
 {{ featureview.name }}__base AS (
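The template change above filters each feature view to rows no newer than `max_timestamp` and, when a TTL is set, no older than `min_timestamp` minus the TTL. The lower bound is shifted because an entity row at `min_timestamp` may still need feature values up to `ttl` seconds older. Below is a rough Python sketch of that window logic, assuming a hypothetical helper `build_timestamp_filter`. Unlike the template, it computes the lower bound in Python instead of calling BigQuery's `TIMESTAMP_SUB`, and it interpolates values into the string purely for illustration.

```python
from datetime import datetime, timedelta, timezone


def build_timestamp_filter(
    column: str,
    min_timestamp: datetime,
    max_timestamp: datetime,
    ttl_seconds: int,
) -> str:
    """Build a WHERE clause limiting rows to the window the join can use."""
    # Upper bound: no feature row newer than the latest entity timestamp is needed.
    clause = f"WHERE {column} <= '{max_timestamp.isoformat()}'"
    # A TTL of 0 means "no expiry" in the template, so no lower bound is added.
    if ttl_seconds != 0:
        lower_bound = min_timestamp - timedelta(seconds=ttl_seconds)
        clause += f" AND {column} >= '{lower_bound.isoformat()}'"
    return clause


if __name__ == "__main__":
    start = datetime(2021, 7, 1, tzinfo=timezone.utc)
    end = datetime(2021, 7, 8, tzinfo=timezone.utc)
    print(build_timestamp_filter("event_timestamp", start, end, ttl_seconds=86400))
```

In real query construction, bound parameters would be preferable to string interpolation.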