Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Undefined features should be rejected when being fetched via get_historical_features / get_online_features #2576

Closed
benjamintanweihao opened this issue Apr 20, 2022 · 0 comments · Fixed by #2665

Comments

@benjamintanweihao
Copy link
Contributor

benjamintanweihao commented Apr 20, 2022

Context

I want to create versioned feature views. Through various versions, features could be added or removed.

Expected Behavior

When doing feast.get_historical_features, features that are not defined should be rejected.

Current Behavior

The features get returned even though they have not been defined.

Steps to reproduce

  1. Initialize a new feast repository
  2. Define the features:
driver_hourly_stats = FileSource(
    path="/home/benjamintan/workspace/feast-workflow-demo/feature_repo/data/driver_stats.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)

driver = Entity(name="driver_id", value_type=ValueType.INT64, description="driver id",)

driver_hourly_stats_view_v1 = FeatureView(
    name="driver_hourly_stats_v1",
    entities=["driver_id"],
    ttl=timedelta(days=1),
    schema=[
        Field(name="avg_daily_trips", dtype=Int64),
    ],
    online=True,
    batch_source=driver_hourly_stats,
    tags={},
)

driver_hourly_stats_view_v2 = FeatureView(
    name="driver_hourly_stats_v2",
    entities=["driver_id"],
    ttl=timedelta(days=1),
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int64),
    ],
    online=True,
    batch_source=driver_hourly_stats,
    tags={},
)
  1. feast apply
  2. Querying Feast:
fs = FeatureStore(repo_path='.')

entity_df = pd.DataFrame(
    {
        "event_timestamp": [
            pd.Timestamp(dt, unit="ms", tz="UTC").round("ms")
            for dt in pd.date_range(
                start=datetime.now() - timedelta(days=3),
                end=datetime.now(),
                periods=3,
            )
        ],
        "driver_id": [1001, 1002, 1003],
    }
)

I do not expect the following to work:

# THIS PART SHOULDN'T WORK
features_wrong = ['driver_hourly_stats_v1:conv_rate',  # doesn't exist in V1
                              'driver_hourly_stats_v1:acc_rate',   # doesn't exist in V1
                              'driver_hourly_stats_v1:avg_daily_trips',
                             ]

hist_features_wrong = fs.get_historical_features(
    entity_df=entity_df,
    features=features_wrong,
)

But I do get results:

                   event_timestamp  driver_id  ...  acc_rate  avg_daily_trips
0 2022-04-17 09:35:35.658000+00:00       1001  ...  0.536431            742.0
1 2022-04-18 21:35:35.658000+00:00       1002  ...  0.496901            678.0
2 2022-04-20 09:35:35.658000+00:00       1003  ...       NaN     

I do not expect this to work because driver_hourly_stats_v1:conv_rate and driver_hourly_stats_v1:acc_rate were not defined in the driver_hourly_stats_view_v1 FeatureView.

And just to double check that driver_hourly_stats_v1 only has avg_daily_trips defined:

➜ feast feature-views describe driver_hourly_stats_v1
spec:
  name: driver_hourly_stats_v1
  entities:
  - driver_id
  features:
  - name: avg_daily_trips
    valueType: INT64
  ttl: 86400s

Specifications

  • Version: 0.20 (tested this on 0.19 and 0.18)
  • Platform: Linux
  • Subsystem: Ubuntu

Possible Solution

The list of features being passed in should be checked against the registry. Currently the feature view name and feature name pairs are not validated together. Here's an example that modifies get_historical_features:

    @log_exceptions_and_usage
    def get_historical_features(
        self,
        entity_df: Union[pd.DataFrame, str],
        features: Union[List[str], FeatureService],
        full_feature_names: bool = False,
    ) -> RetrievalJob:

        # Build a dictionary of feature view names -> feature names (not sure if this function already exists ...)
        fv_name_features = dict([(fv.name, [f.name.split('-')[0] for f in fv.features]) for fv in self.list_feature_views()])
        
        # Check that input features are found in the `fv_name_features` dictionary
        feature_views_not_found = []
        for feature in features:
            k, v = feature.split(":")
            if v not in fv_name_features[k]:
                feature_views_not_found.append(f'{k}:{v}')

        if feature_views_not_found:
            raise FeatureViewNotFoundException(', '.join(feature_views_not_found))

This returns:

feast.errors.FeatureViewNotFoundException: Feature view driver_hourly_stats_v1:conv_rate, driver_hourly_stats_v1:acc_rate does not exist

This doesn't handle the case when a FeatureService is passed in but it shouldn't be too hard.

This should also apply to get_online_features.

@benjamintanweihao benjamintanweihao changed the title Undefined features should be rejected when being fetched via get_historical_features Undefined features should be rejected when being fetched via get_historical_features / get_online_features Apr 20, 2022
@adchia adchia self-assigned this Apr 20, 2022
adchia pushed a commit to chhabrakadabra/feast that referenced this issue May 11, 2022
Reject undefined features when using `get_historical_features` or
`get_online_features`.

Signed-off-by: Abhin Chhabra <abhin.chhabra@shopify.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants