Use FeatureViewProjection instead of FeatureView in ODFV #2186

Merged
12 commits merged into feast-dev:master on Jan 7, 2022

judahrand
Member

@judahrand judahrand commented Jan 4, 2022

Signed-off-by: Judah Rand <17158624+judahrand@users.noreply.github.com>

What this PR does / why we need it:
Before this PR, requesting any ODFV feature meant retrieving every feature from each FeatureView the ODFV depends on. That performs poorly for FeatureViews that are very wide when only a few of their features are required.

This change allows the following spelling to require only a subset of features:

# Define a request data source which encodes features / information only
# available at request time (e.g. part of the user initiated HTTP request)
input_request = RequestDataSource(
    name="vals_to_add",
    schema={
        "val_to_add": ValueType.INT64,
        "val_to_add_2": ValueType.INT64
    }
)

# Use the input data and feature view features to create new features
@on_demand_feature_view(
    inputs={
        'driver_hourly_stats': driver_hourly_stats_view[['conv_rate']],
        'vals_to_add': input_request
    },
    features=[
        Feature(name='conv_rate_plus_val1', dtype=ValueType.DOUBLE),
        Feature(name='conv_rate_plus_val2', dtype=ValueType.DOUBLE)
    ]
)
def transformed_conv_rate(features_df: pd.DataFrame) -> pd.DataFrame:
    df = pd.DataFrame()
    df['conv_rate_plus_val1'] = (features_df['conv_rate'] + features_df['val_to_add'])
    df['conv_rate_plus_val2'] = (features_df['conv_rate'] + features_df['val_to_add_2'])
    return df

This can dramatically reduce serving latency of ODFVs in some cases.
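
As a rough illustration of what the `view[['conv_rate']]` spelling expresses, the sketch below models a feature view whose indexing operator returns a narrowed projection. `SketchFeatureView` and `SketchProjection` are hypothetical stand-ins invented for this example; they are not Feast's classes and may not match how Feast implements projections.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SketchProjection:
    """Hypothetical stand-in: a view name plus the subset of features to read."""
    name: str
    features: tuple


class SketchFeatureView:
    """Hypothetical stand-in for a feature view; not Feast's real class."""

    def __init__(self, name: str, features: List[str]):
        self.name = name
        self.features = list(features)

    def __getitem__(self, item: List[str]) -> SketchProjection:
        # view[["a", "b"]] selects a subset of features, mirroring the spelling above.
        if not isinstance(item, list):
            raise TypeError("expected a list of feature names")
        missing = [f for f in item if f not in self.features]
        if missing:
            raise KeyError(f"unknown features: {missing}")
        return SketchProjection(self.name, tuple(item))


# A "wide" view with three features, of which only one is needed downstream.
wide_view = SketchFeatureView(
    "driver_hourly_stats", ["conv_rate", "acc_rate", "avg_daily_trips"]
)
projection = wide_view[["conv_rate"]]
```

Only the features named in the projection would then need to be fetched for the on demand feature view, rather than every column of the wide view.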

Which issue(s) this PR fixes:

Fixes #

Does this PR introduce a user-facing change?:

- Use `FeatureViewProjections` to determine OnDemandFeatureView dependencies.
- Fix case where changes to OnDemandFeatureView would not be applied to FeatureStore.
- Reduce serving latency when features from both FeatureViews and OnDemandFeatureViews are requested.
- Improve performance of `get_online_features`.

@judahrand judahrand requested a review from a team as a code owner January 4, 2022 14:32
@judahrand judahrand requested review from tsotnet and removed request for a team January 4, 2022 14:32
@judahrand judahrand added kind/feature New feature or request and removed size/M labels Jan 4, 2022
@codecov-commenter

codecov-commenter commented Jan 4, 2022

Codecov Report

Merging #2186 (153ca13) into master (4c32d75) will decrease coverage by 0.10%.
The diff coverage is 91.22%.

@@            Coverage Diff             @@
##           master    #2186      +/-   ##
==========================================
- Coverage   84.55%   84.44%   -0.11%     
==========================================
  Files         104      104              
  Lines        8286     8255      -31     
==========================================
- Hits         7006     6971      -35     
- Misses       1280     1284       +4     
| Flag | Coverage Δ |
|---|---|
| integrationtests | 74.20% <90.35%> (-0.41%) ⬇️ |
| unittests | 58.66% <64.03%> (-0.26%) ⬇️ |

| Impacted Files | Coverage Δ |
|---|---|
| sdk/python/feast/cli.py | 38.65% <0.00%> (+0.11%) ⬆️ |
| sdk/python/feast/feature_view.py | 90.78% <ø> (ø) |
| sdk/python/feast/on_demand_feature_view.py | 95.20% <83.33%> (-4.05%) ⬇️ |
| sdk/python/feast/feature_store.py | 91.51% <96.05%> (+0.02%) ⬆️ |
| sdk/python/feast/infra/passthrough_provider.py | 100.00% <100.00%> (ø) |
| sdk/python/feast/infra/provider.py | 90.09% <100.00%> (ø) |
| sdk/python/feast/online_response.py | 94.73% <100.00%> (+7.41%) ⬆️ |
| .../integration/online_store/test_online_retrieval.py | 100.00% <100.00%> (ø) |
| sdk/python/feast/type_map.py | 60.93% <0.00%> (-6.25%) ⬇️ |

... and 2 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4c32d75...153ca13.

Signed-off-by: Judah Rand <17158624+judahrand@users.noreply.github.com>
map<string, OnDemandInput> inputs = 4;

// Map of projections for the inputs of this feature view.
// Keys must also be present in `inputs`
map<string, FeatureViewProjection> projections = 6;
Member

Can't we add FeatureViewProjection in the OnDemandInput oneof? And remove the feature view if you so wish?

Member Author

@judahrand judahrand Jan 4, 2022

I ended up avoiding this because the following loop retrieves the entities of the backing FeatureViews:

for backing_fv in feature_view.inputs.values():

That can't be done if a FeatureViewProjection is stored instead of the FeatureView.

Member

Hmmm, should we look up the entities explicitly via the feature view name, if it's just in that method?

Member

I'd be worried to have two fields in the proto which have this weird dependency between each other, and would prefer a single field - and it feels like we should be able to get away with it.

Member Author

👍 I'll take another pass at doing that tomorrow.

Member Author

This ended up being quite a bit more involved.

The changes in this PR definitely reduce the number of calls to the OnlineStore in the simple case of Features being requested from 'an ODFV which depends on a projection of a FV' and from the unprojected version of that FV (1 call rather than 2 and only the required features).

However, when multiple different Projections are used in a FeatureService, I'm not 100% sure we won't end up making multiple calls to the OnlineStore for each Projection + the projection used by the ODFV.

I'm also not 100% sure we catch the case where 2 ODFVs use different projections of the same FV.

But I do think this approach of working out what we need before starting to request things from the OnlineStore is better than it was.
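
The idea of working out what is needed before touching the OnlineStore can be sketched roughly as below. This is a minimal illustration under assumptions, not the code in this PR: `plan_online_reads` and the `(view_name, feature_name)` tuple layout are hypothetical names chosen for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def plan_online_reads(requested: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (view_name, feature_name) pairs so each view is read only once."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for view_name, feature_name in requested:
        # Skip duplicates so a feature needed twice is still fetched once.
        if feature_name not in grouped[view_name]:
            grouped[view_name].append(feature_name)
    return dict(grouped)


# Features requested directly from the feature view.
direct_request = [("driver_hourly_stats", "acc_rate")]
# Features an ODFV needs from its projection of the same view.
odfv_dependencies = [("driver_hourly_stats", "conv_rate")]

plan = plan_online_reads(direct_request + odfv_dependencies)
```

With both sources merged up front, the plan contains a single entry for `driver_hourly_stats`, which corresponds to one read that covers only the required features.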

@achals achals self-assigned this Jan 4, 2022
@judahrand judahrand requested a review from achals January 4, 2022 18:50
…ions in ODFVs.

Signed-off-by: Judah Rand <17158624+judahrand@users.noreply.github.com>
Signed-off-by: Judah Rand <17158624+judahrand@users.noreply.github.com>
…ection

Signed-off-by: Judah Rand <17158624+judahrand@users.noreply.github.com>
Signed-off-by: Judah Rand <17158624+judahrand@users.noreply.github.com>
@judahrand
Member Author

judahrand commented Jan 6, 2022

@achals I've tacked on some additional changes here which improve the performance of get_online_features quite significantly for large numbers of entity_rows. I'm seeing 2-3x speed ups for parts of the serialization of EntityKeyProtos and adding the entity_rows to the results_rows.

It looks like there may still be a few issues, but hopefully I'll get them solved quickly.

Signed-off-by: Judah Rand <17158624+judahrand@users.noreply.github.com>
Signed-off-by: Judah Rand <17158624+judahrand@users.noreply.github.com>
Signed-off-by: Judah Rand <17158624+judahrand@users.noreply.github.com>
@judahrand
Member Author

@achals All the tests other than the ones that are failing on master are now fixed. It looks like I've actually fixed a handful of bugs, which required some alterations to the tests as you can see. This is because the customer entity is actually defined as ValueType.STRING but the tests treated it as if it were ValueType.INT64.

Do you think you could give this another review and look at merging?

@achals
Member

achals commented Jan 7, 2022

Nit, release notes are a bit unclear:

- Update ODFV in UDF code changes.
- Reduce number of calls to OnlineStore ODFVs are used.

@judahrand
Member Author

> Nit, release notes are a bit unclear:
>
> - Update ODFV in UDF code changes.
> - Reduce number of calls to OnlineStore ODFVs are used.

Reworded those two.

Comment on lines +91 to +110
def __eq__(self, other):
    if not super().__eq__(other):
        return False

    if (
        not self.input_feature_view_projections
        == other.input_feature_view_projections
    ):
        return False

    if not self.input_request_data_sources == other.input_request_data_sources:
        return False

    if not self.udf.__code__.co_code == other.udf.__code__.co_code:
        return False

    return True

def __hash__(self):
    return super().__hash__()
Member

Curious why these needed to be added?

Member Author

By default only the name and features properties are checked. This seemed obviously wrong to me... You could change the inputs and UDF but keep the outputs and name the same.

def __eq__(self, other):
    if not isinstance(other, BaseFeatureView):
        raise TypeError(
            "Comparisons should only involve BaseFeatureView class objects."
        )
    if self.name != other.name:
        return False
    if sorted(self.features) != sorted(other.features):
        return False
    return True

Then if __eq__ is overridden, __hash__ also has to be! Annoying but true.
https://docs.python.org/3/reference/datamodel.html#object.__hash__

Member

yeah I remember that __hash__ has to be overridden too - okay cool makes sense.
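
The Python rule discussed in this thread can be shown with a small self-contained sketch. The classes `SketchBaseView`, `EqOnly`, and `EqAndHash` are hypothetical and exist only for this illustration; they are not the Feast classes touched by the PR.

```python
class SketchBaseView:
    """Hypothetical base class that compares and hashes by name only."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, SketchBaseView) and self.name == other.name

    def __hash__(self):
        return hash(self.name)


class EqOnly(SketchBaseView):
    """Overrides __eq__ but not __hash__, so Python sets __hash__ to None."""

    def __init__(self, name, inputs):
        super().__init__(name)
        self.inputs = inputs

    def __eq__(self, other):
        return super().__eq__(other) and self.inputs == getattr(other, "inputs", None)


class EqAndHash(SketchBaseView):
    """Overrides __eq__ and re-declares __hash__, keeping instances hashable."""

    def __init__(self, name, inputs):
        super().__init__(name)
        self.inputs = inputs

    def __eq__(self, other):
        return super().__eq__(other) and self.inputs == getattr(other, "inputs", None)

    def __hash__(self):
        return super().__hash__()


def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True
```

Here `EqOnly` instances cannot be used in sets or as dict keys, while `EqAndHash` instances can, even though both inherit from a base class that defines `__hash__`.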

Comment on lines +163 to +165
elif on_demand_input.WhichOneof("input") == "feature_view_projection":
    inputs[input_name] = FeatureViewProjection.from_proto(
        on_demand_input.feature_view_projection
Member

Does this mean that fv inputs will be converted from a feature view to a projection when feast apply runs for existing feature repos?

I wonder what the output of feast plan looks like in that case, if anything has changed. If the change does appear, we probably should keep that in mind and update the release notes to have users expect it.

Member Author

Yes, I believe they should be updated the next time that feast apply is run. The diff is 'large' but makes sense. I'd also suggest maybe this isn't a big deal as the ODFVs are still labelled as Alpha (and I think will need some iteration as calling either to_dict or to_df for them isn't terribly performant).

Member

Yeah fair I don't really want to block this change on the perceived change, but maybe it should be something we call out in the release notes (or the blog post about the release).

@@ -99,65 +88,3 @@ def to_df(self) -> pd.DataFrame:
"""

return pd.DataFrame(self.to_dict())


def _infer_online_entity_rows(
Member

love seeing red

Member

@achals achals left a comment

/lgtm

@feast-ci-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: achals, judahrand

@feast-ci-bot feast-ci-bot merged commit f969e53 into feast-dev:master Jan 7, 2022
@judahrand judahrand deleted the odfv-feature-projection branch January 19, 2022 12:22