Feature engineering in multiple contexts examples #94

skrawcz · 2023-03-04T05:27:53Z

Adds feature engineering examples for multiple contexts.

This adds two scenarios:

one where there is no feature store, and you want to recompute features in two places (offline and online).
one where there is a feature store, but it only stores "raw" data, so you use it as the place to get data from and then compute features for input to the model.
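A library-free sketch of the idea behind the two scenarios, assuming plain Python functions stand in for Hamilton feature functions and an in-memory dict stands in for the feature store (`RAW_STORE`, `fetch_raw`, and the feature names are hypothetical). The point is that the feature definitions are written once and reused whether the raw data arrives in the request (scenario 1) or is looked up from a store of raw data (scenario 2).

```python
from typing import Dict

# Feature definitions, written once and shared by every context.
def age_squared(age: float) -> float:
    return age * age

def is_senior(age: float) -> bool:
    return age >= 65

FEATURES = {"age_squared": age_squared, "is_senior": is_senior}

def compute_features(raw: Dict[str, float]) -> Dict[str, object]:
    """Apply every shared feature function to one record of raw data."""
    return {name: fn(raw["age"]) for name, fn in FEATURES.items()}

# Scenario 1: no feature store; the request itself carries the raw data.
def handle_request(request: Dict[str, float]) -> Dict[str, object]:
    return compute_features(request)

# Scenario 2: a (hypothetical) store holds only raw data, keyed by entity id.
RAW_STORE: Dict[int, Dict[str, float]] = {1: {"age": 70.0}, 2: {"age": 30.0}}

def fetch_raw(entity_id: int) -> Dict[str, float]:
    return RAW_STORE[entity_id]

def handle_lookup(entity_id: int) -> Dict[str, object]:
    # Same feature code; only the source of the raw data changes.
    return compute_features(fetch_raw(entity_id))
```

For example, `handle_request({"age": 70.0})` and `handle_lookup(1)` both return `{"age_squared": 4900.0, "is_senior": True}`.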

In the process, this fixes #93, which was found while building out this example.

Changes

adds examples
makes extract* work with async functions.

How I tested this

Runs locally.

Notes

This isn't an exhaustive set of examples on this topic or on how you could use Hamilton. For example, it omits:

Streaming settings, though arguably reusing Hamilton functions would be simpler in that context.
How to ask Hamilton which features are needed as input, so you know what to request from the feature store. With tags, and by querying the DAG at the start of the app, you could dynamically ask Hamilton what's required and then only go to the feature store for that data. But I thought that might be a little too complex, so I've left it on the TODO list.
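The DAG query described in the last note could be sketched roughly as follows. This is a simplified, library-free stand-in, not Hamilton's API: it only borrows Hamilton's convention that a function's parameter names are its dependencies, then walks back from the requested outputs to find the names no function produces, i.e. what would have to be fetched from the feature store. All function names here are hypothetical.

```python
import inspect
from typing import Callable, Dict, List, Set

def age_zscore(age: float, age_mean: float, age_std_dev: float) -> float:
    return (age - age_mean) / age_std_dev

def has_children(number_of_children: int) -> bool:
    return number_of_children > 0

def risk_score(age_zscore: float, has_children: bool) -> float:
    return age_zscore + (1.0 if has_children else 0.0)

# Node name -> function; parameter names point at upstream nodes or external inputs.
FUNCTIONS: Dict[str, Callable] = {
    fn.__name__: fn for fn in (age_zscore, has_children, risk_score)
}

def required_inputs(outputs: List[str]) -> Set[str]:
    """Return the names that must be supplied externally to compute `outputs`."""
    needed: Set[str] = set()
    seen: Set[str] = set()
    stack = list(outputs)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        fn = FUNCTIONS.get(name)
        if fn is None:
            needed.add(name)  # no function produces it, so it is an external input
            continue
        stack.extend(inspect.signature(fn).parameters)  # iterate parameter names
    return needed
```

Here `required_inputs(["risk_score"])` yields `{"age", "age_mean", "age_std_dev", "number_of_children"}`, which is the set an app would then request from the store.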

Checklist

PR has an informative and human-readable title (this will be pulled into the release notes)
Changes are limited to a single goal (no scope creep)
Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
Any change in functionality is tested
New functions are documented (with a description, list of inputs, and expected output)
Placeholder code is flagged / future TODOs are captured in comments
Project documentation has been updated if adding/changing functionality.

elijahbenizzy

The first question everyone is going to ask is "isn't it slow to use dataframes when it's a single item?", and the answer is "not even close to as slow as that API call we're making".

So, let's add:

A section on why we're using pd.Series/pd.DataFrame and how that helps us
Alternatives (see idea at @map decorator #95)

examples/feature_engineering_multiple_contexts/scenario_1/README.md

examples/feature_engineering_multiple_contexts/scenario_1/fastapi_server.py

examples/feature_engineering_multiple_contexts/scenario_2/online_loader.py

hamilton/function_modifiers/expanders.py

Assumptions:
- the API request can provide the same raw data that training provides.
- if you have aggregation features, you need to store the training result for them and provide them to the online side.

This example shows how one might use Hamilton to compute features in an offline and online fashion. The assumption here is that the request passed into the API has all the raw data required to compute features. This example also shows how one might "override" some values that are required for computing features; in this example they are `age_mean` and `age_std_dev`. This can be required when computing aggregation features does not make sense at inference time.
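A minimal sketch of the override idea, assuming a z-score feature on age; the function names and numbers are illustrative rather than taken from the example code. Offline, `age_mean` and `age_std_dev` are computed from the training data; online, a single request has no meaningful mean or standard deviation, so the values saved at training time are passed in instead. (In Hamilton this role is played by the `overrides` argument to `Driver.execute`; the plain `overrides` dict below only mimics it.)

```python
import statistics
from typing import Dict, List, Optional

def age_mean(ages: List[float]) -> float:
    return statistics.mean(ages)

def age_std_dev(ages: List[float]) -> float:
    return statistics.pstdev(ages)

def age_zscore(ages: List[float], overrides: Optional[Dict[str, float]] = None) -> List[float]:
    """Standardize ages, preferring overridden aggregates when they are supplied."""
    overrides = overrides or {}
    # Only compute an aggregate from `ages` when no stored value was provided.
    mean = overrides["age_mean"] if "age_mean" in overrides else age_mean(ages)
    std = overrides["age_std_dev"] if "age_std_dev" in overrides else age_std_dev(ages)
    return [(a - mean) / std for a in ages]

# Offline: compute features and save the aggregates alongside the model.
training_ages = [20.0, 30.0, 40.0]
offline_features = age_zscore(training_ages)
saved_aggregates = {"age_mean": age_mean(training_ages), "age_std_dev": age_std_dev(training_ages)}

# Online: one request, so reuse the training aggregates as overrides.
online_features = age_zscore([40.0], overrides=saved_aggregates)
```

The online value for age 40 matches the offline one. Calling `age_zscore([40.0])` without overrides would divide by a standard deviation of zero, which is why aggregation features need the stored training result.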

This is a less-than-ideal solution: basically, if the function being wrapped is a coroutine, the wrapper is made async as well. This duplicates a lot of code. I'm sure there's a more succinct way to do this, but because I'm pressed for time I'm going with the more verbose solution. Note: this doesn't fix it for all decorators, just the extract* ones. Adds async tests to double-check that things work as expected.
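A minimal sketch of the approach described in this comment, not the actual `expanders.py` code: the decorator checks `inspect.iscoroutinefunction` and, for coroutines, returns an `async def` wrapper that awaits the result before post-processing it. `pick_fields` is a hypothetical, simplified stand-in for what an extract*-style decorator does with a function's output.

```python
import asyncio
import functools
import inspect
from typing import Any, Callable, Dict

def pick_fields(*fields: str) -> Callable[[Callable], Callable]:
    """Hypothetical decorator: keep only `fields` from the wrapped function's dict result."""
    def decorator(fn: Callable) -> Callable:
        def _pick(result: Dict[str, Any]) -> Dict[str, Any]:
            return {field: result[field] for field in fields}

        if inspect.iscoroutinefunction(fn):
            @functools.wraps(fn)
            async def async_wrapper(*args: Any, **kwargs: Any) -> Dict[str, Any]:
                # Calling a coroutine function only creates a coroutine; await it first.
                return _pick(await fn(*args, **kwargs))
            return async_wrapper

        @functools.wraps(fn)
        def sync_wrapper(*args: Any, **kwargs: Any) -> Dict[str, Any]:
            return _pick(fn(*args, **kwargs))
        return sync_wrapper
    return decorator

@pick_fields("age")
def load_sync(user_id: int) -> Dict[str, Any]:
    return {"user_id": user_id, "age": 42}

@pick_fields("age")
async def load_async(user_id: int) -> Dict[str, Any]:
    await asyncio.sleep(0)  # stands in for an awaited API call
    return {"user_id": user_id, "age": 42}
```

Without the branch, a sync wrapper around `load_async` would receive a coroutine object instead of a dict and fail when indexing it. Duplicating the wrapper also keeps the decorated function a coroutine function, so async-aware callers still know to await it.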

To make it clear that this isn't a generic feature engineering example, but one about doing it in multiple contexts.

This example is again contrived. However, it should illustrate how easily you can swap out where the data comes from when using Hamilton. The example won't fit everyone's needs, since people's needs will likely fall somewhere between scenario 1 and scenario 2, but hopefully it provides enough context to get going with feature engineering and Hamilton.

This updates the sphinx docs with a high level overview of feature engineering, for various contexts, with links to examples.

So that it's clear there are caveats that people should think about when doing feature engineering. Adds FAQ section, with single Feast question.

skrawcz marked this pull request as ready for review March 4, 2023 05:28

skrawcz requested a review from elijahbenizzy March 4, 2023 05:28

elijahbenizzy mentioned this pull request Mar 4, 2023

@map decorator #95

Closed

elijahbenizzy reviewed Mar 4, 2023

skrawcz added 5 commits March 4, 2023 13:08

Renames feature_engineering to feature_engineering_multiple_contexts

d8ff208

Adds more documentation and explanation for feature engineering

9522f17

skrawcz force-pushed the feature-engineering-examples branch from 07b7000 to 6263e3e March 4, 2023 21:32

Fixes some typos and adds clarifications around feature-eng docs

4944623

skrawcz force-pushed the feature-engineering-examples branch from 6263e3e to 4944623 March 4, 2023 21:54

skrawcz merged commit 0254a4e into main Mar 4, 2023

skrawcz deleted the feature-engineering-examples branch March 4, 2023 22:38
