Feature engineering in multiple contexts examples #94
Merged
Closed
The first question everyone is going to ask is "is it slow to use dataframes when it's a single item?", and the answer is "not even close to as slow as that API call we're making".
So, let's add:
- A section on why we're using pd.Series/pd.DataFrame and how that helps us
- Alternatives (see idea at @map decorator #95)
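To illustrate the point about single items, here is a minimal sketch, assuming pandas is available. It shows a feature function written against `pd.Series` serving both a training batch and a single online request without changes. The function name `age_zero_mean` and the sample values are hypothetical and are not taken from this PR.

```python
import pandas as pd


def age_zero_mean(age: pd.Series, age_mean: float) -> pd.Series:
    """Center the age column; the same code handles any Series length."""
    return age - age_mean


# Offline: a batch of training rows (illustrative values).
batch = pd.Series([20.0, 30.0, 40.0], name="age")
batch_result = age_zero_mean(batch, age_mean=30.0)

# Online: one API request wrapped in a single-element Series.
single = pd.Series([25.0], name="age")
single_result = age_zero_mean(single, age_mean=30.0)

print(batch_result.tolist())   # [-10.0, 0.0, 10.0]
print(single_result.tolist())  # [-5.0]
```

The single-element Series carries some overhead compared with a bare float, but it lets one function definition cover both contexts.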
examples/feature_engineering_multiple_contexts/scenario_1/fastapi_server.py
examples/feature_engineering_multiple_contexts/scenario_2/online_loader.py
Assumptions:
- The API request can provide the same raw data that training provides.
- If you have aggregation features, you need to store the training result for them and provide it to the online side.
This example shows how one might use Hamilton to compute features in an offline and online fashion. The assumption here is that the request passed into the API has all the raw data required to compute features.
This example also shows how one might "override" some values that are required for computing features; in this example they are `age_mean` and `age_std_dev`. This can be required when computing aggregation features does not make sense at inference time.
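The override idea can be sketched roughly as follows. This uses plain pandas functions rather than the Hamilton driver, so it only mirrors the concept: the aggregates are computed once offline, stored, and then supplied directly online instead of being recomputed from a single request. Only `age_mean` and `age_std_dev` come from the example; `age_zero_mean_unit_variance`, the `stored` dict, and the sample values are assumptions made for illustration.

```python
import pandas as pd


def age_mean(age: pd.Series) -> float:
    """Aggregation feature: only meaningful over the training data."""
    return age.mean()


def age_std_dev(age: pd.Series) -> float:
    """Aggregation feature: sample standard deviation of age."""
    return age.std()


def age_zero_mean_unit_variance(
    age: pd.Series, age_mean: float, age_std_dev: float
) -> pd.Series:
    """Row-level feature that depends on the two aggregates."""
    return (age - age_mean) / age_std_dev


# Offline: compute the aggregates from training data and store them.
train_age = pd.Series([20.0, 30.0, 40.0])
stored = {
    "age_mean": age_mean(train_age),
    "age_std_dev": age_std_dev(train_age),
}
offline_feature = age_zero_mean_unit_variance(train_age, **stored)

# Online: one request. Recomputing a mean or standard deviation over a
# single row is meaningless, so the stored training values are used.
request_age = pd.Series([40.0])
online_feature = age_zero_mean_unit_variance(request_age, **stored)
```

With the values above, the stored mean is 30 and the stored sample standard deviation is 10, so a request with age 40 maps to the same scaled value it would have received in training.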
This is a less than ideal solution, but basically, if the function being wrapped is a coroutine, then the wrapper is made async as well. This duplicates a lot of code. I'm sure there's a more succinct way to do this, but because I'm pressed for time I'm going with the more verbose solution. Note: this doesn't fix it for all decorators, just the extract* ones. Adds async tests to double-check that things work as expected.
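The general pattern described here (check whether the wrapped function is a coroutine function and, if so, return an async wrapper) might look roughly like this. It is a generic standard-library sketch, not the code from this PR; the decorator `double_result` and the two decorated functions are hypothetical.

```python
import asyncio
import functools
import inspect


def double_result(fn):
    """Hypothetical decorator: doubles whatever `fn` returns."""
    if inspect.iscoroutinefunction(fn):
        # Wrapped function is a coroutine function, so the wrapper must
        # be async too; otherwise callers would get an un-awaited coroutine.
        @functools.wraps(fn)
        async def async_wrapper(*args, **kwargs):
            return 2 * await fn(*args, **kwargs)

        return async_wrapper

    # Plain function: an ordinary wrapper is enough. Note the duplicated body.
    @functools.wraps(fn)
    def sync_wrapper(*args, **kwargs):
        return 2 * fn(*args, **kwargs)

    return sync_wrapper


@double_result
def sync_value() -> int:
    return 21


@double_result
async def async_value() -> int:
    await asyncio.sleep(0)
    return 21


sync_out = sync_value()
async_out = asyncio.run(async_value())
print(sync_out, async_out)  # 42 42
```

The two wrappers repeat the same logic, which is the code duplication the commit message mentions.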
To make it clear that this isn't a generic feature engineering example, but one about doing it in multiple contexts.
This example here is contrived again. However, it should illustrate how you can replace getting data with Hamilton quite easily. The example won't fit the needs of everyone, since people's needs will likely fall in between scenario 1 and scenario 2, but hopefully it provides them enough context to get going with feature engineering and Hamilton.
This updates the sphinx docs with a high level overview of feature engineering, for various contexts, with links to examples.
skrawcz force-pushed the feature-engineering-examples branch from 07b7000 to 6263e3e on March 4, 2023 21:32
So that it's clear there are caveats that people should think about when doing feature engineering. Adds an FAQ section, with a single Feast question.
skrawcz force-pushed the feature-engineering-examples branch from 6263e3e to 4944623 on March 4, 2023 21:54
Adds feature engineering in multiple contexts examples.
This adds two scenarios:
as the place to get data from to compute features for input to the model.
In the process, this fixes #93, as this issue was found while building out this example.
Changes
How I tested this
Notes
This isn't an exhaustive set of examples on this topic and how you could use Hamilton. For example this omits talking about:
feature store for that data. But I thought that might be a little too complex, so I leave it on the TODO list.
Checklist