This repository has been archived by the owner on Mar 1, 2024. It is now read-only.

Model Training

Context: Feast has some really useful documentation around how point-in-time joins work.

For model training, you want to get all the offline features and real-time features that are used in production.

WyvernAPI provides the get_historical_features function to retrieve the features that correspond to a specific entity (or set of entities) as of the time a user request happened. Like Feast's get_historical_features, it covers the batch features that correspond to the entity (or entity set); it also covers the real-time features that are logged by Wyvern.

For example, let’s say we had a user request like this:

{
    "request_id": "example_request_id",
    "api_source": "/api/v1/product-search-ranking",
    "candidates": [{"product_id": "p_1"}, {"product_id": "p_2"}],
    "user": {"user_id": "u_1"},
    "query": {"query": "chocolate"}
}

The request data in a notebook may look like this, and all of this information would be supplied to get_historical_features:

Input Dataframe (entities):

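| timestamp | request | product | brand | user | query | was_clicked | was_ordered |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2023-07-07T22:01:00 | example_request_id | p_1 | b_1 | u_1 | chocolate | 0 | 0 |
| 2023-07-07T22:01:00 | example_request_id | p_2 | b_1 | u_1 | chocolate | 1 | 0 |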
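
As a rough illustration, the input dataframe above could be assembled in a notebook with pandas, producing one row per candidate in the request. This is only a sketch: the `build_entity_rows` helper, the `product_brand` lookup, and the `labels` dictionary are hypothetical names introduced here, and the brand and click/order values are assumed to come from your own catalog and event logs rather than from the request itself.

```python
import pandas as pd

# The example request from above.
request = {
    "request_id": "example_request_id",
    "api_source": "/api/v1/product-search-ranking",
    "candidates": [{"product_id": "p_1"}, {"product_id": "p_2"}],
    "user": {"user_id": "u_1"},
    "query": {"query": "chocolate"},
}

# Hypothetical side data: when the request happened, each product's brand,
# and the click/order labels observed for each candidate.
request_timestamp = pd.Timestamp("2023-07-07T22:01:00")
product_brand = {"p_1": "b_1", "p_2": "b_1"}
labels = {
    "p_1": {"was_clicked": 0, "was_ordered": 0},
    "p_2": {"was_clicked": 1, "was_ordered": 0},
}


def build_entity_rows(request, timestamp, product_brand, labels):
    """Expand one request into one row per candidate product."""
    rows = []
    for candidate in request["candidates"]:
        product_id = candidate["product_id"]
        rows.append(
            {
                "timestamp": timestamp,
                "request": request["request_id"],
                "product": product_id,
                "brand": product_brand[product_id],
                "user": request["user"]["user_id"],
                "query": request["query"]["query"],
                **labels[product_id],
            }
        )
    return pd.DataFrame(rows)


entities = build_entity_rows(request, request_timestamp, product_brand, labels)
print(entities)
```

Every row repeats the request-level values (timestamp, request, user, query), so each candidate carries everything needed to look up its features independently.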

The goal of the get_historical_features call is to retrieve all of the features that were available to the machine learning model at the time of the above request. For this example input, it retrieves all requested product, user, query, and combination features as they were on July 7th, 2023, at 10:01 PM.
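
To make the "as they were at request time" idea concrete, here is a minimal pandas sketch of a point-in-time join using `pandas.merge_asof`. It is not how Wyvern or Feast implement the lookup; the `feature_log` table, its `feature_timestamp` column, and the `product_7d_orders` feature are made-up names and values used only for illustration.

```python
import pandas as pd

# Entity rows: which product was scored, and when the request happened.
entities = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2023-07-07T22:01:00", "2023-07-07T22:01:00"]
        ),
        "product": ["p_1", "p_2"],
    }
)

# Hypothetical feature history: one row per (product, time the value was written).
feature_log = pd.DataFrame(
    {
        "feature_timestamp": pd.to_datetime(
            [
                "2023-07-06T00:00:00",
                "2023-07-07T00:00:00",
                "2023-07-08T00:00:00",  # written after the request: must be ignored
                "2023-07-07T00:00:00",
            ]
        ),
        "product": ["p_1", "p_1", "p_1", "p_2"],
        "product_7d_orders": [10, 12, 15, 3],
    }
)

# For each entity row, take the latest feature value written at or before
# the request timestamp, matched on the same product.
joined = pd.merge_asof(
    entities.sort_values("timestamp"),
    feature_log.sort_values("feature_timestamp"),
    left_on="timestamp",
    right_on="feature_timestamp",
    by="product",
    direction="backward",
)
print(joined[["timestamp", "product", "feature_timestamp", "product_7d_orders"]])
```

In this sketch, p_1 picks up the value written on July 7th rather than the one written on July 8th, because the later value did not exist yet when the request was served. Using it in training would leak future information into the model.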

Besides this input dataframe, the list of features has to be passed to the get_historical_features call as well.