Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

direct data ingestion into Online store #1939

Merged
merged 8 commits into from
Oct 25, 2021

Conversation

vas28r13
Copy link

Signed-off-by: Vitaly Sergeyev vsergeyev@better.com

Issue docs: #1863

Directly ingest data into the Online store.

Use Case
a model process/service will get a stream of data and would need to keep track of the new/updated features in the Online feature store

Code example

feature_store_object.ingest_df("my_feature_view_name", dataframe_object)

Process 1

# load feature store
feature_store = _get_feature_store()
for feature_data in _kafka_consume():
    # clean the data
    features_df = _clean_features(feature_data)

    # ingest the data directly into the online store
    feature_store.ingest_df("my_feature_view_name", features_df)

Process 2 (materialization job)

feature_store = _get_feature_store()

# materialization from Offline sources
feature_store.materialize_incremental(end_date=datetime.now(timezone.utc))

@feast-ci-bot
Copy link
Collaborator

@vas28r13: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@feast-ci-bot
Copy link
Collaborator

Hi @vas28r13. Thanks for your PR.

I'm waiting for a feast-dev member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@felixwang9817
Copy link
Collaborator

/ok-to-test

@felixwang9817
Copy link
Collaborator

/kind feature

@codecov-commenter
Copy link

codecov-commenter commented Oct 14, 2021

Codecov Report

Merging #1939 (161be8d) into master (e8a1519) will decrease coverage by 0.18%.
The diff coverage is 97.05%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1939      +/-   ##
==========================================
- Coverage   82.26%   82.08%   -0.19%     
==========================================
  Files          96       99       +3     
  Lines        7669     7958     +289     
==========================================
+ Hits         6309     6532     +223     
- Misses       1360     1426      +66     
Flag Coverage Δ
integrationtests 74.14% <50.00%> (-0.01%) ⬇️
unittests 59.77% <88.23%> (+0.38%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
sdk/python/feast/infra/provider.py 89.34% <50.00%> (-0.66%) ⬇️
sdk/python/feast/feature_store.py 91.31% <100.00%> (-3.24%) ⬇️
sdk/python/feast/flags.py 100.00% <100.00%> (ø)
sdk/python/feast/flags_helper.py 90.47% <100.00%> (+1.00%) ⬆️
sdk/python/feast/infra/passthrough_provider.py 100.00% <100.00%> (ø)
...sts/integration/registration/test_feature_store.py 99.37% <100.00%> (+0.05%) ⬆️
...ts/integration/feature_repos/repo_configuration.py 93.18% <0.00%> (-6.82%) ⬇️
sdk/python/tests/conftest.py 91.66% <0.00%> (-3.65%) ⬇️
sdk/python/feast/infra/utils/aws_utils.py 70.37% <0.00%> (-0.65%) ⬇️
... and 26 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e8a1519...161be8d. Read the comment docs.

Copy link
Collaborator

@adchia adchia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good!

@@ -766,6 +766,19 @@ def tqdm_builder(length):
feature_view, self.project, start_date, end_date
)

@log_exceptions_and_usage
def ingest_df(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mind calling this write_to_online_store instead? ingest can be confusing for folks who think this might also mean writing to the offline store

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name is fine for the provider since i think this naming will work for those once we want to ingest to both offline and online stores. But as an external facing API, should be more clear to start.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, mind making this an experimental feature?

e.g. like https://github.com/feast-dev/feast/blob/master/sdk/python/feast/feature_store.py#L399

sdk/python/feast/feature_store.py Show resolved Hide resolved
sdk/python/feast/infra/provider.py Show resolved Hide resolved
@@ -766,6 +766,19 @@ def tqdm_builder(length):
feature_view, self.project, start_date, end_date
)

@log_exceptions_and_usage
def ingest_df(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, mind making this an experimental feature?

e.g. like https://github.com/feast-dev/feast/blob/master/sdk/python/feast/feature_store.py#L399

@vas28r13
Copy link
Author

/retest

@@ -424,6 +426,46 @@ def test_apply_remote_repo():
)


@pytest.mark.parametrize(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is fine for now. mind adding a TODO to move this to test_universal_online? that parametrizes across online stores

Copy link
Collaborator

@adchia adchia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Copy link
Collaborator

@adchia adchia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving a quick request changes to add the TODO i mentioned (even better if you move it to test_universal_online as per https://docs.feast.dev/how-to-guides/adding-or-reusing-tests

@felixwang9817
Copy link
Collaborator

Hey @vas28r13, thanks for this PR. We just fixed the issue with the linter that's been blocking all PRs for the last two days. Would you mind rebasing your changes on master and then force pushing your changes up? Thanks!

@feast-ci-bot feast-ci-bot removed the lgtm label Oct 25, 2021
Vitaly Sergeyev added 6 commits October 25, 2021 15:11
Signed-off-by: Vitaly Sergeyev <vsergeyev@better.com>
Signed-off-by: Vitaly Sergeyev <vsergeyev@better.com>
Signed-off-by: Vitaly Sergeyev <vsergeyev@better.com>
Signed-off-by: Vitaly Sergeyev <vsergeyev@better.com>
Signed-off-by: Vitaly Sergeyev <vsergeyev@better.com>
Signed-off-by: Vitaly Sergeyev <vsergeyev@better.com>
Signed-off-by: Vitaly Sergeyev <vsergeyev@better.com>
Copy link
Collaborator

@adchia adchia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Signed-off-by: Vitaly Sergeyev <vsergeyev@better.com>
@feast-ci-bot feast-ci-bot removed the lgtm label Oct 25, 2021
Copy link
Collaborator

@adchia adchia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adchia, vas28r13

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@feast-ci-bot feast-ci-bot merged commit 6728f2a into feast-dev:master Oct 25, 2021
@vas28r13 vas28r13 deleted the 1863-direct-ingestion branch October 26, 2021 01:57
@@ -809,6 +809,26 @@ def tqdm_builder(length):
feature_view, self.project, start_date, end_date
)

@log_exceptions_and_usage
def write_to_online_store(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vas28r13 thx for adding this capability, we use this capability so we can write inference data in a real-time flow. However, the get_feature_view does not use the allow_cache option of the registry. This causes a lookup to refresh the registry for every call to write_to_online_store. This blocks the new capability from being used real-time. Would it make sense to add a PR that fixes that? thx...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would you mind making an issue so we can track this?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants