feat: Support retrieval from multiple feature views with different join keys #2835

yongheng · 2022-06-22T10:22:03Z

What this PR does / why we need it:

Currently the Java Feature Server doesn't support retrieval from multiple feature views with different join keys. For each gRPC request, OnlineServingServiceV2 calls OnlineRetriever exactly once. In this call OnlineServingServiceV2 sends all join keys from the original request to OnlineRetriever, which simply sorts and concatenates them to make a Redis key.

This PR adds support for retrieval from multiple feature views with different join keys. For each gRPC request, it groups feature references by join keys and makes one call to OnlineRetriever per group.
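
The grouping step described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: feature references are plain `view:feature` strings, and `viewJoinKeys` is a hypothetical map standing in for the registry lookup. The class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Illustrative only: simplified stand-ins, not the actual Feast classes. */
public class JoinKeyGrouping {

  /**
   * Groups feature references ("view:feature") by the sorted join keys of their feature view.
   * viewJoinKeys is a hypothetical lookup that stands in for the registry.
   */
  public static Map<String, List<String>> groupByJoinKeys(
      List<String> featureReferences, Map<String, List<String>> viewJoinKeys) {
    Map<String, List<String>> groups = new LinkedHashMap<>();
    for (String ref : featureReferences) {
      String view = ref.substring(0, ref.indexOf(':'));
      // Sort the join keys so the group name does not depend on declaration order.
      String groupName = String.join(",", new TreeSet<String>(viewJoinKeys.get(view)));
      groups.computeIfAbsent(groupName, k -> new ArrayList<>()).add(ref);
    }
    return groups;
  }

  public static void main(String[] args) {
    Map<String, List<String>> viewJoinKeys =
        Map.of(
            "driver_stats", List.of("driver_id"),
            "customer_stats", List.of("customer_id"));
    List<String> refs =
        List.of("driver_stats:trips_today", "customer_stats:orders", "driver_stats:rating");

    // One retrieval call would be made per entry, passing only that group's join keys.
    groupByJoinKeys(refs, viewJoinKeys)
        .forEach((joinKeys, group) -> System.out.println(joinKeys + " -> " + group));
  }
}
```

In this sketch each map entry corresponds to one retriever call, so a request that mixes `driver_id` and `customer_id` features results in two lookups, each with its own key, instead of one lookup keyed on both.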

Which issue(s) this PR fixes:

Fixes #

yongheng · 2022-06-22T17:25:02Z

/assign pyalex

yongheng · 2022-06-22T17:25:18Z

/assign adchia

yongheng · 2022-06-22T19:14:34Z

/cc @pyalex @adchia

achals · 2022-06-22T20:04:13Z

/ok-to-test

codecov-commenter · 2022-06-22T20:20:26Z

Codecov Report

Merging #2835 (f913090) into master (7d344b7) will decrease coverage by 0.00%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #2835      +/-   ##
==========================================
- Coverage   59.63%   59.62%   -0.01%     
==========================================
  Files         174      174              
  Lines       15493    15493              
==========================================
- Hits         9239     9238       -1     
- Misses       6254     6255       +1

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `59.62% <ø> (-0.01%)` | ⬇️ |


| Impacted Files | Coverage Δ | |
|---|---|---|
| sdk/python/tests/conftest.py | `66.18% <0.00%> (-0.72%)` | ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7d344b7...f913090. Read the comment docs.

pyalex · 2022-06-22T22:48:36Z

Hi @yongheng , can you please add integration test for this flow?

pyalex · 2022-06-22T22:52:19Z

java/serving/src/main/java/feast/serving/service/OnlineServingServiceV2.java

+
+    // Group feature references by join keys.
+    Map<String, List<FeatureReferenceV2>> groupNameToFeatureReferencesMap =
+        featureReferences.stream()


To speed up this part we might want to extract distinct feature views from all feature references. And then group feature views instead.

IIUC grouping by join keys results in the same number of groups or fewer (and is therefore at least as efficient) compared with grouping by feature view. This is because different feature views can have the same join keys. In L286, this.registryRepository.getEntitiesList(featureReference) internally gets the feature view spec first, then gets the entity names of the feature view spec, and then we find the join keys for those entity names.

Actually, I grouped by feature view at the beginning. Then I switched to grouping by join keys in the second commit of this PR, as an optimization.
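
A small sketch of why grouping by join keys never yields more groups than grouping by feature view. The view names and the `joinKeysByView` map are hypothetical, and references are simplified to `view:feature` strings. None of this is the actual Feast code.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Illustrative only: compares group counts under two grouping criteria. */
public class GroupCountComparison {

  /** Number of groups produced when references are bucketed by the given classifier. */
  public static int countGroups(
      List<String> featureReferences, Function<String, String> classifier) {
    return featureReferences.stream().collect(Collectors.groupingBy(classifier)).size();
  }

  /** Extracts the feature view name from a "view:feature" reference. */
  public static String viewOf(String ref) {
    return ref.substring(0, ref.indexOf(':'));
  }

  public static void main(String[] args) {
    // Hypothetical registry contents: two feature views share the join key driver_id.
    Map<String, String> joinKeysByView =
        Map.of(
            "driver_hourly", "driver_id",
            "driver_daily", "driver_id",
            "customer_profile", "customer_id");
    List<String> refs =
        List.of("driver_hourly:trips", "driver_daily:rating", "customer_profile:age");

    int byView = countGroups(refs, GroupCountComparison::viewOf);
    int byJoinKeys = countGroups(refs, ref -> joinKeysByView.get(viewOf(ref)));

    System.out.println("groups by feature view: " + byView); // 3
    System.out.println("groups by join keys: " + byJoinKeys); // 2
  }
}
```

Because each feature view maps to exactly one set of join keys, every join-key group is a union of feature-view groups, so the count can only stay the same or shrink.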

yongheng · 2022-06-23T00:21:43Z

> Hi @yongheng , can you please add integration test for this flow?

@pyalex Can you show me where are the current integration tests? Then I can add to there.

kevjumba · 2022-06-23T17:59:27Z

> > Hi @yongheng , can you please add integration test for this flow?
>
> @pyalex Can you show me where are the current integration tests? Then I can add to there.

Hey @yongheng the integration tests should be any tests that have a tag for @pytest.mark.integration. For this particular test, just take a look at test_feature_views.py

achals · 2022-06-23T18:32:14Z

@yongheng You can find the java integration tests here: https://github.com/feast-dev/feast/tree/master/java/serving/src/test/java/feast/serving/it

Signed-off-by: Yongheng Lin <yongheng.lin@gmail.com>

yongheng · 2022-06-24T22:31:04Z

@achals @pyalex Integration test has been added and CI is green. Please take a look.

yongheng · 2022-06-28T04:53:56Z

/cc achals

yongheng · 2022-06-28T17:14:48Z

@achals @pyalex @kevjumba I've added an integration test. Please take a look. Thanks!

achals

/lgtm

feast-ci-bot · 2022-06-30T19:40:54Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: achals, yongheng

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [achals]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

…in keys (feast-dev#2835)

* feat: Support retrieving from multiple feature views
  Signed-off-by: Yongheng Lin <yongheng.lin@gmail.com>
* group by join keys instead of feature view
  Signed-off-by: Yongheng Lin <yongheng.lin@gmail.com>
* tolerate insufficient entities
  Signed-off-by: Yongheng Lin <yongheng.lin@gmail.com>
* mock registry.getEntityJoinKey
  Signed-off-by: Yongheng Lin <yongheng.lin@gmail.com>
* add integration test
  Signed-off-by: Yongheng Lin <yongheng.lin@gmail.com>

# [0.23.0](v0.22.0...v0.23.0) (2022-08-02)

### Bug Fixes

* Add dummy alias to pull_all_from_table_or_query ([#2956](#2956)) ([5e45228](5e45228))
* Bump version of Guava to mitigate cve ([#2896](#2896)) ([51df8be](51df8be))
* Change numpy version on setup.py and upgrade it to resolve dependabot warning ([#2887](#2887)) ([80ea7a9](80ea7a9))
* Change the feature store plan method to public modifier ([#2904](#2904)) ([0ec7d1a](0ec7d1a))
* Deprecate 3.7 wheels and fix verification workflow ([#2934](#2934)) ([040c910](040c910))
* Do not allow same column to be reused in data sources ([#2965](#2965)) ([661c053](661c053))
* Fix build wheels workflow to install apache-arrow correctly ([#2932](#2932)) ([bdeb4ae](bdeb4ae))
* Fix file offline store logic for feature views without ttl ([#2971](#2971)) ([26f6b69](26f6b69))
* Fix grpc and update protobuf ([#2894](#2894)) ([86e9efd](86e9efd))
* Fix night ci syntax error and update readme ([#2935](#2935)) ([b917540](b917540))
* Fix nightly ci again ([#2939](#2939)) ([1603c9e](1603c9e))
* Fix the go build and use CgoArrowAllocator to prevent incorrect garbage collection ([#2919](#2919)) ([130746e](130746e))
* Fix typo in CONTRIBUTING.md ([#2955](#2955)) ([8534f69](8534f69))
* Fixing broken links to feast documentation on java readme and contribution ([#2892](#2892)) ([d044588](d044588))
* Fixing Spark min / max entity df event timestamps range return order ([#2735](#2735)) ([ac55ce2](ac55ce2))
* Move gcp back to 1.47.0 since grpcio-tools 1.48.0 got yanked from pypi ([#2990](#2990)) ([fc447eb](fc447eb))
* Refactor testing and sort out unit and integration tests ([#2975](#2975)) ([2680f7b](2680f7b))
* Remove hard-coded integration test setup for AWS & GCP ([#2970](#2970)) ([e4507ac](e4507ac))
* Resolve small typo in README file ([#2930](#2930)) ([16ae902](16ae902))
* Revert "feat: Add snowflake online store ([#2902](#2902))" ([#2909](#2909)) ([38fd001](38fd001))
* Snowflake_online_read fix ([#2988](#2988)) ([651ce34](651ce34))
* Spark source support table with pattern "db.table" ([#2606](#2606)) ([3ce5139](3ce5139)), closes [#2605](#2605)
* Switch mysql log string to use regex ([#2976](#2976)) ([5edf4b0](5edf4b0))
* Update gopy to point to fork to resolve github annotation errors. ([#2940](#2940)) ([ba2dcf1](ba2dcf1))
* Version entity serialization mechanism and fix issue with int64 vals ([#2944](#2944)) ([d0d27a3](d0d27a3))

### Features

* Add an experimental lambda-based materialization engine ([#2923](#2923)) ([6f79069](6f79069))
* Add column reordering to `write_to_offline_store` ([#2876](#2876)) ([8abc2ef](8abc2ef))
* Add custom JSON table tab w/ formatting ([#2851](#2851)) ([0159f38](0159f38))
* Add CustomSourceOptions to SavedDatasetStorage ([#2958](#2958)) ([23c09c8](23c09c8))
* Add Go option to `feast serve` command ([#2966](#2966)) ([a36a695](a36a695))
* Add interfaces for batch materialization engine ([#2901](#2901)) ([38b28ca](38b28ca))
* Add pages for individual Features to the Feast UI ([#2850](#2850)) ([9b97fca](9b97fca))
* Add snowflake online store ([#2902](#2902)) ([f758f9e](f758f9e)), closes [#2903](#2903)
* Add Snowflake online store (again) ([#2922](#2922)) ([2ef71fc](2ef71fc)), closes [#2903](#2903)
* Add to_remote_storage method to RetrievalJob ([#2916](#2916)) ([109ee9c](109ee9c))
* Support retrieval from multiple feature views with different join keys ([#2835](#2835)) ([056cfa1](056cfa1))

feast-ci-bot added the size/L label Jun 22, 2022

yongheng force-pushed the retrieve-multiple-fvs branch 3 times, most recently from 01c5aff to f913090 Compare June 22, 2022 17:11

feast-ci-bot assigned pyalex Jun 22, 2022

feast-ci-bot assigned adchia Jun 22, 2022

feast-ci-bot requested review from adchia and pyalex June 22, 2022 19:14

feast-ci-bot added the ok-to-test label Jun 22, 2022

pyalex reviewed Jun 22, 2022

yongheng requested a review from pyalex June 23, 2022 02:47

yongheng force-pushed the retrieve-multiple-fvs branch from f913090 to 56d982c Compare June 24, 2022 22:14

yongheng added 5 commits June 24, 2022 15:14

feat: Support retrieving from multiple feature views

8faefa2

Signed-off-by: Yongheng Lin <yongheng.lin@gmail.com>

group by join keys instead of feature view

b8910ea

Signed-off-by: Yongheng Lin <yongheng.lin@gmail.com>

tolerate insufficient entities

6480640

Signed-off-by: Yongheng Lin <yongheng.lin@gmail.com>

mock registry.getEntityJoinKey

f5b12ef

Signed-off-by: Yongheng Lin <yongheng.lin@gmail.com>

add integration test

0b4fedc

Signed-off-by: Yongheng Lin <yongheng.lin@gmail.com>

yongheng force-pushed the retrieve-multiple-fvs branch from 56d982c to 0b4fedc Compare June 24, 2022 22:14

feast-ci-bot requested a review from achals June 28, 2022 04:53

achals approved these changes Jun 30, 2022

feast-ci-bot assigned achals Jun 30, 2022

feast-ci-bot added the lgtm label Jun 30, 2022

feast-ci-bot added the approved label Jun 30, 2022

feast-ci-bot merged commit 056cfa1 into feast-dev:master Jun 30, 2022
