-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Support retrieval from multiple feature views with different join keys #2835
feat: Support retrieval from multiple feature views with different join keys #2835
Conversation
01c5aff
to
f913090
Compare
/assign pyalex |
/assign adchia |
/ok-to-test |
Codecov Report
@@ Coverage Diff @@
## master #2835 +/- ##
==========================================
- Coverage 59.63% 59.62% -0.01%
==========================================
Files 174 174
Lines 15493 15493
==========================================
- Hits 9239 9238 -1
- Misses 6254 6255 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
Hi @yongheng , can you please add integration test for this flow? |
|
||
// Group feature references by join keys. | ||
Map<String, List<FeatureReferenceV2>> groupNameToFeatureReferencesMap = | ||
featureReferences.stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To speed up this part we might want to extract distinct feature views from all feature references. And then group feature views instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC grouping by join keys results in the same or less groups (therefore same or more efficient) than grouping by feature view. The is because different feature views can have the same join keys. In L286, this.registryRepository.getEntitiesList(featureReference)
internally gets feature view spec first, then gets entity names of the feature view spec, then we find join keys for the entity names.
Actually, I grouped by feature view at the beginning. Then I switched to grouping by join keys in the second commit of this PR, as an optimization.
Hey @yongheng the integration tests should be any tests that have a tag for @pytest.mark.integration. For this particular test, just take a look at test_feature_views.py |
@yongheng You can find the java integration tests here: https://github.com/feast-dev/feast/tree/master/java/serving/src/test/java/feast/serving/it |
f913090
to
56d982c
Compare
Signed-off-by: Yongheng Lin <yongheng.lin@gmail.com>
Signed-off-by: Yongheng Lin <yongheng.lin@gmail.com>
Signed-off-by: Yongheng Lin <yongheng.lin@gmail.com>
Signed-off-by: Yongheng Lin <yongheng.lin@gmail.com>
Signed-off-by: Yongheng Lin <yongheng.lin@gmail.com>
56d982c
to
0b4fedc
Compare
/cc achals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: achals, yongheng The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
…in keys (feast-dev#2835) * feat: Support retrieving from multiple feature views Signed-off-by: Yongheng Lin <yongheng.lin@gmail.com> * group by join keys instead of feature view Signed-off-by: Yongheng Lin <yongheng.lin@gmail.com> * tolerate insufficient entities Signed-off-by: Yongheng Lin <yongheng.lin@gmail.com> * mock registry.getEntityJoinKey Signed-off-by: Yongheng Lin <yongheng.lin@gmail.com> * add integration test Signed-off-by: Yongheng Lin <yongheng.lin@gmail.com>
# [0.23.0](v0.22.0...v0.23.0) (2022-08-02) ### Bug Fixes * Add dummy alias to pull_all_from_table_or_query ([#2956](#2956)) ([5e45228](5e45228)) * Bump version of Guava to mitigate cve ([#2896](#2896)) ([51df8be](51df8be)) * Change numpy version on setup.py and upgrade it to resolve dependabot warning ([#2887](#2887)) ([80ea7a9](80ea7a9)) * Change the feature store plan method to public modifier ([#2904](#2904)) ([0ec7d1a](0ec7d1a)) * Deprecate 3.7 wheels and fix verification workflow ([#2934](#2934)) ([040c910](040c910)) * Do not allow same column to be reused in data sources ([#2965](#2965)) ([661c053](661c053)) * Fix build wheels workflow to install apache-arrow correctly ([#2932](#2932)) ([bdeb4ae](bdeb4ae)) * Fix file offline store logic for feature views without ttl ([#2971](#2971)) ([26f6b69](26f6b69)) * Fix grpc and update protobuf ([#2894](#2894)) ([86e9efd](86e9efd)) * Fix night ci syntax error and update readme ([#2935](#2935)) ([b917540](b917540)) * Fix nightly ci again ([#2939](#2939)) ([1603c9e](1603c9e)) * Fix the go build and use CgoArrowAllocator to prevent incorrect garbage collection ([#2919](#2919)) ([130746e](130746e)) * Fix typo in CONTRIBUTING.md ([#2955](#2955)) ([8534f69](8534f69)) * Fixing broken links to feast documentation on java readme and contribution ([#2892](#2892)) ([d044588](d044588)) * Fixing Spark min / max entity df event timestamps range return order ([#2735](#2735)) ([ac55ce2](ac55ce2)) * Move gcp back to 1.47.0 since grpcio-tools 1.48.0 got yanked from pypi ([#2990](#2990)) ([fc447eb](fc447eb)) * Refactor testing and sort out unit and integration tests ([#2975](#2975)) ([2680f7b](2680f7b)) * Remove hard-coded integration test setup for AWS & GCP ([#2970](#2970)) ([e4507ac](e4507ac)) * Resolve small typo in README file ([#2930](#2930)) ([16ae902](16ae902)) * Revert "feat: Add snowflake online store ([#2902](#2902))" ([#2909](#2909)) ([38fd001](38fd001)) * Snowflake_online_read fix ([#2988](#2988)) ([651ce34](651ce34)) * Spark source support table with pattern "db.table" ([#2606](#2606)) ([3ce5139](3ce5139)), closes [#2605](#2605) * Switch mysql log string to use regex ([#2976](#2976)) ([5edf4b0](5edf4b0)) * Update gopy to point to fork to resolve github annotation errors. ([#2940](#2940)) ([ba2dcf1](ba2dcf1)) * Version entity serialization mechanism and fix issue with int64 vals ([#2944](#2944)) ([d0d27a3](d0d27a3)) ### Features * Add an experimental lambda-based materialization engine ([#2923](#2923)) ([6f79069](6f79069)) * Add column reordering to `write_to_offline_store` ([#2876](#2876)) ([8abc2ef](8abc2ef)) * Add custom JSON table tab w/ formatting ([#2851](#2851)) ([0159f38](0159f38)) * Add CustomSourceOptions to SavedDatasetStorage ([#2958](#2958)) ([23c09c8](23c09c8)) * Add Go option to `feast serve` command ([#2966](#2966)) ([a36a695](a36a695)) * Add interfaces for batch materialization engine ([#2901](#2901)) ([38b28ca](38b28ca)) * Add pages for individual Features to the Feast UI ([#2850](#2850)) ([9b97fca](9b97fca)) * Add snowflake online store ([#2902](#2902)) ([f758f9e](f758f9e)), closes [#2903](#2903) * Add Snowflake online store (again) ([#2922](#2922)) ([2ef71fc](2ef71fc)), closes [#2903](#2903) * Add to_remote_storage method to RetrievalJob ([#2916](#2916)) ([109ee9c](109ee9c)) * Support retrieval from multiple feature views with different join keys ([#2835](#2835)) ([056cfa1](056cfa1))
What this PR does / why we need it:
Currently Java Feature Server doesn't support retrieval from multiple feature views with different join keys. For each gPRC request,
OnlineServingServiceV2
callsOnlineRetriever
once and only once. In this call the former sends all join keys in the original request to the latter, and the latter simply sorts and concatenates all join keys to make a Redis key.This PR supports retrieval from multiple feature views with different join keys. For each gPRC request, it groups feature references by join keys and for each group it makes a call to
OnlineRetriever
.Which issue(s) this PR fixes:
Fixes #