Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue with feature views being detected as duplicated when imported #1905

Merged

Conversation

achals
Copy link
Member

@achals achals commented Sep 23, 2021

Signed-off-by: Achal Shah achals@gmail.com

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #1895

Does this PR introduce a user-facing change?:

none

@codecov-commenter
Copy link

codecov-commenter commented Sep 23, 2021

Codecov Report

Merging #1905 (c25ac21) into master (75f67ec) will increase coverage by 20.69%.
The diff coverage is 64.44%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #1905       +/-   ##
===========================================
+ Coverage   62.30%   82.99%   +20.69%     
===========================================
  Files          96       96               
  Lines        7353     7357        +4     
===========================================
+ Hits         4581     6106     +1525     
+ Misses       2772     1251     -1521     
Flag Coverage Δ
integrationtests 74.78% <62.22%> (?)
unittests 60.66% <62.22%> (-1.64%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
sdk/python/feast/repo_operations.py 42.46% <0.00%> (-1.76%) ⬇️
sdk/python/feast/feature_table.py 45.27% <80.00%> (+1.21%) ⬆️
sdk/python/feast/entity.py 74.80% <100.00%> (+0.81%) ⬆️
sdk/python/feast/feature_service.py 76.62% <100.00%> (+3.38%) ⬆️
sdk/python/feast/feature_store.py 93.80% <100.00%> (+16.00%) ⬆️
sdk/python/feast/feature_view.py 86.92% <100.00%> (+1.69%) ⬆️
sdk/python/feast/importer.py 73.68% <100.00%> (+9.39%) ⬆️
sdk/python/feast/on_demand_feature_view.py 89.65% <100.00%> (+56.61%) ⬆️
sdk/python/tests/unit/test_entity.py 100.00% <100.00%> (ø)
sdk/python/feast/online_response.py 86.79% <0.00%> (-5.71%) ⬇️
... and 59 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 75f67ec...c25ac21. Read the comment docs.

Copy link
Collaborator

@MattDelac MattDelac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice !
I tried couple of the inspect functions like getsourcefile() in the other PR but could not make it work.
It makes sense to track where the objects are created 👍

@@ -107,15 +107,20 @@ def parse_repo(repo_root: Path) -> ParsedRepo:
for attr_name in dir(module):
obj = getattr(module, attr_name)
if isinstance(obj, FeatureTable):
res.feature_tables.append(obj)
if obj.defined_in is not None and obj.defined_in == module.__file__:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can module.file be None ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should maybe also raise exceptions in case defined_in is None

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. And no, module.file cannot be None

stack = inspect.stack()
# Get two levels up from current, to ignore usage.py
previous_stack_frame = stack[2]
self.defined_in = previous_stack_frame.filename
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we abstract that in an helper function and also test it in a test file ?
Don't think we need to test each class but at least the helper function

What do you think ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A helper class probably makes sense since there's some state tracking being done. wdyt @achals ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A helper class

yeah even better 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@woop
Copy link
Member

woop commented Sep 26, 2021

Just to play devil's advocate here. Tracking the file names does add complexity to our objects. Is there any reason we cant import all the objects and use built in functionality __hash__() and __eq__ to ensure that they are the same if their names are the same, then we can dedupe/remove the additional ones?

@achals
Copy link
Member Author

achals commented Sep 27, 2021

Just to play devil's advocate here. Tracking the file names does add complexity to our objects. Is there any reason we cant import all the objects and use built in functionality __hash__() and __eq__ to ensure that they are the same if their names are the same, then we can dedupe/remove the additional ones?

IMO updating __hash__ and __eq__ is more fragile since we may add/remove fields without updating those methods, and end up in problematic situations. I think the source file is a more robust solution and less likely to break with changes.

Signed-off-by: Achal Shah <achals@gmail.com>
stack = inspect.stack()
# Get two levels up from current, to ignore usage.py
previous_stack_frame = stack[2]
self.defined_in = previous_stack_frame.filename
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like missing one get_calling_file_name() here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah missed this

@@ -188,6 +188,7 @@ def online_read(

for entity_key in entity_keys:
redis_key_bin = _redis_key(project, entity_key)
print(redis_key_bin)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be removed ?

Signed-off-by: Achal Shah <achals@gmail.com>
Signed-off-by: Achal Shah <achals@gmail.com>
Copy link
Collaborator

@adchia adchia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: achals, adchia

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Achal Shah <achals@gmail.com>
@feast-ci-bot
Copy link
Collaborator

New changes are detected. LGTM label has been removed.

@feast-ci-bot feast-ci-bot removed the lgtm label Sep 28, 2021
@achals achals added the lgtm label Sep 28, 2021
@feast-ci-bot feast-ci-bot merged commit 1aedb8a into feast-dev:master Sep 28, 2021
@MattDelac
Copy link
Collaborator

MattDelac commented Sep 29, 2021

👋
I just tried it on our repo and this got shipped with a bug. The problem is how we save the information defined_in.

Look at how my feast apply behaves
image

module.__file__ is correct but obj.defined_in is not.
The entity "customer" is, indeed, defined in pano/entities/customer.py

@woop
Copy link
Member

woop commented Sep 29, 2021

Whether or not there is a bug here, I am not 100% convinced by the approach taken here. There are many reasons to support object equality, deduplication just being one of them. Reaching into the filesystem seems like it will be a lot more prone to errors since its OS/env dependent.

@achals
Copy link
Member Author

achals commented Sep 29, 2021

Whether or not there is a bug here, I am not 100% convinced by the approach taken here. There are many reasons to support object equality, deduplication just being one of them. Reaching into the filesystem seems like it will be a lot more prone to errors since its OS/env dependent.

I'm coming around to this view point after this bug as well. It's pretty easy to get wrong. I'm going to undo this change and use equality based deduping.

achals added a commit to achals/feast that referenced this pull request Sep 29, 2021
Signed-off-by: Achal Shah <achals@gmail.com>
feast-ci-bot pushed a commit that referenced this pull request Sep 30, 2021
* Use set when parsing repos to prevent duplicates

Signed-off-by: Achal Shah <achals@gmail.com>

* Undo #1905 changes

Signed-off-by: Achal Shah <achals@gmail.com>

* use id() in addition to name to differentiate different objects with the same name

Signed-off-by: Achal Shah <achals@gmail.com>

* add a test

Signed-off-by: Achal Shah <achals@gmail.com>

* add one more test

Signed-off-by: Achal Shah <achals@gmail.com>

* remove debugging

Signed-off-by: Achal Shah <achals@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FeatureView is considered "duplicated" during feast apply if its imported in another file
6 participants