-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Filter include #1626
base: main
Are you sure you want to change the base?
Filter include #1626
Conversation
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thorwhalen, thanks for splitting the PRs and thinking of this ergonomic aspect of the API. Even then, I'm not too sure about this change. Seemingly innocuous, it has the potential to invalidate assumptions made by downstream projects. Also, the need to add filter_include
as an extra keyword on top of the already existing include
seems counter-intuitive - it seems like saying something and then saying, "What I mean to say is..."
@HammadB, wdyt?
|
||
if filter_include: | ||
keep_keys = ['ids'] + [key for key in include if key != 'ids'] | ||
return subdict(get_results, keep_keys) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thorwhalen, do you find this might break existing downstream projects that may depend on attribute None
checks?
Also it feels awkward that we return something else than GetResult
, in this case, a subset dictionary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand: "(...) break existing downstream projects that may depend on attribute None checks?".
But about the GetResult
TypeDict, know (in case you didn't) that since 3.11, one can make some keys optional, using the NotRequired type.
@@ -364,6 +388,10 @@ def query( | |||
if "uris" not in include: | |||
query_results["uris"] = None | |||
|
|||
if filter_include: | |||
keep_keys = ['ids'] + [key for key in include if key != 'ids'] | |||
return subdict(query_results, keep_keys) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue as above:
- Downstream impacts
- Non-typed returns
Quite frankly, I think the solution, proposed earlier in the issue, that we only return keys that are listed in The PR I'm proposing here is just to see what a non-breaking solution would be. Still, if it were my choice, I'd go for changing the semantics of Downstream projectI'm not sure I understand the downstream concern specifically, but understand all too well the concern generally. It's always a hard choice, and the only thing we can be certain of is any change will invalidate someone's assumptions. Yet the solution of changing the semantics of Non-typed returnsCan still be typed, but using NotRequired with the Awkward
|
@thorwhalen, let me give it a thought. |
Description of changes
Resolves Feature Request: A filter_include argument for Collection.get and Collection.query #1622, which itself is related to #300.
Include a
filter_include
argument inCollection.get
andCollection.query
, which (ifTrue
) will have the effect of filtering in only those fields that were requested (plus "ids").I propose to set the default of
filter_include
to beFalse
for now, controlled via aDFLT_FILTER_INCLUDE
variable.This is so that the current behavior doesn't change, so we have minimal disruption.
Test plan
Included tests in chromadb/test/property/test_collections.py.
Documentation Changes
Documentation changes are needed to mention the additional
filter_include
argument, but will be done if and after this PR is accepted.One reason is: The
Collection.get
andCollection.query
documentation are already behind on the arguments, so will add those as well at the same time.