Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve typing of Dataset.search, matching definition #6816

Closed

Conversation

Dref360
Copy link
Contributor

@Dref360 Dref360 commented Apr 16, 2024

Previously, the output of score, indices = Dataset.search(...) would be numpy arrays.

The definition in SearchResult is a List[int] so this PR now matched the expected type.

The previous behavior is a bit annoying as Dataset.__getitem__ doesn't support numpy.int64 which forced me to convert indices to int eg:

score, indices = ds.search(...)
item = ds[int(indices[0])]

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@mariosasko
Copy link
Collaborator

Hi! This is a breaking change. A better solution is to check for "indexable" types in __getitem__ to support keys such as np.int64:

import operator

def _query_table_with_indices_mapping(...):  # or _query_table
    ...
    try:
        operator.index(key)
    except TypeError:
        pass
 
    _raise_bad_key_type(key)

@Dref360
Copy link
Contributor Author

Dref360 commented Apr 16, 2024

Sounds good! We should still update type annotations for SearchResult in my opinion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants