Add new queryByVectorId and queryByVector functions to IndexInterface #106

Merged
merged 2 commits into from
Apr 10, 2024

Conversation

austin-denoble
Contributor

Problem

Because of the lack of optional arguments, we have a lot of overloaded functions in IndexInterface which are meant to make working with the data plane easier for users.

Currently, we're missing some variations of queryByVectorId, which need to be added. We're also missing any queryByVector functionality, which is a commonly used operation.

Solution

  • Add new function overloads to IndexInterface to support queryByVectorId minimal requests with namespace and/or includeValues and includeMetadata.
  • Duplicate our queryByVectorId functions into new queryByVector functions.

All of these functions go through the existing validation logic because they call the base query function, so they should be safe to add. Unfortunately, there aren't any unit tests for Index or AsyncIndex yet; those will need to be added in a follow-up.
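For anyone skimming, the delegation pattern described above can be sketched roughly as follows. This is a simplified, hypothetical stand-in rather than the actual IndexInterface: the class name `OverloadSketch`, the string return value, and the two validation checks are assumptions made for illustration, and the filter argument plus the two extra arguments that the real base `query` takes are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the index class; not the real client API.
class OverloadSketch {
    // Every call that reaches the base function is recorded here.
    final List<String> calls = new ArrayList<>();

    // Base function: the only place where arguments are validated (assumed checks).
    String query(int topK, List<Float> vector, String id, String namespace,
                 boolean includeValues, boolean includeMetadata) {
        if (topK < 1) {
            throw new IllegalArgumentException("topK must be at least 1");
        }
        if ((vector == null) == (id == null)) {
            throw new IllegalArgumentException("pass exactly one of vector or id");
        }
        String call = String.format(
                "topK=%d vector=%s id=%s namespace=%s values=%b metadata=%b",
                topK, vector, id, namespace, includeValues, includeMetadata);
        calls.add(call);
        return call;
    }

    // Minimal overloads: fill in defaults and delegate to the base function.
    String queryByVectorId(int topK, String id) {
        return query(topK, null, id, null, false, false);
    }

    String queryByVectorId(int topK, String id, String namespace) {
        return query(topK, null, id, namespace, false, false);
    }

    String queryByVectorId(int topK, String id, boolean includeValues, boolean includeMetadata) {
        return query(topK, null, id, null, includeValues, includeMetadata);
    }

    // queryByVector mirrors queryByVectorId, passing a vector instead of an id.
    String queryByVector(int topK, List<Float> vector) {
        return query(topK, vector, null, null, false, false);
    }

    String queryByVector(int topK, List<Float> vector, String namespace) {
        return query(topK, vector, null, namespace, false, false);
    }
}
```

Because each overload only supplies defaults, any check in the base function applies to all of them, which is the reasoning behind treating the new overloads as low risk.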

This is also a lot of overloads. I'm sure there is a better pattern for handling this from a UX perspective.
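One possible direction, not something this PR implements or commits to, is a request object with a builder, so optional arguments become chained setters instead of separate overloads. Everything in the sketch below (`QueryRequestSketch`, `byVector`, `byId`, the setter names) is hypothetical and only illustrates the shape.

```java
import java.util.List;

// Hypothetical request object; none of these names exist in the client.
final class QueryRequestSketch {
    final int topK;
    final List<Float> vector;
    final String id;
    final String namespace;
    final boolean includeValues;
    final boolean includeMetadata;

    private QueryRequestSketch(Builder b) {
        this.topK = b.topK;
        this.vector = b.vector;
        this.id = b.id;
        this.namespace = b.namespace;
        this.includeValues = b.includeValues;
        this.includeMetadata = b.includeMetadata;
    }

    // Two entry points replace the queryByVector / queryByVectorId split.
    static Builder byVector(int topK, List<Float> vector) {
        return new Builder(topK, vector, null);
    }

    static Builder byId(int topK, String id) {
        return new Builder(topK, null, id);
    }

    static final class Builder {
        private final int topK;
        private final List<Float> vector;
        private final String id;
        private String namespace;          // optional, defaults to null
        private boolean includeValues;     // optional, defaults to false
        private boolean includeMetadata;   // optional, defaults to false

        private Builder(int topK, List<Float> vector, String id) {
            this.topK = topK;
            this.vector = vector;
            this.id = id;
        }

        Builder namespace(String namespace) {
            this.namespace = namespace;
            return this;
        }

        Builder includeValues(boolean includeValues) {
            this.includeValues = includeValues;
            return this;
        }

        Builder includeMetadata(boolean includeMetadata) {
            this.includeMetadata = includeMetadata;
            return this;
        }

        QueryRequestSketch build() {
            return new QueryRequestSketch(this);
        }
    }
}
```

A caller would then write something like `QueryRequestSketch.byId(5, "vec-1").namespace("ns").includeMetadata(true).build()` and pass the result to a single query method. The trade-off is a slightly more verbose call site in exchange for not having to add an overload for every combination of optional arguments.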

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update
  • Infrastructure change (CI configs, etc)
  • Non-code change (docs, etc)
  • None of the above: (explain here)

Test Plan

CI tests, or running an example app locally.

Contributor

@rohanshah18 rohanshah18 left a comment

Thanks for pushing this change, left a small comment, LGTM!

    @Override
    public ListenableFuture<QueryResponseWithUnsignedIndices> queryByVector(int topK, List<Float> vector,
                                                                            String namespace, Struct filter) {
        return query(topK, vector, null, null, null, namespace, filter, false, false);
Contributor

Not saying this is wrong or right, but OOC why do you allow users to have a filter when they query by vector, but not when they query by ID? I'd say they both probs would benefit from the inclusion of a filter, since users who use these types of queries (I'd imagine) are looking for similarity clusters, essentially. So getting all vectors similar to a given vector (or a given ID), filtered by xyz would be helpful

Contributor Author

I think we support what you're talking about on ln 100 in this file, am I misunderstanding?

    @Override
    public ListenableFuture<QueryResponseWithUnsignedIndices> queryByVectorId(int topK,
                                                                              String id,
                                                                              String namespace,
                                                                              Struct filter) {
        return query(topK, null, null, null, id, namespace, filter, false, false);
    }

Contributor

Oh, my bad! I must've just missed this while reviewing. Thanks a lot! Carry on!

Contributor

@aulorbe aulorbe left a comment

The code changes look good to me, but do we need any unit and integration tests for these? I think we do. I wouldn't merge w/o those.

@austin-denoble austin-denoble force-pushed the adenoble/add-data-plane-overloads branch from 1d4e0a5 to cbe9a7d Compare April 10, 2024 19:05
@austin-denoble
Contributor Author

The code changes look good to me, but do we need any unit and integration tests for these? I think we do. I wouldn't merge w/o those.

We should have them, but we're a bit strapped for time at the moment. I've added Asana tickets to track adding unit tests for our Index and AsyncIndex classes along with PineconeConnection. For now, since these overloads are all calling the base function I think we should be ok.

@austin-denoble austin-denoble merged commit 5ed7205 into main Apr 10, 2024
8 checks passed
@austin-denoble austin-denoble deleted the adenoble/add-data-plane-overloads branch April 10, 2024 19:36

3 participants