Find* command changes for vector search #459

Closed
maheshrajamani opened this issue Jul 10, 2023 · 4 comments · Fixed by #474

@maheshrajamani
Contributor

maheshrajamani commented Jul 10, 2023

Change the sort clause to handle the $vector field. This special field will accept an array of floats, which will be passed to CQL as “ORDER BY query_vector_value ANN OF []”. The default response document limit will be 20 (the jsonapi page size). If the limit option is provided, it will be used as the limit instead.
The filter clause will only support the $exists operator for the $vector field.

{
  "find" : {
    "filter" : {},
    "sort" : {"$vector" : [0.11,0.12,...]},
    "options" : {"limit" : 5}
  }
}

Translated CQL query:
SELECT * FROM <namespace>.<collection> WHERE <> ORDER BY query_vector_value ANN OF [0.11,0.12,...] LIMIT 5
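
The translation above can be sketched roughly as follows. This is only an illustration of the proposal, not the JSON API's actual implementation: `build_vector_find_cql` and `DEFAULT_LIMIT` are hypothetical names, translating the filter into a WHERE clause is left out, and real code would bind values rather than format them into the query string.

```python
from typing import Any

# Assumption: mirrors the default page size of 20 mentioned in the proposal.
DEFAULT_LIMIT = 20


def build_vector_find_cql(command: dict[str, Any], namespace: str, collection: str) -> str:
    """Hypothetical helper: turn a find command with a $vector sort into a CQL string."""
    find = command["find"]
    vector = find.get("sort", {})["$vector"]
    # Use the caller's limit when present, otherwise fall back to the default.
    limit = find.get("options", {}).get("limit", DEFAULT_LIMIT)
    # Render the query vector as a CQL list literal, e.g. [0.11,0.12].
    ann = "[" + ",".join(repr(float(x)) for x in vector) + "]"
    # The WHERE clause is omitted here; filter translation is out of scope for this sketch.
    return (
        f"SELECT * FROM {namespace}.{collection} "
        f"ORDER BY query_vector_value ANN OF {ann} LIMIT {limit}"
    )
```

Calling it with the command above (using a two-element vector) and an explicit limit of 5 yields a query ending in `LIMIT 5`; leaving out `options` falls back to 20.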

@vkarpov15
Collaborator

Re: discussion in Slack, I put together an overview of vector db APIs: Pinecone, Chroma, Qdrant, Zilliz. I'm not certain that reusing sort for vector search is the best API decision, because every other vector db API I've seen exposes a distinct function for vector search. Even MongoDB's is a distinct aggregation framework stage with its own API.

I'm worried that reusing sort is trying to fit a square peg in a round hole. If you're using sort for vector search, you need to 1) specify the vector in the sort clause, and 2) make sure you don't include any other fields in the sort clause, both of which are inconsistent with the normal sort API.

Furthermore, presumably JSON API would also have to disallow sorting by $vector in updateOne(), findOneAndUpdate(), etc.?

Also, a minor unrelated question: vector search supports limit, but does it support skip?

@maheshrajamani
Contributor Author

I'm not certain that reusing sort for vector search is the best API decision

We could create it as a separate search clause in the Mongoose API, and it could be translated in the stargate-mongoose library.

Furthermore, presumably JSON API would also have to disallow sorting by $vector in updateOne(), findOneAndUpdate(), etc.?

In the current design yes, it can be used in any command where sort() is supported.

does it support skip?

skip is not currently designed to be supported.

@vkarpov15
Collaborator

Sorting by $vector in updateOne(), findOneAndUpdate(), etc. could be neat. I'd like to think about that a little more.

Also, just to clarify, do we want to disallow sorting by multiple fields if sorting by $vector? Would "sort" : {"username": -1, "$vector" : [0.11,0.12,...]} be invalid?

@maheshrajamani
Contributor Author

Sorting by $vector in updateOne(), findOneAndUpdate(), etc. could be neat. I'd like to think about that a little more.

Let me know if this is needed. I think this will be a simple extension of what is created for the find and findOne commands. I will create a separate issue for this.

Also, just to clarify, do we want to disallow sorting by multiple fields if sorting by $vector? Would "sort" : {"username": -1, "$vector" : [0.11,0.12,...]} be invalid?

Yes, multi-field sort is not supported if $vector is used in sort.
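
As a rough illustration of that rule, a sort clause could be checked along these lines. `validate_sort` is a hypothetical helper and the error messages are made up; this is a sketch of the constraint discussed here, not code from the JSON API.

```python
from typing import Any


def validate_sort(sort: dict[str, Any]) -> None:
    """Hypothetical check: raise ValueError if a $vector sort is malformed."""
    if "$vector" not in sort:
        return  # ordinary sort clause, nothing vector-specific to check
    # $vector must be the only key in the sort clause.
    if len(sort) > 1:
        raise ValueError("$vector cannot be combined with other sort fields")
    vector = sort["$vector"]
    # The proposal describes $vector as an array of floats; bools are rejected explicitly
    # because Python treats them as ints.
    if not isinstance(vector, list) or not all(
        isinstance(x, (int, float)) and not isinstance(x, bool) for x in vector
    ):
        raise ValueError("$vector must be an array of numbers")
```

With this check, `{"$vector": [0.11, 0.12]}` passes, while `{"username": -1, "$vector": [0.11, 0.12]}` is rejected.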
