Making vector similarity functions pluggable #12219
Comments
What about doing this through vector formats? Vector formats could take a similarity parameter which, when set, would win over the one configured on the field type. The benefit is that it builds on the fact that vector formats are already pluggable, which makes things simpler in my view. The downside is that you can't plug in similarity functions independently from vector formats. This makes me wonder more generally if the similarity should have been an implementation detail of vector formats, like maxConn and beamWidth. Having it on the field type is a bit more user-friendly if we think it's important for users to be able to choose between cosine, dot-product and euclidean, but a downside of this choice is that it requires any legal KNN vectors format to support all 3 similarity functions.
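To make the override idea above concrete, here is a minimal sketch in plain Java. It does not use Lucene's classes: `Similarity`, `Similarities` and `SketchVectorFormat` are hypothetical names invented for illustration, and the euclidean scoring formula is just one reasonable choice. The only point is the resolution rule, where a similarity set on the format wins over the one from the field type.

```java
// Hypothetical stand-in for a similarity function; not Lucene's VectorSimilarityFunction.
interface Similarity {
    float compare(float[] a, float[] b);
}

// A couple of illustrative implementations (assumed, not taken from the issue).
final class Similarities {
    private Similarities() {}

    static final Similarity DOT_PRODUCT = (a, b) -> {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    };

    // Maps squared euclidean distance to a score where higher means closer.
    static final Similarity EUCLIDEAN = (a, b) -> {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return 1f / (1f + sum);
    };
}

// Hypothetical vector format carrying an optional similarity override.
final class SketchVectorFormat {
    private final Similarity override; // null means "no override configured"

    SketchVectorFormat(Similarity override) {
        this.override = override;
    }

    // The format-level similarity wins when set; otherwise the field-level one is used.
    Similarity resolve(Similarity fieldSimilarity) {
        return override != null ? override : fieldSimilarity;
    }
}
```

With this shape, a format constructed with `null` defers to whatever the field type configured, while a format constructed with `Similarities.EUCLIDEAN` ignores the field-level setting entirely, which is the user-facing oddity discussed further down the thread.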
@msokolov what do you think about this?
It makes sense to me. I think we got where we are because initially all these things were field-level and then some of them got migrated to the format. Now we're in a middle place straddling two different APIs. Adrien's suggestion to override the field-level setting seems a bit odd from a user perspective though -- I guess a per-field vector format can choose to ignore the similarity function, but then it feels like we ought to avoid setting it in the first place. Maybe we could create a FieldType similarity function "DEFAULT" that is whatever default is provided by the format?
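A rough sketch of what the "DEFAULT" idea could look like, again in plain Java rather than against Lucene's API. `FieldSimilarityChoice` and `FormatWithDefault` are hypothetical names; the assumption is that the field type carries a sentinel value and the format substitutes its own default for it.

```java
import java.util.Objects;

// Hypothetical field-level setting; DEFAULT means "let the format decide".
enum FieldSimilarityChoice {
    DEFAULT,
    EUCLIDEAN,
    DOT_PRODUCT,
    COSINE
}

// Hypothetical format that declares which similarity it uses by default.
final class FormatWithDefault {
    private final FieldSimilarityChoice formatDefault;

    FormatWithDefault(FieldSimilarityChoice formatDefault) {
        Objects.requireNonNull(formatDefault, "formatDefault");
        if (formatDefault == FieldSimilarityChoice.DEFAULT) {
            // The format must name a concrete function, otherwise DEFAULT would be circular.
            throw new IllegalArgumentException("format default must be a concrete similarity");
        }
        this.formatDefault = formatDefault;
    }

    // An explicit field-level choice is respected; DEFAULT falls back to the format.
    FieldSimilarityChoice effective(FieldSimilarityChoice requested) {
        return requested == FieldSimilarityChoice.DEFAULT ? formatDefault : requested;
    }
}
```

Compared with a silent override, this keeps the field-level setting meaningful: a user who never picks a similarity gets the format's default, and a user who does pick one is not overruled.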
OK, I think this design would look like this:
Sorry, I'm -1 to this. This is going with the approach that only those with the resources of Amazon or Elastic can have performant search. IMO it is OUR JOB AS A LIBRARY to implement these functions in a performant way, for everyone, not just those with the resources of big tech to plug in some custom shit because OpenJDK is a shitshow. I'm taking a stand. I propose an alternative approach here: #12302
@rmuir I was not suggesting it as a way to only get performance for "some big company". I just thought using an incubating API was out of the question in Lucene (you have implied as much in other vector API discussions) and I was hoping to find a way forward while we were stalled. You have mentioned before that it must at least be "preview". I am very happy to see you initiating the work for Vector API support in Lucene. I think it will make so many things faster! I will be glad to help where I can in using the Vector API in Lucene. I will happily pause any further work here for the time being :)
We can close this: we added the Panama Vector API to Lucene directly, which was my main concern with this issue.
Description
There are two major reasons for adding a custom vector similarity function:
I am not 100% sure Lucene itself should go through the work of consistently adding new similarity functions.
We should make these pluggable in such a way that developers using Lucene can provide specialized distance functions.
I think the main issue is that the Vector Similarity function is tied to the FieldType and currently that is not pluggable via any external configuration.
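As an illustration of what "pluggable via external configuration" might mean, here is a small self-contained sketch of a name-based registry. It is only one possible shape, and every name in it (`PluggableSimilarity`, `SimilarityRegistry`, `register`, `lookup`) is hypothetical rather than an existing Lucene API. The assumption is that a field would reference a similarity by name and the function would be resolved at runtime.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical user-supplied distance/similarity function.
interface PluggableSimilarity {
    float score(float[] a, float[] b);
}

// Hypothetical registry that resolves similarity functions by name.
final class SimilarityRegistry {
    private final Map<String, PluggableSimilarity> byName = new ConcurrentHashMap<>();

    // Registers a function under a unique name; duplicates are rejected.
    void register(String name, PluggableSimilarity fn) {
        if (byName.putIfAbsent(name, fn) != null) {
            throw new IllegalArgumentException("duplicate similarity: " + name);
        }
    }

    // Looks up a previously registered function, failing loudly on unknown names.
    PluggableSimilarity lookup(String name) {
        PluggableSimilarity fn = byName.get(name);
        if (fn == null) {
            throw new IllegalArgumentException("unknown similarity: " + name);
        }
        return fn;
    }
}
```

A developer could then register, say, a Manhattan-distance function under the name `"manhattan"` and refer to it by that name in field configuration, without the library having to ship that function itself.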
There are two ways I can think of for doing this:
Opening this issue for discussion.