Today I see two ways to provide distance calculations when using HNSW vectors in Lucene:
1. The existing VectorSimilarityFunction, which is encoded into the segment file itself.
2. A custom scorer supplied through a custom KnnVectorsFormat.
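The first option ties the similarity to the index: the choice is written with the field's metadata and resolved again when the segment is read. The following is a minimal sketch of that round trip. For illustration only, it assumes the choice is persisted as a single byte derived from the enum constant's position. `Similarity`, `SimilarityCodec`, `encode` and `decode` are hypothetical names, and the two constants and their formulas are a trimmed stand-in, not Lucene's actual VectorSimilarityFunction.

```java
// Hypothetical stand-in for an enum-based similarity; Lucene's real
// VectorSimilarityFunction has more constants and its own formulas.
enum Similarity {
  EUCLIDEAN {
    @Override
    float compare(float[] a, float[] b) {
      float sum = 0f;
      for (int i = 0; i < a.length; i++) {
        float d = a[i] - b[i];
        sum += d * d;
      }
      return 1f / (1f + sum); // larger means more similar
    }
  },
  DOT_PRODUCT {
    @Override
    float compare(float[] a, float[] b) {
      float dot = 0f;
      for (int i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
      }
      return (1f + dot) / 2f; // assumes unit-length vectors
    }
  };

  abstract float compare(float[] a, float[] b);
}

// Hypothetical codec helpers: only a small integer reaches the segment metadata.
final class SimilarityCodec {
  private SimilarityCodec() {}

  static byte encode(Similarity s) {
    return (byte) s.ordinal();
  }

  // On read, the byte can only be mapped back onto constants compiled into the enum.
  static Similarity decode(byte ord) {
    Similarity[] values = Similarity.values();
    if (ord < 0 || ord >= values.length) {
      throw new IllegalStateException("unknown similarity ordinal: " + ord);
    }
    return values[ord];
  }
}
```

Because `decode` can only return constants that exist in the enum, a user-defined function has no value it could be written as, and reordering or removing constants would change how existing segments are read.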
IMO this is not a great experience: to provide my own scorer I need to implement at least two new classes, yet most of the code in those classes would be boilerplate or duplicated code. In fact, the only novel code would be in RandomVectorScorer#score. I do see that we're somewhat stuck here, because the existing VectorSimilarityFunction class is implemented as an enum, so we can't extend it (or really make any changes to it).
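The point that only the scoring logic is novel can be illustrated with a hypothetical base class that owns the shared plumbing (holding the query and vectors, bounds-checking ordinals) and leaves a single method to subclasses. `SketchVectorScorer`, `ManhattanScorer` and the Manhattan formula are assumptions made up for this sketch; Lucene's actual RandomVectorScorer and the format classes around it are organized differently.

```java
import java.util.List;

// Hypothetical, simplified scorer base class (not Lucene's RandomVectorScorer API):
// it owns the query and the indexed vectors and scores one node ordinal at a time.
abstract class SketchVectorScorer {
  private final float[] query;
  private final List<float[]> vectors;

  SketchVectorScorer(float[] query, List<float[]> vectors) {
    this.query = query;
    this.vectors = vectors;
  }

  // Shared plumbing that would otherwise be repeated in every custom implementation.
  final int maxOrd() {
    return vectors.size();
  }

  final float score(int node) {
    if (node < 0 || node >= maxOrd()) {
      throw new IndexOutOfBoundsException("node " + node + " outside [0, " + maxOrd() + ")");
    }
    return similarity(query, vectors.get(node));
  }

  // The only genuinely new code a custom distance needs.
  protected abstract float similarity(float[] query, float[] doc);
}

// Example custom distance: Manhattan (L1) distance mapped into (0, 1], larger is better.
final class ManhattanScorer extends SketchVectorScorer {
  ManhattanScorer(float[] query, List<float[]> vectors) {
    super(query, vectors);
  }

  @Override
  protected float similarity(float[] query, float[] doc) {
    float sum = 0f;
    for (int i = 0; i < query.length; i++) {
      sum += Math.abs(query[i] - doc[i]);
    }
    return 1f / (1f + sum);
  }
}
```

With this split a new distance is one small subclass; the sketch does not address how such a scorer would be wired into a KnnVectorsFormat or persisted with the segment.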
I see that adding bit/binary vector support (#13505) is also currently blocked on resolving this, so I wanted to ask:
Broadly speaking, what is the vision for letting users customize the distance calculations? For example, is the current approach of implementing a custom format/scorer the longer-term strategy, or is the long-term plan something like replacing VectorSimilarityFunction with an extensible interface?
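The second option in the question, replacing the enum with an extensible interface, could look roughly like the following sketch. It assumes that a name is persisted per field instead of an ordinal and resolved through a registry, in the spirit of the name-based lookup Lucene already uses for codecs (Codec.forName). `VectorSimilarity`, `SimilarityRegistry` and `HammingSimilarity` are hypothetical; the Hamming example only shows how bit vectors (#13505) could plug in.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical extensible replacement for an enum: any class can implement it.
interface VectorSimilarity {
  // Stable identifier that would be written to the segment instead of an ordinal.
  String name();

  float compare(byte[] a, byte[] b);
}

// Hypothetical name-based registry; a real design might use ServiceLoader/SPI instead.
final class SimilarityRegistry {
  private static final Map<String, VectorSimilarity> BY_NAME = new ConcurrentHashMap<>();

  private SimilarityRegistry() {}

  static void register(VectorSimilarity similarity) {
    VectorSimilarity previous = BY_NAME.putIfAbsent(similarity.name(), similarity);
    if (previous != null && previous != similarity) {
      throw new IllegalArgumentException("similarity already registered: " + similarity.name());
    }
  }

  // Called when reading a segment: resolves the persisted name back to an implementation.
  static VectorSimilarity forName(String name) {
    VectorSimilarity similarity = BY_NAME.get(name);
    if (similarity == null) {
      throw new IllegalArgumentException("unknown similarity: " + name);
    }
    return similarity;
  }
}

// Example user-defined similarity for bit vectors packed eight bits per byte.
final class HammingSimilarity implements VectorSimilarity {
  @Override
  public String name() {
    return "hamming";
  }

  @Override
  public float compare(byte[] a, byte[] b) {
    int differingBits = 0;
    for (int i = 0; i < a.length; i++) {
      // Mask to 8 bits so sign extension of negative bytes does not add extra set bits.
      differingBits += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
    }
    int totalBits = a.length * 8;
    // 1.0 when all bits match, 0.0 when every bit differs.
    return (totalBits - differingBits) / (float) totalBits;
  }
}
```

Storing a name rather than an ordinal is what would let new functions be added without touching a shared enum; the trade-off is that every reader must have the implementation registered before it opens the segment.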
… HnswBitVectorsFormat class introduced in #13288 ("Add BitVectors format and make flat vectors format easier to extend") into the lucene101 package.