Add a field type for high-dimensional bit vectors. #48322
Labels
>enhancement
:Search Relevance/Vectors
Team:Search Relevance
The `dense_vector` type helps users work with vector 'embeddings' of unstructured data like text and images. This issue proposes to add a new 'bit vector' type and 'hamming distance' script function as part of supporting this use case.
Dense vector fields allow for storing float vectors. For images, it also seems common to use bit vectors:
There has also been recent work on converting traditional text embeddings to bit vectors, for example 'Learning Compressed Sentence Representations for On-Device Text Processing'.
Compared to using a `dense_vector` to represent the binary vectors, a dedicated 'bit vector' type would require less space and could support faster distance computations. Looking forward, it may also be possible to support retrieval based on bit vector distance through a specialized strategy (distinct from what we've considered for float vectors in #42326).
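To make the space and speed argument concrete, here is a minimal Python sketch of packed bit vectors and hamming distance. It is an illustration only, not the proposed Elasticsearch implementation: the names `pack_bits` and `hamming_distance` are hypothetical, and the size comparison assumes each float dimension would occupy 4 bytes.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into bytes, most significant bit first.

    Hypothetical helper for illustration; it requires the number of
    dimensions to be a multiple of 8.
    """
    if len(bits) % 8 != 0:
        raise ValueError("number of dimensions must be a multiple of 8")
    packed = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            if bit not in (0, 1):
                raise ValueError("bit vectors may only contain 0 and 1")
            byte = (byte << 1) | bit
        packed.append(byte)
    return bytes(packed)


def hamming_distance(a, b):
    """Count the bit positions at which two packed vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # XOR leaves a 1 exactly where the inputs differ; counting those 1s
    # (a population count) gives the hamming distance.
    diff = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return bin(diff).count("1")


a_bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0]
b_bits = [1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1]
a = pack_bits(a_bits)
b = pack_bits(b_bits)

distance = hamming_distance(a, b)      # the vectors differ in 3 positions
packed_size = len(a)                   # 16 dimensions fit in 2 bytes
float_size = 4 * len(a_bits)           # 64 bytes, assuming 4 bytes per float
```

Under these assumptions the packed form is 32 times smaller than a float representation of the same vector, and the distance reduces to an XOR followed by a bit count rather than per-dimension floating-point arithmetic.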