Commit
Adds new `bit` element_type for dense_vectors (#110059)
This commit adds `bit` vector support by adding `element_type: bit` for vectors. This new element type works for indexed and non-indexed vectors. Additionally, it works with `hnsw` and `flat` index types. No quantization-based codec works with this element type, which is consistent with `byte` vectors.

`bit` vectors accept up to `32768` dimensions and expect vectors that are being indexed to be encoded either as a hexadecimal string or as a `byte[]` array where each element of the `byte` array represents `8` bits of the vector.

`bit` vectors support script usage and regular query usage. When indexed, all comparisons are `xor` and `popcount` summations (that is, Hamming distance), and the scores are transformed and normalized given the vector dimensions. Note that indexed bit vectors require `l2_norm` to be the similarity.

For scripts, `l1norm` is the same as `hamming` distance and `l2norm` is `sqrt(l1norm)`. `dotProduct` and `cosineSimilarity` are not supported.

Note that the dimensions expected by this element_type must always be divisible by `8`, and the `byte[]` vectors provided for indexing must have size `dim/8`, where each byte element represents `8` bits of the vector.

closes: #48322
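The encoding rules above can be illustrated with a small sketch. This is not code from the commit: `pack_bits` and `to_signed_bytes` are hypothetical helper names, and the most-significant-bit-first ordering inside each byte is an assumption made for illustration (the commit message does not specify a bit order). The sketch shows a 16-dimension bit vector packed into `dim/8 = 2` bytes, then rendered as a hexadecimal string and as signed byte values in the range a Java `byte[]` would hold.

```python
def pack_bits(bits):
    """Pack a list of 0/1 values into bytes (hypothetical helper).

    Assumption: the first bit of each group of 8 is the most significant bit.
    """
    if len(bits) % 8 != 0:
        # Mirrors the rule that dimensions must be divisible by 8.
        raise ValueError("bit vector dimensions must be divisible by 8")
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | (b & 1)
        out.append(byte)
    return bytes(out)


def to_signed_bytes(packed):
    """Convert unsigned byte values (0..255) to the signed range -128..127."""
    return [b - 256 if b > 127 else b for b in packed]


bits = [1, 0, 0, 0, 0, 0, 0, 1,
        1, 1, 1, 1, 1, 1, 1, 1]        # 16 dimensions
packed = pack_bits(bits)               # 16 / 8 = 2 bytes
hex_form = packed.hex()                # hexadecimal string form
signed = to_signed_bytes(packed)       # signed byte array form

print(hex_form)   # 81ff
print(signed)     # [-127, -1]
```

Either form carries the same 16 bits; the byte array always has `dim/8` elements, and the hexadecimal string has two characters per byte.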
Showing 38 changed files with 2,711 additions and 185 deletions.
@@ -0,0 +1,32 @@
pr: 110059
summary: Adds new `bit` `element_type` for `dense_vectors`
area: Vector Search
type: feature
issues: []
highlight:
  title: Adds new `bit` `element_type` for `dense_vectors`
  body: |-
    This adds `bit` vector support by adding `element_type: bit` for
    vectors. This new element type works for indexed and non-indexed
    vectors. Additionally, it works with `hnsw` and `flat` index types. No
    quantization based codec works with this element type, this is
    consistent with `byte` vectors.

    `bit` vectors accept up to `32768` dimensions in size and expect vectors
    that are being indexed to be encoded either as a hexadecimal string or a
    `byte[]` array where each element of the `byte` array represents `8`
    bits of the vector.

    `bit` vectors support script usage and regular query usage. When
    indexed, all comparisons done are `xor` and `popcount` summations (aka,
    hamming distance), and the scores are transformed and normalized given
    the vector dimensions.

    For scripts, `l1norm` is the same as `hamming` distance and `l2norm` is
    `sqrt(l1norm)`. `dotProduct` and `cosineSimilarity` are not supported.

    Note, the dimensions expected by this element_type are always to be
    divisible by `8`, and the `byte[]` vectors provided for index must
    have size `dim/8`, where each byte element represents `8` bits of
    the vectors.
  notable: true
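The distance relationships described in the highlight body can be sketched as follows. This is an illustrative model only, not the Elasticsearch implementation: the function names `hamming`, `l1norm`, and `l2norm` here are plain Python stand-ins for the script functions named above, operating on packed bytes. Because every component of a bit vector is 0 or 1, the absolute difference of two components equals its square, so the sum of squared differences equals the Hamming distance and `l2norm` reduces to `sqrt(l1norm)`.

```python
import math


def hamming(a: bytes, b: bytes) -> int:
    """Count differing bits: xor each byte pair, then popcount the result."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of bytes")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def l1norm(a: bytes, b: bytes) -> int:
    """For bit vectors, the L1 distance is the Hamming distance."""
    return hamming(a, b)


def l2norm(a: bytes, b: bytes) -> float:
    """For bit vectors, the L2 distance is the square root of the L1 distance."""
    return math.sqrt(l1norm(a, b))


a = bytes.fromhex("81ff")   # 10000001 11111111
b = bytes.fromhex("80f0")   # 10000000 11110000

print(hamming(a, b))        # 5
print(l1norm(a, b))         # 5
print(l2norm(a, b) == math.sqrt(5))   # True
```

The two vectors differ in one bit of the first byte and four bits of the second, giving a Hamming distance of 5. How indexed search transforms this distance into a normalized score is not shown here, since the text above does not state the formula.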