Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support IVF index #276

Open
wxyucs opened this issue Dec 27, 2024 · 0 comments
Open

support IVF index #276

wxyucs opened this issue Dec 27, 2024 · 0 comments
Assignees
Labels
kind/feature New feature or request version/0.14

Comments

@wxyucs
Copy link
Collaborator

wxyucs commented Dec 27, 2024

IVF index is a partition type index, which consists of a set of inverted buckets. In the retrieval stage, an inverted bucket with a certain amount of data is selected, and then scanned in the bucket to obtain several candidate points that finally meet the nearest neighbors.

Compared with HNSW type algorithms, IVF often requires a certain amount of data for training bucketing. The usual bucketing method is to use K-means clustering to generate K centroids. For the bucket selection strategy during query, we support indexing these K centroids, such as graph indexing for routing.
The vectors in the bucket support multiple encoding methods. Due to the continuous arrangement of data, the access overhead is relatively low, but the total computational effort is higher than that of graph algorithms with similar configurations.

Below we will introduce the basic design framework of IVF index

  1. Construct BucketDatacell to manage the data in the bucket (excluding the centroid)

  2. Construct a data structure called Router to manage the centroid and the corresponding routing method. A simple implementation of Router is composed of centroids. Its classic construction method is k-means. Of course, it also supports importing from the outside. It can contain an Index entity

  3. IVF also supports a reordering mechanism

The relevant Pull Requests are as follows:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature New feature or request version/0.14
Projects
None yet
Development

No branches or pull requests

2 participants