support IVF index #276

wxyucs · 2024-12-27T08:20:02Z

An IVF index is a partition-based index consisting of a set of inverted buckets. At retrieval time, a number of buckets are selected, and the vectors inside them are scanned to collect candidate points, from which the final nearest neighbors are returned.
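
The retrieval step can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the project's implementation: vectors are plain lists compared by squared L2 distance, each bucket is a list of `(id, vector)` pairs, and the number of buckets to scan is a hypothetical `nprobe` parameter.

```python
import heapq


def l2_sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ivf_search(query, centroids, buckets, k, nprobe):
    """Minimal IVF search sketch (hypothetical helper).

    centroids: list of K centroid vectors.
    buckets:   list of K lists of (id, vector) pairs; buckets[i] holds the
               vectors assigned to centroids[i].
    nprobe:    how many buckets to scan (assumed parameter name).
    """
    # Step 1: select the nprobe buckets whose centroids are closest to the query.
    probe = heapq.nsmallest(
        nprobe, range(len(centroids)), key=lambda i: l2_sq(query, centroids[i])
    )
    # Step 2: scan every vector in the selected buckets and keep the k best.
    candidates = ((l2_sq(query, vec), vid) for b in probe for vid, vec in buckets[b])
    return [vid for _, vid in heapq.nsmallest(k, candidates)]


centroids = [[0.0, 0.0], [10.0, 10.0]]
buckets = [
    [(0, [0.0, 1.0]), (1, [1.0, 0.0])],
    [(2, [10.0, 11.0]), (3, [9.0, 9.0])],
]
print(ivf_search([9.5, 9.5], centroids, buckets, k=2, nprobe=1))  # [3, 2]
```

Only the probed buckets are scanned, so with `nprobe=1` the query can never return ids 0 or 1; raising `nprobe` trades more computation for higher recall.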

Unlike HNSW-style algorithms, IVF usually needs a certain amount of training data to learn how to partition vectors into buckets. The usual partitioning method is K-means clustering, which produces K centroids. To select buckets at query time, these K centroids can themselves be indexed, for example with a graph index used for routing.
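
A minimal sketch of the training step, assuming plain Lloyd's k-means with squared L2 distance. For determinism it seeds the centroids with the first `k` training points, which is a simplification (practical trainers usually use random or k-means++ seeding); `kmeans` and `assign` are hypothetical helper names. Routing here is brute force over the centroids; graph-index routing is not shown.

```python
def l2_sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(data, k, iters=20):
    """Lloyd's k-means; seeds with the first k points (a simplifying assumption)."""
    centroids = [list(p) for p in data[:k]]
    for _ in range(iters):
        # Assignment step: each point joins the group of its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in data:
            nearest = min(range(k), key=lambda i: l2_sq(p, centroids[i]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group;
        # an empty group keeps its previous centroid.
        new = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for c, g in zip(centroids, groups)
        ]
        if new == centroids:  # converged: assignments no longer change
            break
        centroids = new
    return centroids


def assign(data, centroids):
    """Build inverted buckets: bucket i lists the ids of points nearest centroid i."""
    buckets = [[] for _ in centroids]
    for vid, p in enumerate(data):
        nearest = min(range(len(centroids)), key=lambda i: l2_sq(p, centroids[i]))
        buckets[nearest].append(vid)
    return buckets


data = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [1.0, 0.0], [10.0, 11.0], [11.0, 10.0]]
centroids = kmeans(data, k=2)
print(assign(data, centroids))  # [[0, 2, 3], [1, 4, 5]]
```
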
Vectors inside a bucket can be stored with several encoding methods. Because each bucket's data is laid out contiguously, memory access overhead is relatively low, but the total amount of computation is higher than that of a graph algorithm with a comparable configuration.
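
As an illustration of one possible in-bucket encoding, the sketch below assumes 8-bit scalar quantization (SQ8) with per-dimension min/max bounds; the issue does not say which encodings are supported, and `SQ8Bucket` is a hypothetical class. All codes of a bucket sit in one flat `array('B')` in row-major order, mirroring the contiguous layout described above: a scan reads the bucket sequentially, but still computes a distance for every vector in it.

```python
from array import array


class SQ8Bucket:
    """Hypothetical bucket storing 8-bit scalar-quantized codes contiguously."""

    def __init__(self, dim, lo, hi):
        self.dim = dim
        self.lo = lo  # per-dimension minimum, assumed learned from training data
        self.hi = hi  # per-dimension maximum
        self.codes = array("B")  # one flat buffer: row r occupies [r*dim, (r+1)*dim)
        self.ids = []

    def _encode(self, vec):
        out = []
        for x, lo, hi in zip(vec, self.lo, self.hi):
            span = (hi - lo) or 1.0  # avoid division by zero on constant dimensions
            t = min(max((x - lo) / span, 0.0), 1.0)  # clamp into [0, 1]
            out.append(round(t * 255))
        return out

    def add(self, vid, vec):
        self.codes.extend(self._encode(vec))
        self.ids.append(vid)

    def decode(self, row):
        start = row * self.dim
        code = self.codes[start:start + self.dim]
        return [lo + c / 255 * (hi - lo) for c, lo, hi in zip(code, self.lo, self.hi)]

    def scan(self, query):
        """Approximate squared L2 distance to every vector: list of (dist, id)."""
        return [
            (sum((q - x) ** 2 for q, x in zip(query, self.decode(r))), self.ids[r])
            for r in range(len(self.ids))
        ]


bucket = SQ8Bucket(dim=2, lo=[0.0, 0.0], hi=[1.0, 1.0])
bucket.add(7, [0.5, 0.25])
bucket.add(8, [1.0, 0.0])
print(list(bucket.codes))  # [128, 64, 255, 0]
print(min(bucket.scan([1.0, 0.0])))  # (0.0, 8)
```

Each float is stored as one byte, and the decoded value differs from the original by at most half a quantization step (`0.5 / 255` of the range here).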

Below is the basic design framework of the IVF index:

- Build a BucketDatacell to manage the data stored in the buckets (excluding the centroids).
- Build a data structure called Router to manage the centroids and the corresponding routing method. The simplest Router consists only of the centroids, classically built with k-means, though the centroids can also be imported from outside. A Router can also contain an Index instance.
- IVF also supports a reordering mechanism.
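
The three components can be sketched together in Python. `BucketDatacell` and `Router` are the names used in this issue, but their methods here (`insert`, `scan`, `route`), the `index.search(query, n)` interface, the toy integer-rounding encoder, and the `reorder_factor` parameter are all hypothetical. The reorder step is modeled as a common re-ranking scheme: a coarse scan over encoded vectors keeps `k * reorder_factor` candidates, whose exact distances are then recomputed from the original vectors.

```python
import heapq


def l2_sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class Router:
    """Hypothetical Router: maps a query to the ids of the buckets to scan."""

    def __init__(self, centroids, index=None):
        self.centroids = centroids  # from k-means or imported from outside
        self.index = index  # optional index over the centroids (assumed interface)

    def route(self, query, nprobe):
        if self.index is not None:
            return self.index.search(query, nprobe)
        # Default: brute-force scan over all centroids.
        return heapq.nsmallest(
            nprobe, range(len(self.centroids)),
            key=lambda i: l2_sq(query, self.centroids[i]),
        )


class BucketDatacell:
    """Hypothetical BucketDatacell: per-bucket (id, code) pairs, no centroids."""

    def __init__(self, n_buckets, encode):
        self.buckets = [[] for _ in range(n_buckets)]
        self.encode = encode  # lossy encoding used for the fast scan

    def insert(self, bucket_id, vid, vec):
        self.buckets[bucket_id].append((vid, self.encode(vec)))

    def scan(self, bucket_id, query):
        return [(l2_sq(query, code), vid) for vid, code in self.buckets[bucket_id]]


def search(query, router, datacell, raw, k, nprobe, reorder_factor=2):
    """Coarse scan on encoded vectors, then reorder with exact distances."""
    coarse = []
    for b in router.route(query, nprobe):
        coarse.extend(datacell.scan(b, query))
    shortlist = heapq.nsmallest(k * reorder_factor, coarse)
    exact = [(l2_sq(query, raw[vid]), vid) for _, vid in shortlist]
    return [vid for _, vid in heapq.nsmallest(k, exact)]


raw = {0: [0.0, 0.0], 1: [0.6, 0.0], 2: [5.0, 5.0]}  # id -> original vector
router = Router([[0.0, 0.0], [5.0, 5.0]])
cell = BucketDatacell(2, encode=lambda v: [round(x) for x in v])
for vid, bucket_id in [(0, 0), (1, 0), (2, 1)]:
    cell.insert(bucket_id, vid, raw[vid])
print(search([0.45, 0.0], router, cell, raw, k=1, nprobe=1, reorder_factor=1))  # [0]
print(search([0.45, 0.0], router, cell, raw, k=1, nprobe=1, reorder_factor=2))  # [1]
```

With `reorder_factor=1` the lossy codes rank id 0 first, while re-ranking two candidates against the original vectors recovers the true nearest neighbor, id 1. Keeping the Router separate from the BucketDatacell lets the routing method (brute force or an index over the centroids) change without touching the bucket storage.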

The relevant Pull Requests are as follows:


wxyucs added the version/0.14 and kind/feature (New feature or request) labels Dec 27, 2024

wxyucs assigned LHT129 Jan 13, 2025

This was referenced Jan 22, 2025:

- add batch scan for quantizer compute #359 (Open)
- introduce bucket datacell parameter #362 (Open)

wxyucs commented Dec 27, 2024 • edited by LHT129
