[Feature Request]: Secondary index #360

yingfeng · 2023-12-25T03:16:53Z

A secondary index is used for numeric filtering. It is composed of two parts:

1. The data of each numeric column, stored in a sorted, inverted form using a compressed format.
2. An in-memory part of the index data based on pgm (the PGM-index), which provides very fast approximate range queries with bounded error. The pgm code has already been added to the repository.

The range filtering mechanism of the secondary index is as follows:

1. Query the pgm index to get the bounded range.
2. Scan the raw index data within that bounded range to get the RowIDs that match the query filter.
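
The two steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the repository's code: a single linear model with a measured maximum error stands in for the real pgm structure (which uses several linear pieces), and the names `ApproxIndex`, `bounded_range`, and `range_filter` are hypothetical.

```python
import math
from bisect import bisect_left, bisect_right


class ApproxIndex:
    """Stand-in for the pgm index: one linear model plus its measured error."""

    def __init__(self, sorted_keys):
        self.n = len(sorted_keys)
        self.first = sorted_keys[0]
        span = sorted_keys[-1] - sorted_keys[0]
        self.slope = (self.n - 1) / span if span else 0.0
        # Largest gap between a predicted and a true position, rounded up.
        self.epsilon = math.ceil(
            max(abs(self.predict(k) - i) for i, k in enumerate(sorted_keys))
        )

    def predict(self, key):
        return (key - self.first) * self.slope

    def bounded_range(self, key):
        """Return [lo, hi], a window that contains the insertion point of key."""
        pos = self.predict(key)
        lo = min(max(math.floor(pos) - self.epsilon, 0), self.n)
        hi = max(min(math.ceil(pos) + self.epsilon + 1, self.n), lo)
        return lo, hi


def range_filter(values, row_ids, index, low, high):
    """Row IDs whose value lies in the closed interval [low, high]."""
    lo, hi = index.bounded_range(low)        # step 1: bounded range for low
    start = bisect_left(values, low, lo, hi)  # step 2: search only that window
    lo, hi = index.bounded_range(high)
    end = bisect_right(values, high, lo, hi)
    return row_ids[start:end]


# Example column; the row ID is the position of the value in the column.
column = [42, 7, 19, 7, 88, 63, 25]
pairs = sorted((v, rid) for rid, v in enumerate(column))
values = [v for v, _ in pairs]
row_ids = [rid for _, rid in pairs]
index = ApproxIndex(values)

print(range_filter(values, row_ids, index, 20, 65))  # -> [6, 0, 5]
```

Only the entries inside the window returned by `bounded_range` are touched, so the cost of step 2 depends on the error bound rather than on the column size.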

yangzq50 · 2024-01-18T03:41:11Z

My proposal for the implementation:

1. When building a secondary index for a numeric column in the table, we can build one secondary index structure for each segment of the table.
2. Every secondary index structure should contain 3 parts:
   2.1. pgm: a data structure which provides an approximate range
   2.2. vector_value: sorted array of all the values of the chosen column in the segment (necessary for getting the exact range)
   2.3. vector_offset: array of segment offsets for each value in vector_value (necessary for getting the filtering results)
3. vector_value and vector_offset can be split into parts with max size DEFAULT_BLOCK_CAPACITY
4. Steps for range search of one column with secondary index:
   4.1. Get an approximate range from pgm
   4.2. Get the exact range by binary search in vector_value
   4.3. Generate a bitmask (when the size of the exact range is too big) or a vector of chosen offsets
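
A rough Python sketch of the per-segment structure proposed above, for illustration only. The pgm part is left out, so steps 4.1 and 4.2 collapse into a binary search over the whole of `vector_value`. `BLOCK_CAPACITY` is a small illustrative stand-in for DEFAULT_BLOCK_CAPACITY, and `BITMASK_THRESHOLD`, `SegmentSecondaryIndex`, `build`, `blocks`, and `range_search` are hypothetical names, not the actual implementation.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass

BLOCK_CAPACITY = 4     # illustrative value, not the real DEFAULT_BLOCK_CAPACITY
BITMASK_THRESHOLD = 3  # hypothetical cut-over point for step 4.3


@dataclass
class SegmentSecondaryIndex:
    vector_value: list   # values of the chosen column in the segment, sorted
    vector_offset: list  # segment offset of each entry in vector_value

    @classmethod
    def build(cls, column_values):
        # Sort the segment offsets by the value stored at each offset.
        order = sorted(range(len(column_values)), key=column_values.__getitem__)
        return cls([column_values[i] for i in order], order)

    def blocks(self):
        """Yield (values, offsets) parts of at most BLOCK_CAPACITY entries."""
        for i in range(0, len(self.vector_value), BLOCK_CAPACITY):
            yield (self.vector_value[i:i + BLOCK_CAPACITY],
                   self.vector_offset[i:i + BLOCK_CAPACITY])

    def range_search(self, low, high):
        # Exact range [start, end) of entries with low <= value <= high.
        start = bisect_left(self.vector_value, low)
        end = bisect_right(self.vector_value, high)
        chosen = self.vector_offset[start:end]
        if end - start > BITMASK_THRESHOLD:
            # Large result: one flag per row of the segment.
            mask = [False] * len(self.vector_offset)
            for offset in chosen:
                mask[offset] = True
            return "bitmask", mask
        # Small result: an explicit vector of chosen offsets.
        return "offsets", sorted(chosen)


segment = SegmentSecondaryIndex.build([42, 7, 19, 7, 88, 63, 25])
print(segment.vector_value)         # -> [7, 7, 19, 25, 42, 63, 88]
print(segment.vector_offset)        # -> [1, 3, 2, 6, 0, 5, 4]
print(segment.range_search(20, 30)) # -> ('offsets', [6])
print(segment.range_search(7, 42)[0])  # -> bitmask
```

Keeping `vector_value` and `vector_offset` as parallel arrays means the exact range found in one can be used directly as a slice of the other.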

yingfeng · 2024-01-18T03:50:29Z

4.2 is not necessary because pgm has already given the approximate range; as a result, 4.2 is actually a linear scan.

4. Steps for range search of one column with secondary index:
   4.1. Get an approximate range from pgm
   4.2. Get the exact range by binary search in vector_value
   4.3. Generate a bitmask (when the size of the exact range is too big) or a vector of chosen offsets
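
To make the two options for step 4.2 concrete, here is a small Python sketch that locates the exact lower boundary inside a window already supplied by the approximate index, once by a forward scan and once by binary search. The function names and the example window are assumptions made for illustration.

```python
from bisect import bisect_left


def lower_bound_linear(values, key, lo, hi):
    """First index in [lo, hi] whose value is >= key, found by a forward scan."""
    pos = lo
    while pos < hi and values[pos] < key:
        pos += 1
    return pos


def lower_bound_binary(values, key, lo, hi):
    """Same boundary, located by binary search restricted to the window."""
    return bisect_left(values, key, lo, hi)


values = [7, 7, 19, 25, 42, 63, 88]
window = (2, 6)  # assumed to come from the approximate index for key 40

print(lower_bound_linear(values, 40, *window))  # -> 4
print(lower_bound_binary(values, 40, *window))  # -> 4
```

Both functions return the same position. Because the window size is bounded by the index's error, either approach inspects only a small, bounded number of entries.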

yangzq50 · 2024-02-22T05:36:19Z

Subtasks #476 and #477 are deferred to #637.

yingfeng mentioned this issue Dec 25, 2023

ROADMAP 2024 #338

Open

78 tasks

yingfeng assigned KKould and unassigned KKould Jan 2, 2024

JinHai-CN assigned yangzq50 Jan 18, 2024

JinHai-CN added the feature request label Jan 18, 2024

yangzq50 mentioned this issue Jan 24, 2024

[Subtask]: Support CREATE, DROP and DESCRIBE statement for secondary index #472

Closed

This was referenced Jan 24, 2024

[Subtask]: Provide compression for secondary index #476

Open

[Subtask]: Implement real time and near real time secondary index building #477

Closed

yangzq50 mentioned this issue Feb 5, 2024

[Subtask]: Push filter down to IndexScan when secondary index is available #542

Closed

yangzq50 closed this as completed Feb 22, 2024
