feat(query): inverted index use empty position data when query not contain phrase terms #15362
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
Among the files of an inverted index, the positions file records the position of each term in the original text; it is used to check whether terms are adjacent to each other when searching for phrases. Positions files are usually large, so reading the inverted index data takes a long time and slows down queries. For example, the file sizes of a 55M inverted index are as follows:
The positions file accounts for 68% of the total size of all the files, which is the main reason reading the index data is slow. The positions file is only needed when querying for phrases; other queries never use it. So when a query does not contain phrase terms, we can skip reading the positions file and use an empty positions file instead, which greatly speeds up the query. In our tests, the query time was reduced by about 50%.
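The idea can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `IndexFile`, `has_phrase_terms`, and `load_index_files` are hypothetical names, the three-file layout is assumed for brevity, and the phrase check is a simplified stand-in for real query parsing (it treats any double-quoted segment with more than one word as a phrase).

```rust
use std::collections::HashMap;

/// Hypothetical set of files that make up one inverted index.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum IndexFile {
    Terms,
    Postings,
    Positions,
}

/// Simplified phrase detection (an assumption for this sketch):
/// a query has phrase terms if a double-quoted segment holds more than one word.
/// An unbalanced quote is treated conservatively as opening a phrase.
fn has_phrase_terms(query: &str) -> bool {
    query
        .split('"')
        .enumerate()
        // Odd-indexed pieces are the ones that sit inside quotes.
        .any(|(i, part)| i % 2 == 1 && part.split_whitespace().count() > 1)
}

/// Load the index files needed by `query`. `read_file` stands in for the
/// (expensive) storage read. The positions file is only read when the query
/// contains phrase terms; otherwise an empty buffer is used in its place.
fn load_index_files<F>(query: &str, mut read_file: F) -> HashMap<IndexFile, Vec<u8>>
where
    F: FnMut(IndexFile) -> Vec<u8>,
{
    let need_positions = has_phrase_terms(query);
    let mut files = HashMap::new();
    for file in [IndexFile::Terms, IndexFile::Postings, IndexFile::Positions] {
        let data = if file == IndexFile::Positions && !need_positions {
            Vec::new() // empty positions data, no read issued
        } else {
            read_file(file)
        };
        files.insert(file, data);
    }
    files
}
```

With this sketch, a call such as `load_index_files("hello world", reader)` never invokes `reader` for `IndexFile::Positions`, while a phrase query such as `"\"hello world\""` reads all three files.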
for example
part of #14825
Tests
Type of change