This is the source code for the algorithm described in the paper Towards Efficient Index Construction and Approximate Nearest Neighbor Search in High-Dimensional Spaces (accepted by PVLDB 2023). We call it the LG project.
The LG project is written in C++ and can be compiled with g++ on Linux and MSVC on Windows. It adopts OpenMP for parallelism.
You can use Visual Studio 2019 to build the project by importing all the files in the directory ./cppCode/LSH-APG/src/.
cd ./cppCode/LSH-APG
make
The executable file, called lgo, is then placed in the ./cppCode/LSH-APG directory.
lgo datasetName
(the first parameter selects the dataset to be processed)
- datasetName : dataset name
For example, you can run the following commands in the command line after building all the tools:
cd ./cppCode/LSH-APG
./lgo audio
In our project, the input file (such as audio.data_new, which stores float data) uses the same format as in LSHBOX. It is a binary file organized as follows:
{Bytes of the data type (int)} {The number of vectors (int)} {The dimension of the vectors (int)} {All of the vectors in binary form, arranged in order (float)}
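To make the layout concrete, here is a minimal C++ sketch of a reader for such a file. It is not code from the LG project: the `Dataset` struct and the `readDataNew` function are hypothetical names, and the sketch assumes that each header field is a 32-bit integer, that the second field is the number of vectors, and that the file uses the machine's native byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for a loaded dataset (not a type from the LG project).
struct Dataset {
    std::int32_t elemBytes = 0;  // bytes per element, expected to be sizeof(float)
    std::int32_t count = 0;      // number of vectors (assumed meaning of field 2)
    std::int32_t dim = 0;        // dimension of each vector
    std::vector<float> values;   // count * dim floats, one vector after another
};

// Reads three 32-bit header ints followed by the float payload.
Dataset readDataNew(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);

    Dataset d;
    in.read(reinterpret_cast<char*>(&d.elemBytes), sizeof d.elemBytes);
    in.read(reinterpret_cast<char*>(&d.count), sizeof d.count);
    in.read(reinterpret_cast<char*>(&d.dim), sizeof d.dim);
    if (!in || d.elemBytes != static_cast<std::int32_t>(sizeof(float)) ||
        d.count < 0 || d.dim < 0) {
        throw std::runtime_error("unexpected header in " + path);
    }

    d.values.resize(static_cast<std::size_t>(d.count) *
                    static_cast<std::size_t>(d.dim));
    in.read(reinterpret_cast<char*>(d.values.data()),
            static_cast<std::streamsize>(d.values.size() * sizeof(float)));
    if (!in) throw std::runtime_error("truncated payload in " + path);
    return d;
}
```

With this sketch, component `j` of vector `i` would be `d.values[i * d.dim + j]`.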
For your own application, you should also transform your dataset into this binary format, rename it to [datasetName].data_new, and put it in the directory ./dataset.
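As an illustration of this transformation, the following C++ sketch writes in-memory vectors to a file in the format described above. The function name `writeDataNew` is hypothetical and not part of the LG project; the sketch assumes 32-bit header integers, native byte order, and that all vectors share one dimension.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Writes the header {bytes per element, number of vectors, dimension}
// followed by every vector as raw floats. Hypothetical helper, not LG code.
void writeDataNew(const std::string& path,
                  const std::vector<std::vector<float>>& vectors) {
    const std::size_t dim = vectors.empty() ? 0 : vectors.front().size();

    std::ofstream out(path, std::ios::binary);
    if (!out) throw std::runtime_error("cannot open " + path);

    const std::int32_t header[3] = {
        static_cast<std::int32_t>(sizeof(float)),
        static_cast<std::int32_t>(vectors.size()),
        static_cast<std::int32_t>(dim)};
    out.write(reinterpret_cast<const char*>(header), sizeof header);

    for (const auto& v : vectors) {
        if (v.size() != dim) {
            throw std::runtime_error("vectors must share one dimension");
        }
        out.write(reinterpret_cast<const char*>(v.data()),
                  static_cast<std::streamsize>(dim * sizeof(float)));
    }
    if (!out) throw std::runtime_error("write failed for " + path);
}
```

For a dataset named `mydata` (a placeholder name), a call such as `writeDataNew("./dataset/mydata.data_new", vectors)` would produce a file in the location the project expects.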
A sample dataset, audio.data_new, has been put in the directory ./dataset.
You can also get it as audio.data from here (if so, rename it to audio.data_new). If the link is invalid, you can also get it from data.
For the datasets we use, you can get the raw data from the following links: MNIST, Deep1M, GIST, TinyImages80M, SIFT. Next, you should transform the raw dataset into the binary format described above, rename it to [datasetName].data_new, and put it in the directory ./dataset.
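The raw files come in different formats, so the conversion depends on the source. As one possible sketch, the C++ function below assumes the raw file is in the .fvecs layout (each vector stored as a 32-bit integer dimension followed by that many floats), which is an assumption about the download and not something this project states. The name `convertFvecs` is hypothetical. It streams the vectors to the output and fills in the header at the end, so the whole dataset does not have to fit in memory.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Converts an .fvecs file (assumed input format) into the
// {bytes, count, dim, payload} layout. Returns the number of vectors written.
std::int32_t convertFvecs(const std::string& fvecsPath,
                          const std::string& outPath) {
    std::ifstream in(fvecsPath, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + fvecsPath);
    std::ofstream out(outPath, std::ios::binary);
    if (!out) throw std::runtime_error("cannot open " + outPath);

    // header = {bytes per element, number of vectors, dimension};
    // count and dimension are placeholders until the input has been read.
    std::int32_t header[3] = {static_cast<std::int32_t>(sizeof(float)), 0, 0};
    out.write(reinterpret_cast<const char*>(header), sizeof header);

    std::int32_t d = 0;
    std::vector<float> row;
    while (in.read(reinterpret_cast<char*>(&d), sizeof d)) {
        if (d <= 0 || (header[2] != 0 && d != header[2])) {
            throw std::runtime_error("inconsistent dimension in " + fvecsPath);
        }
        header[2] = d;
        row.resize(static_cast<std::size_t>(d));
        const auto bytes = static_cast<std::streamsize>(row.size() * sizeof(float));
        if (!in.read(reinterpret_cast<char*>(row.data()), bytes)) {
            throw std::runtime_error("truncated vector in " + fvecsPath);
        }
        out.write(reinterpret_cast<const char*>(row.data()), bytes);
        ++header[1];
    }

    // Go back and overwrite the placeholder header with the real values.
    out.seekp(0);
    out.write(reinterpret_cast<const char*>(header), sizeof header);
    if (!out) throw std::runtime_error("write failed for " + outPath);
    return header[1];
}
```

Datasets distributed in other formats (for example, image archives) would need their own parsing step before the same header and payload are written.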
Please use the following BibTeX entry to cite this work when you use LSH-APG in your paper.
@article{DBLP:journals/pvldb/ZhaoTHZZ23,
author = {Xi Zhao and
Yao Tian and
Kai Huang and
Bolong Zheng and
Xiaofang Zhou},
title = {Towards Efficient Index Construction and Approximate Nearest Neighbor
Search in High-Dimensional Spaces},
journal = {Proc. {VLDB} Endow.},
volume = {16},
number = {8},
pages = {1979--1991},
year = {2023}
}