Project is written in Python 2 using Tensorflow framework for feature extraction. Dataset we tested on is the OpenImage dataset. Due to the time it required to download the images we stopped after downloading a little bit over 900.000 images. Features were extracted using a pretrained Inception-v4 net trained on ImageNet dataset. Feature vector is created by concatenating a max pool with a kernel of image size for each layer. That means if the output of a convolution is 12x12x1024 for each image we used a max pool layer with kernels [1, 12, 12, 1] as defined in Tensorflow. The result of the max pool for the given image is a vector of size [1, 1, 1024] which was then flattened and concatenated for every convolution layer. Resulting vector has ~16000 features which is then reduced with principal component analysis (PCA) to size of just 300 elements later used for similarity comparisons. Angular and Euclidean distance was used as a metric during vector comparisons with numpy queries. Second implementation for querying was done using a library called NMSLIB which creates indexes from vectors for efficient querying. Instructions on how to setup everything can be found below.
Requirements
Getting the dataset
Getting Tensorflow models
Python packages
PCA model
NMSLIB installation
Example results
Python 2
The code will work on any dataset, it expects to get a root folder which will contain only images and/or folders with images.
The dataset we used was the OpenImage dataset and we downloaded it using this paralelized downloader.
After u clone the repository and get the CSV files you execute this in your terminal:
python2 download.py [CSV_PATH] [OUTPUT_FOLDER_PATH]
Open the terminal and change the directory to the project root folder and run this in the terminal:
git clone https://github.com/tensorflow/models/
When the repository downloads, get the required checkpoint by running:
wget -O InceptionV4.tar.gz http://download.tensorflow.org/models/inception_v4_2016_09_09.tar.gz
Untar the downloaded file by executing:
tar -xvf InceptionV4.tar.gz
If you don't have an NVIDIA GPU then change the tensorflow-gpu in requirements.txt to just tensorflow before running the installation.
Now position yourself in the project root folder.
To use the code we suggest creating a virtual environment.
First install virtual environment with:
pip install virtualenv
After that create your virtual environment:
virtualenv -p /usr/bin/python2 env
Activate the environment with:
source ./env/bin/activate
Install the prerequisite packages:
pip install -r requirements.txt
After this install either ipython2 or jupyter notebook and install the required kernel so you can access the environment You should be ready to go with just executing this line in the terminal:
python -m ipykernel install --user --name=[Name you desire]
- Our kernel name is AVSP. If you name it differently then you will get an error saying there is no kernel named AVSP. To resolve that just choose the kernel you just created from the dropdown list and click set.
To get the progress bar to work in the notebook run this command in your terminal:
jupyter nbextension enable --py --sys-prefix widgetsnbextension
Now you're ready to run the notebook, in the terminal enter:
ipython2 notebook
We used the first batch of 100.000 images to train our PCA model. Our model can be downloaded by executing this command in your terminal:
wget -O PCA_model_300.pkl https://www.dropbox.com/s/8rqrxyiffx2hxvs/PCA_model_300.pkl?dl=0
The program will expect the model to be in the same folder as the full_vectors.npy and full_paths.npy so after downloading it place it there.
Clone the github repository with this command:
git clone https://github.com/searchivarius/nmslib
Install prerequisite packages:
sudo apt-get install libboost-all-dev libgsl0-dev libeigen3-dev
Change directory to similarity_search and run the following commands:
cmake .
make
Activate your virtual environment if you haven't done so already. Change directory to python bindings and run these commands:
python setup.py build
sudo python setup.py install