
AVSP

Analysis of Massive Data Sets

Project theme: Extracting Deep Features for Image Recommendation

The project is written in Python 2 and uses the TensorFlow framework for feature extraction. The dataset we tested on is the OpenImage dataset; due to the time it took to download the images, we stopped after downloading a little over 900,000 of them. Features were extracted with a pretrained Inception-v4 network trained on the ImageNet dataset.

The feature vector is built by applying, to the output of each convolutional layer, a max pool whose kernel covers the layer's entire spatial extent. For example, if a convolution outputs a 12x12x1024 tensor per image, we apply a max pool with kernel [1, 12, 12, 1] (in TensorFlow's ksize notation). The result for that layer is a [1, 1, 1024] tensor, which is flattened; the flattened vectors of all convolutional layers are concatenated. The resulting vector has ~16,000 features and is reduced with principal component analysis (PCA) to just 300 elements, which are later used for similarity comparisons. Angular and Euclidean distances were used as metrics for vector comparisons with NumPy queries. A second querying implementation uses the NMSLIB library, which builds indexes over the vectors for efficient querying. Instructions on how to set everything up can be found below.
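As a rough sketch of the pooling step in TensorFlow 1.x (the helper below and the conv_outputs list are hypothetical; the real code collects the outputs of the Inception-v4 convolution blocks):

import tensorflow as tf

def global_max_pool(conv_output):
    # conv_output: a 4-D tensor [batch, height, width, channels].
    # Pooling over the full spatial extent, e.g. a 12x12x1024 output,
    # uses ksize [1, 12, 12, 1] and yields [batch, 1, 1, channels].
    _, h, w, _ = conv_output.get_shape().as_list()
    pooled = tf.nn.max_pool(conv_output,
                            ksize=[1, h, w, 1],
                            strides=[1, 1, 1, 1],
                            padding='VALID')
    return tf.layers.flatten(pooled)  # [batch, channels]

# Concatenating the pooled outputs of every convolution layer gives
# the ~16,000-dimensional descriptor described above:
# features = tf.concat([global_max_pool(c) for c in conv_outputs], axis=1)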

Table of contents

Requirements
Getting the dataset
Getting Tensorflow models
Python packages
PCA model
NMSLIB installation
Example results

Requirements

Python 2

Getting the dataset

The code will work on any dataset; it expects a root folder that contains only images and/or folders with images.

The dataset we used was the OpenImage dataset, which we downloaded using this parallelized downloader.

After you clone the downloader repository and get the CSV files, execute this in your terminal:
python2 download.py [CSV_PATH] [OUTPUT_FOLDER_PATH]
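For example, with hypothetical paths:

python2 download.py ./csv/images.csv ./downloaded_images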

Getting Tensorflow models

Open the terminal and change the directory to the project root folder and run this in the terminal:

git clone https://github.com/tensorflow/models/

Once the repository has been cloned, get the required checkpoint by running:

wget -O InceptionV4.tar.gz http://download.tensorflow.org/models/inception_v4_2016_09_09.tar.gz

Untar the downloaded file by executing:

tar -xvf InceptionV4.tar.gz
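A minimal sketch of restoring the checkpoint with the TF-Slim image models from the cloned repository (this assumes the slim directory, models/slim or models/research/slim depending on the checkout, is on your PYTHONPATH):

import tensorflow as tf
import tensorflow.contrib.slim as slim
from nets import inception  # provided by the cloned models repository

# Inception-v4 expects 299x299 RGB inputs.
images = tf.placeholder(tf.float32, [None, 299, 299, 3])

with slim.arg_scope(inception.inception_v4_arg_scope()):
    # end_points maps layer names to their output tensors; this is
    # where the per-layer convolution outputs can be collected.
    logits, end_points = inception.inception_v4(images,
                                                num_classes=1001,
                                                is_training=False)

saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, 'inception_v4.ckpt')  # file from the untarred archive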

Python packages

If you don't have an NVIDIA GPU, change tensorflow-gpu to tensorflow in requirements.txt before running the installation.

Now position yourself in the project root folder.

To use the code we suggest creating a virtual environment.
First install virtualenv with:

pip install virtualenv

After that create your virtual environment:

virtualenv -p /usr/bin/python2 env

Activate the environment with:

source ./env/bin/activate

Install the prerequisite packages:

pip install -r requirements.txt

After this, install either ipython2 or Jupyter Notebook, then register the required kernel so the notebook can access the environment. You should be ready to go after executing this line in the terminal:

python -m ipykernel install --user --name=[Name you desire]
  • Our kernel name is AVSP. If you name yours differently, you will get an error saying there is no kernel named AVSP. To resolve that, just choose the kernel you created from the dropdown list and click set.

To get the progress bar to work in the notebook run this command in your terminal:

jupyter nbextension enable --py --sys-prefix widgetsnbextension

Now you're ready to run the notebook; in the terminal, enter:

ipython2 notebook

PCA model

We used the first batch of 100,000 images to train our PCA model. The model can be downloaded by executing this command in your terminal:

wget -O PCA_model_300.pkl https://www.dropbox.com/s/8rqrxyiffx2hxvs/PCA_model_300.pkl?dl=0

The program expects the model to be in the same folder as full_vectors.npy and full_paths.npy, so place it there after downloading.
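A sketch of how the pieces fit together, assuming the pickled model is a fitted scikit-learn PCA and that full_vectors.npy holds the reduced vectors with full_paths.npy holding the matching image paths:

import pickle
import numpy as np

with open('PCA_model_300.pkl', 'rb') as f:
    pca = pickle.load(f)  # assumed: sklearn PCA with n_components=300

vectors = np.load('full_vectors.npy')  # [n_images, 300] reduced vectors
paths = np.load('full_paths.npy')      # matching image paths

# raw_features: the ~16,000-dimensional descriptor of a query image,
# computed as described above (hypothetical variable).
query = pca.transform(raw_features.reshape(1, -1)).ravel()

# Euclidean distance to every stored vector:
euclidean = np.linalg.norm(vectors - query, axis=1)

# Angular distance via normalized dot products:
cosine = vectors.dot(query) / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query))
angular = np.arccos(np.clip(cosine, -1.0, 1.0))

top12 = paths[np.argsort(euclidean)[:12]]  # 12 most similar images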

NMSLIB installation

Clone the GitHub repository with this command:

git clone https://github.com/searchivarius/nmslib

Install prerequisite packages:

sudo apt-get install libboost-all-dev libgsl0-dev libeigen3-dev

Change directory to similarity_search and run the following commands:

cmake .
make  

Activate your virtual environment if you haven't done so already. Change directory to the Python bindings directory and run these commands:

python setup.py build
sudo python setup.py install
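Once installed, querying looks roughly like this (shown with the modern nmslib Python API; the legacy bindings built above expose a similar but not identical interface):

import nmslib
import numpy as np

vectors = np.load('full_vectors.npy').astype(np.float32)

# 'cosinesimil' corresponds to the angular metric; 'l2' would
# correspond to the Euclidean one.
index = nmslib.init(method='hnsw', space='cosinesimil')
index.addDataPointBatch(vectors)
index.createIndex({'post': 2}, print_progress=True)

# The 12 nearest neighbours of one reduced vector.
ids, distances = index.knnQuery(vectors[0], k=12)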

Example results

(Image grids omitted: each example showed a query image alongside its top 12 results.)
