Compute-similarity-between-images-using-CNN

Comparison of cosine similarity performance between VGG16 and ResNet50

Table of contents 📝

Estimated reading time: ⏱️ 5 min

  • My goals 🎯
  • Technologies 🖥️
  • Project composition 📂
  • Description 📋
  • Sources ⚙️

My goals 🎯

  • Learn how to extract a feature vector from an image
  • Compute similarity between images
  • Apply data augmentation to enlarge the dataset

Technologies 🖥️

Programming languages:

- Python (TensorFlow framework)

Project composition 📂

.
├── README.md
│
├── data
│   ├── flowerpot.jpg
│   ├── vase.jpg
│   └── vase2.jpg
│
├── notebooks
│   └── extract_features.ipynb
│
└── report
    ├── augmented_img
    │    ├── vaseAI0.jpg
    │    ├── vaseAI1.jpg
    │    └── ..
    │
    └── cos_sim
         ├── resnet50
         │    ├── vase_flowerpot.jpg
         │    ├── vase_vase.jpg
         │    └── vase_vase2.jpg
         │
         └── vgg16
              ├── vase_flowerpot.jpg
              ├── vase_vase.jpg
              ├── vase_vase2.jpg
              ├── vase_vaseAI0.jpg
              ├── vase_vaseAI1.jpg
              └── ..

Description 📋

This project aims to deepen my knowledge of CNNs, especially feature extraction and image similarity computation. I decided to work with two CNNs pre-trained on ImageNet, VGG16 and ResNet50, and to compare their cosine similarity performance. You can choose to load the models in two ways:
- to make predictions (include_top = True: the model is composed of all layers, the 'feature learning block' plus the 'classification block')
- to extract features (include_top = False: the classification block is omitted)
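
As a minimal sketch of these two loading modes with the Keras applications API (the pooling='avg' argument is my addition, to flatten the last feature maps into a single vector):

from tensorflow.keras.applications import VGG16, ResNet50

# Full model for predictions: 'feature learning block' + 'classification block'
vgg16_pred = VGG16(weights='imagenet', include_top=True)

# Truncated models for feature extraction: the classification block is omitted
# (pooling='avg' averages the last feature maps into a single feature vector)
vgg16_feat = VGG16(weights='imagenet', include_top=False, pooling='avg')
resnet50_feat = ResNet50(weights='imagenet', include_top=False, pooling='avg')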



[Figure 1]: Architecture of the VGG16 (left) and ResNet50 (right)

First, I wondered which model could classify an image with the highest confidence. Here I chose to compare their outputs on a vase image: ResNet50 came out on top with a 99.89% score against 95.06% for VGG16. The idea in this part was to experiment with both models and understand how prediction works.
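
A minimal sketch of such a prediction with the standard Keras helpers, assuming the 224x224 input size both models expect:

import numpy as np
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions

# Load and preprocess the vase image from the data folder
img = image.load_img('data/vase.jpg', target_size=(224, 224))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

model = ResNet50(weights='imagenet', include_top=True)
preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])  # [(class_id, class_name, probability), ...]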



[Figure 2]: Comparison of predictions (VGG16/ResNet50)

Then I decided to visualize the feature maps produced by the main blocks of VGG16. The feature maps output by each block are collected in a single forward pass to create one image per block. VGG16 has 5 main blocks (block1, block2, etc.), each ending in a pooling layer. You can choose the blocks to visualize through their layer indices: idx = [2, 5, 9, 13, 17] # [block1, block2, block3, block4, block5]. Figure 3 highlights that the extracted features become more abstract and higher-level as the network gets deeper.
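
A sketch of how those five outputs can be collected in a single pass with a multi-output Keras model, using the layer indices above:

from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.models import Model

base = VGG16(weights='imagenet')
idx = [2, 5, 9, 13, 17]  # [block1, block2, block3, block4, block5]
viz_model = Model(inputs=base.inputs,
                  outputs=[base.layers[i].output for i in idx])

# One forward pass returns a list of 5 feature-map tensors, one per block:
# feature_maps = viz_model.predict(x)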



[Figure 3]: Visualization of the 5 main blocks from the VGG16

Now let's focus on feature vector extraction. Removing the classification block of the model makes it possible to extract a feature vector, as explained previously. Each input image is then preprocessed (reshaping, RGB->BGR conversion, zero-centering with respect to the ImageNet dataset). The overall process in Figure 4 depicts how to compute the similarity between two images. The images were stored on AWS S3 and I used a notebook instance in AWS SageMaker. A feature vector was extracted for each image, and the two vectors were then compared with cosine similarity: the compute_similarity_img() function computes the cosine of the angle between both feature vectors.
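
The compute_similarity_img() function itself lives in the notebook; here is a plausible reconstruction, assuming VGG16 features and the usual formula cos(θ) = a·b / (‖a‖ ‖b‖):

import numpy as np
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

def extract_features(path, model):
    # Reshape to 224x224; preprocess_input handles the RGB->BGR
    # conversion and the zero-centering with the ImageNet statistics
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return model.predict(x).flatten()

def compute_similarity_img(path_a, path_b, model):
    a = extract_features(path_a, model)
    b = extract_features(path_b, model)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

model = VGG16(weights='imagenet', include_top=False, pooling='avg')
print(compute_similarity_img('data/vase.jpg', 'data/flowerpot.jpg', model))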



[Figure 4]: Computation similarity process

Here are the cosine similarity results obtained with VGG16:



[Figure 5]: Cosine similarity using VGG16

I then decided to enlarge the dataset with data augmentation and to compare the results, as shown in Figure 6. For the data augmentation, I used an ImageDataGenerator object to set up the augmentation parameters. It generates batches of tensor image data with real-time data augmentation:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

gen = ImageDataGenerator(
    rotation_range=30,       # Int: degree range for random rotations
    width_shift_range=0.1,   # Float: fraction of total width if < 1, or pixels if >= 1
    height_shift_range=0.1,  # Float: fraction of total height if < 1, or pixels if >= 1
    shear_range=0.15,        # Float: shear intensity (shear angle in counter-clockwise direction, in degrees)
    zoom_range=0.1,          # Float: range for random zoom
    channel_shift_range=10., # Float: range for random channel shifts
    horizontal_flip=True     # Boolean: randomly flip inputs horizontally
)
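
A sketch of how this generator could write augmented copies of the vase image to report/augmented_img (the vaseAI* filenames in the tree suggest a loop like this; save_to_dir, save_prefix and save_format are standard flow() arguments, though Keras appends its own index and hash to the prefix):

import numpy as np
from tensorflow.keras.preprocessing import image

img = image.load_img('data/vase.jpg', target_size=(224, 224))
x = np.expand_dims(image.img_to_array(img), axis=0)

flow = gen.flow(x, batch_size=1,
                save_to_dir='report/augmented_img',
                save_prefix='vaseAI', save_format='jpg')
for _ in range(5):  # write 5 augmented variants to disk
    next(flow)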



[Figure 6]: Cosine similarity with augmented images using VGG16

Then I compared the cosine similarity performance of the two models:



[Figure 7]: Comparison of cosine similarity between VGG16 and ResNet50

Sources ⚙️

  • Help for image classification here
  • Help for data augmentation here