Skip to content

Saliency maps

Misa Ogura edited this page Sep 9, 2019 · 1 revision

Saliency maps

Saliency in human visual perception is a subjective quality that makes certain things within the field of view stand out from the rest and grabs our attention.

Saliency maps in computer vision provide indications of the most salient regions within images. By creating a saliency map for neural networks, we can gain some intuition on "where the network is paying the most attention to" in an imput image.

Example notebooks

AlexNet visualizaion

Using flashtorch.saliency module, let's visualize image-specific class saliency maps of AlexNet pre-trained on ImageNet classification tasks.

Great gray owl (class index 24): The network is focusing on the sunken eyes and the round head for this owl.

Saliency map of great grey owl in Alexnet

Peacock (class index 84): But it doesn't always focus on the eyes and head of an animal. In its world's view, what makes peacock a peacock is the eyespots on its tail!

Saliency map of peacock in Alexnet

Toucan (class index 96): And in case of a toucan, the network is paying an intense attention on its beak.

Saliency map of toucan in Alexnet

Do you agree?

Insights on transfer learning

In the example above, we've visualized saliency maps for a network that has been trained on ImageNet and used images of objects which it already knows.

We can take a step further and investigate how the network's perception changes before and after the training, when presented by a new object.

This time, I'm going to use DenseNet, which is again pre-trained on ImageNet (1000 classes), and train it into a flower classifier to recognize 102 species of flowers (dataset).

With no additional training, and just by swapping out the last fully-connected layer, the model performs very poorly (0.1% test accuracy). By plotting the gradients, we can see that the network is mainly focusing on the shape of the flower.

Foxgloves as an example:

Transfer learning pre

With training, the model now achieves 98.7% test accuracy. But why? What is it that it's seeing now, that it wasn't before?

The network has learnt to shift its focus on the mottled patten within flower cups! In its world's view, that is the most distinguishing things about this object, which I think closely align with what we deem the most unique trait of this flower.

Transfer learning post

Clone this wiki locally