BvS

Dawn of AI
An image classifier that identifies whether a given image is of Batman or Superman, using a CNN with high accuracy. (Without using Dogs vs. Cats; from getting images from Google to saving our trained model for reuse.)

What are we gonna do:

  • We will build a 3-layered, community-standard CNN image classifier to classify whether a given image is of Batman or Superman.
  • Learn how to build an accurate model from scratch in TensorFlow.
  • How to train and test it.
  • How to save the model and use it later.

Setup:

  • Python 3.5
  • TensorFlow 1.5.0
  • CUDA 9.0
  • cuDNN 7.0.5

In-depth explanation of each section:
A Medium post with a detailed, step-by-step explanation for a deeper understanding of CNNs and the architecture of the network.

Data:

Collect data:

  • Google Images Downloader. It's fast, easy, simple, and efficient (a rough sketch of how that looks in code follows this list).
  • I've collected 300 images each for Supes and Batsy, but more data is highly preferable. Try to collect as much clean data as possible.
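
For reference only, a minimal download sketch with the google_images_download package might look like the following. The package and its argument names may have changed (or stopped working against Google) since this was written, so treat it purely as an illustration, not the repo's own script:

      # Illustration only; assumes `pip install google_images_download`.
      from google_images_download import google_images_download

      downloader = google_images_download.googleimagesdownload()
      downloader.download({
          "keywords": "batman logo,superman logo",   # one output folder per keyword
          "limit": 300,                              # roughly 300 images per class
          "output_directory": "downloads",
          "format": "jpg",
          # limits above 100 typically also need the "chromedriver" argument
      })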

Augmentation:

  • 300 images per class is nowhere near enough for deep learning, so we must augment the images to get more out of whatever we collected.
  • You can use the following tool to do it easily: Augmentor (a rough sketch of such a pipeline follows this list).
  • This is the code I've used for augmenting my images.
  • Same image, augmented using various transformations. I had 3500 images per class after augmentation.
    Careful: While augmenting, be careful about what kinds of transformations you use. You can mirror-flip a Bat logo, but you cannot turn it upside down.
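
As an illustration (not the repo's own augmentation code), a minimal Augmentor pipeline for one class folder might look roughly like this; the probabilities and the sample count are placeholder values, and augmented images land in an output subfolder by default:

      # Illustration only; assumes `pip install Augmentor`.
      import Augmentor

      p = Augmentor.Pipeline("rawdata/batman")       # one pipeline per class folder
      p.rotate(probability=0.7, max_left_rotation=10, max_right_rotation=10)
      p.flip_left_right(probability=0.5)             # mirror flips are fine for these logos
      p.zoom_random(probability=0.5, percentage_area=0.9)
      # Deliberately no flip_top_bottom(): an upside-down Bat logo is not valid data.
      p.sample(3500)                                 # generate ~3500 augmented images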

Standardize:

  • After augmentation, make a folder named rawdata in the current working directory.
  • Create folders with the respective class names and put all the images into their respective folders.
  • Run this file in the same directory as rawdata.
  • This will resize all the images to a standard resolution and format and put them in a new folder named data (a rough sketch of the idea follows this list).
    Note: As it is embedded in trainer.py, it is unnecessary to run it explicitly.
    Update: You can get the data folder itself from here (50 MB). Just download and extract it!
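
If you'd rather roll your own standardizer, the general idea is just a resize-and-convert pass over rawdata. Here is a minimal sketch, assuming Pillow; the folder names and the 100x100 size come from the configuration shown under Training, and everything else is hypothetical rather than the repo's script:

      # Illustration only, not the repo's script; assumes `pip install Pillow`.
      import os
      from PIL import Image

      RAW, OUT, SIZE = 'rawdata', 'data', (100, 100)

      for class_name in os.listdir(RAW):
          os.makedirs(os.path.join(OUT, class_name), exist_ok=True)
          for i, fname in enumerate(os.listdir(os.path.join(RAW, class_name))):
              img = Image.open(os.path.join(RAW, class_name, fname)).convert('RGB')
              img.resize(SIZE).save(os.path.join(OUT, class_name, '%s_%d.jpg' % (class_name, i)))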

Architecture:

A Simple Architecture:

For a detailed explanation of the architecture and of CNNs in general, please read the Medium post.
I've explained CNNs in depth over there; I highly recommend reading it.

In code:

      #level 1 convolution
      network=model.conv_layer(images_ph,5,3,16,1)
      network=model.pooling_layer(network,5,2)
      network=model.activation_layer(network)

      #level 2 convolution
      network=model.conv_layer(network,4,16,32,1)
      network=model.pooling_layer(network,4,2)
      network=model.activation_layer(network)

      #level 3 convolution
      network=model.conv_layer(network,3,32,64,1)
      network=model.pooling_layer(network,3,2)
      network=model.activation_layer(network)

      #flattening layer
      network,features=model.flattening_layer(network)

      #fully connected layer
      network=model.fully_connected_layer(network,features,1024)
      network=model.activation_layer(network)
      
      #output layer      
      network=model.fully_connected_layer(network,1024,no_of_classes)
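
The model.* helpers above come from the repo's model file. Purely for illustration, here is what a conv_layer helper of that shape might look like in TensorFlow 1.x; the argument order (input, filter size, input channels, output channels, stride) is my reading of the calls above, not a copy of the repo's code:

      # Hypothetical TF1-style helper, for illustration only.
      import tensorflow as tf

      def conv_layer(inp, filter_size, in_channels, out_channels, stride):
          # Trainable filter bank and bias for this convolution.
          weights = tf.Variable(tf.truncated_normal(
              [filter_size, filter_size, in_channels, out_channels], stddev=0.05))
          bias = tf.Variable(tf.constant(0.05, shape=[out_channels]))
          conv = tf.nn.conv2d(inp, weights,
                              strides=[1, stride, stride, 1], padding='SAME')
          return conv + bias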

A Brief Architecture:

With dimensional information:

Training:

  • Clone this repo.

  • Do the Augmentation.

  • Put the images in their respective folders in rawdata.

     rawdata/batman: 3810 images
     rawdata/superman: 3810 images
    

Update: You can get the data folder itself from here (50 MB). Just download and extract it!

Our file structure should look like this,

The data folder will be generated automatically by trainer.py from rawdata if it does not exist.

  • Configuration: If you want to edit something, you can do it using this file.

        raw_data='rawdata'
        data_path='data'
        height=100
        width=100
        all_classes = os.listdir(data_path)
        number_of_classes = len(all_classes)
        color_channels=3
        epochs=300
        batch_size=10
        model_save_name='checkpoints\\'
    
  • Run trainer.py.

  • Wait for a few hours.

  • For me it took 8 hours for 300 epochs. I trained it on my laptop, which has an i5 processor, 8 GB of RAM, and an Nvidia GeForce 930M (2 GB). You can end the process anytime once the accuracy saturates, as the model is saved frequently. (A rough skeleton of such a training loop is sketched below.)
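
For orientation only, the core of a TF1 training loop over this setup generally boils down to something like the skeleton below. This is hypothetical and heavily simplified: trainer.py's actual loop, batching, and accuracy reporting will differ. images_ph, labels_ph, network, epochs, batch_size and model_save_name are the names used elsewhere in this README, while get_batches is a made-up helper:

      # Hypothetical training-loop skeleton, not the repo's trainer.py.
      cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
          logits=network, labels=labels_ph))
      optimizer = tf.train.AdamOptimizer(1e-4).minimize(cost)
      saver = tf.train.Saver(max_to_keep=4)

      with tf.Session() as session:
          session.run(tf.global_variables_initializer())
          for epoch in range(epochs):
              for batch_images, batch_labels in get_batches(batch_size):   # hypothetical helper
                  _, loss = session.run([optimizer, cost],
                                        feed_dict={images_ph: batch_images,
                                                   labels_ph: batch_labels})
              saver.save(session, model_save_name)   # checkpoint every epoch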

Saving our model:

Once training is over, we can see that a folder named checkpoints has been created, which contains the model we trained. These two simple lines do that for us in TensorFlow:

      saver = tf.train.Saver(max_to_keep=4)
      saver.save(session, model_save_name)  

You can get my pretrained model here.

We have three files in our checkpoints folder,

  • .meta file - it has your graph structure saved.
  • .index - it identifies the respective checkpoint file.
  • .data - it stores the values of all the variables.

How to use it? TensorFlow is so well built that it does all the heavy lifting for us. We just have to write four simple lines to load our model and run inference.

      #Create a saver object to load the model
      saver = tf.train.import_meta_graph(os.path.join(model_folder,'.meta'))
      #restore the model from our checkpoints folder
      saver.restore(session,os.path.join('checkpoints','.\\'))
      #Create graph object for getting the same network architecture
      graph = tf.get_default_graph()
      #Get the last layer of the network by its name, which includes all the previous layers too
      network = graph.get_tensor_by_name("add_4:0")

Yeah, simple. Now that we have our network as well as the tuned values, we have to pass an image to it using the same placeholders (image, labels).

im_ph= graph.get_tensor_by_name("