
Image Classification Recipe



Convolutional Neural Networks (CNN) and Transfer Learning for Heritage Image Classification

Keywords: image classification

Approaches: convolutional neural networks, transfer learning

Tools: IBM Watson Studio, Google Cloud AutoML, Google Colab, Tensorflow, Python, IIIF


This recipe presents an image classification scenario aiming to infer the technique or function of heritage images (picture, drawing, map...) using a pretrained Convolutional Neural Network (CNN) model, i.e. a supervised approach with transfer learning. We leverage the classification ability of these models, originally trained to detect objects, and apply them to our document classification scenario.

image classification principle

This recipe may have various library use cases, particularly for cataloguing and information retrieval systems.

This recipe is based on BnF materials

Goals

This recipe includes a basic introduction to neural networks and deep learning (formal neuron model, neural networks, convolutional neural networks, transfer learning) and a hands-on session:

  1. Creation of an image dataset for training
  2. Training of a classification model with commercial AI platforms or open-source AI frameworks
  3. Application of the model to the heritage images to be classified

The IIIF standard API is used to extract images from digital repositories, but raw files may also be processed.
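For instance, a IIIF Image API URL follows the pattern {identifier}/{region}/{size}/{rotation}/{quality}.{format}. Here is a minimal Python sketch of the extraction step; the Gallica ark identifier reuses one from the sample output later in this recipe, so adapt it to your own documents:

```python
import requests

# IIIF Image API pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
# The ark identifier below is illustrative; replace it with your own document.
url = ("https://gallica.bnf.fr/iiif/ark:/12148/btv1b10100491m"
       "/f1/full/full/0/native.jpg")

response = requests.get(url, timeout=30)
response.raise_for_status()
with open("btv1b10100491m-f1.jpg", "wb") as f:
    f.write(response.content)  # save the extracted image locally
```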

Educational resources

Introduction to the core technology used

For the theory, see this GitHub repository for a 45-minute introductory course (FR and EN versions, direct link).

Implementation notes

Plan

A. Image classification using AI SaaS platforms: IBM Watson, Google Cloud AutoML

B. Image classification using AI frameworks: TensorFlow


A. Image classification using AI SaaS platforms

Prerequisites: IBM Watson Studio account or Google Cloud AutoML account (see the setup documents, FR and EN versions)

Note: IBM Watson Visual Recognition is discontinued. Existing instances are supported until 1 December 2021.

1. Use case definition: choice of the source images and the model classes

A dataset for a four-class scenario (picture/drawing/map/noise) can be downloaded here, but it's up to you to build your own use case.

The dataset illustrates this scenario:

  • filtering out "noisy" illustrations (blank pages, text pages)
  • classification of illustrations into 3 categories (picture, drawing, map)
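A possible layout for such a dataset, with one .zip archive per class (as expected by the upload step described below), could be:

```
dataset/
├── picture.zip   # photographs
├── drawing.zip   # drawings
├── map.zip       # maps
└── noise.zip     # blank pages, text pages...
```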

Sample images: noise, picture, drawing, map

2. Choice of the SaaS platform

IBM Watson Studio and Google Cloud AutoML have been tested and this howto documents the setup of a new user account and the creation of a visual recognition project for both platforms.

The following steps suppose you are using Watson Studio, but the Google AutoML case is very similar. This howto is also documented in a presentation (FR and EN versions).

3. Creation of a Watson Studio Visual Recognition project

Once the Watson Studio web app is launched, choose the "Classify Images" custom model to create your new classification project, as described in the howto.

Classify images model

4. Uploading the training dataset

Now you can upload your image dataset, each class being a .zip archive.


Classes can be renamed and their content updated.

5. Training of the model

When all the classes are ingested, the training process can start ("Train Model" button).

Training process

6. Testing the classification model

On the Watson platform

Local images can be dropped on the test page to launch an inference and test the model performance. Watson Studio outputs the confidence scores for all the model's classes.

Testing the model

At this point, the model could be deployed using SDKs or APIs. The next section demonstrates the API case.

Outside the platform, using API and code

Before implementing the model in your code, you need to obtain two pieces of information: the Watson API key and the model ID.

  1. Watson API key

This information is available in your resources list, under the Services category.

Access to the API key

After choosing the right service, the API key can be copied/pasted into your code or downloaded.

  2. Model ID

The IBM Watson model ID can be found on your project page, under the Assets tab.

Access to the model ID

You can now use the Watson REST APIs or the corresponding SDKs to develop applications that interact with the service.

  • curl commands

These two basic curl command lines show a very simple way to interact with the API. Open a Terminal window and type the following command, taking care to replace the your_api_key and your_model_ID fields with the values you just obtained.

To classify an image through its URL:

> curl -u "apikey:your_api_key" "https://gateway.watsonplatform.net/visual-recognition/api/v3/classify?url=your_URL&version=2018-03-19&classifier_ids=your_model_ID"

Classification result

This example displays the classification result of a Gallica IIIF image:

  • inferred class = "photo"
  • confidence score = 0.843
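The raw JSON answer returned by the classify endpoint looks roughly like this (an abridged sketch based on the v3 API; the actual field names may differ slightly):

```json
{
  "images": [
    {
      "classifiers": [
        {
          "classifier_id": "your_model_ID",
          "classes": [
            {"class": "photo", "score": 0.843}
          ]
        }
      ],
      "source_url": "your_URL"
    }
  ],
  "images_processed": 1
}
```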

To classify a local image:

> curl -X POST -u "apikey:your_api_key" -F "images_file=@file_path" -F "classifier_ids=your_model_ID" "https://gateway.watsonplatform.net/visual-recognition/api/v3/classify?version=2018-03-19"
  • Python script

The very same curl commands may be integrated in an application script similar to this one, which extracts document metadata through the digital library APIs (Gallica), then extracts the images from the digital libraries' repositories (Gallica and the Wellcome Collection) using the IIIF Image protocol, and finally calls a Watson model to classify the illustrations.

The script makes heavy use of a Python wrapper for the Gallica APIs, which makes it easier to integrate the Gallica APIs in your code.

Remember to insert your Watson API key and model ID!
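If you prefer to see the API call in isolation, here is a minimal sketch using the requests library, with the same endpoint and parameters as the curl examples above (the IIIF URL is illustrative):

```python
import requests

API_KEY = "your_api_key"    # from your Watson resources list
MODEL_ID = "your_model_ID"  # from your project's Assets tab
ENDPOINT = "https://gateway.watsonplatform.net/visual-recognition/api/v3/classify"

params = {
    "url": "https://gallica.bnf.fr/iiif/ark:/12148/btv1b10100491m/f1/full/full/0/native.jpg",
    "version": "2018-03-19",
    "classifier_ids": MODEL_ID,
}
# Watson uses HTTP basic auth with the literal user name "apikey"
response = requests.get(ENDPOINT, params=params, auth=("apikey", API_KEY), timeout=60)
response.raise_for_status()

# Print the confidence score of each class returned by the model
for image in response.json()["images"]:
    for classifier in image["classifiers"]:
        for cls in classifier["classes"]:
            print(cls["class"], cls["score"])
```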

A final step consists in evaluating the performance of the model, using scikit-learn and matplotlib.

Confusion matrix
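A minimal sketch of such an evaluation, assuming you have collected the expected and predicted labels for your test images (the label lists below are illustrative):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

# Hypothetical labels: real classes vs. classes predicted by the model
y_true = ["picture", "drawing", "map", "noise", "picture", "map"]
y_pred = ["picture", "drawing", "map", "picture", "picture", "map"]

labels = ["picture", "drawing", "map", "noise"]
cm = confusion_matrix(y_true, y_pred, labels=labels)
ConfusionMatrixDisplay(cm, display_labels=labels).plot()
plt.show()
```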

This associated Jupyter notebook demonstrates the whole process and allows you to play with the code and to call your personal Watson models on your images. (To learn how to launch a Jupyter notebook, look at this documentation.)

The notebook can be displayed with nbviewer. It can also be run in your browser with Binder, without any installation burden: launch the notebook in Binder

After a few minutes, Binder is displayed in your browser. You should see this page:

Binder is launched

Click on the "binder" folder and then on the classify-img-with-iiif-and-watson.ipynb notebook. The notebook opens in a new browser window and you can run it step by step (Shift+Return):

The notebook is running


B. Image classification using AI frameworks

Prerequisites: basic scripting and command line skills (Python scripts are used)

Using command lines

An AI framework must be used: TensorFlow (Google), PyTorch (Facebook), CNTK (Microsoft), Caffe2, Keras, etc.

This implementation leverages the Inception-v3 model and applies a transfer learning method: the last layer of the Inception-v3 model is retrained on the ground truth image dataset.
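To make the principle concrete, here is a minimal Keras sketch of this kind of transfer learning (this is not the retrain.py script used below; paths and hyperparameters are illustrative):

```python
import tensorflow as tf

# Load Inception-v3 pretrained on ImageNet, without its final classification layer
base = tf.keras.applications.InceptionV3(include_top=False, weights="imagenet",
                                         pooling="avg", input_shape=(299, 299, 3))
base.trainable = False  # freeze all pretrained layers

# Add a new last layer for the 4 heritage classes (picture/drawing/map/noise)
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(4, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# train_ds is assumed to be built from the ground truth folders, e.g. with
# tf.keras.utils.image_dataset_from_directory("dataset/train",
#                                             label_mode="categorical", ...)
# model.fit(train_ds, epochs=10)
```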

First, TensorFlow must be installed (if you don't want to install TensorFlow, go to the next section).

Three Python scripts (within the TensorFlow framework) are used to train (and evaluate) a local model:

  • split.py: the ground truth (GT) dataset is split into a training set (e.g. 2/3) and an evaluation set (1/3); see the sketch after this list. The local GT dataset directory and the training/evaluation ratio must be defined in the script.
  • retrain.py: the training set is used to retrain the last layer of the Inception-v3 model. The training dataset path and the generated model path must be defined. The Inception model is downloaded by the retrain.py script.
  • label_image.py: the evaluation set is labeled by the model. The model path and the input images path must be defined.
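As an illustration, the split performed by split.py might look like the following sketch (this is not the actual script; it assumes one sub-directory per class containing .jpg files):

```python
import random
import shutil
from pathlib import Path

GT_DIR = Path("dataset")  # one sub-directory per class
TRAIN_RATIO = 2 / 3       # e.g. 2/3 training, 1/3 evaluation

for class_dir in GT_DIR.iterdir():
    if not class_dir.is_dir():
        continue
    images = list(class_dir.glob("*.jpg"))
    random.shuffle(images)
    cut = int(len(images) * TRAIN_RATIO)
    # Copy each image into train/<class> or eval/<class>
    for subset, files in (("train", images[:cut]), ("eval", images[cut:])):
        dest = Path(subset) / class_dir.name
        dest.mkdir(parents=True, exist_ok=True)
        for f in files:
            shutil.copy(f, dest / f.name)
```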

To classify a set of images and output the results in a CSV file:

> python3 label_image.py > out.csv

Running the script outputs a line per classified image:

bd    carte  dessin  filtrecouv  filtretxt  gravure  photo  foundClass  realClass  success  imgTest
0.01  0.00   0.96    0.00        0.00       0.03     0.00   drawing     OUT_img    0        btv1b10100491m-1-1.jpg
0.09  0.10   0.34    0.03        0.01       0.40     0.03   engraving   OUT_img    0        btv1b10100495d-1-1.jpg
...

Each line gives the probability for every class, the best-scoring class (foundClass), the expected class (realClass), a success flag, and the image file name.
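From such an output file, the overall and per-class accuracy can be computed in a few lines (a sketch assuming the whitespace-separated columns shown above):

```python
import pandas as pd

# Columns as in the sample output above; the file is whitespace-separated
df = pd.read_csv("out.csv", sep=r"\s+")
print("overall accuracy:", df["success"].mean())
print(df.groupby("realClass")["success"].mean())  # per-class accuracy
```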

Other resources

Implementations