This repository holds the code developed for the ICFHR2016 Competition on the Classification of Medieval Handwritings in Latin Script, the results of which are described in the following paper:
Florence Cloppet, Véronique Eglin, Van Cuong Kieu, Dominique Stutzmann, Nicole Vincent. 'ICFHR2016 Competition on the Classification of Medieval Handwritings in Latin Script'. In: Proceedings of the 15th International Conference on Frontiers in Handwriting Recognition (2016), 590-595. DOI 10.1109/ICFHR.2016.106.
The main task for this competition was to correctly predict the script type of samples of medieval handwriting, as represented by single-page, greyscale photographic reproductions of codices. A random selection of examples is shown below:
The 'DeepScript' approach described here scored best in the second task of this competition ("fuzzy classification"). This system uses a ‘vanilla’ neural network model, i.e. a single-color-channel variation of the popular VGG architecture. The model takes the form of a stack of convolutional layers, each with a 3x3 receptive field and an increasingly large number of filters at each block of layers (2 x 64 > 3 x 128 > 3 x 256). This convolutional stack feeds into two fully-connected dense layers with a dimensionality of 1048, before feeding into the final softmax layer, where the normalized scores for each class label are predicted. Because of the small size of the datasets, DeepScript borrowed the augmentation strategy from the work by Sander Dieleman and colleagues, which is described in this awesome blog post. The original code is available from this repository. We would like to thank Sander for his informal help and advice on this project. Below is an example of the augmented patches on which the model was trained (see `example_crops.py`):
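For readers who prefer code over prose, the block below sketches such a VGG-style stack in Keras. It is not the repository's actual model definition: the patch size, class count, optimizer and other details are assumptions made purely for illustration, and the snippet targets the current Keras API rather than the version used at the time.

```python
# Hypothetical sketch of the VGG-style architecture described above;
# PATCH_SIZE and NB_CLASSES are assumptions, not values from the repository.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

PATCH_SIZE = 150   # assumed side length of the square greyscale input patches
NB_CLASSES = 12    # assumed number of script classes

model = Sequential()
# Block 1: two convolutional layers with 64 filters of 3x3 each
model.add(Conv2D(64, (3, 3), activation='relu', padding='same',
                 input_shape=(PATCH_SIZE, PATCH_SIZE, 1)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2, 2)))
# Block 2: three convolutional layers with 128 filters each
for _ in range(3):
    model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2, 2)))
# Block 3: three convolutional layers with 256 filters each
for _ in range(3):
    model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2, 2)))
# Two fully-connected layers, followed by the softmax output
model.add(Flatten())
model.add(Dense(1048, activation='relu'))
model.add(Dense(1048, activation='relu'))
model.add(Dense(NB_CLASSES, activation='softmax'))

model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])
```

Stacking small 3x3 filters yields a large effective receptive field at a comparatively low parameter cost, which is the core design idea behind the VGG family of models.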
The repository includes the following top-level scripts:
- `prepare_data.py`: prepare and preprocess train/dev/test sets of the original images
- `train.py`: train a new model
- `test.py`: test/apply a previously trained model
- `filter_viz.py`: visualize what filters were learned during training
- `example_crops.py`: generate examples of the sort of augmented crops used in training
- `crop_activations.py`: find out which patches from a test set maximally activate a particular class
By default, new models are stored in a directory under the `models` directory in the repository. A pretrained model can be downloaded as a ZIP archive from Google Drive: unzip it and place it under a `models` directory in the top-level directory of the repository. The original data can be obtained by registering on the competition's website.
Of special interest was the ability to visualize the knowledge inferred by a trained neural network (i.e. the questions: what does the network 'see' after training? Which sorts of features has it become sensitive to?). For this visualization, we drew heavily on the excellent set of example scripts offered in the keras library. Below, we show examples of randomly initialized images which were adjusted via the principle of gradient ascent to maximally activate single neurons on the final convolutional layer (see `filter_viz.py`). The generated images have been annotated with a couple of interesting paleographic features that seem to emerge:
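As a rough illustration of the gradient-ascent idea behind `filter_viz.py`, the sketch below starts from a noise image and iteratively nudges it to maximize the mean activation of one filter, following the pattern of the keras example scripts mentioned above. It assumes an older Keras backend (Theano/TF1-style `K.gradients` and `K.function`) and channels-last layout; names such as `maximize_filter` are illustrative, not taken from the repository.

```python
# Illustrative sketch only -- filter_viz.py holds the repository's actual code.
import numpy as np
from keras import backend as K

def maximize_filter(model, layer_name, filter_index,
                    patch_size=150, steps=40, step_size=1.0):
    """Gradient ascent in image space: start from noise and maximize the
    mean activation of one filter in the chosen convolutional layer."""
    layer_output = model.get_layer(layer_name).output
    loss = K.mean(layer_output[:, :, :, filter_index])
    grads = K.gradients(loss, model.input)[0]
    # Normalize the gradient to keep the ascent numerically stable
    grads /= (K.sqrt(K.mean(K.square(grads))) + 1e-5)
    iterate = K.function([model.input], [loss, grads])

    # Start from a grey image with random noise and climb the gradient
    img = np.random.random((1, patch_size, patch_size, 1)) * 20 + 128.
    for _ in range(steps):
        _, grads_value = iterate([img])
        img += grads_value * step_size
    return img[0, :, :, 0]
```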
Additionally, it is possible to select the patches from the unseen test images which maximally activate the response of a certain class in the output layer (see `crop_activations.py`). Examples of top-activating patches (without augmentation) are given below.
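A sketch of how such a ranking could be computed is given below: score every preprocessed test patch with the trained model and keep the patches with the highest softmax probability for the class of interest. The function name and batch size are illustrative; `crop_activations.py` contains the repository's actual procedure.

```python
# Illustrative sketch only -- `patches` is assumed to be an array of
# preprocessed crops with shape (num_patches, height, width, 1).
import numpy as np

def top_activating_patches(model, patches, class_index, n=16):
    """Return the n patches with the highest softmax score for one class."""
    probs = model.predict(patches, batch_size=32)[:, class_index]
    best = np.argsort(probs)[::-1][:n]   # indices of the highest-scoring patches
    return patches[best], probs[best]
```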
The confusion matrix obtained for the development data shows that the model generally makes solid predictions and that its errors are mostly understandable (e.g. the confusion between different textualis variants):
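For reference, a confusion matrix of this kind can be computed with scikit-learn as sketched below; the labels are just a few script names chosen for illustration, and this snippet is not part of the repository's scripts.

```python
from sklearn.metrics import confusion_matrix

# Toy example with a handful of illustrative script labels
y_true = ['textualis', 'textualis', 'semitextualis', 'caroline']
y_pred = ['textualis', 'semitextualis', 'semitextualis', 'caroline']
labels = ['caroline', 'semitextualis', 'textualis']
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)  # rows = true classes, columns = predicted classes
```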
Major dependencies for this code include:
These packages can easily be installed via pip. We recommend Continuum's Anaconda environment, which ships with most of these dependencies. The code has been tested on Mac OS X and Ubuntu Linux under Python 2.7. Note that this version number might affect your ability to load some of the pretrained model's components as pickled objects.
The preliminary results of our specific approach have been presented at the 2016 ESTS conference in Antwerp. A dedicated publication describing our approach is on the way. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the TITAN X used for this research.