Copyright (C) 2013 Sergey Demyanov
contact: sergey@demyanov.net
This library was written as part of my project on facial expression analysis. It contains an implementation of convolutional neural nets for Matlab, written in both Matlab and C++. The C++ version is about 2 times faster. Both implementations work identically.
GENERAL INFORMATION
A convolutional neural net (CNN) is a type of deep learning classification algorithm that can learn useful features from raw data by itself. Learning is performed by tuning its weights. A CNN consists of several layers, usually convolutional and subsampling layers that follow each other. A convolutional layer filters its input with a small matrix of weights and applies a non-linear function to the result. A subsampling layer does not contain weights and simply reduces the size of its input by an averaging or max-pooling operation. The last layer is fully connected by weights to all outputs of the previous layer. Its output is also modified by a non-linear function.
The learning process consists of 2 steps, a forward pass and a backward pass, which are repeated for all objects in the training set. On the forward pass each layer transforms the output of the previous layer according to its function. The output of the last layer is compared with the label values and the total error is computed. On the backward pass the corresponding transformation is applied to the derivatives of the error with respect to the outputs and weights of each layer. After the backward pass has finished, the weights are changed in the direction that decreases the total error.
This process is performed for a batch of objects simultaneously in order to decrease the sample bias. After all the objects have been processed, the process may be repeated with different batch splits.
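The forward pass, backward pass and weight update described above can be illustrated on a toy problem. This is only a minimal sketch, not the library's code: it trains a single fully connected sigmoid layer with a squared error on one batch, and all data and variable names are made up for illustration.

```matlab
% Toy data: 4 objects with 2 features each, one object per column.
x = [0 0 1 1; 0 1 0 1];
y = [0 1 1 1];                 % labels (logical OR of the two features)
w = zeros(1, 2); b = 0;        % weights of one fully connected layer
alpha = 1;                     % learning rate
numsteps = 200;
err = zeros(1, numsteps);
for step = 1:numsteps
  % Forward pass: weighted sum followed by a sigmoid.
  out = 1 ./ (1 + exp(-(w * x + b)));
  % Total error over the batch.
  err(step) = 0.5 * sum((out - y) .^ 2);
  % Backward pass: derivatives of the error with respect to the
  % layer input (before the sigmoid) and then to the weights.
  dout = (out - y) .* out .* (1 - out);
  dw = dout * x';
  db = sum(dout);
  % Change the weights in the direction that decreases the error.
  w = w - alpha * dw;
  b = b - alpha * db;
end
```

Plotting err against the step number shows how the total error changes during training.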
DESCRIPTION
The library was written for Matlab and its functions can be called only from Matlab scripts. It operates on 2-dimensional objects, such as images, which are stored together in a 3-dimensional array. The last index represents the object number. The labels must be in a 2-dimensional array where the first index represents the class and each value (0 or 1) indicates whether the object belongs to it.
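The data layout can be sketched as follows. The sizes and the random data are hypothetical, and the label layout (one row per class, one column per object) is my reading of the description above, so check it against "cnnexamples.m" before relying on it.

```matlab
% Hypothetical sizes: 100 images of 28x28 pixels, 10 classes.
numobj = 100;
numclasses = 10;
train_x = rand(28, 28, numobj);            % the last index is the object number
classidx = randi(numclasses, 1, numobj);   % a made-up class index for every object
% Assumed label layout: train_y(c, k) is 1 if object k belongs to class c.
train_y = zeros(numclasses, numobj);
train_y(sub2ind(size(train_y), classidx, 1:numobj)) = 1;
```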
The library contains 3 main functions to call:
- [weights, trainerr] = cnntrain(layers, params, train_x, train_y, funtype, weights_in) Performs neural net training. The weights_in argument is optional. Returns the weights from all layers as a single vector.
- [pred, err] = cnntest(layers, weights, test_x, test_y, funtype) Calculates the test error. Based on cnnclassify, which returns only the predictions.
- [weights_in] = genweights(layers, funtype) Returns randomly generated weights for the neural net. If you need repeatable results, pass these weights to cnntrain or cnntest.
Parameters:
layers - the structure of the CNN. It is set up as a cell array, with each element representing an independent layer. Layers can be one of 4 types:
- i - input layer. Must be the first layer and only the first. Must contain the "mapsize" field, which is a vector of 2 integer values representing the object size.
- c - convolutional layer. Must contain the "kernelsize" field, which defines the filter size. It must not be greater than the size of the maps on the previous layer. Must also contain the "outputmaps" field, which is the number of maps for each object on this layer. If the previous layer has "m" maps and the current one has "n" maps, the total number of filters on it is m * n. Although it is called convolutional, it performs filtering, which is a convolution with the kernel flipped in both dimensions.
- s - scaling layer. Reduces the map size by pooling. Must contain the "scale" field, which is also a vector of 2 integer values.
- f - fully connected layer. Must contain the "length" field, which defines the number of its outputs. Must be the last one. For the last layer the "length" value must coincide with the number of classes.
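The difference between filtering and convolution mentioned for the "c" layer can be checked with the standard Matlab functions filter2 and conv2. This is an illustrative sketch, not the library's code. It assumes that only the 'valid' part of the result is kept, which is my inference from the requirement that the kernel must not be larger than the map.

```matlab
x = magic(5);            % a 5x5 input map
k = [1 2; 3 4];          % a 2x2 kernel, i.e. "kernelsize" = [2 2]
% Filtering slides the kernel over the map without flipping it.
filtered = filter2(k, x, 'valid');
% Convolution gives the same result once the kernel is flipped
% in both dimensions (rotated by 180 degrees).
convolved = conv2(x, rot90(k, 2), 'valid');
samemaps = isequal(filtered, convolved);
% With 'valid' filtering the map shrinks to mapsize - kernelsize + 1.
outsize = size(filtered);
% Number of filters between a layer with m maps and a layer with n maps.
m = 3;
n = 6;
numfilters = m * n;
```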
All layers except "i" may contain the "function" field, which defines their action. For:
- c and f - it defines the non-linear function. It can be either "sigm" or "relu", for sigmoids and rectified linear units respectively. The default value is "sigm".
- f - it can also be "SVM", which calculates the SVM error function. See www.cs.toronto.edu/~tang/papers/dlsvm.pdf for details. It has been tested only for the final layer.
- s - it defines the pooling procedure, which can be either "mean" or "max". The default value is "mean".
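The "function" options can be sketched in plain Matlab as below. This is a simple loop-based illustration with made-up data, not the way the library implements them, and it assumes that the map size is divisible by the scale.

```matlab
x = [1 2 5 6; 3 4 7 8; -1 -2 0 0; -3 -4 0 4];   % a 4x4 map
scale = [2 2];
outsize = size(x) ./ scale;     % assumes the map size is divisible by the scale
meanpool = zeros(outsize);
maxpool = zeros(outsize);
for i = 1:outsize(1)
  for j = 1:outsize(2)
    % Rows and columns of the block that is reduced to one output value.
    rows = (i - 1) * scale(1) + (1:scale(1));
    cols = (j - 1) * scale(2) + (1:scale(2));
    block = x(rows, cols);
    meanpool(i, j) = mean(block(:));   % "mean" pooling
    maxpool(i, j) = max(block(:));     % "max" pooling
  end
end
% The two non-linear functions for "c" and "f" layers.
sigm = @(a) 1 ./ (1 + exp(-a));
relu = @(a) max(a, 0);
```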
params - defines the learning process. It is a cell with the following fields. If some of them are absent, the default value is used.
- alpha - defines the learning rate. Default is 1; with "SVM" on the last layer it should be about 10 times lower.
- batchsize - defines the number of objects in each batch. Default is 50.
- numepochs - the number of times the training procedure is repeated with different batch splits. Default is 1.
- momentum - defines the actual direction of the weight change according to the formula m * dp + (1-m) * d, where m is the momentum, dp is the previous change and d is the current derivative. Default is 0.
- adjustrate - defines how much we change the learning rate for a particular weight. If the signs of the previous and current updates coincide, we add it to the learning rate. If not, we divide the learning rate by (1 - adjustrate). Default is 0.05.
- maxcoef - defines the maximum and minimum learning rates, which are alpha * maxcoef and alpha / maxcoef respectively.
- balance - was supposed to balance errors for highly unbalanced datasets but was not fully implemented.
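The momentum formula and the maxcoef bounds can be worked through with a few numbers. All values below are made up, and clamping with min and max is only my illustration of what the bounds mean, not necessarily how the library applies them.

```matlab
% Momentum: mix the previous change with the current derivative.
m = 0.9;                  % momentum
dp = [0.2 -0.1];          % previous change (made-up values)
d = [0.1 0.3];            % current derivative (made-up values)
change = m * dp + (1 - m) * d;
% With the default m = 0 the change is just the current derivative.
change0 = 0 * dp + (1 - 0) * d;
% maxcoef: bounds for the per-weight learning rates.
alpha = 1;
maxcoef = 10;             % hypothetical value
maxrate = alpha * maxcoef;
minrate = alpha / maxcoef;
rates = [0.01 0.5 25];    % hypothetical per-weight learning rates
rates = min(max(rates, minrate), maxrate);   % keep them inside the bounds
```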
weights - the weights vector obtained from genweights or cnntrain, which is used for weight initialization. It can be used for testing, for repeating the results or for continuing the training procedure.
funtype - defines the actual function that is used. Can be either "mexfun" or "matlab". "Mexfun" is faster, but in "matlab" it is easier to do some debugging and see the intermediate results.
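Putting the parameters together, a call sequence might look like the sketch below. Several things here are assumptions that this text does not state: the name of the field holding the layer type ("type"), the use of a struct for params, and all the concrete sizes and values. The library calls run only if the library and the data are available.

```matlab
% Layer structure. The 'type' field name is an assumption.
layers = {
  struct('type', 'i', 'mapsize', [28 28])
  struct('type', 'c', 'kernelsize', [5 5], 'outputmaps', 6, 'function', 'relu')
  struct('type', 's', 'scale', [2 2], 'function', 'max')
  struct('type', 'f', 'length', 10)
};
% Learning parameters. A struct is assumed here; absent fields take defaults.
params = struct('alpha', 1, 'batchsize', 50, 'numepochs', 1, 'momentum', 0);
funtype = 'matlab';       % or 'mexfun' for the compiled version
% Run only when the library is on the path and the data are loaded.
haslib = exist('cnntrain', 'file') > 0;
hasdata = exist('train_x', 'var') && exist('train_y', 'var') && ...
          exist('test_x', 'var') && exist('test_y', 'var');
if haslib && hasdata
  weights_in = genweights(layers, funtype);
  [weights, trainerr] = cnntrain(layers, params, train_x, train_y, funtype, weights_in);
  [pred, err] = cnntest(layers, weights, test_x, test_y, funtype);
end
```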
TECHNICAL DETAILS
- To compile the C++ version you need to have the Boost library. Just modify the paths in compile.m and run it. I have tried it only on Windows, but it should work on Linux as well. You can also download the binaries from my website.
- The "cnnexamples.m" file requires the "mnist_uint8.mat" file to run. You can download it from Matlab Central File Exchange; just google it and save it in the ./data folder. You also need to create the ./workspaces folder to save your weights.
- Randomness comes not only from the weights but also from batch shuffling. Therefore, when weights are passed to the cnntrain function, the batches are created in natural order: the first "batchsize" objects become the first batch and so on.
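The natural-order batch split can be sketched as follows. This is an illustration with made-up sizes, not the library's code. In particular, how the library treats a last batch that is smaller than "batchsize" is not stated here, so the sketch simply keeps it as a shorter batch.

```matlab
numobj = 10;              % hypothetical number of objects
batchsize = 4;
numbatches = ceil(numobj / batchsize);
order = 1:numobj;               % natural order gives the same batches on every run
% order = randperm(numobj);     % a shuffled order would differ from run to run
batches = cell(1, numbatches);
for b = 1:numbatches
  first = (b - 1) * batchsize + 1;
  last = min(b * batchsize, numobj);
  batches{b} = order(first:last);   % indices of the objects in batch b
end
```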
SOME COMMENTS
- The library was developed for Matlab, but probably works in Octave as well. In case the Matlab "imdilate" function does not work, you can use the mex-function "maxscale" instead. Just uncomment it in the corresponding block and compile it with 'mex maxscale' if necessary.
- In order to achieve compatibility with mex, there are some otherwise unnecessary transpose operations in the Matlab code. If you do not need this compatibility, you can remove them.
ACKNOWLEDGEMENTS
- The original Matlab code and the "mnist_uint8.mat" workspace were created by Rasmus Berg Palm and can be found in his DeepLearnToolbox (https://github.com/rasmusbergpalm/DeepLearnToolbox). The Matlab version has basically kept the same structure.
- The C++ version was inspired by Yichuan Tang (http://www.cs.toronto.edu/~tang) and his solution (http://code.google.com/p/deep-learning-faces/) for the Kaggle Facial Expression Recognition Challenge. The structure of the C++ code originated from there.